Audio manipulations with Singular Value Decomposition

1976 Film about SVD

Let's apply Singular Value Decomposition (SVD) to a complex audio spectrogram $X \in \mathbb{C}^{n \times k}$, where the $n$ rows are time bins and the $k$ columns are frequency bins.

$$ USV^* = X$$

In the reduced form (assuming $n \geq k$), the dimensions can be arranged as $U \in \mathbb{C}^{n \times k}$, $S \in \mathbb{R}^{k \times k}$ and $V \in \mathbb{C}^{k \times k}$, where $S$ is diagonal and holds the non-negative singular values.

This decomposition can be read as follows: the columns of $V$ form a spectral basis, $S$ holds the scaling values, and $U$ holds the scores, that is, the weights for the linear combinations of basis vectors.
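As a minimal sketch of this decomposition, the snippet below runs NumPy's reduced SVD on a random complex matrix that stands in for the spectrogram. The random data and the sizes `n` and `k` are assumptions for illustration only; the audio and STFT settings used for this post are not shown here.

```python
import numpy as np

# Stand-in for a complex spectrogram: n time bins (rows) by k frequency bins (columns).
# Random data is an assumption for illustration; a real STFT would go here.
rng = np.random.default_rng(0)
n, k = 200, 64
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

# Reduced SVD: U is n x k, s holds the k singular values, Vh is V^* (k x k).
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Rebuild X = U S V^* by scaling the columns of U with the singular values.
X_rec = (U * s) @ Vh

print(U.shape, s.shape, Vh.shape)
print(np.allclose(X, X_rec))
```

NumPy returns the singular values as a 1-D array sorted in descending order, and it returns $V^*$ directly rather than $V$, so each row of `Vh` is one basis element over frequency.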

This is what the log magnitude of $V^*$ looks like for this audio:

Original

We can listen to this basis by treating the figure as a spectrogram.

Basis elements

We can hear that the elements are ordered mainly by frequency: the first region has tonal, bubbling sounds, and the last one has a high-pass noise character.

Let's reconstruct the original audio with fewer elements, retaining only the first 256 basis columns. This produces a clear low-pass effect.

256 elements

And now with just 32 elements

32 elements
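The truncated reconstructions above can be sketched as keeping only the first `r` singular components before multiplying the factors back together. This is one plausible reading of "retaining only the first 256 basis columns"; the helper name `truncate_svd`, the random test matrix, and its small sizes are hypothetical and chosen only so the example runs quickly.

```python
import numpy as np

def truncate_svd(X, r):
    """Rebuild X from only its first r singular components (hypothetical helper)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Keep the first r columns of U, r singular values, and r rows of V^*.
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

# Random complex matrix standing in for the spectrogram (an assumption).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40)) + 1j * rng.standard_normal((100, 40))

X_full = truncate_svd(X, 40)  # all components: recovers X up to rounding
X_8 = truncate_svd(X, 8)      # only 8 components: a rank-8 approximation

err_full = np.linalg.norm(X - X_full)
err_8 = np.linalg.norm(X - X_8)
print(err_full, err_8)
```

With a real spectrogram, the same call with `r=256` or `r=32` would give the matrices to pass to an inverse STFT.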

Finally, let's hear a "prepared" piano version $Y$, obtained by computing

$$Y = V^* X^* $$

As a result, the complex-conjugate values of the original spectrum are treated as weights for the reconstruction with elements of the basis $V$.

Prepared
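The product $Y = V^* X^*$ can be sketched in NumPy as below, again on a random stand-in for the spectrogram (an assumption; the sizes are arbitrary). How $Y$ is turned back into audio is not described here, so the sketch stops at the matrix product.

```python
import numpy as np

# Random complex matrix standing in for the spectrogram (an assumption).
rng = np.random.default_rng(0)
n, k = 100, 40
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

# Only V^* is needed from the decomposition.
_, _, Vh = np.linalg.svd(X, full_matrices=False)

# Y = V^* X^*: X.conj().T is the conjugate transpose of X, with shape (k, n).
Y = Vh @ X.conj().T

print(Y.shape)  # (40, 100)
```

Note that $Y$ has shape $k \times n$, so its axes are swapped relative to $X$.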
