Audio manipulations with Singular Value Decomposition
1976 Film about SVD
Let's apply the Singular Value Decomposition (SVD) to a complex audio spectrogram $X \in \mathbb{C}^{n \times k}$, where the $n$ rows are time bins and the $k$ columns are frequency bins.
$$ USV^* = X $$
For $n \geq k$, the dimensions of $U$, $S$ and $V$ can be arranged to be $U \in \mathbb{C}^{n \times k}$, $S \in \mathbb{C}^{k \times k}$ (diagonal, with real non-negative entries) and $V \in \mathbb{C}^{k \times k}$.
This decomposition can be interpreted as follows: $V$ represents a spectral basis, $S$ holds the scaling values, and $U$ holds the scores, i.e. the weights for the linear combinations of basis vectors.
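As a minimal sketch of this decomposition, the snippet below runs a thin SVD with NumPy on a random complex matrix standing in for the spectrogram. The sizes `n` and `k` and the synthetic data are assumptions made for illustration; a real spectrogram would come from a short-time Fourier transform of the audio.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 64, 16  # hypothetical sizes: n time bins, k frequency bins

# Synthetic stand-in for a complex spectrogram (not real audio).
X = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

# Thin SVD: U is (n, k), s holds the k singular values,
# and Vh is V^* (the conjugate transpose of V) with shape (k, k).
U, s, Vh = np.linalg.svd(X, full_matrices=False)
S = np.diag(s)

# Multiplying the three factors back together recovers X.
X_rebuilt = U @ S @ Vh
print(U.shape, S.shape, Vh.shape)
print(np.allclose(X, X_rebuilt))
```

Note that `np.linalg.svd` returns $V^*$ directly, and the singular values as a one-dimensional array sorted in descending order.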
This is what the log magnitude of $V^*$ looks like for this audio.
We can listen to this basis by treating the figure as a spectrogram.
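A log-magnitude image of $V^*$ could be computed roughly as follows. This is only a sketch: the random input, the natural logarithm, and the small offset `eps` are assumptions, since the exact scaling behind the figure is not stated.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a spectrogram: 64 time bins, 16 frequency bins.
X = rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))
_, _, Vh = np.linalg.svd(X, full_matrices=False)

eps = 1e-12  # assumed small offset so that zero entries stay finite under the log
log_mag = np.log(np.abs(Vh) + eps)
print(log_mag.shape)
```

Each row of `log_mag` corresponds to one basis element and each column to one frequency bin, so the array can be passed to any image plotting routine.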
We can hear that the elements are ordered mainly by frequency: the first region has tonal, bubbling sounds, and the last one has a high-pass noise character.
Let's reconstruct the original audio with fewer elements, retaining only the first 256 basis columns. This produces a clear low-pass effect.
And now with just 32 elements.
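The reduced reconstructions above amount to keeping only the first $r$ columns of $U$, singular values, and rows of $V^*$. A minimal sketch, assuming a synthetic matrix in place of the spectrogram and a hypothetical helper `truncated_reconstruction` (the counts 8 and 2 play the role of 256 and 32 at this smaller size):

```python
import numpy as np

def truncated_reconstruction(X, r):
    """Rebuild X from its first r singular triplets (hypothetical helper)."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    # Scale the first r columns of U by their singular values,
    # then combine them with the first r rows of V^*.
    return (U[:, :r] * s[:r]) @ Vh[:r, :]

rng = np.random.default_rng(0)
# Synthetic stand-in for a spectrogram: 64 time bins, 16 frequency bins.
X = rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))

X_8 = truncated_reconstruction(X, 8)
X_2 = truncated_reconstruction(X, 2)

# Frobenius-norm error grows as fewer elements are retained.
err_8 = np.linalg.norm(X - X_8)
err_2 = np.linalg.norm(X - X_2)
print(err_8 < err_2)
```

The result keeps the shape of $X$, so it can go through the same inverse transform as the original spectrogram to produce audio.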
Finally, let's hear a "prepared" piano version $Y$, obtained by computing
$$Y = V^* X^*$$
As a result, the complex-conjugate values of the original spectrum are treated as weights for the reconstruction with the elements of basis $V$.
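The product $V^* X^*$ can be sketched with NumPy as below, again on synthetic data and assuming $V$ comes from the SVD of the same $X$. Under that assumption $V^* V = I$, so the product also equals $S U^*$, which the snippet checks numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a spectrogram: 64 time bins, 16 frequency bins.
X = rng.standard_normal((64, 16)) + 1j * rng.standard_normal((64, 16))
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Y = V^* X^*: a (k, k) matrix times a (k, n) matrix gives a (k, n) result.
Y = Vh @ X.conj().T

# Since X^* = V S U^* and V^* V = I, the same matrix is S U^*.
Y_alt = np.diag(s) @ U.conj().T
print(Y.shape)
print(np.allclose(Y, Y_alt))
```

Note that $Y$ has shape $k \times n$, i.e. its axes are swapped relative to $X$.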