Reordering audio frames

Sound paths in descriptor space

Ansatz, 05 June 2017
reorder

How to reorder audio frames?

We take an audio recording and cut it into frames 93 ms long with a step of 23 ms. Then we ask: what criterion should we use to reorder the frames?
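The framing step can be sketched as follows. The sample rate is not stated in the post; the sketch assumes 22050 Hz, where 2048 samples last about 93 ms and 512 samples about 23 ms. The names `frame_signal`, `SAMPLE_RATE`, `FRAME_LEN` and `HOP` are illustrative only.

```python
# Assumed values: the post only gives the durations (93 ms frames, 23 ms step).
SAMPLE_RATE = 22050   # Hz, an assumption
FRAME_LEN = 2048      # samples, 2048 / 22050 is about 92.9 ms
HOP = 512             # samples, 512 / 22050 is about 23.2 ms


def frame_signal(samples, frame_len=FRAME_LEN, hop=HOP):
    """Cut a 1-D sequence into overlapping frames.

    Frame i covers samples[i * hop : i * hop + frame_len]. Trailing samples
    that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    if len(samples) < frame_len:
        return []
    n_frames = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop: i * hop + frame_len] for i in range(n_frames)]
```

With these values consecutive frames overlap by three quarters of their length, so any reordering changes which material is heard next without changing the frame size.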

A random ordering sounds like this.

Original
Randomized

The next attempt generates an ordering based on these sound descriptors: RMS, pitch, chromagram, MFCC, spectral contrast, and Mel spectrogram. We sort the frames along the principal components (PCs) of the merged descriptors. That is, we sort by the first PC, then by the second, and so on.
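One way to read "first PC, then the second, and so on" is a lexicographic sort. Because PC scores are continuous, ties on the first PC almost never occur, so the sketch below quantizes each PC into a few bins to let later PCs matter. The binning, the bin count, and the names `pc_scores` and `order_by_pcs` are assumptions made for illustration, not necessarily what was done for the audio in this post. The input is a matrix with one row of merged descriptors per frame.

```python
import numpy as np


def pc_scores(features, n_components):
    """Project standardized descriptor rows onto their leading principal components."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)
    std = X.std(axis=0)
    X = X / np.where(std > 0, std, 1.0)  # avoid dividing by zero on constant columns
    # Rows of Vt are the principal directions, ordered by explained variance.
    # Their sign is arbitrary, so an ordering may come out reversed.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:n_components].T


def order_by_pcs(features, n_components=3, n_bins=8):
    """Return frame indices sorted by binned PC 1, then binned PC 2, and so on.

    The last component used is kept unbinned so it orders frames within a bin.
    """
    scores = pc_scores(features, n_components)
    k = scores.shape[1]  # may be smaller than n_components for narrow inputs
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    bins = np.minimum((scores - lo) / span * n_bins, n_bins - 1).astype(int)
    # np.lexsort treats the LAST key as the primary one.
    keys = [scores[:, k - 1]] + [bins[:, j] for j in range(k - 2, -1, -1)]
    return np.lexsort(tuple(keys))
```

The returned index array can be used to pick frames in the new order before concatenating or overlap-adding them back into audio.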

Ordered by descriptors PCs

The result has an overall intensity crescendo (PC 1) and local chroma transitions (PCs 2 and 3). To further analyze the content of the PC ordering, we can generate a sound impression by sliding a projection along one direction and listening to the average sound over all other directions.

Moving on PC 0
Moving on PC 1
Moving on PC 2
Moving on PC 3
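A possible reading of the sliding projection is sketched below: move a window along the scores of one PC and, at each position, average the magnitude spectra of all frames whose score falls inside the window, whatever their position on the other PCs. This is an interpretation, not a description of how the audio above was produced; the function name, the window width, and the use of magnitude spectra as the thing being averaged are all assumptions.

```python
import numpy as np


def sliding_average(scores, spectra, n_steps=20, width=0.1):
    """Average spectra of frames whose PC score lies near each window center.

    scores  : 1-D array, score of every frame on the chosen PC
    spectra : 2-D array, one magnitude spectrum per frame (rows)
    width   : window width as a fraction of the score range
    Returns the window centers and one averaged spectrum per center.
    Windows that contain no frame are left as zeros.
    """
    scores = np.asarray(scores, dtype=float)
    spectra = np.asarray(spectra, dtype=float)
    lo, hi = scores.min(), scores.max()
    half = 0.5 * width * (hi - lo)
    centers = np.linspace(lo, hi, n_steps)
    averaged = np.zeros((n_steps, spectra.shape[1]))
    for i, center in enumerate(centers):
        inside = np.abs(scores - center) <= half
        if inside.any():
            averaged[i] = spectra[inside].mean(axis=0)
    return centers, averaged
```

Each averaged spectrum would still need to be turned back into sound, for example with an inverse transform and some choice of phase, which this sketch leaves out.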

Let us continue with these short expositions. Now a 30-second pitch reordering.

Ordered by descriptors PCs

Walking the graph of sound similarities

In most of the sound excerpts heard so far there is a continuous trembling, because the sound descriptors are noisy. To overcome those fluctuations, we want consecutive frames to sound as much alike as possible. From another perspective, we want a path over the complete graph of fragments that visits every fragment once and minimizes the total dissimilarity. This is the traveling salesman problem. We compute the sound dissimilarities as the Euclidean distance between the merged descriptors.
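To make the formulation concrete, here is a minimal sketch of a greedy nearest-neighbor heuristic for this path problem. It is not the optimizer used for the audio in this post, and it gives no optimality guarantee; it only illustrates the objective. The names `greedy_tour` and `tour_length` are hypothetical, and `features` stands for one merged descriptor vector per frame.

```python
import math


def greedy_tour(features, start=0):
    """Visit every frame once, always moving to the closest unvisited frame.

    Dissimilarity is the Euclidean distance between descriptor vectors.
    Ties are broken by the lower frame index. Runs in O(n^2) time.
    """
    n = len(features)
    if n == 0:
        return []
    unvisited = set(range(n))
    unvisited.remove(start)
    tour = [start]
    while unvisited:
        last = features[tour[-1]]
        nearest = min(unvisited, key=lambda j: (math.dist(last, features[j]), j))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour


def tour_length(features, tour):
    """Total dissimilarity between consecutive frames along the path."""
    return sum(math.dist(features[a], features[b]) for a, b in zip(tour, tour[1:]))
```

Comparing `tour_length` for the original order and for the reordered path gives a rough measure of how much smoother the new sequence is in descriptor space.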

Travelling over the frames

This is not an optimal solution: the optimizer was faced with more than 5000 nodes. Nevertheless the result is smooth. The system took advantage of the original ordering: frames that are close in time are correlated, so they tend to sound similar. We can hear fragments of words and continuous sounds.

A similar result is obtained with a smaller graph, built by connecting each fragment to a fixed number of neighbors. Once the graph is constructed, we can traverse it and expect some sound continuity.
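The smaller graph and its traversal can be sketched as below, assuming that "neighbors" means the k nearest frames in descriptor space under Euclidean distance. The brute-force neighbor search and the names `knn_graph` and `dfs_order` are illustrative; for thousands of frames a spatial index would be a better fit.

```python
import math


def knn_graph(features, k):
    """Map each frame index to the indices of its k nearest frames."""
    n = len(features)
    graph = {}
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (math.dist(features[i], features[j]), j),
        )
        graph[i] = others[:k]
    return graph


def dfs_order(graph, start=0):
    """Depth-first traversal that returns every frame exactly once.

    The search restarts from the next unvisited frame whenever a component
    is exhausted, so disconnected graphs are still fully covered.
    """
    if not graph:
        return []
    visited, order = set(), []
    roots = [start] + [v for v in graph if v != start]
    for root in roots:
        if root in visited:
            continue
        stack = [root]
        while stack:
            node = stack.pop()
            if node in visited:
                continue
            visited.add(node)
            order.append(node)
            # Push in reverse so the nearest neighbor is explored first.
            for neighbor in reversed(graph[node]):
                if neighbor not in visited:
                    stack.append(neighbor)
    return order
```

When the search backtracks or restarts, consecutive frames in the output may not be neighbors, so some jumps in the sound are to be expected.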

Depth-first search