
Created by Tom Schaap

Signal data are pervasive in geoscience. Seismic, electromagnetic, hyperspectral – the list goes on. At Datarock, we’re always looking for novel ways to visualise data. Signal data can be tricky because a single amplitude sample from a time series does not adequately describe its context – it says nothing of the frequency or phase of the signal at that point.

This blog will demonstrate a novel feature engineering workflow for signal data, from which simple visualisations can be generated. I could have used any open source geophysical dataset for this blog, but I have chosen music instead, hoping that this makes the workflow more engaging and intuitive.

Data import and initial feature engineering

I used the following Python libraries for this workflow, most of which are standard tools in the data scientist’s toolbox.

To get started, we first import the required Python libraries for this notebook; the music data are read in further below.

from os.path import basename, splitext
from scipy.io import wavfile
import matplotlib.pyplot as plt
import numpy as np
from scipy.fft import fft, fftfreq
from scipy import stats
from scipy import signal
from umap.parametric_umap import ParametricUMAP
import pandas as pd
from tqdm import tqdm
import glob

# This notebook generates some warnings, which we will ignore for presentation purposes
import warnings
warnings.filterwarnings("ignore")   

The input data are a series of songs (.wav files saved locally to my computer). To build this database of music, I received contributions from a number of artists covering a couple of different genres, with the expectation that the audible differences between the songs would manifest in the final visualisation.

These artists kindly gave me permission to do weird stuff with their music:

Ultimately this little collection amounts to a mix of broad rock and jazz genres.

# Get filepaths from music database
path = r'/mnt/c/Projects/Blogs/Toms_music-signal-UMAP-blog/data/raw/*.wav'
fns = glob.glob(path)
titles = [splitext(basename(f))[0] for f in fns]

# loop through files, read the raw audio and plot a few selected songs
for i, file in enumerate(fns):
    sr, wav = wavfile.read(file) # sample rate and raw (stereo) waveform
    time = np.arange(0, wav.shape[0]/sr, 1/sr) # time axis in seconds

    # plot some select songs
    if titles[i] in ['David Mare - Daylight Hours', 'Greenly Beach  - Cluck Cluck Cluck', 'odd jord - Huh?']:
        fig, ax = plt.subplots(figsize = (20,5))
        ax.plot(time, wav[:,0], linewidth = 0.2, c = 'k')
        ax.set(ylabel = 'Amplitude', xlabel = 'time [s]')
        fig.suptitle('Time series: {}'.format(titles[i]))
        plt.show()

These time series representations of the song data provide some insight into the structure of the music, mostly based on how loud it is over time as reflected by the changes in signal amplitude. However, even if I were to zoom in on these plots to the point where we could distinguish individual waveforms, it would be incredibly difficult to visually discern the actual frequency content of the music – let alone how we might interpret it with our ears. Instead, we need to find a more intuitive way of representing these data.

The code below reads each song and generates a spectrogram – the song is split into short time intervals and a Fourier transform is calculated for each one. These spectrograms form the tabular data that my visualisations draw from. I think spectrograms provide a pretty good analogy to what a human experiences when listening to music: as the song progresses, the spectrogram shows the changing frequency distributions that we would hear. This makes them good input features for machine learning models, because they are easy to interpret from a human point of view. For comparison, I also tried this workflow with features extracted by a 1D convolutional neural network; the results were quite similar, but the input features were almost impossible to understand!
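
For the curious, a minimal sketch of what such a 1D-CNN feature extractor could look like is shown below. The architecture is illustrative only – it is not the network I actually used – and it assumes TensorFlow/Keras, which ParametricUMAP already depends on.

# Illustrative only: a small untrained 1D-CNN encoder mapping a 0.25 s audio window
# to an opaque 64-dimensional feature vector (hypothetical architecture, not the one I used)
import tensorflow as tf

window_samples = int(44100 / 4) # 0.25 s of audio at an assumed 44.1 kHz sample rate
cnn_encoder = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window_samples, 1)),
    tf.keras.layers.Conv1D(16, kernel_size=64, strides=4, activation='relu'),
    tf.keras.layers.Conv1D(32, kernel_size=32, strides=4, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64),
])
# these 64 learned features could feed UMAP just like the spectrogram columns do,
# but unlike a named frequency column they have no human-readable meaning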

I chose to use 0.25 second time intervals because most of the songs in this dataset have a tempo of roughly 120 beats per minute, or a beat every 0.5 s – so each window covers half a beat, i.e. two windows per beat. I think this is probably about as fast as a human ear can interpret a sound, and ultimately that is the experience I am trying to replicate in this workflow.
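
For concreteness, the arithmetic behind that choice is sketched below (44.1 kHz is an assumed, typical sample rate – the real value is read from each .wav file in the next code block).

# window-length arithmetic (illustrative numbers; the real sample rate comes from wavfile.read)
sr = 44100 # samples per second
nperseg = int(sr / 4) # 11025 samples per spectrogram window
window_length = nperseg / sr # 0.25 s per window
beat_period = 60 / 120 # 120 bpm -> 0.5 s per beat
print(window_length, beat_period / window_length) # 0.25 s windows, i.e. two windows per beat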

# create some empty lists to store data from each song
sample_rates = []
wavs = []
times = []
df_list = []

# loop through songs and generate spectrograms, appending data to a dataframe as we go
for i, file in enumerate(fns):
    df = pd.DataFrame() # empty dataframe
    sr, wav = wavfile.read(file) # raw song data
    time = np.arange(0, wav.shape[0]/sr, 1/sr) # time series 
    f, t, Sxx = signal.spectrogram(wav[:,0], sr, nperseg = int(sr/4)) # spectrogram generation
    Sxx_norm = stats.zscore(Sxx, axis = 1) # z-score each frequency bin across time to level loudness differences between songs
    frequencies = [n + 'Hz' for n in list(f.astype(str))] # create column headers for frequency samples
    # add data to dataframe
    df['time'] = t 
    df['song'] = titles[i]
    for j in range(len(frequencies)):
        df[frequencies[j]] = Sxx_norm[j]
    df_list.append(df)

    sample_rates.append(sr)
    wavs.append(wav)
    times.append(time)

    # plot selected songs
    if titles[i] in ['David Mare - Daylight Hours', 'Greenly Beach  - Cluck Cluck Cluck', 'odd jord - Huh?']:
        fig, ax = plt.subplots(figsize = (20,5))
        im = ax.pcolormesh(t, f, Sxx, vmin = np.percentile(Sxx, 5), vmax = np.percentile(Sxx, 95))
        ax.set(xlabel = 'time [s]', ylabel = 'frequency [Hz]')
        fig.colorbar(im, shrink = 0.8, label = 'Amplitude')
        fig.suptitle('Frequency spectrogram: {}'.format(titles[i]), fontsize = 16)
        plt.show()

df = pd.concat(df_list, ignore_index=True) # create master dataframe with a continuous index

These spectrograms show how the frequency distribution (vertical axis) changes over time. The added dimension of frequency in these plots allows for more nuanced visual interpretation of the song data – although admittedly the scale of these plots does not do justice to that.

One feature you might notice is the varying amplitude scales. This is partly due to the dynamics of the actual performances, but it is predominantly a result of different recording technology and mixing/mastering processes. As such, each frequency field was z-scored before further processing, so that the original structure of the music is preserved but the data are levelled between songs. The caveat is that the loudest strum of David Mare’s acoustic guitar ends up equal in magnitude to the full cacophony of Odd Jord’s many instruments – but sometimes in data science we just have to bear these things in mind as we go forward.
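
As a toy illustration of that levelling (the numbers here are made up, not taken from the songs): a quiet recording and a much louder copy of the same thing end up identical once each frequency row is z-scored, because z-scoring removes the offset and scale while keeping the relative structure.

# toy example of the per-row z-scoring used above (made-up numbers)
quiet_row = np.array([1.0, 2.0, 1.5, 4.0, 1.0]) # one frequency bin across five time windows
loud_row = quiet_row * 50 + 300 # the same music, recorded and mastered much louder

print(stats.zscore(quiet_row))
print(stats.zscore(loud_row)) # identical output: loudness differences are levelled away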

Boiling it down

This is where the fun begins. Spectrograms are a great way of visualising signal data, but the tabular dataset consists of over a thousand columns of frequency data. What if we could boil these down to just a few dimensions? Dimensionality reduction is nothing new in data science, and one of my favourite methods is UMAP (https://umap-learn.readthedocs.io/en/latest/index.html). This algorithm takes high-dimensional data and re-projects it into a small number of abstract dimensions that preserve ‘closeness’ between data points – i.e. points that sit close together in the reduced UMAP space were also close neighbours in the original high-dimensional space. This makes it a powerful tool for unsupervised analysis of high-dimensional data.

In this code, I generate a three-dimensional UMAP and plot the spectrogram samples from each song in this coordinate space.

# construct a parametric UMAP object - fit on a subset of the data first, then transform the full dataset
umap_obj = ParametricUMAP(n_neighbors=5, min_dist=0.5, n_components=3, metric='euclidean', random_state=69, verbose=True)
embedding = umap_obj.fit_transform(stats.zscore(df[frequencies][::10])) # train on every 10th time window for speed
embedding = umap_obj.transform(stats.zscore(df[frequencies])) # then project every time window
# clip each UMAP axis to its 0.05-99.95 percentile range and rescale to [0, 1] to suppress outliers
for i in range(3):
    min_perc = np.percentile(embedding[:,i], 0.05)
    max_perc = np.percentile(embedding[:,i], 99.95)
    embedding[embedding[:,i] < min_perc,i] = min_perc
    embedding[embedding[:,i] > max_perc,i] = max_perc
    embedding[:,i] = (embedding[:,i] - min_perc) / (max_perc - min_perc)
# 3D plot
import plotly.graph_objects as go
fig = go.Figure()
colors = ['red', 'blue', 'goldenrod', 'pink', 'darkgray', 'darkolivegreen', 'lavender', 'darkred', 'cornsilk', 'greenyellow', 'lightsteelblue']

for i, song in enumerate(titles):
    song_mask = (df.song == song).values # boolean mask selecting this song's time windows
    marker_style = dict(size=2, color=colors[i], opacity=0.5, line=dict(width=0))
    fig.add_trace(go.Scatter3d(x=embedding[song_mask, 0], 
                  y=embedding[song_mask, 1], 
                  z=embedding[song_mask, 2],
                  text = df.time[song_mask], # hover text matches the filtered points
                  name = song,
                  mode='markers',
                  hovertemplate='<br>'.join(['Time: %{text}',
                                            'UMAP 0: %{x}',
                                            'UMAP 1: %{y}',
                                            'UMAP 2: %{z}']),
                  marker = marker_style)
                  )

fig.update_layout(scene = dict(xaxis_title='UMAP 0',
               yaxis_title='UMAP 1',
               zaxis_title='UMAP 2'),
               width = 1000,
               height = 800,
               margin=dict(l=10, r=10, b=25, t=50)
               )
fig.show()    

Here, time windows from each song are represented by different coloured dots in UMAP space. I have restricted the plot to just the three songs I am focussing on because it gets a bit overwhelming when you look at every song at once. Note how each song is spread across UMAP space, but there are some clusters. The songs by David Mare and Greenly Beach form a pair of isolated clusters, but there is also a region where all three songs are mixed together.

Visualising UMAP coordinates as colours

Whenever I see a three-dimensional representation of data, my mind immediately wonders what it would look like if we transformed that coordinate space into a ternary red-green-blue visualisation. In geoscience, a classic example of this technique is radiometric K-Th-U mapping or multiband aerial/satellite imagery. In this case, I am taking the three UMAP coordinates above and assigning them to red, green, and blue respectively. I can then view the original song data and colour it by its UMAP/RGB coordinates. The output is similar to that of traditional clustering techniques, except it provides a continuous colour space that reflects the data, rather than the arbitrary colours one might assign from a clustering algorithm. The downside is we don’t get the discrete labels that clustering provides, but there is nothing stopping us from doing both!
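
As an aside, a minimal sketch of that ‘do both’ idea is below – attaching discrete cluster labels to the same embedding before moving on to the continuous colours. KMeans and scikit-learn are illustrative assumptions here, not something I used in the original workflow.

# hypothetical add-on: discrete cluster labels alongside the continuous UMAP colours
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=5, n_init=10, random_state=69) # 5 clusters is an arbitrary choice
df['cluster'] = kmeans.fit_predict(embedding) # label each 0.25 s time window

# each window now has a discrete label here and a continuous colour from the code below
print(df.groupby('song')['cluster'].value_counts())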

# turn UMAP coordinates into RGB colours by rescaling each axis to the range [0, 1]
rgbembeds = embedding.copy()
for i in range(3):
    axis_min = rgbembeds[:,i].min()
    axis_max = rgbembeds[:,i].max()
    rgbembeds[:,i] = (rgbembeds[:,i] - axis_min) / (axis_max - axis_min)

from matplotlib.colors import to_hex
rgb_colormap = np.array([to_hex(i) for i in rgbembeds]) # each (r, g, b) triple becomes a hex colour string
df['color'] = rgb_colormap

# little function for finding array indices based on values 
def find_nearest(array, value):
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return idx

# Plot selected song data coloured by UMAP RGB
for i, song in enumerate(['David Mare - Daylight Hours', 'Greenly Beach  - Cluck Cluck Cluck', 'odd jord - Huh?']):
    data = df[df.song == song]
    song_index = titles.index(song)
    fig, ax = plt.subplots(figsize = (20,5))
    for j in tqdm(range(len(data) - 1)):
        # find the waveform samples spanned by this spectrogram window
        begintime = data['time'].iloc[j]
        begin_index = find_nearest(times[song_index], begintime)
        endtime = data['time'].iloc[j+1]
        end_index = find_nearest(times[song_index], endtime)
        color = data['color'].iloc[j]

        # plot a decimated slice of the left channel in this window's UMAP-RGB colour
        ax.plot(times[song_index][begin_index:end_index:200], wavs[song_index][begin_index:end_index:200, 0], color = color, linewidth = 0.2)
    ax.set(title = song, xlabel = 'time [s]', ylabel = 'amplitude')

    plt.show() 

Now, not only can I interpret these plots by their amplitude, but the UMAP-RGB colouring gives me a sense of how similar (or otherwise) each section of the song is. The Greenly Beach plot is probably the best example of this – not only does the quieter bridge section stand out as a bright green colour, but the subtler differences between the verses and choruses can be differentiated by their respective pink and purple colours.

As an aside, this might be compared to an artificial (and very basic) version of chromesthesia – the human experience of associating certain sounds with colours (https://en.wikipedia.org/wiki/Chromesthesia).

Another way we might represent this is by synchronising a time series display of these colours with the actual song audio – as shown in these YouTube videos:

Warning: These videos contain flashing lights
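
For reference, below is a minimal sketch of how such a synchronised colour strip could be rendered. This is not the code behind the videos above – it assumes matplotlib’s animation module plus an ffmpeg install, and it leaves adding the audio track back in to a separate tool.

# hypothetical sketch: animate a playback cursor over the UMAP-RGB colour strip (no audio)
import matplotlib.animation as animation

song = 'Greenly Beach  - Cluck Cluck Cluck'
data = df[df.song == song]

fig, ax = plt.subplots(figsize=(12, 2))
ax.bar(data['time'], height=1, width=0.25, align='edge', color=data['color']) # one bar per 0.25 s window
ax.set(xlim=(0, data['time'].max()), yticks=[], xlabel='time [s]', title=song)
cursor = ax.axvline(0, color='k', linewidth=2)

def update(frame):
    t = data['time'].iloc[frame]
    cursor.set_xdata([t, t]) # move the playback cursor to the current window
    return cursor,

anim = animation.FuncAnimation(fig, update, frames=len(data), interval=250, blit=True)
anim.save('colour_strip.mp4', fps=4) # 4 fps matches the 0.25 s windows; audio is muxed in separately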

Conclusion

Hopefully this post has demonstrated how some powerful data science tools, coupled with some creative thinking and data representation, can enhance the way we experience and interpret complex signal data. I hope the parallels with earth science data are apparent; imagine how this signal-to-colour method might work with spatial data – one could colour up an entire survey area by UMAP coordinates and reveal the structure that lies within. The trick is to engineer the input features from a time-series signal into something with some interpretable context.