Linux Fu: Name That Tune

If you aren’t old enough to remember, the title of this post refers to an old game show where contestants would try to name a tune using the fewest possible notes. What can we say? Entertainment options were sparse before the Internet. However, using audio fingerprinting, computers are very good at pulling this off. The real problem is having a substantial library of fingerprints to compare with. You can probably already do this with your phone, and now you can do it with your Linux computer.

In all fairness, your computer isn’t doing the actual work. In fact, SongRec, the program in question, is just a client for Shazam, a service that can identify many songs. While this is mildly interesting if you use a Linux desktop, we could also see the same technique running on a Raspberry Pi to power some interesting projects. For example, imagine identifying the song that is playing and adjusting mood lighting to match. A robot that could display song information could be the hit of a nerdy party.

The Code

If you look in the repository, there is a fairly simple Python version that only recognizes songs from audio files. The main program is newer, has more options for handling audio, and uses Rust. However, if you were trying to graft it into your own program, starting with the older code might be easier. Unless, of course, you are also using Rust.

Under the Covers

Shazam downsamples audio to 16 kHz and produces four spectrograms. Each spectrogram covers a different band: 250-520 Hz, 520-1,450 Hz, 1,450-3,500 Hz, and 3,500-5,500 Hz. The peaks in the spectrograms should match for the same song. The client sends information about which peaks occur at which times to the Shazam database, which returns information about the song. You can see an explanatory video about how it works below.
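To make the band idea concrete, here is a minimal Python sketch that finds the loudest frequency in each of the four bands for a single frame of 16 kHz audio. This is not SongRec's or Shazam's actual code: the function names, the 256-sample frame, and the deliberately naive DFT are all assumptions chosen to keep the example dependency-free. Only the sample rate and band edges come from the description above.

```python
import cmath
import math

SAMPLE_RATE = 16_000  # Hz, the rate the article says Shazam downsamples to

# Band edges in Hz, taken from the article text.
BANDS = [(250, 520), (520, 1450), (1450, 3500), (3500, 5500)]


def band_index(freq_hz):
    """Return the index of the band containing freq_hz, or None if outside all bands."""
    for i, (lo, hi) in enumerate(BANDS):
        if lo <= freq_hz < hi:
            return i
    return None


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (slow, but needs no libraries)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def strongest_peak_per_band(frame):
    """Map band index -> (frequency_hz, magnitude) of the loudest bin in that band."""
    n = len(frame)
    peaks = {}
    for k, mag in enumerate(magnitude_spectrum(frame)):
        freq = k * SAMPLE_RATE / n
        band = band_index(freq)
        if band is None:
            continue
        if band not in peaks or mag > peaks[band][1]:
            peaks[band] = (freq, mag)
    return peaks


def tone(freq_hz, n_samples):
    """Hypothetical helper: a pure sine tone used as stand-in audio."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n_samples)]


if __name__ == "__main__":
    # Mix two tones that land in the second and fourth bands.
    frame = [a + b for a, b in zip(tone(1000, 256), tone(4000, 256))]
    for band, (freq, mag) in sorted(strongest_peak_per_band(frame).items()):
        print(f"band {band}: loudest bin near {freq:.1f} Hz (magnitude {mag:.1f})")
```

A real fingerprint would repeat this over many overlapping frames and keep the time of each peak as well, which is what turns a handful of numbers into something a database can match.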

According to a paper about Shazam, the algorithm isn’t smart enough to match similar but different recordings of a song. So if it does identify a live performance, it is a good bet the performer is lip-synching to a prerecorded track.

In Use

If you use the main SongRec executable, you can pick files, or the program will constantly monitor the sound device of your choice (including your speakers). When it finds a song, it will show you the album art, the name, and the album. You can even export the results to a CSV file.

Back to Code

If you look at the Python code in signature_format.py, you’ll see the frequency bands there. However, a lot of the work also occurs in algorithm.py. Most of the rest of the Python code involves making an API request or gluing pieces together.

The Rust code has a similar structure but has many extra pieces, as you might expect. Overall, though, it isn’t that hard to understand.

If you are worried about your audio data being shipped over the network, relax. The code only sends the frequency information, which isn’t enough for anyone to reconstruct the original audio. If you want to hear what that information might sound like, use the “Play a Shazam Lure” button. This button will produce audio that Shazam will recognize as the song. If you recognize the song from the lure, you can probably understand R2-D2, as well.
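To get a feel for why peak data alone makes for such odd listening, here is a toy Python sketch that turns a list of (time, frequency) pairs into short sine bursts and writes them to a WAV file. It is not how SongRec builds its lure; the function names, the 0.1 second burst length, and the example peak values are all made up for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 16_000  # Hz, matching the rate mentioned earlier in the article


def synthesize(peaks, duration_s, burst_s=0.1):
    """Render (time_s, freq_hz) pairs as sine bursts; returns floats in [-1, 1]."""
    n_total = int(duration_s * SAMPLE_RATE)
    samples = [0.0] * n_total
    burst_len = int(burst_s * SAMPLE_RATE)
    for time_s, freq_hz in peaks:
        start = int(time_s * SAMPLE_RATE)
        for i in range(burst_len):
            idx = start + i
            if idx >= n_total:
                break
            samples[idx] += math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
    # Normalize so overlapping bursts cannot clip.
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest > 0:
        samples = [s / loudest for s in samples]
    return samples


def write_wav(path, samples):
    """Write mono 16-bit PCM audio at SAMPLE_RATE."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


if __name__ == "__main__":
    # Hypothetical peaks, not taken from any real song.
    demo_peaks = [(0.0, 440.0), (0.2, 1000.0), (0.4, 2500.0), (0.6, 4000.0)]
    write_wav("lure_demo.wav", synthesize(demo_peaks, duration_s=1.0))
```

The result is a string of beeps and chirps with none of the original timbre, vocals, or instruments, which is the point: the peaks say where the energy was, not what it sounded like.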

We’ve seen audio fingerprinting used for different purposes before. You could even make a dress that lights up when it hears the right song.
