Go Serverless! Use Cloud Functions and an Event-Driven Architecture to Recognize Songs on Twitch — Background & Motivation
This post gives some background and motivation for why I used Cloud Functions and an Event-Driven Architecture (EDA) to let users get track IDs in Twitch chat. In other words, audiences in a Twitch music stream could type “!trackid” and a bot would answer something like, “I recognized Sandstorm by Darude about 2 minutes ago.” The solution didn’t require the DJ to take any action or install any software.
Here you can see the bot in action. The user bweagle enters “!trackid” and Nightbot responds with the song details (on loop):
2020 was an interesting year. Staying at home motivated me to look elsewhere for social outlets. I found music channels on Twitch! Originally, I watched a lot of well-known artists, as they were dealing with their own inability to tour or perform with crowds. Eventually, I found more intimate communities of great DJs who play great, eclectic music.
Watching DJs perform on Twitch is a social experience because a chat goes along with the performance. DJs participate in the chat, and many listeners chat as well. Over time, a group of frequent chatters emerges and forms a community. Emotes and commands make chats richer and more interactive. Commands are special terms, preceded by a “!”, that invoke an action from bots the broadcaster has added to the chat. Three very popular services provide bots: Nightbot, StreamElements, and Streamlabs. There are many more, but those are the big three. In a previous post, I outlined how broadcasters and their moderators can enhance the experience for their audience. You can find that here.
One of the most common commands on a music-related Twitch channel is “!trackid”. Some channels use different terms (e.g., !music or !song), but whatever it is called, the command invokes a bot that returns the name and artist of the song currently playing. This is a great command because when audience members hear a new song they love, they want to find out what it is. It is also useful if a listener can’t remember the name of the song or the artist. I also like this command because the DJs are effectively acting as a radio station, so sharing track details provides value to the artists they play by introducing their music to new audiences.
Getting the track information can be tricky. For DJs playing vinyl records, there is no way to get the track ID locally, since there is no digital file with metadata. For digital DJs, most solutions get that information from a computer running a DJ app that is connected to the DJ’s controllers (i.e., decks). If you are not familiar with DJ controllers, here is an example from Pioneer:
As mentioned, a DJ will usually have a computer connected to the controller, running an app that communicates with it. That app knows when a new song is started. Typically, a DJ has a song playing on one deck with that deck fully faded in. Next, they start a song on the second deck, sync the tracks (match the beats), and then fade over to the song on the second deck. The time between starting the second song and completing the fade is unpredictable, which adds uncertainty about which song is actually playing.
The only way to truly know what is currently playing is to listen to the broadcast on Twitch.
There are a few common approaches for getting track information:
- If a laptop is connected to the controller, install a plugin or app on the laptop that is notified when a new song is “played” and posts it to Twitter. I say “played” because, as mentioned above, that doesn’t mean the song is actually playing on Twitch; it means the song has been loaded in the controller and is ready to be faded in. Once that info is on Twitter, create a “!trackid” command that reads the latest tweet from that account and posts the details in the chat. This sounds convoluted, but it is pretty reliable and straightforward to implement.
- If a laptop is connected to the controller, get an OBS plugin that will read the track info and post it on the screen. This makes the “!trackid” command unnecessary.
- If a laptop is connected to the controller, create your own Twitch bot that reads the track info from the controller and posts it in the chat. This requires writing the bot and running it on the laptop.
These approaches have some potential challenges:
1. A common issue with the Twitter approach is that the tweet goes out when the track is loaded, not when the audience hears it. If a DJ loads the next track early, their audience will frequently get the upcoming song when they type “!trackid”.
2. Some DJs prefer not to use a laptop connected to their controller.
3. Some DJs play vinyl, so they have no digital trackid information associated with what they are playing.
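To make the timing problem in item 1 concrete, here is a small Python sketch that compares what a load-triggered announcement would report with what the audience is actually hearing. The `Track` fields, the hypothetical "Track B", and the rule that a track counts as "heard" once its fade-in completes are simplifying assumptions for illustration; during a real crossfade both tracks are audible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Track:
    title: str
    artist: str
    loaded_at: float    # seconds into the set when the track is loaded on a deck
    faded_in_at: float  # seconds into the set when its fade-in completes (assumed "heard")


def reported_by_load(tracks: List[Track], t: float) -> Optional[Track]:
    """What a load-triggered source (e.g., one tweet per loaded track) reports at time t."""
    loaded = [tr for tr in tracks if tr.loaded_at <= t]
    return max(loaded, key=lambda tr: tr.loaded_at) if loaded else None


def actually_heard(tracks: List[Track], t: float) -> Optional[Track]:
    """What the audience hears at time t: the last track whose fade-in has completed."""
    heard = [tr for tr in tracks if tr.faded_in_at <= t]
    return max(heard, key=lambda tr: tr.faded_in_at) if heard else None


if __name__ == "__main__":
    set_list = [
        Track("Sandstorm", "Darude", loaded_at=0, faded_in_at=0),
        Track("Track B", "Artist B", loaded_at=180, faded_in_at=420),  # hypothetical track
    ]
    at = 300  # someone types !trackid five minutes into the set
    print("Load-based answer:", reported_by_load(set_list, at).title)
    print("Actually playing: ", actually_heard(set_list, at).title)
```

With a track loaded at 3:00 but not fully faded in until 7:00, a query at 5:00 gets the upcoming track from the load-based source while the stream is still playing Sandstorm. Recognizing the audio of the stream itself avoids this gap.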
I had some extra time during the holiday break and thought it would be fun to look at this problem space. I was a moderator for one DJ who preloaded the next track for long and unpredictable stretches and for another DJ who didn’t use a laptop connected to his controller. So, how could we get track IDs into the chat??!! The only remaining approach was to listen to the live stream and recognize the music.
There were 3 parts to the challenge:
1. How can I recognize the music playing on the Twitch stream?
2. How can I make that data available via an API I can call from the bot?
3. How can I make a bot command that will call the API and format a message?
It took quite a lot of research and prototyping, but I finally got a working solution. I go into the implementation details in this blog post. At a high level, I found an online music recognition service, ACRCloud, that listens to a stream and, whenever it recognizes a song, calls back a URL you give it with the track info. I used IBM Cloud Functions and APIs for that callback URL and had the function store the data in a cloud key-value store, kvstore.io. Another IBM Cloud Function exposed an API that accepted a call from Nightbot and returned the latest value in the key-value store. Finally, I created a bot command that called that API, extracted the data, and formatted a human-readable message for the chat, e.g., “I recognized Sandstorm by Darude about 2 minutes ago.”
Here is a view of the architecture:
The solution consists of two flows. One flow, indicated by A in the figure, is responsible for recognizing a song and posting the info to a key-value store. The other flow, indicated by B in the figure, is responsible for listening to the Twitch chat for the “!trackid” command and responding with the last recognized song’s track info.
A bit more detail on flow A:
A1. ACRCloud is a music recognition service. This service is configured to listen to a Twitch stream. When a song is recognized, step A2 happens.
A2. ACRCloud is configured with a callback URL. When a song is recognized, it posts the recognized song’s information to that URL. In this case, that URL is a Cloud Function running on IBM Cloud.
A3. The “storeTrack” function is invoked with the track information posted as JSON. The function takes this info, selects the most relevant information (song artist and song name), and stores that data in KVStore.
A4. KVStore is a key-value store available for free. The track info is posted to a well-known key, e.g. trackinfo.
A bit more detail on flow B:
B1. Nightbot has been configured to listen to the Twitch channel’s chat. It provides other functionality beyond “!trackid”. When “!trackid” is entered, Nightbot proceeds to step B2.
B2. Nightbot makes a request to the getVal action running as an IBM Cloud Function.
B3. The getVal action gets the latest stored track info from KVStore. It requests the info using the same key as used in step A4.
B4. getVal returns the track info to Nightbot.
B5. The command running on Nightbot gets the results, parses the track info, and responds in the chat. An example would be, “I recognized Sandstorm by Darude about 2 minutes ago.”
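Steps B3 through B5 can be sketched the same way. In the original, the message formatting ran in the Nightbot command; here it is written in Python purely for illustration. The sketch assumes the stored value is JSON with `title`, `artist`, and an ISO 8601 `recognized_at` timestamp (a hypothetical field that lets the reply say how long ago the song was recognized), and a dict again stands in for KVStore. `get_val` and `format_reply` are hypothetical names.

```python
import json
from datetime import datetime, timezone


def get_val(store: dict, key: str = "trackinfo") -> dict:
    """Sketch of a getVal-style action: return the latest stored track info, or {}."""
    raw = store.get(key)
    return json.loads(raw) if raw else {}


def format_reply(track: dict, now: datetime) -> str:
    """Turn stored track info into a chat message like the one the bot posted."""
    if not track:
        return "I haven't recognized any tracks yet."
    recognized_at = datetime.fromisoformat(track["recognized_at"])
    minutes = int((now - recognized_at).total_seconds() // 60)  # whole minutes elapsed
    if minutes < 1:
        when = "less than a minute ago"
    elif minutes == 1:
        when = "about 1 minute ago"
    else:
        when = f"about {minutes} minutes ago"
    return f"I recognized {track['title']} by {track['artist']} {when}."


if __name__ == "__main__":
    store = {"trackinfo": json.dumps({
        "title": "Sandstorm",
        "artist": "Darude",
        "recognized_at": "2021-01-01T12:00:00+00:00",
    })}
    query_time = datetime(2021, 1, 1, 12, 2, 30, tzinfo=timezone.utc)
    print(format_reply(get_val(store), query_time))
```

Storing the recognition time alongside the track is one way to produce the “about 2 minutes ago” phrasing, and it also hints to chatters when an answer may be stale.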
I describe how to implement this architecture in this blog.
Close, but No Cigar
It was truly rewarding to see the whole system work. The recognizer was more accurate than Shazam (yes, a lot of Twitch music listeners use Shazam). The system addressed the challenges mentioned above. By listening to the live stream, it wasn’t confused by long track mixing times, and it didn’t require the DJ to run an app on a laptop connected to their controller. In fact, it didn’t require the DJ to do anything except play their music!
On the other hand, it wasn’t perfect. For the types of obscure music the DJs played, I would guess the system had about an 80% recognition rate. That was really great and better than nothing; however, the DJ and mods decided it wasn’t good enough. Getting old track IDs wasn’t the best experience, so we decided to turn it off. This may have also been motivated by the fact that the ACRCloud free trial period expired. ;-)
This was a very fun project. I loved finding a problem that was relevant to my virtual friends and finding the right technology to solve the problem. All the technology involved was interesting, easy to use, and functional! It was especially fun to see people (and friends) using the service in real time. Making the decision to turn it off was also interesting. It is a great example of trying a service quickly, evolving it, and killing it if it misses the mark.
If this type of thing interests you, please check out my next blog. It goes into the nitty-gritty details of how I implemented the entire system.