My Playlist for free users: a data-driven backfill service for playlist continuation at scale

Priyanka Chakraborti
iHeartRadio Tech


Photo credit: Shunya Koide

The Data Science team at iHeartRadio is dedicated to providing an immersive listening experience for both our paid and free users. Our digital app is constantly being upgraded with data-driven features, including personalized and customizable playlists. Some examples include our personalized playlist, Weekly Mixtape, and the user taste profiles that power playlist recommendations. In this article, we focus on a new improvement to user-created playlists: an approach that allows our users to create a customized music station based on whatever tracks they’re loving in the moment.

Let’s face it: many of us have been through a time when we painstakingly created a playlist for that workout or the drive to work. Just when we were getting into the groove, an upsell message appeared, prompting us to start a subscription to continue listening. At iHeartRadio, we want to ensure a quality listening experience for both our free and paid users, and that includes ensuring that our free users never feel left out of the groove when playing their favorite tracks on a custom-made playlist. Our new ‘My Playlist’ feature allows free users to add their favorite tracks to their own playlist within the app, sit back, and listen without paying a dime. Using our data-driven intelligent systems, we augment those tracks with a list of recommendations to ensure you get the best listening experience possible. We love it when our users get that warm feeling as they find old favorites popping up or discover that one hidden gem that blends in seamlessly with the rest of their playlist.

How do we do this at iHeartRadio?

One of the key requirements for this feature is that our users’ playlists must be playable and relevant for long periods of time, no matter how many songs the user has decided to add. To accomplish this, the Data Science team augments the user-provided songs in each playlist with a list of suggested tracks, referred to as the ‘backfill.’ One of the challenges in designing such an interactive, data-driven product is that recommendations need to react in real time to user feedback and track selection. The response time needs to be on the order of milliseconds to ensure that users feel engaged with the service. There is simply no time to conduct an extensive search for every user-specific track combination. Typically, real-time processes like these are governed by an ‘online’ algorithm, which borrows from an ‘offline’ or ‘batch’ process that builds the candidate set of potentially recommendable items. To get the best of both worlds, the engineering teams at iHeartRadio work collaboratively to ensure a seamlessly constructed pipeline that meets all criteria for a superior user experience.

Figure 1. Flowchart showing a general overview of the process of creating a playlist.

Building our candidate set

To start, the batch process can be a traditional machine learning algorithm or a simple empirical statistical workhorse that is often just as powerful. Our implementation borrows from both. While there is no strong rationale to prefer one over the other, we carefully weigh the pros and cons of each to strike a balance between effectiveness and scalability, a thought exercise we repeat for all products generated by our team.

Our Data Science team has proprietary algorithms that can achieve collaborative and/or content-based filtering at different levels of information mining. We have the capability to find tracks and artists similar to those selected by the user. This flexibility allows us to do exhaustive searches in the vicinity of a track to find the best possible candidate set of music to recommend. This is, of course, an intensive process, which we conduct in batch once per day, scheduled using Airflow and run on our EMR clusters.
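For readers curious what such a schedule looks like in practice, here is a minimal, hypothetical sketch of a daily Airflow DAG for this kind of batch job. The DAG id, script path, and submission command are illustrative assumptions, not our actual pipeline:

```python
# Hypothetical sketch of a daily batch job like the one described above.
# DAG id, script paths, and cluster details are illustrative only.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="playlist_backfill_candidates",  # hypothetical name
    schedule_interval="@daily",             # run once per day
    start_date=datetime(2021, 1, 1),
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    # Submit the candidate-generation job to an (already running) EMR cluster.
    build_candidates = BashOperator(
        task_id="build_candidate_set",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-bucket/jobs/build_candidates.py"  # hypothetical path
        ),
    )
```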

As our catalog of artists in the iHeartRadio network grows, so do the possibilities for recommended listening. So how do we connect the top artists across our network through sensible linkages? The heart of this algorithm is scoring links between parent and child artists through a combination of shared user listening and genre similarities. These scores are then combined into one proprietary metric to create a ranked hierarchy between a parent artist and its children. For each child artist, we then gather a selection of candidate tracks that could be played.
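The exact metric is proprietary, but as a rough sketch, blending a co-listening signal and a genre-similarity signal into one ranked hierarchy might look like the toy example below. The field names and weights are assumptions for illustration only:

```python
# Illustrative sketch only: the real linking metric is proprietary.
from dataclasses import dataclass

@dataclass
class ArtistLink:
    parent_id: str
    child_id: str
    co_listen_score: float   # e.g. normalized shared-listener count, in [0, 1]
    genre_similarity: float  # e.g. similarity of genre profiles, in [0, 1]

def link_score(link: ArtistLink, w_listen: float = 0.7, w_genre: float = 0.3) -> float:
    """Blend the two signals into one score; the weights are hypothetical."""
    return w_listen * link.co_listen_score + w_genre * link.genre_similarity

def rank_children(links: list[ArtistLink]) -> list[ArtistLink]:
    """Rank a parent's child artists from strongest to weakest link."""
    return sorted(links, key=link_score, reverse=True)
```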

One such method is well known within our team and powers much of the existing artist radio experience throughout our app. It has shown continuous success through a series of A/B tests and has been vetted by our internal stakeholders as being the perfect balance between business and machine intelligence.

Track ranking using Bayesian estimation

Now that a reasonable set of candidate tracks has been established, we would like to rank them in order of user enjoyment while also taking into account general popularity. The most powerful approach we have identified utilizes Bayesian re-ranking. The power of Bayesian estimation lies in its ability to incorporate our prior belief about a target metric into the final score. Take, for instance, a five-star rating system for a set of products. If we simply sorted the products by their mean rating, we would fail to take into account the number of reviewers who rated each product. Statistically, this shows up as a heavy-tailed distribution of the average rating, owing to the lack of reviewers on most products. This skew can cause us to exaggerate the quality of a product; in other words, we are often biased into thinking a product is better than it really is.

Of course, with our app users do not explicitly rate songs. However, we were able to ‘feature engineer’ a similar four-star rating methodology based on both implicit and explicit user listening behavior at the track level. For instance, a signal such as an explicit ‘thumbs up’ on a track could qualify as a four-star rating, while an explicit ‘thumbs down’ could count as one star. Implicit user behavior is taken into account as well; for example, a user closing the app in the middle of a track could qualify as a two-star rating. With that in mind, consider a track that has an average rating of 3.8/4 from 200 users. This should likely be ranked higher than a track with a perfect four-star average from only two users. Bayes’ law allows us to naturally incorporate this information into a single metric. Figure 2 below demonstrates the skewness we normally experience when rankings are based purely on average rating.
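As a hedged sketch of that feature engineering, a mapping from listening signals to star levels might look like the following. The specific signals and their star assignments here are illustrative, not our production rules:

```python
# Illustrative mapping from listening signals to star levels (1-4).
# The actual signals and assignments used in production may differ.
STAR_FOR_SIGNAL = {
    "thumb_up": 4,               # explicit positive feedback
    "full_listen": 3,            # implicit: track played to completion
    "app_closed_mid_track": 2,   # implicit: session ended mid-track
    "thumb_down": 1,             # explicit negative feedback
}

def star_counts(events: list[str]) -> dict[int, int]:
    """Aggregate raw events into per-star counts n_i for one track."""
    counts = {1: 0, 2: 0, 3: 0, 4: 0}
    for event in events:
        star = STAR_FOR_SIGNAL.get(event)
        if star is not None:
            counts[star] += 1
    return counts
```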

Figure 2. The distribution of raw ‘ratings’. Notice the heavy left tail.

Let’s look at the formula below, which is used to calculate the mean rating for a given track after taking into account the Bayesian prior distribution. Here, m is our estimate for the mean of the prior distribution, while C is a confidence parameter: the number of overall listens a track needs before we trust its own rating over the prior. Finally, nᵢ is the number of ratings at each star level (one star, two stars, three stars, four stars). Summing nᵢ over all of the star levels gives the total number of interactions with the track under this four-star rating system.

$$\text{score} = \frac{C\,m + \sum_{i=1}^{4} i \cdot n_i}{C + \sum_{i=1}^{4} n_i}$$

Basic equation for Bayesian re-ranking.

For a detailed explanation of how to assign your prior, and for more details about how Bayesian re-ranking works, this article is a great resource. Once you have determined your overall confidence C and your overall mean rating m, you can use the above formula to adjust each rating. We further optimize our estimates for C and m using the Python PyStan package, sampling from a simulated Gaussian distribution until we reach convergence.
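Putting the formula into code, a minimal implementation of the adjusted score might look like the sketch below. The function name, the prior mean m = 2.5, and the confidence C = 50 are illustrative choices, not our production values:

```python
def bayesian_score(n: dict[int, int], m: float, C: float) -> float:
    """Bayesian-adjusted mean rating for one track.

    n maps each star level i (1-4) to its count n_i, m is the prior mean
    rating, and C is the confidence (a pseudo-count of prior listens).
    """
    total = sum(n.values())                          # total interactions
    weighted = sum(i * n_i for i, n_i in n.items())  # sum_i i * n_i
    return (C * m + weighted) / (C + total)

# Example: a 3.8/4 average from 200 users vs. a perfect 4/4 from 2 users,
# with an illustrative prior mean m = 2.5 and confidence C = 50.
many = {1: 2, 2: 3, 3: 28, 4: 167}   # mean ~3.8 over 200 ratings
few = {1: 0, 2: 0, 3: 0, 4: 2}       # mean 4.0 over 2 ratings
print(bayesian_score(many, m=2.5, C=50))  # ~3.54: mostly trusts the data
print(bayesian_score(few, m=2.5, C=50))   # ~2.56: shrinks toward the prior
```

Notice how the two-rating track is pulled strongly toward the prior mean, while the 200-rating track keeps a score close to its raw average.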

If we calculate the raw mean simply as the average over the listen counts at each star level, we end up with the sharply skewed distribution shown in Figure 2. Using this Bayesian ranking approach instead, the distribution becomes the one shown below in Figure 3. As can be seen, a significant amount of the skewness has been removed.

Figure 3. The distribution over the same tracks used to generate Figure 2. Here we see that by utilizing Bayesian re-ranking the vast majority of the skew has been removed.

As with any model, additional business rules and guardrails are employed on top of this to ensure quality control. One such business rule relates to keeping the playlist DMCA compliant. For example, we are not able to play tracks from the same album on repeat for the duration of the listening session. As such, after we rank tracks, we always choose a set number of tracks from each unique child artist so that our real-time process does not create non-compliant playlists. Additionally, incremental learning is an important part of Bayesian estimation. This requires that the prior be re-evaluated based on daily ingestion of data, with the posterior from the previous day becoming the new prior. From domain knowledge, we do not expect this re-evaluation to shift substantially from day to day, but we diligently and regularly check for any discrepancies.
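As a sketch of one such guardrail, capping the number of ranked tracks taken per child artist might look like the following; the cap of three tracks and the tuple layout are illustrative assumptions:

```python
# Illustrative guardrail sketch; the real cap and rules are internal.
from collections import defaultdict

def cap_per_artist(ranked_tracks, max_per_artist: int = 3):
    """Walk the ranked list and keep at most max_per_artist tracks per
    child artist, so the downstream sampler cannot assemble a playlist
    that violates our compliance rules.

    Each track is assumed to be a (track_id, artist_id, score) tuple;
    the cap of 3 is hypothetical, not our production value.
    """
    taken = defaultdict(int)
    capped = []
    for track_id, artist_id, score in ranked_tracks:
        if taken[artist_id] < max_per_artist:
            capped.append((track_id, artist_id, score))
            taken[artist_id] += 1
    return capped
```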

Serving it up ‘Online’

As stated previously, a requirement of the online component is a fast response time (a few hundred milliseconds at most). To achieve this, the candidate set of recommendations from batch processing is stored in DynamoDB, a low-latency, read-optimized database on AWS. The backend online component is implemented in Scala and runs as a Kubernetes service (which we call the “backfill service”) that exposes an endpoint to the iHeart web and mobile applications. For each user-added track, the backfill service fetches a collection of the top recommended tracks from DynamoDB and combines these collections into a master list of candidate tracks to backfill the user-created playlist. To expedite this process, reads from DynamoDB are performed in parallel using a Scala library called ZIO. The final backfill list is a subset of tracks that have been semi-randomly sampled from the master list using an approach called fitness proportionate selection, as illustrated in Figure 4.
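Our production fan-out is written in Scala with ZIO, but the same pattern can be sketched in Python using boto3 and a thread pool. The table name and key schema below are hypothetical:

```python
# Python sketch of the parallel fan-out; production uses Scala + ZIO.
from concurrent.futures import ThreadPoolExecutor

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("track_candidates")  # hypothetical table name

def fetch_candidates(seed_track_id: str) -> list[dict]:
    """Fetch the precomputed top recommendations for one seed track."""
    # Note: for a real service, prefer one client per thread, since
    # boto3 resources are not guaranteed to be thread-safe.
    response = table.get_item(Key={"track_id": seed_track_id})
    return response.get("Item", {}).get("candidates", [])

def build_master_list(seed_track_ids: list[str]) -> list[dict]:
    """Read all seeds in parallel and merge into one candidate list."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(fetch_candidates, seed_track_ids)
    return [candidate for batch in results for candidate in batch]
```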

Figure 4. Visual depiction of how fitness proportionate selection works. Image credit: Newcastle University.

To understand this, imagine that each track occupies a segment of a roulette wheel proportional to the quality of its ranking. Selecting the final set of tracks to display is equivalent to spinning the roulette wheel multiple times, ensuring that each track can only be selected once. As such, ‘better’ quality tracks have a higher probability of being selected during the spin. In our offline testing, this procedure showed great success in building a well-balanced playlist, even when a playlist is seeded using tracks from multiple genres. In other words, if multiple tracks from different genres are entered by the user, the final playlist is suitably representative of all genres.
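Concretely, fitness proportionate selection without replacement can be sketched in a few lines, using each track’s ranking score as its ‘fitness’. This is a minimal illustration, not our production sampler:

```python
import random

def roulette_select(tracks: list[str], scores: list[float], k: int) -> list[str]:
    """Sample k distinct tracks, each spin weighted by its score.

    A track's chance of being picked on a spin is proportional to its
    score; once picked, it is removed so it cannot be selected twice.
    """
    pool = list(zip(tracks, scores))
    selected = []
    for _ in range(min(k, len(pool))):
        total = sum(score for _, score in pool)
        spin = random.uniform(0, total)   # where the wheel stops
        cumulative = 0.0
        for index, (track, score) in enumerate(pool):
            cumulative += score
            if spin <= cumulative:
                selected.append(track)
                pool.pop(index)           # no repeats
                break
    return selected
```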

The end result

The first version of My Playlist was released to internal stakeholders and a select group of the Product team in early 2021. After a rigorous process of internal vetting, and some front-end design changes to improve the user experience, the feedback has been overwhelmingly positive. As of now the feature is available for all of our free users to enjoy across our mobile and web applications. You can register for a free account here to enjoy!

Figure 5: Example UI/UX for a playlist seeded with ‘Blinding Lights’ by The Weeknd. A subset of the backfill tracks are shown.

Acknowledgements

At iHeartRadio we are a small, close-knit team of scientists, engineers, and business experts with the mission to constantly improve the user listening experience across our digital platforms. We hope that you enjoy this new addition to the iHeartRadio app: all you have to do is sit back and listen.

This feature is the result of the tireless efforts put in across multiple teams at iHeartRadio: AMP, Data Engineering, Data Science, UXD, and Product. The author is representative of the Data Science effort. The author would especially like to thank Joe Chisari for his primary contribution to building an excellent artist linker and for useful insights during its incorporation into the feature. An additional special thanks to Tony Liu for helpful discussions and ideas during the initial conception of the back-end procedure, and to Chaoran Yu and Will French for deploying and optimizing the online process in Kubernetes. Finally, a round of thanks to all the others who have made this blog post possible: Jamie Johnson, Brett Vintch, Jordan Rosenblum, Victor Chan and John Carr.
