You May Also Like

A look at developing and tuning recommender systems

Personalized recommendations are one of the most valuable inventions brought to us by the age of e-commerce. Shoppers love them: they save a lot of time when you are looking for the goods you need, or for the most suitable option among the vast variety on the market. The advantages for sellers are also pretty clear, since a well-tuned recommender system works like an adept salesman and increases revenue per customer. The capabilities of these systems are developing rapidly. There’s a catchy anecdote about Target (the second-largest department store retailer in the US), which started sending coupons for baby clothes and cribs to one of its customers, a high-school student at the time, after predicting from her recent purchases that she would need them soon. The coupons were a huge surprise for her father when he found them in his daughter’s mailbox, and an even bigger one when Target’s prediction turned out to be accurate. This article gives an overview of how recommender systems (RS) work, along with some insights and advice our team has accumulated while developing and tuning them.

How does it work?

The main problem an RS solves is predicting which content may interest the user. To do that, the RS collects three kinds of information: user profiles (what users reveal to the system about their personality and preferences), the history of users’ interactions with the content (for example, which movies or shows they watched and found interesting), and information about the content itself (like those 25+ tags on a new song on Last.fm, which occasionally make it hard to figure out which genre it actually belongs to). Based on this data, the predictive algorithm offers users recommendations (e.g. shows or songs) chosen to keep them watching or listening, which means more sales for the service.

Who does rocket science in RS?

Name any big company in the e-commerce business and it will surely have its own (and often unique) work on RS, but one company stands out as a real trendsetter: Netflix. In 2006-2009 it held the famous Netflix Prize, which became legendary in the RS community. Netflix challenged teams of enthusiasts to create a model that would predict the ratings users would give to movies, using nothing but the ratings they had given previously. At the end of the competition, $1,000,000 was awarded to the team whose model beat Netflix’s own algorithm by 10%.

Today Netflix uses data science and predictive algorithms to develop new shows and schedule filming, and (which is simply amazing) experiments with RS to create an outstanding user experience. Not only does its system predict which shows to recommend, it also chooses the most suitable artwork to make them more intriguing and attractive to customers!

Let’s have a look under the hood

It’s high time we found out what a predictive algorithm for an RS is. There are two basic ideas it may be built on, plus a combination of the two:

  • Content-based filtering (CBF). This method relies on descriptions of the objects and rates content based on them. Take an abstract music service as an example: each song gets certain tags (e.g. “80s hip-hop” or “Swedish pagan death metal”). Those tags are then compared with the tags of the songs the user loved most in the past, and the closest matches go into the recommendations.
    The main issue with CBF is carrying recommendations over to different types of content. Say a customer of our online store really loves fantasy fiction books. CBF will keep on recommending books, but it won’t offer, for instance, tabletop or video games in the same setting, which the user would have loved, and that’s a loss in sales for the business.

  • Collaborative filtering (CF). The main idea of this method is to use the preferences of one group of users to predict the preferences of others. In other words, if you and some other customer loved the same things in the past, you’ll probably like the same things in the future. It means we can recommend to you the content that this other customer rated highly.
    This method also has its flaws, and one of them is the “cold start” problem. When a new user comes to the service and hasn’t rated many objects (songs, shows) yet, the algorithm will have a hard time guessing which users have interests similar to the newcomer’s.

  • Hybrid model. This algorithm combines features of both CBF and CF. Hybridization, like the one happening in nature, in most cases makes it possible to build an RS that generates precise and effective recommendations.
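
To make the three ideas above more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the function names, the choice of Jaccard overlap for tags and cosine similarity for users, and the blending weight `alpha`. A production RS would pick and tune these very differently.

```python
from math import sqrt

# Hypothetical toy data: tags per song and user ratings on a 1-5 scale.
SONG_TAGS = {
    "song_a": {"hip-hop", "80s"},
    "song_b": {"hip-hop", "90s"},
    "song_c": {"death metal", "swedish"},
}
RATINGS = {
    "alice": {"song_a": 5, "song_c": 1},
    "bob": {"song_a": 5, "song_b": 4, "song_c": 1},
    "carol": {"song_a": 1, "song_b": 2, "song_c": 5},
}


def jaccard(a, b):
    """Share of tags that two items have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def content_score(user, item):
    """CBF: the user's own ratings, weighted by tag overlap with `item`."""
    weighted = total = 0.0
    for other, rating in RATINGS[user].items():
        sim = jaccard(SONG_TAGS[item], SONG_TAGS[other])
        weighted += sim * rating
        total += sim
    return weighted / total if total else None


def cosine(u, v):
    """Cosine similarity of two users over the items both have rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    norm_u = sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    norm_v = sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def collaborative_score(user, item):
    """CF: other users' ratings of `item`, weighted by similarity to `user`."""
    weighted = total = 0.0
    for other, ratings in RATINGS.items():
        if other == user or item not in ratings:
            continue
        sim = cosine(user, other)
        weighted += sim * ratings[item]
        total += sim
    return weighted / total if total else None


def hybrid_score(user, item, alpha=0.5):
    """Blend both scores; fall back to whichever one is available."""
    cbf = content_score(user, item)
    cf = collaborative_score(user, item)
    if cbf is None:
        return cf
    if cf is None:
        return cbf
    return alpha * cbf + (1 - alpha) * cf
```

Calling `hybrid_score("alice", "song_b")` blends what Alice's own tag preferences suggest with what similar listeners thought of the song. The `None` fallbacks hint at how a hybrid can soften the cold-start problem: when one signal is missing, the other still produces a score.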

Algebra behind RS (skip this paragraph if you’re not into maths)

For an RS we’ll need a basic table, in which:

Row - user’s ID; column - content’s ID; value of the table element - user’s rating of the content.

Let’s denote the set of users in the system by U, all known content in the system by V, the set of possible content ratings (a subset of the real numbers) by M, and by D the subset of U × V such that the rating is known for any pair (u,v) ∈ D. NA stands for an unknown value.

Then the basic RS table is a function ƒ: U × V → M ∪ {NA} with ƒ(D) ⊂ M.

Usually the value ƒ(u,v) is unknown for many pairs (u,v) ∈ U × V. People don’t rate content regularly and neatly (unlike some movie geeks on IMDb), so the table is always going to be sparse. We therefore have a table with missing elements, and the task of our algorithm is to fill in these missing elements and offer customers the content with the best predicted ratings.
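
If the notation feels abstract, the same table can be written down in a few lines of Python. This is only a sketch with made-up users, items and ratings; the names `KNOWN`, `f`, `sparsity` and `missing_pairs` are ours and do not come from any library.

```python
# Hypothetical toy data. Only the pairs in D carry a rating.
USERS = ["u1", "u2", "u3"]        # the set U
ITEMS = ["v1", "v2", "v3", "v4"]  # the set V
KNOWN = {                         # keys form D, values lie in M
    ("u1", "v1"): 5.0,
    ("u1", "v3"): 2.0,
    ("u2", "v2"): 4.0,
    ("u3", "v4"): 1.0,
}
NA = None  # marker for an unknown rating


def f(u, v):
    """Return the known rating for (u, v), or NA if the pair is not in D."""
    return KNOWN.get((u, v), NA)


def sparsity():
    """Fraction of the U x V table that is still NA."""
    return 1 - len(KNOWN) / (len(USERS) * len(ITEMS))


def missing_pairs():
    """All (user, item) pairs the algorithm still has to fill in."""
    return [(u, v) for u in USERS for v in ITEMS if f(u, v) is NA]
```

Storing only the known pairs, rather than a full grid, is the usual way to cope with sparsity: real tables have far more empty cells than filled ones.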

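One of the most widely used methods for this task is the classic SVD (singular value decomposition), which factorizes the rating table into a product of smaller matrices; multiplying them back together approximates the known ratings and yields estimates for the missing ones.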
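
The factorization idea can be tried out without any libraries. Note that the sketch below is not the classic SVD itself: it is a simplified, Funk-style matrix factorization trained by stochastic gradient descent, a common stand-in when the table has missing entries. The toy ratings, the function names and all hyperparameters (`k`, `epochs`, `lr`, `reg`) are assumptions chosen for illustration only.

```python
import random

# Hypothetical toy ratings: (user, item) -> rating on a 1-5 scale.
USERS = ["u1", "u2", "u3", "u4"]
ITEMS = ["v1", "v2", "v3", "v4"]
KNOWN = {
    ("u1", "v1"): 5, ("u1", "v2"): 4, ("u1", "v3"): 1,
    ("u2", "v1"): 4, ("u2", "v3"): 1, ("u2", "v4"): 2,
    ("u3", "v2"): 1, ("u3", "v3"): 5, ("u3", "v4"): 4,
    ("u4", "v1"): 1, ("u4", "v4"): 5,
}


def factorize(known, users, items, k=2, epochs=1000, lr=0.02, reg=0.02, seed=0):
    """Learn k latent factors per user and per item from the known ratings.

    Returns a function predict(u, v) that estimates any cell of the table.
    """
    rng = random.Random(seed)
    mean = sum(known.values()) / len(known)  # global average as a baseline
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u in users}
    q = {v: [rng.uniform(-0.1, 0.1) for _ in range(k)] for v in items}

    def predict(u, v):
        return mean + sum(a * b for a, b in zip(p[u], q[v]))

    for _ in range(epochs):
        for (u, v), rating in known.items():
            err = rating - predict(u, v)
            for j in range(k):
                pu, qv = p[u][j], q[v][j]
                # Gradient step with L2 regularization on both factors.
                p[u][j] += lr * (err * qv - reg * pu)
                q[v][j] += lr * (err * pu - reg * qv)
    return predict


def rmse(predict, known):
    """Root mean squared error of predict() on the known ratings."""
    total = sum((r - predict(u, v)) ** 2 for (u, v), r in known.items())
    return (total / len(known)) ** 0.5
```

After `predict = factorize(KNOWN, USERS, ITEMS)`, a call such as `predict("u1", "v4")` returns an estimate for a pair that has no rating yet. Comparing `rmse(predict, KNOWN)` with the error of always predicting the global mean is a quick sanity check that the factors have learned something; a real evaluation would use held-out ratings instead.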

So you’re developing your own RS

Our team has been developing and tuning an RS for an online cinema for some time now, so we have collected some thoughts on the dos and don’ts of this kind of project. Here are the most universal and helpful of them:

Rely on data when choosing the algorithm. When you work on an RS for a new client, you don’t want to hastily grab the model from your previous project and say “alright, this is what you’re looking for”. Only well-structured A/B testing will show which proportion of CF and CBF in a hybrid algorithm suits your new client’s needs best. If users look through the recommendations and don’t pick anything, something went wrong, and you need to tune your algorithm further.
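
As a rough illustration of what reading an A/B test can look like, here is a small sketch that compares the click-through rates of two recommendation variants with a two-proportion z-test. The metric (clicks on recommended items), the function name and the numbers in the usage note are assumptions for the example, not figures from our project.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a, shown_a, clicks_b, shown_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a = clicks_a / shown_a
    rate_b = clicks_b / shown_b
    # Pooled rate under the null hypothesis that both variants perform the same.
    pooled = (clicks_a + clicks_b) / (shown_a + shown_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF expressed through the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)
```

For example, `two_proportion_z_test(100, 1000, 150, 1000)` compares a variant A that got 100 clicks out of 1000 impressions with a variant B that got 150 out of 1000. A small p-value suggests the difference is unlikely to be noise, but sample sizes, test duration and the choice of metric matter just as much as the formula.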

There are some tech solutions you’ll probably want to use. There will almost definitely be cloud infrastructure, Docker containers, a NoSQL database, and Grafana for analytics and monitoring. On a small project you’ll probably not only develop the algorithm itself, but also do some work in Data Science, DevOps and Data Engineering. If you haven’t got much experience with those, be ready to play it by ear.

Be aware of the particulars of the client’s business. For instance, in an RS for movie and music services the recommendations output should always include “injections” of brand-new and thus not-so-well-known movies or songs. Another vivid example is how we organise recommendations for adult movies (I don’t mean the xxx-stuff; here we’re talking about movies for adult audiences) in online cinemas. They are watched a lot, and if you treat them like all other content, they will soon overwhelm every user’s recommendations. So what you want to do is agree with the client right from the start that the algorithm for this type of content will work separately, within the adult movie genre section.
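
One simple way to picture such “injections” is to reserve every few slots of the ranked list for a fresh title. The sketch below is a hypothetical helper, not the scheme we run in production; the function name and the `every` parameter are invented for illustration.

```python
def inject_fresh(ranked, fresh, every=3):
    """Insert one fresh item after every `every` ranked items.

    Fresh items that are already in the ranked list are skipped.
    """
    queue = [item for item in fresh if item not in ranked]
    result = []
    for position, item in enumerate(ranked, start=1):
        result.append(item)
        if position % every == 0 and queue:
            result.append(queue.pop(0))
    return result
```

With `every=3`, a list of six ranked titles gets at most two new releases mixed in, so the personalized picks still dominate while new content gets some exposure.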

Have patience when working with inherited code. You always work with what you’ve got, because even standard libraries can’t always offer what you need.

Calabi–Yau manifold, or how you see inherited code for the first time

This sometimes means you become a hostage to decisions that were made before you. The best way to deal with such situations is to simply accept it and do your best to get things up and running. As R.P. Warren wrote in All the King’s Men, “You have to make the good out of the bad because that is all you have to make it out of”.

To sum it all up: the RS market is still young and expanding, so good luck trying your hand in this field! There are definitely going to be some sweat and tears, but without a doubt, it will be worth it!

2018/04/08