
YMovies: Creating a Movie Recommendation Web App


Strap in as we take this long-awaited journey into how I, a human being, created one of the greatest sites to ever launch on the internet. (Had to say that, lol!)

Well, it started just like any other story: I was on the couch watching Netflix.

And while I was busy with this great task, their "Because you watched..." feature on the home page caught my eye.

I mean, it’s not the first time I’ve seen a recommendation system; I’m a data science student, after all.

Nevertheless, it was quite uncanny to see how accurate it was. (I know people on Reddit complain about how inaccurate it is, lol)

This whole thing sparked a question: Could I build something like that?

Fast forward to taking the Machine Learning course on Coursera, which helped me better understand how I could actually build a recommender for this app.

In this not-so-small blog, I will try to explain how I created this app.

Hopefully, I will inspire you to create something magical like it.

Check out the source code, or hit the live demo button to check it out.

View on GitHub

Live Demo

Building The Recommendation Engine

1. From Collaborative to Content-Based

The Machine Learning course introduced me to collaborative filtering. In basic terms, it is matchmaking based on user behavior.

Let's say you and I both love Inception, and I also praise Interstellar. The system might then suggest Interstellar to you.

It's quite powerful. However, it needs a ton of user data.

Starting from scratch with YMovies, I didn't have that luxury.

And so, I turned to content-based filtering, which looks at a movie's features (genres, plot, cast...) and finds similar ones based on what you've liked.

That makes it perfect for a new app, especially with the TMDB API providing rich movie metadata.

For anyone starting off a project with limited user data, it's a great place to start.

2. Creating the Content-Based Recommender

At the core of YMovies is a content-based recommender built with TF-IDF vectorization and cosine similarity.

In short, I combined each movie’s key details (genres, synopsis, cast, and keywords) into a single text string.

TF-IDF turns that text into vectors, giving more weight to terms that are distinctive for a movie and less to terms that show up all over the catalog. Then, cosine similarity measures how close two movies are in this vector space.

For example, if you liked The Dark Knight, it might suggest Batman Begins based on shared genres (Action, Drama) and cast (Christian Bale).
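To make the TF-IDF and cosine similarity idea concrete, here is a minimal from-scratch sketch in plain Python. It is not the YMovies code: the mini-catalog text is made up for illustration, and the helper names (tfidf_vectors, cosine_similarity) are hypothetical. A real app would more likely lean on a library for this step.

```python
import math
from collections import Counter

# Hypothetical mini-catalog: real titles, but the feature text is invented for this sketch.
movies = {
    "The Dark Knight": "action drama crime christian bale batman gotham joker",
    "Batman Begins": "action drama christian bale batman gotham origin",
    "Notting Hill": "romance comedy bookshop london actress",
}

def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    tokenized = {title: text.lower().split() for title, text in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        tf = Counter(tokens)
        # Smoothed IDF: rarer terms get a larger weight, common terms a smaller one.
        vectors[title] = {
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tfidf_vectors(movies)
sim_batman = cosine_similarity(vectors["The Dark Knight"], vectors["Batman Begins"])
sim_romcom = cosine_similarity(vectors["The Dark Knight"], vectors["Notting Hill"])
print(f"Batman Begins: {sim_batman:.2f}, Notting Hill: {sim_romcom:.2f}")
```

The two Batman films share most of their terms, so they score well above the rom-com, which shares none and lands at exactly zero.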

Check out this section from content_based_recommender.py, which emphasizes certain features, like genres, cast, and directors, by repeating them so they carry more weight:

python
def _combine_features(self, row):
    features = []
    if 'genres' in row and row['genres']:
        genres = row['genres'].split()
        features.extend([g for g in genres for _ in range(3)])  # Triple weight for genres
    if 'overview' in row and row['overview']:
        features.append(row['overview'])
    if 'cast' in row and row['cast']:
        cast_list = row['cast'].split()[:5]
        features.extend([c for c in cast_list for _ in range(2)])  # Double weight for cast
    if 'director' in row and row['director']:
        features.extend([row['director']] * 3)  # Triple weight for director
    if 'keywords' in row and row['keywords']:
        features.append(row['keywords'])
    return ' '.join(features)

This lets the get_similar_movies function surface movies similar to your favorites, which is essentially the backbone of the "Because you liked..." feature.
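The body of get_similar_movies isn't shown here, so the following is only a rough sketch of what such a lookup could look like: given precomputed similarity scores, pick the top n. The similarity dictionary, its movie keys, and its numbers are all invented for illustration.

```python
import heapq

def get_similar_movies(movie_id, similarity, n=5):
    """Return the n most similar (movie_id, score) pairs, best first.

    `similarity` is assumed to map a movie ID to {other_id: score}.
    """
    scores = similarity.get(movie_id, {})
    # Skip the movie itself and anything with no overlap at all.
    candidates = (
        (other, score)
        for other, score in scores.items()
        if other != movie_id and score > 0
    )
    return heapq.nlargest(n, candidates, key=lambda pair: pair[1])

# Made-up scores, only to show the shape of the data.
similarity = {
    "inception": {
        "inception": 1.0,
        "interstellar": 0.62,
        "tenet": 0.71,
        "notting_hill": 0.0,
    },
}

top = get_similar_movies("inception", similarity, n=2)
print(top)
```

Using heapq.nlargest avoids sorting the whole candidate list when only a handful of results are needed.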

3. Mixing It Up with a Hybrid Recommender

Content-based filtering was a good start, but I wanted YMovies to feel personal.

This is where the hybrid recommender in hybrid_recommender.py came in handy. It mixes content similarity with user preferences from a quiz and interaction history.

For new users, the quiz solves the cold start problem: making those first recommendations when there is no data to lean on.

Here is a simplified excerpt showing how it works:

python
def get_recommendations(self, user_data, n=20):
    recommendations = []
    if liked_movie_ids:
        for movie_id in recent_liked_ids:
            similar_movies = self.get_because_you_liked_recommendations(movie_id, n=10)
            # Filter out watched movies and add to recommendations
    if quiz_genres:
        quiz_recs = self._get_quiz_based_recommendations(
            quiz_genres, quiz_year_range, quiz_duration, exclude_ids, n=20
        )
        # Add quiz-based picks
    return recommendations

If you liked Inception, it suggests similar films.

If you picked "Sci-Fi" and "recent" in the quiz, it narrows things down to movies like Interstellar. It's a combination of math and user insight.
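The excerpt above leaves out how the two sources get combined, so here is one possible way to do it, as a sketch rather than the actual hybrid_recommender.py logic. The function name merge_recommendations and the simple round-robin strategy are assumptions.

```python
def merge_recommendations(liked_recs, quiz_recs, exclude_ids, n=20):
    """Interleave two ranked lists of movie IDs, skipping duplicates and excluded IDs."""
    merged = []
    seen = set(exclude_ids)  # Already-watched movies count as "seen" from the start.
    sources = [iter(liked_recs), iter(quiz_recs)]
    while sources and len(merged) < n:
        # Take one candidate from each source per round.
        for source in list(sources):
            movie_id = next(source, None)
            if movie_id is None:
                sources.remove(source)  # This list is exhausted.
                continue
            if movie_id not in seen:
                seen.add(movie_id)
                merged.append(movie_id)
                if len(merged) == n:
                    break
    return merged

# Hypothetical movie IDs: 3 was already watched, 2 appears in both lists.
result = merge_recommendations([1, 2, 3], [2, 4, 5], exclude_ids={3}, n=4)
print(result)
```

Alternating between the lists keeps either source from crowding out the other, which is one cheap way to keep the feed varied.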

The "Because You Liked..." section

The Netflix-inspired section is one of the main focuses of YMovies.

The get_because_you_liked_recommendations method uses content similarity to suggest movies, tagging each with a reason like "Because you liked Inception."

I made sure to exclude movies you've already added to your watchlist, just to keep it fresh and relevant.
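A minimal sketch of that tagging and filtering step might look like the following. It assumes the similar movies arrive as (title, score) pairs, best first, and the function name and result fields are hypothetical rather than taken from the real method.

```python
def because_you_liked(liked_title, similar, watchlist, n=10):
    """Tag similar movies with a reason, skipping anything already on the watchlist."""
    picks = []
    for title, score in similar:
        if title == liked_title or title in watchlist:
            continue
        picks.append({
            "title": title,
            "score": score,
            "reason": f"Because you liked {liked_title}",
        })
        if len(picks) == n:
            break
    return picks

# Made-up similarity scores for illustration.
similar_to_inception = [("Tenet", 0.71), ("Interstellar", 0.62), ("Memento", 0.55)]
recs = because_you_liked("Inception", similar_to_inception, watchlist={"Tenet"}, n=2)
for rec in recs:
    print(rec["title"], "-", rec["reason"])
```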

Solving the Cold Start with a Quiz

For newbies, the quiz is an actual lifesaver.

It asks about genres (e.g., Action, Romance), year ranges (recent or classic), and runtime preferences (short, medium, long).

The _get_quiz_based_recommendations function filters the movie pool accordingly:

  • Recent: Last 5 years
  • Classic: Pre-2000
  • Short: Under 100 minutes

It then ranks by popularity and ratings, ensuring solid picks from the start.
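The filters above can be sketched roughly like this. It only covers the three thresholds listed (the medium and long cut-offs aren't given, so they're left unfiltered), and the field names, the function name, and the way popularity and rating are combined into a sort key are all assumptions.

```python
from datetime import date

def quiz_recommendations(movies, genres, era=None, duration=None,
                         exclude_ids=(), n=20, current_year=None):
    """Filter a movie pool by quiz answers, then rank by popularity and rating."""
    if current_year is None:
        current_year = date.today().year
    excluded = set(exclude_ids)
    wanted = set(genres)
    picks = []
    for movie in movies:
        if movie["id"] in excluded:
            continue
        if wanted and not wanted & set(movie["genres"]):
            continue  # No overlap with the genres picked in the quiz.
        if era == "recent" and movie["year"] < current_year - 5:
            continue  # Recent: last 5 years.
        if era == "classic" and movie["year"] >= 2000:
            continue  # Classic: pre-2000.
        if duration == "short" and movie["runtime"] >= 100:
            continue  # Short: under 100 minutes.
        picks.append(movie)
    # Assumed ranking: popularity first, rating as the tie-breaker.
    picks.sort(key=lambda m: (m["popularity"], m["rating"]), reverse=True)
    return picks[:n]

# Hypothetical catalog with invented numbers.
catalog = [
    {"id": 1, "genres": ["Sci-Fi"], "year": 2023, "runtime": 95, "popularity": 80.0, "rating": 7.1},
    {"id": 2, "genres": ["Sci-Fi", "Action"], "year": 2022, "runtime": 130, "popularity": 95.0, "rating": 6.8},
    {"id": 3, "genres": ["Romance"], "year": 2024, "runtime": 90, "popularity": 99.0, "rating": 8.0},
    {"id": 4, "genres": ["Sci-Fi"], "year": 1995, "runtime": 98, "popularity": 60.0, "rating": 8.2},
]

recent_scifi = quiz_recommendations(catalog, ["Sci-Fi"], era="recent", current_year=2025)
print([m["id"] for m in recent_scifi])
```

Passing current_year explicitly keeps the example reproducible; left out, it falls back to today's year.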

Quick Note: If you're building a system, a quiz like this is a quick win for onboarding users.

Technical Problems and Fixes

Building YMovies wasn't all smooth sailing. Here's what I struggled with:

  • Data Wrangling: TMDB data was a goldmine, but messy: genre IDs needed mapping, overviews had gaps... I cleaned it in app.py's load_movie_data function.
  • Speed: Calculating similarities for thousands of movies slowed things down a lot. Sparse matrices for the TF-IDF vectors saved the day. Pro tip for handling big datasets.
  • Balancing Act: How much should quiz answers weigh versus liked movies? I tuned it with a diversity_factor to mix things up without being repetitive.
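The data wrangling bullet above can be sketched roughly like this. The real load_movie_data isn't shown, so clean_movie is a hypothetical helper; it assumes TMDB-style records with genre_ids and overview fields, and the genre map is only a small subset of TMDB's list.

```python
# Small subset of TMDB movie genre IDs; a full mapping would come from the API.
GENRE_NAMES = {28: "Action", 18: "Drama", 878: "Science Fiction"}

def clean_movie(raw):
    """Map genre IDs to names and fill gaps so later text processing doesn't break."""
    names = [GENRE_NAMES[g] for g in raw.get("genre_ids", []) if g in GENRE_NAMES]
    return {
        "id": raw["id"],
        "title": raw.get("title", ""),
        # Collapse multi-word genres so a later .split() keeps each genre as one token.
        "genres": " ".join(name.replace(" ", "") for name in names),
        # Missing or null overviews become an empty string.
        "overview": (raw.get("overview") or "").strip(),
    }

# Hypothetical raw record with an unknown genre ID and a missing overview.
raw = {"id": 1, "title": "Example Movie", "genre_ids": [28, 878, 99999], "overview": None}
cleaned = clean_movie(raw)
print(cleaned)
```

Collapsing "Science Fiction" into a single token matters here because the feature-combining code splits genres on whitespace.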

Production Results

I'd be lying if I said I was a hundred percent sure I'd finish this project. But now that I'm here, I'm glad I did it. It was a slow start, but I've learned a lot. If you're just joining us, here is what happens in YMovies: content_based_recommender.py finds movie twins, hybrid_recommender.py adds the user's personality, and app.py delivers it with a bow.

Next time I change this, maybe I'll build something with the ByteDance open source recommendation algorithm.

PS: I didn't end up using the quiz feature after all. Still, I do recommend it.