
YMovies: Creating a Movie Recommendation Web App


Strap in as we take this long-awaited journey into how I, a human being, created one of the greatest sites to ever launch on the internet. (Had to say that, lol!)

Well, it started just like any other story: I was on the couch watching Netflix.

And while I was busy with this great task, the "Because you watched..." feature on the home page caught my eye.

I mean, it’s not the first time I’ve seen a recommendation system; I’m a data science student, after all.

Nevertheless, it was quite uncanny to see how accurate it was. (I know people on Reddit complain about how inaccurate it is, lol)

This whole thing sparked a question: Could I build something like that?

Fast forward to taking the Machine Learning course on Coursera, which helped me better understand how I could actually build a recommender for this app.

In this not-so-small blog, I will try to explain how I created this app.

Hopefully, I will inspire you to create something magical like it.

Check out the source code, or hit the live demo button to check it out.

View on GitHub

Live Demo

Building The Recommendation Engine

1. From Collaborative to Content-Based

The Machine Learning course introduced me to collaborative filtering. In basic terms, it is matchmaking based on user behavior.

Let's say you and I both love Inception, and I also rave about Interstellar. The system might then suggest Interstellar to you.

It's quite powerful, but it needs a ton of user data.

Starting from scratch with YMovies, I didn't have that luxury.

And so, I turned to content-based filtering, which looks at a movie's features (genres, plot, cast...) and finds similar movies based on what you've liked.

That makes it perfect for a new app, especially with the TMDB API providing rich movie metadata.

For anyone starting off a project with limited user data, it's a great place to start.

2. Creating the Content-Based Recommender

At the core of YMovies is a content-based recommender built with TF-IDF vectorization and cosine similarity.

In short, I combined each movie’s key details (genres, synopsis, cast, and keywords) into a single text string.

TF-IDF turns that text into vectors, giving more weight to terms that are distinctive within the movie catalog. Then, cosine similarity measures how close two movies are in this vector space.

For example, if you liked The Dark Knight, it might suggest Batman Begins based on shared genres (Action, Drama) and cast (Christian Bale).
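
To make the idea concrete, here is a minimal from-scratch sketch of TF-IDF plus cosine similarity using only the standard library. It is not the code from content_based_recommender.py: the function names, the smoothed IDF formula, and the tiny feature strings are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an assumed variant): rare terms get a larger multiplier
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up feature strings, purely for illustration
movies = {
    "The Dark Knight": "action drama crime christian_bale gotham batman",
    "Batman Begins": "action drama christian_bale gotham batman origin",
    "Notting Hill": "romance comedy london bookshop",
}
titles = list(movies)
vectors = tfidf_vectors(list(movies.values()))

# Similarity of every other movie to the first one (The Dark Knight)
scores = {
    title: cosine_similarity(vectors[0], vec)
    for title, vec in zip(titles[1:], vectors[1:])
}
```

With these toy strings, Batman Begins shares several terms with The Dark Knight and scores well above Notting Hill, which shares none and scores zero.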

Check out this section from content_based_recommender.py, which emphasizes certain features, like genres and directors, by giving them higher weights:

```python
def _combine_features(self, row):
    """Build one weighted text string per movie for TF-IDF."""
    features = []
    if 'genres' in row and row['genres']:
        # Splits on whitespace, so each genre is expected to be a single token
        genres = row['genres'].split()
        features.extend([g for g in genres for _ in range(3)])  # Triple weight for genres
    if 'overview' in row and row['overview']:
        features.append(row['overview'])
    if 'cast' in row and row['cast']:
        # Keep only the first five whitespace-separated cast tokens
        cast_list = row['cast'].split()[:5]
        features.extend([c for c in cast_list for _ in range(2)])  # Double weight for cast
    if 'director' in row and row['director']:
        features.extend([row['director']] * 3)  # Triple weight for director
    if 'keywords' in row and row['keywords']:
        features.append(row['keywords'])
    return ' '.join(features)
```

This lets the get_similar_movies function surface movies similar to your favorites, and it forms the backbone of the "Because you liked..." feature.
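
Why does repeating a token act as a weight? TF-IDF starts from raw term counts, so a token that appears three times counts three times. The small sketch below shows just that counting step. The helper name weighted_tokens and its genre_weight parameter are hypothetical, not part of the YMovies code.

```python
from collections import Counter


def weighted_tokens(genres, overview, genre_weight=3):
    """Repeat each genre token genre_weight times, then append the overview words."""
    tokens = [g for g in genres for _ in range(genre_weight)]
    tokens.extend(overview.lower().split())
    return tokens


# Hypothetical movie: "thriller" is a genre and also appears once in the overview
counts = Counter(weighted_tokens(["thriller"], "A thriller set in Paris"))
```

Here "thriller" ends up with a count of 4 (three from the genre weight plus one from the overview), while "paris" stays at 1, so the genre dominates the resulting vector.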

3. Mixing It Up with a Hybrid Recommender

Content-based filtering was a good start, but I wanted YMovies to feel personal.

This is where the hybrid recommender in hybrid_recommender.py came in handy. It mixes content similarity with user preferences from a quiz and interaction history.

For new users, the quiz solves the cold start problem: making those first recommendations when there is no data to lean on.

Here is a simplified outline of how it works:

```python
def get_recommendations(self, user_data, n=20):
    # Abridged excerpt: liked_movie_ids, recent_liked_ids, exclude_ids and the
    # quiz_* values come from user_data (that setup is omitted here).
    recommendations = []
    if liked_movie_ids:
        for movie_id in recent_liked_ids:
            similar_movies = self.get_because_you_liked_recommendations(movie_id, n=10)
            # Filter out watched movies and add to recommendations
    if quiz_genres:
        quiz_recs = self._get_quiz_based_recommendations(
            quiz_genres, quiz_year_range, quiz_duration, exclude_ids, n=20
        )
        # Add quiz-based picks
    return recommendations
```

If you have liked Inception, it suggests similar films.

If you said "Sci-Fi" and "recent" in the quiz, it narrows down to movies like Interstellar. It's a combo of math and user insight.

The "Because You Liked..." Magic

That Netflix-inspired "Because you liked..." section? It's alive in YMovies. The get_because_you_liked_recommendations method uses content similarity to suggest movies, tagging each with a reason like "Because you liked Inception." I made sure to exclude movies you've already watched or added to your watchlist—keeping it fresh and relevant.
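
A rough sketch of that tagging and filtering step might look like the following. It is a standalone function rather than the real get_because_you_liked_recommendations method, and the dictionary keys ("id", "title", "reason") and the sample IDs are assumptions made for the example.

```python
def because_you_liked(seed_title, similar_movies, watched_ids, watchlist_ids, n=10):
    """Tag similar movies with a reason and skip ones already watched or saved."""
    seen = set(watched_ids) | set(watchlist_ids)
    picks = []
    for movie in similar_movies:  # assumed to be sorted by similarity, best first
        if movie["id"] in seen:
            continue
        picks.append({**movie, "reason": f"Because you liked {seed_title}"})
        if len(picks) == n:
            break
    return picks


# Made-up IDs, purely for illustration
similar = [
    {"id": 1, "title": "Interstellar"},
    {"id": 2, "title": "Tenet"},
    {"id": 3, "title": "The Prestige"},
]
picks = because_you_liked("Inception", similar, watched_ids=[1], watchlist_ids=[3])
```

In this example only Tenet survives the filter, and it carries the reason string that the UI can show next to the poster.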

Solving the Cold Start with a Quiz

For newbies, the quiz is a lifesaver. It asks about genres (e.g., Action, Romance), year ranges (recent or classic), and runtime preferences (short, medium, long). The _get_quiz_based_recommendations function filters the movie pool accordingly:

  • Recent: Last 5 years
  • Classic: Pre-2000
  • Short: Under 100 minutes

It then ranks by popularity and ratings, ensuring solid picks from day one. If you're building a system, a quiz like this is a quick win for onboarding users.
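
As a sketch of how such a filter could work, the function below applies the three rules listed above and then sorts the survivors. It is not the actual _get_quiz_based_recommendations: the field names, the era and duration labels, and the tie-breaking order (popularity first, then rating) are assumptions. The medium and long runtime cutoffs are not stated here, so the sketch leaves them unfiltered.

```python
from datetime import date


def quiz_recommendations(movies, genres, era=None, duration=None,
                         exclude_ids=(), n=20, today=None):
    """Filter a movie list by quiz answers, then rank by popularity and rating."""
    current_year = (today or date.today()).year
    wanted = set(genres)
    excluded = set(exclude_ids)
    picks = []
    for m in movies:
        # Skip excluded movies and movies sharing no genre with the quiz answers
        if m["id"] in excluded or not wanted & set(m["genres"]):
            continue
        if era == "recent" and m["year"] < current_year - 5:  # last 5 years
            continue
        if era == "classic" and m["year"] >= 2000:  # pre-2000
            continue
        if duration == "short" and m["runtime"] >= 100:  # under 100 minutes
            continue
        picks.append(m)
    # Assumed ranking: most popular first, rating breaks ties
    picks.sort(key=lambda m: (m["popularity"], m["rating"]), reverse=True)
    return picks[:n]


# Made-up catalog, purely for illustration
catalog = [
    {"id": 1, "genres": ["Action"], "year": 2023, "runtime": 95, "popularity": 80.0, "rating": 7.1},
    {"id": 2, "genres": ["Action"], "year": 1994, "runtime": 120, "popularity": 95.0, "rating": 8.5},
    {"id": 3, "genres": ["Romance"], "year": 2022, "runtime": 90, "popularity": 60.0, "rating": 6.8},
    {"id": 4, "genres": ["Action", "Romance"], "year": 2021, "runtime": 130, "popularity": 90.0, "rating": 7.9},
]
recent_action = quiz_recommendations(catalog, ["Action"], era="recent", today=date(2025, 1, 1))
```

Passing today explicitly keeps the example reproducible. In the app itself the current date would normally be used.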

Technical Hurdles and Fixes

Building YMovies wasn't all smooth sailing. Here's what I wrestled with:

  • Data Wrangling: TMDB data was a goldmine, but messy—genre IDs needed mapping, overviews had gaps. I cleaned it up in app.py's load_movie_data function.
  • Speed: Calculating similarities for thousands of movies bogged things down. Sparse matrices for TF-IDF vectors saved the day, a pro tip for handling big datasets.
  • Balancing Act: How much should quiz answers weigh versus liked movies? I tweaked it with a diversity_factor to mix things up without being repetitive.
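
To give a feel for the data wrangling part, here is a minimal sketch of cleaning one raw record. The helper clean_movie is hypothetical and is not the app's load_movie_data. The mapping is only a small subset of TMDB's genre IDs, and the sample record is made up.

```python
# Small subset of TMDB movie genre IDs; the full list comes from TMDB's genre endpoint
TMDB_GENRES = {
    28: "Action",
    18: "Drama",
    35: "Comedy",
    878: "Science Fiction",
    10749: "Romance",
}


def clean_movie(raw):
    """Normalize one raw movie record: map genre IDs to names, fill overview gaps."""
    genre_names = [TMDB_GENRES[g] for g in raw.get("genre_ids", []) if g in TMDB_GENRES]
    return {
        "id": raw["id"],
        "title": raw.get("title", ""),
        # Underscores keep multi-word genres whole if the string is later split on spaces
        "genres": " ".join(name.replace(" ", "_") for name in genre_names),
        # Missing or null overviews become an empty string instead of None
        "overview": (raw.get("overview") or "").strip(),
    }


# Made-up record with an unknown genre ID (999) and a missing overview
cleaned = clean_movie({"id": 1, "title": "Example Movie", "genre_ids": [28, 878, 999], "overview": None})
```

Unknown genre IDs are silently dropped in this sketch. Logging them instead would be a reasonable alternative while the mapping is still incomplete.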

Serving It Up with Flask

The Flask app in app.py ties it all together, offering endpoints like:

  • /recommendations/similar/<movie_id>: Content-based picks
  • /recommendations/personalized: Full hybrid recommendations
  • /recommendations/quiz-based: Quiz-driven suggestions

It pulls movie data from TMDB, caches it, and initializes the recommenders on startup. Deployed on Vercel with Neon's serverless PostgreSQL, it scales like a dream.
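
The caching step can take many shapes. One simple possibility is a small in-memory cache with a time-to-live, sketched below. The TTLCache class, its parameters, and the fake fetch function are all hypothetical, and this is not necessarily how app.py caches TMDB data.

```python
import time


class TTLCache:
    """Tiny in-memory cache: reuse a fetched value until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable that loads fresh data
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        # Fetch on first use, or again once the cached value has expired
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


calls = []


def fake_fetch():
    """Stand-in for a real TMDB request; records how often it is called."""
    calls.append(1)
    return [{"id": 1, "title": "Example Movie"}]


cache = TTLCache(fake_fetch, ttl_seconds=60)
first = cache.get()
second = cache.get()  # served from memory, no second fetch
```

An in-memory cache like this lives only as long as the process does, so on a serverless platform a persistent store may be a better fit.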

What I Learned

After three iterations, YMovies taught me a ton:

  • Content-Based is Your Friend: Great for starting with item metadata.
  • Hybrid Wins: Combining approaches beats any single method.
  • Data Quality Matters: Clean, rich data fuels better recommendations.
  • User Experience is Key: It's not just about accuracy—surprise and delight matter too.

Production Results & Impact

YMovies isn't perfect, but when I see "Because you liked The Matrix" pop up with Blade Runner 2049, I know it's working. Want to build your own? Start with what data you have, add a personal touch, and iterate. Who knows—you might just recommend the next big hit!

So, there you have it! From a Netflix "aha!" moment to a full-blown recommendation system, I've built something awesome. content_based_recommender.py finds movie twins, hybrid_recommender.py adds the user's personality, and app.py delivers it with a bow. I've tackled sparse data, performance hiccups, and user experience like a pro.

Next time I'm tweaking this, maybe I'll play with weighting the hybrid inputs more dynamically—could be a fun experiment! Building YMovies taught me that the best recommendations aren't just about accuracy—they're about surprise and delight too.