Strap in as we take this long-awaited journey into how I, a human being, created one of the greatest sites to ever launch on the internet. (Had to say that, lol!)
Well, it started just like any other story: I was on the couch watching Netflix.
And while I was on this great task, their "Because you watched..." feature on the home page caught my eye.
I mean, it’s not the first time I’ve seen a recommendation system; I’m a data science student, after all.
Nevertheless, it was quite uncanny to see how accurate it was. (I know people on Reddit complain about how inaccurate it is, lol)
This whole thing sparked a question: Could I build something like that?
Fast forward to taking the Machine Learning course on Coursera, which helped me better understand how I could actually build a recommender for this app.
In this not-so-small blog, I will try to explain how I created this app.
Hopefully, I will inspire you to create something magical like it.
Check out the source code, or hit the live demo button to try it yourself.
Building The Recommendation Engine
1. From Collaborative to Content-Based:
The Machine Learning course introduced me to collaborative filtering; in basic terms, it is matchmaking based on user behavior.
Say you and I both love Inception, and I also praise Interstellar: the system might then suggest Interstellar to you.
It's quite powerful; however, it needs a ton of user data.
Having just started fresh with YMovies, I didn't have that luxury.
And so, I turned to content-based filtering, which looks at a movie's features (genres, plot, cast...) and then finds similar ones based on what you've liked.
That makes it perfect for a new app, especially with the TMDB API providing rich movie metadata.
For anyone starting off a project with limited user data, it's a great place to start.
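To make the Inception/Interstellar example concrete, here is a toy sketch of user-based collaborative filtering. It is not code from YMovies (which doesn't use this approach); the users, the likes, and the choice of Jaccard overlap as the similarity measure are all assumptions made for illustration.

```python
# Toy user-based collaborative filtering; every user and like below is made up.
likes = {
    "me": {"Inception", "Interstellar"},
    "you": {"Inception"},
    "someone_else": {"Titanic"},
}

def jaccard(a, b):
    """Overlap between two sets of liked movies, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest_for(user, likes):
    """Score movies the user hasn't liked by the similarity of the users who did."""
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        similarity = jaccard(likes[user], other_likes)
        for movie in other_likes - likes[user]:
            scores[movie] = scores.get(movie, 0.0) + similarity
    # Keep only movies backed by at least one similar user, best score first
    return sorted((m for m, s in scores.items() if s > 0), key=lambda m: -scores[m])

suggestions = suggest_for("you", likes)
print(suggestions)
```

With only three users this works, but with zero users there is nothing to compare against, which is exactly the data problem described above.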
2. Creating the Content-Based Recommender
At the core of YMovies is a content-based recommender built with TF-IDF vectorization and cosine similarity.
In short, I combined each movie’s key details (genres, synopsis, cast, and keywords) into a single text string.
TF-IDF turns that text into vectors, giving more weight to terms that are distinctive for a movie than to terms that show up all over the catalog. Then, cosine similarity measures how close two movies are in this vector space.
For example, if you liked The Dark Knight, it might suggest Batman Begins based on shared genres (Action, Drama) and cast (Christian Bale).
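Here is a minimal, dependency-free sketch of the TF-IDF plus cosine similarity idea. It is not the YMovies implementation: the feature strings are made up, the smoothed IDF formula is one common choice, and in a real project you would more likely reach for a library such as scikit-learn (TfidfVectorizer and cosine_similarity).

```python
import math
from collections import Counter

# Made-up feature strings standing in for each movie's combined text
docs = {
    "The Dark Knight": "action drama christian_bale batman gotham joker",
    "Batman Begins": "action drama christian_bale batman gotham origin",
    "Notting Hill": "romance comedy hugh_grant london bookshop",
}

def tfidf_vectors(docs):
    """Return {title: {term: weight}} using term counts and a smoothed IDF."""
    n_docs = len(docs)
    tokenized = {title: text.split() for title, text in docs.items()}
    # Number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    vectors = {}
    for title, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[title] = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(title, vectors, n=5):
    """Rank every other movie by similarity to the given title."""
    scores = [
        (other, cosine(vectors[title], vec))
        for other, vec in vectors.items()
        if other != title
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

vectors = tfidf_vectors(docs)
ranked = most_similar("The Dark Knight", vectors)
print(ranked)
```

Batman Begins comes out on top because it shares most of its terms with The Dark Knight, while Notting Hill shares none and scores zero.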
Check out this section from content_based_recommender.py that emphasizes certain features, like genres and directors, by giving them higher weights:
def _combine_features(self, row):
    features = []
    if 'genres' in row and row['genres']:
        genres = row['genres'].split()
        features.extend([g for g in genres for _ in range(3)])  # Triple weight for genres
    if 'overview' in row and row['overview']:
        features.append(row['overview'])
    if 'cast' in row and row['cast']:
        cast_list = row['cast'].split()[:5]
        features.extend([c for c in cast_list for _ in range(2)])  # Double weight for cast
    if 'director' in row and row['director']:
        features.extend([row['director']] * 3)  # Triple weight for director
    if 'keywords' in row and row['keywords']:
        features.append(row['keywords'])
    return ' '.join(features)
This enables the get_similar_movies function to highlight movies similar to your favorites, essentially forming the backbone of the "Because you liked..." feature.
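To see what the repetition trick actually does, here is a standalone copy of the same weighting idea run on a hypothetical row. The sample values, and the choice to join multi-word names with underscores so they survive split() as a single token, are assumptions of mine and not taken from the YMovies data.

```python
from collections import Counter

def combine_features(row):
    """Repeat a token to raise its term frequency, and with it its TF-IDF weight."""
    features = []
    if row.get("genres"):
        features.extend(g for g in row["genres"].split() for _ in range(3))
    if row.get("overview"):
        features.append(row["overview"])
    if row.get("cast"):
        features.extend(c for c in row["cast"].split()[:5] for _ in range(2))
    if row.get("director"):
        features.extend([row["director"]] * 3)
    if row.get("keywords"):
        features.append(row["keywords"])
    return " ".join(features)

# Hypothetical row; underscores keep multi-word names together as one token
row = {
    "genres": "Action Drama",
    "overview": "A vigilante faces a criminal mastermind",
    "cast": "Christian_Bale Heath_Ledger",
    "director": "Christopher_Nolan",
    "keywords": "vigilante gotham",
}

counts = Counter(combine_features(row).split())
print(counts["Action"], counts["Christian_Bale"], counts["Christopher_Nolan"], counts["gotham"])
```

Genres and the director each show up three times, cast members twice, and plain keywords once, so a shared genre or director pulls two movies closer together than a shared keyword does.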
3. Mixing It Up with a Hybrid Recommender
Content-based filtering was a good start, but I wanted YMovies to feel personal.
This is where the hybrid recommender in hybrid_recommender.py came in handy. It mixes content similarity with user preferences from a quiz and interaction history.
For new users, the quiz solves the cold start problem: producing those first recommendations when there is no data to lean on.
Here is how it works:
def get_recommendations(self, user_data, n=20):
    recommendations = []
    # Excerpt: liked_movie_ids, recent_liked_ids, the quiz_* values and exclude_ids
    # are set up in code omitted here (presumably from user_data)
    if liked_movie_ids:
        for movie_id in recent_liked_ids:
            similar_movies = self.get_because_you_liked_recommendations(movie_id, n=10)
            # Filter out watched movies and add to recommendations
    if quiz_genres:
        quiz_recs = self._get_quiz_based_recommendations(
            quiz_genres, quiz_year_range, quiz_duration, exclude_ids, n=20
        )
        # Add quiz-based picks
    return recommendations
If you have liked Inception, it suggests similar films.
If you said "Sci-Fi" and "recent" in the quiz, it narrows down to movies like Interstellar. It's a combo of math and user insight.
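The "add to recommendations" steps are elided in the excerpt above, so here is one possible way the two sources could be merged. This is a sketch under assumptions: the merge_recommendations helper, the dict shape of a recommendation, and the movie IDs are all hypothetical, and liked-based picks are simply placed ahead of quiz-based ones.

```python
def merge_recommendations(liked_recs, quiz_recs, exclude_ids, n=20):
    """Combine two candidate lists, skipping excluded and duplicate movie IDs."""
    merged = []
    seen = set(exclude_ids)
    # Liked-based picks come first, so they win when both lists contain a movie
    for rec in liked_recs + quiz_recs:
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        merged.append(rec)
    return merged[:n]

# Made-up candidates; the IDs are placeholders, not real TMDB IDs
liked_recs = [
    {"id": 1, "title": "Interstellar", "reason": "Because you liked Inception"},
    {"id": 2, "title": "Tenet", "reason": "Because you liked Inception"},
]
quiz_recs = [
    {"id": 1, "title": "Interstellar", "reason": "Matches your quiz answers"},
    {"id": 3, "title": "Arrival", "reason": "Matches your quiz answers"},
]

result = merge_recommendations(liked_recs, quiz_recs, exclude_ids={2})
print([rec["title"] for rec in result])
```

Here Tenet is dropped because it is in the exclude set, and Interstellar appears only once, keeping the reason from the liked-based list.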
The "Because You Liked..." Magic
That Netflix-inspired "Because you liked..." section? It's alive in YMovies. The get_because_you_liked_recommendations method uses content similarity to suggest movies, tagging each with a reason like "Because you liked Inception." I made sure to exclude movies you've already watched or added to your watchlist, keeping it fresh and relevant.
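A rough sketch of that tag-and-exclude step might look like this. The function name, the (id, title, score) tuple format, and the similarity scores are all invented for illustration; the real method lives in hybrid_recommender.py and may work differently.

```python
def because_you_liked(liked_title, similar, watched_ids, watchlist_ids, n=10):
    """Tag the top-n similar movies with a reason, skipping ones the user already has."""
    skip = set(watched_ids) | set(watchlist_ids)
    picks = []
    # similar is a list of (movie_id, title, similarity_score) tuples
    for movie_id, title, score in sorted(similar, key=lambda m: m[2], reverse=True):
        if movie_id in skip:
            continue
        picks.append({
            "id": movie_id,
            "title": title,
            "score": score,
            "reason": f"Because you liked {liked_title}",
        })
        if len(picks) == n:
            break
    return picks

# Made-up IDs and similarity scores
similar = [(10, "Interstellar", 0.62), (11, "Tenet", 0.71), (12, "The Prestige", 0.55)]
picks = because_you_liked("Inception", similar, watched_ids={11}, watchlist_ids={12})
print(picks)
```

Even though Tenet has the highest score in this toy data, it is skipped because it is already watched, and The Prestige is skipped because it is on the watchlist.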
Solving the Cold Start with a Quiz
For newbies, the quiz is a lifesaver. It asks about genres (e.g., Action, Romance), year ranges (recent or classic), and runtime preferences (short, medium, long). The _get_quiz_based_recommendations function filters the movie pool accordingly:
- Recent: Last 5 years
- Classic: Pre-2000
- Short: Under 100 minutes
It then ranks by popularity and ratings, ensuring solid picks from day one. If you're building a system, a quiz like this is a quick win for onboarding users.
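The filter rules above can be sketched roughly like this. It is not the actual _get_quiz_based_recommendations function: the movie dict fields, the exact reading of "last 5 years", and sorting by popularity first and rating second are my assumptions. The cut-offs for "medium" and "long" aren't stated above, so this sketch leaves them unfiltered.

```python
from datetime import date

def quiz_based_recommendations(movies, genres, year_range, duration, n=20, today=None):
    """Filter by quiz answers, then rank by popularity and rating (assumed order)."""
    current_year = (today or date.today()).year

    def year_ok(movie):
        if year_range == "recent":
            return movie["year"] >= current_year - 5  # "last 5 years", as read here
        if year_range == "classic":
            return movie["year"] < 2000
        return True  # no year preference

    def runtime_ok(movie):
        if duration == "short":
            return movie["runtime"] < 100
        return True  # "medium" and "long" thresholds are unknown, so not filtered

    wanted = set(genres)
    matches = [
        m for m in movies
        if wanted & set(m["genres"]) and year_ok(m) and runtime_ok(m)
    ]
    matches.sort(key=lambda m: (m["popularity"], m["rating"]), reverse=True)
    return matches[:n]

# Made-up catalog entries
movies = [
    {"title": "Movie A", "genres": ["Sci-Fi"], "year": 2023, "runtime": 95, "popularity": 80.0, "rating": 7.1},
    {"title": "Movie B", "genres": ["Sci-Fi"], "year": 2022, "runtime": 90, "popularity": 95.0, "rating": 6.8},
    {"title": "Movie C", "genres": ["Sci-Fi"], "year": 1999, "runtime": 98, "popularity": 99.0, "rating": 8.5},
    {"title": "Movie D", "genres": ["Romance"], "year": 2024, "runtime": 92, "popularity": 70.0, "rating": 7.0},
    {"title": "Movie E", "genres": ["Sci-Fi"], "year": 2024, "runtime": 150, "popularity": 90.0, "rating": 7.9},
]

recs = quiz_based_recommendations(movies, ["Sci-Fi"], "recent", "short", today=date(2025, 1, 1))
print([m["title"] for m in recs])
```

Passing today explicitly keeps the example reproducible; left out, the function falls back to the current date.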
Technical Hurdles and Fixes
Building YMovies wasn't all smooth sailing. Here's what I wrestled with:
- Data Wrangling: TMDB data was a goldmine, but messy: genre IDs needed mapping, and overviews had gaps. I cleaned it up in app.py's load_movie_data function.
- Speed: Calculating similarities for thousands of movies bogged things down. Sparse matrices for TF-IDF vectors saved the day (pro tip for handling big datasets).
- Balancing Act: How much should quiz answers weigh versus liked movies? I tweaked it with a diversity_factor to mix things up without being repetitive.
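For the data wrangling part, a cleanup step could look something like the sketch below. It is not the real load_movie_data: the clean_movie helper and the input fields (id, title, genre_ids, overview) are assumptions, and only three TMDB genre IDs are listed for illustration. Joining multi-word genre names with underscores is also my own choice, so that a later whitespace split() keeps "Science Fiction" as one token.

```python
# A few genre IDs for illustration; a real app would load the full mapping from TMDB
GENRE_NAMES = {28: "Action", 18: "Drama", 878: "Science Fiction"}

def clean_movie(raw):
    """Map genre IDs to names and replace a missing overview with an empty string."""
    names = [GENRE_NAMES[g] for g in raw.get("genre_ids", []) if g in GENRE_NAMES]
    return {
        "id": raw["id"],
        "title": raw.get("title", ""),
        # Underscores keep multi-word genres together as a single token
        "genres": " ".join(name.replace(" ", "_") for name in names),
        "overview": raw.get("overview") or "",
    }

# Hypothetical raw record with an unknown genre ID and a missing overview
raw = {"id": 1, "title": "Inception", "genre_ids": [28, 878, 99999], "overview": None}
cleaned = clean_movie(raw)
print(cleaned)
```

Unknown genre IDs are silently dropped here; logging them instead would make gaps in the mapping easier to spot.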
Serving It Up with Flask
The Flask app in app.py ties it all together, offering endpoints like:
- /recommendations/similar/<movie_id>: Content-based picks
- /recommendations/personalized: Full hybrid recommendations
- /recommendations/quiz-based: Quiz-driven suggestions
It pulls movie data from TMDB, caches it, and initializes the recommenders on startup. Deployed on Vercel with Neon's serverless PostgreSQL, it scales like a dream.
What I Learned
After three iterations, YMovies taught me a ton:
- Content-Based is Your Friend: Great for starting with item metadata.
- Hybrid Wins: Combining approaches beats any single method.
- Data Quality Matters: Clean, rich data fuels better recommendations.
- User Experience is Key: It's not just about accuracy—surprise and delight matter too.
Production Results & Impact
YMovies isn't perfect, but when I see "Because you liked The Matrix" pop up with Blade Runner 2049, I know it's working. Want to build your own? Start with what data you have, add a personal touch, and iterate. Who knows—you might just recommend the next big hit!
So, there you have it! From a Netflix "aha!" moment to a full-blown recommendation system, I've built something awesome. content_based_recommender.py finds movie twins, hybrid_recommender.py adds the user's personality, and app.py delivers it with a bow. I've tackled sparse data, performance hiccups, and user experience like a pro.
Next time I'm tweaking this, maybe I'll play with weighting the hybrid inputs more dynamically—could be a fun experiment! Building YMovies taught me that the best recommendations aren't just about accuracy—they're about surprise and delight too.