BookHub

Overview
BookHub is a personalized book recommendation platform designed to help readers discover books aligned with their interests. Built using Goodreads datasets containing over 228 million user-book interactions, the system combines popularity-based recommendations with collaborative filtering to deliver relevant suggestions while addressing the cold-start problem.
Key Highlights
- Processed and filtered Goodreads datasets containing 228M+ interactions.
- Designed a hybrid recommendation flow combining onboarding recommendations with collaborative filtering.
- Built an end-to-end recommendation pipeline using Next.js, FastAPI, Supabase, and Scikit-learn.
The Problem
Traditional collaborative filtering models rely on historical user interactions. New users have no rating history, making it difficult to generate meaningful recommendations. At the same time, the Goodreads dataset was too large to process efficiently without significant preprocessing and filtering.
Challenges & Solutions
| Challenge | Solution |
|---|---|
| Cold Start Problem | Popularity-based onboarding recommendations |
| Sparse Rating Data | Dataset cleaning and filtering |
| Large-Scale Processing | Chunk-based preprocessing pipeline |
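The chunk-based preprocessing and filtering listed in the table above could look roughly like the sketch below. It is a minimal illustration, not BookHub's actual pipeline. Several details are assumptions: the column names (`user_id`, `book_id`, `rating`), the convention that a rating of 0 means "shelved but unrated", the helper name `load_ratings_in_chunks`, and the `min_user_ratings` cut-off.

```python
import io

import pandas as pd


def load_ratings_in_chunks(source, chunksize=1_000_000, min_user_ratings=5):
    """Stream an interactions CSV chunk by chunk, keep explicit ratings, drop sparse users."""
    kept = []
    # With chunksize set, read_csv yields DataFrames of at most `chunksize` rows,
    # so the raw file is never parsed into memory in one piece.
    for chunk in pd.read_csv(source, usecols=["user_id", "book_id", "rating"], chunksize=chunksize):
        # Assumption: rating == 0 marks a shelved-but-unrated book; keep only 1-5 ratings.
        kept.append(chunk[chunk["rating"] > 0])
    ratings = pd.concat(kept, ignore_index=True)

    # Drop users with too few ratings to give a useful similarity signal.
    counts = ratings["user_id"].value_counts()
    active_users = counts[counts >= min_user_ratings].index
    return ratings[ratings["user_id"].isin(active_users)].reset_index(drop=True)


# Tiny in-memory sample standing in for the real interactions file.
sample = io.StringIO(
    "user_id,book_id,is_read,rating\n"
    "1,10,1,5\n"
    "1,11,1,0\n"
    "1,12,1,4\n"
    "2,10,1,3\n"
)
demo = load_ratings_in_chunks(sample, chunksize=2, min_user_ratings=2)
print(demo)
```

Only the parsing is streamed here. The filtered rows are still concatenated in memory, so at full scale the filtered result must fit in RAM or be written out chunk by chunk.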
Recommendation Flow
Users begin by selecting genres and rating a curated list of popular books. These initial preferences are stored in Supabase and used to bootstrap the recommendation engine.
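Here is a minimal sketch of how a popularity-based onboarding list could be chosen for the genres a user picks. Several parts are illustrative assumptions: the `books`/`ratings` schema with a single `genre` column per book, the `min_ratings` floor, the helper name, and ranking by rating count with mean rating as a tie-breaker. Real genre data may assign several genres to each book, and this sketch ignores that.

```python
import pandas as pd


def popular_books_for_genres(books, ratings, genres, n=10, min_ratings=50):
    """Rank books in the chosen genres by rating count, breaking ties by mean rating."""
    stats = (
        ratings.groupby("book_id")["rating"]
        .agg(num_ratings="count", avg_rating="mean")
        .reset_index()
    )
    merged = books.merge(stats, on="book_id", how="inner")
    eligible = merged[merged["genre"].isin(genres) & (merged["num_ratings"] >= min_ratings)]
    ranked = eligible.sort_values(["num_ratings", "avg_rating"], ascending=[False, False])
    return ranked.head(n)[["book_id", "title", "num_ratings", "avg_rating"]]


# Hypothetical schema: one primary genre per book.
books = pd.DataFrame({
    "book_id": [1, 2, 3, 4],
    "title": ["A", "B", "C", "D"],
    "genre": ["fantasy", "fantasy", "mystery", "romance"],
})
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2, 1, 2, 3],
    "book_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "rating": [5, 4, 3, 5, 5, 2, 3, 4],
})
onboarding = popular_books_for_genres(books, ratings, ["fantasy", "mystery"], n=2, min_ratings=1)
print(onboarding)
```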
User Journey
```text
Genres → Popular Books → User Ratings
↓
Collaborative Filtering
↓
Personalized Recommendations
```
As more ratings are collected, recommendations become increasingly personalized through similarity-based matching.
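The hand-off from onboarding to collaborative filtering can be sketched as a simple router. `MIN_RATINGS_FOR_CF`, `recommend`, and the stand-in `fake_cf` are hypothetical names. The threshold of 5 ratings is an assumed value, not the one BookHub uses.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical cut-over point; the actual threshold is not stated.
MIN_RATINGS_FOR_CF = 5


def recommend(
    user_ratings: Dict[int, float],
    popular: Sequence[int],
    cf_recommend: Callable[[Dict[int, float], int], List[int]],
    n: int = 10,
) -> List[int]:
    """Serve popularity picks to new users and CF picks (topped up with popular books) to others."""
    rated = set(user_ratings)
    if len(user_ratings) < MIN_RATINGS_FOR_CF:
        # Cold start: not enough history for similarity, so fall back to popular books.
        return [b for b in popular if b not in rated][:n]

    personalized = cf_recommend(user_ratings, n)
    # If CF returns fewer than n items, fill the remaining slots with unseen popular books.
    seen = rated | set(personalized)
    filler = [b for b in popular if b not in seen]
    return (personalized + filler)[:n]


def fake_cf(user_ratings: Dict[int, float], n: int) -> List[int]:
    """Stand-in for a real collaborative-filtering model."""
    return [99, 98][:n]


popular_ids = [10, 11, 12, 13, 14]
new_user_recs = recommend({10: 5}, popular_ids, fake_cf, n=3)
returning_user_recs = recommend({1: 5, 2: 4, 3: 3, 4: 5, 5: 2}, popular_ids, fake_cf, n=4)
print(new_user_recs, returning_user_recs)
```

Topping up CF results with popular books keeps the list full when a user has few close neighbors. This mirrors the gradual shift from onboarding picks to personalized ones.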
My Contributions
Frontend Development
- Developed the complete onboarding and recommendation experience using Next.js.
- Built book search, rating, filtering, and recommendation interfaces.
- Designed user flows for collecting preference data with minimal friction.
Data & Recommendation Pipeline
- Processed and cleaned large Goodreads datasets for model training.
- Assisted in building sparse user-book matrices for recommendation generation.
- Worked on collaborative filtering pipelines using cosine similarity.
- Integrated FastAPI services with Supabase for recommendation delivery.
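Building a sparse user-book matrix mostly comes down to mapping raw Goodreads IDs onto contiguous row and column indices. The sketch below does this with pandas categoricals and SciPy's `csr_matrix`. The function name and column names are assumptions, not taken from the project.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix


def build_user_book_matrix(ratings):
    """Map raw user/book IDs to contiguous indices and build a CSR ratings matrix."""
    # Assumes one row per (user_id, book_id); this constructor sums duplicate entries.
    users = ratings["user_id"].astype("category")
    books = ratings["book_id"].astype("category")
    matrix = csr_matrix(
        (
            ratings["rating"].to_numpy(dtype=np.float32),
            (users.cat.codes.to_numpy(), books.cat.codes.to_numpy()),
        ),
        shape=(len(users.cat.categories), len(books.cat.categories)),
    )
    # The category arrays translate matrix row/column indices back to original IDs.
    return matrix, users.cat.categories, books.cat.categories


ratings = pd.DataFrame({
    "user_id": [101, 101, 202, 303],
    "book_id": [9, 7, 9, 8],
    "rating": [5, 3, 4, 2],
})
matrix, user_ids, book_ids = build_user_book_matrix(ratings)
print(matrix.toarray())
```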
Infrastructure & Integration
- Connected Google Colab training workflows with application APIs.
- Used ngrok tunnels to expose development model-serving endpoints to the application.
- Established data synchronization between the recommendation engine and frontend application.
Technical Details
Dataset Metrics
| Metric | Count |
|---|---|
| Books | 2.3M+ |
| Users | 876K+ |
| Interactions | 228M+ |
| Ratings | 104M+ |
Recommendation Approach
The recommendation engine uses user-based collaborative filtering.
User ratings are assembled into a sparse user-book matrix, and cosine similarity between user rows identifies readers with similar interests. Recommendations are drawn from the ratings of the most similar users, while popularity-based suggestions provide coverage for new accounts with little or no rating history.
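Here is a minimal sketch of user-based collaborative filtering with cosine similarity on a sparse matrix, using scikit-learn's `cosine_similarity`. The neighborhood size `k`, the similarity-weighted sum used for scoring, and the helper name `recommend_for_user` are illustrative choices. The source does not say how BookHub aggregates neighbor ratings.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity


def recommend_for_user(matrix, user_idx, k=20, n=10):
    """Score unseen books by the similarity-weighted ratings of the k most similar users."""
    # Cosine similarity between the target row and every user row (accepts sparse input).
    sims = cosine_similarity(matrix[user_idx], matrix).ravel()
    sims[user_idx] = 0.0  # a user is trivially most similar to themself; ignore that

    neighbors = np.argsort(sims)[::-1][:k]
    neighbors = neighbors[sims[neighbors] > 0]  # keep only users with some overlap
    if neighbors.size == 0:
        return []  # no overlap at all: the caller should fall back to popularity picks

    # (books x k) @ (k,) -> one score per book: sum of neighbor rating * neighbor similarity.
    scores = matrix[neighbors].T @ sims[neighbors]
    scores[matrix[user_idx].indices] = 0.0  # never re-recommend already rated books

    ranked = np.argsort(scores)[::-1][:n]
    return [int(b) for b in ranked if scores[b] > 0]


# Rows are users, columns are books, values are 1-5 ratings (0 = unrated).
ratings = csr_matrix(np.array([
    [5, 4, 0, 0, 0],
    [5, 5, 4, 0, 0],
    [4, 5, 0, 3, 0],
    [0, 0, 0, 0, 5],
], dtype=np.float32))
recs = recommend_for_user(ratings, user_idx=0, k=2, n=3)
cold = recommend_for_user(ratings, user_idx=3)
print(recs, cold)
```

An empty result means the user shares no rated books with anyone, which is exactly the case the popularity-based fallback covers.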
Core Techniques
- Collaborative Filtering for personalized recommendations.
- Cosine Similarity for identifying similar readers.
- Popularity-Based Recommendations for solving the cold-start problem.
- Sparse Matrix Representation for efficient large-scale computation.
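A back-of-the-envelope calculation on the Dataset Metrics figures shows why a sparse representation matters. It rests on a few assumptions: the matrix holds only the 104M explicit ratings, values are float32, and indices are int32 in CSR layout. Because the table gives lower bounds ("+"), the results are lower bounds too.

```python
def dense_bytes(n_users, n_books, itemsize=4):
    """Memory for a dense n_users x n_books matrix with `itemsize`-byte cells."""
    return n_users * n_books * itemsize


def csr_bytes(n_users, nnz, value_size=4, index_size=4):
    """CSR stores one value and one column index per nonzero, plus n_users + 1 row pointers."""
    return nnz * (value_size + index_size) + (n_users + 1) * index_size


# Lower bounds taken from the Dataset Metrics table.
USERS, BOOKS, RATINGS = 876_000, 2_300_000, 104_000_000

dense = dense_bytes(USERS, BOOKS)
sparse = csr_bytes(USERS, RATINGS)
density = RATINGS / (USERS * BOOKS)
print(f"dense float32: {dense / 1e12:.1f} TB")
print(f"CSR float32/int32: {sparse / 1e9:.2f} GB")
print(f"density: {density:.1e}")
```

Under these assumptions a dense matrix needs about 8 TB, while CSR needs well under 1 GB. The gap exists because only about 0.005% of user-book cells hold a rating.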
Tech Stack
| Category | Technologies |
|---|---|
| Frontend | Next.js, Tailwind CSS |
| Backend | FastAPI |
| Database | Supabase (PostgreSQL) |
| Machine Learning | Pandas, NumPy, Scikit-learn |
| Techniques | Collaborative Filtering, Cosine Similarity |
| Infrastructure | Google Colab, ngrok |