Critical Trade-offs
- Storage costs vs image quality: Multiple resolutions increase storage but improve UX and reduce bandwidth.
- Chronological vs ranked feed: Ranked feed improves engagement but adds complexity and may create filter bubbles.
- CDN costs vs latency: Aggressive CDN caching costs more but provides better user experience globally.
Social Networks
Design Twitter
Companies: Meta, Google, Netflix, Amazon, Uber, most FAANG companies
Key Concepts: Timeline generation, fanout strategies (push vs pull), caching, database sharding, celebrity problem, news feed ranking
Approach Outline
Start by clarifying functional requirements: posting tweets (280 chars), following users, timeline viewing, likes/retweets. For non-functional requirements, assume 300M daily active users, high read-to-write ratio (100:1), and real-time timeline updates within seconds.
The core challenge is timeline generation. For posting, store tweets in a distributed database (Cassandra/DynamoDB) and decide between fanout-on-write (push) and fanout-on-read (pull). Most designs use a hybrid: fanout-on-write for regular users (pre-compute timelines and push each tweet to followers' feeds) and fanout-on-read for celebrities (who have too many followers to fan out to). This solves the "celebrity problem," where a user with 50M followers would trigger 50M writes per tweet.
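A minimal in-memory sketch of the hybrid fanout described above. The class name, the threshold value, and the dictionaries standing in for the tweet store and timeline cache are all illustrative assumptions, not a production design.

```python
from collections import defaultdict


class HybridFanout:
    """Toy model: push for regular authors, pull for celebrities."""

    def __init__(self, celebrity_threshold=10_000):  # assumed cutoff, tune per workload
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)         # author_id -> follower ids
        self.following = defaultdict(set)         # user_id -> author ids
        self.timelines = defaultdict(list)        # user_id -> pushed (timestamp, tweet_id)
        self.celebrity_tweets = defaultdict(list) # author_id -> (timestamp, tweet_id)

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) >= self.celebrity_threshold

    def post(self, author_id, tweet_id, timestamp):
        entry = (timestamp, tweet_id)
        if self.is_celebrity(author_id):
            # Fanout-on-read: store once, merge at read time.
            self.celebrity_tweets[author_id].append(entry)
            return "pull"
        # Fanout-on-write: one write per follower.
        for follower_id in self.followers[author_id]:
            self.timelines[follower_id].append(entry)
        return "push"

    def read_timeline(self, user_id, limit=20):
        entries = list(self.timelines[user_id])
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                entries.extend(self.celebrity_tweets[author_id])
        entries.sort(reverse=True)  # newest first
        return [tweet_id for _, tweet_id in entries[:limit]]
```

The sketch ignores authors who cross the threshold in either direction after posting; a real system would need a migration or backfill path for that case.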
For timeline retrieval, check Redis cache first for pre-computed timelines. Cache hit means instant response. Cache miss means aggregate recent tweets from followed users, merge by timestamp, and cache the result. Use a CDN for media content (images/videos). Shard the database by user ID to distribute load. Implement pagination for timelines to avoid loading millions of tweets at once.
The tweet posting flow: User posts → Store in tweets DB → Fanout service reads user's followers → Write to followers' timeline cache → Send push notifications. Timeline retrieval: User requests timeline → Check Redis cache → Return cached timeline or fetch and merge from followed users' recent tweets → Cache result.
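The retrieval flow can be sketched as a cache-aside read. Here a plain dict stands in for Redis, and `get_timeline`, `following`, and `recent_tweets` are hypothetical names; each author's tweet list is assumed to be sorted newest-first already.

```python
import heapq
from itertools import islice


def get_timeline(user_id, cache, following, recent_tweets, limit=20):
    """Return the cached timeline, or rebuild it by merging followed users' tweets."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached  # cache hit

    # Cache miss: k-way merge of per-author lists of (timestamp, tweet_id),
    # each assumed sorted newest-first.
    per_author = [recent_tweets.get(a, []) for a in following.get(user_id, ())]
    merged = heapq.merge(*per_author, key=lambda t: t[0], reverse=True)

    # Take only one page instead of materializing the full merged stream.
    timeline = [tweet_id for _, tweet_id in islice(merged, limit)]
    cache[user_id] = timeline
    return timeline
```

A real cache entry would also need a TTL or explicit invalidation when followed users post; the sketch omits both.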
Design Instagram
Companies: Meta, Pinterest, Snapchat, TikTok, ByteDance
Key Concepts: Image storage and processing, feed ranking algorithms, CDN strategy, news feed generation, engagement tracking
Approach Outline
Instagram is similar to Twitter but image-heavy with feed ranking. Start with requirements: posting photos (up to 10 per post), following users, feed viewing, likes/comments, stories (24-hour expiry), and explore page. For scale, assume 500M daily active users, heavily read-focused, with strict latency requirements for image loading.
The critical difference from Twitter is the image processing pipeline. When a user uploads a photo, store the original in blob storage (S3), then trigger an async job to generate multiple resolutions (thumbnail 150x150, medium 640x640, full 1080x1080) for different use cases (feed thumbnail vs full view). Use a CDN aggressively since images are static and cacheable. Consider serving images from the CDN edge closest to the user for minimal latency.
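As a rough sketch of what the async resize job might plan for each upload: the rendition sizes come from the text above, while the blob key layout, the function name, and the choice to treat each size as a bounding box (preserving aspect ratio, never upscaling) are assumptions made for illustration.

```python
# Longest-edge targets in pixels, taken from the sizes listed above.
RENDITIONS = {"thumbnail": 150, "medium": 640, "full": 1080}


def plan_renditions(photo_id, width, height):
    """Return {name: (blob_key, (w, h))} for each rendition of an uploaded photo."""
    longest = max(width, height)
    plans = {}
    for name, target in RENDITIONS.items():
        if longest <= target:
            size = (width, height)  # never upscale a small original
        else:
            # Integer scaling keeps the aspect ratio and avoids float rounding.
            size = (max(1, width * target // longest),
                    max(1, height * target // longest))
        # Hypothetical key layout; stable keys make the files easy to cache on a CDN.
        plans[name] = (f"photos/{photo_id}/{name}.jpg", size)
    return plans
```

The actual resizing would be done by an image library inside the worker; this only decides which objects to produce and where to store them.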
For the feed, implement a ranking algorithm instead of pure chronological order. Factors include: recency (newer posts rank higher), relationship strength (close friends rank higher), engagement likelihood (predicted from past behavior), and content type (whether the user prefers photos or videos). Use a machine learning model trained on user engagement data. Cache ranked feeds in Redis and pre-compute them for active users. Separate the feed generation service from image serving so each can scale independently.
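To make the ranking factors concrete, here is a hand-weighted scoring sketch. In practice the engagement prediction would come from the trained model; the weights, the half-life, and all field names below are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights; a learned model would replace this linear blend.
WEIGHTS = {"recency": 0.3, "relationship": 0.3, "engagement": 0.3, "affinity": 0.1}
HALF_LIFE_HOURS = 6.0  # assumed: the recency signal halves every 6 hours


@dataclass
class Candidate:
    post_id: str
    age_hours: float
    relationship: float          # 0..1, closeness between viewer and author
    predicted_engagement: float  # 0..1, stand-in for the ML model output
    content_affinity: float      # 0..1, viewer's preference for this content type


def score(c):
    recency = 0.5 ** (c.age_hours / HALF_LIFE_HOURS)  # exponential time decay
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["relationship"] * c.relationship
            + WEIGHTS["engagement"] * c.predicted_engagement
            + WEIGHTS["affinity"] * c.content_affinity)


def rank(candidates):
    """Return post ids ordered from highest to lowest score."""
    return [c.post_id for c in sorted(candidates, key=score, reverse=True)]
```

With these weights, a six-hour-old post from a close friend can outrank a brand-new post from a weak connection, which is the behavior that distinguishes a ranked feed from a chronological one.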
Stories require a different architecture: a 24-hour TTL, priority on the latest stories, and separate storage with expiration. The explore page uses a recommendation engine that analyzes user interests, trending content, and collaborative filtering. Track all engagement events (likes, comments, saves, shares) in a separate analytics pipeline (Kafka + data warehouse) that feeds both feed ranking and recommendations.
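The 24-hour expiry can be sketched with a small in-memory store. A real deployment would more likely rely on the storage layer's native TTL support; the `StoryStore` class and its injectable clock are assumptions made so the behavior is easy to test.

```python
import time

STORY_TTL_SECONDS = 24 * 60 * 60  # 24-hour expiry from the requirements


class StoryStore:
    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so tests can control time
        self._stories = {}    # user_id -> list of (posted_at, story_id)

    def add(self, user_id, story_id):
        self._stories.setdefault(user_id, []).append((self._clock(), story_id))

    def active(self, user_id):
        """Return unexpired story ids, newest first, pruning expired ones lazily."""
        cutoff = self._clock() - STORY_TTL_SECONDS
        live = [s for s in self._stories.get(user_id, []) if s[0] > cutoff]
        self._stories[user_id] = live
        return [story_id for _, story_id in sorted(live, reverse=True)]
```

Lazy pruning on read keeps the example short; a background sweep or storage-level TTL would reclaim space for users nobody views.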
Critical Trade-offs
Design Facebook News Feed
Companies: Meta, LinkedIn, Reddit, Twitter, Pinterest
Key Concepts: Feed ranking, personalization, ML models, content diversity, real-time updates, ad integration
Facebook News Feed is more complex than Twitter due to diverse content types (posts, photos, videos, shared articles, ads) and sophisticated ranking. Use a hybrid fanout with heavy emphasis on personalized ranking. The feed generation service aggregates candidate posts, then the ranking service scores each post with an ML model that considers engagement probability, content freshness, relationship strength, content diversity, and ad placement. Prefetch and pre-rank feeds for active users. Handle ads separately, with their own targeting criteria and frequency caps.
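One way to picture "handle ads separately" is a post-ranking step that slots ads into the organic feed under a frequency cap. The slot spacing, the per-advertiser cap, and the function name are illustrative assumptions; targeting is assumed to have already filtered the ad list.

```python
def insert_ads(ranked_posts, ads, every=4, cap_per_advertiser=1):
    """Interleave ads into an already ranked feed.

    ads is a list of (ad_id, advertiser_id) in priority order.
    """
    feed = []
    shown = {}           # advertiser_id -> ads placed so far in this feed
    ad_iter = iter(ads)  # each ad is considered at most once
    for position, post in enumerate(ranked_posts, start=1):
        feed.append(post)
        if position % every == 0:
            for ad_id, advertiser in ad_iter:
                if shown.get(advertiser, 0) < cap_per_advertiser:
                    shown[advertiser] = shown.get(advertiser, 0) + 1
                    feed.append(ad_id)
                    break  # one ad per slot
    return feed
```

Keeping this step outside the organic ranker means ad policy (spacing, caps) can change without retraining or redeploying the ranking model.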
Design TikTok
Companies: TikTok, Instagram (Reels), YouTube (Shorts), Snapchat
Key Concepts: Short video streaming, recommendation algorithm, content moderation, viral content detection, video transcoding
TikTok combines video streaming with highly personalized recommendations. Video upload triggers transcoding to multiple formats and resolutions. The "For You" feed is the core feature: an ML-based recommender using collaborative filtering, content-based filtering, and real-time signals (watch time, completion rate, likes, shares). Prefetch the next 3-5 videos for a seamless swipe experience. Track engagement metrics aggressively to feed the recommendation model. Implement content moderation (ML + human review) to filter inappropriate content. Detect viral videos early so infrastructure can scale proactively when they explode in popularity.
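The prefetch step is simple enough to sketch directly. This assumes the client holds an ordered list of recommended video ids and a set of ids already downloaded; both names, and the default window of 3, are hypothetical.

```python
def videos_to_prefetch(feed, current_index, cached, window=3):
    """Return ids of the next `window` videos that are not yet on the device."""
    upcoming = feed[current_index + 1 : current_index + 1 + window]
    return [video_id for video_id in upcoming if video_id not in cached]
```

A larger window smooths fast swiping at the cost of bandwidth spent on videos the user may never reach.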
Design Reddit
Companies: Reddit, Hacker News, Stack Overflow, Discord
Key Concepts: Voting system, ranking algorithm (hot/top/controversial), threaded comments, subreddit isolation, moderation
Reddit's unique challenge is the voting-based ranking system. Posts have upvotes/downvotes that determine visibility. Implement ranking algorithms: "Hot" (upvotes with time decay), "Top" (net upvotes in a time period), "Controversial" (high upvotes AND downvotes). Store posts in a database with vote counts and update rankings periodically rather than in real time, which also makes vote gaming harder. Threaded comments require parent-child relationships in the data model; use nested sets or path enumeration for efficient retrieval. Isolate subreddits as separate namespaces. Cache hot posts per subreddit. Implement moderation tools and spam detection.
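The ranking modes and path enumeration can be sketched as below. These formulas are simplified illustrations in the spirit of the descriptions above, not Reddit's production code; the decay constant and the controversy formula in particular are assumptions.

```python
import math


def hot_score(ups, downs, age_hours, decay_hours=12.5):
    """Log-scaled net votes minus a linear age penalty.

    With the assumed decay_hours, every 12.5 hours of age costs as much
    as a 10x drop in net votes.
    """
    net = ups - downs
    magnitude = math.log10(max(abs(net), 1))
    sign = (net > 0) - (net < 0)
    return sign * magnitude - age_hours / decay_hours


def top_score(ups, downs):
    """Net upvotes; the caller restricts posts to the chosen time period."""
    return ups - downs


def controversy_score(ups, downs):
    """High only when there are many votes AND they are evenly split."""
    if ups <= 0 or downs <= 0:
        return 0.0
    balance = min(ups, downs) / max(ups, downs)
    return (ups + downs) * balance


def thread_order(comments):
    """Path enumeration: each comment stores its ancestor ids as a tuple,
    e.g. (1,) for a top-level comment and (1, 4) for a reply to it.
    Sorting by path yields depth-first thread order; depth is len(path) - 1.
    """
    return sorted(comments, key=lambda c: c[0])
```

Because a whole subtree shares a path prefix, fetching a thread becomes a single prefix query instead of one query per nesting level.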
Design LinkedIn Feed
Companies: LinkedIn, Twitter, Meta
Key Concepts: Professional network graph, connection recommendations, job recommendations, feed ranking by professional relevance
LinkedIn combines a social feed with job search and professional networking. Feed ranking prioritizes professional content: industry news, career updates, thought leadership. Use a graph database for the connection network and "People You May Know" recommendations (mutual connections, shared employers/schools). The job recommendation engine matches the user profile (skills, experience, location) with job postings. Provide search for people, companies, and jobs using Elasticsearch. The feed includes posts from connections, company pages, and sponsored content. Track engagement differently than consumer social networks do: article reads and thoughtful comments carry more weight than likes.
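The mutual-connections signal behind "People You May Know" can be sketched as a two-hop traversal. A dict of sets stands in for the graph database, and the function name and tie-breaking rule are assumptions; shared employers and schools would be additional signals not shown here.

```python
from collections import Counter


def people_you_may_know(graph, user, limit=5):
    """Rank second-degree connections by number of mutual connections.

    graph maps each user id to the set of ids they are connected to.
    """
    direct = graph.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    # Most mutual connections first; break ties by id for stable output.
    return sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

At LinkedIn scale this traversal would typically run offline or be bounded per request, since well-connected users can have very large two-hop neighborhoods.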