Case Study: URL Shortener (TinyURL)
"Design TinyURL" is one of the most common system design interview questions.
Suppose you want to share a 100-character URL in a tweet or an SMS, where it eats up much of the character budget. The fix is a short URL such as bit.ly/abc123 that redirects to the original. How would you build such a system?
Step 1: Requirements
Functional
- Generate a short URL from a long URL.
- Accessing the short URL redirects to the original.
- Optional: custom alias (vanity URL).
- Optional: expiration.
- Analytics: click count, source.
Non-Functional
- High availability (99.9%+): a broken link is useless.
- Low latency: redirects in under 100 ms.
- Scalability: billions of URLs.
- Read-heavy: roughly 100 reads per write (100:1).
Step 2: Capacity Estimation
URLs created: 100M/month
URLs/sec (write): 100M / 30days / 86400 ≈ 40 writes/sec
Reads/sec: 40 × 100 = 4,000 reads/sec
Storage:
Per URL: ~500 bytes (URL + metadata)
100M × 500 bytes/month = 50 GB/month
5 years: 50 GB × 60 months = 3 TB
Memory cache (hot 20%):
3 TB × 0.2 = 600 GB
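The arithmetic above can be checked with a short script. This is only a back-of-envelope sketch; every constant is one of the assumptions stated in this step (100M URLs/month, 100:1 reads, 500 bytes per URL, 5 years retention, 20% hot set), and the variable names are made up for illustration.

```python
# Back-of-envelope capacity estimate using the assumptions from this step.
URLS_PER_MONTH = 100_000_000
READ_WRITE_RATIO = 100
BYTES_PER_URL = 500
RETENTION_MONTHS = 5 * 12
HOT_FRACTION = 0.2
SECONDS_PER_MONTH = 30 * 86_400

writes_per_sec = URLS_PER_MONTH / SECONDS_PER_MONTH
reads_per_sec = writes_per_sec * READ_WRITE_RATIO
storage_per_month_gb = URLS_PER_MONTH * BYTES_PER_URL / 1e9
total_storage_tb = storage_per_month_gb * RETENTION_MONTHS / 1e3
hot_cache_gb = total_storage_tb * 1e3 * HOT_FRACTION

print(f"writes/sec    ~ {writes_per_sec:.1f}")
print(f"reads/sec     ~ {reads_per_sec:.0f}")
print(f"storage/month = {storage_per_month_gb:.0f} GB")
print(f"5-year total  = {total_storage_tb:.1f} TB")
print(f"hot cache     = {hot_cache_gb:.0f} GB")
```

The unrounded write rate is a little under 39/sec; the estimate rounds it up to 40, which is why the read rate is quoted as 4,000/sec.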
Step 3: API Design
POST /shorten
body: { longUrl, customAlias?, expirationDate? }
returns: { shortUrl }
GET /{shortCode}
returns: 302 redirect to longUrl
GET /{shortCode}/analytics
returns: { clicks, lastAccessed, sources }
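To make the three endpoints concrete, here is a framework-free sketch that models each one as a method returning `(status, body_or_headers)`. The class name `UrlService`, the domain `https://sho.rt`, the placeholder code generator, and the 201/404/409 status codes are all assumptions for illustration; the analytics response is reduced to a click count.

```python
from dataclasses import dataclass
from typing import Optional

BASE_URL = "https://sho.rt"  # hypothetical short domain


@dataclass
class ShortenRequest:
    longUrl: str
    customAlias: Optional[str] = None
    expirationDate: Optional[str] = None  # ISO-8601 string, unused in this sketch


class UrlService:
    """In-memory stand-in for the three endpoints."""

    def __init__(self):
        self._urls = {}    # short_code -> long_url
        self._clicks = {}  # short_code -> click count
        self._next_id = 1

    def shorten(self, req):
        """POST /shorten"""
        if req.customAlias is not None:
            if req.customAlias in self._urls:
                return 409, {"error": "alias already taken"}
            code = req.customAlias
        else:
            # Placeholder generator; real strategies are discussed later.
            code = f"u{self._next_id}"
            while code in self._urls:
                self._next_id += 1
                code = f"u{self._next_id}"
            self._next_id += 1
        self._urls[code] = req.longUrl
        self._clicks[code] = 0
        return 201, {"shortUrl": f"{BASE_URL}/{code}"}

    def redirect(self, short_code):
        """GET /{shortCode}"""
        long_url = self._urls.get(short_code)
        if long_url is None:
            return 404, {}
        self._clicks[short_code] += 1
        return 302, {"Location": long_url}

    def analytics(self, short_code):
        """GET /{shortCode}/analytics"""
        if short_code not in self._urls:
            return 404, {}
        return 200, {"clicks": self._clicks[short_code]}
```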
Step 4: Data Model
URLs table:
| short_code | long_url | user_id | created_at | expires_at |
| abc123 | https://... | 42 | 2026-01-01 | 2027-01-01 |
PRIMARY KEY: short_code
INDEX: user_id (analytics)
Either an RDBMS (PostgreSQL) or a NoSQL store (DynamoDB) works; the choice depends on scale.
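The table above can be sketched as a schema. The example below uses Python's built-in `sqlite3` purely as a runnable stand-in for PostgreSQL; the column types, the `NOT NULL` choices, and the index name `idx_urls_user_id` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE urls (
        short_code TEXT PRIMARY KEY,   -- lookup key for redirects
        long_url   TEXT NOT NULL,
        user_id    INTEGER,
        created_at TEXT NOT NULL,
        expires_at TEXT                -- NULL means the link never expires
    );
    CREATE INDEX idx_urls_user_id ON urls (user_id);
    """
)

conn.execute(
    "INSERT INTO urls VALUES (?, ?, ?, ?, ?)",
    ("abc123", "https://example.com/some/long/path", 42, "2026-01-01", "2027-01-01"),
)

# The redirect path is a single primary-key lookup.
row = conn.execute(
    "SELECT long_url FROM urls WHERE short_code = ?", ("abc123",)
).fetchone()
print(row[0])
```

Because `short_code` is the primary key, a second insert with the same code fails with an integrity error, which is what makes collision and alias checks safe under concurrency.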
Short Code Generation Strategy
Approach 1: Random
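Generate a random 6-character string, check the store for a collision, and retry if the code is already taken.
- Pros: Simple.
- Cons: Every write needs a collision check, which costs an extra lookup.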
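A minimal sketch of the random approach, assuming a plain dict stands in for the URL table. The function names and the `max_attempts` limit are illustrative, not part of any library.

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def random_code(length=6):
    """Return a random base62 string of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def create_short_code(store, long_url, max_attempts=5):
    """Pick a random code, retrying when it is already taken.

    `store` is a dict standing in for the URL table.
    """
    for _ in range(max_attempts):
        code = random_code()
        if code not in store:  # the collision check
            store[code] = long_url
            return code
    raise RuntimeError("could not find a free short code")
```

With a real database, the check-then-insert pair would need to be atomic, for example by relying on the primary-key constraint and retrying on a conflict.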
Approach 2: Hash (MD5/SHA)
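Hash the long URL and keep the first 6-7 characters.
- Pros: The same URL always maps to the same short code.
- Cons: Truncating the hash makes collisions possible, so a collision check is still needed.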
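One way to sketch the hash approach is shown below. It assumes SHA-256 as the hash, base62 as the text encoding of the digest, and a simple "append a salt and rehash" rule for collisions; all three are illustrative choices, and the dict again stands in for the URL table.

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_letters  # 62 characters


def to_base62(n):
    """Encode a non-negative integer using ALPHABET."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def hash_code(long_url, length=7):
    """First `length` base62 characters of the URL's SHA-256 digest."""
    digest = hashlib.sha256(long_url.encode("utf-8")).digest()
    return to_base62(int.from_bytes(digest, "big"))[:length]


def create_short_code(store, long_url, max_attempts=5):
    """Same URL -> same code; a different URL on the same code triggers a rehash."""
    for attempt in range(max_attempts):
        salted = long_url if attempt == 0 else f"{long_url}#{attempt}"
        code = hash_code(salted)
        if store.setdefault(code, long_url) == long_url:
            return code
    raise RuntimeError("could not resolve hash collision")
```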
Approach 3: Base62 Counter
Take an auto-increment ID and encode it in base62.
62 chars: 0-9, a-z, A-Z (the digit order used in the examples below)
6 chars: 62^6 ≈ 56 billion combinations
7 chars: 62^7 ≈ 3.5 trillion
ID 1 → "1"
ID 125 → "21"
ID 999999 → "4c91"
- Pros: No collisions; deterministic.
- Cons: Sequential, so codes are predictable and easy to enumerate.
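The encoding can be sketched in a few lines. The alphabet order here (0-9, then a-z, then A-Z) is an assumption, chosen because it reproduces the three example IDs above; `base62_encode` and `base62_decode` are illustrative names.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def base62_encode(n):
    """Encode a non-negative integer ID as a base62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, BASE)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def base62_decode(code):
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


for id_ in (1, 125, 999999):
    print(id_, "->", base62_encode(id_))
```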
Approach 4: Snowflake ID
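Distributed ID generation (Twitter Snowflake, UUID v7). A 64-bit ID base62-encodes to as many as 11 characters, so the short code is longer than a counter-based one.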
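A Snowflake-style generator can be sketched as follows. It assumes the commonly cited layout (41-bit millisecond timestamp, 10-bit worker ID, 12-bit per-millisecond sequence) and Twitter's published epoch; the class name and the simplified handling of a backwards clock are illustrative, not production behavior.

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's Snowflake epoch (Nov 2010)
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                # Clock moved backwards: keep using the last timestamp (simplification).
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = int(time.time() * 1000)
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Each worker generates IDs without talking to any other node; uniqueness comes from the worker ID bits, so those must be assigned without overlap.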
Step 5: High-Level Architecture
[Client] → [DNS] → [LB] → [Web Servers]
                               ↓
            [ID Gen Service]   [Cache (Redis)]
                    ↓                 ↑
                      [URL Database]
                             ↓
                   [Analytics Pipeline]
Step 6: Component Deep-Dive
ID Generation
Use a distributed counter or Snowflake. A single ID source is a bottleneck, so shard the ID space through ZooKeeper (for example, each server leases its own disjoint range of IDs).
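The range-leasing idea can be sketched like this. `RangeCoordinator` is a hypothetical in-memory stand-in for the coordination service (ZooKeeper in the text); it is not ZooKeeper's API. Each `IdAllocator` represents one web server that only contacts the coordinator when its current block runs out.

```python
import threading


class RangeCoordinator:
    """In-memory stand-in for a coordination service that hands out ID blocks."""

    def __init__(self, block_size=1000):
        self.block_size = block_size
        self._lock = threading.Lock()
        self._next_start = 1

    def lease(self):
        """Return a half-open range [start, end) that no one else will receive."""
        with self._lock:
            start = self._next_start
            self._next_start += self.block_size
            return start, start + self.block_size


class IdAllocator:
    """Per-server allocator that serves IDs locally from its leased block."""

    def __init__(self, coordinator):
        self._coordinator = coordinator
        self._lock = threading.Lock()
        self._next, self._end = coordinator.lease()

    def next_id(self):
        with self._lock:
            if self._next >= self._end:
                # Block exhausted: lease a fresh one.
                self._next, self._end = self._coordinator.lease()
            value = self._next
            self._next += 1
            return value
```

If a server crashes, the unused remainder of its block is lost, which leaves gaps in the ID space but never produces duplicates.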
Cache Layer
- Keep hot URLs in Redis.
- LRU eviction.
- Target a cache hit ratio of 90% or higher.
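To show what LRU eviction and the hit-ratio target mean in practice, here is a small in-process cache. It only illustrates the policy; in the design above Redis plays this role, and the class and method names are made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that also tracks its hit ratio."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```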
Database
- SQL: PostgreSQL, for moderate scale.
- NoSQL: Cassandra/DynamoDB, for massive scale.
- Sharding by short_code prefix.
Redirect Flow
- The client requests GET /abc123.
- Check Redis. If the code is found, return a 302 redirect.
- If it is not found, look it up in the DB, populate the cache, then return 302.
- Asynchronously increment the click count.
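The flow above is a cache-aside read. A minimal sketch, assuming dicts stand in for Redis and the URL database and a `queue.Queue` stands in for the async click pipeline (all names are illustrative):

```python
import queue

cache = {}                                          # stand-in for Redis
database = {"abc123": "https://example.com/long"}   # stand-in for the URL table
click_events = queue.Queue()                        # drained later by a background worker


def redirect(short_code):
    """Return (status, headers) for GET /{short_code}."""
    long_url = cache.get(short_code)
    if long_url is None:
        long_url = database.get(short_code)
        if long_url is None:
            return 404, {}
        cache[short_code] = long_url  # populate the cache for the next request
    # Enqueue the click instead of writing it in the request path.
    click_events.put(short_code)
    return 302, {"Location": long_url}
```

The request never waits on the analytics write; it only pays for the cache lookup and, on a miss, one database read.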
301 vs 302 Redirect
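- 301 (permanent): The browser caches the redirect, so repeat visits are faster but skip the server and are missed by analytics.
- 302 (temporary): Every visit hits the server, so analytics stay accurate.
- Shorteners that depend on click analytics generally choose 302.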
Step 7: Scale Considerations
Read Path Optimization
- CDN edge cache.
- Geographic replicas.
- Aggressive Redis caching.
Write Path
- Async analytics writes (Kafka).
- Pre-allocate IDs in batches.
Database Scaling
- Read replicas.
- Sharding by short_code hash.
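Hash-based shard selection can be sketched as below. The shard count of 16 and the use of SHA-256 are assumptions; the important point is to use a stable hash, because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(short_code, num_shards=NUM_SHARDS):
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo sharding moves most keys when `num_shards` changes, so the shard count is usually fixed up front or chosen with headroom.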
Reliability
- Multi-region active-passive.
- Cache failover.
Advanced Topics
Custom Aliases
Accept a user-provided string, but only after a uniqueness check.
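A sketch of alias handling, assuming a dict stands in for the URL table. The allowed character set, the length limits, and the reserved-word list are illustrative policy choices, not requirements from the text.

```python
import re

ALIAS_PATTERN = re.compile(r"[A-Za-z0-9_-]{3,30}")
RESERVED = {"shorten", "analytics", "admin"}  # hypothetical reserved paths


def claim_alias(store, alias, long_url):
    """Register a custom alias, rejecting invalid, reserved, or taken names."""
    if not ALIAS_PATTERN.fullmatch(alias):
        raise ValueError("alias must be 3-30 characters of A-Z, a-z, 0-9, _ or -")
    if alias.lower() in RESERVED:
        raise ValueError("alias is reserved")
    if alias in store:  # the uniqueness check
        raise KeyError("alias already taken")
    store[alias] = long_url
    return alias
```

In a real database the check and the insert should be one atomic step, typically by inserting directly and treating a primary-key violation as "already taken".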
Expiration / TTL
A background job purges expired URLs.
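The purge job can be sketched as a function that a scheduler calls periodically. The row shape (`long_url`, `expires_at`) and the dict-based store are assumptions for illustration.

```python
from datetime import datetime, timezone


def purge_expired(store, now=None):
    """Delete rows whose expires_at has passed; return how many were removed.

    `store` maps short_code -> {"long_url": str, "expires_at": datetime or None}.
    """
    now = now or datetime.now(timezone.utc)
    expired = [
        code
        for code, row in store.items()
        if row["expires_at"] is not None and row["expires_at"] <= now
    ]
    for code in expired:
        del store[code]
    return len(expired)
```

Because the job only runs periodically, the redirect path would still need to compare `expires_at` with the current time so that a link stops working the moment it expires.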
Analytics
- Click event → Kafka → analytics DB.
- Aggregated reports.
- Breakdown by geo, device, and referrer.
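The aggregation step can be sketched as a function over a batch of click events. The event fields (`short_code`, `geo`, `device`, `referrer`) are an assumed schema, and a plain list stands in for the messages a consumer would read from Kafka.

```python
from collections import Counter


def aggregate(events):
    """Roll click events up into per-link totals and per-dimension breakdowns."""
    report = {}
    for event in events:
        entry = report.setdefault(
            event["short_code"],
            {"clicks": 0, "geo": Counter(), "device": Counter(), "referrer": Counter()},
        )
        entry["clicks"] += 1
        for dim in ("geo", "device", "referrer"):
            entry[dim][event.get(dim, "unknown")] += 1
    return report


events = [
    {"short_code": "abc123", "geo": "BD", "device": "mobile", "referrer": "twitter.com"},
    {"short_code": "abc123", "geo": "BD", "device": "desktop"},
    {"short_code": "xyz789", "geo": "US", "device": "mobile", "referrer": "google.com"},
]
report = aggregate(events)
print(report["abc123"]["clicks"], dict(report["abc123"]["geo"]))
```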
Abuse Prevention
- Rate limit per user.
- Detect malicious URLs (Google Safe Browsing).
- CAPTCHA.
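Per-user rate limiting is often done with a token bucket; the sketch below is one possible version, not the only option. The rate, the burst capacity, and the in-process `buckets` dict are assumptions; a multi-server deployment would keep this state in a shared store.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._updated = clock()

    def allow(self):
        now = self._clock()
        # Refill according to elapsed time, capped at the bucket capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._updated) * self.rate)
        self._updated = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


buckets = {}  # user_id -> TokenBucket


def allow_request(user_id, rate_per_sec=1.0, capacity=5):
    """Return True if this user may create another short URL right now."""
    if user_id not in buckets:
        buckets[user_id] = TokenBucket(rate_per_sec, capacity)
    return buckets[user_id].allow()
```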
Real World
- bit.ly: Industry leader.
- TinyURL: The original.
- t.co: Twitter's internal shortener.
- youtu.be: YouTube.
- goo.gl: Google (shut down in 2019).
Common Trade-offs
- Hash vs counter: collision risk vs predictability.
- SQL vs NoSQL: ACID vs scale.
- 301 vs 302: speed vs analytics.
- Synchronous vs async analytics: accuracy vs speed.
Common Mistakes
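- Using long IDs such as UUIDs: this defeats the purpose of a short URL.
- No cache: the DB gets overwhelmed.
- Synchronous analytics: the analytics write slows down every request.
- Single ID generator: a single point of failure (SPOF).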
📌 Chapter Summary
- A URL shortener is read-heavy (100:1) and stores billions of URLs.
- Generate IDs with a base62 counter or Snowflake.
- A Redis cache layer is essential.
- Use 302 redirects for accurate analytics.
- Sharding + CDN + async analytics provide scale.