🎬
Video Platform
System Design
VOD Ingestion Platform · High-Throughput Transcoding · Global Edge Caching

Design a Video Streaming Service

YouTube is a global online video-sharing and social media platform where users can upload, view, rate, share, comment on, and subscribe to digital video content. Launched in 2005, it also operates as a major search engine and entertainment hub. Netflix is a global subscription-based streaming service that lets users watch a vast library of TV shows, movies, documentaries, and specials on internet-connected devices. In this guide, we will design a highly scalable Video on Demand (VOD) service. We will explore how to balance the write-heavy creator pipeline (uploading, transcoding, and processing) with the read-heavy distribution network (CDN edge caching and adaptive segment playback) to serve hundreds of millions of daily viewers worldwide.

Beginner's Guide

How YouTube & Netflix Work (For Beginners)

💡
In Plain Terms

Both YouTube and Netflix deliver streaming video, but their background workloads are entirely different.

YouTube is a public stage. Anyone can upload video files at any time (User-Generated Content). The system must quickly ingest, transcode, and catalog these raw uploads so viewers worldwide can discover and watch them within minutes.

Netflix is a curated cinema. Only administrators publish content in scheduled, high-quality batches. The engineering focus is strictly on caching and delivering a zero-lag, stutter-free playback experience to viewers globally.

The Post-Office Analogy: Imagine a global post office. An author writes a heavy book (raw video upload). The central office slices it up into small 6-second chapters (segmentation), prints them in multiple languages and sizes (transcoding), and ships them to thousands of local stands worldwide (CDNs). When a reader wants to read, they fetch chapter 1, and while reading, their desk automatically fetches the next few chapters based on how fast they read (adaptive streaming).


Step 01

Functional Requirements

Must-Have Features

📤 1. Upload & Ingest

  • ✓ Chunked Video Upload: The system must accept large video files (up to 20GB) without buffering them in application-server memory.
  • ✓ Upload Pause & Resume: CRITICAL REQUIREMENT: Uploads must be fault-tolerant. If a creator's connection drops, the upload resumes from the last successful chunk instead of restarting.
  • ✓ Creator Dashboard Upload Status: Lets creators monitor upload progress and processing state.

📺 2. Playback & Streaming

  • ✓ Adaptive Bitrate Streaming: Delivers media as segmented HLS/DASH streams so playback adapts to each client's network conditions.
  • ✓ Signed Playback Indexing: Issues authenticated clients signed, geo-restricted playlists bound to short-lived, IP-scoped tokens.

🔍 3. Search & Discovery

  • ✓ Fuzzy Text Search: Queries a search index to look up videos by title, description, and other metadata.
  • ✓ Autocomplete Type-ahead Suggestions: Returns prefix-matched suggestions, informed by historical queries, in under 50 ms.

Nice-to-Have

  • ○ Content Creator Studio with in-browser clip trimming.
  • ○ Real-time chat overlay for premiere countdowns.
  • ○ Dynamic ad insertion personalized per viewer session.
  • ○ Multi-channel playlist groupings and video collections.

Out of Scope

  • ✕ Live-streaming ingestion (the focus is strictly VOD architecture).
  • ✕ Turnkey digital content licensing and copyright audit tools.
  • ✕ Revenue splits and recurring subscription billing.
  • ✕ Real-time view-counter processing pipelines built on stream processors such as Apache Flink or Spark Streaming. (Step 08 briefly covers how such engines can feed a Redis counter layer.)
Step 02

Non-Functional Requirements

To build a resilient streaming platform, we express requirements as measurable targets. The targets below are grouped by concern (latency, availability, scale, consistency, security, and durability), with brief notes on the design choices that support them:
⚡ Playback Latency
  • › Video playback startup: < 200 ms anywhere in the world
  • › Edge segment retrieval: < 30 ms
  • › Buffering recovery: the player switches to a lower quality within 1 s
  • › Cache efficiency: separate TTLs for dynamic playlists (short) and immutable segments (long)
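The TTL separation between playlists and segments can be expressed as a small Cache-Control policy applied when objects are served. A minimal sketch, assuming segments use the `.m4s`/`.mp4` extensions and playlists `.m3u8` (as in the FFmpeg output later in this guide); the function name and the max-age values are hypothetical.

```typescript
// Hypothetical TTL policy: segment bytes never change once written, so edges may keep
// them for a long time; playlists are regenerated (e.g. re-signed), so they expire quickly.
function cacheControlFor(objectKey: string): string {
  if (objectKey.endsWith(".m4s") || objectKey.endsWith(".mp4")) {
    return "public, max-age=31536000, immutable"; // one year; content-addressed by path
  }
  if (objectKey.endsWith(".m3u8")) {
    return "public, max-age=10"; // short TTL so playlist changes reach viewers quickly
  }
  return "no-store"; // anything unexpected is not cached
}

console.log(cacheControlFor("video-12/1080p/seg_001.m4s"));
console.log(cacheControlFor("video-12/1080p/index.m3u8"));
```

A per-viewer signed playlist would normally use `private` instead, since sharing it through an edge cache would hand one viewer's token to another.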
🛡 Availability Targets
  • › Uptime target: 99.99% for segment retrieval
  • › Failover routing: if key edge hubs fail, CDN traffic falls back to regional tiers
  • › Decoupled upload and playback paths keep playback unaffected by upload spikes
📈 System Scale Limits
  • › Supports 500M daily active users (DAUs)
  • › Accepts over 500 hours of uploaded media per minute
  • › Egress capacity of 75 Tbps at peak
  • › GPU transcoding workers scale as event-driven, asynchronous consumers of Kafka upload events
⚖️ System Consistency
  • › Strong consistency for profile changes and upload confirmations
  • › Eventual consistency (< 3 s) for playlist indexing and video list results
  • › Near-real-time (NRT) search indexing (~1 s refresh interval)
🔑 Asset Security
  • › Signed URLs with expiry timestamps and client-IP binding
  • › DRM encryption (Widevine, FairPlay) for licensed video content
  • › WAF rate limiting to filter automated bot traffic
💾 Upload Durability
  • › Raw video assets stored in S3, which is designed for 11 nines (99.999999999%) of object durability
  • › Per-part checksum (ETag) tracking for multipart uploads
  • › Write-ahead logging for critical transactional records
Step 03

Back-of-the-Envelope Estimation

Throughput Math

| Metric | Value | Basis |
|---|---|---|
| Global footprint | 500M DAUs | Active user base |
| Peak concurrent streams | 25M | 5% of DAUs at peak |
| New content ingestion | 500 hrs/min | 720,000 hrs uploaded per day |
| Average video length | 8 min | Typical UGC video |
| Videos created per day | 5.4M | 720K hrs × 60 min ÷ 8 min |
| Daily play sessions | 1.5B streams | 500M DAUs × 3 streams/day |
| Average query rate | ~17,400 QPS | 1.5B plays ÷ 86,400 s |
| Peak query burst | ~87,000 QPS | 5× average load |
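The throughput figures follow from a handful of inputs, so they can be re-derived (and re-run when an assumption changes) with a few lines of arithmetic. A minimal sketch; every constant is one of the table's assumptions, and the variable names are illustrative.

```typescript
// Back-of-the-envelope throughput model; every input is an assumption from the table.
const DAU = 500_000_000;
const PEAK_CONCURRENCY_PERCENT = 5; // share of DAUs streaming at the daily peak
const UPLOAD_HOURS_PER_MINUTE = 500;
const AVG_VIDEO_MINUTES = 8;
const STREAMS_PER_USER_PER_DAY = 3;
const PEAK_MULTIPLIER = 5;
const SECONDS_PER_DAY = 86_400;

const peakConcurrentStreams = (DAU * PEAK_CONCURRENCY_PERCENT) / 100; // 25,000,000
const uploadHoursPerDay = UPLOAD_HOURS_PER_MINUTE * 60 * 24;          // 720,000
const videosPerDay = (uploadHoursPerDay * 60) / AVG_VIDEO_MINUTES;    // 5,400,000
const playsPerDay = DAU * STREAMS_PER_USER_PER_DAY;                   // 1,500,000,000
const avgQps = playsPerDay / SECONDS_PER_DAY;                         // ≈ 17,361
const peakQps = avgQps * PEAK_MULTIPLIER;                             // ≈ 86,806

console.log({
  peakConcurrentStreams,
  uploadHoursPerDay,
  videosPerDay,
  playsPerDay,
  avgQps: Math.round(avgQps),
  peakQps: Math.round(peakQps),
});
```

Rounded, these match the table: roughly 17,400 average QPS and 87,000 peak QPS.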

Storage & Network Math

| Metric | Basis | Result |
|---|---|---|
| Storage ingestion rate | 5.4M videos × 2.25 GB of output per video (5 renditions combined) | ~12 PB/day |
| Year-1 catalog footprint | 12 PB/day × 365, before lifecycle tiering moves assets to cold storage | ~4.4 EB |
| Peak streaming egress | 25M concurrent streams × 3 Mbps average bitrate | 75 Tbps |
| Ingress (upload) bandwidth | 5.4M uploads × 500 MB average raw size ÷ 86,400 s | ~250 Gbps |
| Daily edge egress | 1.5B views × 18 MB average delivered per view | 27 PB/day |
| Transcode instance fleet | 360,000 GPU-hours/day ÷ 24 h ≈ 15,000 instances on average; 45,000 is ~3× that average, leaving headroom for peaks | 45,000 GPU instances |
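The storage and network estimates can be checked the same way. A minimal sketch using decimal units throughout; the constants restate the inputs above and the names are illustrative.

```typescript
// Storage and network estimates; inputs restate the assumptions above.
const VIDEOS_PER_DAY = 5_400_000;
const OUTPUT_GB_PER_VIDEO = 2.25; // all 5 renditions combined
const RAW_MB_PER_UPLOAD = 500;
const PEAK_STREAMS = 25_000_000;
const AVG_BITRATE_MBPS = 3;
const PLAYS_PER_DAY = 1_500_000_000;
const MB_DELIVERED_PER_PLAY = 18;
const GPU_HOURS_PER_DAY = 360_000;
const SECONDS_PER_DAY = 86_400;

// Decimal units: 1 PB = 1e6 GB = 1e9 MB, 1 Tbps = 1e6 Mbps, 1 Gb = 1000 Mb.
const storagePBPerDay = (VIDEOS_PER_DAY * OUTPUT_GB_PER_VIDEO) / 1e6;                 // 12.15
const yearOneEB = (storagePBPerDay * 365) / 1000;                                      // ≈ 4.43
const peakEgressTbps = (PEAK_STREAMS * AVG_BITRATE_MBPS) / 1e6;                        // 75
const ingressGbps = (VIDEOS_PER_DAY * RAW_MB_PER_UPLOAD * 8) / 1000 / SECONDS_PER_DAY; // 250
const edgeEgressPBPerDay = (PLAYS_PER_DAY * MB_DELIVERED_PER_PLAY) / 1e9;              // 27
const avgGpuInstances = GPU_HOURS_PER_DAY / 24;                                        // 15,000

console.log({ storagePBPerDay, yearOneEB, peakEgressTbps, ingressGbps, edgeEgressPBPerDay, avgGpuInstances });
```

The 45,000-instance fleet size is about three times `avgGpuInstances`, which is the peak headroom the estimate implicitly provisions.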
Step 04

High-Level Design

💡
In Plain Terms

The pipeline separates responsibilities cleanly. When a creator uploads a video, the Upload Service issues presigned URLs so the client can write the heavy raw bytes directly to S3. Once the upload completes, the Upload Service publishes a message to the Kafka event bus, which alerts the background transcoding workers.

On the playback side, viewers query the Streaming Service to get a signed, customized chapters map. From that point forward, the viewer talks strictly to close-by CDN Edge nodes to fetch individual chapters, keeping our core databases fast and quiet.

Our services are stateless. Dynamic API calls pass through the gateway layers, while heavy media bytes bypass the application servers entirely and are written directly to S3:

🎬 System data flow (left to right): Client → Gateways → App Core → Storage
  • Client tiers: Web Browser, Mobile App, Smart TV
  • Delivery & gateways: Edge CDN, WAF Proxy, Load Balancer, API Gateway
  • Core services: Auth Svc, Upload Svc, Stream Svc, Search Svc, Transcode Svc
  • Async & caching: Apache Kafka, Redis Cluster
  • Persistent stores: S3 Storage, DynamoDB (NoSQL), Elasticsearch, Cassandra (NoSQL)


🛡️ Networking Breakdown: WAF vs. Load Balancer vs. API Gateway

In a large-scale architecture, the Web Application Firewall (WAF), Load Balancer (LB), and API Gateway (APIGW) are not the same server. They are distinct, decoupled infrastructure layers that each request passes through in sequence:

1. WAF Proxy (Edge Security)
OSI Layer 7 Security

Located nearest to the network perimeter (often integrated at the CDN layer). Its sole job is traffic inspection: filtering SQL injection, cross-site scripting (XSS), bot scraper clusters, and Layer-7 DDoS floods before they can even touch internal services.

2. Load Balancer (Infrastructure Entry)
High-Availability Distribution

A specialized component (such as an AWS ALB or a fleet of NGINX instances) built to distribute very high request volumes. It typically terminates TLS, spreads the filtered requests across a pool of API Gateway servers, and health-checks them so traffic is routed away from failed instances.

3. API Gateway (App Orchestration)
Stateless Routing & Logic

The entry point to the internal microservice mesh. Unlike a load balancer, the API Gateway runs application-level logic: it routes paths to individual microservices (e.g., /upload vs. /search), enforces per-client rate limits, coordinates downstream calls, and calls the Auth Service to validate tokens.
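Two of the gateway's jobs, path routing and rate limiting, can be sketched in a few lines. A minimal sketch, assuming a hypothetical route table and a token-bucket limiter (one common rate-limiting technique; the text does not name a specific algorithm); burst size, refill rate, and service names are all illustrative.

```typescript
// Hypothetical route table: path prefix → internal service name.
const ROUTES: Record<string, string> = {
  "/api/v1/upload": "upload-svc",
  "/api/v1/search": "search-svc",
  "/api/v1/stream": "stream-svc",
};

// Longest matching prefix wins; only whole path segments match ("/api/v1/searchx" does not).
function resolveService(path: string): string | undefined {
  const matches = Object.keys(ROUTES)
    .filter((prefix) => path === prefix || path.startsWith(prefix + "/"))
    .sort((a, b) => b.length - a.length);
  return matches.length > 0 ? ROUTES[matches[0]] : undefined;
}

// Token bucket: holds up to `capacity` tokens, refills at `refillPerSec`; each request spends one.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  tryConsume(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false; // caller would respond 429 Too Many Requests
    this.tokens -= 1;
    return true;
  }
}

// One bucket per client in process memory; a multi-node gateway would need shared state (e.g. Redis).
const buckets = new Map<string, TokenBucket>();
const BURST = 10;       // hypothetical limits
const RATE_PER_SEC = 5;

function allowRequest(clientId: string, nowMs: number): boolean {
  let bucket = buckets.get(clientId);
  if (!bucket) {
    bucket = new TokenBucket(BURST, RATE_PER_SEC, nowMs);
    buckets.set(clientId, bucket);
  }
  return bucket.tryConsume(nowMs);
}

console.log(resolveService("/api/v1/search/query"), allowRequest("client-a", Date.now()));
```

A token bucket permits short bursts up to its capacity while capping the sustained rate, which suits bursty client behavior such as a page load firing several API calls at once.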

🧩 The Core Concept: Slicing the Loaf of Bread

In a streaming system, we never send a single raw video file (which could be several gigabytes) straight down the wire to a user's phone. That would mean long startup delays, buffer stalls, and wasted data!

Instead, we treat a video like a loaf of bread. During transcoding, we slice the video into small, 6-second segment files (like thin slices of bread) packaged in fragmented MP4 (fMP4) or MPEG-TS containers.

📁 1 Raw Movie → ⚙️ Transcoder → seg_01.m4s, seg_02.m4s, seg_03.m4s

When you hit play on YouTube or Netflix, your player first fetches an index file (an HLS playlist or DASH manifest). It then requests the individual 6-second slices one by one. If your Wi-Fi suddenly slows down, the player downshifts to a lower resolution for the *next* slice (and upshifts again when bandwidth recovers), so playback continues without interruption!
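The index file the player fetches first is, for HLS, a multivariant ("master") playlist listing every rendition; each entry points to a media playlist that lists that rendition's 6-second segments. A minimal sketch of generating one, assuming a hypothetical directory-per-rendition layout (`1080p/index.m3u8`, matching the FFmpeg output path used later in this guide) and illustrative bandwidth values.

```typescript
// One rendition of the bitrate ladder (values below are illustrative).
interface Rendition {
  name: string;         // also used as the rendition's directory, e.g. "1080p/index.m3u8"
  width: number;
  height: number;
  bandwidthBps: number; // peak bits per second (video + audio) for EXT-X-STREAM-INF BANDWIDTH
}

// Build an HLS multivariant ("master") playlist pointing at each rendition's media playlist.
function buildMasterPlaylist(renditions: Rendition[]): string {
  const lines = ["#EXTM3U", "#EXT-X-INDEPENDENT-SEGMENTS"];
  for (const r of renditions) {
    lines.push(`#EXT-X-STREAM-INF:BANDWIDTH=${r.bandwidthBps},RESOLUTION=${r.width}x${r.height}`);
    lines.push(`${r.name}/index.m3u8`);
  }
  return lines.join("\n") + "\n";
}

const ladder: Rendition[] = [
  { name: "360p", width: 640, height: 360, bandwidthBps: 1_000_000 },
  { name: "720p", width: 1280, height: 720, bandwidthBps: 3_200_000 },
  { name: "1080p", width: 1920, height: 1080, bandwidthBps: 5_700_000 },
];
console.log(buildMasterPlaylist(ladder));
```

The player reads BANDWIDTH and RESOLUTION from each entry to decide which media playlist to pull segments from, which is what makes per-segment quality switching possible.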

Direct-to-S3 Upload Path
Client sends video metadata → Upload Service validates the JWT → generates presigned multipart S3 URLs → client uploads chunks directly to S3.
Kafka Event Bus Spine
Durable log distributing upload notifications, transcode milestones, analytics heartbeats, and database updates.
GPU Auto-Scaling Farm
NVIDIA GPU-accelerated worker clusters (Kafka consumers) scale on consumer lag to encode multi-rendition bitrate ladders.
Multi-Tier Cache
Three caching layers: a local in-process heap cache (Guava), a distributed Redis cluster, and global edge CDNs.
Segmented Streams (fMP4)
Slices files into independently decodable 6 s segments, so the player can switch quality at segment boundaries without interrupting playback.
Tokenized CDN Signatures
Prevents stream link sharing: CDN edge servers verify each token's HMAC signature, expiry, and client-IP binding locally.
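The tokenized-signature check can be sketched with Node's built-in crypto module. A minimal sketch under stated assumptions: the `path|expiry|clientIp` message format, the `exp`/`sig` query parameters, and the shared key are all hypothetical; real CDNs each define their own signed-URL format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Hypothetical token format: HMAC-SHA256 over "path|expiry|clientIp", hex-encoded.
function computeSignature(path: string, expiresAt: number, clientIp: string, key: string): string {
  return createHmac("sha256", key).update(`${path}|${expiresAt}|${clientIp}`).digest("hex");
}

// Signer side (e.g. the Stream Service when it builds a viewer's playlist).
function signUrl(path: string, clientIp: string, ttlSec: number, key: string, nowSec: number): string {
  const exp = nowSec + ttlSec;
  return `${path}?exp=${exp}&sig=${computeSignature(path, exp, clientIp, key)}`;
}

// Verifier side (CDN edge): recompute the HMAC locally, with no call back to the origin.
function verifyUrl(url: string, requestIp: string, key: string, nowSec: number): boolean {
  const parsed = new URL(url, "https://edge.example"); // base only needed to parse a relative URL
  const exp = Number(parsed.searchParams.get("exp"));
  const sig = parsed.searchParams.get("sig") ?? "";
  if (!Number.isFinite(exp) || exp < nowSec) return false; // missing, malformed, or expired
  const expected = Buffer.from(computeSignature(parsed.pathname, exp, requestIp, key), "hex");
  const given = Buffer.from(sig, "hex");
  // Constant-time comparison; the length check is needed because timingSafeEqual throws on mismatch.
  return given.length === expected.length && timingSafeEqual(given, expected);
}

const demoUrl = signUrl("/video-12/1080p/seg_001.m4s", "203.0.113.7", 300, "demo-key", 1_000);
console.log(demoUrl, verifyUrl(demoUrl, "203.0.113.7", "demo-key", 1_100));
```

Because the IP is part of the signed message, a URL copied to another network fails verification, and the expiry bounds how long a leaked link stays useful.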

High-Level CAP Strategy

User Metadata (RDBMS): CP (Consistency + Partition tolerance)

Account and profile state requires strict transactional guarantees, so relational tables provide strong consistency for account management.

Video Catalog (NoSQL): AP (Availability + Partition tolerance)

Catalog metadata is eventually consistent. This lets write throughput scale horizontally, and indexing delays of 1-3 seconds are imperceptible to viewers.

Activity Metrics (NoSQL): AP (Availability + Partition tolerance)

Optimized for high global write volume: comments and activity logs are continuously replicated across regions.

⚖️ Architectural Alternatives & Design Decisions

S3 Intelligent-Tiering vs. Static Storage Policies
🎯 Chosen: S3 Intelligent-Tiering (Hot/Cold Lifecycle Management)
✓ Pro: Reduces raw media costs by up to 50% by automatically moving stale, rarely viewed long-tail assets to cheaper access tiers, down to Glacier-class archive tiers.
✗ Con: Objects in the opt-in archive tiers must be restored before they can be read, which can take hours, so an unexpected request for cold content cannot play immediately.
In-House CDN Infrastructure vs. Third-Party CDNs (Akamai/Fastly)
🎯 Chosen: Third-Party CDNs (Edge PoPs) + Tier-2 Origin Shield
✓ Pro: Avoids the immense capital expenditure (CapEx) of building global physical data centers while keeping edge retrieval times under 30 ms.
✗ Con: Leaves us exposed to partners' egress bandwidth pricing, which dominates cost at extreme global scale.
Direct-to-S3 Upload vs. Gateway-Proxied Ingest
🎯 Chosen: Direct-to-Storage Ingestion (Presigned Multipart Chunk Uploads)
✓ Pro: Media bytes bypass the application servers entirely, so creator upload spikes do not consume their CPU, memory, or network bandwidth.
✗ Con: Shifts complexity to the client uploader, which must track part numbers, ETags, and presigned-URL expiry.
Pre-transcoding All Video Ladders vs. On-Demand Transcoding
🎯 Chosen: Pre-transcoding All Quality Ladders (Asynchronous Encoding Paths)
✓ Pro: Supports the < 200 ms global startup target, because every rendition already exists and can be cached before the first request.
✗ Con: Increases the active storage footprint by 4-5x, including long-tail creator assets that are rarely or never viewed.
Synchronous vs. Asynchronous Transcode Triggering
🎯 Chosen: Asynchronous Transcode Triggering (Event-Driven Kafka Consumers)
✓ Pro: Upload handling is decoupled from encoding, so bursts of large raw files queue up in Kafka instead of cascading into overloaded servers.
✗ Con: Videos are not playable immediately; creators watch a processing-status indicator on their dashboard while jobs wait in the queue.
Active-Active Global Databases vs. Partitioned Region Masters
🎯 Chosen: Partitioned Region Masters + Read Replicas
✓ Pro: Provides predictable transactional writes and a simple consistency model, avoiding the write conflicts and split-brain risks of active-active replication.
✗ Con: Users outside their home region pay extra latency on writes to the remote primary and may read slightly stale data from replicas due to replication lag.
Step 05

Data Model

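Entity overview (🔑 primary/partition key, 🔗 reference to another entity):
  • users (PostgreSQL): 🔑 id UUID, username VARCHAR, email VARCHAR, password_hash TEXT, created_at TIMESTAMPTZ
  • videos (DynamoDB / NoSQL): 🔑 id string, 🔗 user_id string, title string, status ENUM, view_count number, created_at number
  • video_files (DynamoDB): 🔑 id string, 🔗 video_id string, quality string, codec string, file_url string
  • comments (Cassandra / NoSQL): 🔑 video_id UUID, 🔑 created_at TIMESTAMPTZ, comment_id UUID, user_id UUID, content TEXT
  • likes (NoSQL/Dynamo): user_id string (hash key), video_id string (range key), created_at number
  • subscriptions (NoSQL/Dynamo): subscriber_id string (hash key), channel_id string (range key), created_at number
Relationships include users 1:N videos, videos 1:N video_files, and videos 1:N comments; likes (users ↔ videos) and subscriptions (subscribers ↔ channels) model M:N relationships.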

Relational Identity Storage (PostgreSQL)

Used strictly for structured authentication and transaction histories requiring ACID guarantees.

sql
-- Core User Schema (Relational PostgreSQL)
CREATE TABLE users (
  id               UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  username         VARCHAR(50)  UNIQUE NOT NULL,
  email            VARCHAR(255) UNIQUE NOT NULL,
  password_hash    TEXT         NOT NULL,
  created_at       TIMESTAMPTZ  DEFAULT NOW()
);

High-Volume Scale Catalog (DynamoDB NoSQL)

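Stores video catalog metadata. DynamoDB hash-partitions items by partition key, so the table scales horizontally to billions of items (5.4M new videos per day is roughly 2B per year). The table definition below uses the video id as the partition key and the creator's user_id as the sort key.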

json
{
  "TableName": "videos",
  "KeySchema": [
    { "AttributeName": "id", "KeyType": "HASH" },
    { "AttributeName": "user_id", "KeyType": "RANGE" }
  ],
  "AttributeDefinitions": [
    { "AttributeName": "id", "AttributeType": "S" },
    { "AttributeName": "user_id", "AttributeType": "S" }
  ],
  "BillingMode": "PAY_PER_REQUEST"
}

📊 Architectural Evaluation: PostgreSQL vs. NoSQL

| Metric | Relational (PostgreSQL) | NoSQL Key-Value (DynamoDB) | Winner & Rationale |
|---|---|---|---|
| Write scaling | Limited by a single primary for writes; sharding is manual and complex. | Scales out horizontally with automatic partitioning. | NoSQL: absorbs the 5.4M uploads/day footprint. |
| Schema flexibility | Strict DDL constraints; schema migrations must avoid long table locks. | Schemaless; new attributes can be added at any time. | NoSQL: allows adding new transcoder profile attributes over time. |
| Query operations | Supports complex multi-table joins. | Key-value lookups only; joins must be done in application code. | Tie: RDBMS for identity and transactional records, NoSQL for playback catalog paths. |

Elasticsearch Index Mapping

Transforms metadata records into search-as-you-type indices. Mapping the title with a search_as_you_type subfield (title.autocomplete) makes Elasticsearch index extra prefix forms of the text at write time, including edge n-grams (e.g. "sy", "sys", "syst", "system").

This avoids leading-wildcard queries (LIKE '%query%'), which cannot use a B-tree index and force a full table scan on every keystroke. Instead, each typed prefix becomes an exact term lookup in the pre-built inverted index, so autocomplete latency stays low as the catalog grows.

json
{
  "index": "videos",
  "mappings": {
    "properties": {
      "video_id":      { "type": "keyword" },
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "autocomplete": { "type": "search_as_you_type" }
        }
      },
      "tags":          { "type": "keyword" },
      "view_count":    { "type": "long" },
      "published_at":  { "type": "date" }
    }
  }
}
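To make the edge n-gram idea concrete, here is a toy in-memory version: index every prefix of every title token, and autocomplete becomes a map lookup. A minimal sketch for illustration only; `PrefixIndex` and `edgeNGrams` are hypothetical names, and Elasticsearch builds the real structures inside its inverted index.

```typescript
// Edge n-grams of one token: "system" → ["sy", "sys", "syst", "syste", "system"] (minGram = 2).
function edgeNGrams(token: string, minGram: number): string[] {
  const grams: string[] = [];
  for (let len = minGram; len <= token.length; len++) grams.push(token.slice(0, len));
  return grams;
}

// Toy autocomplete index: every edge n-gram of every title token maps to the videos containing it.
class PrefixIndex {
  private readonly grams = new Map<string, Set<string>>();
  private readonly minGram: number;

  constructor(minGram = 2) {
    this.minGram = minGram;
  }

  // Index time does the expensive work: one entry per prefix of each token.
  add(videoId: string, title: string): void {
    for (const token of title.toLowerCase().split(/\W+/).filter(Boolean)) {
      for (const gram of edgeNGrams(token, this.minGram)) {
        let ids = this.grams.get(gram);
        if (!ids) {
          ids = new Set<string>();
          this.grams.set(gram, ids);
        }
        ids.add(videoId);
      }
    }
  }

  // Query time is a single map lookup on the typed prefix; no scan over all titles.
  suggest(prefix: string): string[] {
    return [...(this.grams.get(prefix.toLowerCase()) ?? [])];
  }
}

const idx = new PrefixIndex();
idx.add("v1", "System Design Guide");
idx.add("v2", "Synthwave Mix");
console.log(idx.suggest("sy"), idx.suggest("sys"));
```

The trade-off is index size: every token contributes one entry per prefix length, which is why this work is done once at write time rather than on every query.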

Transient High-Write Schemas (Cassandra)

Used for high-throughput time-series records. Tuned consistency levels (QUORUM for writes, ONE for reads) favor read availability and latency over strict consistency: a read at ONE may briefly miss the newest comment.

sql
-- Comments per video, newest first (Cassandra CQL)
CREATE TABLE comments (
  video_id    uuid,
  created_at  timestamp,
  comment_id  uuid,
  user_id     uuid,
  content     text,
  PRIMARY KEY ((video_id), created_at, comment_id)
) WITH CLUSTERING ORDER BY (created_at DESC, comment_id ASC);
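Whether a read is guaranteed to see the latest write comes down to replica-count arithmetic: the read and write replica sets must overlap, i.e. R + W > RF. A minimal sketch for a single data center, assuming a replication factor of 3 (the text does not state one).

```typescript
type ConsistencyLevel = "ONE" | "QUORUM" | "ALL";

// Replicas that must acknowledge an operation in a single data center with replication factor rf.
function replicasRequired(level: ConsistencyLevel, rf: number): number {
  if (level === "ONE") return 1;
  if (level === "QUORUM") return Math.floor(rf / 2) + 1;
  return rf; // ALL
}

// A read is guaranteed to overlap the latest acknowledged write only if R + W > RF.
function readSeesLatestWrite(write: ConsistencyLevel, read: ConsistencyLevel, rf: number): boolean {
  return replicasRequired(write, rf) + replicasRequired(read, rf) > rf;
}

console.log(readSeesLatestWrite("QUORUM", "ONE", 3));    // W=2, R=1: 3 > 3 is false
console.log(readSeesLatestWrite("QUORUM", "QUORUM", 3)); // W=2, R=2: 4 > 3 is true
```

With RF = 3, the chosen QUORUM-write / ONE-read pairing does not guarantee overlap, which is exactly the brief staleness accepted for comments in exchange for fast, highly available reads.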

⚖️ Architectural Alternatives & Design Decisions

Cassandra vs. Postgres for Viewer History
🎯 Chosen: Apache Cassandra Wide-Column Partitions
✓ Pro: Cassandra scales writes near-linearly as nodes are added, handles wide partitions efficiently, and serves a given key's rows from a single partition.
✗ Con: No JOIN or multi-partition transaction support; user profile fields must be denormalized into the rows that need them.
Step 06

API Design

REST · JSON over HTTPS · Token Authorized

Begins a chunked multipart upload session (a TUS-style resumable upload) and returns a map of presigned S3 part URLs.

REQUEST BODY
json
{
  "title": "My Scale System Guide",
  "file_size": 2516582400,
  "mime_type": "video/mp4"
}
RESPONSE
json
{
  "upload_id": "ul_01F9A...",
  "video_id": "vid_01F9A...",
  "chunk_size": 5242880,
  "part_urls": [
    { "part_number": 1, "url": "https://s3.amazonaws.com/raw/part1?sig=..." }
  ]
}
KNOWN ERRORS
400 Bad Request: unsupported codec or format
413 Payload Too Large: file exceeds the 20GB upload limit
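On the server side, the request above needs only a few checks before the Upload Service creates the multipart upload. A minimal sketch, assuming the 20GB limit is 20 GiB and using an illustrative MIME allow-list; `validateInitiateUpload` is a hypothetical name. The 10,000-part ceiling is S3's multipart limit.

```typescript
const MAX_FILE_BYTES = 20 * 1024 ** 3; // 20 GB limit from the functional requirements (GiB assumed)
const CHUNK_BYTES = 5 * 1024 * 1024;   // 5 MiB, matches chunk_size in the response
const S3_MAX_PARTS = 10_000;           // S3 multipart upload part limit
const ALLOWED_MIME = new Set(["video/mp4", "video/quicktime", "video/webm"]); // illustrative

interface InitiateUploadRequest {
  title: string;
  file_size: number;
  mime_type: string;
}

type Validation = { status: 200; partCount: number } | { status: 400 | 413; error: string };

function validateInitiateUpload(req: InitiateUploadRequest): Validation {
  if (!ALLOWED_MIME.has(req.mime_type)) return { status: 400, error: "Unsupported codec or format" };
  if (!Number.isInteger(req.file_size) || req.file_size <= 0) return { status: 400, error: "Invalid file_size" };
  if (req.file_size > MAX_FILE_BYTES) return { status: 413, error: "Payload Too Large" };
  // One presigned URL per part; the last part may be smaller than CHUNK_BYTES.
  const partCount = Math.ceil(req.file_size / CHUNK_BYTES);
  if (partCount > S3_MAX_PARTS) return { status: 413, error: "Too many parts for chunk size" };
  return { status: 200, partCount };
}

console.log(validateInitiateUpload({ title: "My Scale System Guide", file_size: 2516582400, mime_type: "video/mp4" }));
```

For the example request, 2,516,582,400 bytes at 5 MiB per part yields 480 part URLs; at the 20 GiB ceiling it would be 4,096, well inside S3's limit.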
Step 07

Deep Dive Subsystems

Direct Ingestion via TUS

💡
In Plain Terms
Instead of uploading a massive 20GB video file all at once (where a brief Wi-Fi drop forces you to restart from scratch), the TUS protocol acts like a bookmark. It splits the file into 5MB chunks. If your internet disconnects on chunk 300, it resumes from chunk 300 immediately upon reconnecting.

The client negotiates with the Upload Service to generate secure, presigned S3 URLs. The client then pushes 5MB chunks directly to the S3 bucket:

VOD Upload Swimlanes

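Participants: Client, API Gateway, Upload Svc, S3 Storage, Kafka Bus, Transcoder (Consumer)
  1. Client → API Gateway: POST /initiate-upload
  2. API Gateway → Upload Svc: authorizes and maps the request
  3. Upload Svc → S3 Storage: CreateMultipartUpload()
  4. Upload Svc: presigned URL array generated
  5. API Gateway → Client: upload map response
  6. Client → S3 Storage: PUT chunk[0..N] (parallel)
  7. Client → Upload Svc: POST /complete
  8. Upload Svc → S3 Storage: CompleteMultipartUpload()
  9. Upload Svc → Kafka Bus: publish 'video.uploaded' event
  10. Transcoder (Consumer): consume and transcode asynchronously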
💡 Interview Tip: This code block is for architectural illustration only. In a system design interview, you are expected to map components and explain API boundaries rather than write detailed TypeScript.
typescript
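// Client-side chunked concurrent uploader using S3 Multipart (presigned part URLs)
interface PartUrl { partNumber: number; url: string } // mapped from the API's part_number / url fields
interface CompletedPart { partNumber: number; etag: string }

const CHUNK_SIZE = 5 * 1024 * 1024; // 5 MiB; must match chunk_size returned by the API
const CONCURRENCY_LIMIT = 4;
const MAX_ATTEMPTS = 3;

// PUT one part directly to S3, retrying with exponential backoff on failure.
async function uploadPartWithRetry(file: File, part: PartUrl): Promise<CompletedPart> {
  const start = (part.partNumber - 1) * CHUNK_SIZE;
  const chunk = file.slice(start, start + CHUNK_SIZE);
  for (let attempt = 1; ; attempt++) {
    try {
      const res = await fetch(part.url, { method: "PUT", body: chunk });
      if (!res.ok) throw new Error(`Part ${part.partNumber} failed: HTTP ${res.status}`);
      // Browsers only expose ETag if the bucket's CORS config lists it in ExposeHeaders.
      const etag = res.headers.get("ETag");
      if (!etag) throw new Error(`Part ${part.partNumber}: ETag header not readable`);
      return { partNumber: part.partNumber, etag };
    } catch (err) {
      if (attempt >= MAX_ATTEMPTS) throw err;
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
}

// Uploads all parts in batches. Parts in `alreadyDone` (e.g. restored after a dropped
// connection) are skipped, which is what makes pause & resume possible.
async function uploadFile(file: File, uploadId: string, parts: PartUrl[], alreadyDone: CompletedPart[] = []) {
  const etags = [...alreadyDone];
  const doneNumbers = new Set(alreadyDone.map((p) => p.partNumber));
  const pending = parts.filter((p) => !doneNumbers.has(p.partNumber));

  for (let i = 0; i < pending.length; i += CONCURRENCY_LIMIT) {
    const batch = pending.slice(i, i + CONCURRENCY_LIMIT);
    etags.push(...(await Promise.all(batch.map((part) => uploadPartWithRetry(file, part)))));
  }
  etags.sort((a, b) => a.partNumber - b.partNumber); // CompleteMultipartUpload expects ascending order

  // Confirms completion; the Upload Service then completes the multipart upload and triggers transcoding
  const res = await fetch(`/api/v1/videos/${uploadId}/complete`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ parts: etags }),
  });
  if (!res.ok) throw new Error(`Complete failed: HTTP ${res.status}`);
}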

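Internal Lifecycle State Machine

  • DRAFT → UPLOADING: Init Upload
  • UPLOADING → UPLOADED: Parts Completed
  • UPLOADED → TRANSCODING: Kafka Trigger
  • TRANSCODING → PUBLISHED: Qualities OK
  • TRANSCODING → FAILED: Job Error
  • FAILED: Client Retry re-enters the pipeline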
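The lifecycle can be enforced as an explicit transition table so that an out-of-order event (for example, a late transcode callback for an already published video) is rejected rather than silently applied. A minimal sketch; the event names are hypothetical renderings of the diagram's labels, and sending CLIENT_RETRY back to TRANSCODING is an assumption, since the diagram does not say where a retry re-enters.

```typescript
type VideoState = "DRAFT" | "UPLOADING" | "UPLOADED" | "TRANSCODING" | "PUBLISHED" | "FAILED";
type VideoEvent =
  | "INIT_UPLOAD" | "PARTS_COMPLETED" | "KAFKA_TRIGGER" | "QUALITIES_OK" | "JOB_ERROR" | "CLIENT_RETRY";

// Allowed transitions; anything not listed is rejected.
// Assumption: CLIENT_RETRY re-queues transcoding from FAILED.
const TRANSITIONS: Record<VideoState, Partial<Record<VideoEvent, VideoState>>> = {
  DRAFT: { INIT_UPLOAD: "UPLOADING" },
  UPLOADING: { PARTS_COMPLETED: "UPLOADED" },
  UPLOADED: { KAFKA_TRIGGER: "TRANSCODING" },
  TRANSCODING: { QUALITIES_OK: "PUBLISHED", JOB_ERROR: "FAILED" },
  PUBLISHED: {},
  FAILED: { CLIENT_RETRY: "TRANSCODING" },
};

function transition(state: VideoState, event: VideoEvent): VideoState {
  const next = TRANSITIONS[state][event];
  if (!next) throw new Error(`Illegal transition: ${event} in state ${state}`);
  return next;
}

console.log(transition("TRANSCODING", "QUALITIES_OK"));
```

In a multi-worker deployment the same check would typically be applied atomically in the database (for example, a DynamoDB conditional write on the current `status`), so two workers cannot both advance the same video.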

GPU Video Transcoding DAG (Directed Acyclic Graph)

To convert high-definition raw videos into streamable segments without blocking system threads, we split the work into a **Directed Acyclic Graph (DAG)** of independent tasks. After ingest and demuxing, multi-resolution encoding, thumbnail extraction, and watermarking run in parallel branches, and a failed task can be retried on its own without rerunning the whole job:

Video Processing Directed Acyclic Graph (DAG) Subsystem

1. Ingest Raw File → 2. Demuxing → parallel branches [GPU Transcode 4K | GPU Transcode 1080p | GPU Transcode 720p | Apply Watermark | Gen Thumbnails] → 3. Segment & Mux → 4. Write Manifest
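Which tasks may run in parallel falls out of a topological sort of the DAG. A minimal sketch using Kahn's algorithm grouped into "waves"; the task names and edges are an assumption read off the diagram (in particular, that muxing waits for the encoded and watermarked outputs and that thumbnails are an independent branch).

```typescript
// Task → prerequisite tasks (edges assumed from the diagram).
const TRANSCODE_DAG: Record<string, string[]> = {
  ingest: [],
  demux: ["ingest"],
  transcode_4k: ["demux"],
  transcode_1080p: ["demux"],
  transcode_720p: ["demux"],
  watermark: ["demux"],
  thumbnails: ["demux"],
  segment_mux: ["transcode_4k", "transcode_1080p", "transcode_720p", "watermark"],
  write_manifest: ["segment_mux"],
};

// Kahn's algorithm, grouped into waves: every task in a wave has all of its
// prerequisites finished, so the tasks within one wave can run in parallel.
function executionWaves(dag: Record<string, string[]>): string[][] {
  const remaining = new Map(
    Object.entries(dag).map(([task, deps]): [string, Set<string>] => [task, new Set(deps)]),
  );
  const waves: string[][] = [];
  while (remaining.size > 0) {
    const ready = [...remaining].filter(([, deps]) => deps.size === 0).map(([task]) => task);
    if (ready.length === 0) throw new Error("Cycle detected: graph is not a DAG");
    for (const task of ready) remaining.delete(task);
    for (const deps of remaining.values()) for (const task of ready) deps.delete(task);
    waves.push(ready);
  }
  return waves;
}

console.log(executionWaves(TRANSCODE_DAG));
```

An orchestrator would dispatch each wave's tasks to GPU workers and retry a failed task individually; only its downstream tasks wait, which is the isolated failure recovery described above.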
We use NVIDIA hardware-accelerated encoding (NVENC, exposed in FFmpeg as h264_nvenc) to run these tasks concurrently. Why hardware acceleration? Modern NVIDIA GPUs contain dedicated fixed-function video encoder and decoder blocks (NVENC/NVDEC) that perform video compression far more efficiently than general-purpose multi-core CPUs. Offloading encoding to these circuits frees most of the CPU and substantially lowers the cost per encoded stream, making it practical to render multiple 4K/1080p renditions of the bitrate ladder concurrently.
bash
# Encode the 1080p rendition on the GPU and package it as 6 s fMP4 HLS segments.
# FFmpeg has no native s3:// output, so write locally and upload afterwards.
mkdir -p out/video-12/1080p
ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i input_raw.mov \
  -vf "scale_cuda=1920:1080" \
  -c:v h264_nvenc -preset p4 -b:v 5000k -maxrate 5500k -bufsize 11000k \
  -force_key_frames "expr:gte(t,n_forced*6)" -forced-idr 1 \
  -c:a aac -b:a 192k \
  -f hls \
  -hls_time 6 \
  -hls_playlist_type vod \
  -hls_segment_type fmp4 \
  -hls_segment_filename "out/video-12/1080p/seg_%03d.m4s" \
  "out/video-12/1080p/index.m3u8"
aws s3 cp --recursive out/video-12/1080p s3://prod-media/video-12/1080p/

Adaptive Bitrate (ABR) Controller

Implemented on the client player. Production players use algorithms such as BOLA (buffer-based) or throughput-based heuristics; the simplified controller below combines buffer health with a bandwidth estimate to pick a quality level and avoid playback stalls:

typescript
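// Simplified client-side quality decider combining buffer level and bandwidth.
// The ladder and thresholds are illustrative, not tuned values.
const LADDER = [
  { name: "360p", bitrateBps: 800_000 },
  { name: "720p", bitrateBps: 3_000_000 },
  { name: "1080p", bitrateBps: 5_000_000 },
];

class ABRController {
  private bufferLevelSeconds = 30;           // seconds of media buffered ahead of the playhead
  private estimatedBandwidthBps = 4_500_000; // smoothed throughput of recent segment downloads

  public update(bufferLevelSeconds: number, estimatedBandwidthBps: number): void {
    this.bufferLevelSeconds = bufferLevelSeconds;
    this.estimatedBandwidthBps = estimatedBandwidthBps;
  }

  public getTargetQuality(): string {
    if (this.bufferLevelSeconds < 5) {
      return LADDER[0].name; // Emergency fallback: lowest bitrate refills the buffer fastest
    }
    // Keep a bandwidth safety margin; allow more headroom once the buffer is healthy
    const budgetBps = this.estimatedBandwidthBps * (this.bufferLevelSeconds > 20 ? 0.9 : 0.7);
    let choice = LADDER[0].name;
    for (const rung of LADDER) {
      if (rung.bitrateBps <= budgetBps) choice = rung.name; // highest rung that fits
    }
    return choice;
  }
}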

Multi-Tier Cache & Origin Shield

We place an Origin Shield (a tier-2 regional CDN cache) in front of the S3 buckets. When many edge nodes miss on the same segment at once, as happens on popular releases, the shield collapses those concurrent requests into a single origin fetch:

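Participants: Client Player, Edge CDN, Stream Svc, Origin Shield, S3
  1. Client Player → Edge CDN: GET segment
  2. Edge CDN: verify token
  3. Edge CDN → Origin Shield: origin fetch (on edge cache miss)
  4. Origin Shield → S3: S3 read (on shield cache miss)
  5. Binary segment chunks (.m4s) are returned along the same path to the client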
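The collapsing step is usually called request coalescing (or "single-flight"): concurrent misses for the same key share one in-flight origin fetch. A minimal sketch of the idea; `OriginShield` and the fetcher signature are hypothetical, and real CDN shields implement this internally.

```typescript
type Fetcher<T> = (key: string) => Promise<T>;

class OriginShield<T> {
  private readonly cache = new Map<string, T>();                // segments are immutable, so no TTL here
  private readonly inFlight = new Map<string, Promise<T>>();    // one pending origin fetch per key
  private readonly fetchFromOrigin: Fetcher<T>;

  constructor(fetchFromOrigin: Fetcher<T>) {
    this.fetchFromOrigin = fetchFromOrigin;
  }

  async get(key: string): Promise<T> {
    const hit = this.cache.get(key);
    if (hit !== undefined) return hit;
    let pending = this.inFlight.get(key);
    if (!pending) {
      // First miss starts the fetch; later concurrent misses join the same promise.
      pending = this.fetchFromOrigin(key)
        .then((value) => {
          this.cache.set(key, value);
          return value;
        })
        .finally(() => this.inFlight.delete(key));
      this.inFlight.set(key, pending);
    }
    return pending;
  }
}

let originReads = 0;
const shield = new OriginShield<string>(async (key) => {
  originReads++;
  return `bytes:${key}`;
});
Promise.all(Array.from({ length: 100 }, () => shield.get("seg_001.m4s"))).then(() => console.log(originReads));
```

A hundred simultaneous misses for `seg_001.m4s` produce a single origin read, which is what keeps S3 request rates flat during a viral release.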
Step 08

Bottlenecks & Scaling Mitigations

💡
In Plain Terms
Scale breaks everything. At 500M DAUs, directly writing play events to a database or serving video segments from origin servers will crash the platform. We must implement rate limiters, circuit breakers, and batching layers to safeguard our services.
🔥 Hot Celebrity Content Stampedes (HIGH: Platform-Breaking)

An account with 10M subscribers publishes a video, triggering a thundering herd: millions of requests arrive before edge caches are warm, miss locally, and fall through toward the origin.

🛠️ System Mitigations:
→ Pre-warm edge CDN caches: on publish, look up the channel's subscriber count; if it exceeds 100K, pre-warm the first 3 segments of every quality.
→ Deploy tier-2 regional Origin Shields to merge concurrent, redundant S3 reads into a single request.
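The pre-warm rule translates into a small planner that lists which objects to pull through the edges right after publishing. A minimal sketch, assuming a hypothetical path layout that mirrors the FFmpeg output (`index.m3u8`, FFmpeg's default fMP4 `init.mp4`, and `seg_000.m4s` onward); how the paths are then requested through each PoP depends on the CDN.

```typescript
const PREWARM_SUBSCRIBER_THRESHOLD = 100_000;
const PREWARM_SEGMENT_COUNT = 3;

// Returns the object paths to request through each edge PoP right after publishing.
// Hypothetical layout: /{videoId}/{quality}/index.m3u8, init.mp4, seg_000.m4s, seg_001.m4s, ...
function planPrewarm(videoId: string, subscriberCount: number, qualities: string[]): string[] {
  if (subscriberCount <= PREWARM_SUBSCRIBER_THRESHOLD) return []; // small channels: let caches fill naturally
  const paths: string[] = [];
  for (const quality of qualities) {
    const base = `/${videoId}/${quality}`;
    paths.push(`${base}/index.m3u8`, `${base}/init.mp4`); // playlist and fMP4 init segment
    for (let i = 0; i < PREWARM_SEGMENT_COUNT; i++) {
      paths.push(`${base}/seg_${String(i).padStart(3, "0")}.m4s`);
    }
  }
  return paths;
}

console.log(planPrewarm("video-12", 10_000_000, ["720p", "1080p"]));
```

Warming only the opening segments keeps the push small while covering the moment when every viewer requests the same bytes at once.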
🗄️ Database View-Count Saturation (HIGH: Platform-Breaking)

Millions of viewers trigger concurrent counter updates on the same hot rows, overwhelming the primary database with write and lock contention.

🛠️ System Mitigations:
→ View counters are kept OUT of primary transactional paths to protect database performance.
→ To scale metrics at peak: route view events through Apache Kafka, pre-aggregate them in a stream processor such as Apache Flink, accumulate counts in Redis (INCR/INCRBY), and flush the totals to the database every 60 seconds.
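The batching step turns millions of per-view writes into one write per video per flush interval. A minimal single-process sketch: a `Map` stands in for Redis (whose INCRBY gives the same atomic accumulation shared across many servers), and the flush callback stands in for the batched database write; `ViewCountBatcher` is a hypothetical name.

```typescript
// In-memory stand-in for the Redis INCRBY + periodic flush pattern.
class ViewCountBatcher {
  private pending = new Map<string, number>();
  private readonly flushToDb: (deltas: Map<string, number>) => void;

  constructor(flushToDb: (deltas: Map<string, number>) => void) {
    this.flushToDb = flushToDb;
  }

  // Hot path: an in-memory increment, no database round trip per view.
  record(videoId: string, views = 1): void {
    this.pending.set(videoId, (this.pending.get(videoId) ?? 0) + views);
  }

  // Called every 60 s (e.g. from setInterval): one write per video instead of one per view.
  flush(): void {
    if (this.pending.size === 0) return;
    const batch = this.pending;
    this.pending = new Map(); // new views accumulate while the batch is written
    this.flushToDb(batch);
  }
}

const batcher = new ViewCountBatcher((deltas) => console.log([...deltas]));
for (let i = 0; i < 1000; i++) batcher.record("video-12");
batcher.flush();
```

Swapping the map out before writing means views recorded during a slow database write land in the next batch instead of being lost or double-counted.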
💾 Storage Cost Explosion (MEDIUM: Performance-Degrading)

Unbounded growth of uploaded UGC (roughly 12 PB of new renditions per day) quickly inflates S3 storage costs.

🛠️ System Mitigations:
→ Apply S3 lifecycle policies: transition assets to Standard-Infrequent Access (IA) at 30 days, then to S3 Glacier at 180 days.
→ Automatically delete 4K renditions of old, rarely watched videos to reclaim storage.
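The 30-day and 180-day transitions map directly onto an S3 lifecycle rule. A minimal sketch of the configuration object in the shape used by S3's lifecycle configuration API; the `raw/` prefix and the rule ID are hypothetical. Serialized to JSON, it could be applied with `aws s3api put-bucket-lifecycle-configuration --bucket <bucket> --lifecycle-configuration file://lifecycle.json`.

```typescript
interface Transition {
  Days: number;
  StorageClass: "STANDARD_IA" | "GLACIER" | "DEEP_ARCHIVE";
}

interface LifecycleRule {
  ID: string;
  Status: "Enabled" | "Disabled";
  Filter: { Prefix: string };
  Transitions: Transition[];
}

// Hypothetical key layout: raw uploads live under "raw/".
const rawMediaLifecycle: { Rules: LifecycleRule[] } = {
  Rules: [
    {
      ID: "raw-media-tiering",
      Status: "Enabled",
      Filter: { Prefix: "raw/" },
      Transitions: [
        { Days: 30, StorageClass: "STANDARD_IA" }, // Infrequent Access after 30 days
        { Days: 180, StorageClass: "GLACIER" },    // cold archive after 180 days
      ],
    },
  ],
};

console.log(JSON.stringify(rawMediaLifecycle, null, 2)); // contents for lifecycle.json
```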
📊 Transcoding Pipeline Congestion (MEDIUM: Performance-Degrading)

A sudden influx of creator uploads lengthens transcoding times, leaving videos stuck in a pending queue.

🛠️ System Mitigations:
→ KEDA (Kubernetes Event-Driven Autoscaling) scales GPU transcoding workers dynamically based on Kafka consumer lag.
→ Prioritize the transcoding queue: route uploads from popular, verified creators to high-priority Kafka topics.
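Both mitigations reduce to small decision functions. A minimal sketch, assuming a lag-proportional scaling rule similar in spirit to KEDA's Kafka scaler (replicas ≈ total lag ÷ target lag per worker); the function names, topic names, and popularity threshold are hypothetical, and a real deployment would express the scaling rule as KEDA configuration rather than code.

```typescript
// Lag-proportional autoscaling: desired workers ≈ ceil(total consumer lag / target lag per worker),
// clamped to [min, max]. Consumers beyond the topic's partition count would sit idle, so cap there too.
function desiredWorkers(
  totalLag: number,
  lagPerWorker: number,
  partitions: number,
  minReplicas: number,
  maxReplicas: number,
): number {
  const byLag = Math.ceil(totalLag / lagPerWorker);
  return Math.max(minReplicas, Math.min(byLag, partitions, maxReplicas));
}

// Priority routing: popular AND verified creators go to the high-priority topic.
const POPULAR_SUBSCRIBER_THRESHOLD = 1_000_000;

function transcodeTopic(creator: { verified: boolean; subscribers: number }): string {
  const priority = creator.verified && creator.subscribers >= POPULAR_SUBSCRIBER_THRESHOLD;
  return priority ? "transcode.high-priority" : "transcode.standard";
}

console.log(desiredWorkers(5_000, 100, 200, 10, 1000), transcodeTopic({ verified: true, subscribers: 2_000_000 }));
```

The partition cap matters for sizing: to scale GPU workers past a given count, the transcode topic needs at least that many partitions.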

📚 Quiz: Test Your Understanding

Check how well you learned the video streaming system design. 20 questions.


Why does the system use direct-to-S3 chunked uploads (via the TUS protocol) instead of routing video bytes through the API Gateway?

VOD Streaming Platform Design Walkthrough · 8-Step Framework · Built for Staff Engineering Candidates
HLS · MPEG-DASH · TUS Protocol · NVIDIA Transcoding · Origin Shield · Elasticsearch · Cassandra · PostgreSQL Sharding