Course Schedule
Each day’s readings correspond to the content covered in that day’s class. Each day has a main paper and one or more companion papers.
Day | Lecture (Week) | Description | Readings | Presentations | Notes | |
---|---|---|---|---|---|---|
Background | ||||||
1/15 | 1 (1) | Intro | Syllabus | Intro Slides | HW0 and HW1 are out | |
1/17 | 2 (1) | Data Center Architecture | Architecture: Compute+Overall: “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines” (Chapters 1 and 2) | Data center as a computer - Brent | ||
1/22 | 3 (2) | Data Center Networks | Architecture: Networks: “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network” (Optional) Architecture: Network Routing: VL2 (Optional) | Data Center Networking - Brent | ||
1/24 | 4 (2) | Search | Search: Google: “The anatomy of a large-scale hypertextual Web search engine” PerfIso: PerfIso: “PerfIso: Performance Isolation for Commercial Latency-Sensitive Services” | Search - Brent PerfIso - Brent | ||
Storage Systems | ||||||
1/29 | 5 (3) | Intro to Storage | Storage: GFS: “The Google File System” | GFS and GFS2 - Brent | ||
1/31 | 6 (3) | Class Cancelled (Snow Day) | ||||
2/5 | 7 (4) | Highly-Available Storage and Structured Storage | Storage: Chubby: “The Chubby lock service for loosely-coupled distributed systems” (Optional) Storage: Bigtable: “Bigtable: A Distributed Storage System for Structured Data” | Chubby1 Chubby2 - Brent Bigtable1 - Brent Bigtable2 - Brent | ||
2/7 | 8 (4) | Transactional Storage | Storage: Spanner: “Spanner: Google’s Globally-Distributed Database” | Spanner - Brent Colossus - Brent | ||
Big Data | ||||||
2/12 | 9 (5) | Intro to Big Data | Execution: MR: “MapReduce: Simplified Data Processing on Large Clusters” Execution: Spark: “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing” | Map Reduce - Brent Spark - Brent | ||
More Storage Systems | ||||||
2/14 | 10 (5) | Blob Storage | Storage: f4: “f4: Facebook’s Warm BLOB Storage System” (Optional) Storage: FDS: “Flat Datacenter Storage” (Optional) | f4 - dporte7 FDS1 and FDS2 - Brent | ||
2/19 | 11 (6) | Key-Value Stores | Storage: Dynamo: “Dynamo: Amazon’s Highly Available Key-value Store” Storage: MemcachedFacebook: “Scaling Memcache at Facebook” (Optional) | Dynamo - Brent MemcachedFacebook - Brent | ||
Scheduling and Resource Management | ||||||
2/21 | 12 (6) | Orchestration | ResourceNeg: Borg: “Borg: Large-scale cluster management at Google with Borg” ResourceNeg: Mesos: “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center” (Optional) ResourceNeg: YARN: “Apache Hadoop YARN: Yet Another Resource Negotiator” | ResourceNeg: YARN - kshind2 | ||
Batch Analytics | ||||||
2/26 | 13 (7) | SQL | SQL: SparkSQL: “Spark SQL: Relational Data Processing in Spark” SQL: Hive: “Major technical advancements in Apache Hive” (Optional) SQL: Impala: “Impala: A Modern, Open-Source SQL Engine for Hadoop” (Optional) SQL: Trill: “Trill: A High-Performance Incremental Query Processor for Diverse Analytics” (Optional) | SQL: Hive - bjain6 SQL: Impala - psingh56 | ||
Stream Analytics | ||||||
2/28 | 14 (7) | Intro to Stream Analytics | Streaming: Aurora: “Aurora: a new model and architecture for data stream management” (Optional) Streaming: SparkStreaming: “Discretized Streams: Fault-Tolerant Streaming Computation at Scale” | Streaming: Aurora - sjamal7 Streaming: SparkStreaming - avenka35 | ||
3/5 | 15 (8) | More Streaming | Streaming: Flink: “Apache Flink: Stream and Batch Processing in a Single Engine” Streaming: Dataflow: “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing” (Optional) Streaming: Heron: “Twitter Heron: Stream Processing at Scale” (Optional) Streaming: FacebookStreaming: “Realtime Data Processing at Facebook” (Optional) Streaming: Drizzle: “Drizzle: Fast and Adaptable Stream Processing at Scale” (Optional) Streaming: Gloss: “Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs” (Optional) Streaming: Scaling: “Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows” (Optional) | Streaming: Dataflow - spani2 Streaming: Flink - Brent Streaming: Heron - Brent Streaming: Kafka - GumGum Kafka - Better Kafka - Brent | ||
3/7 | 16 (8) | Queue Messaging | QMS: Kafka: “Kafka: a Distributed Messaging System for Log Processing” | QMS: Kafka - wtoher2 | ||
3/12 | 17 (9) | Alternate Analytics | Social: FacebookAnalytics: “Data warehousing and analytics infrastructure at Facebook” Execution: Dryad: “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks” (Optional) Execution: CIEL: “CIEL: a universal execution engine for distributed data-flow computing” (Optional) Execution: DryadLINQ: “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language” (Optional) | |||
Job Scheduling | ||||||
3/14 | 18 (9) | Job Scheduling | Scheduling: Packing: “(Carbyne:) Altruistic Scheduling in Multi-Resource Clusters” Scheduling: Packing: “(Tetris:) Multi-Resource Packing for Cluster Schedulers” (Optional) Scheduling: Packing: “Quincy: Fair Scheduling for Distributed Computing Clusters” (Optional) | |||
Machine Learning Frameworks | ||||||
3/19 | 19 (10) | Parameter Server | ML: ParamServ: “Scaling Distributed Machine Learning with the Parameter Server” ML: Facebook: “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective” (Optional) | |||
3/21 | 20 (10) | TensorFlow | ML: TensorFlow: “TensorFlow: A System for Large-Scale Machine Learning” ML: TPU: “In-Datacenter Performance Analysis of a Tensor Processing Unit” (Optional) | ML: TPU - cmonta9 | ||
Geo-Distributed Analytics | ||||||
4/2 | 21 (11) | Geo-Distributed Analytics | GeoDistributed: Geode: “Global Analytics in the Face of Bandwidth and Regulatory Constraints” GeoDistributed: Clarinet: “Clarinet: WAN-Aware Optimization for Analytics Queries” (Optional) | |||
Graph Processing | ||||||
4/4 | 22 (11) | TAO | GraphProc: TAO: “TAO: Facebook’s Distributed Data Store for the Social Graph” GraphProc: Facebook: “One Trillion Edges: Graph Processing at Facebook-Scale” (Optional) GraphProc: Pregel: “Pregel: A System for Large-Scale Graph Processing” (Optional) | |||
4/9 | 23 (12) | GraphX | GraphProc: GraphX: “GraphX: Graph Processing in a Distributed Dataflow Framework” GraphProc: RDF: “Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration” (Optional) | |||
Serverless Computing | ||||||
4/11 | 24 (12) | OpenLambda | Serverless: OpenLambda: “Serverless Computation with OpenLambda” Runtime: Weld: “Weld: A Common Runtime for High Performance Data Analytics” (Optional) | |||
4/16 | 25 (13) | Power outage | ||||
Video | ||||||
4/18 | 26 (13) | Serverless Video | Video: TinyThreads: “Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads” Video: Chameleon: “Chameleon: Scalable Adaptation of Video Analytics” (Optional) Video: SVE: “SVE: Distributed Video Processing at Facebook Scale” (Optional) | Video: TinyThreads - lchen79 Video: Chameleon - srawat5 | ||
Load Balancing | ||||||
4/23 | 27 (14) | Load Balancing | Execution: LoadBalancing1: “Ananta: Cloud Scale Load Balancing” (Optional) Execution: LoadBalancing2: “Duet: Cloud Scale Load Balancing with Hardware and Software” (Optional) Execution: LoadBalancing3: “SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs” (Optional) | |||
Approximation | ||||||
4/25 | 28 (14) | BlinkDB | Approx: BlinkDB: “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data” Approx: BlinkML: “BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees” (Optional) | |||
Hardware Accelerators | ||||||
4/30 | 29 (15) | Offloads | Offload: iPipe: “iPipe: A Framework for Building Datacenter Applications Using In-networking Processors” (Optional) Offload: Access: “Direct Universal Access: Making Data Center Resources Available to FPGA” Hardware: Catapult: “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services” (Optional) | |||
5/2 | 30 (15) | RDMA | RDMA: FaRM: “Fast Remote Memory” RDMA: eRPC: “Datacenter RPCs can be General and Fast” (Optional) RDMA: FastNetworks: “Remote Memory in the Age of Fast Networks” (Optional) |
RDMA: eRPC - sgadho2 |