Course Schedule

Each day’s readings correspond to the content covered in that day’s class. Each day has a main paper and one or more companion papers.

Day Lecture (Week) Description Readings Presentations Notes  
Background            
1/15 1 (1) Intro Syllabus Intro Slides HW0 and HW1 are out  
1/17 2 (1) Data Center Architecture Architecture: Compute+Overall: “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines” (Chapters 1 and 2) Data center as a computer - Brent    
1/22 3 (2) Data Center Networks Architecture: Networks: “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network” (Optional)

Architecture: Network Routing: VL2 (Optional)
Data Center Networking - Brent    
1/24 4 (2) Search Search: Google: “The anatomy of a large-scale hypertextual Web search engine”

PerfIso: PerfIso: “PerfIso: Performance Isolation for Commercial Latency-Sensitive Services”
Search - Brent

PerfIso - Brent
   
Storage Systems            
1/29 5 (3) Intro to Storage Storage: GFS: “The Google File System” GFS and GFS2 - Brent    
1/31 6 (3) Class Cancelled (Snow Day)        
2/5 7 (4) Highly-Available Storage and Structured Storage Storage: Chubby: “The Chubby lock service for loosely-coupled distributed systems” (Optional)

Storage: Bigtable: “Bigtable: A Distributed Storage System for Structured Data”
Chubby1 Chubby2 - Brent

Bigtable1 - Brent

Bigtable2 - Brent
   
2/7 8 (4) Transactional Storage Storage: Spanner: “Spanner: Google’s Globally-Distributed Database” Spanner - Brent

Colossus - Brent
   
Big Data            
2/12 9 (5) Intro to Big Data Execution: MR: “MapReduce: Simplified Data Processing on Large Clusters”

Execution: Spark: “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing”
Map Reduce - Brent

Spark - Brent
   
More Storage Systems            
2/14 10 (5) Blob Storage Storage: f4: “f4: Facebook’s Warm BLOB Storage System” (Optional)

Storage: FDS: “Flat Datacenter Storage” (Optional)
f4 - dporte7

FDS1 and FDS2 - Brent
   
2/19 11 (6) Key-Value Stores Storage: Dynamo: “Dynamo: Amazon’s Highly Available Key-value Store”

Storage: MemcachedFacebook: “Scaling Memcache at Facebook” (Optional)
Dynamo - Brent

MemcachedFacebook - Brent
   
Scheduling and Resource Management            
2/21 12 (6) Orchestration ResourceNeg: Borg: “Large-scale cluster management at Google with Borg”

ResourceNeg: Mesos: “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center” (Optional)

ResourceNeg: YARN: “Apache Hadoop YARN: Yet Another Resource Negotiator”
ResourceNeg: YARN - kshind2  
Batch Analytics            
2/26 13 (7) SQL SQL: SparkSQL: “Spark SQL: Relational Data Processing in Spark”

SQL: Hive: “Major technical advancements in Apache Hive” (Optional)

SQL: Impala: “Impala: A Modern, Open-Source SQL Engine for Hadoop” (Optional)

SQL: Trill: “Trill: A High-Performance Incremental Query Processor for Diverse Analytics” (Optional)
SQL: Hive - bjain6

SQL: Impala - psingh56
   
Stream Analytics            
2/28 14 (7) Intro to Stream Analytics Streaming: Aurora: “Aurora: a new model and architecture for data stream management” (Optional)

Streaming: SparkStreaming: “Discretized Streams: Fault-Tolerant Streaming Computation at Scale”
Streaming: Aurora - sjamal7

Streaming: SparkStreaming - avenka35
   
3/5 15 (8) More Streaming Streaming: Flink: “Apache Flink: Stream and Batch Processing in a Single Engine”

Streaming: Dataflow: “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing” (Optional)

Streaming: Heron: “Twitter Heron: Stream Processing at Scale” (Optional)

Streaming: FacebookStreaming: “Realtime Data Processing at Facebook” (Optional)

Streaming: Drizzle: “Drizzle: Fast and Adaptable Stream Processing at Scale” (Optional)

Streaming: Gloss: “Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs” (Optional)

Streaming: Scaling: “Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows” (Optional)
Streaming: Dataflow - spani2

Streaming: Flink - Brent

Streaming: Heron - Brent

Streaming: Kafka - GumGum Kafka - Better Kafka - Brent
   
3/7 16 (8) Queue Messaging QMS: Kafka: “Kafka: a Distributed Messaging System for Log Processing” QMS: Kafka - wtoher2    
3/12 17 (9) Alternate Analytics Social: FacebookAnalytics: “Data warehousing and analytics infrastructure at Facebook”

Execution: Dryad: “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks” (Optional)

Execution: CIEL: “CIEL: a universal execution engine for distributed data-flow computing” (Optional)

Execution: DryadLINQ: “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language” (Optional)
     
Job Scheduling            
3/14 18 (9) Job Scheduling Scheduling: Packing: “(Carbyne:) Altruistic Scheduling in Multi-Resource Clusters”

Scheduling: Packing: “(Tetris:) Multi-Resource Packing for Cluster Schedulers” (Optional)

Scheduling: Packing: “Quincy: Fair Scheduling for Distributed Computing Clusters” (Optional)
     
Machine Learning Frameworks            
3/19 19 (10) Parameter Server ML: ParamServ: “Scaling Distributed Machine Learning with the Parameter Server”

ML: Facebook: “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective” (Optional)
     
3/21 20 (10) TensorFlow ML: TensorFlow: “TensorFlow: A System for Large-Scale Machine Learning”

ML: TPU: “In-Datacenter Performance Analysis of a Tensor Processing Unit” (Optional)
ML: TPU - cmonta9    
Geo-Distributed Analytics            
4/2 21 (11) Geo-Distributed Analytics GeoDistributed: Geode: “Global Analytics in the Face of Bandwidth and Regulatory Constraints”

GeoDistributed: Clarinet: “Clarinet: WAN-Aware Optimization for Analytics Queries” (Optional)
     
Graph Processing            
4/4 22 (11) TAO GraphProc: TAO: “TAO: Facebook’s Distributed Data Store for the Social Graph”

GraphProc: Facebook: “One Trillion Edges: Graph Processing at Facebook-Scale” (Optional)

GraphProc: Pregel: “Pregel: A System for Large-Scale Graph Processing” (Optional)
     
4/9 23 (12) GraphX GraphProc: GraphX: “GraphX: Graph Processing in a Distributed Dataflow Framework”

GraphProc: RDF: “Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration” (Optional)
     
Serverless Computing            
4/11 24 (12) OpenLambda Serverless: OpenLambda: “Serverless Computation with OpenLambda”

Runtime: Weld: “Weld: A Common Runtime for High Performance Data Analytics” (Optional)
     
4/16 25 (13) Power outage        
Video            
4/18 26 (13) Serverless Video Video: TinyThreads: “Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads”

Video: Chameleon: “Chameleon: Scalable Adaptation of Video Analytics” (Optional)

Video: SVE: “SVE: Distributed Video Processing at Facebook Scale” (Optional)
Video: TinyThreads - lchen79

Video: Chameleon - srawat5
   
Load Balancing            
4/23 27 (14) Load Balancing Execution: LoadBalancing1: “Ananta: Cloud Scale Load Balancing” (Optional)

Execution: LoadBalancing2: “Duet: Cloud Scale Load Balancing with Hardware and Software” (Optional)

Execution: LoadBalancing3: “SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs” (Optional)
     
Approximation            
4/25 28 (14) BlinkDB Approx: BlinkDB: “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data”

Approx: BlinkML: “BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees” (Optional)
     
Hardware Accelerators            
4/30 29 (15) Offloads Offload: iPipe: “iPipe: A Framework for Building Datacenter Applications Using In-networking Processors” (Optional)

Offload: Access: “Direct Universal Access: Making Data Center Resources Available to FPGA”

Hardware: Catapult: “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services” (Optional)
     
5/2 30 (15) RDMA RDMA: FaRM: “Fast Remote Memory”

RDMA: eRPC: “Datacenter RPCs can be General and Fast” (Optional)

RDMA: FastNetworks: “Remote Memory in the Age of Fast Networks” (Optional)
RDMA: eRPC - sgadho2