CS 494: Cloud Data Center Systems

Course Schedule

Each day’s readings correspond to the material covered in that day’s class. Each day has a main paper and one or more companion papers.

Day	Lecture (Week)	Description	Readings	Presentations	Notes
Background
1/15	1 (1)	Intro	Syllabus	Intro Slides	HW0 and HW1 are out
1/17	2 (1)	Data Center Architecture	Architecture: Compute+Overall: “The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines” (Chapters 1 and 2)	Data center as a computer - Brent
1/22	3 (2)	Data Center Networks	Architecture: Networks: “Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network” (Optional) Architecture: Network Routing: VL2 (Optional)	Data Center Networking - Brent
1/24	4 (2)	Search	Search: Google: “The anatomy of a large-scale hypertextual Web search engine” PerfIso: PerfIso: “PerfIso: Performance Isolation for Commercial Latency-Sensitive Services”	Search - Brent PerfIso - Brent
Storage Systems
1/29	5 (3)	Intro to Storage	Storage: GFS: “The Google File System”	GFS and GFS2 - Brent
1/31	6 (3)	Class Cancelled (Snow Day)
2/5	7 (4)	Highly-Available Storage and Structured Storage	Storage: Chubby: “The Chubby lock service for loosely-coupled distributed systems” (Optional) Storage: Bigtable: “Bigtable: A Distributed Storage System for Structured Data”	Chubby1 Chubby2 - Brent Bigtable1 - Brent Bigtable2 - Brent
2/7	8 (4)	Transactional Storage	Storage: Spanner: “Spanner: Google’s Globally-Distributed Database”	Spanner - Brent Colossus - Brent
Big Data
2/12	9 (5)	Intro to Big Data	Execution: MR: “MapReduce: Simplified Data Processing on Large Clusters” Execution: Spark: “Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing”	Map Reduce - Brent Spark - Brent
More Storage Systems
2/14	10 (5)	Blob Storage	Storage: f4: “f4: Facebook’s Warm BLOB Storage System” (Optional) Storage: FDS: “Flat Datacenter Storage” (Optional)	f4 - dporte7 FDS1 and FDS2 - Brent
2/19	11 (6)	Key-Value Stores	Storage: Dynamo: “Dynamo: Amazon’s Highly Available Key-value Store” Storage: MemcachedFacebook: “Scaling Memcache at Facebook” (Optional)	Dynamo - Brent MemcachedFacebook - Brent
Scheduling and Resource Management
2/21	12 (6)	Orchestration	ResourceNeg: Borg: “Borg: Large-scale cluster management at Google with Borg” ResourceNeg: Mesos: “Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center” (Optional) ResourceNeg: YARN: “Apache Hadoop YARN: Yet Another Resource Negotiator”	ResourceNeg: YARN - kshind2
Batch Analytics
2/26	13 (7)	SQL	SQL: SparkSQL: “Spark SQL: Relational Data Processing in Spark” SQL: Hive: “Major technical advancements in Apache Hive” (Optional) SQL: Impala: “Impala: A Modern, Open-Source SQL Engine for Hadoop” (Optional) SQL: Trill: “Trill: A High-Performance Incremental Query Processor for Diverse Analytics” (Optional)	SQL: Hive - bjain6 SQL: Impala - psingh56
Stream Analytics
2/28	14 (7)	Intro to Stream Analytics	Streaming: Aurora: “Aurora: a new model and architecture for data stream management” (Optional) Streaming: SparkStreaming: “Discretized Streams: Fault-Tolerant Streaming Computation at Scale”	Streaming: Aurora - sjamal7 Streaming: SparkStreaming - avenka35
3/5	15 (8)	More Streaming	Streaming: Flink: “Apache Flink: Stream and Batch Processing in a Single Engine” Streaming: Dataflow: “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing” (Optional) Streaming: Heron: “Twitter Heron: Stream Processing at Scale” (Optional) Streaming: FacebookStreaming: “Realtime Data Processing at Facebook” (Optional) Streaming: Drizzle: “Drizzle: Fast and Adaptable Stream Processing at Scale” (Optional) Streaming: Gloss: “Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs” (Optional) Streaming: Scaling: “Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows” (Optional)	Streaming: Dataflow - spani2 Streaming: Flink - Brent Streaming: Heron - Brent Streaming: Kafka - GumGum Kafka - Better Kafka - Brent
3/7	16 (8)	Queue Messaging	QMS: Kafka: “Kafka: A Distributed Messaging System for Log Processing”	QMS: Kafka - wtoher2
3/12	17 (9)	Alternate Analytics	Social: FacebookAnalytics: “Data warehousing and analytics infrastructure at Facebook” Execution: Dryad: “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks” (Optional) Execution: CIEL: “CIEL: a universal execution engine for distributed data-flow computing” (Optional) Execution: DryadLINQ: “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language” (Optional)
Job Scheduling
3/14	18 (9)	Job Scheduling	Scheduling: Packing: “(Carbyne:) Altruistic Scheduling in Multi-Resource Clusters” Scheduling: Packing: “(Tetris:) Multi-Resource Packing for Cluster Schedulers” (Optional) Scheduling: Packing: “Quincy: Fair Scheduling for Distributed Computing Clusters” (Optional)
Machine Learning Frameworks
3/19	19 (10)	Parameter Server	ML: ParamServ: “Scaling Distributed Machine Learning with the Parameter Server” ML: Facebook: “Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective” (Optional)
3/21	20 (10)	TensorFlow	ML: TensorFlow: “TensorFlow: A System for Large-Scale Machine Learning” ML: TPU: “In-Datacenter Performance Analysis of a Tensor Processing Unit” (Optional)	ML: TPU - cmonta9
Geo-Distributed Analytics
4/2	21 (11)	Geo-Distributed Analytics	GeoDistributed: Geode: “Global Analytics in the Face of Bandwidth and Regulatory Constraints” GeoDistributed: Clarinet: “Clarinet: WAN-Aware Optimization for Analytics Queries” (Optional)
Graph Processing
4/4	22 (11)	TAO	GraphProc: TAO: “TAO: Facebook’s Distributed Data Store for the Social Graph” GraphProc: Facebook: “One Trillion Edges: Graph Processing at Facebook-Scale” (Optional) GraphProc: Pregel: “Pregel: A System for Large-Scale Graph Processing” (Optional)
4/9	23 (12)	GraphX	GraphProc: GraphX: “GraphX: Graph Processing in a Distributed Dataflow Framework” GraphProc: RDF: “Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration” (Optional)
Serverless Computing
4/11	24 (12)	OpenLambda	Serverless: OpenLambda: “Serverless Computation with OpenLambda” Runtime: Weld: “Weld: A Common Runtime for High Performance Data Analytics” (Optional)
4/16	25 (13)	Power outage
Video
4/18	26 (13)	Serverless Video	Video: TinyThreads: “Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads” Video: Chameleon: “Chameleon: Scalable Adaptation of Video Analytics” (Optional) Video: SVE: “SVE: Distributed Video Processing at Facebook Scale” (Optional)	Video: TinyThreads - lchen79 Video: Chameleon - srawat5
Load Balancing
4/23	27 (14)	Load Balancing	Execution: LoadBalancing1: “Ananta: Cloud Scale Load Balancing” (Optional) Execution: LoadBalancing2: “Duet: Cloud Scale Load Balancing with Hardware and Software” (Optional) Execution: LoadBalancing3: “SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs” (Optional)
Approximation
4/25	28 (14)	BlinkDB	Approx: BlinkDB: “BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data” Approx: BlinkML: “BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees” (Optional)
Hardware Accelerators
4/30	29 (15)	Offloads	Offload: iPipe: “iPipe: A Framework for Building Datacenter Applications Using In-networking Processors” (Optional) Offload: Access: “Direct Universal Access: Making Data Center Resources Available to FPGA” Hardware: Catapult: “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services” (Optional)
5/2	30 (15)	RDMA	RDMA: FaRM: “Fast Remote Memory” RDMA: eRPC: “Datacenter RPCs can be General and Fast” (Optional) RDMA: FastNetworks: “Remote Memory in the Age of Fast Networks” (Optional)	RDMA: eRPC - sgadho2