Key Papers [Student Email(s)]
Architecture: Storage The Hadoop Distributed File System. Shvachko et al., MSST, 2010. []
Storage: FDS Flat Datacenter Storage. Nightingale et al., OSDI, 2012. [sjamal7, psingh56]
Storage: Bigtable Bigtable: A Distributed Storage System for Structured Data. Chang et al., OSDI, 2006. [spani2, bjain6]
Storage: Dynamo Dynamo: Amazon’s Highly Available Key-value Store. DeCandia et al., SOSP, 2007. [sjamal7, lchen79]
Storage: Spanner Spanner: Google’s Globally-Distributed Database. Corbett et al., OSDI, 2012. [psingh56, lchen79]
Storage: MemcachedFacebook Scaling Memcache at Facebook. Nishtala et al., NSDI, 2013. [sjamal7, psingh56]
Execution: MR MapReduce: Simplified Data Processing on Large Clusters. Dean and Ghemawat, OSDI, 2004. [spani2, bjain6]
Execution: Dryad Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Isard et al., EuroSys, 2007. [psingh56, dporte7, srawat5]
Execution: CIEL CIEL: A Universal Execution Engine for Distributed Data-Flow Computing. Murray et al., NSDI, 2011. [psingh56]
Execution: DryadLINQ DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Yu et al., OSDI, 2008. [kshind2, wtoher2]
ResourceNeg: YARN Apache Hadoop YARN: Yet Another Resource Negotiator. Vavilapalli et al., SoCC, 2013. [cmonta9, avenka35]
ResourceNeg: Borg Large-scale Cluster Management at Google with Borg. Verma et al., EuroSys, 2015. [lchen79]
ResourceNeg: Mesos Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. Hindman et al., NSDI, 2011. [wtoher2]
ResourceNeg: DRF Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. Ghodsi et al., NSDI, 2011. [psingh56]
Scheduling: Packing Altruistic Scheduling in Multi-Resource Clusters (Carbyne). Grandl et al., OSDI, 2016. [psingh56]
Scheduling: Packing Quincy: Fair Scheduling for Distributed Computing Clusters. Isard et al., SOSP, 2009. [psingh56]
Execution: Spark Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing. Zaharia et al., NSDI, 2012. [lchen79]
Execution: LoadBalancing1 Ananta: Cloud Scale Load Balancing. Patel et al., SIGCOMM, 2013. [sjamal7]
Execution: LoadBalancing2 Duet: Cloud Scale Load Balancing with Hardware and Software. Gandhi et al., SIGCOMM, 2014. [sjamal7]
Execution: LoadBalancing3 SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs. Miao et al., SIGCOMM, 2017. [sgadho2, dporte7, srawat5]
SQL: SparkSQL Spark SQL: Relational Data Processing in Spark. Armbrust et al., SIGMOD, 2015. [spani2, bjain6, sgadho2]
SQL: Hive Major Technical Advancements in Apache Hive. Huai et al., SIGMOD, 2014. [spani2, bjain6, sgadho2]
SQL: Impala Impala: A Modern, Open-Source SQL Engine for Hadoop. Kornacker et al., CIDR, 2015. [avenka35]
SQL: Trill Trill: A High-Performance Incremental Query Processor for Diverse Analytics. Chandramouli et al., VLDB, 2014. [sgadho2]
GeoDistributed: Clarinet Clarinet: WAN-Aware Optimization for Analytics Queries. Viswanathan et al., OSDI, 2016. [sgadho2]
Streaming: Storm Storm @Twitter. Toshniwal et al., SIGMOD, 2014. [wtoher2]
Streaming: Heron Twitter Heron: Stream Processing at Scale. Kulkarni et al., SIGMOD, 2015. [wtoher2]
Streaming: FacebookStreaming Realtime Data Processing at Facebook. Chen et al., SIGMOD, 2016. [cmonta9]
Streaming: SparkStreaming Discretized Streams: Fault-Tolerant Streaming Computation at Scale. Zaharia et al., SOSP, 2013. [sgadho2]
Streaming: Flink Apache Flink: Stream and Batch Processing in a Single Engine. Carbone et al., Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015. [wtoher2]
Streaming: Drizzle Drizzle: Fast and Adaptable Stream Processing at Scale. Venkataraman et al., SOSP, 2017. [sgadho2]
Streaming: Gloss Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs. Rajadurai et al., ASPLOS, 2018. [wtoher2]
QMS: Kafka Kafka: A Distributed Messaging System for Log Processing. Kreps et al., NetDB Workshop, 2011. Also read this comparison of widely used queuing/messaging systems. [lchen79, cmonta9]
Streaming: rStreams StreamScope: Continuous Reliable Distributed Processing of Big Data Streams. Lin et al., NSDI, 2016. [sgadho2]
Streaming: Dataflow The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Akidau et al., VLDB, 2015. [spani2, bjain6, avenka35]
Streaming: Scaling Three Steps Is All You Need: Fast, Accurate, Automatic Scaling Decisions for Distributed Streaming Dataflows. Kalavri et al., OSDI, 2018. [dporte7, srawat5, kshind2]
GraphProc: Pregel Pregel: A System for Large-Scale Graph Processing. Malewicz et al., SIGMOD, 2010. [dporte7]
GraphProc: TAO TAO: Facebook’s Distributed Data Store for the Social Graph. Bronson et al., USENIX ATC, 2013. [srawat5, kshind2]
GraphProc: PowerGraph PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs. Gonzalez et al., OSDI, 2012. [dporte7, srawat5, kshind2]
GraphProc: GraphX GraphX: Graph Processing in a Distributed Dataflow Framework. Gonzalez et al., OSDI, 2014. [avenka35]
GraphProc: RDF Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. Shi et al., OSDI, 2016. [avenka35]
GraphProc: Facebook One Trillion Edges: Graph Processing at Facebook-Scale. Ching et al., VLDB, 2015. [spani2, bjain6, cmonta9]
Social: FacebookAnalytics Data Warehousing and Analytics Infrastructure at Facebook. Thusoo et al., SIGMOD, 2010. [spani2, sjamal7, bjain6]
Social: FacebookPhoto Finding a Needle in Haystack: Facebook’s Photo Storage. Beaver et al., OSDI, 2010. [spani2, sjamal7, cmonta9]
Social: Unicorn Unicorn: A System for Searching the Social Graph. Curtiss et al., VLDB, 2013. [bjain6]
Monitor: Scuba Scuba: Diving into Data at Facebook. Abraham et al., VLDB, 2013. [lchen79]
Video: SVE SVE: Distributed Video Processing at Facebook Scale. Huang et al., SOSP, 2017. [cmonta9]
Runtime: Weld Weld: A Common Runtime for High Performance Data Analytics. Palkar et al., CIDR, 2017. []
Serverless: OpenLambda Serverless Computation with OpenLambda. Hendrickson et al., HotCloud, 2016. [sjamal7, lchen79, avenka35]
Approx: BlinkML BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees. Park et al., SIGMOD, 2019. [kshind2, wtoher2]
RDMA: FaRM FaRM: Fast Remote Memory. Dragojevic et al., NSDI, 2014. [kshind2, wtoher2]
RDMA: FastNetworks Remote Memory in the Age of Fast Networks. Aguilera et al., SoCC, 2017. []
ML: Facebook Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective. Hazelwood et al., HPCA, 2018. [cmonta9, avenka35]
ML: TensorFlow TensorFlow: A System for Large-Scale Machine Learning. Abadi et al., OSDI, 2016. [lchen79, avenka35]
ML: TPU In-Datacenter Performance Analysis of a Tensor Processing Unit. Jouppi et al., ISCA, 2017. [cmonta9, dporte7, srawat5]
Offload: iPipe iPipe: A Framework for Building Datacenter Applications Using In-networking Processors. Liu et al., 2018. [dporte7, srawat5, kshind2]
Offload: Access Direct Universal Access: Making Data Center Resources Available to FPGA. Shu et al., NSDI, 2019. [dporte7, srawat5, kshind2]

Overview

Every student must complete 8 paper reviews in total. Before the paper is presented in class (i.e., before 12:29pm on the day of class), your group must post its review to the Course Discussion Website (Piazza). Your posting should be based on the following template.

Specifically, it must contain:

  1. A one- or two-sentence summary of the paper.
  2. A description of the problem the authors were trying to solve, and why it is an important problem.
  3. A summary of the contributions of the paper. What is the hypothesis of the work? What is the proposed solution, and what key insight guides their solution?
  4. An assessment of what the paper evaluated and whether the evaluation was appropriate.
  5. One (or more) drawback or limitation of the proposal, and how you would improve it.
  6. At least one thing about the paper you would like to discuss in class.

The write-up should not be (significantly) more than a page in length. Late write-ups will receive a zero grade.

Advice

The reading schedule for this course will be intense. Even though not every student is required to submit a review for every paper, every student is expected to read every paper.

When reading and discussing each paper (on Piazza), you are encouraged to consider the following questions:

  • What problem are the authors trying to solve?
    • Why was the problem important?
    • Why was the problem not solved by earlier work?
  • What is the authors’ solution to the problem?
    • How does their approach solve the problem?
    • How is the solution unique and innovative?
    • What are the details of their solution?
  • How do the authors evaluate their solution?
    • What specific questions do they answer?
    • What simplifying assumptions do they make?
    • What is their methodology?
    • What are the strengths and weaknesses of their solution?
    • What is left unknown?
  • What do you think?
    • Is the problem still important?
    • Did the authors solve the stated problem?
    • Did the authors adequately demonstrate that they solved the problem?
    • What future work does this research point to?

You should be prepared to discuss these questions in class. For each paper I will ask for a volunteer to summarize and address a few of these questions in class.

Here are a few links to advice on reading papers:

Writeup Grading

What I’m looking for:

  • Does the review include all sections (summary, problem, contributions, evaluation, drawbacks, discussion topic)?
  • Are all assertions backed up? (E.g., “X is a bad idea” is not acceptable, but “X is a bad idea because Y” is acceptable.)
  • Is the review concise? The summary should be a few sentences and give the essence of the design in the paper, not the problem. (E.g., “This paper is about how to build a multiprocessor operating system” is not acceptable, but “This paper is about building a multiprocessor operating system by layering abstractions that mask the existence of multiple processors” is acceptable)
  • Did the student understand the material? Are there factual flaws in the review? For example, if the paper defines a term, does the student use it appropriately? As another example, if students state that a paper is relevant because modern operating systems do things the same way, is that true?
  • Did the student consider whether the evaluation is sufficient? Does it show that the work doesn’t harm regular programs, even if it works well for some programs? Do they evaluate all the goals for the system?

Assigning grades:

  • If the review does an excellent job on all five considerations, and provides genuinely insightful comments about the problem, contributions (going beyond what the paper claims are contributions), evaluation, and confusions, it should receive a check plus.
  • If two or more of the five criteria are not met, the review should receive a check minus.
  • Otherwise, it should receive a check. A check plus is worth 1 point, a check is worth 3/4 point, a check minus is worth 1/2 point, and not turning in a review is worth zero points.
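
For example (a hypothetical scenario, not part of the policy itself): a student whose 8 reviews earn 3 check pluses, 4 checks, and 1 check minus would score 3 × 1 + 4 × 3/4 + 1 × 1/2 = 6.5 out of a possible 8 points.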