Key | Papers | Student Email(s) |
---|---|---|
Architecture: Storage | The Hadoop Distributed File System, Shvachko et al, MSST, 2010. | [] |
Storage: FDS | Flat Datacenter Storage. Nightingale et al, OSDI, 2012. | [sjamal7, psingh56] |
Storage: Bigtable | Bigtable: A Distributed Storage System for Structured Data. Chang et al, OSDI, 2006. | [spani2, bjain6] |
Storage: Dynamo | Dynamo: Amazon’s Highly Available Key-value Store. DeCandia et al, SOSP, 2007. | [sjamal7, lchen79] |
Storage: Spanner | Spanner: Google’s Globally-Distributed Database. Corbett et al, OSDI, 2012. | [psingh56, lchen79] |
Storage: MemcachedFacebook | Scaling Memcache at Facebook. Nishtala et al, NSDI, 2013. | [sjamal7, psingh56] |
Execution: MR | MapReduce: Simplified Data Processing on Large Clusters, Dean and Ghemawat, OSDI, 2004. | [spani2, bjain6] |
Execution: Dryad | Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks. Isard et al, EuroSys, 2007. | [psingh56, dporte7, srawat5] |
Execution: CIEL | CIEL: a universal execution engine for distributed data-flow computing. Murray et al, NSDI, 2011. | [psingh56] |
Execution: DryadLINQ | DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language. Yu et al, OSDI, 2008. | [kshind2, wtoher2] |
ResourceNeg: YARN | Apache Hadoop YARN: Yet Another Resource Negotiator, Vavilapalli et al, SOCC, 2013. | [cmonta9, avenka35] |
ResourceNeg: Borg | Large-scale cluster management at Google with Borg. Verma et al, EuroSys, 2015. | [lchen79] |
ResourceNeg: Mesos | Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, Hindman et al, NSDI, 2011. | [wtoher2] |
ResourceNeg: DRF | Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, Ghodsi et al, NSDI, 2011. | [psingh56] |
Scheduling: Packing | Altruistic Scheduling in Multi-Resource Clusters (Carbyne). Grandl et al, OSDI, 2016. | [psingh56] |
Scheduling: Packing | Quincy: Fair Scheduling for Distributed Computing Clusters. Isard et al, SOSP, 2009. | [psingh56] |
Execution: Spark | Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, Zaharia et al, NSDI, 2012. | [lchen79] |
Execution: LoadBalancing1 | Ananta: Cloud Scale Load Balancing. Patel et al, SIGCOMM, 2013. | [sjamal7] |
Execution: LoadBalancing2 | Duet: Cloud Scale Load Balancing with Hardware and Software. Gandhi et al, SIGCOMM, 2014. | [sjamal7] |
Execution: LoadBalancing3 | SilkRoad: Making Stateful Layer-4 Load Balancing Fast and Cheap Using Switching ASICs, Miao et al, SIGCOMM, 2017. | [sgadho2, dporte7, srawat5] |
SQL: SparkSQL | Spark SQL: Relational Data Processing in Spark, Armbrust et al, SIGMOD, 2015. | [spani2, bjain6, sgadho2] |
SQL: Hive | Major technical advancements in Apache Hive, Huai et al, SIGMOD, 2014. | [spani2, bjain6, sgadho2] |
SQL: Impala | Impala: A Modern, Open-Source SQL Engine for Hadoop. Kornacker et al, CIDR, 2015. | [avenka35] |
SQL: Trill | Trill: A High-Performance Incremental Query Processor for Diverse Analytics. Chandramouli et al, VLDB, 2014. | [sgadho2] |
GeoDistributed: Clarinet | Clarinet: WAN-Aware Optimization for Analytics Queries, Viswanathan et al, OSDI, 2016. | [sgadho2] |
Streaming: Storm | Storm @Twitter, Toshniwal et al, SIGMOD, 2014. | [wtoher2] |
Streaming: Heron | Twitter Heron: Stream Processing at Scale, Kulkarni et al, SIGMOD, 2015. | [wtoher2] |
Streaming: FacebookStreaming | Realtime Data Processing at Facebook. Chen et al, SIGMOD, 2016. | [cmonta9] |
Streaming: SparkStreaming | Discretized Streams: Fault-Tolerant Streaming Computation at Scale, Zaharia et al, SOSP, 2013. | [sgadho2] |
Streaming: Flink | Apache Flink: Stream and Batch Processing in a Single Engine, Carbone et al, Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 2015. | [wtoher2] |
Streaming: Drizzle | Drizzle: Fast and Adaptable Stream Processing at Scale. Venkataraman et al, SOSP, 2017. | [sgadho2] |
Streaming: Gloss | Gloss: Seamless Live Reconfiguration and Reoptimization of Stream Programs. Rajadurai et al, ASPLOS, 2018. | [wtoher2] |
QMS: Kafka | Kafka: a Distributed Messaging System for Log Processing, Kreps et al, NetDB Workshop, 2011. Also read this comparison of widely used Queuing Messaging Processing Systems. | [lchen79, cmonta9] |
Streaming: rStreams | StreamScope: Continuous Reliable Distributed Processing of Big Data Streams, Lin et al, NSDI, 2016. | [sgadho2] |
Streaming: Dataflow | The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing. Akidau et al, VLDB, 2015. | [spani2, bjain6, avenka35] |
Streaming: Scaling | Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. Kalavri et al, OSDI, 2018. | [dporte7, srawat5, kshind2] |
GraphProc: Pregel | Pregel: A System for Large-Scale Graph Processing, Malewicz et al, SIGMOD, 2010. | [dporte7] |
GraphProc: TAO | TAO: Facebook’s Distributed Data Store for the Social Graph. Bronson et al, USENIX ATC, 2013. | [srawat5, kshind2] |
GraphProc: PowerGraph | PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, Gonzalez et al, OSDI, 2012. | [dporte7, srawat5, kshind2] |
GraphProc: GraphX | GraphX: Graph Processing in a Distributed Dataflow Framework, Gonzalez et al, OSDI, 2014. | [avenka35] |
GraphProc: RDF | Fast and Concurrent RDF Queries with RDMA-based Distributed Graph Exploration. Shi et al, OSDI, 2016. | [avenka35] |
GraphProc: Facebook | One Trillion Edges: Graph Processing at Facebook-Scale. Ching et al, VLDB, 2015. | [spani2, bjain6, cmonta9] |
Social: FacebookAnalytics | Data warehousing and analytics infrastructure at Facebook. Thusoo et al, SIGMOD, 2010. | [spani2, sjamal7, bjain6] |
Social: FacebookPhoto | Finding a needle in Haystack: Facebook’s photo storage. Beaver et al, OSDI, 2010. | [spani2, sjamal7, cmonta9] |
Social: Unicorn | Unicorn: A System for Searching the Social Graph. Curtiss et al, VLDB, 2013. | [bjain6] |
Monitor: Scuba | Scuba: Diving into Data at Facebook. Abraham et al, VLDB, 2013. | [lchen79] |
Video: SVE | SVE: Distributed Video Processing at Facebook Scale. Huang et al, SOSP, 2017. | [cmonta9] |
Runtime: Weld | Weld: A Common Runtime for High Performance Data Analytics, Palkar et al, CIDR, 2017. | [] |
Serverless: OpenLambda | Serverless Computation with OpenLambda. Hendrickson et al, HotCloud, 2016. | [sjamal7, lchen79, avenka35] |
Approx: BlinkML | BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees. Park et al, SIGMOD, 2019. | [kshind2, wtoher2] |
RDMA: FaRM | FaRM: Fast Remote Memory. Dragojevic et al, NSDI, 2014. | [kshind2, wtoher2] |
RDMA: FastNetworks | Remote Memory in the Age of Fast Networks. Aguilera et al, SoCC, 2017. | [] |
ML: Facebook | Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective, Hazelwood et al, HPCA, 2018. | [cmonta9, avenka35] |
ML: TensorFlow | TensorFlow: A System for Large-Scale Machine Learning, Abadi et al, OSDI, 2016. | [lchen79, avenka35] |
ML: TPU | In-Datacenter Performance Analysis of a Tensor Processing Unit. Jouppi et al, ISCA, 2017. | [cmonta9, dporte7, srawat5] |
Offload: iPipe | iPipe: A Framework for Building Datacenter Applications Using In-networking Processors. Liu et al, 2018. | [dporte7, srawat5, kshind2] |
Offload: Access | Direct Universal Access: Making Data Center Resources Available to FPGA. Shu et al, NSDI, 2019. | [dporte7, srawat5, kshind2] |
Overview
Every student must complete 8 paper reviews in total. Each posting should follow the template below. Before the paper is presented in class (i.e., before 12:29pm on the day of class), your group must post its review to the Course Discussion Website (Piazza).
Specifically, it must contain:
- A one or two sentence summary of the paper
- A description of the problem the authors were trying to solve and why it is an important problem.
- A summary of the contributions of the paper. What is the hypothesis of the work? What is the proposed solution, and what key insight guides their solution?
- What did the paper evaluate and was it appropriate?
- What is one (or more) drawback or limitation of the proposal, and how would you improve it?
- At least one thing about the paper you would like to discuss in class
The writeup should not be (significantly) more than a page in length. Late write-ups will receive a zero grade.
Advice
The reading schedule for this course will be intense. Even though not every student is required to submit a review for every paper, every student is expected to read every paper.
When reading and discussing each paper (on Piazza), you are encouraged to consider the following questions:
- What problem are the authors trying to solve?
- Why was the problem important?
- Why was the problem not solved by earlier work?
- What is the authors' solution to the problem?
- How does their approach solve the problem?
- How is the solution unique and innovative?
- What are the details of their solution?
- How do the authors evaluate their solution?
- What specific questions do they answer?
- What simplifying assumptions do they make?
- What is their methodology?
- What are the strengths and weaknesses of their solution?
- What is left unknown?
- What do you think?
- Is the problem still important?
- Did the authors solve the stated problem?
- Did the authors adequately demonstrate that they solved the problem?
- What future work does this research point to?
You should be prepared to discuss these questions in class. For each paper I will ask for a volunteer to summarize and address a few of these questions in class.
Here are a few links to advice on reading papers:
- Efficient reading of papers in science and technology by Michael J. Hanson
- The Many Faces of Systems Research – And How to Evaluate Them by Aaron Brown, Anupam Chanda, Rik Farrow, Alexandra Fedorova, Petros Maniatis, and Michael Scott
- Advice on understanding related work by Jennifer Mankoff
- Research Papers and Review Considerations by David Wetherall
Writeup Grading
What I’m looking for:
- Does the review include all sections (summary, problem, contributions, flaws, topic question)?
- Are all assertions backed up? (E.g., “X is a bad idea” is not acceptable, but “X is a bad idea because Y” is acceptable.)
- Is the review concise? The summary should be a few sentences and give the essence of the design in the paper, not the problem. (E.g., “This paper is about how to build a multiprocessor operating system” is not acceptable, but “This paper is about building a multiprocessor operating system by layering abstractions that mask the existence of multiple processors” is acceptable)
- Did the student understand the material? Are there factual flaws in the review? For example, if the paper defines a term, does the student use it appropriately? As another example, if students state that a paper is relevant because modern operating systems do things the same way, is that true?
- Did the student consider whether the evaluation is sufficient? Does it show that the work doesn’t harm regular programs, even if it works well for some programs? Do they evaluate all the goals for the system?
Assigning grades:
- If the review does an excellent job on all of the above considerations, and provides genuinely insightful comments about the problem, contributions (going beyond what the paper claims are contributions), evaluation, and confusions, it should receive a check plus
- If two or more of the criteria are not met, the review should receive a check minus
- Otherwise, it should receive a check. A check plus is worth 1 point, a check is 3/4 point, a check minus is 1/2 point, and not turning a review in is worth zero points.
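As a worked example with hypothetical numbers: a student whose eight reviews earn five check-pluses, two checks, and one check-minus would receive 5 × 1 + 2 × 3/4 + 1 × 1/2 = 7 out of a possible 8 points.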