Overview
Cloud data center systems provide services like mail, search, maps, and ridesharing that have become ubiquitous in modern life. This class will introduce students to key concepts and the state of the art in the design of cloud data center systems. The topics covered include big data systems, batch and streaming analytics, and online latency-sensitive applications like search. After covering the basics of the modern hardware and software infrastructure that these systems leverage, we will explore the design and implementation of the systems themselves from the ground up.
In this course, students will get hands-on experience with these systems via CloudLab. The homeworks will require students to install and experiment with different open-source cloud systems. To support both undergraduate and graduate students, this class can be taken for either 3 or 4 credit hours. In the 4 credit hour version for graduate students, students are expected to form small groups and collaborate on a project related to data center systems.
TOPICS COVERED
- Cluster architecture, Storage, Scheduling and Resource Management
- Big Data stacks: Hadoop, YARN
- More big data: Spark, Spark SQL, Hive
- Stream analytics: Spark Streaming
- Graph processing: GraphProc, GraphX
- Search and Social: Solr and Facebook
- Role of serverless platforms: OpenLambda
- Approximation: BlinkDB
- Machine learning frameworks: ParamServ, Tensorflow
As part of covering these topics, student groups will be expected to give a presentation (covered below). These presentations will be on papers selected by student groups from the Reading List.
Prerequisites
Both undergraduate and graduate students are welcome to take this class, and this course is intended to be accessible to students with a general background in computer science. Programming skill in a language like Java or Python is required. Before taking this class, a student must have completed CS 342.
Peer Instruction
This course will be taught using Peer-Instruction, a teaching model which places stronger emphasis on classroom discussion and student interaction.
Evaluation
Grades are curved based on an aggregate course score. The course grade weighting is:
Task | Assignment Track (% of total grade) | Research Track (% of total grade) |
---|---|---|
Paper Reviews (8 total) | 20 | 20 |
Paper Presentation and Class Participation | 15 | 15 |
Homeworks (~2 total) | 30 | 30 |
Final Exam | 35 | - |
Final Project | - | 35 |
Final Project Task | % of total grade |
---|---|
Research Proposal | 10% |
Mid-Semester Checkpoint | 10% |
Final Report | 15% |
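
As a rough illustration of how the weights in the tables above combine, here is a minimal sketch that computes an aggregate course score before any curve is applied. It assumes every component is scored on a 0-100 scale, which this page does not state; the names `WEIGHTS` and `aggregate_score` are hypothetical and are not part of any course tooling. The curve itself is not modeled.

```python
# Weights (percent of total grade) copied from the grade weighting table.
# Assumption: every component score is on a 0-100 scale.
WEIGHTS = {
    "assignment": {
        "paper_reviews": 20,
        "presentation_and_participation": 15,
        "homeworks": 30,
        "final_exam": 35,
    },
    "research": {
        "paper_reviews": 20,
        "presentation_and_participation": 15,
        "homeworks": 30,
        "final_project": 35,  # proposal 10 + checkpoint 10 + final report 15
    },
}


def aggregate_score(track, component_scores):
    """Return the weighted (uncurved) course score for one student."""
    weights = WEIGHTS[track]
    missing = set(weights) - set(component_scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Scores are not clamped, since bonus participation may exceed the maximum.
    return sum(weights[name] * component_scores[name] for name in weights) / 100.0


example = {
    "paper_reviews": 90,
    "presentation_and_participation": 80,
    "homeworks": 85,
    "final_exam": 70,
}
print(aggregate_score("assignment", example))  # 80.0
```

In this example the weighted sum is (20·90 + 15·80 + 30·85 + 35·70) / 100 = 80.0. Final letter grades depend on the curve, so this number is only an input to grading, not a grade.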
Scheduled Activities
There are scheduled activities in this course, and not every student will have the same deadline. Some activities in this course are individual, while others are to be completed in groups. Individual activities include homeworks and reviews. The group activities are presentations and projects. Note that groups for presentations and projects may be different and that only students enrolled in the 4 credit hour version need to join a research group.
- Declare your presentation paper preferences by January 18th, 2019.
- Declare your individual paper review preferences by January 18th, 2019.
- Declare your research group’s membership by January 25th, 2019.
After the group membership deadline, we will form groups from the remaining students and then notify you of your group assignment. Similarly, after the review preference deadline, we will assign paper reviews to you if you have not submitted paper review preferences.
Paper Presentation
The course will be conducted as a seminar. In most classes, two students will present as a group. Each presentation group will be assigned to present a paper at least once during the semester. Presentations should be planned to last 20 minutes if uninterrupted. However, presenters should expect questions and interruptions throughout.
In the presentation, you should:
- Motivate the paper and provide background.
- Present the high level idea, approach, and/or insight (using examples, whenever appropriate).
- Discuss technical details so that the audience can understand the key details without carefully reading the paper.
- Explain the difference between this paper and related work.
- Raise questions throughout the presentation to generate discussion.
The slides for a presentation must be sent via Piazza to the instructor team at least 24 hours prior to the corresponding class. You should use this template for making your slides in PowerPoint.
See the presentations page for a more complete description of the presentation expectations and grading.
Paper Reviews
Each student will also be assigned to write 8 paper reviews. The paper review assigned to a student may not be the same paper they presented.
The paper reviews page details the contents of a paper review.
The paper review must be sent to the instructor team via Piazza before the paper is presented in class (12:29pm). Late reviews will not be counted. You should use this template for writing your review. Allocate enough time for your reading, discuss as a group, write the review carefully, and be sure to include key observations and questions to start a class discussion.
Although you do not have to write summaries/reviews for each paper, you are required to read every paper. Being able to critically judge others’ work is crucial for your understanding.
Participation
You are expected to attend all lectures (you may skip up to 2 lectures for legitimate reasons) and, more importantly, to participate in class discussions.
CLASS PARTICIPATION
Participation is an incredibly important facet of this course. The baseline Class Participation grade will be based on participating in classroom discussion questions. Your class participation grade can exceed the maximum through exceptional participation. Additional points are a bonus reserved for substantial contributions, entirely at the instructor’s discretion. Exceptional participation includes early reports of errors in assignments, helpful discussion on Piazza, contribution of helpful code to the common good of the class (e.g. test cases and/or testing scripts), and thoughtful discussions during lecture.
HOMEWORKS
Homeworks will consist of approximately 2-3 programming projects with duration between two and four weeks. Be sure to consult the online handout or the course discussion website if you have any questions.
Extra credit will not be awarded for early turn-ins. Zero credit will be given in any of the following cases:
- No assignment submitted.
- An assignment submitted after the due date without notifying the TA beforehand.
- An assignment submitted after the due date, after you’ve used your two late submissions.
- An assignment submitted more than one week after the original due date.
Extra credit will be given in the following cases at the professor’s discretion:
- Documentation Contributions: An ideal documentation contribution would be a markdown file (e.g., README.md) that gives a step-by-step guide for setting up and running any programs and experiments.
- Infrastructure Contributions: These include Vagrant images, Ansible playbooks, CloudLab experiments and images, and other configuration and installation scripts. Ideally, such contributions are also well documented.
HOMEWORK LATE POLICY
All assignments are due on the published due date. You have a total of 3 slip days without penalty. To use them, you must tell the professor via the course discussion website, before the assignment is originally due, how many slip days you are using, and you must also clearly indicate the number of slip days used in the assignment write-up.
FINAL PROJECT
If your group chooses the research track, you will have to complete substantive work on an instructor-approved problem and make an original contribution. Surveys are not permitted as projects; instead, each project must contain a survey of background and related work. You must meet the following milestones (unless otherwise specified in future announcements) to ensure a high-quality project at the end of the semester:
- Turn in a 2-page draft proposal (including references) by April 1st, 2019. Remember to include the names and UIC email addresses of the group members.
- Keep revising your initial idea and incorporate instructor feedback. However, your team and project proposal must be finalized and approved on or before April 5th, 2019.
- Each group must submit a 4-page mid-semester progress report and present mid-semester progress during class hours in the week of April 19th, 2019.
- Each group must present their final results during a presentation or poster session on TODO.
- Each group must turn in an 8-page final report and your code via email on or before 11:59PM CST on May 10th, 2019. The report must be submitted as a PDF file, with formatting similar to that of the papers you’ve read in the class. Both the report and the self-contained (i.e., include ALL dependencies) code must be submitted via GitHub Classroom. In each repo, the directory containing the code must include a README file with a step-by-step guide on how to compile and run the provided code.
Students will be required to form groups of 3-4 and to work together to complete a final project for this course. Final projects will be picked from a list compiled by the professor. The principal goal of this project is to create some new measurement or experiment related to the topic of data center networks and applications. To this end, every project is expected to generate at least 1 figure.
The project will be graded on the following different aspects:
- Write-up: The motivation, methodology, and results of every project must be detailed in a project write-up, either as a PDF or as a Markdown website (e.g., Jekyll). In addition to describing the project, this write-up should also discuss the implications of the project.
- Intellectual Contributions: Each project should ideally have some key intellectual contribution. Examples include new algorithms and novel measurements.
- Infrastructure: A project should generate infrastructure artifacts. These artifacts include source code, experimental scripts, and CloudLab experiments and images.
- Documentation: A project should be documented such that any other student in this class could be expected to repeat the experiments.
As with the homeworks, extra credit will be awarded for exceptional documentation contributions and infrastructure contributions.
ACADEMIC INTEGRITY
Consulting with your classmates on assignments is encouraged, except where noted. However, turn-ins are individual, and copying code from your classmates or other sources is considered plagiarism. For example, given the question “how did you do X?”, a great response would be “I used function Y, with W as the second argument. I tried Z first, but it doesn’t work.” An inappropriate response would be “here is my code, look for yourself.” You should never look at someone else’s code, or show someone else your code.
Unless stated otherwise, all work submitted for grading must be done individually. To avoid suspicion of plagiarism, you must specify your sources together with all turned-in materials. List the classmates you discussed your homework with and the webpages from which you got inspiration or copied (short) code snippets. Plagiarism and cheating, such as copying the work of others or paying others to do your work, are prohibited and will be reported. We will be running MOSS, an automated plagiarism detection tool, on all submissions.
In particular, note that you are guilty of academic dishonesty if you extend or receive any kind of unauthorized assistance. Absolutely no transfer of program code between students is permitted (paper or electronic), and you may not solicit code from family, friends, or online forums. Other examples of academic dishonesty include emailing your program to another student, copying-pasting code from the internet, working in a group on a homework assignment, and allowing a tutor, TA, or another individual to write an answer for you.
Academic dishonesty is unacceptable, and cheating has consequences on two levels: the consequences for your grade, and the consequences at the university level. Within class, even a first instance of cheating on a programming assignment or problem set will result in failing the class. At the university level, misconduct may even lead to expulsion from the university; cases are handled via the official student conduct process described in the University’s policy. I report all academic integrity violations to the dean of students. This may result in a formal hearing where the dean of students decides on the institutional consequences. After multiple instances of academic integrity violations, students may be suspended or expelled. In all cases, the student has the option to go through a formal hearing if they think that they did not actually violate the academic integrity policy. If the dean of students agrees that they did not, then I restore their original grade, and the matter is resolved.