Course Overview
This advanced graduate-level course covers the key aspects of modern datacenter networking. Students will explore how modern datacenter networks differ from traditional Internet architectures, with an emphasis on cutting-edge technologies, practical applications, and ongoing research. The course is designed to equip students to engage with state-of-the-art networking research and practice.
Office Hours and Contact
- Instructor: Balajee Vamanan
- Office Hours: Friday 3–4 PM in SEO 1310, or by appointment (arranged via email).
- Email Policy: Use Piazza for questions/discussions about lectures, papers, and projects. Email is reserved for personal matters only.
Course Goals
- Familiarize students with state-of-the-art networking research, specifically in datacenters.
- Practice reading and critiquing research papers.
- Develop skills for reproducing research results.
Resources
- Class Webpage: https://www.550.cs.uic.edu
- Piazza: https://piazza.com/uic/fall2024/cs550/home
- Blackboard: For grades and other official communications.
- Book: Lecture notes and papers
Modality of the Class
Each class will focus on the discussion of 1-2 research papers. Students are expected to read and critique papers before class, and most of the lecture time will be dedicated to in-depth discussion.
Prerequisites
- CS 450 or equivalent: Students must be familiar with basic networking concepts and be comfortable with coding and debugging.
- Project Work: The course involves a significant project component, where students will code and benchmark their work.
Grading Breakdown (Tentative)
- Class Participation: 10%
- Paper Presentations: 30% (2-3 papers per student)
- Paper Critiques: 20%
- Project: 40%
  - Proposal (1-2 pages): 10%
  - Presentation (15 mins + 5 mins Q&A): 10%
  - Report (10-12 pages): 10%
  - Demo (10 mins): 10%
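As a rough illustration of how the tentative weights above combine, the following Python sketch computes a weighted final score. The component names, the 0-100 score scale, and the sample scores are assumptions made for illustration; they are not course policy.

```python
# Tentative weights from the grading breakdown (percent of the final grade).
# The project's 40% is split into its four 10% components.
WEIGHTS = {
    "participation": 10,
    "presentations": 30,
    "critiques": 20,
    "project_proposal": 10,
    "project_presentation": 10,
    "project_report": 10,
    "project_demo": 10,
}


def final_score(scores):
    """Return the weighted final score on a 0-100 scale.

    `scores` maps each component name to a score between 0 and 100
    (an assumed scale; the syllabus does not specify one).
    """
    assert sum(WEIGHTS.values()) == 100, "weights must total 100%"
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


# Hypothetical scores, for illustration only.
example = {
    "participation": 90,
    "presentations": 80,
    "critiques": 85,
    "project_proposal": 100,
    "project_presentation": 70,
    "project_report": 80,
    "project_demo": 90,
}
print(final_score(example))
```

Because the breakdown is marked tentative, the weights are kept in a single dictionary so they can be updated in one place if the percentages change.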
Topics and Schedule (Tentative)
Course Outline
Week 1: Introduction to Datacenter Networks
- Overview of datacenter architecture
- Evolution of datacenter networks
- Key challenges in datacenter networking
Week 2: Datacenter Topologies
- Traditional three-tier architecture
- Clos networks and fat-tree topologies
- Emerging topologies (e.g., DCell, BCube)
Week 3: Datacenter Network Protocols
- TCP/IP in datacenter environments
- RDMA (Remote Direct Memory Access)
- Datacenter TCP (DCTCP)
Week 4: Software-Defined Networking (SDN) in Datacenters
- SDN architecture and principles
- OpenFlow and other SDN protocols
- SDN controllers for datacenters
Week 5: Network Virtualization
- Network overlays (VXLAN, NVGRE, STT)
- Network Function Virtualization (NFV)
- Virtual Network Functions (VNFs) in datacenters
Week 6: Load Balancing in Datacenters
- Layer 4 vs. Layer 7 load balancing
- Software vs. hardware load balancers
- Advanced load balancing algorithms
Week 7: Datacenter Traffic Engineering
- Flow scheduling
- Multipath routing (ECMP, MPTCP)
- Traffic prediction and optimization
Week 8: Quality of Service (QoS) in Datacenters
- QoS requirements for different applications
- QoS mechanisms (traffic shaping, policing, marking)
- End-to-end QoS in multi-tenant environments
Week 9: Network Security in Datacenters
- Threat models for datacenter networks
- Firewalls and Intrusion Detection/Prevention Systems (IDS/IPS)
- Microsegmentation and zero-trust networking
Week 10: Datacenter Interconnects
- Intra-datacenter connectivity
- Inter-datacenter networking
- Software-defined WANs (SD-WAN) for datacenter interconnection
Week 11: Network Monitoring and Telemetry
- Network monitoring tools and techniques
- Streaming telemetry
- Network analytics and machine learning for anomaly detection
Week 12: Energy Efficiency in Datacenter Networks
- Green networking techniques
- Energy-aware routing and scheduling
- Power management in network devices
Week 13: Datacenter Network Performance
- Performance metrics and benchmarking
- Latency and throughput optimization
- Congestion control mechanisms
Week 14: Emerging Technologies in Datacenter Networking
- Programmable data planes (P4)
- Optical networking in datacenters
- Silicon photonics and co-packaged optics
Week 15: Cloud-Native Networking
- Container networking (e.g., Kubernetes networking)
- Service mesh architectures
- Network automation and Infrastructure as Code (IaC)
Week 16: Future Trends and Research Directions
- AI/ML-driven network management
- Satellite networks and their impact on datacenter connectivity
- Edge computing and its impact on datacenter networks
How to Succeed in this Class
- Keep up with the class materials and participate in discussions.
- Read and critique papers before each class.
- Prepare your presentations thoroughly.
- Start your project early and maintain regular progress.
Papers
- How to read a paper: pdf
Module 1: Topology
- (Sep 3) VL2: A Scalable and Flexible Data Center Network: pdf
- Reference: PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric: pdf
- (Sep 5) A Scalable, Commodity Data Center Network Architecture: pdf
- Reference: Original fat-tree paper: pdf
- (Sep 10) Jellyfish: Networking Data Centers Randomly: pdf
Module 2: Datacenter Transport
- (Sep 12) Datacenter TCP (DCTCP): pdf
Module 3: Software-defined Networking
- (Sep 19) A Clean Slate 4D Approach to Network Control and Management: pdf
Module 4: Load Balancing
- (Sep 26) Design, implementation and evaluation of congestion control for multipath TCP: pdf
- (Oct 1) Presto: Edge-based Load Balancing for Fast Datacenter Networks: pdf
Module 5: Network Algorithmics
- (Oct 17) Neural Packet Classification: pdf
Module 6: Communication Abstractions, Collective Communication, and ML
- (Oct 24) Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem: pdf
- (Oct 31) EyeQ: Practical Network Performance Isolation at the Edge: pdf
- (Nov 5) Application-Driven Bandwidth Guarantees in Datacenters: pdf
- (Nov 7) TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches: link
- (Nov 12) MCCS: A Service-based Approach to Collective Communication for Multi-Tenant Cloud: link
- (Nov 14) Efficient sparse collective communication and its application to accelerate distributed deep learning: link
- (Nov 19) SOAR: minimizing network utilization with bounded in-network computing: link
- (Nov 21) Towards Domain-Specific Network Transport for Distributed DNN Training: link
- (Nov 26) In-Network Aggregation with Transport Transparency for Distributed Training: link
- (Dec 3) Training Job Placement in Clusters with Statistical In-Network Aggregation: link