Course Overview

This advanced graduate course covers modern datacenter networking. Students will examine how datacenter networks differ from traditional Internet architectures, with an emphasis on current technologies, practical applications, and ongoing research. The course aims to prepare students to engage with state-of-the-art networking research and practice.

Office Hours and Contact

  • Instructor: Balajee Vamanan
  • Office Hours: Friday 3–4 PM at SEO 1310 or by appointment via email.
  • Email Policy: Use Piazza for questions and discussions about lectures, papers, and projects. Email is reserved for personal matters.

Course Goals

  • Familiarize students with state-of-the-art networking research, specifically in datacenters.
  • Practice reading and critiquing research papers.
  • Develop skills for reproducing research results.

Resources

  • Class Webpage: https://www.550.cs.uic.edu
  • Piazza: https://piazza.com/uic/fall2024/cs550/home
  • Blackboard: For grades and other official communications.
  • Book: None required; the course material consists of lecture notes and papers.

Modality of the Class

Each class will focus on the discussion of 1-2 research papers. Students are expected to read and critique papers before class, and most of the lecture time will be dedicated to in-depth discussion.

Prerequisites

  • CS 450 or equivalent: Students must be familiar with basic networking concepts and be comfortable with coding and debugging.
  • Project Work: The course involves a significant project component, where students will code and benchmark their work.

Grading Breakdown (Tentative)

  • Class Participation: 10%
  • Paper Presentations: 30% (2-3 papers per student)
  • Paper Critiques: 20%
  • Project: 40%
    • Proposal (1-2 pages): 10%
    • Presentation (15 mins + 5 mins Q&A): 10%
    • Report (10-12 pages): 10%
    • Demo (10 mins): 10%

Topics and Schedule (Tentative)

Course Outline

Week 1: Introduction to Datacenter Networks

  • Overview of datacenter architecture
  • Evolution of datacenter networks
  • Key challenges in datacenter networking

Week 2: Datacenter Topologies

  • Traditional three-tier architecture
  • Clos networks and fat-tree topologies
  • Emerging topologies (e.g., DCell, BCube)

Week 3: Datacenter Network Protocols

  • TCP/IP in datacenter environments
  • RDMA (Remote Direct Memory Access)
  • Datacenter TCP (DCTCP)

Week 4: Software-Defined Networking (SDN) in Datacenters

  • SDN architecture and principles
  • OpenFlow and other SDN protocols
  • SDN controllers for datacenters

Week 5: Network Virtualization

  • Network overlays (VXLAN, NVGRE, STT)
  • Network Function Virtualization (NFV)
  • Virtual Network Functions (VNFs) in datacenters

Week 6: Load Balancing in Datacenters

  • Layer 4 vs. Layer 7 load balancing
  • Software vs. hardware load balancers
  • Advanced load balancing algorithms

Week 7: Datacenter Traffic Engineering

  • Flow scheduling
  • Multipath routing (ECMP, MPTCP)
  • Traffic prediction and optimization

Week 8: Quality of Service (QoS) in Datacenters

  • QoS requirements for different applications
  • QoS mechanisms (traffic shaping, policing, marking)
  • End-to-end QoS in multi-tenant environments

Week 9: Network Security in Datacenters

  • Threat models for datacenter networks
  • Firewalls and Intrusion Detection/Prevention Systems (IDS/IPS)
  • Microsegmentation and zero-trust networking

Week 10: Datacenter Interconnects

  • Intra-datacenter connectivity
  • Inter-datacenter networking
  • Software-defined WANs (SD-WAN) for datacenter interconnection

Week 11: Network Monitoring and Telemetry

  • Network monitoring tools and techniques
  • Streaming telemetry
  • Network analytics and machine learning for anomaly detection

Week 12: Energy Efficiency in Datacenter Networks

  • Green networking techniques
  • Energy-aware routing and scheduling
  • Power management in network devices

Week 13: Datacenter Network Performance

  • Performance metrics and benchmarking
  • Latency and throughput optimization
  • Congestion control mechanisms

Week 14: Emerging Technologies in Datacenter Networking

  • Programmable data planes (P4)
  • Optical networking in datacenters
  • Silicon photonics and co-packaged optics

Week 15: Cloud-Native Networking

  • Container networking (e.g., Kubernetes networking)
  • Service mesh architectures
  • Network automation and Infrastructure as Code (IaC)
  • AI/ML-driven network management
  • Satellite networks and their impact on datacenter connectivity
  • Edge computing and its impact on datacenter networks

How to Succeed in this Class

  • Keep up with the class materials and participate in discussions.
  • Read and critique papers before each class.
  • Prepare your presentations thoroughly.
  • Start your project early and maintain regular progress.

Papers

  • How to read a paper: pdf

Module 1: Topology

  • (Sep 3) VL2: A Scalable and Flexible Data Center Network: pdf
    • Reference: PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric: pdf
  • (Sep 5) A Scalable, Commodity Data Center Network Architecture: pdf
    • Reference: Original fat-tree paper: pdf
  • (Sep 10) Jellyfish: Networking Data Centers Randomly: pdf

Module 2: Datacenter Transport

  • (Sep 12) Datacenter TCP (DCTCP): pdf

Module 3: Software-defined Networking

  • (Sep 19) A Clean Slate 4D Approach to Network Control and Management: pdf

Module 4: Load Balancing

  • (Sep 26) Design, Implementation and Evaluation of Congestion Control for Multipath TCP: pdf
  • (Oct 1) Presto: Edge-based Load Balancing for Fast Datacenter Networks: pdf

Module 5: Network Algorithmics

  • (Oct 17) Neural Packet Classification: pdf

Module 6: Communication Abstractions, Collective Communication, and ML

  • (Oct 24) Rethinking Machine Learning Collective Communication as a Multi-Commodity Flow Problem: pdf
  • (Oct 31) EyeQ: Practical Network Performance Isolation at the Edge: pdf
  • (Nov 5) Application-Driven Bandwidth Guarantees in Datacenters: pdf
  • (Nov 7) TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches: link
  • (Nov 12) MCCS: A Service-based Approach to Collective Communication for Multi-Tenant Cloud: link
  • (Nov 14) Efficient Sparse Collective Communication and its Application to Accelerate Distributed Deep Learning: link
  • (Nov 19) SOAR: Minimizing Network Utilization with Bounded In-network Computing: link
  • (Nov 21) Towards Domain-Specific Network Transport for Distributed DNN Training: link
  • (Nov 26) In-Network Aggregation with Transport Transparency for Distributed Training: link
  • (Dec 3) Training Job Placement in Clusters with Statistical In-Network Aggregation: link