HIT · CS Concentrations

COURSE · AI4

Scalable AI: Big-Data Algorithms

Hebrew title (translated): AI at Scale: Algorithms for Big Data

The algorithmic theory of distributed training and of approximate and streaming computation

Train and serve AI at massive data and compute scale

Year 3 · 13 weeks · 2h lecture + 2h practice · Project-based

About this course

Learn algorithms and systems for machine learning at scale, where datasets and models exceed a single machine.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build a scalable training and serving pipeline in Python with PyTorch Distributed, DeepSpeed, Apache Spark, and Kafka. The pipeline layers data and pipeline parallelism, ZeRO sharding, sketch-based approximate aggregation, and streaming ingestion, and is measured for throughput and cost.

Expected outcomes

  • Analyze the complexity and communication costs of distributed algorithms
  • Derive data, model, and pipeline parallelism for large-scale training
  • Explain synchronous and asynchronous stochastic gradient descent and their convergence
  • Build big-data pipelines with MapReduce and dataflow frameworks
  • Implement approximate algorithms including sketches, hashing, and sampling
  • Design streaming algorithms with bounded memory over unbounded data
  • Quantify scaling laws relating model loss to data, parameters, and compute
  • Apply mixed precision, sharding, and gradient accumulation for memory efficiency
  • Evaluate throughput, latency, and cost trade-offs in distributed systems
  • Deploy a scalable training and serving pipeline on a cluster

Key topics

  • Distributed training
  • Big-data frameworks
  • Approximate & streaming algorithms
  • Scaling laws

Theoretical foundations

The concepts and results this course rests on.

  • Amdahl's law and the memory, compute, and communication walls
  • the MapReduce model, dataflow graphs, and fault tolerance
  • synchronous stochastic gradient descent and gradient all-reduce
  • model, pipeline, and tensor parallelism with ZeRO sharding
  • count-min sketch, HyperLogLog, and locality-sensitive hashing
  • the streaming model and the Johnson-Lindenstrauss lemma
  • neural scaling laws and compute-optimal resource allocation

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

  • Machine Learning
  • Algorithms and data structures
  • Distributed systems or databases basics

Weekly schedule · 13 weeks · lecture + practice

Scaling foundations
Wk 1
Why scale and what breaks
Lecture: We analyze the limits of single-machine computation, Amdahl's law, and the memory, compute, and communication walls.
Practice: Profile a single-node training job and identify its bottlenecks.
Project: Choose the scalable pipeline target and establish a single-node baseline.
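
As a small illustration of the Week 1 lecture topic, Amdahl's law can be written as a two-line function. This is a minimal sketch, not course material: the function names `amdahl_speedup` and `speedup_limit` are hypothetical, and the model assumes the parallel part scales perfectly with no communication cost.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work parallelizes perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial work is unchanged; parallel work is divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without bound."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction
```

Under these assumptions, a job that is 95% parallel can never run more than about 20 times faster, however many workers are added, which is why profiling the serial part of a baseline matters before scaling out.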
Wk 2
Big-data frameworks
Lecture: We cover the MapReduce model, dataflow graphs, partitioning, and fault tolerance.
Practice: Implement a MapReduce-style aggregation and run it on a Spark cluster.
Project: Build a distributed data-ingestion stage for the project.
Wk 3
Distributed storage and data formats
Lecture: We cover columnar formats, sharding, partitioning, and the CAP trade-offs of data-intensive systems.
Practice: Convert the dataset to a columnar format and benchmark scan throughput.
Project: Optimize the project data layout for parallel access.
Distributed training
Wk 4
Data-parallel training
Lecture: We derive synchronous SGD and gradient all-reduce, and analyze the effect of large batch sizes on convergence.
Practice: Run data-parallel training across multiple workers and measure scaling efficiency.
Project: Make the training stage data-parallel.
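
The idea behind synchronous data-parallel SGD can be sketched on a single machine without any cluster. The example below is an illustrative simulation only: the "workers" are plain lists, `all_reduce_mean` is a hypothetical stand-in for a real gradient all-reduce, and the model is a one-parameter linear fit with loss 0.5 * (w*x - y)^2. It assumes equal-sized shards, in which case the averaged shard gradients equal the full-batch gradient.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # (x, y)


def local_gradient(w: float, shard: Sequence[Example]) -> float:
    """Mean gradient of 0.5 * (w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every worker would receive this mean."""
    return sum(values) / len(values)


def sync_sgd_step(w: float, shards: List[Sequence[Example]], lr: float) -> float:
    """One synchronous step: local gradients, average, identical update everywhere."""
    grads = [local_gradient(w, shard) for shard in shards]
    return w - lr * all_reduce_mean(grads)


def train(shards: List[Sequence[Example]], lr: float, steps: int, w0: float = 0.0) -> float:
    w = w0
    for _ in range(steps):
        w = sync_sgd_step(w, shards, lr)
    return w


# Synthetic data y = 3x, split round-robin across 4 simulated workers.
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[i::4] for i in range(4)]
w_final = train(shards, lr=0.02, steps=200)
```

Because every worker applies the same averaged gradient, all replicas stay identical after each step; in a real job the averaging would be done by a collective communication call rather than a Python `sum`.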
Wk 5
Model and pipeline parallelism · Presentation
Lecture: We cover model sharding, pipeline parallelism, tensor parallelism, and the bubble overhead they incur.
Practice: Team presentation: each team defends its scaling specification and target metrics.
Project: Lock the specification and prototype a sharded model stage.
Wk 6
Memory and precision efficiency
Lecture: We cover mixed precision, activation checkpointing, ZeRO sharding, and gradient accumulation.
Practice: Apply mixed precision and sharding to fit a larger model in memory.
Project: Scale the model size with memory-efficient training.
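
To see why sharding lets a larger model fit, a back-of-the-envelope estimate of per-GPU model-state memory is useful. The sketch below assumes the accounting commonly used for ZeRO with mixed-precision Adam: 2 bytes per parameter for fp16 weights, 2 for fp16 gradients, and 12 for fp32 optimizer state (master weights, momentum, variance). The function name is hypothetical, and activations, temporary buffers, and fragmentation are ignored, so real usage will be higher.

```python
def zero_model_state_bytes(num_params: float, num_gpus: int, stage: int) -> float:
    """Approximate per-GPU bytes for weights, gradients, and optimizer state.

    stage 0: plain data parallelism (everything replicated)
    stage 1: optimizer state sharded across GPUs
    stage 2: optimizer state and gradients sharded
    stage 3: optimizer state, gradients, and weights sharded
    """
    if stage not in (0, 1, 2, 3):
        raise ValueError("stage must be 0, 1, 2, or 3")
    if num_gpus < 1:
        raise ValueError("num_gpus must be >= 1")
    weights = 2.0 * num_params     # fp16 parameters
    grads = 2.0 * num_params       # fp16 gradients
    optimizer = 12.0 * num_params  # fp32 master weights + Adam moments
    if stage >= 1:
        optimizer /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        weights /= num_gpus
    return weights + grads + optimizer


# Example: a 7.5-billion-parameter model on 64 GPUs, in gigabytes (1e9 bytes).
per_stage_gb = [zero_model_state_bytes(7.5e9, 64, s) / 1e9 for s in range(4)]
```

With these assumptions the replicated baseline needs 120 GB of model state per GPU, while full sharding on 64 GPUs needs under 2 GB, which is the gap the Week 6 practice session explores empirically.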
Approximate algorithms
Wk 7
Sketches and hashing
Lecture: We derive count-min sketch, HyperLogLog, and locality-sensitive hashing with their error bounds.
Practice: Implement a count-min sketch and HyperLogLog and validate accuracy versus memory.
Project: Add approximate aggregation to the data pipeline.
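
For orientation before the Week 7 practice session, here is a minimal count-min sketch. It is a sketch under stated assumptions rather than a reference implementation: the class and method names are hypothetical, and salted BLAKE2b from the standard library stands in for the pairwise-independent hash family that the formal analysis assumes. With width ceil(e/epsilon) and depth ceil(ln(1/delta)), the standard bound says an estimate exceeds the true count by at most epsilon times the total count with probability at least 1 - delta, and it never falls below the true count when all updates are non-negative.

```python
import hashlib
import math
from typing import List


class CountMinSketch:
    """Approximate frequency counter that never underestimates a count."""

    def __init__(self, width: int, depth: int) -> None:
        if width < 1 or depth < 1:
            raise ValueError("width and depth must be positive")
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    @classmethod
    def from_error(cls, epsilon: float, delta: float) -> "CountMinSketch":
        """Choose dimensions from the target error and failure probability."""
        width = math.ceil(math.e / epsilon)
        depth = max(1, math.ceil(math.log(1.0 / delta)))
        return cls(width, depth)

    def _index(self, item: str, row: int) -> int:
        # One salted hash per row (assumption: good enough for illustration).
        digest = hashlib.blake2b(
            item.encode("utf-8"), digest_size=8, salt=row.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Increment one counter per row."""
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item: str) -> int:
        """Take the minimum over rows, the least-collided counter."""
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))


sketch = CountMinSketch.from_error(epsilon=0.01, delta=0.01)
for _ in range(5):
    sketch.add("a")
sketch.add("b", 3)
```

Memory is fixed at width times depth counters regardless of how many distinct items arrive, which is the accuracy-versus-memory trade-off the practice session asks teams to validate.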
Wk 8
Sampling and dimensionality reduction · Presentation
Lecture: We cover reservoir sampling, random projections, and the Johnson-Lindenstrauss lemma.
Practice: Team presentation: interim demo of the scaled pipeline with throughput numbers.
Project: Add sampling and random projection to reduce data volume.
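
Reservoir sampling, named in the Week 8 lecture, keeps a uniform random sample of fixed size from a stream whose length is not known in advance. The following is a minimal sketch of the classic single-pass variant (often called Algorithm R); the function name and the optional `rng` parameter are illustrative choices, not part of any course-provided API.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return k items chosen uniformly at random from a stream, using O(k) memory."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-indexed) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(10_000), k=10, rng=random.Random(0))
```

If the stream holds fewer than k items, the function simply returns all of them. Passing a seeded `random.Random` makes a run reproducible, which helps when benchmarking the project pipeline.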
Streaming
Wk 9
Streaming algorithms
Lecture: We define the streaming model, bounded memory over unbounded input, and windowed computation.
Practice: Build a streaming aggregation over a simulated event stream.
Project: Add a streaming ingestion path to the pipeline.
Wk 10
Online and approximate learning
Lecture: We cover online gradient descent, regret bounds, and incremental model updates.
Practice: Implement online learning that updates the model as data streams in.
Project: Enable continuous online updates in the project.
Scaling laws
Wk 11
Scaling laws and compute budgets
Lecture: We study empirical scaling laws relating loss to data, parameters, and compute, and derive the compute-optimal allocation.
Practice: Fit a small scaling-law curve from runs at several model sizes.
Project: Use scaling-law analysis to choose the project compute budget.
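
One simple way to approach the Week 11 practice task is to fit a pure power law, loss = a * N^(-alpha), by ordinary least squares in log-log space. The sketch below makes two simplifying assumptions that should be stated up front: it ignores any irreducible loss term, and the data points are synthetic values generated from a known power law purely to check the fit, not measurements from real training runs. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], losses: Sequence[float]) -> Tuple[float, float]:
    """Fit loss = a * size**(-alpha) by least squares on (log size, log loss).

    Returns (a, alpha).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two (size, loss) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(loss) = log(a) - alpha * log(size), so alpha is the negated slope.
    return math.exp(intercept), -slope


# Synthetic check: points generated from loss = 5 * N^(-0.1).
model_sizes = [1e6, 1e7, 1e8, 1e9]
synthetic_losses = [5.0 * n ** -0.1 for n in model_sizes]
a_hat, alpha_hat = fit_power_law(model_sizes, synthetic_losses)
```

On real runs the points will be noisy and may flatten at large sizes, in which case a model with an added constant term and a nonlinear fit would likely be more appropriate than this straight-line fit.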
Serving
Wk 12
Scalable inference and serving
Lecture: We cover batching, caching, autoscaling, and latency-throughput-cost trade-offs at serving time.
Practice: Deploy the model behind a scalable serving layer and load-test it.
Project: Stand up scalable serving for the trained model.
Capstone
Wk 13
Final defense · Presentation
Lecture: We synthesize distributed training, approximate and streaming algorithms, and scaling laws, and survey open problems.
Practice: Team presentation: final demo with scaling benchmarks and an oral defense of design choices.
Project: Deliver the complete scalable training-and-serving pipeline with benchmarks.
AI tools in this course.

Students use AI assistants to generate and refactor PyTorch Distributed and DeepSpeed launch scripts, Spark and Kafka pipeline code, and sketch-algorithm implementations, vibe-coding the all-reduce and ZeRO-sharded training stages. They prompt AI to write count-min sketch and HyperLogLog tests, synthesize streaming event data, and draft autoscaling and serving configs. AI also helps interpret throughput profiles, scaling-efficiency curves, and cost logs to locate the bottleneck a job hit.

Student project

Teams build one scalable AI pipeline that ingests, processes, trains on, and serves a large dataset, growing from a single node to a distributed cluster. The project layers in data parallelism, memory-efficient training, approximate and streaming algorithms, and scaling-law analysis, with throughput and cost measured at every stage.

Requirements

  • Build a working system, not a set of disconnected exercises.
  • Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
  • Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
  • Carry one running project from specification to a deployed, defensible result across the whole term.
  • Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

  • Distributed image classifier training
  • Large-scale recommendation system
  • Streaming clickstream analytics
  • Approximate near-duplicate detection at web scale
  • Distributed embedding indexing service
  • Real-time fraud-detection pipeline
  • Petabyte log aggregation and alerting
  • Compute-optimal language model pretraining study

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component | What it covers | Weight
Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20%
Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30%
Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50%

Role in each concentration

Concentration | Role
Intelligent Software Systems | Elective
Networking & Cyber Security | Elective
AI & Robotics | Core · Semester 1
AI and Quantum Computing for Finance | Elective
Immersive Systems & Game Development | Elective
Defense Technologies & Autonomous Systems | Elective