SE5: Engineering of AI Systems

About this course

Apply DevOps discipline to machine-learning and LLM systems, automating training, deployment, monitoring, and continuous delivery of models in production.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build, deploy, and operate a production AI service end to end: a containerized, CI/CD-gated REST API on Kubernetes, fed by a medallion data pipeline, with an MLflow model registry, a gateway-fronted RAG feature, live drift monitoring, and a security review against the OWASP LLM Top 10.

Expected outcomes

Explain why production AI systems fail more often in operations than in modelling, and define SLIs, SLOs, and error budgets.
Use cloud compute, storage, and networking, and choose a deployment model from IaaS to serverless.
Design and operate a CI/CD pipeline with tests, containers, infrastructure as code, and a versioned REST API.
Run services under an orchestrator with health checks, rollout patterns, and observability based on RED dashboards.
Build trustworthy data pipelines with a medallion lake, quality gates, data contracts, and dataset versioning.
Operate the model lifecycle: experiment tracking, a model registry, serving, monitoring, and drift detection.
Build and operate RAG services behind gateways, with evaluation suites, tracing, and guardrails.
Reason about the LLM token economy and engineer for cost and latency.
Build, trace, and bound tool-using agents, and apply AI security and governance.
Carry one running service from specification through to a governed production deployment.

Key topics

CI/CD pipelines
Model serving & versioning
Monitoring & drift detection
LLMOps & agent operations

Theoretical foundations

The concepts and results this course rests on.

Service-level objectives, error budgets, and reliability theory
The medallion data architecture, data contracts, and dataset versioning
The reproducibility triple: code version, data version, and environment
Data drift versus concept drift and statistical detection (PSI, KS tests)
Retrieval-augmented generation and grounded prompting
The agent loop, function calling, and step-level evaluation
Software, data, and model supply-chain security and the OWASP LLM Top 10

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

Machine Learning
Software engineering and Python
Operating systems and networking

Weekly schedule: 13 weeks · lecture + practice

Part I: Foundations & the Cloud

Wk 1

Production Engineering & the Ops Landscape

Lecture: The prototype-to-production gap and the 90/10 inversion; SLIs, SLOs, SLAs, and error budgets; toil, day-one vs day-two operations, blameless postmortems, and the five operational layers.

Practice: Set up a team repository from a template with branch protection and containerise a hello-service with pinned versions.

Project: Create the team repo, containerise the hello-service, and shortlist two use-case domains.

Watch: MLOps Zoomcamp 1.1: Introduction (DataTalksClub) · MLOps Zoomcamp 1.5: MLOps Maturity Model (DataTalksClub)
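The error-budget arithmetic named in the Week 1 lecture can be sketched in a few lines. This is an illustrative sketch, not course-supplied code: the function name `error_budget`, the request-based availability SLI, and the example numbers are all assumptions made for the example.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise a request-based availability error budget for one window.

    Assumption: the SLI is the fraction of successful requests, and the SLO
    is a target for that fraction (for example 0.999).
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    if total_requests <= 0 or failed_requests < 0:
        raise ValueError("total_requests must be positive and failed_requests non-negative")

    sli = 1.0 - failed_requests / total_requests
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return {
        "sli": sli,
        "allowed_failures": allowed_failures,
        "budget_consumed": failed_requests / allowed_failures,
        "slo_met": sli >= slo_target,
    }


# Hypothetical window: one million requests, 250 failures, 99.9% target.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
```

With these hypothetical numbers the SLO tolerates about 1,000 failures, so 250 failures consume roughly a quarter of the budget. A team could use the remaining fraction to decide whether a risky rollout is affordable in the current window.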

Wk 2

Cloud Computing Fundamentals

Lecture: Cloud primitives (compute, storage, networking); IaaS, PaaS, SaaS, and serverless; regions and availability zones; the shared-responsibility model; the cost model and blast radius.

Practice: Provision a cloud footprint on a free tier with a budget alert, deploy the hello-service in two deployment models, and tear it down.

Project: Create the team cloud space with a budget alert and provision the project storage bucket accessed from code.

Watch: MLOps Zoomcamp 1.2: Environment Preparation on AWS (DataTalksClub)

Part II: DevOps

Wk 3

CI/CD, Testing & REST Services

Lecture: DORA metrics and the testing pyramid; trunk-based development; infrastructure as code with Terraform (desired state, idempotency); REST design, API versioning, and health and readiness endpoints.

Practice: Build a CI pipeline that lints, tests, builds, and publishes an artifact, and design a versioned REST API skeleton.

Project: Commit to the project use case; REST API skeleton with two endpoints, a health check, validation, and tests; CI gating every merge.

Watch: Complete Terraform Course, Beginner to Pro (DevOps Directive)

Wk 4

Orchestration, Deployment Patterns & Observability

Lecture: Kubernetes desired-state and reconciliation; deployment patterns (blue-green, canary) and GitOps; the three pillars of observability, the RED method, and reasoning about tail latency with percentiles.

Practice: Run the service under an orchestrator with probes and scaling, execute a canary rollout and rollback, and build a RED dashboard.

Project: Deploy with probes and three replicas, demonstrate canary plus rollback, and record a baseline p95 on a RED dashboard.

Watch: Kubernetes Tutorial for Beginners, Full Course in 4 Hours (TechWorld with Nana) · MLOps Zoomcamp 4.1: Three Ways of Deploying a Model (DataTalksClub)
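To make the Week 4 point about tail latency concrete, the sketch below computes percentiles with the nearest-rank method and compares them with the mean. The helper name `percentile` and the latency samples are hypothetical. A real dashboard would usually derive percentiles from histogram buckets rather than raw samples.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of data at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 90 fast requests and 10 slow ones.
latencies_ms = [20.0] * 90 + [500.0] * 10

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p95_ms = percentile(latencies_ms, 95)
```

In this made-up sample the mean is 68 ms and the median is 20 ms, yet the p95 is 500 ms. One request in ten is slow, and only the percentile shows it, which is why the project baseline is recorded as a p95 rather than an average.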

Part III: DataOps

Wk 5

Data Lakes, Pipelines & Versioning · Presentation

Lecture: Warehouse vs lake vs lakehouse; the medallion architecture (bronze, silver, gold); orchestration and idempotency; data versioning and lineage.

Practice: Student Presentation 1 (Specification): each team presents the problem statement and success metrics (SLOs), the system and data architecture, DevOps status, and a risk and governance register, then submits a written report and a tagged release.

Project: An orchestrated pipeline with retries and backfills, and a versioned dataset reproduced from a pinned snapshot.

Watch: MLOps Zoomcamp 1.3: Reading Parquet Files (DataTalksClub)

Wk 6

Data Quality, Contracts, Streaming & Feature Stores

Lecture: Validation as code; data contracts between producer and consumer; streaming with Kafka (topics, partitions, consumer groups, at-least-once delivery, idempotent consumers); feature stores and train/serve skew.

Practice: Add validation gates from bronze to silver with a quarantine table, enforce a data contract, and land a live stream into the lake.

Project: Validation gates with quarantine, an enforced data contract, and gold feature tables built on one shared definition.

Watch: What is Apache Kafka? Confluent Lightboard (Tim Berglund) · MLOps Zoomcamp 5.7: Data Quality Monitoring (DataTalksClub)
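The pairing of at-least-once delivery with idempotent consumers from the Week 6 lecture can be illustrated without a broker. The sketch below is a plain-Python stand-in, not Kafka client code. The class `IdempotentConsumer`, the event fields, and the in-memory set of seen IDs are assumptions chosen for illustration.

```python
class IdempotentConsumer:
    """Apply each event at most once, even if the transport redelivers it."""

    def __init__(self) -> None:
        self.seen_ids = set()   # assumption: in production this would be a durable store
        self.totals = {}        # running sum of values per device

    def handle(self, event: dict) -> bool:
        """Return True if the event was applied, False if it was a duplicate."""
        event_id = event["event_id"]
        if event_id in self.seen_ids:
            return False
        device = event["device"]
        self.totals[device] = self.totals.get(device, 0.0) + event["value"]
        self.seen_ids.add(event_id)
        return True


# Hypothetical stream in which event e1 is delivered twice.
deliveries = [
    {"event_id": "e1", "device": "pump-1", "value": 2.0},
    {"event_id": "e2", "device": "pump-1", "value": 3.0},
    {"event_id": "e1", "device": "pump-1", "value": 2.0},  # redelivery
]

consumer = IdempotentConsumer()
applied = [consumer.handle(event) for event in deliveries]
```

The redelivered event is skipped, so the total stays at 5.0 rather than 7.0. In a real pipeline the deduplication record and the side effect would need to be committed together, otherwise a crash between the two steps could still produce a duplicate.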

Part IV: MLOps

Wk 7

Experiment Tracking, Model Registry & Serving

Lecture: The reproducibility triple (git SHA, data version, environment); experiment tracking; the model registry and model cards; serving patterns (online, batch, streaming) and safe rollout (shadow, canary, A/B).

Practice: Instrument training with tracking, register a model with a model card, and serve it behind REST with a safe rollout.

Project: Tracked training with the reproducibility triple pinned; model v1 registered and served; v2 canaried against v1.

Watch: MLOps Zoomcamp 2.3: Experiment Tracking with MLflow (DataTalksClub) · MLOps Zoomcamp 2.5: Model Registry (DataTalksClub) · MLOps Zoomcamp 4.2: Deploying Models with Flask and Docker (DataTalksClub)

Wk 8

Monitoring, Model Drift & Governance · Presentation

Lecture: Data drift vs concept drift and how to detect each (PSI, KS tests, embedding distance); retraining triggers paired with documented actions; audit trails and governance.

Practice: Student Presentation 2 (Interim): teams demonstrate live: a working pipeline; a tracked and versioned model in the registry; the model served with a canary and a live RED dashboard; a change landing through CI/CD during the talk; and a monitoring and drift plan. They then submit a report and a tagged release.

Project: Drift detectors each paired with a documented action, and retraining triggers defined.

Watch: MLOps Zoomcamp 5.1: Intro to ML Monitoring (DataTalksClub) · MLOps Zoomcamp 5.4: Evidently Metrics and Data Drift (DataTalksClub)
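One of the drift statistics named in the Week 8 lecture, the population stability index, is small enough to sketch directly. For binned fractions e_i (baseline) and a_i (live), PSI is the sum over bins of (a_i - e_i) * ln(a_i / e_i). The function name `psi`, the equal-width binning on the baseline range, and the epsilon floor for empty bins are assumptions of this sketch. Monitoring libraries may bin differently.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of `actual` against the `expected` baseline."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clip out-of-range values
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / len(values), eps) for count in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical feature: a uniform baseline and the same values shifted upward.
baseline = [i / 100 for i in range(1000)]
shifted = [value + 3.0 for value in baseline]

no_drift = psi(baseline, baseline)
drift = psi(baseline, shifted)
```

Identical samples give a PSI of zero, and the shifted sample gives a large value. A commonly quoted rule of thumb treats values above roughly 0.25 as a significant shift, but that threshold is a convention rather than a statistical guarantee, so each detector should still be paired with a documented action as the project requires.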

Part V: LLMOps & AgentOps

Wk 9

LLM Foundations: AI APIs, Tokens & the Token Economy

Lecture: Next-token prediction and tokenization; API anatomy and structured outputs; the token economy and its cost levers (shorter context, prompt caching, batching); managed AI services.

Practice: Call an AI API with structured outputs and schema validation, build a token-economy cost model, and add a retry-with-backoff wrapper.

Project: Wire an LLM feature through the course proxy with structured outputs, a cost-per-request and monthly projection, and a pinned model version.

Watch: Let's Build the GPT Tokenizer (Andrej Karpathy)
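The cost-per-request and monthly projection asked for in Week 9 reduce to simple arithmetic over token counts. The sketch below assumes per-million-token pricing with separate input and output rates. The function names, prices, and traffic figures are hypothetical placeholders, not real provider prices.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one request when input and output tokens are priced separately."""
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


def monthly_cost_usd(cost_per_request: float, requests_per_day: int, days: int = 30) -> float:
    """Project one request's cost over a month of steady traffic."""
    return cost_per_request * requests_per_day * days


# Hypothetical prices: 3 USD per million input tokens, 15 USD per million output tokens.
baseline = request_cost_usd(2_000, 500, 3.0, 15.0)
# The "shorter context" lever: trim the prompt from 2,000 to 800 input tokens.
trimmed = request_cost_usd(800, 500, 3.0, 15.0)

baseline_month = monthly_cost_usd(baseline, requests_per_day=10_000)
trimmed_month = monthly_cost_usd(trimmed, requests_per_day=10_000)
```

Under these assumed prices the baseline request costs 0.0135 USD and the trimmed one 0.0099 USD, which is 4,050 USD against 2,970 USD per month at 10,000 requests a day. The same model extends to prompt caching or batching by adding a discounted rate for the affected tokens.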

Wk 10

RAG & Serving LLMs: Vector Databases & Gateways

Lecture: Retrieval-augmented generation end to end (embeddings, vector databases, chunking, grounded prompts); prompts as versioned code; hosted vs self-hosted serving with vLLM; the gateway pattern.

Practice: Build a RAG service over a real corpus and route all LLM traffic through a gateway with fallbacks and a budget cap.

Project: The project's RAG or extraction pipeline runs behind the gateway with fallback and a budget cap.

Watch: Production RAG with LangChain and Vector Databases, Full Course (freeCodeCamp)
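The gateway pattern from Week 10, with fallbacks and a budget cap, can be sketched independently of any particular product. The code below is not the LiteLLM API. The `Gateway` class, the provider callables that return a `(text, cost_usd)` pair, and the fake providers are all hypothetical stand-ins.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the gateway's spending cap has been reached."""


class Gateway:
    """Route every completion through one place: ordered fallbacks plus a spend cap."""

    def __init__(self, providers, budget_usd: float) -> None:
        self.providers = providers      # ordered list of (name, callable) pairs
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def complete(self, prompt: str):
        # The cap is checked before the call, so the final request may overshoot slightly.
        if self.spent_usd >= self.budget_usd:
            raise BudgetExceeded(f"spent {self.spent_usd:.2f} of {self.budget_usd:.2f} USD")
        failures = []
        for name, call in self.providers:
            try:
                text, cost_usd = call(prompt)
            except Exception as exc:    # any provider error triggers the next fallback
                failures.append(f"{name}: {exc}")
                continue
            self.spent_usd += cost_usd
            return name, text
        raise RuntimeError("all providers failed: " + "; ".join(failures))


def flaky_primary(prompt: str):
    """Hypothetical provider that always times out."""
    raise TimeoutError("upstream timed out")


def steady_fallback(prompt: str):
    """Hypothetical provider that echoes the prompt at a fixed cost."""
    return f"echo: {prompt}", 0.02


gateway = Gateway([("primary", flaky_primary), ("fallback", steady_fallback)], budget_usd=0.03)
first = gateway.complete("hello")
second = gateway.complete("again")
```

Both calls are served by the fallback because the primary fails. After the second call the recorded spend exceeds the cap, so a third call raises `BudgetExceeded`. Keeping this logic in one gateway means application code never needs to know which provider answered.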

Wk 11

LLM Evaluation, Guardrails & Observability

Lecture: Evaluation that means something (faithfulness, answer relevance, retrieval recall); LLM-as-judge biases and calibration; LLM tracing and observability; guardrails and prompt injection.

Practice: Build a representative eval set and run it as a regression suite, add request tracing, and demonstrate a guardrail against injection.

Project: An eval set of at least fifty items wired into CI, live tracing of prompt version, tokens, cost, and latency, and one guardrail with a measured response cache.
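Of the Week 11 metrics, retrieval recall is the easiest to run as a regression check in CI. The sketch below assumes each eval item lists the document IDs that should be retrieved. The names `recall_at_k` and `run_suite`, the fake retriever, and the 0.8 threshold are hypothetical choices for illustration.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: list[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each eval item needs at least one relevant document")
    return len(set(retrieved_ids[:k]) & relevant) / len(relevant)


def run_suite(cases: list[dict], retrieve, k: int, threshold: float):
    """Return (mean recall@k, passed) so a CI job can fail on regression."""
    scores = [recall_at_k(retrieve(case["question"]), case["relevant_ids"], k) for case in cases]
    mean = sum(scores) / len(scores)
    return mean, mean >= threshold


# Hypothetical retriever backed by a fixed lookup table instead of a vector database.
fake_index = {
    "how do I reset the pump?": ["d1", "d7", "d3"],
    "what is the warranty period?": ["d9", "d2", "d4"],
}
cases = [
    {"question": "how do I reset the pump?", "relevant_ids": ["d1", "d3"]},
    {"question": "what is the warranty period?", "relevant_ids": ["d5"]},
]

mean_recall, passed = run_suite(cases, fake_index.get, k=3, threshold=0.8)
```

Here the first question retrieves both relevant documents and the second retrieves none, so mean recall is 0.5 and the suite fails the assumed threshold. A CI step could exit non-zero when `passed` is false, which turns the eval set into a merge gate.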

Wk 12

Agents & AgentOps: Tools, MCP & Managed Agents

Lecture: The agent loop (plan, act, observe) and function calling; the Model Context Protocol and tool ecosystem; AgentOps (tracing, step-level evaluation, mandatory bounds); managed agent services.

Practice: Build and trace a tool-using agent, bound it with step, cost, and permission limits, and expose a tool as an MCP server.

Project: One agentic capability with tracing, bounds, and step caps, and at least one tool exposed via MCP.

Watch: Building Agents with the Model Context Protocol, Workshop with Mahesh Murag (Anthropic)
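The Week 12 agent loop and its mandatory bounds can be sketched with a scripted policy in place of a model. Nothing here is an MCP or vendor API. The `run_agent` function, the action dictionaries, the per-call tool cost, and the `lookup` tool are assumptions made for the example.

```python
class BoundExceeded(RuntimeError):
    """Raised when the agent hits a step, cost, or permission limit."""


def run_agent(policy, tools: dict, allowed_tools: set, max_steps: int, max_cost_usd: float):
    """Plan, act, observe until the policy returns a final answer or a bound trips."""
    trace = []          # one record per step, for step-level evaluation
    spent_usd = 0.0
    observation = None
    for step in range(1, max_steps + 1):
        action = policy(observation)                          # plan
        if action["type"] == "final":
            trace.append({"step": step, "final": action["answer"]})
            return action["answer"], trace
        name = action["tool"]
        if name not in allowed_tools:                         # permission bound
            raise BoundExceeded(f"tool {name!r} is not permitted")
        cost_usd = tools[name]["cost_usd"]
        if spent_usd + cost_usd > max_cost_usd:               # cost bound
            raise BoundExceeded("cost cap reached")
        observation = tools[name]["fn"](action["input"])      # act, then observe
        spent_usd += cost_usd
        trace.append({"step": step, "tool": name, "input": action["input"],
                      "observation": observation, "spent_usd": spent_usd})
    raise BoundExceeded("step cap reached")                   # step bound


def scripted_policy(observation):
    """Hypothetical stand-in for a model: call one tool, then answer."""
    if observation is None:
        return {"type": "tool", "tool": "lookup", "input": "pump-1"}
    return {"type": "final", "answer": f"status is {observation}"}


tools = {"lookup": {"fn": lambda device: "healthy", "cost_usd": 0.01}}
answer, trace = run_agent(scripted_policy, tools, {"lookup"}, max_steps=3, max_cost_usd=0.05)
```

The scripted run finishes in two steps and leaves a two-entry trace. A policy that never returns a final answer would instead stop with `BoundExceeded` once the step or cost cap is reached, which is the behaviour the bounds exist to guarantee.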

Part VI: Security & Governance

Wk 13

Security, Governance & Synthesis · Presentation

Lecture: Software, data, and model supply-chain security (secrets, dependency tracking, provenance, SBOM); the OWASP Top 10 for LLM applications; governance (privacy, audit trails, model cards, NIST AI RMF); synthesis of the five layers.

Practice: Student Presentation 3 (Final, with oral defense): teams deliver an end-to-end production demo (data in, decision out, live); observability with an actionable alert and runbook; evaluation, guardrails, and a cost and latency report; and a security and governance review against the OWASP LLM Top 10.

Project: A governed production deployment with an audit trail, and the repository tagged v1.0.

Watch: OWASP Top 10 for LLM Applications 2025 Explained

AI tools in this course

Students treat AI assistants as a core part of the operations toolchain: generating Terraform, CI/CD, and Kubernetes config, refactoring service code, and drafting data-quality and drift checks. They wire LLM features through the course gateway, build RAG and extraction pipelines, and expose project tools as MCP servers that their own agents call under step, cost, and permission bounds. AI is used to generate evaluation sets, synthesize test data, and turn RED dashboards, traces, and drift metrics into concrete actions. Because the course grades exactly what an assistant cannot do for a student (operating under load, reading telemetry, and defending decisions), every generated artifact is traced, evaluated, and reviewed before it ships.

Student project

Teams of three or four carry one running AI service from specification to a governed production deployment, presenting it three times across the term. Grading weights the parts an AI assistant cannot do for the student: operating a system under load, interpreting telemetry, and defending design decisions. Example domains include IoT telemetry, document question-answering, and document processing.

Requirements

Build a working system, not a set of disconnected exercises.
Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
Carry one running project from specification to a deployed, defensible result across the whole term.
Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

Predictive-maintenance IoT monitor
Support-docs Q&A chatbot
Invoice or form processor
Menu or receipt nutrition estimator
Smart-home energy advisor
Code-review assistant
Research-paper summariser
Churn-prediction service

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component	What it covers	Weight
Project · Specification	Presentation 1 (week 5): problem, objectives, and architecture	20%
Project · Interim	Presentation 2 (week 8): the working system demonstrated live	30%
Project · Final	Presentation 3 (week 13): end-to-end demo with oral defense	50%

Tools & platforms

Docker: containerise services with pinned dependencies
Kubernetes: orchestrate, scale, and self-heal services
GitHub Actions: run CI/CD pipelines on every merge
Terraform: provision infrastructure as code
Prometheus & Grafana: collect metrics and build RED dashboards
Apache Airflow: orchestrate data pipelines with retries and backfills
DVC: version datasets and reproduce results
Great Expectations: validate data quality as code
Apache Kafka: stream events into the data lake
MLflow: track experiments and run the model registry
EvidentlyAI: monitor data and model drift
LiteLLM: route LLM traffic through one gateway with fallbacks
FAISS / Qdrant: vector search for retrieval-augmented generation
Langfuse: trace and observe LLM and agent applications
Model Context Protocol: expose tools to agents

Free online courses

Existing free, video-based courses this course can build on, for self-study or as a teaching basis.

YouTube: MLOps Course - Build Machine Learning Production Grade Projects
freeCodeCamp: DevOps for ML, deployment, production workflows
YouTube: MLOps Zoomcamp
DataTalks.Club (free): MLflow, orchestration, deployment, monitoring

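In Hebrew

Campus IL: Introduction to Artificial Intelligence: From Theory to Practice (מבוא לבינה מלאכותית: מתאוריה לפרקטיקה)
Free Hebrew-language video course on machine learning, Ben-Gurion University
YouTube: Introduction to DevOps - Free Online Course (מבוא ל-DevOps - קורס אונליין חינם)
Free Hebrew-language video course on DevOps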

Primary literature

Seminal works to read for graduate-level depth.

Paper: Hidden Technical Debt in Machine Learning Systems
Sculley et al., 2015
Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al., 2020
Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Yao et al., 2023
Paper: Building Effective Agents
Anthropic, 2024
Textbook: Site Reliability Engineering
Beyer, Jones, Petoff, Murphy (eds.), 2016

References

Books and resources link to an online or publisher page.

Textbook: Designing Data-Intensive Applications
Martin Kleppmann, 2017, Foundations of reliable data systems
Textbook: Site Reliability Engineering
Beyer, Jones, Petoff, Murphy (eds.), 2016, Free online; SLOs and error budgets
Textbook: The Site Reliability Workbook
Beyer, Murphy, Rensin, Kawahara, Thorne (eds.), 2018, Practical companion to the SRE book
Documentation: AWS Well-Architected Framework
Amazon Web Services, current, Cloud architecture pillars
Paper: Hidden Technical Debt in Machine Learning Systems
Sculley et al., 2015, NeurIPS 2015
Documentation: MLflow Documentation
MLflow project, current, Tracking and model registry
Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis et al., 2020, The original RAG paper
Documentation: OWASP Top 10 for LLM Applications
OWASP, 2025, LLM application security risks
Documentation: Building Effective Agents
Anthropic, 2024, Agent design patterns

Role in each concentration

Concentration	Role
Intelligent Software Systems	Core · Semester 2
Networking & Cyber Security	Core · Semester 2
AI & Robotics	Core · Semester 2
AI and Quantum Computing for Finance	Elective
Immersive Systems & Game Development	Elective
Defense Technologies & Autonomous Systems	Elective
