HIT · CS Concentrations

COURSE · SE5

Engineering of AI Systems

הנדסת מערכות בינה מלאכותית

The operational disciplines that carry AI systems to production (DevOps, DataOps, MLOps, LLMOps, AgentOps), applied to one running project, end to end.

Year 3 · 13 weeks · 2h lecture + 2h practice · Project-based

About this course

Apply DevOps discipline to machine-learning and LLM systems, automating training, deployment, monitoring, and continuous delivery of models in production.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build, deploy, and operate a production AI service end to end: a containerized, CI/CD-gated REST API on Kubernetes, fed by a medallion data pipeline, with an MLflow model registry, a gateway-fronted RAG feature, live drift monitoring, and a security review against the OWASP LLM Top 10.

Expected outcomes

  • Explain why production AI systems fail more often in operations than in modelling, and define SLIs, SLOs, and error budgets.
  • Use cloud compute, storage, and networking, and choose a deployment model from IaaS to serverless.
  • Design and operate a CI/CD pipeline with tests, containers, infrastructure as code, and a versioned REST API.
  • Run services under an orchestrator with health checks, rollout patterns, and observability based on RED dashboards.
  • Build trustworthy data pipelines with a medallion lake, quality gates, data contracts, and dataset versioning.
  • Operate the model lifecycle: experiment tracking, a model registry, serving, monitoring, and drift detection.
  • Build and operate RAG services behind gateways, with evaluation suites, tracing, and guardrails.
  • Reason about the LLM token economy and engineer for cost and latency.
  • Build, trace, and bound tool-using agents, and apply AI security and governance.
  • Carry one running service from specification through to a governed production deployment.

Key topics

  • CI/CD pipelines
  • Model serving & versioning
  • Monitoring & drift detection
  • LLMOps & agent operations

Theoretical foundations

The concepts and results this course rests on.

  • Service-level objectives, error budgets, and reliability theory
  • The medallion data architecture, data contracts, and dataset versioning
  • The reproducibility triple: code version, data version, and environment
  • Data drift versus concept drift and statistical detection (PSI, KS tests)
  • Retrieval-augmented generation and grounded prompting
  • The agent loop, function calling, and step-level evaluation
  • Software, data, and model supply-chain security and the OWASP LLM Top 10

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

  • Machine Learning
  • Software engineering and Python
  • Operating systems and networking

Weekly schedule

13 weeks · lecture + practice

Part I: Foundations & the Cloud
Wk 1
Production Engineering & the Ops Landscape
Lecture: The prototype-to-production gap and the 90/10 inversion; SLIs, SLOs, SLAs, and error budgets; toil, day-one vs day-two operations, blameless postmortems, and the five operational layers.
Practice: Set up a team repository from a template with branch protection and containerise a hello-service with pinned versions.
Project: Create the team repo, containerise the hello-service, and shortlist two use-case domains.
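An error budget is the amount of unreliability an SLO permits: for an availability target t over a window, the budget is (1 - t) of that window. The sketch below works through that arithmetic; the function names and the 30-day default window are illustrative assumptions, not course-supplied code.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction strictly between 0 and 1")
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent, from request counts.

    1.0 means nothing spent, 0.0 means exactly exhausted, negative means overspent.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    print(round(error_budget_minutes(0.999), 1))                    # 99.9% over 30 days
    print(round(budget_remaining(0.999, 999_500, 1_000_000), 2))    # half the budget spent
```

A 99.9% target over 30 days leaves about 43.2 minutes of allowed downtime; 500 failed requests out of one million against the same target spend half of the budget.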
Wk 2
Cloud Computing Fundamentals
Lecture: Cloud primitives (compute, storage, networking); IaaS, PaaS, SaaS, and serverless; regions and availability zones; the shared-responsibility model; the cost model and blast radius.
Practice: Provision a cloud footprint on a free tier with a budget alert, deploy the hello-service in two deployment models, and tear it down.
Project: Create the team cloud space with a budget alert and provision the project storage bucket accessed from code.
Part II: DevOps
Wk 3
CI/CD, Testing & REST Services
Lecture: DORA metrics and the testing pyramid; trunk-based development; infrastructure as code with Terraform (desired state, idempotency); REST design, API versioning, and health and readiness endpoints.
Practice: Build a CI pipeline that lints, tests, builds, and publishes an artifact, and design a versioned REST API skeleton.
Project: Commit to the project use case; REST API skeleton with two endpoints, a health check, validation, and tests; CI gating every merge.
Wk 4
Orchestration, Deployment Patterns & Observability
Lecture: Kubernetes desired-state and reconciliation; deployment patterns (blue-green, canary) and GitOps; the three pillars of observability, the RED method, and reasoning about tail latency with percentiles.
Practice: Run the service under an orchestrator with probes and scaling, execute a canary rollout and rollback, and build a RED dashboard.
Project: Deploy with probes and three replicas, demonstrate canary plus rollback, and record a baseline p95 on a RED dashboard.
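Tail latency is reported with percentiles because an average hides the slow requests that users actually notice. The sketch below computes a percentile with the nearest-rank definition; that choice is an assumption, since dashboards often interpolate or estimate from histogram buckets, and the function name is illustrative.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the interval (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest rank r such that at least p% of samples are <= ordered[r - 1].
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    # Nineteen fast requests and one slow one, in milliseconds.
    latencies_ms = [100.0] * 19 + [2000.0]
    print(sum(latencies_ms) / len(latencies_ms))       # mean
    print(percentile_nearest_rank(latencies_ms, 95))   # p95
    print(percentile_nearest_rank(latencies_ms, 99))   # p99
```

With these twenty samples the mean is 195 ms, the p95 is 100 ms, and the p99 is 2000 ms: the mean describes no real request, while the percentiles show both the typical case and the tail.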
Part III: DataOps
Wk 5
Data Lakes, Pipelines & Versioning · Presentation
Lecture: Warehouse vs lake vs lakehouse; the medallion architecture (bronze, silver, gold); orchestration and idempotency; data versioning and lineage.
Practice: Student Presentation 1 (Specification): each team presents the problem statement and success metrics (SLOs), the system and data architecture, DevOps status, and a risk and governance register, then submits a written report and a tagged release.
Project: An orchestrated pipeline with retries and backfills, and a versioned dataset reproduced from a pinned snapshot.
Wk 6
Data Quality, Contracts, Streaming & Feature Stores
Lecture: Validation as code; data contracts between producer and consumer; streaming with Kafka (topics, partitions, consumer groups, at-least-once delivery, idempotent consumers); feature stores and train/serve skew.
Practice: Add validation gates from bronze to silver with a quarantine table, enforce a data contract, and land a live stream into the lake.
Project: Validation gates with quarantine, an enforced data contract, and gold feature tables built on one shared definition.
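A validation gate with quarantine can be pictured as a function that splits bronze records into those that satisfy the contract and those that do not, keeping the rejected records together with the reasons. The sketch below is a minimal in-memory version; the `CONTRACT` fields and record shape are hypothetical, and a real pipeline would more likely use a validation library and write to lake tables.

```python
from typing import Any

# Hypothetical data contract: field name -> required Python type.
CONTRACT: dict[str, type] = {
    "device_id": str,
    "temperature_c": float,
    "ts": int,
}


def violations(record: dict[str, Any]) -> list[str]:
    """List every way a record breaks the contract (empty list means valid)."""
    problems = []
    for field, expected in CONTRACT.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def gate(bronze: list[dict[str, Any]]) -> tuple[list[dict], list[dict]]:
    """Split bronze records into (silver, quarantine).

    Invalid records are never dropped silently: each is kept with its violations
    so the producer can be told exactly what broke the contract.
    """
    silver, quarantine = [], []
    for record in bronze:
        problems = violations(record)
        if problems:
            quarantine.append({"record": record, "violations": problems})
        else:
            silver.append(record)
    return silver, quarantine


if __name__ == "__main__":
    bronze = [
        {"device_id": "a1", "temperature_c": 21.5, "ts": 1700000000},
        {"device_id": "a2", "temperature_c": 19.0},                    # missing ts
        {"device_id": "a3", "temperature_c": "hot", "ts": 1700000060}, # wrong type
    ]
    silver, quarantine = gate(bronze)
    print(len(silver), len(quarantine))
```

Keeping the quarantine as data rather than as log lines is what makes backfills possible once the producer fixes the problem.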
Part IV: MLOps
Wk 7
Experiment Tracking, Model Registry & Serving
Lecture: The reproducibility triple (git SHA, data version, environment); experiment tracking; the model registry and model cards; serving patterns (online, batch, streaming) and safe rollout (shadow, canary, A/B).
Practice: Instrument training with tracking, register a model with a model card, and serve it behind REST with a safe rollout.
Project: Tracked training with the reproducibility triple pinned; model v1 registered and served; v2 canaried against v1.
Wk 8
Monitoring, Model Drift & Governance · Presentation
Lecture: Data drift vs concept drift and how to detect each (PSI, KS tests, embedding distance); retraining triggers paired with documented actions; audit trails and governance.
Practice: Student Presentation 2 (Interim): teams give a live demonstration of a working pipeline, a tracked and versioned model in the registry, the model served with canary and a live RED dashboard, a change landing through CI/CD during the talk, and a monitoring and drift plan; they then submit a report and a tagged release.
Project: Drift detectors each paired with a documented action, and retraining triggers defined.
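The population stability index (PSI) named above compares the binned distribution of a feature in a reference sample with its distribution in live data: PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the reference and live proportions in bin i. The sketch below is one possible implementation; equal-width bins over the reference range and the small `eps` floor for empty bins are assumptions, and other binning schemes (for example reference quantiles) are equally common.

```python
import math


def psi(expected: list[float], actual: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of `actual` against the reference sample `expected`."""
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("reference sample has no spread; cannot build bins")
    width = (hi - lo) / bins

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Floor each proportion at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


if __name__ == "__main__":
    reference = [float(v) for v in range(100)]
    shifted = [v + 50.0 for v in reference]
    print(psi(reference, reference))  # identical distributions
    print(psi(reference, shifted))    # strongly shifted distribution
```

A widely quoted rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift; those cut-offs are a convention rather than a statistical test, so each detector still needs its own documented action.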
Part V: LLMOps & AgentOps
Wk 9
LLM Foundations: AI APIs, Tokens & the Token Economy
Lecture: Next-token prediction and tokenization; API anatomy and structured outputs; the token economy and its cost levers (shorter context, prompt caching, batching); managed AI services.
Practice: Call an AI API with structured outputs and schema validation, build a token-economy cost model, and add a retry-with-backoff wrapper.
Project: Wire an LLM feature through the course proxy with structured outputs, a cost-per-request and monthly projection, and a pinned model version.
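A cost-per-request figure and a monthly projection follow directly from token counts and per-token prices. The sketch below shows the arithmetic only; the `Pricing` class, the per-million-token pricing shape, and the example numbers are assumptions for illustration, not any provider's real price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per one million tokens."""
    input_per_mtok: float
    output_per_mtok: float


def cost_per_request(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, billing input and output tokens separately."""
    total = input_tokens * pricing.input_per_mtok + output_tokens * pricing.output_per_mtok
    return total / 1_000_000


def monthly_projection(
    pricing: Pricing,
    input_tokens: int,
    output_tokens: int,
    requests_per_day: int,
    days: int = 30,
) -> float:
    """Projected USD cost for a month of traffic at a constant request shape."""
    return cost_per_request(pricing, input_tokens, output_tokens) * requests_per_day * days


if __name__ == "__main__":
    pricing = Pricing(input_per_mtok=1.0, output_per_mtok=2.0)  # made-up prices
    print(cost_per_request(pricing, 1_000, 500))
    print(monthly_projection(pricing, 1_000, 500, requests_per_day=10_000))
```

With these made-up prices, a request of 1,000 input and 500 output tokens costs $0.002, and 10,000 such requests per day project to $600 per month. Shortening the context reduces `input_tokens` directly, which is why it appears among the cost levers.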
Wk 10
RAG & Serving LLMs: Vector Databases & Gateways
Lecture: Retrieval-augmented generation end to end (embeddings, vector databases, chunking, grounded prompts); prompts as versioned code; hosted vs self-hosted serving with vLLM; the gateway pattern.
Practice: Build a RAG service over a real corpus and route all LLM traffic through a gateway with fallbacks and a budget cap.
Project: The project's RAG or extraction pipeline runs behind the gateway with fallback and a budget cap.
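The gateway pattern puts one component in front of every model call so that fallback and spending limits are enforced in a single place. The sketch below is a toy in-process version under stated assumptions: the `Gateway` class, the provider signature (a callable returning text and cost), and the stub providers are all hypothetical, and a production gateway would be a separate service with real provider clients.

```python
from typing import Callable

# Hypothetical provider signature: prompt -> (completion text, cost in USD).
Provider = Callable[[str], tuple[str, float]]


class BudgetExceeded(RuntimeError):
    """Raised when the gateway's spending cap has been reached."""


class Gateway:
    def __init__(self, providers: list[tuple[str, Provider]], budget_usd: float):
        self.providers = providers  # tried in order; later entries are fallbacks
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def complete(self, prompt: str) -> tuple[str, str]:
        """Return (provider name, completion), trying fallbacks on failure."""
        if self.spent_usd >= self.budget_usd:
            raise BudgetExceeded(f"spent {self.spent_usd:.2f} of {self.budget_usd:.2f} USD")
        errors = []
        for name, call in self.providers:
            try:
                text, cost = call(prompt)
            except Exception as exc:  # any provider failure triggers the next fallback
                errors.append(f"{name}: {exc}")
                continue
            self.spent_usd += cost
            return name, text
        raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers for demonstration only.
def flaky_primary(prompt: str) -> tuple[str, float]:
    raise TimeoutError("primary timed out")


def stub_fallback(prompt: str) -> tuple[str, float]:
    return f"echo: {prompt}", 0.6


if __name__ == "__main__":
    gateway = Gateway([("primary", flaky_primary), ("fallback", stub_fallback)], budget_usd=1.0)
    print(gateway.complete("hello"))
```

This version checks the cap before each call, so the final request may overshoot the budget by one call's cost; a stricter gateway would estimate the cost up front and refuse the call.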
Wk 11
LLM Evaluation, Guardrails & Observability
Lecture: Evaluation that means something (faithfulness, answer relevance, retrieval recall); LLM-as-judge biases and calibration; LLM tracing and observability; guardrails and prompt injection.
Practice: Build a representative eval set and run it as a regression suite, add request tracing, and demonstrate a guardrail against injection.
Project: An eval set of at least fifty items wired into CI, live tracing of prompt version, tokens, cost, and latency, and one guardrail with a measured response cache.
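Retrieval recall is the simplest of the metrics listed above to compute: for each eval item, it is the fraction of the known-relevant documents that appear in the top k retrieved results. The sketch below shows that metric and how a mean over the eval set could serve as a regression gate; the item shape (`retrieved`, `relevant`) and the function names are assumptions.

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k retrieved results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant_ids:
        raise ValueError("each eval item needs at least one relevant document")
    top_k = set(retrieved_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(eval_set: list[dict], k: int) -> float:
    """Average recall@k over eval items shaped {'retrieved': [...], 'relevant': {...}}."""
    if not eval_set:
        raise ValueError("eval_set must not be empty")
    scores = [recall_at_k(item["retrieved"], item["relevant"], k) for item in eval_set]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    eval_set = [
        {"retrieved": ["d1", "d7", "d3"], "relevant": {"d1", "d3"}},
        {"retrieved": ["d9", "d2", "d4"], "relevant": {"d2", "d5"}},
    ]
    score = mean_recall_at_k(eval_set, k=3)
    print(score)
    # In CI, a drop below an agreed floor would fail the build.
    assert score >= 0.7, "retrieval recall regressed"
```

Run as a CI step, a failing assertion like the last line turns a retrieval regression into a blocked merge instead of a production surprise.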
Wk 12
Agents & AgentOps: Tools, MCP & Managed Agents
Lecture: The agent loop (plan, act, observe) and function calling; the Model Context Protocol and tool ecosystem; AgentOps (tracing, step-level evaluation, mandatory bounds); managed agent services.
Practice: Build and trace a tool-using agent, bound it with step, cost, and permission limits, and expose a tool as an MCP server.
Project: One agentic capability with tracing, bounds, and step caps, and at least one tool exposed via MCP.
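The three bounds named above (steps, cost, permissions) can all be enforced in the loop that drives the agent, independent of which model plans the actions. The sketch below is a minimal illustration with no LLM involved: `run_agent`, the action dictionary shape, and the scripted `looping_policy` are hypothetical stand-ins for a real planner and real tools.

```python
from typing import Callable


def run_agent(
    policy: Callable[[str], dict],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
    max_cost_usd: float = 0.10,
    allowed_tools: frozenset = frozenset(),
) -> dict:
    """Drive a plan-act-observe loop under step, cost, and permission bounds.

    `policy` maps the latest observation to either {"final": answer} or
    {"tool": name, "input": text}; both may carry "cost_usd" for the planning call.
    """
    trace: list[dict] = []
    spent = 0.0
    observation = goal
    for step in range(1, max_steps + 1):
        action = policy(observation)                 # plan
        spent += action.get("cost_usd", 0.0)
        if spent > max_cost_usd:
            return {"status": "cost_limit", "trace": trace}
        if "final" in action:
            return {"status": "done", "answer": action["final"], "trace": trace}
        name = action["tool"]
        if name not in allowed_tools or name not in tools:
            return {"status": "permission_denied", "tool": name, "trace": trace}
        observation = tools[name](action["input"])   # act, then observe
        trace.append({"step": step, "tool": name, "input": action["input"],
                      "observation": observation})
    return {"status": "step_limit", "trace": trace}


# Demonstration stand-ins: a policy that never finishes and a trivial tool.
def looping_policy(observation: str) -> dict:
    return {"tool": "echo", "input": observation, "cost_usd": 0.01}


TOOLS = {"echo": lambda text: text}

if __name__ == "__main__":
    result = run_agent(looping_policy, TOOLS, "hi", max_steps=3,
                       max_cost_usd=1.0, allowed_tools=frozenset({"echo"}))
    print(result["status"], len(result["trace"]))
```

Because every exit path returns the trace, a run that hits a bound is still fully inspectable, which is what step-level evaluation needs.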
Part VI: Security & Governance
Wk 13
Security, Governance & Synthesis · Presentation
Lecture: Software, data, and model supply-chain security (secrets, dependency tracking, provenance, SBOM); the OWASP Top 10 for LLM applications; governance (privacy, audit trails, model cards, NIST AI RMF); synthesis of the five layers.
Practice: Student Presentation 3 (Final, with oral defense): teams deliver an end-to-end production demo (data in, decision out, live); observability with an actionable alert and runbook; evaluation, guardrails, and a cost and latency report; and a security and governance review against the OWASP LLM Top 10.
Project: A governed production deployment with an audit trail, and the repository tagged v1.0.

AI tools in this course

Students treat AI assistants as a core part of the operations toolchain: generating Terraform, CI/CD, and Kubernetes config, refactoring service code, and drafting data-quality and drift checks. They wire LLM features through the course gateway, build RAG and extraction pipelines, and expose project tools as MCP servers that their own agents call under step, cost, and permission bounds. AI is used to generate evaluation sets, synthesize test data, and turn RED dashboards, traces, and drift metrics into concrete actions. Because the course grades exactly what an assistant cannot do for a student (operating under load, reading telemetry, and defending decisions), every generated artifact is traced, evaluated, and reviewed before it ships.

Student project

Teams of three or four carry one running AI service from specification to a governed production deployment, presenting it three times across the term. Grading weights the parts an AI assistant cannot do for the student: operating a system under load, interpreting telemetry, and defending design decisions. Example domains include IoT telemetry, document question-answering, and document processing.

Requirements

  • Build a working system, not a set of disconnected exercises.
  • Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
  • Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
  • Carry one running project from specification to a deployed, defensible result across the whole term.
  • Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

  • Predictive-maintenance IoT monitor
  • Support-docs Q&A chatbot
  • Invoice or form processor
  • Menu or receipt nutrition estimator
  • Smart-home energy advisor
  • Code-review assistant
  • Research-paper summariser
  • Churn-prediction service

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component | What it covers | Weight
Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20%
Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30%
Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50%

Role in each concentration

Concentration | Role
Intelligent Software Systems | Core · Semester 2
Networking & Cyber Security | Core · Semester 2
AI & Robotics | Core · Semester 2
AI and Quantum Computing for Finance | Elective
Immersive Systems & Game Development | Elective
Defense Technologies & Autonomous Systems | Elective