About this course
Design and operate applications built for the cloud, using containers, orchestration, and managed services to achieve elasticity, resilience, and cost efficiency.
Students engineer a cloud-native multi-service platform deployed to Kubernetes with Helm and Istio, provision it end to end as Terraform infrastructure code, autoscale it under k6 load tests, and instrument it with Prometheus and Grafana while staying within a defined cost budget.
Expected outcomes
- Explain cloud service models, regions, and the shared responsibility model
- Analyze distributed-systems foundations of availability, consistency, and partition tolerance
- Containerize applications and design minimal, reproducible images
- Operate workloads on Kubernetes using Deployments, Services, and Ingress
- Decompose systems into microservices and reason about coupling and resilience
- Provision infrastructure declaratively using infrastructure as code
- Configure autoscaling and load balancing to meet latency and availability targets
- Evaluate cloud cost and the economic trade-offs of elasticity and reservation
- Instrument clusters and services for metrics, logs, and traces
- Justify architectural choices against scalability and reliability requirements
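The trade-off between elasticity and reservation named in the outcomes above can be made concrete with a small break-even calculation. The sketch below assumes a simplified pricing model in which a reserved instance is billed for every hour of the month at a lower effective rate, while on-demand capacity is billed only for hours actually used. The function names and all prices are hypothetical and are not taken from any provider's price list.

```python
def breakeven_utilization(on_demand_hourly, reserved_hourly):
    """Fraction of the month an instance must run before reserving is cheaper.

    Assumes reserved capacity is paid for every hour, used or not.
    """
    if on_demand_hourly <= 0 or reserved_hourly < 0:
        raise ValueError("prices must be positive")
    return reserved_hourly / on_demand_hourly


def cheaper_option(hours_used, on_demand_hourly, reserved_hourly,
                   hours_in_month=730):
    """Compare the monthly bill of both options for a given usage level."""
    on_demand_cost = hours_used * on_demand_hourly
    reserved_cost = hours_in_month * reserved_hourly
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical prices: 0.10 per hour on demand, 0.06 per hour reserved.
print(breakeven_utilization(0.10, 0.06))
print(cheaper_option(365, 0.10, 0.06))  # workload runs about half the month
print(cheaper_option(600, 0.10, 0.06))  # workload runs most of the month
```

Under these assumed prices, a workload that runs only half the month is cheaper on demand, which is the kind of measured argument the outcomes ask teams to make.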
Key topics
- Containers & Kubernetes
- Microservices & service meshes
- Infrastructure as code
- Scalability & cost management
Theoretical foundations
The concepts and results this course rests on.
- CAP theorem and the availability, consistency, partition-tolerance trade-off
- Quorum and eventual-consistency models for replicated state
- Reconciliation and the declarative desired-state control loop
- Process isolation through namespaces and cgroups
- Queueing theory and tail-latency analysis under load
- Autoscaling control theory and capacity versus cost models
- Supply-chain provenance and reproducible-build theory
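As one illustration of the autoscaling control theory listed above, the following sketch applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count scaled by the ratio of observed to target metric, rounded up. The function name, the replica bounds, and the sample values are illustrative assumptions. The real controller also applies stabilization windows and handles missing metrics, none of which is modeled here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Proportional scaling: ceil(current_replicas * current / target)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Inside the tolerance band, keep the current count to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    proposed = math.ceil(current_replicas * ratio)
    # Clamp the proposal to the configured replica bounds.
    return max(min_replicas, min(max_replicas, proposed))


# 4 replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60))
# 4 replicas at 63% against a 60% target: within tolerance, no change.
print(desired_replicas(4, 63, 60))
```

The clamp to `max_replicas` is where capacity meets cost: the upper bound caps spend even when the metric ratio asks for more replicas.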
Prerequisites
Course-specific prerequisites:
- Operating systems and computer networks
- Software engineering and a programming language
Weekly schedule · 13 weeks · lecture + practice
Students lean on AI assistants to scaffold Dockerfiles and Kubernetes manifests, then refactor them toward smaller, reproducible images and cleaner Helm charts. They use chat-based and editor-integrated tools to draft Terraform modules, generate k6 load scripts, and synthesize realistic test traffic, while connecting agents to kubectl and cloud MCP servers to inspect cluster state and propose fixes. AI also helps interpret Prometheus and Grafana output, summarizing latency and cost data into concrete autoscaling and budget actions. Teams are expected to review every generated manifest critically, since a confidently wrong resource limit or policy can break a live deployment.
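Critical review of a generated manifest can be partly automated. The sketch below is a hypothetical helper, not a course-provided tool. It takes a Deployment manifest that has already been parsed into a Python dictionary and reports containers with no CPU or memory limit under `spec.template.spec.containers[].resources.limits`. It does not judge whether a limit that is present is sensible, which remains a human review task.

```python
def containers_missing_limits(deployment):
    """Return (container_name, missing_keys) pairs for a parsed Deployment."""
    pod_spec = (deployment.get("spec", {})
                          .get("template", {})
                          .get("spec", {}))
    findings = []
    for container in pod_spec.get("containers", []):
        # Treat an absent or empty resources block as having no limits.
        limits = (container.get("resources") or {}).get("limits") or {}
        absent = [key for key in ("cpu", "memory") if key not in limits]
        if absent:
            findings.append((container.get("name", "<unnamed>"), absent))
    return findings


# Hypothetical manifest, as an AI assistant might draft it.
manifest = {
    "kind": "Deployment",
    "spec": {"template": {"spec": {"containers": [
        {"name": "api",
         "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}}},
        {"name": "worker", "resources": {"limits": {"cpu": "1"}}},
        {"name": "sidecar"},
    ]}}},
}

for name, absent in containers_missing_limits(manifest):
    print(f"{name}: missing limits for {', '.join(absent)}")
```

A check like this could run in CI before a Helm release, so that a missing limit is caught before it reaches a live cluster.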
Student project
Teams build one cloud-native application and carry it from a single container to a scaled, observable, multi-service deployment on Kubernetes. The system is provisioned entirely as infrastructure code, secured and observed through a service mesh, and tuned for autoscaling within a defined cost budget.
Requirements
- Build a working system, not a set of disconnected exercises.
- Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
- Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
- Carry one running project from specification to a deployed, defensible result across the whole term.
- Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).
Example projects
Assessment & grading
Grading is project-based, with no written exam. Teams of three or four present one running project three times.
| Component | What it covers | Weight |
|---|---|---|
| Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20% |
| Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30% |
| Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50% |
Tools & platforms
- Docker: build and run application containers
- Kubernetes: orchestrate containerized workloads
- Helm: package and template Kubernetes manifests
- Istio: manage service-to-service traffic and policy
- Terraform: provision cloud infrastructure as code
- Prometheus: collect cluster and application metrics
- Grafana: visualize metrics and build dashboards
- Amazon Web Services: host managed cloud services
- kubectl: operate clusters from the command line
- Trivy: scan container images for vulnerabilities
- k6: run load and performance tests
Free online courses
Existing free, video-based courses this course can build on, for self-study or as a teaching basis.
- YouTube: Docker Containers and Kubernetes Fundamentals (Full Hands-On Course)
- YouTube: Docker and Kubernetes - Full Course for Beginners
Primary literature
Seminal works to read for graduate-level depth.
References
Books and resources link to an online or publisher page.
- Textbook: Designing Distributed Systems, 2nd Edition: Patterns and Paradigms for Scalable, Reliable Services
- Textbook: Kubernetes: Up and Running, 3rd Edition
- Textbook: Building Microservices, 2nd Edition: Designing Fine-Grained Systems
- Textbook: Terraform: Up and Running, 3rd Edition: Writing Infrastructure as Code
- Textbook: Cloud Native Patterns: Designing Change-Tolerant Software
- Documentation: Kubernetes Documentation
- Documentation: Docker Documentation
Role in each concentration
| Concentration | Role |
|---|---|
| Intelligent Software Systems | Core · Semester 1 |
| Networking & Cyber Security | Core · Semester 1 |
| AI & Robotics | Elective |
| AI and Quantum Computing for Finance | Core · Semester 1 |
| Immersive Systems & Game Development | Core · Semester 2 |
| Defense Technologies & Autonomous Systems | Elective |