About this course
Build applications on large language models and autonomous agents, covering prompting, tool use, retrieval, and orchestration of multi-step reasoning.
Students build a grounded, tool-using LLM agent in Python with PyTorch, Hugging Face Transformers, LangChain, and FAISS, adding retrieval-augmented generation, a ReAct planning loop, safety guardrails, and an LLM-as-judge evaluation suite.
Expected outcomes
- Derive the transformer architecture from scaled dot-product attention, positional encoding, and residual streams
- Explain self-attention, multi-head attention, and the softmax-based information routing mathematically
- Analyze autoregressive next-token prediction, cross-entropy loss, and the backpropagation that trains it
- Engineer prompts using zero-shot, few-shot, and chain-of-thought reasoning patterns
- Implement retrieval-augmented generation with embeddings, vector search, and grounded context injection
- Construct tool-using agents with the ReAct reasoning and acting loop
- Design evaluation harnesses with task metrics, LLM-as-judge, and regression suites
- Apply guardrails for prompt injection, hallucination, and output safety filtering
- Quantify the effect of decoding strategies and parameters, including temperature, top-k, top-p, and beam search
- Deploy an agent with logging, tracing, and cost-aware orchestration
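The decoding outcome listed above (temperature, top-k, top-p) can be made concrete with a small sketch. It is a dependency-free illustration, not the course's reference implementation: the function names `softmax`, `top_k_filter`, `top_p_filter`, and `sample`, and the four-token `logits` vector, are all assumptions introduced here. In practice these steps run on PyTorch tensors inside a generation loop.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their mass to 1."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p (nucleus)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}

def sample(dist, rng):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

# Hypothetical scores for a 4-token vocabulary (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits, temperature=0.7)
top2 = top_k_filter(probs, k=2)
nucleus = top_p_filter(probs, p=0.9)
token = sample(nucleus, random.Random(0))
```

Beam search is left out on purpose: it keeps several partial sequences alive rather than sampling one token at a time, so it does not fit a single-step filter like the ones above.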
Key topics
- LLM fundamentals
- Prompting & RAG
- Tool use & agents
- Evaluation & guardrails
Theoretical foundations
The concepts and results this course rests on.
- the transformer and self-attention
- autoregressive next-token prediction and the chain rule of probability
- cross-entropy loss and backpropagation through the network
- in-context learning and the conditioning of token distributions
- dense embeddings and approximate nearest-neighbor retrieval
- the ReAct reasoning and acting loop for agents
- neural scaling laws relating loss to parameters, data, and compute
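For orientation, the first three items in this list correspond to the standard formulas below. The notation is an assumption made for this sketch: $Q$, $K$, $V$ are the query, key, and value matrices, $d_k$ is the key dimension, $x_1, \dots, x_T$ is a token sequence, and $\theta$ denotes the model parameters.

```latex
% Scaled dot-product attention
\[
\operatorname{Attention}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

% Autoregressive factorization via the chain rule of probability
\[
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t})
\]

% Cross-entropy (average negative log-likelihood) training loss
\[
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t})
\]
```

Minimizing $\mathcal{L}(\theta)$ by backpropagation is the same as maximizing the log of the factorized sequence probability, which is why next-token prediction and cross-entropy appear together in the list.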
Prerequisites
Course-specific prerequisites:
- Machine Learning and Deep Learning
- Probability and linear algebra
- Python
Weekly schedule · 13 weeks · lecture + practice
Students lean on AI coding assistants throughout, generating and refactoring the PyTorch attention blocks, LangChain wiring, and FAISS retrieval code, then vibe-coding their way from a bare chat loop to a full agent. They use AI to scaffold ReAct tool definitions and MCP server connections, to synthesize few-shot exemplars and labeled evaluation sets, and to draft LLM-as-judge rubrics. AI assistants also help interpret tracing logs and red-team transcripts, turning raw eval output into diagnoses of why an agent hallucinated or failed an injection test.
Student project
Teams build one production-grade LLM agent over the term, growing from a bare chat loop into a grounded, tool-using, guarded, and observable system. Each week adds a capability backed by the theory taught that week, and the agent is continuously evaluated on a held-out task set.
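As a rough picture of the step from a bare chat loop to a tool-using agent, the sketch below runs a ReAct-style cycle of thought, action, and observation. Everything in it is hypothetical: `scripted_model` stands in for a real LLM call, `lookup` and `FACTS` stand in for a retrieval tool, and the `Action: name[argument]` text format is one common convention rather than a requirement of the course or of LangChain.

```python
import re

# Hypothetical toy knowledge base standing in for a real retrieval tool.
FACTS = {"faiss": "FAISS is a library for vector similarity search."}

def lookup(term: str) -> str:
    """Toy tool: return a stored fact for the term, or a fallback message."""
    return FACTS.get(term.strip().lower(), "no entry found")

TOOLS = {"lookup": lookup}

def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: emits an action first, then a final answer."""
    if "Observation:" not in transcript:
        return "Thought: I should look this up.\nAction: lookup[FAISS]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have what I need.\nFinal Answer: {observation}"

# Matches lines such as "Action: lookup[FAISS]".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def react(question: str, model=scripted_model, max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until a final answer or the step budget."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if match is None:
            return "error: model produced neither an action nor a final answer"
        name, argument = match.groups()
        tool = TOOLS.get(name)
        observation = tool(argument) if tool else f"unknown tool {name}"
        # Feed the tool result back so the next model step can condition on it.
        transcript += f"\nObservation: {observation}"
    return "error: step budget exhausted"

answer = react("What is FAISS?")
```

The `max_steps` budget is the simplest form of the cost-aware orchestration the project asks for: a model that keeps requesting tools is cut off rather than looping indefinitely.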
Requirements
- Build a working system, not a set of disconnected exercises.
- Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
- Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
- Carry one running project from specification to a deployed, defensible result across the whole term.
- Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).
Example projects
Assessment & grading
Grading is project-based, with no written exam. Teams of three or four present one running project three times.
| Component | What it covers | Weight |
|---|---|---|
| Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20% |
| Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30% |
| Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50% |
Tools & platforms
- PyTorch: tensors, autograd, and model implementation
- Hugging Face Transformers: pretrained models and tokenizers
- Hugging Face PEFT: LoRA and parameter-efficient fine-tuning
- LangChain: agent orchestration and tool wiring
- LlamaIndex: retrieval and indexing pipelines
- FAISS: approximate nearest-neighbor vector search
- Chroma: lightweight vector database
- Ollama: local model serving for development
- OpenAI API: hosted frontier models and function calling
- Ragas: RAG evaluation metrics
- LangSmith: tracing, logging, and observability
- Pydantic: structured output and schema validation
Free online courses
Existing free, video-based courses this course can build on, for self-study or as a teaching basis.
- YouTube · Stanford CS224N: NLP with Deep Learning (Spring 2024)
- YouTube · Stanford CS25: Transformers United (V6)
In Hebrew · בעברית
- Dr. Amos Azaria, Ariel University (YouTube) · Deep Learning and NLP - קורס למידה עמוקה ועיבוד שפות טבעיות
- Yoav Goldberg / Ido Dagan (YouTube) · עיבוד שפות טבעיות (Natural Language Processing)
- Google Cloud (Coursera) · Transformer Models and BERT Model - בעברית
Primary literature
Seminal works to read for graduate-level depth.
References
Books and resources link to an online or publisher page.
- Paper · Attention Is All You Need
- Paper · BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Paper · Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Paper · ReAct: Synergizing Reasoning and Acting in Language Models
- Paper · Scaling Laws for Neural Language Models
- Textbook · Speech and Language Processing, 3rd edition draft
- Textbook · Dive into Deep Learning
- Documentation · Hugging Face Transformers Documentation
Role in each concentration
| Concentration | Role |
|---|---|
| Intelligent Software Systems | Core · Semester 1 |
| Networking & Cyber Security | Elective |
| AI & Robotics | Core · Semester 1 |
| AI and Quantum Computing for Finance | Core · Semester 1 |
| Immersive Systems & Game Development | Elective |
| Defense Technologies & Autonomous Systems | Elective |