AI1: Language AI: LLMs and Agentic Systems

About this course

Build applications on large language models and autonomous agents, covering prompting, tool use, retrieval, and orchestration of multi-step reasoning.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build a grounded, tool-using LLM agent in Python with PyTorch, Hugging Face Transformers, LangChain, and FAISS, adding retrieval-augmented generation, a ReAct planning loop, safety guardrails, and an LLM-as-judge evaluation suite.

Expected outcomes

Derive the transformer architecture from scaled dot-product attention, positional encoding, and residual streams
Explain self-attention, multi-head attention, and softmax-based information routing mathematically
Analyze autoregressive next-token prediction, cross-entropy loss, and the backpropagation that trains it
Engineer prompts using zero-shot, few-shot, and chain-of-thought reasoning patterns
Implement retrieval-augmented generation with embeddings, vector search, and grounded context injection
Construct tool-using agents with the ReAct reasoning and acting loop
Design evaluation harnesses with task metrics, LLM-as-judge, and regression suites
Apply guardrails for prompt injection, hallucination, and output safety filtering
Compare the effects of decoding strategies, including temperature, top-k, top-p, and beam search
Deploy an agent with logging, tracing, and cost-aware orchestration

Key topics

LLM fundamentals
Prompting & RAG
Tool use & agents
Evaluation & guardrails

Theoretical foundations

The concepts and results this course rests on.

the transformer and self-attention
autoregressive next-token prediction and the chain rule of probability
cross-entropy loss and backpropagation through the network
in-context learning and the conditioning of token distributions
dense embeddings and approximate nearest-neighbor retrieval
the ReAct reasoning and acting loop for agents
neural scaling laws relating loss to parameters, data, and compute
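
As a compact reference for the second and third items above, the autoregressive factorization and the per-token cross-entropy loss it induces can be written as below. The notation ($x_t$ for the token at position $t$, $T$ for the sequence length, $\theta$ for the model parameters) is chosen here for illustration and is not taken from the course materials.

```latex
p_\theta(x_1, \dots, x_T) = \prod_{t=1}^{T} p_\theta(x_t \mid x_{<t}),
\qquad
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta(x_t \mid x_{<t}).
```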

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

Machine Learning and Deep Learning
Probability and linear algebra
Python

Weekly schedule · 13 weeks · lecture + practice

Foundations

Wk 1

From language modeling to transformers

Lecture: We formalize the language modeling objective, the chain rule of probability over tokens, and tokenization, then motivate why recurrence gives way to attention.

Practice: Set up the project repo, load a pretrained model from Hugging Face, and run greedy and sampled generation.

Project: Initialize the team chatbot scaffold with a baseline LLM and a chat loop.

Watch: CS224N Lecture 5: Recurrent Neural Networks · Transformers, the tech behind LLMs (3Blue1Brown) · Let's build GPT: from scratch, in code (Karpathy)
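
To make the difference between greedy and sampled generation concrete, here is a minimal, dependency-free sketch of one decoding step. It operates on a plain list of logits rather than on a real model; the function names (`softmax`, `top_k_top_p_filter`, `greedy`, `sample`) are illustrative and do not correspond to any library API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k=None, top_p=None):
    """Keep only the top-k tokens and/or the smallest nucleus with mass >= top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        order = kept
    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / total
    return filtered


def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sampled decoding: draw one token id from the filtered distribution."""
    probs = top_k_top_p_filter(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Calling `sample(logits, top_k=1)` reduces to greedy decoding, which is a convenient sanity check when comparing strategies.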

Wk 2

Attention and the transformer block

Lecture: We derive scaled dot-product attention, multi-head attention, positional encodings, and the full encoder-decoder block with residuals and layer norm.

Practice: Implement a single attention head and a transformer block in PyTorch and verify shapes against a reference.

Project: Add a from-scratch attention visualization tool to the chatbot for explainability.

Watch: CS224N Lecture 7: Attention and LLM Intro · Attention in transformers, step-by-step (3Blue1Brown) · Stanford CS25: Overview of Transformers
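
The core computation of this week, softmax(QK^T / sqrt(d_k)) V, can be sketched without any tensor library. The version below uses nested Python lists purely for readability; the practice session itself uses PyTorch, and the `attention` function and its `causal` flag are illustrative names, not part of any course code.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, causal=False):
    """Scaled dot-product attention for lists of row vectors.

    Q has one row per query position; K and V have one row per key position.
    Returns (outputs, attention_weights).
    """
    d_k = len(K[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        if causal:
            # Mask out future positions so position i only attends to j <= i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        )
    return outputs, weights
```

Each row of the returned weights sums to one, which is the property the attention visualization in the project relies on.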

Wk 3

Training, scaling, and emergent behavior

Lecture: We cover pretraining, cross-entropy loss, backpropagation through the transformer, and scaling laws relating loss to parameters, data, and compute.

Practice: Fine-tune a small model with LoRA on a domain corpus and chart the loss curve.

Project: Specialize the chatbot to a chosen domain via lightweight fine-tuning.

Watch: CS224N Lecture 3: Backpropagation and Neural Networks
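
To show why LoRA counts as lightweight fine-tuning, the sketch below implements the low-rank update y = Wx + (alpha / r) B(Ax) on plain lists and counts the trainable parameters. It is an illustration of the idea only; the helper names are hypothetical, and in the practice session the Hugging Face PEFT library handles this.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """Forward pass with a LoRA adapter.

    W is the frozen (d_out x d_in) weight, A is (r x d_in), B is (d_out x r).
    Only A and B would be updated during fine-tuning.
    """
    r = len(A)  # the rank is the number of rows of A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def lora_trainable_params(d_out, d_in, r):
    """Parameters in A and B, versus d_out * d_in for full fine-tuning of W."""
    return r * (d_in + d_out)
```

For a 4096 x 4096 projection at rank 8, `lora_trainable_params(4096, 4096, 8)` gives 65,536 trainable parameters against 16,777,216 in the full matrix, roughly 0.4 percent.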

Prompting and retrieval

Wk 4

Prompt engineering and in-context learning

Lecture: We analyze in-context learning, few-shot demonstrations, chain-of-thought, and the theory of why prompts steer conditional distributions.

Practice: Build a prompt-template library and run systematic A/B comparisons on a task set.

Project: Add structured prompting and few-shot exemplars to improve answer quality.

Watch: CS224N Lecture 10: Post-training
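
One possible starting point for the prompt-template library is a single function that covers zero-shot, few-shot, and chain-of-thought variants. The `Q:` / `Reasoning:` / `A:` layout and the dictionary keys are assumptions made for this sketch, not a format prescribed by the course.

```python
def build_prompt(instruction, examples, query, chain_of_thought=False):
    """Assemble a prompt from an instruction, optional exemplars, and the query.

    examples is a list of dicts with keys "question", "answer", and optionally
    "reasoning". An empty list yields a zero-shot prompt.
    """
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Q: {ex['question']}")
        if chain_of_thought and ex.get("reasoning"):
            parts.append(f"Reasoning: {ex['reasoning']}")
        parts.append(f"A: {ex['answer']}")
        parts.append("")
    parts.append(f"Q: {query}")
    # End on the cue the model should continue from.
    parts.append("Reasoning:" if chain_of_thought else "A:")
    return "\n".join(parts)
```

Because every variant comes from the same function, an A/B comparison only needs to vary the arguments while the task set stays fixed.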

Wk 5

Embeddings and vector retrieval · Presentation

Lecture: We study dense embeddings, cosine similarity, approximate nearest-neighbor indexing, and the bias-variance trade-offs of chunking.

Practice: Team presentation: each team defends its project specification, target domain, and evaluation plan.

Project: Lock the project specification and stand up an embedding index over the knowledge base.
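
Before standing up a real index, it can help to see retrieval in its simplest form. The sketch below does exact (brute-force) cosine-similarity search over a small dictionary of vectors, plus word-level chunking with overlap. It is a stand-in for illustration: an approximate nearest-neighbor library such as FAISS trades a little recall for speed on large corpora, and the function names here are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, index, k=3):
    """Return the k (score, doc_id) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]


def chunk_words(text, size, overlap):
    """Split text into chunks of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

Chunk size and overlap are the two knobs to vary when studying how chunking affects retrieval quality.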

Wk 6

Retrieval-augmented generation

Lecture: We formalize RAG as marginalizing over retrieved passages, and cover reranking, context-window budgeting, and grounding faithfulness.

Practice: Wire a retriever to the generator and measure grounded versus ungrounded answer accuracy.

Project: Convert the chatbot into a grounded RAG assistant over your corpus.

Watch: Stanford CS25: Retrieval Augmented Language Models
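
The marginalization view can be illustrated numerically: the answer distribution is approximated as a sum over the top-k retrieved passages z of p(z | x) · p(y | x, z). The sketch below assumes the retriever scores and the per-passage answer probabilities are already given as plain numbers; in a real system they would come from the retriever and the generator, and the function names are illustrative.

```python
import math


def softmax(scores):
    """Normalize retriever scores into a distribution p(z | x) over passages."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def marginal_answer_probs(retriever_scores, answer_probs_per_passage):
    """Approximate p(y | x) by summing p(z | x) * p(y | x, z) over retrieved passages.

    answer_probs_per_passage[i] maps each candidate answer y to p(y | x, z_i).
    """
    p_z = softmax(retriever_scores)
    marginal = {}
    for pz, dist in zip(p_z, answer_probs_per_passage):
        for answer, p_y in dist.items():
            marginal[answer] = marginal.get(answer, 0.0) + pz * p_y
    return marginal
```

With two equally scored passages that assign "Paris" probabilities 0.9 and 0.5, the marginal probability of "Paris" is 0.7, showing how a weakly supporting passage dilutes a confident one.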

Agents and tools

Wk 7

Tool use and function calling

Lecture: We cover structured output, JSON schemas, function calling, and how tool results extend a model beyond its parametric knowledge.

Practice: Expose calculator, search, and code tools and let the model invoke them via function calling.

Project: Give the assistant tool access so it can act, not just answer.
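
The essential mechanics of function calling are: the model emits a structured call, the application validates it, runs the tool, and returns the result as text. The sketch below shows that cycle with the standard library only. The `TOOLS` registry layout and the `{"name": ..., "arguments": ...}` call shape are assumptions for illustration; hosted APIs and Pydantic-based validation each define their own formats.

```python
import json

# Hypothetical tool registry: parameter names map to the type used to coerce them.
TOOLS = {
    "add": {
        "description": "Add two numbers.",
        "parameters": {"a": float, "b": float},
        "fn": lambda a, b: a + b,
    },
}


def dispatch(tool_call_json):
    """Validate a model-emitted tool call, run it, and return a JSON string result."""
    try:
        call = json.loads(tool_call_json)
        name, args = call["name"], call["arguments"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return json.dumps({"ok": False, "error": f"malformed call: {exc}"})

    spec = TOOLS.get(name) if isinstance(name, str) else None
    if spec is None:
        return json.dumps({"ok": False, "error": f"unknown tool: {name}"})
    if not isinstance(args, dict) or set(args) != set(spec["parameters"]):
        expected = sorted(spec["parameters"])
        return json.dumps({"ok": False, "error": f"expected arguments {expected}"})

    try:
        typed = {key: spec["parameters"][key](value) for key, value in args.items()}
    except (TypeError, ValueError) as exc:
        return json.dumps({"ok": False, "error": f"bad argument type: {exc}"})
    return json.dumps({"ok": True, "result": spec["fn"](**typed)})
```

Returning errors as data rather than raising lets the model see what went wrong and retry with a corrected call.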

Wk 8

Agentic loops and planning · Presentation

Lecture: We derive the ReAct loop, reasoning-acting interleaving, planning, memory, and multi-step task decomposition.

Practice: Team presentation: interim demo of the agent executing a multi-step task end to end.

Project: Upgrade to a ReAct agent that plans and chains tool calls toward a goal.

Watch: Stanford CS25: Generalist Agents in Open-Ended Worlds
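
The control flow of a ReAct agent is short enough to sketch in full: the model alternates Thought and Action steps, the application runs each action and appends an Observation, and the loop stops at a final answer or a step limit. The sketch below replaces the model with a scripted stub so the loop runs deterministically; the `Action: tool[input]` and `Final Answer:` text conventions are one common choice, assumed here rather than taken from the course.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.S)


def react_loop(question, llm, tools, max_steps=5):
    """Run a Thought / Action / Observation loop until a final answer or the step limit."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # the model sees the full transcript so far
        transcript += step + "\n"
        final = FINAL_RE.search(step)
        if final:
            return final.group(1).strip(), transcript
        action = ACTION_RE.search(step)
        if action is None:
            observation = "no valid action found"
        else:
            name, arg = action.groups()
            tool = tools.get(name)
            observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"Observation: {observation}\n"
    return None, transcript  # step budget exhausted without an answer


def scripted_llm(steps):
    """Stub standing in for a real model: replays a fixed list of outputs."""
    iterator = iter(steps)
    return lambda transcript: next(iterator)


tools = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "not found")}
steps = [
    "Thought: I should look this up.\nAction: lookup[capital of France]",
    "Thought: I have the answer.\nFinal Answer: Paris",
]
answer, transcript = react_loop(
    "What is the capital of France?", scripted_llm(steps), tools
)
```

The `max_steps` limit matters in practice: without it, a model that never emits a final answer would loop indefinitely and accumulate cost.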

Wk 9

Memory, multi-agent, and orchestration

Lecture: We study short- and long-term memory, agent state, and coordination patterns across multiple specialized agents.

Practice: Add persistent memory and a second specialized agent that collaborates with the first.

Project: Introduce memory and a helper agent to handle complex requests.

Watch: Stanford CS25: Collaborative AI Agents

Evaluation and guardrails

Wk 10

Evaluation of LLM systems

Lecture: We cover task metrics, reference-based and reference-free evaluation, LLM-as-judge, and the statistics of small-sample comparisons.

Practice: Build an automated eval suite with a labeled set and an LLM-judge scorer.

Project: Add a regression test suite that scores every agent change.

Watch: CS224N Lecture 11: Benchmarking and Evaluation
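
Evaluation sets in a course project are small, so a difference in mean score between two agent versions may be noise. One standard way to check is a paired bootstrap over per-example score differences, sketched below. The function name, the 95% percentile interval, and the default resample count are choices made for this illustration.

```python
import random


def paired_bootstrap(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Mean per-example difference (B - A) with a 95% percentile bootstrap interval.

    scores_a and scores_b must be scores for the same examples in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    rng = random.Random(seed)  # fixed seed keeps regression runs reproducible
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    low = means[int(0.025 * n_resamples)]
    high = means[int(0.975 * n_resamples) - 1]
    return sum(diffs) / n, (low, high)
```

If the interval contains zero, the labeled set does not give clear evidence that the change helped, which is useful to know before accepting a change in the regression suite.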

Wk 11

Safety, alignment, and guardrails

Lecture: We cover RLHF and preference optimization conceptually, prompt injection, jailbreaks, and content filtering.

Practice: Red-team the agent for injection and add input and output guardrails.

Project: Harden the agent with safety filters and injection defenses.

Watch: Stanford CS25: Aligning Open Language Models
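
As a first baseline for input and output guardrails, the sketch below flags a few known injection phrasings on the way in and redacts email addresses on the way out. The patterns are illustrative examples only. Pattern matching of this kind is easy to evade, so it should be read as a starting point for red-teaming rather than a sufficient defense.

```python
import re

# Illustrative patterns only; real attacks vary far more than this list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your |the )?system prompt", re.I),
]
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def check_input(text):
    """Return (is_allowed, matched_patterns) for a user message or retrieved passage."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return len(hits) == 0, hits


def redact_output(text):
    """Replace email addresses in model output with a placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)
```

Running `check_input` on retrieved documents as well as user messages covers indirect injection, where the hostile text arrives through the knowledge base.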

Deployment

Wk 12

Serving, cost, and observability

Lecture: We cover quantization, KV-cache, batching, latency-throughput trade-offs, tracing, and cost accounting.

Practice: Add tracing and cost logging, then optimize latency with caching and batching.

Project: Make the agent observable, cost-aware, and ready to serve.

Watch: Stanford CS25: Serving Transformers, Lessons from the Trenches
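
Two pieces of arithmetic come up repeatedly when reasoning about serving: how much memory the KV-cache takes, and what a request costs in tokens. The sketch below computes both. The model shape in the usage note and the per-million-token prices are made-up inputs for illustration, not figures for any particular model or provider.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Memory for cached keys and values.

    The leading factor 2 accounts for storing both K and V;
    bytes_per_elem=2 corresponds to 16-bit floats.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


def request_cost(prompt_tokens, completion_tokens, price_in_per_m, price_out_per_m):
    """Cost of one request given hypothetical prices per million input and output tokens."""
    return (prompt_tokens * price_in_per_m
            + completion_tokens * price_out_per_m) / 1_000_000
```

For a hypothetical model with 32 layers, 32 KV heads, and head dimension 128, a single 4096-token sequence in 16-bit precision needs `kv_cache_bytes(32, 32, 128, 4096)` bytes, which is exactly 2 GiB. This is why batch size and context length dominate serving memory.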

Capstone

Wk 13

Final defense · Presentation

Lecture: We synthesize the full stack from attention theory to deployed agentic systems and survey open research directions.

Practice: Team presentation: final demo with live evaluation and an oral defense of design choices.

Project: Deliver the complete grounded, tool-using, guarded agent with evaluation results.

Watch: Stanford CS25: Overview of Transformers

AI tools in this course.

Students lean on AI coding assistants throughout, generating and refactoring the PyTorch attention blocks, LangChain wiring, and FAISS retrieval code, then vibe-coding their way from a bare chat loop to a full agent. They use AI to scaffold ReAct tool definitions and MCP server connections, to synthesize few-shot exemplars and labeled evaluation sets, and to draft LLM-as-judge rubrics. AI assistants also help interpret tracing logs and red-team transcripts, turning raw eval output into diagnoses of why an agent hallucinated or failed an injection test.

Student project

Teams build one production-grade LLM agent over the term, growing from a bare chat loop into a grounded, tool-using, guarded, and observable system. Each week adds a capability backed by the theory taught that week, and the agent is continuously evaluated on a held-out task set.

Requirements

Build a working system, not a set of disconnected exercises.
Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
Carry one running project from specification to a deployed, defensible result across the whole term.
Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

Research-paper Q&A assistant
Codebase navigator agent
Customer-support RAG bot
Legal or policy document advisor
Personal data analyst agent
Travel-planning tool agent
Medical-literature triage assistant
Financial-report summarizer

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component	What it covers	Weight
Project · Specification	Presentation 1 (week 5): problem, objectives, and architecture	20%
Project · Interim	Presentation 2 (week 8): the working system demonstrated live	30%
Project · Final	Presentation 3 (week 13): end-to-end demo with oral defense	50%

Tools & platforms

PyTorch: tensors, autograd, and model implementation
Hugging Face Transformers: pretrained models and tokenizers
Hugging Face PEFT: LoRA and parameter-efficient fine-tuning
LangChain: agent orchestration and tool wiring
LlamaIndex: retrieval and indexing pipelines
FAISS: approximate nearest-neighbor vector search
Chroma: lightweight vector database
Ollama: local model serving for development
OpenAI API: hosted frontier models and function calling
Ragas: RAG evaluation metrics
LangSmith: tracing, logging, and observability
Pydantic: structured output and schema validation

Free online courses

Existing free, video-based courses this course can build on, for self-study or as a teaching basis.

YouTube: Stanford CS224N: NLP with Deep Learning (Spring 2024)
Comprehensive coverage from word vectors to LLMs, taught by Chris Manning
YouTube: Stanford CS25: Transformers United (V6)
Seminar on transformers, LLMs, and the latest research

In Hebrew · בעברית

Dr. Amos Azaria, Ariel University (YouTube): Deep Learning and NLP - קורס למידה עמוקה ועיבוד שפות טבעיות (Deep Learning and Natural Language Processing course)
Full Hebrew-spoken university course covering deep learning and natural language processing, including sequence models and language modeling.
Yoav Goldberg / Ido Dagan (YouTube): עיבוד שפות טבעיות (Natural Language Processing)
Hebrew-spoken NLP lecture series covering word representations, sequence models, and core language processing methods.
Google Cloud (Coursera): Transformer Models and BERT Model - בעברית (in Hebrew)
Hebrew-narrated course on the transformer architecture, attention, and BERT; free to audit.

Primary literature

Seminal works to read for graduate-level depth.

Paper: Attention Is All You Need
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, 2017
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Chang, Lee, Toutanova, 2018
Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, Rocktäschel, Riedel, Kiela, 2020
Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao, 2022
Paper: Scaling Laws for Neural Language Models
Kaplan, McCandlish, Henighan, Brown, Chess, Child, Gray, Radford, Wu, Amodei, 2020

References

Books and resources link to an online or publisher page.

Paper: Attention Is All You Need
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, 2017
Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Devlin, Chang, Lee, Toutanova, 2018
Paper: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, Rocktäschel, Riedel, Kiela, 2020
Paper: ReAct: Synergizing Reasoning and Acting in Language Models
Yao, Zhao, Yu, Du, Shafran, Narasimhan, Cao, 2022
Paper: Scaling Laws for Neural Language Models
Kaplan, McCandlish, Henighan, Brown, Chess, Child, Gray, Radford, Wu, Amodei, 2020
Textbook: Speech and Language Processing, 3rd edition draft
Jurafsky, Martin, 2026
Textbook: Dive into Deep Learning
Zhang, Lipton, Li, Smola, 2023
Documentation: Hugging Face Transformers Documentation
Hugging Face, 2026

Role in each concentration

Concentration	Role
Intelligent Software Systems	Core · Semester 1
Networking & Cyber Security	Elective
AI & Robotics	Core · Semester 1
AI and Quantum Computing for Finance	Core · Semester 1
Immersive Systems & Game Development	Elective
Defense Technologies & Autonomous Systems	Elective
