HIT · CS Concentrations

COURSE · AI1

Language AI: LLMs and Agentic Systems

בינת שפה: מודלי שפה גדולים ומערכות סוכניות

the mathematical theory of attention, autoregressive language models, and tool-using agents

Build, ground, and guard production LLM agents

Year 313 weeks2h lecture + 2h practiceProject-based

About this course

Build applications on large language models and autonomous agents, covering prompting, tool use, retrieval, and orchestration of multi-step reasoning.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.
What you will build

Built a grounded, tool-using LLM agent in Python with PyTorch, Hugging Face Transformers, LangChain, and FAISS, adding retrieval-augmented generation, a ReAct planning loop, safety guardrails, and an LLM-as-judge evaluation suite.

Expected outcomes

  • Derive the transformer architecture from scaled dot-product attention, positional encoding, and residual streams
  • Explain self-attention, multi-head attention, and the softmax-based information routing mathematically
  • Analyze autoregressive next-token prediction, cross-entropy loss, and the backpropagation that trains it
  • Engineer prompts using zero-shot, few-shot, and chain-of-thought reasoning patterns
  • Implement retrieval-augmented generation with embeddings, vector search, and grounded context injection
  • Construct tool-using agents with the ReAct reasoning and acting loop
  • Design evaluation harnesses with task metrics, LLM-as-judge, and regression suites
  • Apply guardrails for prompt injection, hallucination, and output safety filtering
  • Quantify decoding strategies including temperature, top-k, top-p, and beam search
  • Deploy an agent with logging, tracing, and cost-aware orchestration

Key topics

  • LLM fundamentals
  • Prompting & RAG
  • Tool use & agents
  • Evaluation & guardrails

Theoretical foundations

The concepts and results this course rests on.

  • the transformer and self-attention
  • autoregressive next-token prediction and the chain rule of probability
  • cross-entropy loss and backpropagation through the network
  • in-context learning and the conditioning of token distributions
  • dense embeddings and approximate nearest-neighbor retrieval
  • the ReAct reasoning and acting loop for agents
  • neural scaling laws relating loss to parameters, data, and compute

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

  • Machine Learning and Deep Learning
  • Probability and linear algebra
  • Python

Weekly schedule 13 weeks · lecture + practice

Foundations
Wk 1
From language modeling to transformers
LectureWe formalize the language modeling objective, the chain rule of probability over tokens, and tokenization, then motivate why recurrence gives way to attention.
PracticeSet up the project repo, load a pretrained model from Hugging Face, and run greedy and sampled generation.
ProjectInitialize the team chatbot scaffold with a baseline LLM and a chat loop.
Wk 2
Attention and the transformer block
LectureWe derive scaled dot-product attention, multi-head attention, positional encodings, and the full encoder-decoder block with residuals and layer norm.
PracticeImplement a single attention head and a transformer block in PyTorch and verify shapes against a reference.
ProjectAdd a from-scratch attention visualization tool to the chatbot for explainability.
Wk 3
Training, scaling, and emergent behavior
LectureWe cover pretraining, cross-entropy loss, backpropagation through the transformer, and scaling laws relating loss to parameters, data, and compute.
PracticeFine-tune a small model with LoRA on a domain corpus and chart the loss curve.
ProjectSpecialize the chatbot to a chosen domain via lightweight fine-tuning.
Prompting and retrieval
Wk 4
Prompt engineering and in-context learning
LectureWe analyze in-context learning, few-shot demonstrations, chain-of-thought, and the theory of why prompts steer conditional distributions.
PracticeBuild a prompt-template library and run systematic A/B comparisons on a task set.
ProjectAdd structured prompting and few-shot exemplars to improve answer quality.
Wk 5
Embeddings and vector retrievalPresentation
LectureWe study dense embeddings, cosine similarity, approximate nearest-neighbor indexing, and the bias-variance trade-offs of chunking.
PracticeTeam presentation: each team defends its project specification, target domain, and evaluation plan.
ProjectLock the project specification and stand up an embedding index over the knowledge base.
Wk 6
Retrieval-augmented generation
LectureWe formalize RAG as marginalizing over retrieved passages, and cover reranking, context-window budgeting, and grounding faithfulness.
PracticeWire a retriever to the generator and measure grounded versus ungrounded answer accuracy.
ProjectConvert the chatbot into a grounded RAG assistant over your corpus.
Agents and tools
Wk 7
Tool use and function calling
LectureWe cover structured output, JSON schemas, function calling, and how tool results extend a model beyond its parametric knowledge.
PracticeExpose calculator, search, and code tools and let the model invoke them via function calling.
ProjectGive the assistant tool access so it can act, not just answer.
Wk 8
Agentic loops and planningPresentation
LectureWe derive the ReAct loop, reasoning-acting interleaving, planning, memory, and multi-step task decomposition.
PracticeTeam presentation: interim demo of the agent executing a multi-step task end to end.
ProjectUpgrade to a ReAct agent that plans and chains tool calls toward a goal.
Wk 9
Memory, multi-agent, and orchestration
LectureWe study short and long-term memory, agent state, and coordination patterns across multiple specialized agents.
PracticeAdd persistent memory and a second specialized agent that collaborates with the first.
ProjectIntroduce memory and a helper agent to handle complex requests.
Evaluation and guardrails
Wk 10
Evaluation of LLM systems
LectureWe cover task metrics, reference-based and reference-free evaluation, LLM-as-judge, and the statistics of small-sample comparisons.
PracticeBuild an automated eval suite with a labeled set and an LLM-judge scorer.
ProjectAdd a regression test suite that scores every agent change.
Wk 11
Safety, alignment, and guardrails
LectureWe cover RLHF and preference optimization conceptually, prompt injection, jailbreaks, and content filtering.
PracticeRed-team the agent for injection and add input and output guardrails.
ProjectHarden the agent with safety filters and injection defenses.
Deployment
Wk 12
Serving, cost, and observability
LectureWe cover quantization, KV-cache, batching, latency-throughput trade-offs, tracing, and cost accounting.
PracticeAdd tracing and cost logging, then optimize latency with caching and batching.
ProjectMake the agent observable, cost-aware, and ready to serve.
Capstone
Wk 13
Final defensePresentation
LectureWe synthesize the full stack from attention theory to deployed agentic systems and survey open research directions.
PracticeTeam presentation: final demo with live evaluation and an oral defense of design choices.
ProjectDeliver the complete grounded, tool-using, guarded agent with evaluation results.
AI tools in this course.

Students lean on AI coding assistants throughout, generating and refactoring the PyTorch attention blocks, LangChain wiring, and FAISS retrieval code, then vibe-coding their way from a bare chat loop to a full agent. They use AI to scaffold ReAct tool definitions and MCP server connections, to synthesize few-shot exemplars and labeled evaluation sets, and to draft LLM-as-judge rubrics. AI assistants also help interpret tracing logs and red-team transcripts, turning raw eval output into diagnoses of why an agent hallucinated or failed an injection test.

Student project

Teams build one production-grade LLM agent over the term, growing from a bare chat loop into a grounded, tool-using, guarded, and observable system. Each week adds a capability backed by the theory taught that week, and the agent is continuously evaluated on a held-out task set.

Requirements

  • Build a working system, not a set of disconnected exercises.
  • Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
  • Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
  • Carry one running project from specification to a deployed, defensible result across the whole term.
  • Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

Research-paper Q and A assistantCodebase navigator agentCustomer-support RAG botLegal or policy document advisorPersonal data analyst agentTravel-planning tool agentMedical-literature triage assistantFinancial-report summarizer

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

ComponentWhat it coversWeight
Project · SpecificationPresentation 1 (week 5): problem, objectives, and architecture20%
Project · InterimPresentation 2 (week 8): the working system demonstrated live30%
Project · FinalPresentation 3 (week 13): end-to-end demo with oral defense50%

Tools & platforms

Free online courses

Existing free, video-based courses this course can build on, for self-study or as a teaching basis.

In Hebrew · בעברית

Primary literature

Seminal works to read for graduate-level depth.

References

Books and resources link to an online or publisher page.

Role in each concentration

ConcentrationRole
Intelligent Software SystemsCore · Semester 1
Networking & Cyber SecurityElective
AI & RoboticsCore · Semester 1
AI and Quantum Computing for FinanceCore · Semester 1
Immersive Systems & Game DevelopmentElective
Defense Technologies & Autonomous SystemsElective