HIT · CS Concentrations

COURSE · AI3

Vision AI: Deep Learning for Computer Vision

Hebrew title (translated): Vision Intelligence: Deep Learning for Computer Vision

The representation-learning theory of convolutional and transformer models for vision

See the world with deep convolutional and transformer vision

Year 3 · 13 weeks · 2h lecture + 2h practice · Project-based

About this course

Apply deep learning to images and video for recognition, detection, and segmentation tasks.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.
What you will build

Build an end-to-end computer-vision pipeline in Python with PyTorch, torchvision, and Detectron2. The pipeline progresses from a residual classifier to an object detector and finally to a Mask R-CNN instance-segmentation system, evaluated with mean average precision (mAP) and intersection over union (IoU).

Expected outcomes

  • Derive the convolution operation, receptive fields, and parameter sharing from first principles
  • Explain backpropagation through convolutional and pooling layers
  • Build and train CNN architectures including residual networks
  • Adapt transformers to images with patch embeddings and self-attention
  • Implement object detection with anchors, region proposals, and bounding-box regression
  • Implement semantic and instance segmentation with encoder-decoder networks
  • Apply transfer learning and fine-tuning from pretrained backbones
  • Analyze representation learning, feature hierarchies, and learned invariances
  • Evaluate vision models with mean average precision and intersection over union
  • Deploy a vision pipeline with augmentation, training, and inference

Key topics

  • CNNs & vision transformers
  • Object detection
  • Image segmentation
  • Transfer learning

Theoretical foundations

The concepts and results this course rests on.

  • discrete convolution, receptive fields, and parameter sharing
  • backpropagation through convolutional and pooling layers
  • residual learning and skip connections in very deep networks
  • patch embeddings and self-attention for images
  • anchors, bounding-box regression, and non-max suppression
  • encoder-decoder networks and per-pixel segmentation loss
  • transfer learning and self-supervised pretraining
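Bounding-box regression does not predict box corners directly. It predicts offsets relative to an anchor. The sketch below is a minimal pure-Python version of that idea. It assumes the (tx, ty, tw, th) parameterization popularized by R-CNN and Faster R-CNN, with boxes given as (x1, y1, x2, y2) in pixels. The function names are hypothetical, and production libraries may add per-coordinate weights and clamp tw and th.

```python
import math


def center_form(box):
    """Convert (x1, y1, x2, y2) to (center_x, center_y, width, height)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def encode(box, anchor):
    """Regression targets of a ground-truth box relative to an anchor."""
    ax, ay, aw, ah = center_form(anchor)
    gx, gy, gw, gh = center_form(box)
    # Center offsets are normalized by anchor size; sizes are log-ratios.
    return ((gx - ax) / aw, (gy - ay) / ah, math.log(gw / aw), math.log(gh / ah))


def decode(deltas, anchor):
    """Invert encode(): apply predicted deltas to an anchor to get a box."""
    ax, ay, aw, ah = center_form(anchor)
    tx, ty, tw, th = deltas
    cx, cy = ax + tx * aw, ay + ty * ah
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


anchor = (0.0, 0.0, 100.0, 100.0)
gt = (10.0, 20.0, 110.0, 220.0)
deltas = encode(gt, anchor)      # tx = 0.1, ty = 0.7, tw = 0, th = log(2)
recovered = decode(deltas, anchor)
```

Because of the log-ratio form, doubling or halving a box moves tw by the same amount. This keeps regression targets on a similar scale across object sizes.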

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

  • Deep Learning
  • Linear algebra and calculus

Weekly schedule · 13 weeks · lecture + practice

Vision foundations
Wk 1
Images, pixels, and learning
Lecture: We cover image formation, color spaces, and the shift from hand-crafted features to learned representations.
Practice: Load an image dataset, build a data pipeline, and train a linear classifier baseline.
Project: Choose the vision task, dataset, and metrics for the running project.
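A small example of moving between color spaces: a grayscale (luma) image is a weighted sum of the RGB channels. This pure-Python sketch assumes 8-bit RGB pixels stored as nested lists and the ITU-R BT.601 luma weights (0.299, 0.587, 0.114). It also scales the result to [0, 1], as is typical before training. The function name is hypothetical; in the practice session a library transform would do this on whole tensors.

```python
def to_grayscale(image):
    """Convert an H x W image of (R, G, B) tuples in 0..255 to luma values in [0, 1]."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in image]


img = [[(255, 255, 255), (0, 0, 0)],
       [(255, 0, 0), (0, 255, 0)]]
gray = to_grayscale(img)
```

Green gets the largest weight because human brightness perception is most sensitive to it. A pure-red pixel maps to about 0.299, not to 1/3.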
Wk 2
Convolution and feature maps
Lecture: We derive discrete convolution, kernels, stride, padding, receptive fields, and parameter sharing.
Practice: Implement convolution and pooling in NumPy, then verify against PyTorch.
Project: Build a small from-scratch CNN classifier for the project dataset.
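The week-2 quantities (kernel, stride, padding, parameter sharing) fit in one small function. The sketch below is a single-channel, pure-Python version and stands in for the NumPy exercise. Like deep-learning frameworks, it computes cross-correlation: the kernel is not flipped. The name `conv2d` and the zero-padding choice are assumptions of this sketch. Each output dimension follows floor((n + 2p - k) / s) + 1.

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Single-channel 2D cross-correlation with zero padding, on lists of lists."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the input on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for i in range(h):
        for j in range(w):
            padded[i + padding][j + padding] = image[i][j]
    # Output size along each axis: floor((n + 2p - k) / s) + 1.
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (parameter sharing).
            out[i][j] = sum(kernel[a][b] * padded[i * stride + a][j * stride + b]
                            for a in range(kh) for b in range(kw))
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
ones3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
same = conv2d(img, ones3, padding=1)              # 3 x 3 output ("same" size)
strided = conv2d(img, ones3, stride=2, padding=1)  # 2 x 2 output
edges = conv2d(img, [[1, 0], [0, -1]])             # 2 x 2 diagonal-difference filter
```

The 3 x 3 kernel has only 9 weights, however large the image. That reuse is what parameter sharing means in practice.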
Wk 3
Training deep CNNs
Lecture: We cover backpropagation through convolutions, batch normalization, initialization, and optimization dynamics.
Practice: Train a deeper CNN with augmentation and learning-rate scheduling.
Project: Improve the baseline classifier with regularization and augmentation.
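Backpropagation through a pooling layer is a good first case before convolutions. Max pooling has no weights, so its backward pass only routes each upstream gradient to the input position that won the max. All other positions get zero gradient. The pure-Python sketch below assumes 2x2 windows with stride 2 and drops odd trailing rows or columns. The function names are hypothetical.

```python
def maxpool2x2_forward(x):
    """2x2 max pooling, stride 2. Returns the pooled map and the argmax positions."""
    out, argmax = [], []
    for i in range(0, len(x) - 1, 2):
        row, arow = [], []
        for j in range(0, len(x[0]) - 1, 2):
            window = [(i + a, j + b) for a in range(2) for b in range(2)]
            r, c = max(window, key=lambda rc: x[rc[0]][rc[1]])
            row.append(x[r][c])
            arow.append((r, c))
        out.append(row)
        argmax.append(arow)
    return out, argmax


def maxpool2x2_backward(grad_out, argmax, in_shape):
    """Route each upstream gradient to the input element that produced the max."""
    h, w = in_shape
    grad_in = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(grad_out):
        for j, g in enumerate(row):
            r, c = argmax[i][j]
            grad_in[r][c] += g
    return grad_in


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [3, 2, 1, 6]]
pooled, positions = maxpool2x2_forward(x)
grad = maxpool2x2_backward([[1.0, 1.0], [1.0, 1.0]], positions, (4, 4))
```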
Architectures
Wk 4
Residual and modern CNNs
Lecture: We derive residual learning, skip connections, and why they ease optimization of very deep networks.
Practice: Implement and train a ResNet and compare with the plain CNN.
Project: Adopt a residual backbone for the project task.
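The week-4 argument about why skip connections ease optimization can be checked on a scalar toy model. This is not a ResNet. Each "layer" is h -> tanh(w * h), and the residual variant adds the identity, h -> h + tanh(w * h). By the chain rule, the input gradient is the product of the local derivatives. The skip path adds exactly 1 to every factor, so the product cannot vanish the way the plain stack's does. The depth, weight, and function name are illustrative assumptions.

```python
import math


def input_gradient(depth, w, residual, x0=1.0):
    """d(output)/d(input) through `depth` scalar tanh layers, via the chain rule.

    plain layer:    h <- tanh(w*h)       local derivative w * (1 - tanh(w*h)^2)
    residual layer: h <- h + tanh(w*h)   local derivative 1 + w * (1 - tanh(w*h)^2)
    """
    h, grad = x0, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)
        if residual:
            local += 1.0  # the identity path contributes a derivative of exactly 1
            h = h + t
        else:
            h = t
        grad *= local
    return grad


plain = input_gradient(30, 0.5, residual=False)
res = input_gradient(30, 0.5, residual=True)
```

With w = 0.5, each plain factor is at most 0.5, so 30 layers shrink the gradient below 0.5^30. Each residual factor is at least 1.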
Wk 5
Vision transformers · Presentation
Lecture: We adapt attention to images via patch embeddings, class tokens, and positional encoding, and contrast the inductive biases of transformers with those of CNNs.
Practice: Team presentation. Each team defends its task specification and evaluation plan.
Project: Lock the specification and prototype a vision transformer backbone.
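The first ViT step, turning an image into a token sequence, can be sketched in pure Python:

1. Cut the image into non-overlapping P x P patches.
2. Flatten each patch and map it through one shared linear projection (equivalently, a convolution whose kernel size and stride both equal P).
3. Prepend a class token.
4. Add position embeddings.

The names are hypothetical. The weights here are fixed toy values; in a real model they are learned parameters.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches, row-major order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            patches.append([v for a in range(patch) for b in range(patch)
                            for v in image[i + a][j + b]])
    return patches


def embed(patches, weight, cls_token, pos_embed):
    """Project patches to D dims, prepend a class token, add position embeddings.

    weight: D rows, each of length patch*patch*C; cls_token: length D; pos_embed: (N+1) x D.
    """
    projected = [[sum(wk * pk for wk, pk in zip(row, p)) for row in weight] for p in patches]
    tokens = [list(cls_token)] + projected
    return [[t + e for t, e in zip(tok, pe)] for tok, pe in zip(tokens, pos_embed)]


# A 224 x 224 RGB image with 16 x 16 patches gives 14 * 14 = 196 patches of length 768.
big = [[[0.0, 0.0, 0.0] for _ in range(224)] for _ in range(224)]
big_patches = patchify(big, 16)

# Tiny 4 x 4 x 1 example with 2 x 2 patches and D = 2.
small = [[[float(4 * i + j)] for j in range(4)] for i in range(4)]
small_patches = patchify(small, 2)
tokens = embed(small_patches, [[1, 1, 1, 1], [1, 0, 0, 0]], [0.5, 0.5], [[0.0, 0.0]] * 5)
```

The sequence length is (H/P)(W/P) + 1, so halving the patch size quadruples the number of tokens that self-attention must compare.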
Transfer learning
Wk 6
Pretraining and fine-tuning
Lecture: We cover transfer learning, feature reuse, fine-tuning strategies, and self-supervised pretraining.
Practice: Fine-tune a pretrained backbone and compare against training from scratch.
Project: Switch the project to a pretrained, fine-tuned backbone.
Detection
Wk 7
Object detection foundations
Lecture: We cover anchors, region proposals, bounding-box regression, non-max suppression, and the detection loss.
Practice: Train a detector on the project images and inspect predicted boxes.
Project: Extend the project from classification to object detection.
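Non-max suppression removes duplicate detections of the same object. It keeps the highest-scoring box, drops every remaining box that overlaps it by more than an IoU threshold, and repeats. The pure-Python sketch below assumes boxes as (x1, y1, x2, y2) and a default threshold of 0.5, and handles a single class. The function names are illustrative. torchvision ships a tensor version as `torchvision.ops.nms(boxes, scores, iou_threshold)`.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression. Returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```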
Wk 8
Two-stage and one-stage detectors · Presentation
Lecture: We contrast Faster R-CNN with single-stage detectors and analyze the speed-accuracy trade-off.
Practice: Team presentation. Each team gives an interim demo of its detection results, scored with mAP.
Project: Benchmark a one-stage detector against the two-stage baseline.
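Mean average precision is the mean, over classes, of a per-class average precision (AP). AP is the area under the precision-recall curve traced as detections are accepted in decreasing score order. The sketch below computes all-point interpolated AP for one class in pure Python. It assumes each detection has already been matched to ground truth at a chosen IoU threshold, so it arrives as a true/false positive flag. The function name is hypothetical. Benchmarks differ in the details: COCO, for example, averages over IoU thresholds from 0.50 to 0.95 and uses 101 recall points.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """All-point interpolated AP for one class (num_ground_truth must be > 0)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the curve: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, False], num_ground_truth=2)
# about 0.833: recall 0.5 at precision 1, then recall 1.0 at precision 2/3
```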
Segmentation
Wk 9
Semantic segmentation
Lecture: We derive encoder-decoder networks, the U-Net architecture, skip connections, and per-pixel loss.
Practice: Train a U-Net for semantic segmentation and score with IoU.
Project: Add pixel-level segmentation to the project pipeline.
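For segmentation, IoU is computed per class over pixels rather than boxes. The sketch below is a minimal pure-Python version over two equally sized label maps. Classes that appear in neither map are left out of the mean. The function name is hypothetical, and real evaluators usually also support an "ignore" label.

```python
def mean_iou(pred, target, num_classes):
    """Per-class IoU and mean IoU for two label maps (lists of lists of class ids)."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            if p == t:
                inter[p] += 1   # pixel is in both prediction and target for class p
                union[p] += 1
            else:
                union[p] += 1   # false positive for class p
                union[t] += 1   # false negative for class t
    ious = [inter[c] / union[c] if union[c] else None for c in range(num_classes)]
    valid = [v for v in ious if v is not None]
    return ious, sum(valid) / len(valid)


pred = [[0, 0], [1, 1]]
target = [[0, 1], [1, 1]]
per_class, miou = mean_iou(pred, target, num_classes=2)  # class IoUs 1/2 and 2/3
```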
Wk 10
Instance segmentation
Lecture: We cover Mask R-CNN, region-of-interest alignment, and joint detection and masking.
Practice: Train Mask R-CNN and visualize instance masks.
Project: Upgrade segmentation to instance-level masks.
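Region-of-interest alignment extracts a fixed-size feature grid for each box without rounding coordinates to the feature-map grid. The sketch below does this in pure Python:

1. Split the box into bins.
2. Bilinearly sample a few evenly spaced points in each bin.
3. Average the samples within each bin.

The box format (x1, y1, x2, y2) in feature-map coordinates, the 2 x 2 output, the 2 x 2 samples per bin, and the function names are all assumptions. `torchvision.ops.roi_align` provides a tensor implementation with more options.

```python
import math


def bilinear(feature, y, x):
    """Sample a 2D feature map (list of lists) at real-valued (y, x), clamped to the map."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return ((1 - dy) * (1 - dx) * feature[y0][x0] + (1 - dy) * dx * feature[y0][x1]
            + dy * (1 - dx) * feature[y1][x0] + dy * dx * feature[y1][x1])


def roi_align(feature, box, out_size=2, samples=2):
    """Average of samples x samples bilinear samples in each of out_size x out_size bins."""
    x1, y1, x2, y2 = box
    bin_h, bin_w = (y2 - y1) / out_size, (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for a in range(samples):
                for b in range(samples):
                    # Evenly spaced sample points inside the bin; nothing is rounded.
                    yy = y1 + (i + (a + 0.5) / samples) * bin_h
                    xx = x1 + (j + (b + 0.5) / samples) * bin_w
                    acc += bilinear(feature, yy, xx)
            row.append(acc / (samples * samples))
        out.append(row)
    return out


feature = [[10 * y + x for x in range(4)] for y in range(4)]
pooled = roi_align(feature, (0.0, 0.0, 2.0, 2.0))
```

On a feature map that is linear in position, each bin's output equals the value at the bin center. This is a quick sanity check that no quantization error was introduced.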
Representation and robustness
Wk 11
Interpretability and invariance
Lecture: We study learned feature hierarchies, saliency, class activation maps, and adversarial robustness.
Practice: Generate saliency and CAM visualizations and probe adversarial examples.
Project: Add interpretability visualizations to the project.
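One standard way to probe adversarial examples is the fast gradient sign method (FGSM). It moves every input feature by a fixed epsilon in the direction that increases the loss. The sketch below applies FGSM to a logistic-regression "model" in pure Python, where the input gradient has the closed form (p - y) * w. The weights, input, epsilon, and function names are illustrative assumptions. Clipping to a valid pixel range is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w, b, x, y):
    """Binary cross-entropy of logistic regression on one input vector x with label y."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Step each feature by eps along the sign of d(loss)/d(x)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad_x = [(p - y) * wi for wi in w]  # closed-form input gradient for logistic regression
    return [xi + eps * sign(g) for xi, g in zip(x, grad_x)]


w, b = [1.0, -2.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.4], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
```

Every feature changes by at most epsilon (an L-infinity budget), yet the changes all push the loss the same way.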
Deployment
Wk 12
Optimization and serving
Lecture: We cover quantization, pruning, knowledge distillation, and inference latency.
Practice: Optimize the model with quantization and export it for inference.
Project: Make the vision pipeline fast and deployable.
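Quantization stores each float x as a small integer q = round(x / scale) + zero_point and recovers x approximately as (q - zero_point) * scale. The sketch below is a minimal pure-Python version of common per-tensor affine (asymmetric) int8 quantization. The range handling, the [-128, 127] integer range, and the function names are assumptions of this sketch, not a specific toolkit's API.

```python
def quantize_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point mapping [x_min, x_max] onto the integer range [qmin, qmax]."""
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)  # keep 0.0 exactly representable
    scale = (x_max - x_min) / (qmax - qmin) or 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    return [max(qmin, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q, scale, zero_point):
    return [(qi - zero_point) * scale for qi in q]


values = [-1.0, -0.5, 0.0, 0.3, 2.0]
scale, zp = quantize_params(min(values), max(values))
q = quantize(values, scale, zp)
restored = dequantize(q, scale, zp)
```

Inside the calibrated range, the round-trip error is at most half a quantization step (scale / 2). Values outside the range are clipped. This is why calibrating x_min and x_max on representative data matters.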
Capstone
Wk 13
Final defense · Presentation
Lecture: We synthesize convolutional and transformer vision and survey open research directions.
Practice: Team presentation. Each team gives a final demo with metrics and an oral defense of its design choices.
Project: Deliver the complete detection-and-segmentation pipeline with results.
AI tools in this course

Students use AI assistants to generate and refactor torchvision data pipelines, custom CNN and ViT modules, and Detectron2 and YOLO training configs, vibe-coding from a classifier up to a Mask R-CNN system. They prompt AI to build Albumentations augmentation stacks, synthesize annotation-format converters, and generate unit tests for IoU and mAP computation. AI also helps analyze confusion matrices, saliency maps, and failed detections, explaining why a model misclassifies or where boxes drift.

Student project

Teams build one end-to-end computer-vision pipeline on a chosen image dataset, growing from a classifier into a detector and finally an instance-segmentation system. Each week adds capability backed by the convolution and representation theory taught that week, with continuous evaluation on held-out images.

Requirements

  • Build a working system, not a set of disconnected exercises.
  • Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
  • Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
  • Carry one running project from specification to a deployed, defensible result across the whole term.
  • Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

  • Retail product detection and counting
  • Medical image segmentation
  • Autonomous-driving scene parsing
  • Wildlife camera-trap recognition
  • Defect detection in manufacturing
  • Satellite land-cover mapping
  • Sports player tracking
  • Document layout analysis

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component | What it covers | Weight
Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20%
Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30%
Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50%


Role in each concentration

Concentration | Role
Intelligent Software Systems | Elective
Networking & Cyber Security | Elective
AI & Robotics | Core · Semester 1
AI and Quantum Computing for Finance | Elective
Immersive Systems & Game Development | Core · Semester 1
Defense Technologies & Autonomous Systems | Core · Semester 1