AI3: Vision AI: Deep Learning for Computer Vision

About this course

Apply deep learning to images and video for recognition, detection, and segmentation tasks.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build an end-to-end computer-vision pipeline in Python with PyTorch, torchvision, and Detectron2, progressing from a residual classifier to an object detector and then to a Mask R-CNN instance-segmentation system, evaluated with mean average precision and intersection over union.

Expected outcomes

Derive the convolution operation, receptive fields, and parameter sharing from first principles
Explain backpropagation through convolutional and pooling layers
Build and train CNN architectures including residual networks
Adapt transformers to images with patch embeddings and self-attention
Implement object detection with anchors, region proposals, and bounding-box regression
Implement semantic and instance segmentation with encoder-decoder networks
Apply transfer learning and fine-tuning from pretrained backbones
Analyze representation learning, feature hierarchies, and learned invariances
Evaluate vision models with mean average precision and intersection over union
Deploy a vision pipeline with augmentation, training, and inference

Key topics

CNNs & vision transformers
Object detection
Image segmentation
Transfer learning

Theoretical foundations

The concepts and results this course rests on.

discrete convolution, receptive fields, and parameter sharing
backpropagation through convolutional and pooling layers
residual learning and skip connections in very deep networks
patch embeddings and self-attention for images
anchors, bounding-box regression, and non-max suppression
encoder-decoder networks and per-pixel segmentation loss
transfer learning and self-supervised pretraining

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

Deep Learning
Linear algebra and calculus

Weekly schedule: 13 weeks, lecture + practice

Vision foundations

Wk 1

Images, pixels, and learning

Lecture: We cover image formation, color spaces, and the shift from hand-crafted features to learned representations.

Practice: Load an image dataset, build a data pipeline, and train a linear classifier baseline.

Project: Choose the vision task, dataset, and metrics for the running project.

Watch: Stanford CS231N Lecture 1: Introduction · UMich EECS 498 Lecture 1: Intro to DL for Computer Vision

Wk 2

Convolution and feature maps

Lecture: We derive discrete convolution, kernels, stride, padding, receptive fields, and parameter sharing.

Practice: Implement convolution and pooling in NumPy, then verify against PyTorch.

Project: Build a small from-scratch CNN classifier for the project dataset.

Watch: UMich EECS 498 Lecture 7: Convolutional Networks
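
The stride, padding, and parameter-sharing ideas from this week's lecture can be sketched in a few lines. The version below is a minimal illustration in plain Python lists so that it runs without dependencies; the practice session itself uses NumPy and PyTorch. The function name `conv2d` and its signature are assumptions made for this sketch. Like most deep-learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide one shared kernel over a 2-D image (lists of rows)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # Zero-pad the image on all four sides.
    padded = [[0.0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    # Output size along each axis: floor((n + 2p - k) / s) + 1.
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0.0
            # The same kernel weights are reused at every position
            # (parameter sharing).
            for u in range(kh):
                for v in range(kw):
                    total += padded[i * stride + u][j * stride + v] * kernel[u][v]
            out[i][j] = total
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
result = conv2d(image, kernel)  # 2 x 2 output, every entry equals -4.0
```

Each output value depends only on a kernel-sized window of the input, which is the receptive field of a single layer; stacking layers enlarges it.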

Wk 3

Training deep CNNs

Lecture: We cover backpropagation through convolutions, batch normalization, initialization, and optimization dynamics.

Practice: Train a deeper CNN with augmentation and learning-rate scheduling.

Project: Improve the baseline classifier with regularization and augmentation.

Watch: Stanford CS231N Lecture 4: Neural Networks and Backpropagation · UMich EECS 498 Lecture 6: Backpropagation · UMich EECS 498 Lecture 10: Training Neural Networks Part 1
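
As a small illustration of the batch normalization covered in this week's lecture, the sketch below shows the training-mode forward pass for a single channel. It assumes the inputs are a plain list of activations from one mini-batch; the name `batch_norm` and the default values are illustrative, and running statistics for inference are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel over the batch, then scale and shift."""
    mean = sum(values) / len(values)
    # Biased variance (divide by N), as used in the training-time forward pass.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)  # roughly zero mean and unit variance
```

The learnable `gamma` and `beta` let the network undo the normalization if that helps, so the layer does not reduce what the model can represent.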

Architectures

Wk 4

Residual and modern CNNs

Lecture: We derive residual learning, skip connections, and why they ease optimization of very deep networks.

Practice: Implement and train a ResNet and compare with the plain CNN.

Project: Adopt a residual backbone for the project task.

Watch: Stanford CS231N Lecture 6: CNN Architectures · UMich EECS 498 Lecture 8: CNN Architectures
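
One way to see why skip connections ease optimization is that a residual block computes y = x + F(x), so it reduces to the identity whenever F outputs zero. The sketch below shows this with fully connected layers on plain lists; it is a simplification made for illustration (real ResNet blocks use convolutions and batch normalization and apply a ReLU after the addition), and all function names are hypothetical.

```python
def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def residual_block(x, w1, b1, w2, b2):
    """y = x + F(x), where F is linear -> ReLU -> linear."""
    fx = linear(relu(linear(x, w1, b1)), w2, b2)
    return [xi + fi for xi, fi in zip(x, fx)]


x = [1.0, -2.0]
w1, b1 = [[0.5, -1.0], [2.0, 0.3]], [0.1, 0.2]
# With the second layer zeroed out, F(x) = 0 and the block is the identity.
w2, b2 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]
y = residual_block(x, w1, b1, w2, b2)  # equals x
```

A plain stack of layers would have to learn the identity mapping explicitly, whereas the residual form gets it for free and only has to learn the correction F.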

Wk 5

Vision transformers · Presentation

Lecture: We adapt attention to images via patch embeddings, class tokens, and positional encoding, and contrast inductive biases with CNNs.

Practice: Team presentation: each team defends its task specification and evaluation plan.

Project: Lock the specification and prototype a vision transformer backbone.

Watch: Stanford CS231N Lecture 8: Attention and Transformers · UMich EECS 498 Lecture 13: Attention
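
The first step of a vision transformer, turning an image into a sequence of patch tokens, can be sketched as follows. This is a minimal illustration on a single-channel image stored as a list of rows; the names `patchify` and `sequence_length` are assumptions for this sketch, and the learned linear projection and positional encoding that follow are omitted.

```python
def patchify(image, patch):
    """Split an H x W image into flattened patch x patch tokens (row-major)."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + r][left + c]
                           for r in range(patch) for c in range(patch)])
    return tokens


def sequence_length(height, width, patch, class_token=True):
    """Number of tokens the transformer sees for one image."""
    return (height // patch) * (width // patch) + (1 if class_token else 0)


image = [[r * 4 + c for c in range(4)] for r in range(4)]
tokens = patchify(image, 2)  # 4 tokens, each of length 4
```

For a 224 x 224 input with 16 x 16 patches, `sequence_length(224, 224, 16)` gives 14 * 14 + 1 = 197 tokens, which shows how self-attention cost grows as patches get smaller.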

Transfer learning

Wk 6

Pretraining and fine-tuning

Lecture: We cover transfer learning, feature reuse, fine-tuning strategies, and self-supervised pretraining.

Practice: Fine-tune a pretrained backbone and compare against training from scratch.

Project: Switch the project to a pretrained, fine-tuned backbone.

Watch: Stanford CS231N Lecture 12: Self-Supervised Learning

Detection

Wk 7

Object detection foundations

Lecture: We cover anchors, region proposals, bounding-box regression, non-max suppression, and the detection loss.

Practice: Train a detector on the project images and inspect predicted boxes.

Project: Extend the project from classification to object detection.

Watch: UMich EECS 498 Lecture 15: Object Detection
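
Two building blocks from this week, box intersection over union and greedy non-max suppression, are short enough to sketch directly. The code below assumes boxes in (x1, y1, x2, y2) corner format and uses illustrative names (`box_iou`, `nms`); detection libraries ship their own optimized versions, so this is meant only for understanding and for testing against them.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

The threshold of 0.5 is only a common default; lowering it removes more near-duplicates at the risk of suppressing distinct objects that sit close together.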

Wk 8

Two-stage and one-stage detectors · Presentation

Lecture: We contrast Faster R-CNN with single-stage detectors and analyze the speed-accuracy trade-off.

Practice: Team presentation: interim demo of detection results with mAP.

Project: Benchmark a one-stage detector against the two-stage baseline.

Watch: UMich EECS 498 Lecture 16: Detection and Segmentation
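
Since the interim demo reports mAP, it helps to see how average precision is computed for one class. The sketch below assumes detections are already sorted by descending confidence and already matched to ground truth (each flagged as a true or false positive at a chosen IoU threshold). It uses all-point interpolation of the precision-recall curve, which is one common convention; benchmarks differ in the details, and the function name is illustrative.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    `num_ground_truth` must be positive when there are detections.
    """
    tp = 0
    precisions, recalls = [], []
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += 1 if hit else 0
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at a recall level is the best precision
    # achieved at that recall or any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


# Three ranked detections (hit, miss, hit) against two ground-truth objects.
ap = average_precision([True, False, True], num_ground_truth=2)
```

Mean average precision is then the mean of this value over classes, and in some benchmarks over several IoU thresholds as well.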

Segmentation

Wk 9

Semantic segmentation

Lecture: We derive encoder-decoder networks, the U-Net architecture, skip connections, and per-pixel loss.

Practice: Train a U-Net for semantic segmentation and score with IoU.

Project: Add pixel-level segmentation to the project pipeline.

Watch: Stanford CS231N Lecture 9: Detection, Segmentation, Visualizing · UMich EECS 498 Lecture 16: Detection and Segmentation
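
Scoring a semantic segmentation with IoU can be sketched as below. The code assumes predictions and targets are label maps (lists of rows of integer class ids) of the same size; the names `class_iou` and `mean_iou` are illustrative, and skipping classes that appear in neither map is one possible convention rather than a fixed rule.

```python
def class_iou(pred, target, cls):
    """IoU for one class, or None if the class appears in neither map."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            in_pred, in_target = p == cls, t == cls
            inter += in_pred and in_target
            union += in_pred or in_target
    return inter / union if union else None


def mean_iou(pred, target, num_classes):
    """Average the per-class IoU over classes present in either map."""
    scores = [class_iou(pred, target, c) for c in range(num_classes)]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)


pred = [[0, 1],
        [1, 1]]
target = [[0, 1],
          [0, 1]]
score = mean_iou(pred, target, num_classes=2)  # (1/2 + 2/3) / 2
```

Unlike pixel accuracy, this metric is not dominated by a large background class, which is why it is the usual choice for segmentation.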

Wk 10

Instance segmentation

Lecture: We cover Mask R-CNN, region-of-interest alignment, and joint detection and masking.

Practice: Train Mask R-CNN and visualize instance masks.

Project: Upgrade segmentation to instance-level masks.

Watch: UMich EECS 498 Lecture 16: Detection and Segmentation
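
Region-of-interest alignment avoids rounding box coordinates to the feature grid by reading the feature map at fractional locations with bilinear interpolation. The sketch below shows only that sampling step on a single-channel feature map; it is not a full RoIAlign operator (no bin layout or averaging over sample points), and `bilinear_sample` is a hypothetical helper name.

```python
import math


def bilinear_sample(feature, y, x):
    """Read a 2-D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    # Clamp to the valid range so border samples stay defined.
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


feature = [[0.0, 1.0],
           [2.0, 3.0]]
center = bilinear_sample(feature, 0.5, 0.5)  # average of the four cells
```

Because the sampled value changes smoothly with the box coordinates, small shifts of a region produce small changes in the pooled features, which matters for pixel-accurate masks.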

Representation and robustness

Wk 11

Interpretability and invariance

Lecture: We study learned feature hierarchies, saliency, class activation maps, and adversarial robustness.

Practice: Generate saliency and CAM visualizations and probe adversarial examples.

Project: Add interpretability visualizations to the project.

Watch: UMich EECS 498 Lecture 14: Visualizing and Understanding · Stanford CS231N Lecture 9: Detection, Segmentation, Visualizing
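
Saliency maps and adversarial examples both start from the gradient of the loss with respect to the input. The sketch below makes that concrete on a tiny logistic-regression classifier standing in for a vision model, and perturbs the input with the fast gradient sign method. Both the stand-in model and the choice of this particular attack are assumptions made for illustration; the syllabus does not name a specific method, and all function names are hypothetical.

```python
import math


def predict(x, w, b):
    """Probability of the positive class under logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(x, y, w, b):
    """Gradient of binary cross-entropy with respect to the input x.
    Its magnitude per feature is a simple saliency score."""
    p = predict(x, w, b)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(x, y, w, b, epsilon):
    """Move each input feature by epsilon in the direction that raises the loss."""
    grad = input_gradient(x, y, w, b)
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.2], 1
x_adv = fgsm(x, y, w, b, epsilon=0.5)
# predict(x, w, b) is above 0.5, while predict(x_adv, w, b) falls below it.
```

For an image model the same idea applies per pixel, with the gradient obtained through backpropagation rather than a closed-form expression.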

Deployment

Wk 12

Optimization and serving

Lecture: We cover quantization, pruning, knowledge distillation, and inference latency.

Practice: Optimize the model with quantization and export it for inference.

Project: Make the vision pipeline fast and deployable.

Watch: Stanford CS231N Lecture 11: Large-Scale Distributed Training
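
The core arithmetic of quantization fits in a few lines. The sketch below assumes an affine (asymmetric) scheme that maps floats to unsigned 8-bit integers with a scale and a zero point; frameworks differ in the exact scheme and calibration they use, so the function names and defaults here are illustrative only.

```python
def quantize(values, num_bits=8):
    """Map floats to integers in [0, 2**num_bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Include 0.0 in the range so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate floats from the integer codes."""
    return [scale * (qi - zero_point) for qi in q]


weights = [-1.0, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize(weights)
restored = dequantize(codes, scale, zero_point)
```

The reconstruction error per value stays within about one quantization step (`scale`), which is the accuracy cost traded for smaller models and faster integer inference.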

Capstone

Wk 13

Final defense · Presentation

Lecture: We synthesize convolutional and transformer vision and survey open research directions.

Practice: Team presentation: final demo with metrics and an oral defense of design choices.

Project: Deliver the complete detection-and-segmentation pipeline with results.

AI tools in this course.

Students use AI assistants to generate and refactor torchvision data pipelines, custom CNN and ViT modules, and Detectron2 and YOLO training configs, vibe-coding from a classifier up to a Mask R-CNN system. They prompt AI to build Albumentations augmentation stacks, synthesize annotation-format converters, and generate unit tests for IoU and mAP computation. AI also helps analyze confusion matrices, saliency maps, and failed detections, explaining why a model misclassifies or where boxes drift.

Student project

Teams build one end-to-end computer-vision pipeline on a chosen image dataset, growing from a classifier into a detector and finally an instance-segmentation system. Each week adds capability backed by the convolution and representation theory taught that week, with continuous evaluation on held-out images.

Requirements

Build a working system, not a set of disconnected exercises.
Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
Carry one running project from specification to a deployed, defensible result across the whole term.
Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

Retail product detection and counting
Medical image segmentation
Autonomous-driving scene parsing
Wildlife camera-trap recognition
Defect detection in manufacturing
Satellite land-cover mapping
Sports player tracking
Document layout analysis

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component	What it covers	Weight
Project · Specification	Presentation 1 (week 5): problem, objectives, and architecture	20%
Project · Interim	Presentation 2 (week 8): the working system demonstrated live	30%
Project · Final	Presentation 3 (week 13): end-to-end demo with oral defense	50%

Tools & platforms

PyTorch: model implementation and training
torchvision: datasets, transforms, and pretrained models
timm: state-of-the-art image backbones
Detectron2: detection and segmentation framework
Ultralytics YOLO: real-time one-stage detection
Albumentations: image augmentation
OpenCV: image processing utilities
Hugging Face Transformers: vision transformer models
ONNX Runtime: optimized cross-platform inference
Weights and Biases: experiment tracking
Grad-CAM: visual explanations
Roboflow: dataset management and annotation

Free online courses

Existing free, video-based courses this course can build on, for self-study or as a teaching basis.

YouTube: Stanford CS231N: Deep Learning for Computer Vision (Spring 2025)
Latest CS231n lectures, CNNs and vision
YouTube: Deep Learning for Computer Vision, EECS 498-007 (Justin Johnson)
Michigan course, full CNN and vision lectures

In Hebrew

Dr. Amos Azaria, Ariel University (YouTube): Deep Learning and NLP (course on deep learning and natural language processing)
Hebrew-spoken deep learning course covering neural networks and representation learning used in computer vision.
Prof. Yossi Keshet (YouTube): Machine Learning
Hebrew-spoken machine learning lecture series foundational to deep learning and vision.

Primary literature

Seminal works to read for graduate-level depth.

Paper: Deep Residual Learning for Image Recognition
He, Zhang, Ren, Sun, 2015
Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby, 2020
Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
Ronneberger, Fischer, Brox, 2015
Paper: Mask R-CNN
He, Gkioxari, Dollar, Girshick, 2017
Paper: You Only Look Once: Unified, Real-Time Object Detection
Redmon, Divvala, Girshick, Farhadi, 2015

References

Books and resources link to an online or publisher page.

Paper: Deep Residual Learning for Image Recognition
He, Zhang, Ren, Sun, 2015
Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit, Houlsby, 2020
Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
Ronneberger, Fischer, Brox, 2015
Paper: Mask R-CNN
He, Gkioxari, Dollar, Girshick, 2017
Textbook: Computer Vision: Algorithms and Applications, 2nd edition
Szeliski, 2022
Textbook: Deep Learning
Goodfellow, Bengio, Courville, 2016
Textbook: Dive into Deep Learning
Zhang, Lipton, Li, Smola, 2023
Documentation: PyTorch Documentation
PyTorch Foundation, 2026

Role in each concentration

Concentration	Role
Intelligent Software Systems	Elective
Networking & Cyber Security	Elective
AI & Robotics	Core · Semester 1
AI and Quantum Computing for Finance	Elective
Immersive Systems & Game Development	Core · Semester 1
Defense Technologies & Autonomous Systems	Core · Semester 1
