About this course
Apply deep learning to images and video for recognition, detection, and segmentation tasks.
Build an end-to-end computer-vision pipeline in Python with PyTorch, torchvision, and Detectron2, progressing from a residual classifier to an object detector and then to a Mask R-CNN instance-segmentation system, evaluated with mean average precision and intersection over union.
Expected outcomes
- Derive the convolution operation, receptive fields, and parameter sharing from first principles
- Explain backpropagation through convolutional and pooling layers
- Build and train CNN architectures including residual networks
- Adapt transformers to images with patch embeddings and self-attention
- Implement object detection with anchors, region proposals, and bounding-box regression
- Implement semantic and instance segmentation with encoder-decoder networks
- Apply transfer learning and fine-tuning from pretrained backbones
- Analyze representation learning, feature hierarchies, and learned invariances
- Evaluate vision models with mean average precision and intersection over union
- Deploy a vision pipeline with augmentation, training, and inference
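As an illustration of the intersection-over-union metric named in the outcomes above, here is a minimal sketch in plain Python. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates; the name `box_iou` and the `Box` alias are illustrative choices, not part of the course materials or of any library used in the course.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes (illustrative helper)."""
    # Corners of the overlap rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 2x2 boxes overlapping in a 1x1 square: 1 / (4 + 4 - 1).
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```

Mean average precision builds on this quantity: a predicted box typically counts as a true positive only when its IoU with a ground-truth box exceeds a chosen threshold.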
Key topics
- CNNs & vision transformers
- Object detection
- Image segmentation
- Transfer learning
Theoretical foundations
The concepts and results this course rests on.
- discrete convolution, receptive fields, and parameter sharing
- backpropagation through convolutional and pooling layers
- residual learning and skip connections in very deep networks
- patch embeddings and self-attention for images
- anchors, bounding-box regression, and non-max suppression
- encoder-decoder networks and per-pixel segmentation loss
- transfer learning and self-supervised pretraining
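To make the receptive-field item above concrete, the following sketch computes the spatial output size of a convolution and the receptive field of a stack of layers. It assumes square kernels, no dilation, and the standard recurrences (the receptive field grows by (kernel - 1) times the accumulated stride at each layer); the function names are hypothetical.

```python
from typing import List, Tuple


def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of a convolution along one dimension (no dilation)."""
    return (n + 2 * padding - kernel) // stride + 1


def receptive_field(layers: List[Tuple[int, int]]) -> int:
    """Receptive field of the last layer's units, measured in input pixels.

    `layers` lists (kernel, stride) pairs from input to output.
    """
    rf = 1    # a single input pixel sees itself
    jump = 1  # distance in input pixels between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Three stacked 3x3, stride-1 convolutions cover the same 7x7 window
    # as a single 7x7 convolution, with fewer parameters.
    print(receptive_field([(3, 1), (3, 1), (3, 1)]))
    # A 7x7, stride-2, padding-3 stem on a 224-pixel side.
    print(conv_output_size(224, kernel=7, stride=2, padding=3))
```

The same weights are applied at every spatial position, which is why the parameter count depends on the kernel size and channel counts but not on the image size.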
Prerequisites
Course-specific prerequisites:
- Deep Learning
- Linear algebra and calculus
Weekly schedule
13 weeks · lecture + practice
Students use AI assistants to generate and refactor torchvision data pipelines, custom CNN and ViT modules, and Detectron2 and YOLO training configs, vibe-coding from a classifier up to a Mask R-CNN system. They prompt AI to build Albumentations augmentation stacks, synthesize annotation-format converters, and generate unit tests for IoU and mAP computation. AI also helps analyze confusion matrices, saliency maps, and failed detections, explaining why a model misclassifies or where boxes drift.
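An annotation-format converter of the kind mentioned above might look like the following sketch, together with the sort of round-trip check a generated unit test could perform. It assumes COCO-style boxes as [x_min, y_min, width, height] in pixels and YOLO-style boxes as (x_center, y_center, width, height) normalized by image size; the function names are hypothetical, and a real converter would also handle class ids and file I/O.

```python
from typing import Sequence, Tuple


def coco_to_yolo(bbox: Sequence[float], img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Convert a pixel [x_min, y_min, w, h] box to normalized (xc, yc, w, h)."""
    x, y, w, h = bbox
    return ((x + w / 2) / img_w, (y + h / 2) / img_h, w / img_w, h / img_h)


def yolo_to_coco(bbox: Sequence[float], img_w: int, img_h: int) -> Tuple[float, float, float, float]:
    """Inverse conversion, back to pixel [x_min, y_min, w, h]."""
    xc, yc, w, h = bbox
    w_px, h_px = w * img_w, h * img_h
    return (xc * img_w - w_px / 2, yc * img_h - h_px / 2, w_px, h_px)


def check_round_trip() -> None:
    """A small round-trip check; a test runner such as pytest could collect a similar function."""
    original = (10.0, 20.0, 30.0, 40.0)
    restored = yolo_to_coco(coco_to_yolo(original, 100, 200), 100, 200)
    assert all(abs(a - b) < 1e-9 for a, b in zip(original, restored))


if __name__ == "__main__":
    check_round_trip()
    print(coco_to_yolo((10, 20, 30, 40), img_w=100, img_h=200))
```

Round-trip checks like this one are cheap to write and catch the most common converter bugs, such as mixing up corner and center conventions or forgetting to normalize.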
Student project
Teams build one end-to-end computer-vision pipeline on a chosen image dataset, growing from a classifier into a detector and finally an instance-segmentation system. Each week adds capability backed by the convolution and representation theory taught that week, with continuous evaluation on held-out images.
Requirements
- Build a working system, not a set of disconnected exercises.
- Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
- Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
- Carry one running project from specification to a deployed, defensible result across the whole term.
- Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).
Example projects
Assessment & grading
Grading is project-based, with no written exam. Teams of three or four present one running project three times.
| Component | What it covers | Weight |
|---|---|---|
| Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20% |
| Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30% |
| Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50% |
Tools & platforms
- PyTorch: model implementation and training
- torchvision: datasets, transforms, and pretrained models
- timm: state-of-the-art image backbones
- Detectron2: detection and segmentation framework
- Ultralytics YOLO: real-time one-stage detection
- Albumentations: image augmentation
- OpenCV: image processing utilities
- Hugging Face Transformers: vision transformer models
- ONNX Runtime: optimized cross-platform inference
- Weights & Biases: experiment tracking
- Grad-CAM: visual explanations
- Roboflow: dataset management and annotation
Free online courses
Existing free, video-based courses this course can build on, for self-study or as a teaching basis.
- YouTube: Stanford CS231N: Deep Learning for Computer Vision (Spring 2025)
- YouTube: Deep Learning for Computer Vision, EECS 498-007 (Justin Johnson)
In Hebrew · בעברית
- Dr. Amos Azaria, Ariel University (YouTube): Deep Learning and NLP, a course on deep learning and natural language processing (קורס למידה עמוקה ועיבוד שפות טבעיות)
- Prof. Yossi Keshet (YouTube): Machine Learning (למידת מכונה)
Primary literature
Seminal works to read for graduate-level depth.
References
Books and resources link to an online or publisher page.
- Paper: Deep Residual Learning for Image Recognition
- Paper: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
- Paper: Mask R-CNN
- Textbook: Computer Vision: Algorithms and Applications, 2nd edition
- Textbook: Deep Learning
- Textbook: Dive into Deep Learning
- Documentation: PyTorch Documentation
Role in each concentration
| Concentration | Role |
|---|---|
| Intelligent Software Systems | Elective |
| Networking & Cyber Security | Elective |
| AI & Robotics | Core · Semester 1 |
| AI and Quantum Computing for Finance | Elective |
| Immersive Systems & Game Development | Core · Semester 1 |
| Defense Technologies & Autonomous Systems | Core · Semester 1 |