About this course
Model data that unfolds over time and learn to make sequential decisions under uncertainty.
Build an end-to-end forecasting and control agent in Python with PyTorch, statsmodels, and Gymnasium, coupling a neural sequence forecaster to a deep reinforcement-learning controller trained with DQN and PPO inside a simulated environment.
Expected outcomes
- Formalize stochastic processes, stationarity, and autocorrelation for temporal data
- Derive classical forecasting models including ARIMA and exponential smoothing
- Build sequence models with RNNs, LSTMs, and temporal transformers
- Define Markov decision processes with states, actions, rewards, and discounting
- Derive the Bellman equations and prove value and policy iteration convergence
- Implement value-based and policy-gradient reinforcement learning algorithms
- Quantify uncertainty with predictive intervals and Bayesian decision theory
- Evaluate forecasts and policies with proper scoring rules and backtesting
- Apply exploration-exploitation strategies including epsilon-greedy and bandits
- Deploy a sequential decision agent in a simulated environment
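As a small illustration of the forecasting and backtesting outcomes above, the following sketch implements simple exponential smoothing by hand and scores it with an expanding-window (rolling-origin) backtest. The function names `ses_forecast` and `rolling_origin_mae`, the synthetic random-walk series, and the choice of mean absolute error as the metric are all assumptions made for this example; course work would more likely use statsmodels or sktime on real data and probabilistic scores.

```python
import numpy as np


def ses_forecast(history, alpha):
    """One-step-ahead simple exponential smoothing forecast.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level for each later observation.
    """
    level = float(history[0])
    for y in history[1:]:
        level = alpha * float(y) + (1.0 - alpha) * level
    return level


def rolling_origin_mae(series, alpha, min_train):
    """Expanding-window backtest: forecast series[t] from series[:t] only."""
    errors = []
    for t in range(min_train, len(series)):
        prediction = ses_forecast(series[:t], alpha)
        errors.append(abs(series[t] - prediction))
    return float(np.mean(errors))


# Synthetic illustrative series (an assumption, not course data).
rng = np.random.default_rng(0)
series = 10.0 + np.cumsum(rng.normal(0.0, 0.5, size=200))

for alpha in (0.1, 0.5, 0.9):
    mae = rolling_origin_mae(series, alpha, min_train=50)
    print(f"alpha={alpha}: MAE={mae:.3f}")
```

Each forecast sees only observations that precede its target, which is the property that separates a backtest from an in-sample fit.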
Key topics
- Time-series forecasting
- Sequence models
- Reinforcement learning
- Decision under uncertainty
Theoretical foundations
The concepts and results this course rests on.
- stochastic processes, stationarity, and autocorrelation
- ARIMA and exponential smoothing with maximum-likelihood estimation
- recurrent networks and backpropagation through time
- Markov decision processes and the Bellman optimality equations
- temporal-difference learning and the policy-gradient theorem
- multi-armed bandits, regret bounds, and Thompson sampling
- Bayesian decision theory and expected-utility maximization
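To make the Bellman optimality equations concrete, here is a minimal value-iteration sketch on a hypothetical two-state, two-action MDP. It repeatedly applies the backup V(s) <- max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ] until the values stop changing. The array layout (`P` with shape actions x states x states, `R` with shape states x actions), the function name, and the toy transition and reward numbers are assumptions chosen for this example.

```python
import numpy as np


def value_iteration(P, R, gamma, tol=1e-10, max_iter=10_000):
    """Apply the Bellman optimality backup until the largest change is below tol.

    P has shape (A, S, S) with P[a, s, t] = Pr(next state t | state s, action a).
    R has shape (S, A) with R[s, a] = expected immediate reward.
    Returns the value estimate and the greedy policy with respect to it.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical MDP: action 0 stays in place, action 1 moves to the other state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
# Only staying in state 1 pays a reward of 1.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])

V, policy = value_iteration(P, R, gamma=0.9)
print("values:", np.round(V, 3))
print("greedy policy:", policy)
```

Because the backup is a contraction with modulus gamma in the max norm, the loop converges from any starting vector; that is the property the convergence proofs in the course establish.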
Prerequisites
Course-specific prerequisites:
- Machine Learning
- Probability and statistics
- Linear algebra
Weekly schedule · 13 weeks · lecture + practice
Students use AI assistants to generate and refactor the statsmodels ARIMA baselines, the LSTM and temporal-transformer forecasters, and the Gymnasium environment and reward code, vibe-coding the DQN and PPO training loops. They prompt AI to synthesize simulated episodes and edge-case series, to wire reward shaping and replay buffers, and to set up Optuna sweeps. AI also helps read backtest plots and learning curves, explaining why a policy diverged or a forecast lost calibration.
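When reviewing AI-generated training loops, it helps to know how little a correct replay buffer needs. The sketch below is one minimal version built on the standard library only; the class name `ReplayBuffer`, the `Transition` fields, and uniform sampling are assumptions for illustration, not the interface of Stable-Baselines3 or any other library.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen evicts the oldest transition once full.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw distinct transitions uniformly at random.
        batch = self._rng.sample(list(self._storage), batch_size)
        # Transpose a list of transitions into one tuple per field.
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self._storage)


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = buffer.sample(batch_size=2)
print(len(buffer), batch.state)
```

Two things worth checking in generated code are that eviction really is oldest-first and that a batch is sampled across the whole buffer rather than taken from the most recent steps.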
Student project
Teams build one temporal decision agent that first forecasts a real time series and then acts on those forecasts inside a simulated environment. The project grows from classical baselines through neural sequence models into a reinforcement-learning controller, ending with an integrated forecast-driven decision policy.
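The forecast-then-act structure can be sketched with a toy simulator in which the forecast is part of the observation the policy sees. Everything here is hypothetical: the inventory setting, the class `ForecastInventoryEnv`, the cost values, and the naive last-value forecaster are placeholders. The `reset` and `step` return shapes follow the Gymnasium convention, but the class does not subclass `gymnasium.Env`, so it runs without that dependency.

```python
import numpy as np


class ForecastInventoryEnv:
    """Toy inventory simulator whose observation includes a demand forecast."""

    def __init__(self, demand, holding_cost=1.0, stockout_cost=4.0):
        self.demand = np.asarray(demand, dtype=float)
        self.holding_cost = holding_cost
        self.stockout_cost = stockout_cost

    def _observation(self):
        # Placeholder forecaster: the last observed demand (naive forecast).
        forecast = self.demand[self.t - 1]
        return np.array([self.inventory, forecast])

    def reset(self):
        self.t = 1  # demand[0] serves as history for the first forecast
        self.inventory = 0.0
        return self._observation(), {}

    def step(self, order_quantity):
        available = self.inventory + max(float(order_quantity), 0.0)
        demand = self.demand[self.t]
        unmet = max(demand - available, 0.0)
        self.inventory = max(available - demand, 0.0)
        # Penalize leftover stock and unmet demand.
        reward = -(self.holding_cost * self.inventory + self.stockout_cost * unmet)
        self.t += 1
        terminated = self.t >= len(self.demand)
        return self._observation(), reward, terminated, False, {}


def order_up_to_forecast(observation):
    """Baseline policy: order enough to bring stock up to the forecast."""
    inventory, forecast = observation
    return max(forecast - inventory, 0.0)


def run_episode(env, policy):
    observation, _ = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


env = ForecastInventoryEnv(demand=[5, 7, 6, 9, 4])
print("baseline return:", run_episode(env, order_up_to_forecast))
```

In a project along these lines, the naive forecaster would be replaced by the team's sequence model and the hand-written baseline policy by a learned controller, with the baseline kept as a comparison point.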
Requirements
- Build a working system, not a set of disconnected exercises.
- Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
- Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
- Carry one running project from specification to a deployed, defensible result across the whole term.
- Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).
Example projects
Assessment & grading
Grading is project-based, with no written exam. Teams of three or four present one running project three times.
| Component | What it covers | Weight |
|---|---|---|
| Project · Specification | Presentation 1 (week 5): problem, objectives, and architecture | 20% |
| Project · Interim | Presentation 2 (week 8): the working system demonstrated live | 30% |
| Project · Final | Presentation 3 (week 13): end-to-end demo with oral defense | 50% |
Tools & platforms
- PyTorch: sequence model implementation
- statsmodels: ARIMA and classical time-series models
- sktime: unified time-series forecasting API
- Prophet: decomposable trend and seasonality forecasting
- Gymnasium: reinforcement-learning environments
- Stable-Baselines3: reference RL algorithm implementations
- Ray RLlib: scalable distributed RL
- NumPy: numerical computation
- pandas: time-series data handling
- Matplotlib: forecast and policy visualization
- Optuna: hyperparameter optimization
- Weights and Biases: experiment tracking
Free online courses
Existing free, video-based courses this course can build on, for self-study or as a teaching basis.
- YouTube · DeepMind x UCL: Introduction to Reinforcement Learning (David Silver)
- YouTube · Berkeley CS285: Deep Reinforcement Learning (Fall 2023)
- MIT OCW · Time Series Analysis I (MIT 18.S096)
In Hebrew · בעברית
- Dr. Amos Azaria, Ariel University (YouTube) · Reinforcement Learning 1 - למידה מונחית חיזוקים
- Ben-Gurion University (Campus IL) · מבוא לבינה מלאכותית: מתאוריה לפרקטיקה (Introduction to Artificial Intelligence: From Theory to Practice)
Primary literature
Seminal works to read for graduate-level depth.
References
Books and resources link to an online or publisher page.
- Textbook · Reinforcement Learning: An Introduction, 2nd edition
- Textbook · Forecasting: Principles and Practice, 3rd edition
- Textbook · Probabilistic Machine Learning: An Introduction
- Paper · Playing Atari with Deep Reinforcement Learning
- Paper · Proximal Policy Optimization Algorithms
- Textbook · Dive into Deep Learning
- Documentation · Ray Documentation
Role in each concentration
| Concentration | Role |
|---|---|
| Intelligent Software Systems | Elective |
| Networking & Cyber Security | Elective |
| AI & Robotics | Core · Semester 1 |
| AI and Quantum Computing for Finance | Core · Semester 1 |
| Immersive Systems & Game Development | Elective |
| Defense Technologies & Autonomous Systems | Core · Semester 2 |