AI2: Temporal AI: Time Series & Sequential Decision Making

About this course

Model data that unfolds over time and learn to make sequential decisions under uncertainty.

Course format. Thirteen weeks, four contact hours each: a two-hour lecture (concepts and theory) and a two-hour practice session. The course is project-based; teams carry one running project end to end and present it three times, in weeks 5, 8, and 13.

What you will build

Build an end-to-end forecasting and control agent in Python with PyTorch, statsmodels, and Gymnasium, coupling a neural sequence forecaster to a deep reinforcement-learning controller trained with DQN and PPO inside a simulated environment.

Expected outcomes

Formalize stochastic processes, stationarity, and autocorrelation for temporal data
Derive classical forecasting models including ARIMA and exponential smoothing
Build sequence models with RNNs, LSTMs, and temporal transformers
Define Markov decision processes with states, actions, rewards, and discounting
Derive the Bellman equations and prove value and policy iteration convergence
Implement value-based and policy-gradient reinforcement learning algorithms
Quantify uncertainty with predictive intervals and Bayesian decision theory
Evaluate forecasts and policies with proper scoring rules and backtesting
Apply exploration-exploitation strategies including epsilon-greedy, UCB, and Thompson sampling in bandit settings
Deploy a sequential decision agent in a simulated environment

Key topics

Time-series forecasting
Sequence models
Reinforcement learning
Decision under uncertainty

Theoretical foundations

The concepts and results this course rests on.

stochastic processes, stationarity, and autocorrelation
ARIMA and exponential smoothing with maximum-likelihood estimation
recurrent networks and backpropagation through time
Markov decision processes and the Bellman optimality equations
temporal-difference learning and the policy-gradient theorem
multi-armed bandits, regret bounds, and Thompson sampling
Bayesian decision theory and expected-utility maximization

Prerequisites

This is a Year-3 course. It assumes the mandatory CS core: data structures and algorithms, operating systems, computer networks, databases, software engineering, and the core mathematics (linear algebra, probability and statistics, calculus, discrete mathematics). It additionally requires the specific prior courses listed below.

Course-specific prerequisites:

Machine Learning
Probability and statistics
Linear algebra

Weekly schedule 13 weeks · lecture + practice

Temporal foundations

Wk 1

Time series and stochastic processes

Lecture: We define stochastic processes, stationarity, autocorrelation, and the decomposition of series into trend, seasonality, and noise.

Practice: Load real time-series data, plot ACF and PACF, and test for stationarity.

Project: Frame the running forecasting-and-control problem and load the target dataset.

Watch: Time Series Talk: Stationarity · Time Series Talk: Autocorrelation and PACF
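As an illustration of the autocorrelation idea from this week, here is a minimal sketch of the sample autocorrelation function written with the standard library only. The toy series and the helper name `sample_acf` are assumptions made for the example; in the practice session a library routine would normally be used instead.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelation r_0..r_max_lag using the usual biased estimator."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]

# Hypothetical toy series: a strictly alternating signal of length 100.
toy = [1.0, -1.0] * 50
acf = sample_acf(toy, max_lag=3)

# Approximate 95% band for a white-noise series of the same length.
band = 1.96 / math.sqrt(len(toy))
significant_lags = [k for k, r in enumerate(acf) if k > 0 and abs(r) > band]
```

The alternating series gives strongly negative autocorrelation at odd lags and strongly positive autocorrelation at even lags, so every lag falls outside the white-noise band.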

Wk 2

Classical forecasting

Lecture: We derive autoregressive, moving-average, ARIMA, and exponential smoothing models and the maximum-likelihood estimation behind them.

Practice: Fit ARIMA and exponential smoothing baselines and backtest them.

Project: Establish classical forecasting baselines for the project series.

Watch: Time Series Talk: ARMA Model · Time Series Talk: ARIMA Model
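To make "fit a baseline and backtest it" concrete, the following is a rough sketch of simple exponential smoothing with an expanding-window, one-step-ahead backtest. The demand values, the fixed `alpha`, and the function names are hypothetical; a real baseline would estimate `alpha` (for example by maximum likelihood) rather than fix it.

```python
def ses_forecast(history, alpha):
    """One-step-ahead forecast from simple exponential smoothing."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def rolling_origin_mae(series, alpha, min_train):
    """Expanding-window backtest: forecast each point from all earlier points."""
    errors = []
    for origin in range(min_train, len(series)):
        pred = ses_forecast(series[:origin], alpha)
        errors.append(abs(series[origin] - pred))
    return sum(errors) / len(errors)

# Hypothetical demand series, for illustration only.
demand = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]

mae_smooth = rolling_origin_mae(demand, alpha=0.5, min_train=3)
# alpha = 1.0 reduces to the naive "repeat the last value" forecast.
mae_naive = rolling_origin_mae(demand, alpha=1.0, min_train=3)
```

Because each forecast only sees data before its origin, the backtest never leaks future values into the model, which is the property a random train/test split would violate.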

Wk 3

Probabilistic forecasting and uncertainty

Lecture: We cover predictive distributions, prediction intervals, proper scoring rules, and calibration.

Practice: Produce probabilistic forecasts and score them with pinball and CRPS metrics.

Project: Add calibrated uncertainty bands to the baseline forecasts.
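The scoring and calibration ideas above can be sketched in a few lines. This example assumes quantile forecasts are already available as plain lists; the numbers and the helper names `pinball_loss` and `interval_coverage` are illustrative only.

```python
def pinball_loss(y_true, y_pred, q):
    """Average pinball (quantile) loss for quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)

def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    hits = sum(lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper))
    return hits / len(y_true)

# Hypothetical observations and forecasts.
observed = [10.0, 12.0, 9.0, 11.0]
q90_forecast = [12.0, 12.0, 12.0, 12.0]

loss_q90 = pinball_loss(observed, q90_forecast, q=0.9)
coverage = interval_coverage(observed, [8.0] * 4, [11.0] * 4)
```

A calibrated 80% interval should show empirical coverage near 0.8 on held-out data; comparing `coverage` with the nominal level is the simplest calibration check.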

Sequence models

Wk 4

Recurrent neural networks

Lecture: We derive RNNs and backpropagation through time, analyze vanishing gradients, and introduce the LSTM and GRU gating that mitigates them.

Practice: Train an LSTM forecaster and compare against the classical baselines.

Project: Introduce a neural sequence forecaster to the project.

Watch: Recurrent Neural Networks (Data Science Concepts) · LSTM, Clearly Explained (StatQuest)

Wk 5

Temporal transformers · Presentation

Lecture: We adapt attention to sequences with causal masking and positional encoding for long-horizon forecasting.

Practice: Team presentation: each team defends its problem specification, dataset, and metrics.

Project: Lock the specification and prototype a transformer-based forecaster.

Watch: Attention for Neural Networks, Clearly Explained (StatQuest) · Attention in transformers, step-by-step (3Blue1Brown)
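Causal masking is easiest to see on the attention weights themselves. The sketch below assumes the raw attention scores are already computed and stored as a square list of lists; it is a dependency-free illustration, not how a PyTorch model would implement it.

```python
import math

def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j] with future positions (j > i) masked.

    Dropping the future entries before the softmax is equivalent to adding
    minus infinity to them, which is the usual tensor formulation.
    """
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]
        peak = max(visible)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        norm = sum(exps)
        weights.append([e / norm for e in exps] + [0.0] * (len(row) - i - 1))
    return weights

# Hypothetical scores: all zeros, so each step attends uniformly to its past.
weights = causal_attention_weights([[0.0] * 3 for _ in range(3)])
```

With uniform scores, step 0 attends only to itself, step 1 splits its weight evenly over steps 0 and 1, and no step places weight on a later position.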

Decision theory

Wk 6

Markov decision processes

Lecture: We define MDPs, returns, discounting, value functions, and the Bellman optimality equations.

Practice: Implement a gridworld MDP and solve it with value iteration.

Project: Reframe the project as a sequential decision problem with an explicit MDP.

Watch: David Silver RL Lecture 2: Markov Decision Process
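A minimal sketch of value iteration follows, using a one-dimensional chain instead of a full gridworld to keep it short. The environment (four states, deterministic moves, reward 1 on reaching the terminal state) and the discount of 0.9 are assumptions chosen so the optimal values can be checked by hand.

```python
GAMMA = 0.9
N_STATES = 4        # states 0..3; state 3 is terminal
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Deterministic transition: returns (next_state, reward)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def backup(state, action, values):
    """One-step lookahead: r + gamma * V(s')."""
    nxt, reward = step(state, action)
    return reward + GAMMA * values[nxt]

def value_iteration(tol=1e-10):
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):  # the terminal state keeps value 0
            best = max(backup(s, a, values) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values

values = value_iteration()
policy = [max(ACTIONS, key=lambda a: backup(s, a, values)) for s in range(N_STATES - 1)]
```

Each state's value is the discounted reward for walking right to the goal, so the values fall by a factor of 0.9 per step of distance and the greedy policy moves right everywhere.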

Wk 7

Dynamic programming and planning

Lecture: We prove convergence of policy iteration and value iteration and discuss the contraction-mapping argument.

Practice: Implement policy iteration and compare convergence with value iteration.

Project: Compute an optimal planning policy for the known-model version of the task.

Watch: David Silver RL Lecture 3: Planning by Dynamic Programming
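The contraction-mapping argument mentioned in the lecture can be summarized as follows. The notation is an assumption, since the syllabus does not fix one: a finite MDP with reward r(s,a), transition kernel P(s'|s,a), discount 0 <= gamma < 1, and T the Bellman optimality operator.

```latex
% Bellman optimality operator (assumed notation)
(TV)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big]

% Using |\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|:
\begin{aligned}
|(TV)(s) - (TU)(s)|
  &\le \max_{a} \, \gamma \sum_{s'} P(s' \mid s,a)\, |V(s') - U(s')| \\
  &\le \gamma \, \|V - U\|_\infty
\end{aligned}

% Hence T is a gamma-contraction in the sup norm, and by the Banach
% fixed-point theorem value iteration V_{k+1} = T V_k converges geometrically:
\|TV - TU\|_\infty \le \gamma \, \|V - U\|_\infty
\quad\Longrightarrow\quad
\|V_k - V^*\|_\infty \le \gamma^{k} \, \|V_0 - V^*\|_\infty
```

The same bound explains why a discount close to 1 slows convergence: the error shrinks by only a factor of gamma per sweep.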

Reinforcement learning

Wk 8

Model-free value methods · Presentation

Lecture: We derive Monte Carlo and temporal-difference learning, Q-learning, and the exploration-exploitation dilemma.

Practice: Team presentation: interim demo of a learned policy in the simulator.

Project: Train a tabular or function-approximated Q-learning agent.

Watch: David Silver RL Lecture 4: Model-Free Prediction · David Silver RL Lecture 5: Model-Free Control
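As a sketch of the tabular case, here is Q-learning with epsilon-greedy exploration on a hypothetical four-state chain (reward 1 on reaching the terminal state). The environment, the hyperparameters, and the function names are all assumptions for illustration; the project would use a Gymnasium environment instead.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2
N_STATES, LEFT, RIGHT = 4, 0, 1  # state 3 is terminal

def step(state, action):
    """Deterministic chain: returns (next_state, reward, done)."""
    move = -1 if action == LEFT else 1
    nxt = min(max(state + move, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose(q_row, rng):
    """Epsilon-greedy action choice with random tie-breaking."""
    if rng.random() < EPSILON or q_row[LEFT] == q_row[RIGHT]:
        return rng.choice((LEFT, RIGHT))
    return LEFT if q_row[LEFT] > q_row[RIGHT] else RIGHT

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose(q[state], rng)
            nxt, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the greedy next action.
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q = train()
greedy = [RIGHT if row[RIGHT] > row[LEFT] else LEFT for row in q[:-1]]
```

The agent never sees the transition model; it recovers the same values that dynamic programming would compute purely from sampled transitions.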

Wk 9

Deep reinforcement learning

Lecture: We cover DQN, experience replay, target networks, and the deadly triad of function approximation, bootstrapping, and off-policy training.

Practice: Implement a DQN agent and stabilize training with replay and target nets.

Project: Upgrade the agent to deep value-based control on raw features.
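The two stabilizers named above, experience replay and a target network, can be sketched without a deep-learning library. In this illustration the frozen target network is replaced by a hypothetical lookup table, and `ReplayBuffer` and `td_targets` are names invented for the example.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO store of (state, action, reward, next_state, done)."""

    def __init__(self, capacity, seed=0):
        self.data = deque(maxlen=capacity)  # oldest transitions are evicted
        self.rng = random.Random(seed)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        """Uniform sample without replacement, breaking temporal correlation."""
        return self.rng.sample(list(self.data), batch_size)

def td_targets(batch, target_q, gamma):
    """r + gamma * max_a Q_target(s', a), with no bootstrap on terminal steps."""
    return [r if done else r + gamma * max(target_q(s2))
            for (_, _, r, s2, done) in batch]

# Hypothetical frozen target network, represented here as a lookup table.
TARGET_TABLE = {0: [0.0, 1.0], 1: [2.0, 0.5]}

def target_q(state):
    return TARGET_TABLE[state]

buffer = ReplayBuffer(capacity=3)
for transition in [(0, 1, 1.0, 1, False), (1, 0, 0.0, 0, False),
                   (0, 0, 5.0, 1, True), (1, 1, -1.0, 0, False)]:
    buffer.push(transition)

targets = td_targets(list(buffer.data), target_q, gamma=0.5)
minibatch = buffer.sample(2)
```

In a full DQN the online network is regressed toward these targets while the target network is refreshed only occasionally, which keeps the regression target from moving at every gradient step.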

Wk 10

Policy gradients

Lecture: We derive the policy-gradient theorem, REINFORCE, advantage estimation, actor-critic, and PPO.

Practice: Implement an actor-critic or PPO agent and compare sample efficiency.

Project: Add a policy-gradient controller and benchmark it against the value agent.

Watch: David Silver RL Lecture 7: Policy Gradient Methods
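The core of PPO is its clipped surrogate objective, which is small enough to sketch directly. The example assumes the probability ratios (new policy over old policy) and advantage estimates are already computed; the default clip range of 0.2 and the function names are illustrative choices.

```python
def ppo_clip_term(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)

def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Batch average of the clipped surrogate (to be maximized)."""
    terms = [ppo_clip_term(r, a, clip_eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)

# Hypothetical ratios and advantages covering the interesting cases.
gain_capped = ppo_clip_term(1.5, 1.0)      # good action, ratio too high: capped
loss_kept = ppo_clip_term(1.5, -1.0)       # bad action made more likely: not capped
loss_floor = ppo_clip_term(0.5, -1.0)      # bad action already suppressed: capped
objective = ppo_clip_objective([1.5, 1.5, 0.5], [1.0, -1.0, -1.0])
```

Taking the minimum makes the objective pessimistic: the policy gets no extra credit for moving far from the old policy, but it is still fully penalized when a move makes things worse.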

Uncertainty and bandits

Wk 11

Bandits and exploration

Lecture: We cover multi-armed bandits, regret bounds, UCB, and Thompson sampling.

Practice: Implement UCB and Thompson sampling and chart cumulative regret.

Project: Add principled exploration to the decision agent.

Watch: David Silver RL Lecture 9: Exploration and Exploitation
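A minimal sketch of both algorithms on a Bernoulli bandit is given below. The arm means, horizon, and seeds are hypothetical, the regret reported is the pseudo-regret computed from the true means (available only in simulation), and Thompson sampling assumes uniform Beta(1, 1) priors.

```python
import math
import random

def pull(means, arm, rng):
    """Bernoulli reward from the chosen arm."""
    return 1.0 if rng.random() < means[arm] else 0.0

def ucb1(means, horizon, rng):
    k = len(means)
    counts, sums, regret = [0] * k, [0.0] * k, 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            index = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
                     for a in range(k)]
            arm = index.index(max(index))
        sums[arm] += pull(means, arm, rng)
        counts[arm] += 1
        regret += max(means) - means[arm]
    return counts, regret

def thompson(means, horizon, rng):
    k = len(means)
    wins, losses, counts, regret = [0] * k, [0] * k, [0] * k, 0.0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its Beta posterior.
        draws = [rng.betavariate(1 + wins[a], 1 + losses[a]) for a in range(k)]
        arm = draws.index(max(draws))
        if pull(means, arm, rng) == 1.0:
            wins[arm] += 1
        else:
            losses[arm] += 1
        counts[arm] += 1
        regret += max(means) - means[arm]
    return counts, regret

MEANS, HORIZON = [0.2, 0.5, 0.8], 2000  # hypothetical arms
ucb_counts, ucb_regret = ucb1(MEANS, HORIZON, random.Random(0))
ts_counts, ts_regret = thompson(MEANS, HORIZON, random.Random(1))
```

Recording `regret` after every round instead of only at the end gives the cumulative-regret curve that the practice session asks for.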

Integration

Wk 12

Forecast-driven decisions

Lecture: We combine forecasting and decision-making through Bayesian decision theory and expected-utility maximization.

Practice: Couple the forecaster to the controller so predictions drive actions.

Project: Integrate forecasting and control into one end-to-end agent.
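Expected-utility maximization over a predictive distribution can be sketched with a small ordering problem. Everything specific here is an assumption for illustration: the demand samples stand in for draws from a probabilistic forecast, and the unit price and cost define a simple profit utility.

```python
def expected_utility(order, demand_samples, price=4.0, cost=3.0):
    """Average profit of ordering `order` units across the forecast samples."""
    profits = [price * min(order, d) - cost * order for d in demand_samples]
    return sum(profits) / len(profits)

def bayes_action(actions, demand_samples):
    """Pick the action with the highest expected utility under the forecast."""
    return max(actions, key=lambda a: expected_utility(a, demand_samples))

# Hypothetical draws from a predictive distribution of demand.
samples = [8, 9, 10, 10, 11, 12, 12, 13, 14, 15]

best_order = bayes_action(range(0, 21), samples)
mean_demand = sum(samples) / len(samples)
```

With a thin margin the best order sits below the mean forecast, because unsold units cost more than missed sales earn. Acting on the point forecast alone would miss this, which is why the controller should consume the full predictive distribution.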

Capstone

Wk 13

Final defense · Presentation

Lecture: We synthesize probabilistic forecasting and sequential decision-making and survey open problems.

Practice: Team presentation: final demo with backtests and an oral defense of design choices.

Project: Deliver the integrated forecasting-and-control agent with evaluation results.

AI tools in this course.

Students use AI assistants to generate and refactor the statsmodels ARIMA baselines, the LSTM and temporal-transformer forecasters, and the Gymnasium environment and reward code, vibe-coding the DQN and PPO training loops. They prompt AI to synthesize simulated episodes and edge-case series, to wire reward shaping and replay buffers, and to set up Optuna sweeps. AI also helps read backtest plots and learning curves, explaining why a policy diverged or a forecast lost calibration.

Student project

Teams build one temporal decision agent that first forecasts a real time series and then acts on those forecasts inside a simulated environment. The project grows from classical baselines through neural sequence models into a reinforcement-learning controller, ending with an integrated forecast-driven decision policy.

Requirements

Build a working system, not a set of disconnected exercises.
Be original: a new system that solves a real problem, not a re-implementation of a tutorial or course demo.
Show real depth: real data, real users or realistic load, and engineering trade-offs that are measured rather than assumed.
Carry one running project from specification to a deployed, defensible result across the whole term.
Work in a team of three or four and defend the design at each of the three presentations (weeks 5, 8, and 13).

Example projects

Energy demand forecasting and storage control
Inventory and supply-chain replenishment
Algorithmic trading agent
Traffic signal control
Cloud autoscaling controller
Ride-hailing fleet dispatch
Dynamic pricing agent
Smart-grid load balancing

Assessment & grading

Grading is project-based, with no written exam. Teams of three or four present one running project three times.

Component	What it covers	Weight
Project · Specification	Presentation 1 (week 5): problem, objectives, and architecture	20%
Project · Interim	Presentation 2 (week 8): the working system demonstrated live	30%
Project · Final	Presentation 3 (week 13): end-to-end demo with oral defense	50%

Tools & platforms

PyTorch: sequence model implementation
statsmodels: ARIMA and classical time-series models
sktime: unified time-series forecasting API
Prophet: decomposable trend and seasonality forecasting
Gymnasium: reinforcement-learning environments
Stable-Baselines3: reference RL algorithm implementations
Ray RLlib: scalable distributed RL
NumPy: numerical computation
pandas: time-series data handling
Matplotlib: forecast and policy visualization
Optuna: hyperparameter optimization
Weights and Biases: experiment tracking

Free online courses

Existing free, video-based courses this course can build on, for self-study or as a teaching basis.

YouTube: DeepMind x UCL: Introduction to Reinforcement Learning (David Silver)
Classic 10-lecture RL foundations course
YouTube: Berkeley CS285: Deep Reinforcement Learning (Fall 2023)
Deep RL and sequential decision making
MIT OCW: Time Series Analysis I (MIT 18.S096)
Time series lecture series with public video

In Hebrew · בעברית

Dr. Amos Azaria, Ariel University (YouTube): Reinforcement Learning 1 - למידה מונחית חיזוקים
Hebrew-spoken university course on reinforcement learning.
Ben-Gurion University (Campus IL): מבוא לבינה מלאכותית: מתאוריה לפרקטיקה (Introduction to Artificial Intelligence: From Theory to Practice)
Free Hebrew video course on general AI; includes a unit on reinforcement learning.

Primary literature

Seminal works to read for graduate-level depth.

Paper: Playing Atari with Deep Reinforcement Learning
Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, Riedmiller, 2013
Paper: Proximal Policy Optimization Algorithms
Schulman, Wolski, Dhariwal, Radford, Klimov, 2017
Paper: Continuous control with deep reinforcement learning
Lillicrap, Hunt, Pritzel, Heess, Erez, Tassa, Silver, Wierstra, 2015
Paper: Asynchronous Methods for Deep Reinforcement Learning
Mnih, Badia, Mirza, Graves, Lillicrap, Harley, Silver, Kavukcuoglu, 2016
Paper: Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting
Lim, Arik, Loeff, Pfister, 2019

References

Books and resources link to an online or publisher page.

Textbook: Reinforcement Learning: An Introduction, 2nd edition
Sutton, Barto, 2018
Textbook: Forecasting: Principles and Practice, 3rd edition
Hyndman, Athanasopoulos, 2021
Textbook: Probabilistic Machine Learning: An Introduction
Murphy, 2022
Paper: Playing Atari with Deep Reinforcement Learning
Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, Riedmiller, 2013
Paper: Proximal Policy Optimization Algorithms
Schulman, Wolski, Dhariwal, Radford, Klimov, 2017
Textbook: Dive into Deep Learning
Zhang, Lipton, Li, Smola, 2023
Documentation: Ray Documentation
Anyscale, 2026

Role in each concentration

Concentration	Role
Intelligent Software Systems	Elective
Networking & Cyber Security	Elective
AI & Robotics	Core · Semester 1
AI and Quantum Computing for Finance	Core · Semester 1
Immersive Systems & Game Development	Elective
Defense Technologies & Autonomous Systems	Core · Semester 2
