DX 704 - AI in the Field

The notes linked below are intended to supplement the live sessions, Blackboard content, and assigned readings for DX 704. They aim to provide more depth while remaining more accessible than the textbooks. Beware: these notes are AI-generated, so read them carefully and check with the instructor if you find any mistakes.

Week 1: Introduction and Portfolio Selection

Introduction to Artificially Intelligent Agents
Financial Portfolio Selection

Week 2: Financial Time Series Analysis

Time Series Analysis
Financial Time Series and Their Applications

Week 3: Online Content Selection

Introduction to Sequential Decision Making
- Policies (what)
- Multi-armed bandits (what)
- Exploration vs. exploitation (what)
- Epsilon-greedy
  - Epsilon-greedy (what)
  - Decaying epsilon-greedy achieves logarithmic regret (why)
- Upper confidence bound
  - Upper confidence bound (UCB) (what)
  - UCB1 achieves logarithmic regret (why)
- Thompson sampling
  - Thompson sampling (what)
  - Thompson sampling achieves logarithmic regret (why)
- Lower bounds
  - The Lai-Robbins regret lower bound (why)
Sequential Decision Making for Online Content Selection

Week 4: Personalized Recommendations

Contextual Bandits
- Contextual bandits (what)
- Linear bandits
  - Linear bandits (what)
  - OFUL achieves \(\tilde{O}(d\sqrt{T})\) regret (why)
Personalized Recommendations

Week 5: Planning Multiple Steps Ahead

Introduction to Planning
- Minimax
  - Minimax value (what)
  - The minimax theorem (why)
  - Minimax search (what)
- Alpha-beta pruning
  - Alpha-beta search (what)
  - Alpha-beta returns the minimax value (why)
  - Alpha-beta with perfect ordering is \(O(b^{d/2})\) (why)
  - Alpha-beta with random ordering is \(O(b^{3d/4})\) (why)
- Monte Carlo methods
  - Rollouts (what)
  - Monte Carlo tree search (what)
  - UCT converges to the minimax value (why)
Training Agents

Week 6: Better Treatment Decision Making

Markov Decision Processes
- MDP foundations
  - Markov decision processes (what)
  - Bellman equations (what)
  - An optimal deterministic policy exists (why)
  - The Bellman operator is a contraction (why)
- Planning with known models
  - Value iteration (what)
  - Value iteration converges to \(V^*\) (why)
  - Policy iteration (what)
  - Policy iteration converges to the optimal policy (why)
Optimizing Health Care with Markov Decision Processes

Week 7: Controlling Simple Physical Systems

Linear Quadratic Regulators
- Linear quadratic regulator
  - Linear quadratic regulator (LQR) (what)
  - LQR is solved by the backward Riccati recursion (why)
- Kalman filter
  - Kalman filter (what)
  - The Kalman gain minimizes the posterior covariance (why)
Controlling Simple Physical Systems

Week 8: Controlling Systems without Models

Model-Free Control
- Temporal difference methods
  - Temporal difference learning (what)
  - Q-learning (what)
  - Tabular Q-learning converges to \(Q^*\) (why)
- Policy gradients
  - Policy gradient (what)
  - REINFORCE (what)
  - The policy gradient theorem (why)
- Exploration
  - Boltzmann exploration (what)
  - Intrinsic motivation (what)
Controlling Real World Systems without Models

Week 9: Moderating Online Content

Designing a Binary Classifier for Text
Moderating Online Content

Week 10: Finding Relevant Documents

Comparing Documents with Document Vectors
Finding and Matching Documents

Week 11: Using Large Language Models

Capabilities of Large Language Models
Using Large Language Models in Applications

Week 12: Leveraging Pre-trained Models

Post-Training Large Models
Adapting Models to New Applications

Week 13: Thinking Harder and Smarter

Thinking Harder
Thinking Smarter

Week 14: AI for Science

AI for Nature
AI for Medicine