CS 640 - Graduate Artificial Intelligence
The notes linked below are intended to supplement the lecture material and assigned readings for CS 640. They go into more depth than the lecture slides while aiming to be more accessible than the textbooks. Beware: these notes are AI-generated, so read them critically, and please report any mistakes you find to the instructor.
Searching: Finding solutions
- Course introduction and evaluation
- Kinds of AI (what)
- AI goals and rationality (what)
- AI evaluation (what)
- Generalization (what)
- Bias-variance decomposition (what)
- Rule-based systems
- Production systems (what)
- Forward chaining (what)
- Forward chaining is complete for Horn clauses (why)
- Backward chaining (what)
- Resolution and unification (what)
- Resolution is refutation-complete (why)
- State-based search
- Uninformed search (what)
- Breadth-first search (what)
- BFS finds the shallowest solution (why)
- Depth-first search (what)
- DFS is not optimal (why)
- Iterative deepening (what)
- IDS finds the shallowest solution (why)
- Iterative deepening time complexity (why)
- Heuristics (what)
- A* search (what)
- A* with an admissible heuristic returns an optimal solution (why)
- Consistent heuristics are admissible and prevent node re-expansion (why)
- Constraint satisfaction
- Boolean satisfiability (what)
- SAT is NP-complete (why)
- Backtracking search (what)
- Arc consistency (what)
- Davis-Putnam-Logemann-Loveland (DPLL) (what)
- Adversarial search
- Minimax value (what)
- The minimax theorem (why)
- Minimax search (what)
- Alpha-beta search (what)
- Alpha-beta returns the minimax value (why)
- Alpha-beta with perfect ordering is \(O(b^{d/2})\) (why)
- Alpha-beta with random ordering is \(O(b^{3d/4})\) (why)
Acting: Decisions over time
- Sequential decision-making
- Policies (what)
- Multi-armed bandits (what)
- Exploration vs. exploitation (what)
- Epsilon-greedy (what)
- Decaying epsilon-greedy achieves logarithmic regret (why)
- Upper confidence bound (UCB) (what)
- UCB1 achieves logarithmic regret (why)
- Thompson sampling (what)
- Thompson sampling achieves logarithmic regret (why)
- The Lai-Robbins regret lower bound (why)
- Markov decision processes (what)
- Bellman equations (what)
- An optimal deterministic policy exists (why)
- The Bellman operator is a contraction (why)
- Planning with known models
- Value iteration (what)
- Value iteration converges to V* (why)
- Policy iteration (what)
- Policy iteration converges to the optimal policy (why)
- Monte Carlo planning
- Rollouts (what)
- Monte Carlo tree search (what)
- UCT converges to the minimax value (why)
- Planning under uncertainty
- Model-free reinforcement learning
- Planning versus learning
- Temporal difference learning (what)
- Q-learning (what)
- Tabular Q-learning converges to Q* (why)
- Boltzmann exploration (what)
- Intrinsic motivation (what)
- Policy gradients
- Policy gradient (what)
- REINFORCE (what)
- The policy gradient theorem (why)
Learning representations
- Hidden Markov models and latent variables
- Hidden Markov models (what)
- Forward-backward algorithm (what)
- Viterbi algorithm (what)
- Viterbi returns the optimal path (why)
- Baum-Welch (what)
- Expectation-maximization (what)
- EM monotonically increases the likelihood (why)
- Neural networks
- Multilayer perceptron (what)
- Activation functions (what)
- Loss functions (what)
- Cross-entropy minimization equals maximum likelihood (why)
- Universal approximation (what)
- The universal approximation theorem (why)
- Training neural networks
- Recurrent neural networks and sequence modeling
- Recurrent neural networks (what)
- Backpropagation through time (what)
- Vanishing and exploding gradients (what)
- The Jacobian product bound on gradient norms (why)
- Gradient clipping (what)
- LSTM (what)
- LSTM mitigates vanishing gradients (why)
- GRU (what)
- Convolutional neural networks
- Convolutional networks (what)
- Convolution layer (what)
- Pooling (what)
- Translation equivariance and weight sharing (what)
- Convolution is translation equivariant (why)
- Receptive fields (what)
- Batch normalization (what)
- Residual connections (what)
- Residual connections preserve gradient norm at initialization (why)
- Generative models I (VAE)
- Autoencoder (what)
- Latent-variable generative models (what)
- Kullback-Leibler divergence (what)
- Evidence lower bound (ELBO) (what)
- The ELBO lower bounds the log evidence (why)
- Variational autoencoder (what)
- The VAE objective is the ELBO with a continuous-latent encoder (why)
- Reparameterization trick (what)
- Reparameterization has lower variance than the score-function estimator (why)
- Generative models II (diffusion)
- Forward Gaussian noising process (what)
- Denoising diffusion probabilistic models (DDPM) (what)
- The DDPM simple loss is a weighted ELBO (why)
- Score matching (what)
- Denoising score matching (what)
- Denoising score matching equals explicit score matching (Vincent identity) (why)
- The score-based SDE view of diffusion (what)
- Classifier-free guidance (what)
Thinking: Language and reasoning
- Grammars and parsing
- Dependency structure (what)
- Information extraction (what)
- Classical machine translation
- Phrase-based translation (what)
- Sequence-to-sequence models (what)
- Attention and transformers
- Attention (what)
- Transformer (what)
- Large language models
- Pretraining (what)
- Prompting (what)
- Fine-tuning (what)
- LLM inference (what)
- Reasoning language models
- Chain-of-thought prompting (what)
- Post-training reinforcement learning (what)
- Limits of reasoning (what)
- Logic, knowledge representation, and neurosymbolic reasoning
- Knowledge representation (what)
- Knowledge graphs (what)
- Ontologies (what)
- Logical entailment (what)
- Neurosymbolic AI (what)
- LLMs as interfaces to symbolic tools (what)
- Neural proposal and symbolic verification (what)