Linear Quadratic Regulator (LQR)
Motivation
Most physical systems near an operating point behave approximately like linear dynamical systems, and many engineering objectives penalize deviation from a target quadratically. The linear quadratic regulator (LQR) exploits this structure to find the optimal controller in closed form using dynamic programming. It is a foundational result of modern optimal control theory and, combined with the Kalman filter, the building block of LQG control.
Setup
The system evolves according to discrete-time linear dynamics:
\[ x_{t+1} = A x_t + B u_t, \]
where \(x_t \in \mathbb{R}^n\) is the state, \(u_t \in \mathbb{R}^m\) is the control input, and \(A \in \mathbb{R}^{n \times n}\), \(B \in \mathbb{R}^{n \times m}\) are known system matrices. Additive zero-mean disturbances \(w_t \sim \mathcal{N}(0, W)\) can be included; by certainty equivalence they do not change the optimal feedback gains, only the achievable cost.
The cost over a finite horizon \(T\) is
\[ J = x_T^\top Q_f\, x_T + \sum_{t=0}^{T-1} \bigl( x_t^\top Q\, x_t + u_t^\top R\, u_t \bigr), \]
where \(Q, Q_f \succeq 0\) penalize state deviation and \(R \succ 0\) penalizes control effort.
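For concreteness, here is a minimal NumPy sketch of one possible instance of this setup, a discretized double integrator; the sampling time, weights, and variable names are illustrative assumptions, not part of the formulation above.

```python
import numpy as np

# Discretized double integrator: state x = [position, velocity],
# control u = acceleration, sampling time dt (illustrative values).
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])        # n x n dynamics matrix
B = np.array([[0.5 * dt**2],
              [dt]])              # n x m input matrix

Q  = np.diag([1.0, 0.1])          # running state cost, Q >= 0
R  = np.array([[0.01]])           # control effort cost, R > 0
Qf = np.diag([10.0, 1.0])         # terminal state cost, Qf >= 0

def stage_cost(x, u):
    """One term of the sum: x^T Q x + u^T R u."""
    return float(x @ Q @ x + u @ R @ u)
```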
Optimal Control: Backward Riccati Recursion
Define the cost-to-go \(V_t(x) = \min_{u_t, \ldots, u_{T-1}} J_t\), where \(J_t\) is the remaining cost from state \(x\) at time \(t\). The Bellman equations give
\[ V_t(x) = \min_{u} \Bigl[ x^\top Q\, x + u^\top R\, u + V_{t+1}(Ax + Bu) \Bigr], \qquad V_T(x) = x^\top Q_f x. \]
Because the terminal cost is quadratic and the dynamics are linear, the value function remains quadratic at every step. Substituting \(V_{t+1}(x) = x^\top P_{t+1} x\) makes the bracketed expression quadratic in \(u\); setting its gradient with respect to \(u\) to zero gives the minimizer in closed form, and substituting it back yields the backward Riccati recursion:
\[ P_T = Q_f, \]
\[ P_t = Q + A^\top P_{t+1} A - A^\top P_{t+1} B \left( R + B^\top P_{t+1} B \right)^{-1} B^\top P_{t+1} A, \]
with optimal control
\[ u_t^* = -K_t x_t, \qquad K_t = \left( R + B^\top P_{t+1} B \right)^{-1} B^\top P_{t+1} A. \]
The optimal policy is linear state feedback: the control is proportional to the current state. The optimal cost from \(x_0\) is \(V_0(x_0) = x_0^\top P_0 x_0\).
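The recursion translates directly into code. The sketch below is a hedged illustration that reuses the A, B, Q, R, Qf from the setup example above; the function names are assumptions made here, not standard notation from the text.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, Qf, T):
    """Backward Riccati recursion: returns gains K[0..T-1] and cost matrices P[0..T]."""
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Qf
    for t in range(T - 1, -1, -1):
        # K_t = (R + B^T P_{t+1} B)^{-1} B^T P_{t+1} A
        S = R + B.T @ P[t + 1] @ B
        K[t] = np.linalg.solve(S, B.T @ P[t + 1] @ A)
        # P_t = Q + A^T P_{t+1} A - A^T P_{t+1} B K_t
        P[t] = Q + A.T @ P[t + 1] @ A - A.T @ P[t + 1] @ B @ K[t]
    return K, P

def rollout(A, B, K, x0):
    """Apply the linear state feedback u_t = -K_t x_t from initial state x0."""
    x = x0.copy()
    traj = [x0]
    for Kt in K:
        u = -Kt @ x
        x = A @ x + B @ u
        traj.append(x)
    return np.array(traj)
```

With these definitions, `K, P = finite_horizon_lqr(A, B, Q, R, Qf, T=50)` gives the time-varying gains, and `x0 @ P[0] @ x0` reproduces the optimal cost \(V_0(x_0)\).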
Infinite-Horizon LQR
When \(T \to \infty\) and the pair \((A, B)\) is stabilizable (every unstable mode can be influenced by the control) and \((A, Q^{1/2})\) is detectable (every unstable mode shows up in the cost), the matrices \(P_t\) converge to a stationary solution \(P\) of the algebraic Riccati equation (ARE):
\[ P = Q + A^\top P A - A^\top P B \left( R + B^\top P B \right)^{-1} B^\top P A. \]
The optimal stationary controller is
\[ u^* = -Kx, \qquad K = \left( R + B^\top P B \right)^{-1} B^\top P A, \]
and the closed-loop matrix \(A - BK\) is Schur stable (all eigenvalues strictly inside the unit circle) (Anderson and Moore 1990).
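One way to compute the stationary solution is to iterate the Riccati recursion to a fixed point, as in the sketch below (reusing A, B, Q, R from the setup example; function name and tolerances are illustrative assumptions). SciPy's solve_discrete_are is a direct library alternative.

```python
import numpy as np

def dare_by_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Solve the discrete algebraic Riccati equation by iterating the recursion."""
    P = Q.copy()
    for _ in range(max_iter):
        S = R + B.T @ P @ B
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(S, B.T @ P @ A)
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K

P, K = dare_by_iteration(A, B, Q, R)
# Closed-loop spectral radius should be strictly below 1 (Schur stability).
rho = max(abs(np.linalg.eigvals(A - B @ K)))

# Library alternative:
# from scipy.linalg import solve_discrete_are
# P = solve_discrete_are(A, B, Q, R)
```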
LQG: Adding Noise and Partial Observability
When the state is unobserved and must be estimated from noisy measurements \(y_t = Cx_t + v_t\), the optimal policy remains linear by the separation principle: design the LQR gain \(K\) as if the state were fully observed, design the Kalman filter gain independently, and combine them. The optimal control is \(u_t^* = -K\hat{x}_{t|t}\), where \(\hat{x}_{t|t}\) is the Kalman filter estimate. The two designs decouple exactly.
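A minimal sketch of the separation in code, reusing A, B, Q, R from the setup example; the output matrix C and the noise covariances W and V below are illustrative assumptions, and scipy.linalg.solve_discrete_are is used for both Riccati equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

C = np.array([[1.0, 0.0]])        # measure position only (illustrative)
W = 1e-3 * np.eye(2)              # process noise covariance (illustrative)
V = np.array([[1e-2]])            # measurement noise covariance (illustrative)

# LQR gain, designed as if the state were fully observed.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Steady-state Kalman gain, designed from the dual Riccati equation,
# independently of Q and R.
Sigma = solve_discrete_are(A.T, C.T, W, V)
L = Sigma @ C.T @ np.linalg.inv(C @ Sigma @ C.T + V)

def lqg_step(x_hat, y, u_prev):
    """One filter predict/update step followed by the LQR feedback."""
    x_pred = A @ x_hat + B @ u_prev            # predict with the known input
    x_hat = x_pred + L @ (y - C @ x_pred)      # correct with the measurement
    u = -K @ x_hat                             # certainty-equivalent control
    return x_hat, u
```

Note that K depends only on \((A, B, Q, R)\) and L only on \((A, C, W, V)\); the two designs meet only in the feedback law \(u_t = -K\hat{x}_{t|t}\).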
Computational Complexity
The Riccati recursion runs backward over \(T\) steps at \(O(n^3)\) per step, dominated by the \(n \times n\) matrix products (the inverse is only \(m \times m\)). The ARE can be solved by iterating the Riccati recursion to a fixed point or via a Schur decomposition of the associated \(2n \times 2n\) symplectic pencil; both cost \(O(n^3)\) per iteration or decomposition.
Extensions
- Continuous-time LQR. Replace \(x_{t+1} = Ax_t + Bu_t\) with \(\dot{x} = Ax + Bu\). The Riccati recursion becomes a differential equation; the infinite-horizon version solves the symmetric ARE \(PA + A^\top P - PBR^{-1}B^\top P + Q = 0\) (see the sketch after this list).
- Model-predictive control (MPC). Solve LQR over a receding finite horizon, re-optimizing at each step as new measurements arrive. This handles hard constraints on \(u\) and \(x\) that pure LQR cannot accommodate.
- Model-free LQR. When \(A\) and \(B\) are unknown, algorithms such as random search, least-squares policy iteration, or policy gradient methods can learn a stabilizing gain from data.
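As a sketch of the continuous-time case mentioned above, the symmetric ARE can be solved with scipy.linalg.solve_continuous_are; the matrices below are an illustrative continuous-time double integrator, not taken from the text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Continuous-time double integrator: x_dot = A x + B u (illustrative matrices).
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.diag([1.0, 0.1])
R = np.array([[0.01]])

# Solve P A + A^T P - P B R^{-1} B^T P + Q = 0, then u = -K x with K = R^{-1} B^T P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# The closed-loop matrix A - B K should be Hurwitz (eigenvalues in the open left half-plane).
print(np.linalg.eigvals(A - B @ K))
```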