using LinearAlgebra, Interpolations
"""
dynprog(L, ϕ, f, x_min, x_max, u_min, u_max, nxs, nus, N)
Applies dynamic programming to solve the discrete-time optimal control problem:
minimize ∑ₖ L(xₖ, uₖ) + ϕ(x_N)
subject to xₖ₊₁ = f(xₖ, uₖ), k = 1, ..., N-1
x_min ≤ xₖ ≤ x_max, k = 1, ..., N
u_min ≤ uₖ ≤ u_max, k = 1, ..., N-1.
where
- `x` is the state,
- `u` is the control input,
- `L(x, u)` is the stage cost function,
- `ϕ(x)` is the terminal cost function,
- `f(x, u)` is the state transition function,
- `x_min` and `x_max` are vectors of the lower and upper bounds on the state,
- `u_min` and `u_max` are vectors of the lower and upper bounds on the control input,
- `nxs` is a tuple of the number of grid points along each axis of the state space,
- `nus` is a tuple of the number of grid points along each axis of the control input space,
- `N` is the number of time steps.
# Output
- `J_opt`: the optimal cost-to-go function on the state-space grid.
- `u_opt`: the optimal control policy on the state-space grid.
- `policy(x, k)`: a function that returns the optimal control input at state `x` and time `k`, using interpolation to approximate the optimal control for continuous `x`.
"""
function dynprog(
    L::Function,
    ϕ::Function,
    f::Function,
    x_min::Vector{Float64},
    x_max::Vector{Float64},
    u_min::Vector{Float64},
    u_max::Vector{Float64},
    nxs::NTuple,
    nus::NTuple,
    N::Int64
)
    n = length(nxs)
    m = length(nus)
    # Discretize each axis of the state space
    xs = map(i -> LinRange(x_min[i], x_max[i], nxs[i]), 1:n)

    # Take the Cartesian product of the axes
    X = Iterators.product(xs...)

    # Discretize each axis of the control input space
    us = map(i -> LinRange(u_min[i], u_max[i], nus[i]), 1:m)

    # Take the Cartesian product of the axes
    U = Iterators.product(us...)

    # The optimal cost-to-go function
    J_opt = zeros(length(X), N) # J_opt[i, k] is the cost-to-go at state X[i] and time k

    # TODO: Initialize the cost-to-go function at the terminal time
    J_opt[:, N] = zeros(length(X))

    # The optimal control policy
    u_opt = zeros(m, length(X), N-1) # u_opt[:, i, k] is the optimal control at state X[i] and time k
    # Creates an interpolator that extends an array defined on the state-space grid to the entire state space
    function ss_itp(A)
        itp = scale(extrapolate(interpolate(reshape(A, nxs...), BSpline(Cubic(Line(OnGrid())))), Flat()), Tuple(xs))
        fun_itp(x) = itp(x...)
        return fun_itp
    end
    for k = N-1:-1:1 # Backward recursion
        # Interpolate the cost-to-go function outside of the grid points
        next_cost_itp = ss_itp(J_opt[:, k+1]) # e.g. next_cost_itp(x) returns J_opt(x, k+1)

        for (i, x) in enumerate(X) # Loop over the state space
            x = collect(x) # tuple -> vector

            # TODO: Implement the Bellman recursion to compute J_opt[i, k] and u_opt[:, i, k] for all states x ∈ X.
            J_opt[i, k] = 0.0
            u_opt[:, i, k] .= 0.0
        end
    end
    # Creates an interpolator for the optimal control policy
    u_opt_itps = [ss_itp(u_opt[i, :, k]) for i in 1:m, k in 1:N-1]
    policy(x, k) = map(i -> u_opt_itps[i, k](x), 1:m)

    return J_opt, u_opt, policy
end
Homework
Dynamic programming solver implementation
In this homework, you will implement a dynamic programming solver for a general discrete-time optimal control problem in Julia. Specifically, you will solve the following problem:
\begin{align*}
\underset{\bm{u}_k}{\text{minimize}} \quad & \phi(\bm{x}_N) + \sum_{k=1}^{N-1} L(\bm{x}_k, \bm{u}_k)\\
\text{subject to} \quad & \bm{x}_{k+1} = \mathbf{f}(\bm{x}_k, \bm{u}_k), \qquad k = 1, 2, \ldots, N-1,\\
& \mathbf{u}_{\text{min}} \leq \bm{u}_k \leq \mathbf{u}_{\text{max}}, \qquad k = 1, 2, \ldots, N-1,\\
& \mathbf{x}_{\text{min}} \leq \bm{x}_k \leq \mathbf{x}_{\text{max}}, \qquad k = 1, 2, \ldots, N,
\end{align*}
where \phi is the terminal penalty function, L is the additive stage cost, \mathbf{f} is the state transition function, \bm{x}_k \in \mathbb{R}^{n} is the state, and \bm{u}_k \in \mathbb{R}^{m} is the control input. The bounds \mathbf{x}_{\text{min}}, \mathbf{x}_{\text{max}} and \mathbf{u}_{\text{min}}, \mathbf{u}_{\text{max}} constrain the state and the control input, respectively. The goal is to construct the optimal control policy \bm{u}_k^*(\bm{x}) that minimizes the cost-to-go function J_k(\bm{x}) for each state \bm{x} at time k.
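For reference, the Bellman recursion that such a solver implements is
\begin{align*}
J_N(\bm{x}) &= \phi(\bm{x}),\\
J_k(\bm{x}) &= \min_{\mathbf{u}_{\text{min}} \leq \bm{u} \leq \mathbf{u}_{\text{max}}} \left[ L(\bm{x}, \bm{u}) + J_{k+1}(\mathbf{f}(\bm{x}, \bm{u})) \right], \qquad k = N-1, \ldots, 1,
\end{align*}
with \bm{u}_k^*(\bm{x}) the minimizing argument.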
Your solver will work with discretized state and control input spaces
X = X_1 \times X_2 \times \ldots \times X_n, \quad U = U_1 \times U_2 \times \ldots \times U_m,
where each X_i and U_i is a finite set created by uniformly discretizing the intervals [\mathbf{x}_{\text{min}}^{(i)}, \mathbf{x}_{\text{max}}^{(i)}] and [\mathbf{u}_{\text{min}}^{(i)}, \mathbf{u}_{\text{max}}^{(i)}] into n_{x}^{(i)} and n_{u}^{(i)} points, respectively.
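As a concrete illustration, here is a minimal sketch of the grid construction that dynprog performs internally; the numbers come from the 1-D test problem further below and are not part of the solver itself:
xs = [LinRange(-2.0, 2.0, 5)]   # one LinRange per state axis: X₁ = {-2, -1, 0, 1, 2}
us = [LinRange(-1.0, 1.0, 3)]   # one LinRange per input axis: U₁ = {-1, 0, 1}
X = Iterators.product(xs...)    # Cartesian product of the state axes
U = Iterators.product(us...)    # Cartesian product of the input axes
collect(X)                      # 5-element array of 1-tuples: (-2.0,), (-1.0,), ..., (2.0,)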
Task description
- Implement the dynprog function in the code cell below by filling in the missing parts. Specifically, you need to:
  - Initialize the cost-to-go function J_N at the terminal time N.
  - Implement the Bellman recursion to compute J_k(\bm{x}) and \bm{u}_k^*(\bm{x}) for all states \bm{x} \in X at time k (a sketch of one possible implementation follows this list).
- Your solution should be contained in a single file named hw.jl, which you will upload to the BRUTE system.
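For orientation, here is a minimal sketch of what the two missing pieces might look like. It reuses the names from the dynprog template (X, U, f, L, ϕ, m, next_cost_itp, J_opt, u_opt, and the loop variables i, k, x); next_cost_itp is the interpolated cost-to-go discussed in the next section. This is one possible implementation under those assumptions, not the only acceptable one:
# First TODO: terminal condition J_N(x) = ϕ(x) on the grid
J_opt[:, N] = [ϕ(collect(x)) for x in vec(collect(X))]

# Second TODO (inside the loop over (i, x)): minimize over the control grid
J_best = Inf
u_best = zeros(m)
for u in U                               # enumerate the discretized control set
    u = collect(u)                       # tuple -> vector
    x_next = f(x, u)                     # candidate next state
    J = L(x, u) + next_cost_itp(x_next)  # stage cost + interpolated cost-to-go
    if J < J_best
        J_best = J
        u_best = u
    end
end
J_opt[i, k] = J_best
u_opt[:, i, k] = u_best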
Handling off-grid states
Since the next state \bm{x}_{k+1} = \mathbf{f}(\bm{x}_k, \bm{u}_k) generally does not lie on the discretized grid, interpolation is required. In the template below, we provide the function ss_itp, which creates an interpolator for a function (array) defined on the state-space grid. This function can be used to interpolate the cost-to-go function J to an arbitrary state \bm{x}. The optimal control policy \bm{u}_k^*(\bm{x}) is interpolated in a similar way.
Interpolating J introduces its own set of problems. For example, we cannot truly enforce the state constraints: if \bm{x}_{k+1} lies outside the bounds for all possible \bm{u}_k, we would like to evaluate J_k as \infty or an appropriately high value. However, this high value would, in time, propagate to the entire state space, leading to incorrect results. We therefore opt to evaluate J_k for out-of-bounds states by extrapolating the value from the nearest grid point. This is already implemented for you, but you should be aware of this limitation.
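To see the flat extrapolation in action, the following standalone 1-D snippet (not part of the template; it composes the same Interpolations.jl building blocks as ss_itp, here in the extrapolate-after-scale order) clamps out-of-bounds queries to the nearest boundary value:
using Interpolations

grid = LinRange(-2.0, 2.0, 5)
vals = abs.(grid)                        # some values on the grid, e.g. |x|
itp = interpolate(vals, BSpline(Cubic(Line(OnGrid()))))
eitp = extrapolate(scale(itp, grid), Flat())
eitp(1.5)   # cubic interpolation between the grid points at 1.0 and 2.0
eitp(3.0)   # outside [-2, 2]: flat extrapolation returns the value at x = 2.0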
For testing your implementation, you can use the following example problem:
\begin{align*}
\underset{u_k}{\text{minimize}} \quad & 2\lvert x_3 - 1 \rvert + \sum_{k=1}^{2} \left( \lvert x_k \rvert + \lvert u_k \rvert \right)\\
\text{subject to} \quad & x_{k+1} = x_k + u_k, \qquad k = 1, 2,\\
& -2 \leq x_k \leq 2, \qquad k = 1, 2, 3,\\
& -1 \leq u_k \leq 1, \qquad k = 1, 2.
\end{align*}
Furthermore, n_x = 5 and n_u = 3, leading to the discretized state and control input spaces X = \{-2, -1, 0, 1, 2\} and U = \{-1, 0, 1\}, respectively. The problem is already implemented for you in the code cell below.
f(x, u) = [x[1] + u[1]]
L(x, u) = abs(x[1]) + abs(u[1])
ϕ(x) = 2.0abs(x[1] - 1.0)

x_min = [-2.0]
x_max = [2.0]
u_min = [-1.0]
u_max = [1.0]
nxs = (5,)
nus = (3,)
N = 3

J_opt, u_opt, policy = dynprog(L, ϕ, f, x_min, x_max, u_min, u_max, nxs, nus, N)
x0 = [-1.0]
xhist = zeros(1, N)
xhist[1] = x0[1]
uhist = zeros(1, N-1)

for k = 1:N-1
    uhist[:, k] = policy(xhist[:, k], k)
    xhist[:, k+1] = f(xhist[:, k], uhist[:, k])
end
using Plots
plot(0:N-1, xhist[1,:], linetype=:steppost)
xlabel!("Time")
ylabel!("State")
title!("State trajectory")