Homework

Dynamic programming solver implementation

In this homework, you will implement a dynamic programming solver for a general discrete-time optimal control problem in Julia. Specifically, you will solve the following problem: \begin{align*} \underset{\mathbf{u}_1, \ldots, \mathbf{u}_{N-1}}{\text{minimize}} \quad & \phi(\mathbf{x}_N) + \sum_{k=1}^{N-1} L(\mathbf{x}_k, \mathbf{u}_k)\\ \text{subject to} \quad & \mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k), \qquad k = 1, 2, \ldots, N-1,\\ & \mathbf{u}_{\text{min}} \leq \mathbf{u}_k \leq \mathbf{u}_{\text{max}}, \qquad k = 1, 2, \ldots, N-1,\\ & \mathbf{x}_{\text{min}} \leq \mathbf{x}_k \leq \mathbf{x}_{\text{max}}, \qquad k = 1, 2, \ldots, N, \end{align*} where \phi is the terminal penalty function, L is the additive stage cost, \mathbf{f} is the state transition function, \mathbf{x}_k \in \mathbb{R}^{n} is the state, and \mathbf{u}_k \in \mathbb{R}^{m} is the control input. The vectors \mathbf{x}_{\text{min}}, \mathbf{x}_{\text{max}} and \mathbf{u}_{\text{min}}, \mathbf{u}_{\text{max}} bound the state and the control input, respectively. The goal is to construct the optimal control policy \mathbf{u}_k^*(\mathbf{x}) that minimizes the cost-to-go function J_k(\mathbf{x}) for each state \mathbf{x} at time k.

Your solver will work with discretized state and control input spaces X = X_1 \times X_2 \times \ldots \times X_n, \quad U = U_1 \times U_2 \times \ldots \times U_m, where each X_i and U_i is a finite set created by uniformly discretizing the intervals [\mathbf{x}_{\text{min}}^{(i)}, \mathbf{x}_{\text{max}}^{(i)}] and [\mathbf{u}_{\text{min}}^{(i)}, \mathbf{u}_{\text{max}}^{(i)}] into n_{x}^{(i)} and n_{u}^{(i)} points, respectively.
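
For instance, such grids and their Cartesian product can be built with LinRange and Iterators.product, just as the template does inside dynprog; here is a hypothetical 3 × 2 example over [0, 1] \times [-1, 1]:

xs = (LinRange(0.0, 1.0, 3), LinRange(-1.0, 1.0, 2)) # one LinRange per state axis
X = Iterators.product(xs...) # lazy iterator over all 6 grid points
collect(X) # 3×2 Matrix of (x1, x2) tuples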

Task description

  • Implement the dynprog function in the code cell below by filling in the missing parts. Specifically, you need to:
    • Initialize the cost-to-go function J_N at the terminal time N.
    • Implement the Bellman recursion to compute J_k(\mathbf{x}) and \mathbf{u}_k^*(\mathbf{x}) for all states \mathbf{x} \in X at time k (the recursion is restated after this list).
  • Your solution should be contained in a single file named hw.jl, which you will upload to the BRUTE system.
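
For reference, the Bellman recursion that these two steps implement starts from the terminal condition and proceeds backward in time: \begin{align*} J_N(\mathbf{x}) &= \phi(\mathbf{x}),\\ J_k(\mathbf{x}) &= \min_{\mathbf{u} \in U} \left[ L(\mathbf{x}, \mathbf{u}) + J_{k+1}\big(\mathbf{f}(\mathbf{x}, \mathbf{u})\big) \right], \qquad k = N-1, \ldots, 1, \end{align*} where \mathbf{u}_k^*(\mathbf{x}) is the minimizing control input; in the discretized setting, the minimum is taken over the finite set U.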

Handling off-grid states

Since the next state \mathbf{x}_{k+1} = \mathbf{f}(\mathbf{x}_k, \mathbf{u}_k) generally does not lie on the discretized grid, interpolation is required. In the template below, we provide the function ss_itp, which creates an interpolator for a function (array) defined on the state space grid. This function can be used to evaluate the cost-to-go function J at an arbitrary state \mathbf{x}. The optimal control policy \mathbf{u}_k^*(\mathbf{x}) is interpolated in the same way.
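
For illustration, here is a standalone 1-D sketch of what ss_itp produces, on a hypothetical 5-point grid (the wrapper order below follows the Interpolations.jl documentation; the template nests the same wrappers in a slightly different order):

using Interpolations

xs = LinRange(-2.0, 2.0, 5)   # hypothetical 1-D state grid
A = abs.(xs)                  # some function sampled on the grid, e.g. a cost-to-go
itp = extrapolate(scale(interpolate(A, BSpline(Cubic(Line(OnGrid())))), xs), Flat())
itp(0.3)                      # value interpolated between the grid points 0.0 and 1.0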

Interpolating J introduces its own set of problems. For example, we cannot truly enforce the state constraints. If \mathbf{x}_{k+1} lies outside the bounds for all possible \mathbf{u}_k, we would like to evaluate J_k as \infty or some appropriately high value. However, this high value would, in time, propagate to the entire state space, leading to incorrect results. We therefore opt for evaluating J_k at out-of-bounds states by extrapolating the value from the nearest grid point. This is already implemented for you, but you should be aware of this limitation.
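
A quick self-contained illustration of the Flat() extrapolation behaviour, on a hypothetical grid and cost-to-go:

using Interpolations

xs = LinRange(-2.0, 2.0, 5)            # hypothetical 1-D state grid
J = [4.0, 2.0, 0.0, 2.0, 4.0]          # hypothetical cost-to-go values on the grid
itp = extrapolate(scale(interpolate(J, BSpline(Linear())), xs), Flat())
itp(2.7)                               # == itp(2.0) == 4.0: clamped to the boundary, not Inf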

using LinearAlgebra, Interpolations

"""
    dynprog(L, ϕ, f, x_min, x_max, u_min, u_max, nxs, nus, N)

Applies dynamic programming to solve the discrete-time optimal control problem:

    minimize   ϕ(x_N) + ∑ₖ L(xₖ, uₖ)
    subject to xₖ₊₁ = f(xₖ, uₖ),   k = 1, ..., N-1,
               x_min ≤ xₖ ≤ x_max, k = 1, ..., N,
               u_min ≤ uₖ ≤ u_max, k = 1, ..., N-1.

where
- `x` is the state,
- `u` is the control input,
- `L(x, u)` is the stage cost function,
- `ϕ(x)` is the terminal cost function,
- `f(x, u)` is the state transition function,
- `x_min` and `x_max` are vectors of the lower and upper bounds on the state,
- `u_min` and `u_max` are vectors of the lower and upper bounds on the control input,
- `nxs` is a tuple of the number of grid points along each axis of the state space,
- `nus` is a tuple of the number of grid points along each axis of the control input space,
- `N` is the number of time steps.

# Output
- `J_opt`: The optimal cost-to-go function on the state space grid; `J_opt[i, k]` is the cost-to-go at the `i`-th grid point and time `k`.
- `u_opt`: The optimal control policy on the state space grid.
- `policy(x, k)`: A function that returns the optimal control action at state `x` and time `k`.
  - This function uses interpolation to approximate the optimal control for continuous `x`.

"""
function dynprog(
    L::Function,
    ϕ::Function,
    f::Function,
    x_min::Vector{Float64},
    x_max::Vector{Float64},
    u_min::Vector{Float64},
    u_max::Vector{Float64},
    nxs::NTuple,
    nus::NTuple,
    N::Int64
)

    n = length(nxs)
    m = length(nus)

    # Discretize each axis of the state space
    xs = map(i -> LinRange(x_min[i], x_max[i], nxs[i]), 1:n)

    # Take the Cartesian product of the axes
    X = Iterators.product(xs...)

    # Discretize each axis of the control input space
    us = map(i -> LinRange(u_min[i], u_max[i], nus[i]), 1:m)

    # Take the Cartesian product of the axes
    U = Iterators.product(us...)

    # The optimal cost-to-go function
    J_opt = zeros(length(X), N) # J[i, k] is the cost-to-go at state X[i] and time k

    # TODO: Initialize the cost-to-go function at the terminal time N
    J_opt[:, N] = zeros(length(X))

    # The optimal control policy
    u_opt = zeros(m, length(X), N-1) # u_opt[:, i, k] is the optimal control at state X[i] and time k

    # Creates an interpolator that extends an array defined on the state-space grid to the whole state space
    function ss_itp(A)
        itp = scale(
            extrapolate(interpolate(reshape(A, nxs...), BSpline(Cubic(Line(OnGrid())))), Flat()),
            Tuple(xs),
        )
        fun_itp(x) = itp(x...)
        return fun_itp
    end

    for k = N-1:-1:1 # Backward recursion
     
        # Interpolate the cost-to-go function at time k+1 between grid points
        next_cost_itp = ss_itp(J_opt[:, k+1]) # next_cost_itp(x) ≈ J_{k+1}(x) for arbitrary x
        
        for (i, x) in enumerate(X) # Loop over the state space
            x = collect(x) # tuple -> vector

            # TODO: Implement the Bellman recursion: compute J_opt[i, k] and u_opt[:, i, k] for the current state x
            J_opt[i, k] = 0.0
            u_opt[:, i, k] .= 0.0
        end

    end

    # Creates an interpolator for the optimal control policy
    u_opt_itps = [ss_itp(u_opt[i, :, k]) for i in 1:m, k in 1:N-1]
    policy(x, k) = map(i -> u_opt_itps[i, k](x), 1:m)

    return J_opt, u_opt, policy
end

For testing your implementation, you can use the following example problem: \begin{align*} \underset{u_k}{\text{minimize}} \quad & 2\lvert x_3 - 1 \rvert + \sum_{k=1}^{2} \left( \lvert x_k \rvert + \lvert u_k \rvert \right)\\ \text{subject to} \quad & x_{k+1} = x_k + u_k, \qquad k = 1, 2,\\ & -2 \leq x_k \leq 2, \qquad k = 1, 2, 3,\\ & -1 \leq u_k \leq 1, \qquad k = 1, 2. \end{align*} Furthermore, n_x = 5 and n_u = 3, leading to the discretized state and control input spaces X = \{-2, -1, 0, 1, 2\} and U = \{-1, 0, 1\}, respectively. The problem is already implemented for you in the code cell below.

f(x,u) = [x[1] + u[1]]
L(x,u) = abs(x[1]) + abs(u[1])
ϕ(x) = 2.0abs(x[1] - 1.0)

x_min = [-2.0]
x_max = [2.0]
u_min = [-1.0]
u_max = [1.0]
nxs = (5,)
nus = (3,)

N = 3

J_opt, u_opt, policy = dynprog(L, ϕ, f, x_min, x_max, u_min, u_max, nxs, nus, N)

x0 = [-1.0]
xhist = zeros(1, N)
xhist[1] = x0[1]
uhist = zeros(1, N-1)

for k = 1:N-1
    uhist[:, k] = policy(xhist[:, k], k)
    xhist[:, k+1] = f(xhist[:, k], uhist[:, k])
end

using Plots

plot(0:N-1, xhist[1,:], linetype=:steppost)
xlabel!("Time")
ylabel!("State")
title!("State trajectory")