We keep adhering to our previous decision to focus on algorithms that use derivatives. But even then, the number of derivative-based algorithms for constrained optimization – considering both equality and inequality constraints – is huge. They can be classified in many ways. Here we choose the very pragmatic viewpoint of immediate usefulness within our course, and within the discipline of optimal control in general. It is admittedly a somewhat narrow point of view, but it will get us going… In this we take inspiration from the overview paper [1]. There is also a wealth of literature providing a more rigorous classification, to which we give references.
There are essentially two types of optimization problems (aka mathematical programs) that dominate the discipline of optimal control: quadratic programs (QP) and general nonlinear programs (NLP). We will therefore focus our discussion of methods on these two.
We consider the problem
\begin{aligned} \operatorname*{minimize}_{\bm x \in \mathbb{R}^n} &\quad \frac{1}{2}\bm{x}^\top\mathbf{Q}\bm{x} + \mathbf{c}^\top\bm{x}\\ \text{subject to} &\quad \mathbf A_\text{eq} \bm x = \mathbf b_\text{eq},\\ &\quad \mathbf A_\text{ineq} \bm x \leq \mathbf b_\text{ineq}. \end{aligned}
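As an aside, a QP of this form can also be handed to an off-the-shelf modeling layer and solver. The following is a minimal sketch, assuming the JuMP package together with the OSQP solver (neither is used elsewhere in this text) and made-up placeholder data:

using JuMP, OSQP

Q = [2.0 0.0; 0.0 4.0]              # positive definite quadratic term (placeholder data)
c = [1.0, 1.0]                      # linear term (placeholder data)
A_eq, b_eq = [1.0 1.0], [1.0]       # one equality constraint (placeholder data)
A_ineq, b_ineq = [1.0 0.0], [0.8]   # one inequality constraint (placeholder data)

model = Model(OSQP.Optimizer)
set_silent(model)
@variable(model, x[1:2])
@objective(model, Min, 0.5*x'*Q*x + c'*x)
@constraint(model, A_eq*x .== b_eq)
@constraint(model, A_ineq*x .<= b_ineq)
optimize!(model)
value.(x)                           # the minimizer reported by the solver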
#TODO
#TODO
First-order methods (methods using only the first derivatives) seem to be going through a renaissance in the last two decades or so. Being computationally simpler than their higher-order counterparts (such as Newton's and quasi-Newton methods), and enhanced with some clever acceleration modifications, they are the methods of choice in machine learning applications, where the size of the problem (the number of variables and the number of equations and inequalities) can easily reach millions and more. Although in optimal control we typically do not encounter optimization problems of this size, we can still benefit from the trend. While the size of the optimization problem may be medium or even small (a few dozen variables and constraints), the time budget for its solution can be extremely small (easily below a millisecond, or even down to a few tens of microseconds). Furthermore, such optimization may have to be performed on embedded hardware with limited resources. The computational simplicity of first-order methods (they do not rely on more advanced linear-algebra computations such as matrix decompositions) makes them particularly suited for these applications.
One way to extend the standard gradient method to constrained optimization is to combine it with a suitable projection operator. The idea behind the algorithm is that after the standard gradient-descent update, the result is projected onto the feasible set.
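One iteration of the resulting projected gradient method can thus be written compactly as

\bm x_{k+1} = P_\mathcal{C}\left(\bm x_k - \alpha_k \nabla f(\bm x_k)\right),

where \mathcal C is the feasible set, P_\mathcal{C} is the projection operator, and \alpha_k is the step length.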
Commonly, an orthogonal projection is used, which is defined as P_\mathcal{C}(\bm x) \coloneqq \arg\min_{\bm y\in\mathcal{C}} \|\bm y - \bm x\|_2.
For a general set \mathcal C, the projection can be computationally expensive. But for some simple yet useful sets, the projection is trivial. A prominent example is a box (a multidimensional interval):
function projection_on_box!(x,xₗ,xᵤ)
    for i=1:length(x)
        if x[i] < xₗ[i]
            x[i] = xₗ[i]        # clip from below
        elseif x[i] > xᵤ[i]
            x[i] = xᵤ[i]        # clip from above
        end
    end
end
projection_on_box! (generic function with 1 method)
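A quick check on made-up data: components below the lower bound are pushed up, components above the upper bound are pushed down, and the rest are left untouched.

x = [-0.5, 1.0, 3.2]
projection_on_box!(x, [0.0, 0.0, 0.0], [2.0, 2.0, 2.0])
x            # now [0.0, 1.0, 2.0]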
In our implementation of the algorithm we use a fixed step length based on the maximum curvature of the Hessian.
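To spell this out: for the quadratic cost the Hessian is the constant matrix \mathbf Q, hence the gradient is Lipschitz continuous with constant L = \lambda_{\max}(\mathbf Q), and the classical safe choice of a fixed step is

\alpha = \frac{1}{L} = \frac{1}{\lambda_{\max}(\mathbf Q)}.

For a diagonal \mathbf Q (as in the example below) the largest eigenvalue is just the largest diagonal entry, which is what the code exploits.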
using LinearAlgebra
function projected_gradient_quadratic(Q,c,xₗ,xᵤ,x₀,ϵ,N)
    x = x₀                              # initializing x
    f(x) = 1/2*dot(x,Q,x)+dot(x,c)
    ∇f(x) = Q*x+c                       # defining the gradient
    L = maximum(diag(Q,0))              # maximum curvature (here we assume just a diagonal Q, otherwise max(eigvals))
    α = 1/L                             # step length
    k = 0
    d = 1+ϵ                             # initial value of the distance between two solutions (epsilon plus whatever)
    while (norm(d) > ϵ/L)
        xold = x
        x = x - α*∇f(x)                 # the step in the descent direction
        projection_on_box!(x,xₗ,xᵤ)     # the projection of the descent step on the box
        d = x-xold                      # the current step (after the projection)
        k = k+1
        if k >= N
            return f(x),x
        end
    end
    return f(x),x
end
projected_gradient_quadratic (generic function with 1 method)
x₀ = [1.5, 1.5]     # the initial vector

xₗ = [0.0, 0.0]     # the lower bound
xᵤ = [2.0, 2.0]     # the upper bound

Q = [1 0; 0 3]      # the positive definite matrix defining the quadratic form
c = [1; 2]          # the vector defining the linear part

ϵ = 1e-5            # the tolerance
N = 100;            # the maximum number of steps

f_opt, x_opt = projected_gradient_quadratic(Q,c,xₗ,xᵤ,x₀,ϵ,N)
(0.0, [0.0, 0.0])
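We can sanity-check this result: since \mathbf Q is diagonal, the cost separates per coordinate, and the box-constrained minimizer is simply the coordinate-wise clamp of the unconstrained stationary point onto the box:

x_unc = -Q\c                 # unconstrained stationary point, here [-1.0, -0.6667]
clamp.(x_unc, xₗ, xᵤ)        # coordinate-wise projection onto the box, gives [0.0, 0.0]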
Below we also give a slightly more “decorated” version that records the sequence of iterates, which we can then plot.
using Printf
function projected_gradient_quadratic(Q,c,xₗ,xᵤ,x₀,ϵ,N)
    x = x₀                              # initializing x
    X = x                               # the matrix of iterates (as columns) that will be output
    f(x) = 1/2*dot(x,Q,x)+dot(x,c)
    fx = f(x)
    F = [fx,]
    ∇f(x) = Q*x+c                       # building the gradient
    gx = ∇f(x)
    L = maximum(diag(Q,0))              # maximum curvature (here I assume just diagonal Q, otherwise max(eigvals))
    α = 1/L                             # step length
    #α = 1/5                            # just to explore the behaviour when the step is longer or shorter than to the boundary
    k = 0
    d = 1
    while (norm(d) > ϵ/L)
        k = k+1
        xold = x
        x = x - α*gx                    # step in the descent direction
        projection_on_box!(x,xₗ,xᵤ)
        d = x-xold
        @printf("iter = %3d ||∇f(x)|| = %6.4e f(x) = %6.4e\n",k,norm(gx),fx)
        gx = ∇f(x)
        fx = f(x)
        X = hcat(X,x)
        push!(F,fx)
        if k >= N
            return F,X
        end
    end
    return F,X
end
projected_gradient_quadratic (generic function with 1 method)
F,X = projected_gradient_quadratic(Q,c,xₗ,xᵤ,x₀,ϵ,N)
iter = 1 ||∇f(x)|| = 6.9642e+00 f(x) = 9.0000e+00
iter = 2 ||∇f(x)|| = 2.6034e+00 f(x) = 8.8889e-01
iter = 3 ||∇f(x)|| = 2.2879e+00 f(x) = 1.1728e-01
iter = 4 ||∇f(x)|| = 2.2361e+00 f(x) = 0.0000e+00
([9.0, 0.8888888888888891, 0.117283950617284, 0.0, 0.0], [1.5 0.6666666666666667 … 0.0 0.0; 1.5 0.0 … 0.0 0.0])
x1_grid = x2_grid = -2:0.01:4;
f(x) = 1/2*dot(x,Q,x)+dot(x,c)
z_grid = [f([x1,x2]) for x2=x2_grid, x1=x1_grid];

xs = -Q\c   # the stationary point of the unconstrained problem
using Plots
plot(Shape([(2,2),(2,0),(0,0),(0,2),(2,2)]),opacity=0.2,label="bounds")
contour!(x1_grid,x2_grid,z_grid)
plot!(X[1,:],X[2,:],label="xₖ",marker=:diamond,aspect_ratio=1)
scatter!([x₀[1],],[x₀[2],],label="x₀")
scatter!([xs[1],],[xs[2],],label="x⋆ unconstrained")
xlabel!("x₁");ylabel!("x₂")
#xlims!(-4,4); ylims!(-4,4)
#TODO
The KKT conditions for a nonlinear program with equality constraints can be solved by Newton's method. Interpretation: at each iteration we solve a quadratic program (QP) with linear constraints.
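A brief sketch of the idea (the notation below is not defined elsewhere in this section and serves only as an illustration): consider the equality-constrained problem

\begin{aligned} \operatorname*{minimize}_{\bm x \in \mathbb{R}^n} &\quad f(\bm x)\\ \text{subject to} &\quad \bm h(\bm x) = \mathbf 0. \end{aligned}

With the Lagrangian \mathcal L(\bm x, \bm\lambda) = f(\bm x) + \bm\lambda^\top \bm h(\bm x), the KKT conditions read

\begin{aligned} \nabla_{\bm x}\mathcal L(\bm x, \bm\lambda) = \nabla f(\bm x) + \nabla \bm h(\bm x)\,\bm\lambda &= \mathbf 0,\\ \bm h(\bm x) &= \mathbf 0. \end{aligned}

Applying Newton's method to this system of nonlinear equations yields, at the current iterate (\bm x_k, \bm\lambda_k), the linear system

\begin{bmatrix} \nabla^2_{\bm x \bm x}\mathcal L(\bm x_k,\bm\lambda_k) & \nabla \bm h(\bm x_k)\\ \nabla \bm h(\bm x_k)^\top & \mathbf 0 \end{bmatrix} \begin{bmatrix} \Delta\bm x\\ \Delta\bm\lambda \end{bmatrix} = -\begin{bmatrix} \nabla_{\bm x}\mathcal L(\bm x_k,\bm\lambda_k)\\ \bm h(\bm x_k) \end{bmatrix},

which is exactly the KKT system of a QP: a quadratic model of the Lagrangian minimized subject to the linearized constraints. This is the viewpoint behind sequential quadratic programming (SQP).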
#TODO