Linear Programming

June 06, 2022

Introduction to LP Problems

Standard Form vs General Form

KEY: equivalence of (S) and (G)

Any linear programming problem, i.e.,

\begin{array}{rll} \operatorname{minimize} & \mathbf{c}^{\prime} \mathbf{x} & \\ \text { subject to } & \mathbf{a}_{i}^{\prime} \mathbf{x} \geq b_{i}, & i \in M_{1}, \\ & \mathbf{a}_{i}^{\prime} \mathbf{x} \leq b_{i}, & i \in M_{2}, \\ & \mathbf{a}_{i}^{\prime} \mathbf{x}=b_{i}, & i \in M_{3}, \\ & x_{j} \geq 0, & j \in N_{1}, \\ & x_{j} \leq 0, & j \in N_{2} . \end{array}

can be expressed compactly in both standard form and general form.

Definition 1.1.1 (General Form) A linear programming problem can be written as

\begin{array}{rll} \operatorname{minimize} & \mathbf{c}^{\prime} \mathbf{x} \\ \text { subject to } & \mathbf{A x} \geq \mathbf{b} \end{array}

Definition 1.1.2 (Standard Form) A linear programming problem of the form

\begin{array}{rr} \operatorname{minimize} & \mathbf{c}^{\prime} \mathbf{x} \\ \text { subject to } & \mathbf{A x}=\mathbf{b} \\ & \mathbf{x} \geq \mathbf{0} \end{array}

is said to be in standard form.

Polyhedron and convex sets

KEY: definition of polyhedron; definition of convex set.

We first revisit some definitions from convex optimization.

Definition 1.2 (Convexity and Concavity)

(a) A function f: \Re^{n} \mapsto \Re is called convex if for every \mathbf{x}, \mathbf{y} \in \Re^{n}, and every \lambda \in[0,1], we have

f(\lambda \mathbf{x}+(1-\lambda) \mathbf{y}) \leq \lambda f(\mathbf{x})+(1-\lambda) f(\mathbf{y})

(b) A function f: \Re^{n} \mapsto \Re is called concave if for every \mathbf{x}, \mathbf{y} \in \Re^{n}, and every \lambda \in[0,1], we have

f(\lambda \mathbf{x}+(1-\lambda) \mathbf{y}) \geq \lambda f(\mathbf{x})+(1-\lambda) f(\mathbf{y})

We now give the formal definitions of a polyhedron and of a convex set.

Definition 1.3 (Polyhedron) A polyhedron is a set that can be described in the form \left\{\mathbf{x} \in \Re^{n} \mid \mathbf{A} \mathbf{x} \geq \mathbf{b}\right\}, where \mathbf{A} is an m \times n matrix and \mathbf{b} is a vector in \Re^{m}.

Definition 1.4 (Convex Set) A set S \subset \Re^{n} is convex if for any \mathbf{x}, \mathbf{y} \in S, and any \lambda \in[0,1], we have \lambda \mathbf{x}+(1-\lambda) \mathbf{y} \in S.

A polyhedron can either “extend to infinity” or be confined to a finite region. The definition that follows refers to this distinction.

Definition 1.5 (Boundedness) A set S \subset \Re^{n} is bounded if there exists a constant K such that the absolute value of every component of every element of S is less than or equal to K.

Convex sets also enjoy the following useful properties.

Theorem 1.1

(a) The intersection of convex sets is convex.

(b) Every polyhedron is a convex set.

(c) A convex combination of a finite number of elements of a convex set also belongs to that set.

Extreme Points and Basic Feasible Solutions

KEY: BFS = extreme point; geometric property of extreme point.

Our first definition characterizes an extreme point of a polyhedron as a point that cannot be expressed as a convex combination of two other elements of the polyhedron. This definition is entirely geometric and does not refer to a specific representation of a polyhedron in terms of linear constraints.

Definition 1.6 (Extreme Point) Let P be a polyhedron. A vector \mathbf{x} \in P is an extreme point of P if we cannot find two vectors \mathbf{y}, \mathbf{z} \in P, both different from \mathbf{x}, and a scalar \lambda \in[0,1], such that \mathbf{x}=\lambda \mathbf{y}+(1-\lambda) \mathbf{z}.

To introduce the definition of a basic feasible solution, we first introduce the notion of the active set.

Definition 1.7 (Active Set) If a vector \mathbf{x}^{*} satisfies \mathbf{a}_{i}^{\prime} \mathbf{x}^{*}=b_{i} for some i in M_{1}, M_{2}, or M_{3}, we say that the corresponding constraint is active at \mathbf{x}^{*}. The set I=\left\{i \mid \mathbf{a}_{i}^{\prime} \mathbf{x}^{*}=b_{i}\right\} of indices of constraints that are active at \mathbf{x}^{*} is called the active set.

From basic linear algebra, we have the following theorem.

Theorem 1.2 Let \mathbf{x}^{*} be an element of \Re^{n} and let I=\left\{i \mid \mathbf{a}_{i}^{\prime} \mathbf{x}^{*}=b_{i}\right\} be the active set of \mathbf{x}^{*}. Then, the following are equivalent:

(a) There exist n vectors in the set \left\{\mathbf{a}_{i} \mid i \in I\right\}, which are linearly independent.

(b) The span of the vectors \mathbf{a}_{i}, i \in I, is all of \Re^{n}, that is, every element of \Re^{n} can be expressed as a linear combination of the vectors \mathbf{a}_{i}, i \in I.

(c) The system of equations \mathbf{a}_{i}^{\prime} \mathbf{x}=b_{i}, i \in I, has a unique solution.

We are now ready to give an algebraic definition of a corner point, as a feasible solution at which there are n linearly independent active constraints. Note, however, that this procedure is not guaranteed to produce a feasible vector: a basic solution is a different notion from a basic feasible solution.

Definition 1.8 (Basic Solution vs Basic Feasible Solution) Consider a polyhedron P defined by linear equality and inequality constraints, and let \mathbf{x}^{*} be an element of \Re^{n}.

(a) The vector \mathbf{x}^{*} is a basic solution if:

(i) All equality constraints are active;

(ii) Out of the constraints that are active at \mathbf{x}^{*}, there are n of them that are linearly independent.

(b) If \mathbf{x}^{*} is a basic solution that satisfies all of the constraints, we say that it is a basic feasible solution.

These two definitions, one geometric and one algebraic, capture the same concept; the equivalence can be verified by contradiction.

Theorem 1.3 Let P be a nonempty polyhedron and let \mathbf{x}^{*} \in P. Then, the following are equivalent:

(a) \mathbf{x}^{*} is an extreme point;

(b) \mathbf{x}^{*} is a basic feasible solution.

From this, we can easily derive the following corollary.

Corollary 1.1 Given a finite number of linear inequality constraints, there can only be a finite number of basic or basic feasible solutions.

Standard Form Polyhedron

KEY: Theorem 1.4.

Recall that at any basic solution, there must be n linearly independent constraints that are active. Furthermore, every basic solution must satisfy the equality constraints \mathbf{A} \mathbf{x}=\mathbf{b}, which provide m linearly independent active constraints. Therefore, the remaining n-m active constraints must come from the n constraints x_i\ge 0.

Theorem 1.4 Consider the constraints \mathbf{A x}=\mathbf{b} and \mathbf{x} \geq \mathbf{0} and assume that the m \times n matrix \mathbf{A} has linearly independent rows. A vector \mathbf{x} \in \Re^{n} is a basic solution if and only if we have \mathbf{A} \mathbf{x}=\mathbf{b}, and there exist indices B(1), \ldots, B(m) such that:

(a) The columns \mathbf{A}_{B(1)}, \ldots, \mathbf{A}_{B(m)} are linearly independent;

(b) If i \neq B(1), \ldots, B(m), then x_{i}=0.

This suggests the following procedure for constructing basic solutions.

  1. Choose m linearly independent columns \mathbf{A}_{B(1)}, \ldots, \mathbf{A}_{B(m)}.
  2. Let x_{i}=0 for all i \neq B(1), \ldots, B(m).
  3. Solve the system of m equations \mathbf{A} \mathbf{x}=\mathbf{b} for the unknowns x_{B(1)}, \ldots, x_{B(m)}.

We can also obtain an m \times m matrix \mathbf{B}, called a basis matrix, by arranging the m basic columns next to each other. We can similarly define a vector \mathbf{x}_{B} with the values of the basic variables. Thus,

\mathbf{B}=\left[\begin{array}{cccc} \mid & \mid & & \mid \\ \mathbf{A}_{B(1)} & \mathbf{A}_{B(2)} & \cdots & \mathbf{A}_{B(m)} \\ \mid & \mid & & \mid \end{array}\right], \quad \mathbf{x}_{B}=\left[\begin{array}{c} x_{B(1)} \\ \vdots \\ x_{B(m)} \end{array}\right]

The basic variables are determined by solving the equation \mathbf{B} \mathbf{x}_{B}=\mathbf{b}, whose unique solution is given by

\mathbf{x}_{B}=\mathbf{B}^{-1} \mathbf{b}
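As a small sanity check, this construction is easy to carry out in code. The sketch below is my own illustration (the matrix \mathbf{A}, vector \mathbf{b}, and basis are hypothetical, not from the text); it uses exact rational arithmetic from the standard library to avoid round-off.

```python
from fractions import Fraction

def solve_square(B, rhs):
    # Solve the square system B x = rhs by Gauss-Jordan elimination,
    # using exact rational arithmetic.
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(rhs[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def basic_solution(A, b, basis):
    # Steps 1-3: fix the nonbasic variables at zero and
    # solve B x_B = b for the basic variables.
    m, n = len(A), len(A[0])
    B = [[A[i][j] for j in basis] for i in range(m)]
    xB = solve_square(B, b)
    x = [Fraction(0)] * n
    for k, j in enumerate(basis):
        x[j] = xB[k]
    return x

# Hypothetical example: m = 2 equality constraints, n = 3 variables.
A = [[1, 1, 1],
     [0, 1, 2]]
b = [4, 3]
x = basic_solution(A, b, basis=[0, 1])  # columns A_1, A_2 are independent
# x equals (1, 3, 0); all components are nonnegative,
# so this basic solution happens to be feasible as well.
```

Choosing `basis=[0, 2]` instead would give a different basic solution; not every choice of independent columns yields a feasible one.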

Existence of Extreme Points

KEY: has extreme point \leftrightarrow does not contain a line; a line has at most two intersections with a polyhedron.

We need the following definition

Definition 1.9 A polyhedron P \subset \Re^{n} contains a line if there exists a vector \mathbf{x} \in P and a nonzero vector \mathbf{d} \in \Re^{n} such that \mathbf{x}+\lambda \mathbf{d} \in P for all scalars \lambda.

to formulate the following theorem.

Theorem 1.5 Suppose that the polyhedron P=\left\{\mathbf{x} \in \Re^{n} \mid \mathbf{a}_{i}^{\prime} \mathbf{x} \geq b_{i}, i=1, \ldots, m\right\} is nonempty. Then, the following are equivalent:

(a) The polyhedron PP has at least one extreme point.

(b) The polyhedron PP does not contain a line.

(c) There exist n vectors out of the family \mathbf{a}_{1}, \ldots, \mathbf{a}_{m}, which are linearly independent.

Since a bounded polyhedron does not contain a line, we have the following corollary.

Corollary 1.2 Every nonempty bounded polyhedron and every nonempty polyhedron in standard form has at least one basic feasible solution.

Optimality of Extreme Points

Let P=\left\{\mathbf{x} \in \Re^{n} \mid \mathbf{A} \mathbf{x} \geq \mathbf{b}\right\} be the feasible set and let v be the optimal value of the cost \mathbf{c}^{\prime} \mathbf{x}. Then, Q=\left\{\mathbf{x} \in \Re^{n} \mid \mathbf{A} \mathbf{x} \geq \mathbf{b}, \mathbf{c}^{\prime} \mathbf{x}=v\right\} is also a polyhedron. With this, we can prove the following theorem: as long as a linear programming problem has an optimal solution and its feasible set has at least one extreme point, we can always find an optimal solution within the set of extreme points of the feasible set.

Theorem 1.6 Consider the linear programming problem of minimizing \mathbf{c}^{\prime} \mathbf{x} over a polyhedron P. Suppose that P has at least one extreme point and that there exists an optimal solution. Then, there exists an optimal solution which is an extreme point of P.

A stronger result shows that the existence of an optimal solution can be taken for granted, as long as the optimal cost is finite.

Theorem 1.7 Consider the linear programming problem of minimizing \mathbf{c}^{\prime} \mathbf{x} over a polyhedron P. Suppose that P has at least one extreme point. Then, either the optimal cost is equal to -\infty, or there exists an extreme point which is optimal.

As any linear programming problem can be transformed into an equivalent problem in standard form, we will have the following corollary.

Corollary 1.3 Consider the linear programming problem of minimizing \mathbf{c}^{\prime} \mathbf{x} over a nonempty polyhedron. Then, either the optimal cost is equal to -\infty or there exists an optimal solution.

Miscellaneous

One might wonder why \mathbf{A} is always assumed to have full row rank. The following theorem shows that the full row rank assumption on the matrix \mathbf{A} results in no loss of generality.

Theorem 1.8 Let P=\{\mathbf{x} \mid \mathbf{A} \mathbf{x}=\mathbf{b}, \mathbf{x} \geq \mathbf{0}\} be a nonempty polyhedron, where \mathbf{A} is a matrix of dimensions m \times n, with rows \mathbf{a}_{1}^{\prime}, \ldots, \mathbf{a}_{m}^{\prime}. Suppose that \operatorname{rank}(\mathbf{A})=k<m and that the rows \mathbf{a}_{i_{1}}^{\prime}, \ldots, \mathbf{a}_{i_{k}}^{\prime} are linearly independent. Consider the polyhedron

Q=\left\{\mathbf{x} \mid \mathbf{a}_{i_{1}}^{\prime} \mathbf{x}=b_{i_{1}}, \ldots, \mathbf{a}_{i_{k}}^{\prime} \mathbf{x}=b_{i_{k}}, \mathbf{x} \geq \mathbf{0}\right\}

Then Q=P.

In some cases, a basic feasible solution is degenerate.

Definition 1.10 (Degeneracy) Consider the standard form polyhedron P=\left\{\mathbf{x} \in \Re^{n} \mid \mathbf{A x}=\mathbf{b}, \mathbf{x} \geq \mathbf{0}\right\} and let \mathbf{x} be a basic solution. Let m be the number of rows of \mathbf{A}. The vector \mathbf{x} is a degenerate basic solution if more than n-m of the components of \mathbf{x} are zero.

An important comment is that degeneracy is not a purely geometric property: a degenerate basic feasible solution under one representation could be nondegenerate under another representation.

Simplex Method

Optimality conditions

KEY: feasible direction.

Suppose that we are at a point \mathbf{x} \in P and that we contemplate moving away from \mathbf{x}, in the direction of a vector \mathbf{d} \in \Re^{n}. Clearly, we should only consider those choices of \mathbf{d} that do not immediately take us outside the feasible set.

Definition 2.1 (Feasible Direction) Let \mathbf{x} be an element of a polyhedron P. A vector \mathbf{d} \in \Re^{n} is said to be a feasible direction at \mathbf{x}, if there exists a positive scalar \theta for which \mathbf{x}+\theta \mathbf{d} \in P.

Given that we are only interested in feasible solutions, we require \mathbf{A}(\mathbf{x}+\theta \mathbf{d})=\mathbf{b}, and since \mathbf{x} is feasible, we also have \mathbf{A} \mathbf{x}=\mathbf{b}. Thus, for the equality constraints to be satisfied for \theta>0, we need \mathbf{A d}=\mathbf{0}. If we want to move toward an adjacent extreme point along the j th basic direction, \mathbf{d} must satisfy d_{j}=1 and d_{i}=0 for every other nonbasic index i. Then,

\mathbf{0}=\mathbf{A d}=\sum_{i=1}^{n} \mathbf{A}_{i} d_{i}=\sum_{i=1}^{m} \mathbf{A}_{B(i)} d_{B(i)}+\mathbf{A}_{j}=\mathbf{B d}_{B}+\mathbf{A}_{j} .

Since the basis matrix \mathbf{B} is invertible, we obtain

\mathbf{d}_{B}=-\mathbf{B}^{-1} \mathbf{A}_{j} \text {. }

Definition 2.2 (Reduced Cost) Let \mathbf{x} be a basic solution, let \mathbf{B} be an associated basis matrix, and let \mathbf{c}_{B} be the vector of costs of the basic variables. For each j, we define the reduced cost \bar{c}_{j} of the variable x_{j} according to the formula

\bar{c}_{j}=c_{j}-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{A}_{j}

With reduced costs, we can state optimality conditions for linear programming.

Theorem 2.1 Consider a basic feasible solution \mathbf{x} associated with a basis matrix \mathbf{B}, and let \overline{\mathbf{c}} be the corresponding vector of reduced costs.

(a) If \overline{\mathbf{c}} \geq \mathbf{0}, then \mathbf{x} is optimal.

(b) If \mathbf{x} is optimal and nondegenerate, then \overline{\mathbf{c}} \geq \mathbf{0}.

Proof.

(a) We assume that \overline{\mathbf{c}} \geq \mathbf{0}, we let \mathbf{y} be an arbitrary feasible solution, and we define \mathbf{d}=\mathbf{y}-\mathbf{x}. Feasibility implies that \mathbf{A x}=\mathbf{A y}=\mathbf{b} and, therefore, \mathbf{A d}=\mathbf{0}. The latter equality can be rewritten in the form

\mathbf{B d}_{B}+\sum_{i \in N} \mathbf{A}_{i} d_{i}=\mathbf{0}

where N is the set of indices corresponding to the nonbasic variables under the given basis. Since \mathbf{B} is invertible, we obtain

\mathbf{d}_{B}=-\sum_{i \in N} \mathbf{B}^{-1} \mathbf{A}_{i} d_{i}

and

\mathbf{c}^{\prime} \mathbf{d}=\mathbf{c}_{B}^{\prime} \mathbf{d}_{B}+\sum_{i \in N} c_{i} d_{i}=\sum_{i \in N}\left(c_{i}-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{A}_{i}\right) d_{i}=\sum_{i \in N} \bar{c}_{i} d_{i}

For any nonbasic index i \in N, we must have x_{i}=0 and, since \mathbf{y} is feasible, y_{i} \geq 0. Thus, d_{i} \geq 0 and \bar{c}_{i} d_{i} \geq 0, for all i \in N. We conclude that \mathbf{c}^{\prime}(\mathbf{y}-\mathbf{x})=\mathbf{c}^{\prime} \mathbf{d} \geq 0, and since \mathbf{y} was an arbitrary feasible solution, \mathbf{x} is optimal.

(b) Suppose that \mathbf{x} is a nondegenerate basic feasible solution and that \bar{c}_{j}<0 for some j. Since the reduced cost of a basic variable is always zero, x_{j} must be a nonbasic variable and \bar{c}_{j} is the rate of cost change along the j th basic direction. Since \mathbf{x} is nondegenerate, the j th basic direction is a feasible direction of cost decrease, as discussed earlier. By moving in that direction, we obtain feasible solutions whose cost is less than that of \mathbf{x}, and \mathbf{x} is not optimal.

Simplex Method (Naive)

An iteration of the simplex method

  1. In a typical iteration, we start with a basis consisting of the basic columns \mathbf{A}_{B(1)}, \ldots, \mathbf{A}_{B(m)}, and an associated basic feasible solution \mathbf{x}.
  2. Compute the reduced costs \bar{c}_{j}=c_{j}-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{A}_{j} for all nonbasic indices j. If they are all nonnegative, the current basic feasible solution is optimal, and the algorithm terminates; else, choose some j for which \bar{c}_{j}<0.
  3. Compute \mathbf{u}=-\mathbf{d}_{B}=\mathbf{B}^{-1} \mathbf{A}_{j}. If no component of \mathbf{u} is positive, we have \theta^{*}=\infty, the optimal cost is -\infty, and the algorithm terminates.
  4. If some component of \mathbf{u} is positive, let
\theta^{*}=\min _{\left\{i=1, \ldots, m \mid u_{i}>0\right\}} \frac{x_{B(i)}}{u_{i}} .
  5. Let \ell be such that \theta^{*}=x_{B(\ell)} / u_{\ell}. Form a new basis by replacing \mathbf{A}_{B(\ell)} with \mathbf{A}_{j}. If \mathbf{y} is the new basic feasible solution, the values of the new basic variables are y_{j}=\theta^{*} and y_{B(i)}=x_{B(i)}-\theta^{*} u_{i}, i \neq \ell.
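The iteration above can be sketched in code. The following is a naive illustration of my own, not a production implementation: it uses exact rational arithmetic from the standard library, picks the first negative reduced cost as the entering variable, assumes the caller supplies a starting basis, and omits any anti-cycling safeguard.

```python
from fractions import Fraction

def solve_square(B, rhs):
    # Gauss-Jordan solve of the square system B x = rhs in exact arithmetic.
    n = len(B)
    M = [[Fraction(B[i][j]) for j in range(n)] + [Fraction(rhs[i])]
         for i in range(n)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def simplex(A, b, c, basis):
    # Naive simplex for min c'x s.t. Ax = b, x >= 0, started from the
    # basic feasible solution determined by `basis`.
    m, n = len(A), len(A[0])
    while True:
        B = [[A[i][j] for j in basis] for i in range(m)]
        xB = solve_square(B, b)
        # Step 2: p' = c_B' B^{-1}, obtained by solving B' p = c_B,
        # then scan for a negative reduced cost.
        Bt = [[B[i][k] for i in range(m)] for k in range(m)]
        p = solve_square(Bt, [c[j] for j in basis])
        entering = None
        for j in range(n):
            if j not in basis:
                cbar = c[j] - sum(p[i] * A[i][j] for i in range(m))
                if cbar < 0:
                    entering = j
                    break
        if entering is None:                 # all reduced costs >= 0: optimal
            x = [Fraction(0)] * n
            for k, j in enumerate(basis):
                x[j] = xB[k]
            return 'optimal', x
        # Step 3: u = B^{-1} A_j.
        u = solve_square(B, [A[i][entering] for i in range(m)])
        if all(ui <= 0 for ui in u):
            return 'unbounded', None         # theta* = inf, optimal cost -inf
        # Steps 4-5: ratio test, then basis change.
        l = min((k for k in range(m) if u[k] > 0), key=lambda k: xB[k] / u[k])
        basis[l] = entering

# A classic small instance, with slack variables x4, x5, x6 already added
# so that the slack columns form an obvious starting basis:
#   min  -10 x1 - 12 x2 - 12 x3
#   s.t.  x1 + 2 x2 + 2 x3 <= 20,  2 x1 + x2 + 2 x3 <= 20,
#         2 x1 + 2 x2 + x3 <= 20,  x >= 0
A = [[1, 2, 2, 1, 0, 0], [2, 1, 2, 0, 1, 0], [2, 2, 1, 0, 0, 1]]
b = [20, 20, 20]
c = [-10, -12, -12, 0, 0, 0]
status, x = simplex(A, b, c, basis=[3, 4, 5])
# terminates at x = (4, 4, 4, 0, 0, 0) with cost -136
```

A practical implementation would maintain \mathbf{B}^{-1} incrementally rather than re-solving from scratch each iteration, which is exactly the point of the revised simplex method discussed later.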

Define \bar{B}(i)= \begin{cases}B(i), & i \neq \ell \\ j, & i=\ell,\end{cases} which gives the basis associated with the new basic feasible solution. This is verified by the following theorem.

Theorem 2.2

(a) The columns \mathbf{A}_{B(i)}, i \neq \ell, and \mathbf{A}_{j} are linearly independent and, therefore, \overline{\mathbf{B}} is a basis matrix.

(b) The vector \mathbf{y}=\mathbf{x}+\theta^{*} \mathbf{d} is a basic feasible solution associated with the basis matrix \overline{\mathbf{B}}.

The problem with this naive implementation is that inverting the m \times m basis matrix takes O(m^3) time, resulting in O(m^3 + mn) computational effort per iteration. Nevertheless, the simplex method is guaranteed to converge.

Theorem 2.3 (Convergence of Simplex Method) Assume that the feasible set is nonempty and that every basic feasible solution is nondegenerate. Then, the simplex method terminates after a finite number of iterations. At termination, there are the following two possibilities:

(a) We have an optimal basis \mathbf{B} and an associated basic feasible solution which is optimal.

(b) We have found a vector \mathbf{d} satisfying \mathbf{A d}=\mathbf{0}, \mathbf{d} \geq \mathbf{0}, and \mathbf{c}^{\prime} \mathbf{d}<0, and the optimal cost is -\infty.

Simplex Method (Revised)

Notice that since \mathbf{B}^{-1} \mathbf{B}=\mathbf{I}, we have the property that \mathbf{B}^{-1} \mathbf{A}_{B(i)} is the i th unit vector \mathbf{e}_{i}. Using this observation, we have

\mathbf{B}^{-1} \overline{\mathbf{B}}=\left[\begin{array}{ccccccc}\mid & & \mid & \mid & \mid & & \mid \\ \mathbf{e}_{1} & \cdots & \mathbf{e}_{\ell-1} & \mathbf{u} & \mathbf{e}_{\ell+1} & \cdots & \mathbf{e}_{m} \\ \mid & & \mid & \mid & \mid & & \mid\end{array}\right]\\ =\left[\begin{array}{ccccc}1 & & u_{1} & & \\ & \ddots & \vdots & & \\ & & u_{\ell} & & \\ & & \vdots & \ddots & \\ & & u_{m} & & 1\end{array}\right] = \mathbf{Q}^{-1}

where \mathbf{u}=-\mathbf{d}_{B}=\mathbf{B}^{-1} \mathbf{A}_{j}. Choosing \mathbf{Q} so that \mathbf{Q} \mathbf{B}^{-1} \overline{\mathbf{B}}=\mathbf{I} yields \mathbf{Q} \mathbf{B}^{-1}=\overline{\mathbf{B}}^{-1}. Hence we can update \mathbf{B}^{-1} to \overline{\mathbf{B}}^{-1} by applying the row operations encoded in \mathbf{Q}.

Q=\left[\begin{array}{ccccc}1 & & -\frac{u_{1}}{u_{\ell}} & & \\ & \ddots & \vdots & & \\ & & \frac{1}{u_{\ell}} & & \\ & & \vdots & \ddots & \\ & & -\frac{u_{m}}{u_{\ell}} & & 1\end{array}\right]

This reduces the work per iteration to O(m^2+mn).

Simplex Method (Tableau)

Another implementation of the simplex method is the full tableau method. Here, instead of maintaining and updating the matrix \mathbf{B}^{-1}, we maintain and update the m \times(n+1) matrix

\mathbf{B}^{-1}[\mathbf{b} \mid \mathbf{A}]

with columns \mathbf{B}^{-1} \mathbf{b} and \mathbf{B}^{-1} \mathbf{A}_{1}, \ldots, \mathbf{B}^{-1} \mathbf{A}_{n}. This matrix is called the simplex tableau. Note that the column \mathbf{B}^{-1} \mathbf{b}, called the zeroth column, contains the values of the basic variables.

It is customary to augment the simplex tableau by including a top row, referred to as the zeroth row. The entry at the top left corner contains the value -\mathbf{c}_{B}^{\prime} \mathbf{x}_{B}, which is the negative of the current cost. The rest of the zeroth row is the row vector of reduced costs, that is, the vector \overline{\mathbf{c}}^{\prime}=\mathbf{c}^{\prime}-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{A}. Thus, the structure of the tableau is:

\begin{array}{|c|c|} \hline-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{b} & \mathbf{c}^{\prime}-\mathbf{c}_{B}^{\prime} \mathbf{B}^{-1} \mathbf{A} \\ \hline \mathbf{B}^{-1} \mathbf{b} & \mathbf{B}^{-1} \mathbf{A} \\ \hline \end{array}

Complexity

\begin{array}{|l|c|c|} \hline & \text { Full tableau } & \text { Revised simplex } \\ \hline \text { Memory } & O(m n) & O\left(m^{2}\right) \\ \hline \text { Worst-case time } & O(m n) & O(m n) \\ \hline \text { Best-case time } & O(m n) & O\left(m^{2}\right) \\ \hline \end{array}

Finding an Initial Basic Feasible Solution (Two-Phase Algorithm)

Consider the problem

\begin{array}{rll} \operatorname{minimize} & \mathbf{c}^{\prime} \mathbf{x} \\ \text { subject to } & \mathbf{A} \mathbf{x}=\mathbf{b} \\ & \mathbf{x} \geq \mathbf{0} \end{array}

By possibly multiplying some of the equality constraints by -1, we can assume, without loss of generality, that \mathbf{b} \geq \mathbf{0}. We now introduce a vector \mathbf{y} \in \Re^{m} of artificial variables and use the simplex method to solve the auxiliary problem

\begin{array}{rll} \operatorname{minimize} & y_{1}+y_{2}+\cdots+y_{m} \\ \text { subject to } & \mathbf{A} \mathbf{x} +\mathbf{y} =\mathbf{b} \\ & \mathbf{x} \geq \mathbf{0}\\ & \mathbf{y} \geq \mathbf{0} . \end{array}

Initialization is easy for the auxiliary problem: by letting \mathbf{x}=\mathbf{0} and \mathbf{y}=\mathbf{b}, we have a basic feasible solution and the corresponding basis matrix is the identity.

If \mathbf{x} is a feasible solution to the original problem, this choice of \mathbf{x} together with \mathbf{y}=\mathbf{0} yields a zero-cost solution to the auxiliary problem. Therefore, if the optimal cost in the auxiliary problem is nonzero, we conclude that the original problem is infeasible. Otherwise, an optimal solution to the auxiliary problem has \mathbf{y}=\mathbf{0}, and the corresponding \mathbf{x} is a basic feasible solution of the original problem.
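As a tiny illustration (an instance of my own, not from the text), consider minimizing x_{1}+x_{2} subject to x_{1}+2 x_{2}=3 and x_{1}, x_{2} \geq 0. The auxiliary problem is

\begin{array}{rll} \operatorname{minimize} & y_{1} \\ \text { subject to } & x_{1}+2 x_{2}+y_{1}=3 \\ & x_{1}, x_{2}, y_{1} \geq 0 \end{array}

Starting from x_{1}=x_{2}=0, y_{1}=3, a single pivot brings x_{2} into the basis and drives y_{1} to zero, reaching the auxiliary optimum x=(0, 3/2) with zero cost; the resulting basis is a valid starting point for phase two on the original problem.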

Dual Simplex Method

The Dual Problem

Let \mathbf{A} be a matrix with rows \mathbf{a}_{i}^{\prime} and columns \mathbf{A}_{j}. Given a primal problem with the structure shown on the left, its dual is defined to be the maximization problem shown on the right:

\begin{array}{|c|c||c|c|} \hline \text { PRIMAL } & \text { minimize } & \text { maximize } & \text { DUAL } \\ \hline & \geq b_{i} & \geq 0 & \\ \text { constraints } & \leq b_{i} & \leq 0 & \text { variables } \\ & =b_{i} & \text { free } & \\ & \geq 0 & \leq c_{j} & \\ \text { variables }& \leq 0 & \geq c_{j} & \text { constraints } \\ & \text { free } & =c_{j} & \\ \hline \end{array}

An interesting fact from the above table is

Theorem 3.1 If we transform the dual into an equivalent minimization problem and then form its dual, we obtain a problem equivalent to the original problem.

This theorem can be strengthened further.

Theorem 3.2 Suppose that we have transformed a linear programming problem \Pi_{1} to another linear programming problem \Pi_{2}, by a sequence of transformations of the following types:

(a) Replace a free variable with the difference of two nonnegative variables.

(b) Replace an inequality constraint by an equality constraint involving a nonnegative slack variable.

(c) If some row of the matrix \mathbf{A} in a feasible standard form problem is a linear combination of the other rows, eliminate the corresponding equality constraint.

Then, the duals of \Pi_{1} and \Pi_{2} are equivalent, i.e., they are either both infeasible, or they have the same optimal cost.

Duality Theorem

We discussed duality at length in the convex optimization course, so here I simply list several theorems for reference.

Theorem 3.3 (Weak duality) If \mathbf{x} is a feasible solution to the primal problem and \mathbf{p} is a feasible solution to the dual problem, then

\mathbf{p}^{\prime} \mathbf{b} \leq \mathbf{c}^{\prime} \mathbf{x}
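For instance, for a standard form primal (\mathbf{A x}=\mathbf{b}, \mathbf{x} \geq \mathbf{0}) whose dual constraints are \mathbf{p}^{\prime} \mathbf{A} \leq \mathbf{c}^{\prime}, the proof is a single chain:

\mathbf{p}^{\prime} \mathbf{b}=\mathbf{p}^{\prime} \mathbf{A} \mathbf{x} \leq \mathbf{c}^{\prime} \mathbf{x}

where the equality uses \mathbf{A x}=\mathbf{b} and the inequality uses \mathbf{x} \geq \mathbf{0} together with \mathbf{p}^{\prime} \mathbf{A} \leq \mathbf{c}^{\prime}.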

Theorem 3.4 (Strong duality) If a linear programming problem has an optimal solution, so does its dual, and the respective optimal costs are equal.

Theorem 3.5 (Complementary slackness) Let \mathbf{x} and \mathbf{p} be feasible solutions to the primal and the dual problem, respectively. The vectors \mathbf{x} and \mathbf{p} are optimal solutions for the two respective problems if and only if:

\begin{aligned} p_{i}\left(\mathbf{a}_{i}^{\prime} \mathbf{x}-b_{i}\right)=0, & & \forall i \\ \left(c_{j}-\mathbf{p}^{\prime} \mathbf{A}_{j}\right) x_{j}=0, & & \forall j \end{aligned}

Dual Simplex Method

An iteration of the dual simplex method

  1. A typical iteration starts with the tableau associated with a basis matrix \mathbf{B} and with all reduced costs nonnegative.
  2. Examine the components of the vector \mathbf{B}^{-1} \mathbf{b} in the zeroth column of the tableau. If they are all nonnegative, we have an optimal basic feasible solution and the algorithm terminates; else, choose some \ell such that x_{B(\ell)}<0.
  3. Consider the \ell th row of the tableau, with elements x_{B(\ell)}, v_{1}, \ldots, v_{n} (the pivot row). If v_{i} \geq 0 for all i, then the optimal dual cost is +\infty and the algorithm terminates.
  4. For each i such that v_{i}<0, compute the ratio \bar{c}_{i} /\left|v_{i}\right| and let j be the index of a column that corresponds to the smallest ratio. The column \mathbf{A}_{B(\ell)} exits the basis and the column \mathbf{A}_{j} takes its place.
  5. Add to each row of the tableau a multiple of the \ell th row (the pivot row) so that v_{j} (the pivot element) becomes 1 and all other entries of the pivot column become 0.

Interior Point Method

Intro

There are three types of Interior Point Methods.

  • Affine Scaling Algorithm
  • Potential Reduction
  • Path Following ✌
    • Primal -> Barrier Method
    • Primal-Dual -> KKT System

In this chapter, we will consider the linear programming problem in standard form.

\begin{align*} \min\, &c^Tx\\ \textrm{s.t.}\, &Ax=b\\ &x\ge0 \end{align*}

Definition 4.1 Given a polyhedron P=\{x \mid Ax=b, x\ge 0\}, the interior of P is defined as \{x\in P \mid x>0\}.

Primal Method

If we only consider the primal problem, we can incorporate the inequality constraints into the objective function using the logarithmic barrier \phi(x) = \sum_{i=1}^n \log(x_i).

Definition 4.2 (Barrier Method) To solve the LP problem in standard form, we introduce a parameter \mu > 0, and the problem can be reformulated as (P_\mu),

\begin{align*} \min &\quad c^Tx-\mu\sum^n_{i=1}\log(x_i)\\ \textrm{s.t.}&\quad Ax=b \end{align*}

For every \mu > 0, there exists a unique x(\mu) that solves (P_\mu). As \mu\downarrow0, we have \lim_{\mu\rightarrow0}x(\mu) = x^*, an optimal solution of the original problem.

Primal-Dual Method

Intro

If we consider both the primal and dual problem, we could solve the KKT condition of the LP problem. Let us start with the KKT condition of the standard form.

\left\{\begin{array}{l} A x=b \\ A^{\top} y+s=c \\ x_{i} s_{i}=0 \quad i=1,2, \cdots, n \\ (x, s) \geq 0 \end{array}\right.

Define the non-linear system,

F(x, y, s)= \begin{pmatrix} Ax-b\\ A^Ty+s-c\\ x\circ s \end{pmatrix} = 0

Intuitively, we could use Newton's method to solve this non-linear system directly. But here, we first define the feasible set \mathcal{F}=\{(x, y, s) \mid Ax=b, A^Ty+s=c, (x, s)>0\}, and a central path (x(\mu), y(\mu), s(\mu))\in \mathcal{F}, which converges to a solution of F(x, y, s)=0 as \mu\downarrow0. The central path is given by

F(x(\mu), y(\mu), s(\mu))= \begin{pmatrix}Ax-b\\ A^Ty+s-c\\ x\circ s - \mu \mathbf{1} \end{pmatrix} = 0

where \mathbf{1} denotes the all-ones vector.

Definition 4.3 (Primal-Dual Method) Given the above central path: for every \mu > 0, there exists a unique (x(\mu), y(\mu), s(\mu))\in \mathcal{F} that solves F(x(\mu), y(\mu), s(\mu))=0, and as \mu\downarrow0, \lim_{\mu\rightarrow0}(x(\mu), y(\mu), s(\mu)) = (x^*, y^*, s^*).

Lemma 9.5 (of Bertsimas) If x^*=x(\mu), y^*=y(\mu) and s^*=s(\mu) satisfy the condition F(x(\mu), y(\mu), s(\mu))=0, then x^* is also an optimal solution to \min \, c^Tx-\mu\sum^n_{i=1}\log(x_i) \quad \textrm{s.t.}\; Ax=b.

Proof: Let x be an arbitrary vector that satisfies x>0 and Ax=b. The barrier objective is

\begin{align*} B_\mu(x) &= c^Tx-\mu\sum^n_{i=1}\log(x_i)\\ &= c^Tx+y^{*T}(b-Ax)-\mu\sum^n_{i=1}\log(x_i)\\ &= (c-A^Ty^*)^Tx+y^{*T}b-\mu\sum^n_{i=1}\log(x_i)\\ &= s^{*T}x+y^{*T}b-\mu\sum^n_{i=1}\log(x_i)\\ \end{align*}

Minimizing the term s^{*T}x-\mu\sum^n_{i=1}\log(x_i) over x>0, we set the derivative with respect to each x_i to zero and obtain

s_i^*-\dfrac{\mu}{x_i}=0

Substituting x_i = \dfrac{\mu}{s_i^*}, we obtain the lower bound

B_\mu(x)\ge n\mu+y^{*T}b+\mu\sum^n_{i=1}\log\left(\dfrac{s_i^*}{\mu}\right)

Since x^* satisfies x_i^* s_i^*=\mu, i.e., x_i^*=\dfrac{\mu}{s_i^*}, this lower bound is attained at x^*, so B_\mu(x^*)\le B_\mu(x) for every feasible x.

One may also derive the KKT conditions for the log-barrier problem (P_\mu); they are equivalent to the equations of the central path.
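Concretely (a standard derivation, with X=\operatorname{diag}(x_{1},\ldots,x_{n}) and \mathbf{1} the all-ones vector), introducing a multiplier vector y for Ax=b, stationarity of the Lagrangian of (P_\mu) gives

c-\mu X^{-1}\mathbf{1}-A^{T}y=0

and setting s=\mu X^{-1}\mathbf{1}, i.e., x_{i}s_{i}=\mu for every i, recovers A^{T}y+s=c together with the relaxed complementarity condition of the central path.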

Recap: Newton’s Method

Recall that we can solve a non-linear system with the Newton-Raphson method.

f(x_k+d_k) \approx f(x_k) + f'(x_k) d_k = 0\\ x_{k+1} \leftarrow x_k+d_k = x_k - \dfrac{f(x_k)}{f'(x_k)}

And for multivariable cases,

F(x_k+d_k) \approx F(x_k) + JF(x_k)d_k=0\\ x_{k+1}\leftarrow x_k+d_k=x_k-[JF(x_k)]^{-1}F(x_k)
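As a quick illustration of the scalar update rule (an example of my own, not tied to the LP setting):

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Newton-Raphson: repeat x <- x - f(x)/f'(x) until the step is tiny.
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

# Solve f(x) = x^2 - 2 = 0 from x0 = 1; the iterates converge
# quadratically to sqrt(2), roughly doubling the correct digits per step.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```

Note the full step length of 1 and the implicit requirement that f'(x_k) never vanishes, matching the caveats listed next.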

When using the Newton-Raphson method, keep the following in mind:

  • JF(x_k) must be non-singular
  • the step length is fixed at 1
  • local convergence is fast (superlinear), but there is no global convergence guarantee

Theorem 4.1 (Convergence of Newton Method) Newton's method is locally quadratically convergent, i.e. $\|x_{k+1}-x_*\|_2\le M\|x_k-x_*\|^2_2$.

Proof We first define $g(x) = x-\dfrac{f(x)}{f'(x)}$, so that $x_{k+1}=g(x_k)$ and $g(x_*)=x_*$. It is clear that $g'(x) = \dfrac{f(x)f''(x)}{[f'(x)]^2}$ and, since $f(x_*)=0$, $g'(x_*)=0$. Then, applying the mean value theorem twice (with $\xi$ between $x_k$ and $x_*$, and $\eta$ between $\xi$ and $x_*$), we find that,

\begin{align*} \|x_{k+1}-x_*\| &= \|x_k-\dfrac{f(x_k)}{f'(x_k)} - x_*\|\\ &= \|(x_k-\dfrac{f(x_k)}{f'(x_k)}) - (x_*-\dfrac{f(x_*)}{f'(x_*)})\|\\ &= \|g(x_k)-g(x_*)\|\\ &= |g'(\xi)|\cdot\|x_k-x_*\|\\ &= |g'(\xi)-g'(x_*)|\cdot\|x_k-x_*\|\\ &\le |g''(\eta)|\cdot|\xi-x_*|\cdot\|x_k-x_*\|\\ &\le M\cdot\|x_k-x_*\|^2 \end{align*}

Hence the method is locally quadratically convergent to a solution.
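The quadratic rate is easy to observe numerically. In this toy example (Newton's method on $f(x)=x^2-2$, not from the text), each error is roughly a constant times the square of the previous one:

```python
import math

f  = lambda x: x * x - 2.0
fp = lambda x: 2.0 * x

x = 1.0
errors = []
for _ in range(4):
    x = x - f(x) / fp(x)                    # one Newton step
    errors.append(abs(x - math.sqrt(2.0)))  # distance to the root sqrt(2)
# errors shrink roughly as err_{k+1} ~ M * err_k**2
```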

We can derive one Newton step for the system $F(x, y, s)=0$ as follows.

(xk+1yk+1sk+1)(xkyksk)(ΔxkΔykΔsk)\begin{pmatrix} x_{k+1}\\ y_{k+1}\\ s_{k+1} \end{pmatrix}\leftarrow \begin{pmatrix} x_{k}\\ y_{k}\\ s_{k} \end{pmatrix} - \begin{pmatrix} \Delta x_{k}\\ \Delta y_{k}\\ \Delta s_{k} \end{pmatrix}

where

(ΔxkΔykΔsk)=[JF(xk,yk,sk)]1F(xk,yk,sk)\begin{pmatrix} \Delta x_{k}\\ \Delta y_{k}\\ \Delta s_{k} \end{pmatrix} = [JF(x^k, y^k, s^k)]^{-1} F(x^k, y^k, s^k)

is the solution to the following linear equations.

(0ATIA00S0X)(ΔxkΔykΔsk)=(ATy+scAxbsx)\begin{pmatrix} 0 & A^T & I\\ A & 0 & 0\\ S & 0 & X \end{pmatrix} \begin{pmatrix} \Delta x_{k}\\ \Delta y_{k}\\ \Delta s_{k} \end{pmatrix} =\begin{pmatrix} A^Ty+s-c\\ Ax-b\\ s\circ x \end{pmatrix}
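To make the system concrete, the following sketch assembles the Jacobian blocks for a tiny made-up instance ($A$, $b$, $c$ chosen only for illustration) and solves for the Newton direction with a dense factorization:

```python
import numpy as np

# Tiny illustrative data: one equality constraint, two variables
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
m, n = A.shape

x, s = np.ones(n), np.ones(n)   # strictly positive starting point
y = np.zeros(m)

# Jacobian of F(x, y, s) = (A^T y + s - c, A x - b, x o s)
Jac = np.block([
    [np.zeros((n, n)), A.T,              np.eye(n)       ],
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.diag(s),       np.zeros((n, m)), np.diag(x)      ],
])
rhs = np.concatenate([A.T @ y + s - c, A @ x - b, x * s])  # F(x, y, s)

delta = np.linalg.solve(Jac, rhs)                # stacked (dx, dy, ds)
dx, dy, ds = delta[:n], delta[n:n + m], delta[n + m:]
```

In practice one exploits the block structure (e.g. eliminating $\Delta s$ to form the normal equations) instead of factorizing the full matrix.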
Practical Newton Method in IPM

Note that Newton's method doesn't guarantee global convergence; we must prevent even a single $s_ix_i<0$. Otherwise, $(x_k, y_k, s_k)$ will fall out of the feasible set $\mathcal{F}^0=\{(x, y, s)\mid Ax=b, A^Ty+s=c,\ x, s>0\}$.


Therefore, we introduce a centering parameter $\mu$ to keep both optimality and centrality, and solve $s\circ x=\mu e$ instead of $s\circ x=0$.

Primal-Dual IPM-Naive:

Initial state:

\left(x^{0}, y^{0}, s^{0}\right) \in \mathcal{F}^{0} \quad \tau=\sigma \mu \quad \sigma \in(0,1) \quad \mu=\frac{1}{n} x^{\top} s

For k=0,1,k=0,1, \cdots

​ Use Newton’s Method with starting point (xk,yk,sk)\left(x^{k}, y^{k}, s^{k}\right) to solve F(x,y,s)=(00σμke)F(x, y, s)=\left(\begin{array}{c}0 \\ 0 \\ \sigma \mu_{k} e\end{array}\right) and get,

\left(x^{k+1}, y^{k+1}, s^{k+1}\right) \in \mathcal{F}^{0}

end

However, there is no need to run the inner Newton iteration to convergence. We can take a single Newton step and then update the target.

Primal-Dual IPM-Practical

(Only one iteration for inner Newton step.)

Initial state:

(x0,y0,s0),(x0,s0)>0\left(x^{0}, y^{0}, s^{0}\right),\left(x^{0}, s^{0}\right)>0

For k=0,1,2,k=0,1,2, \cdots

​ Choose σk[0,1]\sigma_{k} \in[0,1], μk=1nxTs\mu_{k}=\dfrac{1}{n} x^T s

​ Find the Newton step of

F~k(x,y,s)=(Ay+scAxbxsσkμke)=0\widetilde{F}_{k}(x, y, s)=\begin{pmatrix} A^{\top} y+s-c \\ A x-b \\ x \circ s-\sigma_k \cdot \mu_{k} \cdot e \end{pmatrix}=0

​ with the linear equations,

\begin{pmatrix} 0 & A^{\top} & I \\ A & 0 & 0 \\ S^{k} & 0 & X^{k} \end{pmatrix} \begin{pmatrix} \Delta x^{k} \\ \Delta y^{k} \\ \Delta s^{k} \end{pmatrix}=-\begin{pmatrix} A^{\top} y+s-c \\ A x-b \\ x \circ s -\sigma_k \cdot \mu_{k} \cdot e \end{pmatrix}

​ Find step-size αk\alpha_k, subject to (xk+1,sk+1)>0(x_{k+1}, s_{k+1})>0

(xk+1,yk+1,sk+1)(xk,yk,sk)+αk(Δxk,Δyk,Δsk)\left(x_{k+1}, y_{k+1}, s_{k+1}\right)\leftarrow\left(x_{k}, y_{k}, s_{k}\right)+\alpha_{k}\left(\Delta x_{k}, \Delta y_{k}, \Delta s_{k}\right)

End
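Putting the loop together, here is a compact sketch of the practical method on a small made-up LP ($\min x_1+2x_2$ s.t. $x_1+x_2=1$, $x\ge0$, optimum $x=(1,0)$); the fixed $\sigma=0.1$ and the 0.99 fraction-to-the-boundary step rule are illustrative choices, not prescribed by the text:

```python
import numpy as np

def ipm_solve(A, b, c, sigma=0.1, tol=1e-8, max_iter=100):
    """Practical primal-dual IPM sketch: one Newton step per outer iteration."""
    m, n = A.shape
    x, s, y = np.ones(n), np.ones(n), np.zeros(m)
    for _ in range(max_iter):
        r_d = A.T @ y + s - c                  # dual residual
        r_p = A @ x - b                        # primal residual
        mu = x @ s / n
        if mu < tol and np.linalg.norm(r_p) < tol and np.linalg.norm(r_d) < tol:
            break
        r_c = x * s - sigma * mu               # perturbed complementarity
        J = np.block([
            [np.zeros((n, n)), A.T,              np.eye(n)       ],
            [A,                np.zeros((m, m)), np.zeros((m, n))],
            [np.diag(s),       np.zeros((n, m)), np.diag(x)      ],
        ])
        d = np.linalg.solve(J, -np.concatenate([r_d, r_p, r_c]))
        dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
        alpha = 1.0                            # largest step keeping (x, s) > 0
        for v, dv in ((x, dx), (s, ds)):
            neg = dv < 0
            if neg.any():
                alpha = min(alpha, 0.99 * np.min(-v[neg] / dv[neg]))
        x, y, s = x + alpha * dx, y + alpha * dy, s + alpha * ds
    return x, y, s

A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])
x, y, s = ipm_solve(A, b, c)
```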

If $\sigma=1$ (so $\tau=\mu$), each iteration is a pure centering step: the iterates approach the central-path point for the current $\mu$ but make no progress toward optimality. If $\sigma=0$ (so $\tau=0$), each iteration is a pure Newton (affine-scaling) step, which may leave the neighborhood of the central path and fail to converge. We will discuss how to find $\sigma_k$ and $\alpha_k$ in the following section.

Path-Following Methods

For the sake of simplicity, we define two types of neighborhoods of the central path.

\mathcal{N}_2(\theta)=\{(x, y, s)\in\mathcal{F}^0\mid\|x\circ s-\mu e\|_2\le\theta \mu\}\\ \mathcal{N}_{-\infty}(\gamma)=\{(x, y, s)\in\mathcal{F}^0\mid x_is_i\ge\gamma\mu,\ i=1,2,\cdots,n\}\\

where $\mu=\frac{1}{n}x^Ts$, $\theta\in[0,1)$, and $\gamma\in(0,1)$.

We have different path-following routines, distinguished by the neighborhood they stay in; each guarantees convergence.

  • Long-step Path-Following \rightarrow N\mathcal{N}_{-\infty}
  • Short-step Path-Following \rightarrow N2\mathcal{N}_{2}
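These neighborhood tests, and the "largest step staying inside" search used below, can be sketched directly (grid search for clarity; a real implementation would solve for the breakpoint analytically):

```python
import numpy as np

def in_Ninf(x, s, gamma):
    """Membership test for N_-inf(gamma): x, s > 0 and x_i s_i >= gamma * mu."""
    mu = x @ s / len(x)
    return bool(np.all(x > 0) and np.all(s > 0) and np.all(x * s >= gamma * mu))

def largest_alpha(x, s, dx, ds, gamma, grid=1000):
    """Largest alpha on a grid with (x + alpha dx, s + alpha ds) in N_-inf(gamma)."""
    for a in np.linspace(1.0, 0.0, grid):
        if in_Ninf(x + a * dx, s + a * ds, gamma):
            return float(a)
    return 0.0
```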

Long-Step Path-Following IPM

Initial State:

​ Given γ(0,1)\gamma\in(0, 1), 0<σminσmax<10<\sigma_{min}\le\sigma_{max}<1, (x0,y0,s0)N(γ)(x^0, y^0, s^0) \in \mathcal{N}_{-\infty}(\gamma)

For k=0,1,k=0, 1, \cdots

​ Choose σk[σmin,σmax]\sigma_k\in[\sigma_{min}, \sigma_{max}], μk=1nxTs\mu_{k}=\dfrac{1}{n} x^T s

\begin{pmatrix} 0 & A^{\top} & I \\ A & 0 & 0 \\ S^{k} & 0 & X^{k} \end{pmatrix} \begin{pmatrix} \Delta x^{k} \\ \Delta y^{k} \\ \Delta s^{k} \end{pmatrix}=-\begin{pmatrix} A^{\top} y+s-c \\ A x-b \\ x \circ s -\sigma_k \cdot \mu_{k} \cdot e \end{pmatrix}

​ Choose αk\alpha_k as the largest value of α\alpha such that (xk(α),yk(α),sk(α))N(γ)(x^k(\alpha), y^k(\alpha), s^k(\alpha))\in \mathcal{N}_{-\infty}(\gamma), i.e.

xk(α):=xk+αΔxkyk(α):=yk+αΔyksk(α):=sk+αΔskx^k(\alpha) := x^k+\alpha\Delta x^k\\ y^k(\alpha) := y^k+\alpha\Delta y^k\\ s^k(\alpha) := s^k+\alpha\Delta s^k\\

​ We have μk(α)=1nxk(α)Tsk(α)\mu^k(\alpha)=\dfrac{1}{n}x^k(\alpha)^Ts^k(\alpha), and xik(α)sik(α)γμk(α)x_i^k(\alpha) s_i^k(\alpha)\ge\gamma\mu^k(\alpha).

​ Set (xk+1,yk+1,sk+1)=(xk,yk,sk)+αk(Δxk,Δyk,Δsk)(x^{k+1}, y^{k+1}, s^{k+1}) = (x^k, y^k, s^k) + \alpha_k (\Delta x^k, \Delta y^k, \Delta s^k)

However, there are several concerns about this method.

  • Can we find positive αk\alpha_k?
  • Can we reduce μk\mu^k?

Both answers are YES. We will go through Lemma 4.1 and Theorem 4.2 to prove this.

Lemma 4.1 (Lemma 14.2 in Numerical Optimization) If (x,y,s)N(γ)(x, y, s) \in \mathcal{N}_{-\infty}(\gamma), then, ΔXΔSe23/2(1+1γ)nμ\|\Delta X\Delta S e\|\le2^{-3/2}(1+\dfrac{1}{\gamma})n\mu.

Proof

Note that for any vectors $u, v$ with $u^Tv\ge0$, where $U=\operatorname{diag}(u)$ and $V=\operatorname{diag}(v)$ (Lemma 14.1 in Numerical Optimization),

\|UVe\|\le2^{-3/2}\|u+v\|^2_2

Define $D=X^{1/2} S^{-1/2}$. From the Newton system we can derive $D^{-1}\Delta x+D\Delta s=(XS)^{-1/2}(-XSe+\sigma \mu e)$ and $(D^{-1}\Delta x)^T(D\Delta s)=\Delta x^T\Delta s=0$, so the inequality applies with $u=D^{-1}\Delta x$, $v=D\Delta s$.

ΔXΔSe=(D1ΔX)(DΔS)e23/2D1Δx+DΔs2=23/2(XS)1/2(XSe+σμe)2=23/2(<e,XSe>2σμeTe+σ2μ2i=1n1xisi)23/2(xTs2nσμ+σ2μ2nγμ)=23/2[12σ+σ2γ]nμ23/2(1+1/γ)nμ\begin{align*} \|\Delta X \Delta S e\| &= \|(D^{-1}\Delta X)(D\Delta S)e\|\\ &\le 2^{-3/2}\|D^{-1}\Delta x+D\Delta s\|^2\\ &= 2^{-3/2}\|(XS)^{-1/2}(-XSe+\sigma \mu e)\|^2\\ &= 2^{-3/2}(\left<e, XSe\right>-2\sigma\mu e^Te+\sigma^2\mu^2\sum^n_{i=1}\dfrac{1}{x_is_i})\\ &\le 2^{-3/2}(x^Ts-2n\sigma\mu+\sigma^2\mu^2\dfrac{n}{\gamma\mu})\\ &=2^{-3/2}\left[1-2\sigma+\dfrac{\sigma^2}{\gamma}\right]n\mu\\ &\le 2^{-3/2}(1+1/\gamma)n\mu \end{align*}

We can also derive that ΔxikΔsikΔxikΔsikΔXkΔSke223/2(1+1/γ)nμ\Delta x_i^k\Delta s_i^k\ge-|\Delta x_i^k\Delta s_i^k|\ge-\|\Delta X^k\Delta S^ke\|_2\ge-2^{-3/2}(1+1/\gamma)n\mu.

Theorem 4.2 (Theorem 14.3 in Numerical Optimization) Given $\gamma$, $\sigma_{min}$, $\sigma_{max}$ in the algorithm, there exists $\delta>0$ independent of $n$, such that

μk+1(1δn)μk\mu_{k+1}\le (1-\dfrac{\delta}{n})\mu_k

Proof

We will first prove that (xk(α),yk(α),sk(α))N(γ)(x^k(\alpha), y^k(\alpha), s^k(\alpha)) \in \mathcal{N}_{-\infty}(\gamma), α[0,23/2γ1γ1+γσkn]\forall \alpha \in\left[0, 2^{3/2}\gamma\dfrac{1-\gamma}{1+\gamma}\dfrac{\sigma_k}{n}\right].

This is equivalent to proving α[0,23/2γ1γ1+γσkn]\forall \alpha \in\left[0, 2^{3/2}\gamma\dfrac{1-\gamma}{1+\gamma}\dfrac{\sigma_k}{n}\right]

s_i^k(\alpha)x_i^k(\alpha)\ge\gamma\dfrac{x^k(\alpha)^Ts^k(\alpha)}{n}

From the Newton step equations we can derive $s_i^k\Delta x_i^k+x_i^k\Delta s_i^k=-x_i^ks_i^k+\sigma_k \mu_k$, and from Lemma 4.1, $\Delta x_i^k\Delta s_i^k\ge-2^{-3/2}(1+1/\gamma)n\mu_k$. Therefore,

xik(α)sik(α)=(xik+αΔxik)(sik+αΔsik)=xiksik+α(sikΔxik+xikΔsik)+α2ΔxikΔsik=(1α)xiksik+ασkμk+α2ΔxikΔsikγ(1α)μk+ασkμkα223/2(1+1/γ)nμk\begin{align*} x_i^k(\alpha)s_i^k(\alpha)&=(x_i^k+\alpha\Delta x_i^k)(s_i^k+\alpha\Delta s_i^k)\\ &=x_i^ks_i^k+\alpha(s_i^k\Delta x_i^k+x_i^k\Delta s_i^k)+\alpha^2\Delta x_i^k\Delta s_i^k\\ &=(1-\alpha)x_i^ks_i^k+\alpha\sigma_k\mu_k+\alpha^2\Delta x_i^k\Delta s_i^k\\ &\ge \gamma(1-\alpha)\mu_k+\alpha\sigma_k\mu_k-\alpha^22^{-3/2}(1+1/\gamma)n\mu_k \end{align*}

We also have $\Delta x^T \Delta s=0$ (since $A\Delta x=0$ and $\Delta s=-A^{\top}\Delta y$), and

μk(α)=(x+αΔx)T(s+αΔs)n=μkα(μkσμk)=(1α+ασ)μk\begin{align*} \mu_k(\alpha)&=\dfrac{(x+\alpha \Delta x)^T(s+\alpha \Delta s)}{n}\\ &=\mu_k-\alpha(\mu_k-\sigma\mu_k)\\ &=(1-\alpha+\alpha\sigma)\mu_k \end{align*}

To satisfy xik(α)sik(α)γμk(α)x_i^k(\alpha)s_i^k(\alpha)\ge\gamma\mu_k(\alpha), we have to let

γ(1α)μk+ασkμkα223/2(1+1/γ)nμkγ(1α+ασ)μk\gamma(1-\alpha)\mu_k+\alpha\sigma_k\mu_k-\alpha^22^{-3/2}(1+1/\gamma)n\mu_k\ge \gamma(1-\alpha+\alpha\sigma)\mu_k

Rearranging this expression, we obtain

ασkμk(1γ)α223/2nμk(1+1/γ) \alpha\sigma_k\mu_k(1-\gamma)\ge\alpha^22^{-3/2}n\mu_k(1+1/\gamma)

which is

α23/2nσkγ1γ1+γ\alpha\le\dfrac{2^{3/2}}{n}\sigma_k\gamma\dfrac{1-\gamma}{1+\gamma}

We choose α(0,23/2nσkγ1γ1+γ]\alpha\in\left(0, \dfrac{2^{3/2}}{n}\sigma_k\gamma\dfrac{1-\gamma}{1+\gamma}\right]

μk+1=(1αk(1σk))μk(123/2nσk(1σk)γ1γ1+γ)μk\begin{align*} \mu_{k+1} &= (1-\alpha_k(1-\sigma_k))\mu_k\\ &\le (1-\dfrac{2^{3/2}}{n}\sigma_k(1-\sigma_k)\gamma\dfrac{1-\gamma}{1+\gamma})\mu_k \end{align*}

Setting $\delta=\min_{\sigma_k\in[\sigma_{min},\sigma_{max}]}\left\{2^{3/2}\sigma_k(1-\sigma_k)\gamma\dfrac{1-\gamma}{1+\gamma}\right\}$, we obtain $\mu_{k+1}\le(1-\dfrac{\delta}{n})\mu_k$.

This means we only need $O(n\log(1/\epsilon))$ iterations to reduce $\mu_k$ below $\epsilon\cdot\mu_0$.
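A back-of-the-envelope check of this count (sample constants, not from the text): with the contraction $\mu_{k+1}\le(1-\delta/n)\mu_k$, the number of iterations to reach $\epsilon\mu_0$ grows linearly in $n\log(1/\epsilon)$:

```python
import math

gamma, sigma, n, eps = 0.5, 0.5, 50, 1e-6   # sample constants
delta = 2**1.5 * sigma * (1 - sigma) * gamma * (1 - gamma) / (1 + gamma)

# smallest k with (1 - delta/n)**k <= eps
k = math.ceil(math.log(eps) / math.log(1 - delta / n))

# since log(1 - t) <= -t, k is at most n * log(1/eps) / delta (up to rounding)
bound = n * math.log(1 / eps) / delta
```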

Mehrotra Algorithm (1990s)

Edited by Daniel Shao.