avatar

Hongyi Li

I am a PhD student in the University of Macau, supervised by Prof Hongcai Zhang and Prof Yonghua Song. As a participant of UM-Imperial College London 1+3 PhD program, I am currently studying at Imperial as an MSc student. At Imperial, I work closely with Dr Papadaskalopoulos Dimitrios. My research interest includes smart grid technologies, consensus-based coordination and distributed optimization.

Linear Programming

I am learning the book “Optimization Models” recently and I see a blogger recommends Prof. Shu-Cheng Fang’s course “Linear Programming” as a supplement. The OCW course video is found in Bilibili.


Lecture 0 - Lecture 2: Introduction, Perliminaries

In linear programming (LP), we established a LP model to solve a realistic problem (in approximate way). To establish the LP model, the steps is as followed:

  1. What are the variables to be involved? (This is the first thing we should think about)
  2. What’s the objective function?
  3. How are the variables constrained? (The order of step 2 and 3 is not fixed, since they are independent)

Standard Form of LP Model

Including:

  • n variables
  • 1 objective function
  • m constraints
  • non-negative variables

Explicit Form

Minimize  z=c1x1+c2x2++cnxnsubject toa11x1+a12x2++a1nxn=b1a21x1+a22x2++a2nxn=b2am1x1+am2x2++amnxn=bmx10,  x20,  ,,  xn0\begin{aligned} & \text{Minimize} &\;\bold{z}=c_1x_1+c_2x_2+\cdots+c_nx_n\\ & \text{subject to} &\\ & & a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n=b_1\\ & & a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n=b_2\\ & & \vdots\\ & & a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n=b_m\\ & & x_1\geq0,\;x_2\geq0,\;,\cdots,\;x_n\geq0 \end{aligned}

  • Minimizing one objective function.
  • Equality constrains.
  • Non-negative variables.

Matrix Form

Min  cTxs.t.  AX=bx0\begin{aligned} && \text{Min} \; \bold{c}^T\bold{x}\\ && \text{s.t.} \; \bold{AX}=\bold{b}\\ && \bold{x}\geq0 \end{aligned}

in which, c=[c1c2cn]\bold{c}=\left[\begin{array}{c}c_1\\c_2\\\vdots\\c_n\end{array}\right] is the cost vector, x=[x1x2xn]\bold{x}=\left[\begin{array}{c}x_1\\x_2\\\vdots\\x_n\end{array}\right] is the solution vector, b=[b1b2bn]\bold{b}=\left[\begin{array}{c}b_1\\b_2\\\vdots\\b_n\end{array}\right] is the right-hand-side vector and A=[a11a12a1na21a22a12nan1an2ann]\bold{A}=\left[\begin{array}{cccc}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{12n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\\\end{array}\right]is the constrain matrix.

Embedded Assumptions in LP

  1. Proportionality Assumption

    • No discount
    • No economy of return to scale
  2. Additivity Assumption

    • Total contribution = Sum of contributions of individual variables
  3. Divisibility Assumption

    • Any fractional value is allowed
  4. Certainty Assumption

    • Each parameter is know for sure

Converting to Standard LP

Rule 1: Unrestricted (free) variables

For any xiRx_i\in\bold{R}, we can divide it into a positive component xi+x_i^+ and a negative component xix_i^-, which satisfy

xi+={xi+,  if  xi00,  otherwisex_i^+= \left\{ \begin{aligned} &x_i^+,&\;\text{if}\;x_i\geq0\\ &0,&\;\text{otherwise} \end{aligned} \right.

xi={0,  if  xi0xi,  otherwisex_i^-= \left\{ \begin{aligned} &0,&\;\text{if}\;x_i\geq0\\ &x_i^-,&\;\text{otherwise} \end{aligned} \right.

Therefore, xix_i can be written as xi=xi++xix_i=x_i^++x_i^-, in which xi+,  xi0x_i^+,\;x_i^-\geq0. If we have an absolute valve xi|x_i| in the LP model, it can be displaced by xi=xi++xi|x_i|=x_i^++x_i^-.
Notice: Such decomposition introduces an extra constraint: xi+×xi=0x_i^+\times x_i^-=0, which is important by usually ignored. This second-ordered constraint makes sure the uniqueness of the solution.

Rule 2: Inequality constraints

For

a11x1+a12x2++a1nxnb1a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n\leq b_1

we introduce a slack variable sis_i and the original inequality constraint can be transformed to

a11x1+a12x2++a1nxn+si=b1a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n+s_i=b_1

in which si0s_i\geq0.

Similarly, for

a11x1+a12x2++a1nxnb1a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n\geq b_1

we introduce an excess variable eie_i and the original inequality constraint can be transformed to

a11x1+a12x2++a1nxnei=b1a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n-e_i=b_1

in which ei0e_i\geq0.

Rule 3: Minimization of the Objective Funtion

Max  cT=Min(cTx)\boldsymbol{Max\;c^T=-Min(-c^Tx)}

Potential Problem caused by Standardizing

  1. One quadratic constraint xi+×xi=0x_i^+\times x_i^-=0 is missing in the LP model
  2. Dimensionality increased
  3. One original solution corresponds to many new solutions
  4. Since x|x| is a convex function while x-|x| is a concave function, maximize cxc|x| can only be solved when c is negative, or the solution lies on the infinite.

Lecture 3: Geometry of LP

Terminologies

Baseline model:

Min    cTxs.t.        Ax=bx0\begin{aligned} Min \;\;\bold{c^Tx} &\\ s.t. \;\;\;\;\bold{Ax=b} &\\ \bold{x}\geq0 & \end{aligned}

Feasible domain

P={xRnAx=b,x0}P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

Feasible solution

If xP\bold{x}\in P, then x is a feasible solution.

Consistency

When PϕP\neq\phi, LP is consistent.

Background knowledge

Definition of hyperplane

Each equality constraint in the standard form LP is a “hyperplane” in the solution space. In the 2-D space, it is a line. In 3-D space, it is a plane.

For a vector aRn,a0\bold{a}\in\bold{R}^n, a\neq0 and a scaler βR\beta\in\bold{R}

H={xRnaTx=β}H=\{\bold{x}\in\bold{R}^n|\bold{a}^T\bold{x}=\beta\}

is defined as a hyperplane, which divides the solution space into 3 parts, the open upper-half space Hui\boldsymbol{H}_u^i (in which aTx>β\bold{a}^T\bold{x}\gt\beta), the hyperplane and the open lower-half space Hli\boldsymbol{H}_l^i (in which aTx<β\bold{a}^T\bold{x}<\beta). The upper-half space Hu=Hui+H\boldsymbol{H}_u=\boldsymbol{H}_u^i+\boldsymbol{H}, similarly for Hl\boldsymbol{H}_l. H\boldsymbol{H} is also called the bounded hyperplane of Hui\boldsymbol{H}_u^i and Hli\boldsymbol{H}_l^i.

The vector a is the normal of the hyperplane and it points to the upper-half in the direction which increases $(in which aTx\bold{a}^T\bold{x} fastest.

Properties of hyperplanes

Property 1: The normal vector a is orthogonal to all vectors in the hyperplane H.

Property 2: The normal vector is directed toward the upper half space.

Properties of feasible solution set

Property 3: The feasible domain of a standard form LP

P={xRnAx=b,x0}P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

is a polyhedral set.

A polyhedral set or polyhedron is a set formed by the intersection of a finite number of closed half spaces. If it is nonempty or bounded, it is a polytope.

Properties of optimal solutions

Property 4:if P0P\neq0 and βR\exist\beta\in\bold{R} such that

PHL:={xRncTxβ}P\subset H_L:=\{\bold{x\in R^n|-c^Tx\leq\beta}\}

then minxPcTxβ\min\limits_{x\in P}\bold{c^Tx\geq-\beta}.

Moreover, if xPH\bold{x^*}\in P\cap H, then xP\bold{x^*}\in P^*

Graphic method

Step 1:

Draw the feasible domain PP

Step 2:

Use c-\bold{c} as normal vector at each vertex to see if PHL:={xRncTxβ}P\subset H_L:=\{\bold{x\in R^n|-c^Tx\leq\beta}\} for some βR\beta\in \bold{R}. If yes, then we found the optimal solution.

  • Advantage: Intuitional / Geometrically simple
  • Disadvantage: Impractical for large LP problems / Algebraically difficult

Fundamental theorem of LP

Background knowledge

Definition: Let x1,x2,,xpRn,  λ1,λ2,,λpRn\bold{x^1,x^2,\dotsb,x^p\in R^n,\; \lambda_1,\lambda_2,\dotsb,\lambda_p\in R^n}, for

x=i=1pλixi\bold{x}=\sum_{i=1}^{p}\bold{\lambda_ix^i}

we say x is a linear combination of {x1,,xp}\{\bold{x^1,\dotsb,x^p}\}.

  • If i=1pλi=1\sum_{i=1}^{p}\lambda_i=1, w say x is an affine combination of {x1,,xp}\{\bold{x^1,\dotsb,x^p}\}.
    If the affine combination of any two points of S falls in S, then S is an affine set.
  • If λi0\lambda_i\geq0, we say x is a conic combination of {x1,,xp}\{\bold{x^1,\dotsb,x^p}\}.
    If λxS\lambda\bold{x}\in S for all xS\bold{x}\in S and λ0\lambda\geq0, then S is a cone.
  • If i=1pλi=1,  λi0\sum_{i=1}^{p}\lambda_i=1,\;\lambda_i\geq0, w say x is an convex combination of {x1,,xp}\{\bold{x^1,\dotsb,x^p}\}.
    If the convex combination of ant two points of S falls in S, then S is a convex set.

The geometric meaning of the feasible domain

P={xRnAx=b,x0}P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

  1. P is a polyhedral set.
  2. P is a convex set
  3. P is the intersection of m hyperplanes and the cone of the first orthant.

Seeing Ax=b\bold{Ax=b} in the column picture (see MIT 1806), if the vector b falls in the cone generated by the columns of constraint matrix A.

Interior and boundary points

Definition: Given a set SRnS\in\bold{R^n}, a point xS\bold{x\in S} is an interior point of S if

  ϵ>0  such that the ball  B={yRn  xyϵ}S\exist\;\epsilon>0\;\text{such that the ball}\;\bold{B=\{y\in R^n|\;||x-y||\leq\epsilon\}\subset S}

called xint(S)\bold{x\in}int(S)

Otherwise, x is a boundary point of S, we say xbdry(S)\bold{x}\in bdry(S).

How to define a convex set S

  • For all interior point, the segment between any two interior points falls in the convex set S.
  • For \forall boundary point x, \exist a hyperplane H that xS\bold{x\in S} and S falls in HL  or  HU\bold{H_L}\;\text{or}\;\bold{H_U}.
  • For \forall outside point x, \exist a hyperplane H that x lies in HL\bold{H_L} and S falls in the HU\bold{H_U}, or vise versa.

Difference among boundary points

x is defined as an extreme point of a convex set S if x cannot be expressed as a convex combination of other points in S.

The extreme points are not exactly the same as the vertices according to their definition. But for the feasible domain P of an LP, it vertices are the extreme points.

Finding extreme points

Theorem: A point xP={xRnAx=b,x0}\bold{x\in}P=\{\bold{x\in R^n|Ax=b,x\geq0}\} is an extreme point of P if and only if the columns of A corresponding to the positive components of x are linearly independent.

Proof:
Without loss of generality, we may assume that the first p components of x are positive and the rest are zero, i.e.,

x=(xˉ0)  where  xˉ=(x1xp)  >0\bold{x=\left(\begin{array}{c} \bar{x}\\0 \end{array}\right)} \;\text{where}\; \bold{\bar{x}}=\left(\begin{array}{c} x_1\\\vdots\\x_p \end{array}\right)\;>0

also denote the first p column of A by Aˉ\bold{\bar{A}} then Ax=Aˉxˉ=b\bold{Ax=\bar{A}\bar{x}=b}.

Suppose that the columns of Aˉ\bold{\bar{A}} are not linearly independent, then   wˉ0\bold{\exist\;\bar{w}\neq0} such that Aˉwˉ=0\bold{\bar{A}\bar{w}=0}.

Notice that for ϵ\epsilon is small enough

xˉ±ϵwˉ  and  Aˉ(xˉ±ϵwˉ)=Aˉxˉ=b\bold{\bar{x}\plusmn\epsilon\bar{w}\;\text{and}\;\bar{A}(\bar{x}\plusmn\epsilon\bar{w})=\bar{A}\bar{x}=b}

Hence,

y1=(xˉ+ϵwˉ0)P\bold{y_1=\left(\begin{array}{c} \bar{x}+\epsilon\bar{w}\\0 \end{array}\right)\in \textit{P}}

y2=(xˉϵwˉ0)P\bold{y_2=\left(\begin{array}{c} \bar{x}-\epsilon\bar{w}\\0 \end{array}\right)\in \textit{P}}

and x=12y1+12y2\bold{x}=\frac{1}{2}y_1+\frac{1}{2}y_2, i.e. x cannot be a vertex (extreme point) of P.

Thus, x is an extreme point when the columns of Aˉ\bold{\bar{A}} are linearly indeoendent.

Suppose that x is not an extreme point, then x=λy1+λy2\bold{x=\lambda y_1+\lambda y_2} for some y1,  y2P,  y1y2  and  0<λ<1\bold{y_1,\;y_2\in P,\; y_1\neq y_2\;\text{and}\;}0<\lambda<1.

Since y10,  y20\bold{y_1\geq0,\;y_2\geq0} and 0<λ<10<\lambda<1, the last npn-p components of y1\bold{y_1} must be zero, i.e.

y1=(yˉ0)\bold{y_1=\left(\begin{array}{c} \bar{\bold{y}}\\0 \end{array}\right)}

Now

xy1=(xˉyˉ10)0\bold{x-y_1=\left(\begin{array}{c} \bar{\bold{x}}-\bar{\bold{y}}_1\\0 \end{array}\right)\neq0}

and A(xyˉ1)=AxAyˉ1=bb=0\bold{A(x-\bold{\bar{y}}_1)=Ax-A\bar{\bold{y}}_1=b-b=0}, which indicates that the columns of A are linearly dependent.

Thus, the theorem is proofed.

Managing extreme points algebraically

Full rank matrix: Let A be an m by n matrix with mnm\leq n, we say A has full rank (since mnm\leq n, full row rank) if A has m linearly independent columns.

Rearrange x as

x=(xBxN)basic variablsnon-basic variables        A=(B        N)        basis        non-basis\bold{ x=\left(\begin{array}{c} \bold{x_B}\\\bold{x_N} \end{array}\right) \begin{array}{c} - & \text{basic variabls}\\ - & \text{non-basic variables} \end{array} }\;\;\;\; \begin{aligned} \bold{A}= (&\bold{B}&|\text{\;\;\;\;}&\bold{N}&)\\ &\uparrow&\text{\;\;\;\;}& \uparrow&\\ &\text{basis}&\text{\;\;\;\;}&\text{non-basis}& \end{aligned}

  • If we set xN=0\bold{x_N}=0 and solve xB\bold{x_B} for Ax=BxB=b\bold{Ax=Bx_B=b} then x is a basic solution\underline{basic\ solution} (bs).
  • Furthermore, if xB0\bold{x_B}\geq0, then x is a basic feasible solution\underline{basic\ feasible\ solution}.

We can see, when A does not have full rank, then either

  • Ax = b has no solution\underline{no\ solution} and hence P=0P=0, or
  • some constraints are redundant (which can be avoided by preprocessing matrix A).

Thus,

  1. A point x in P is an extreme point of P if and only if x is a basic feasible solution corresponding to some basis B.
  2. The polyhedron P has only a finite number of extreme points (less than CmnC_m^n).

What do extreme points bring us?

When P={xRnAx=b,x0}P=\{\bold{x\in R^n|Ax=b,x\geq0}\} is a nonempty polytope, then any point in P can be represented as a convex combination of the extreme points of P.

If P is unbounded, there exists an extremal direction.

  • Definition: A vector d(0)Rn\bold{d(\neq0)\in R^n} is an extremal direction of P, if {xRnx=x0+λd, λ0}P\{\bold{x\in R^n|x=x_0+\lambda d,\ \lambda\geq0\}\subset }P for all x0P\bold{x_0}\in P.
  • d satisfies Ad=0\bold{Ad=0} and d0\bold{d\geq 0}.

Resolution theorem

Let V={viRniI}\bold{V = \{v_i\in R^n|}i\in I\} be a set of extreme points of P, then for xP\forall x \in P

x=iIλivi+d\bold{x}=\sum\limits_{i\in I} \lambda_iv_i+\bold{d}

where

iIλi=1, λi0, iI.\sum\limits_{i\in I}\lambda_i=1,\ \lambda_i\geq0,\ \forall i\in I.

and either d=0\bold{d=0} or d is an extremal direction of P

Fundamental theorem of LP

For a standard form LP, if its feasible domain P is nonempty, then the optimal objective value of z=cTx over P\bold{z=c^Tx}\ over\ P is either unbounded below, or it is attained at (at least) an extreme point of P.

Lecture 4: Simplex Method

Basic Idea:

  • Phase 1:

    • Step 1 (Starting): Find an initial extreme point (ep) or declare P in null.
  • Phase 2:

    • Step 2 (Checking optimality): if the current eo is optimal, STOP.
    • Step 3 (Pivoting): Move to a better ep, then return to Step 2.

If we do not repeat using the same extreme point, then the algorithm will always terminates in finite number of iterations.

But how to efficiently generate better extreme points?

In the algebra term, we just need to use basic feasible solution to replace extreme point above.

If there exists at least one basic variable becoming zero, the extreme point may correspond to more than one basic feasible solution, in this case we say the bfs is a degenerate one.

Nondegeneracy

Property 1:

If a bfs x is nondegenerate, then x is uniquely determined by n hyperplanes.

Let M=[BN0I]\bold{M=}\left[\begin{array}{cc} \bold{B} & \bold{N}\\ \bold0 & \bold{I} \end{array}\right], M is a nonsingular matrix.

x is uniquely determined by n linearly independent hyperplanes since

Mx=[BN0I][xBxN]=[b0]\bold{Mx}=\left[\begin{array}{cc} \bold{B} & \bold{N}\\ \bold{0} & \bold{I} \end{array}\right]\left[\begin{array}{c} \bold{x_B}\\ \bold{x_N} \end{array}\right]=\left[\begin{array}{c} \bold{b}\\ \bold{0} \end{array}\right]

x=[xBxN]=M1[b0]\bold{x}=\left[\begin{array}{c} \bold{x_B}\\\bold{x_N} \end{array}\right]=\bold{M^{-1}}\left[\begin{array}{c} \bold{b}\\ \bold{0} \end{array}\right]

in which,

M1=[B1B1N0I]\bold{M^{-1}}=\left[\begin{array}{cc} \bold{B^{-1}} & \bold{-B^{-1}N}\\ \bold{0} & \bold{I} \end{array}\right]

We call M1\bold{M^{-1}} (or M) the fundamental matrix of LP.

Property 2:

If a bfs x is degenerate, then x is overdetermined by more than n hyperplanes.

Other than n hyperplanes determined by non-basic variables, there exists at least one basic variable that xi=0x_i=0, which indicates an extra hyperplane.

Simplex method under nondegeneracy

Basic idea:

Instead of considering all bfs at the same time, wo just consider some neighboring bfs, moving from one bfs to another with a simple pivoting scheme.

Definition

Two basic feasible solution are adjacent if they have m-1\textit{m-1} basic variables in common.

Each bfs has m-n\textit{m-n} adjacent neighbors, which can be reached by increasing one non basic variable from zero to positive and decrease one basic variable to 0, called pivoting.