Welcome to Hongyi Li's website

I am learning the book “Optimization Models” recently and I see a blogger recommends Prof. Shu-Cheng Fang’s course “Linear Programming” as a supplement. The OCW course video is found in Bilibili.

Lecture 0 - Lecture 2: Introduction, Perliminaries
Lecture 3: Geometry of LP
Lecture 4: Simplex Method

Lecture 0 - Lecture 2: Introduction, Perliminaries

In linear programming (LP), we established a LP model to solve a realistic problem (in approximate way). To establish the LP model, the steps is as followed:

What are the variables to be involved? (This is the first thing we should think about)
What’s the objective function?
How are the variables constrained? (The order of step 2 and 3 is not fixed, since they are independent)

Standard Form of LP Model

Including:

n variables
1 objective function
m constraints
non-negative variables

Explicit Form

\begin{aligned} & \text{Minimize} &\;\bold{z}=c_1x_1+c_2x_2+\cdots+c_nx_n\\ & \text{subject to} &\\ & & a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n=b_1\\ & & a_{21}x_1+a_{22}x_2+\cdots+a_{2n}x_n=b_2\\ & & \vdots\\ & & a_{m1}x_1+a_{m2}x_2+\cdots+a_{mn}x_n=b_m\\ & & x_1\geq0,\;x_2\geq0,\;,\cdots,\;x_n\geq0 \end{aligned}

Minimizing one objective function.
Equality constrains.
Non-negative variables.

Matrix Form

\begin{aligned} && \text{Min} \; \bold{c}^T\bold{x}\\ && \text{s.t.} \; \bold{AX}=\bold{b}\\ && \bold{x}\geq0 \end{aligned}

in which, $\bold{c}=\left[\begin{array}{c}c_1\\c_2\\\vdots\\c_n\end{array}\right]$ is the cost vector, $\bold{x}=\left[\begin{array}{c}x_1\\x_2\\\vdots\\x_n\end{array}\right]$ is the solution vector, $\bold{b}=\left[\begin{array}{c}b_1\\b_2\\\vdots\\b_n\end{array}\right]$ is the right-hand-side vector and $\bold{A}=\left[\begin{array}{cccc}a_{11}&a_{12}&\cdots&a_{1n}\\a_{21}&a_{22}&\cdots&a_{12n}\\\vdots&\vdots&\ddots&\vdots\\a_{n1}&a_{n2}&\cdots&a_{nn}\\\end{array}\right]$ is the constrain matrix.

Embedded Assumptions in LP

Proportionality Assumption
- No discount
- No economy of return to scale
Additivity Assumption
- Total contribution = Sum of contributions of individual variables
Divisibility Assumption
- Any fractional value is allowed
Certainty Assumption
- Each parameter is know for sure

Converting to Standard LP

Rule 1: Unrestricted (free) variables

For any $x_i\in\bold{R}$ , we can divide it into a positive component $x_i^+$ and a negative component $x_i^-$ , which satisfy

x_i^+= \left\{ \begin{aligned} &x_i^+,&\;\text{if}\;x_i\geq0\\ &0,&\;\text{otherwise} \end{aligned} \right.

x_i^-= \left\{ \begin{aligned} &0,&\;\text{if}\;x_i\geq0\\ &x_i^-,&\;\text{otherwise} \end{aligned} \right.

Therefore, $x_i$ can be written as $x_i=x_i^++x_i^-$ , in which $x_i^+,\;x_i^-\geq0$ . If we have an absolute valve $|x_i|$ in the LP model, it can be displaced by $|x_i|=x_i^++x_i^-$ .
Notice: Such decomposition introduces an extra constraint: $x_i^+\times x_i^-=0$ , which is important by usually ignored. This second-ordered constraint makes sure the uniqueness of the solution.

Rule 2: Inequality constraints

For

a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n\leq b_1

we introduce a slack variable $s_i$ and the original inequality constraint can be transformed to

a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n+s_i=b_1

in which $s_i\geq0$ .

Similarly, for

a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n\geq b_1

we introduce an excess variable $e_i$ and the original inequality constraint can be transformed to

a_{11}x_1+a_{12}x_2+\cdots+a_{1n}x_n-e_i=b_1

in which $e_i\geq0$ .

Rule 3: Minimization of the Objective Funtion

\boldsymbol{Max\;c^T=-Min(-c^Tx)}

Potential Problem caused by Standardizing

One quadratic constraint $x_i^+\times x_i^-=0$ is missing in the LP model
Dimensionality increased
One original solution corresponds to many new solutions
Since $|x|$ is a convex function while $-|x|$ is a concave function, maximize $c|x|$ can only be solved when c is negative, or the solution lies on the infinite.

Lecture 3: Geometry of LP

Terminologies

Baseline model:

\begin{aligned} Min \;\;\bold{c^Tx} &\\ s.t. \;\;\;\;\bold{Ax=b} &\\ \bold{x}\geq0 & \end{aligned}

Feasible domain

P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

Feasible solution

If $\bold{x}\in P$ , then x is a feasible solution.

Consistency

When $P\neq\phi$ , LP is consistent.

Background knowledge

Definition of hyperplane

Each equality constraint in the standard form LP is a “hyperplane” in the solution space. In the 2-D space, it is a line. In 3-D space, it is a plane.

For a vector $\bold{a}\in\bold{R}^n, a\neq0$ and a scaler $\beta\in\bold{R}$

H=\{\bold{x}\in\bold{R}^n|\bold{a}^T\bold{x}=\beta\}

is defined as a hyperplane, which divides the solution space into 3 parts, the open upper-half space $\boldsymbol{H}_u^i$ (in which $\bold{a}^T\bold{x}\gt\beta$ ), the hyperplane and the open lower-half space $\boldsymbol{H}_l^i$ (in which $\bold{a}^T\bold{x}<\beta$ ). The upper-half space $\boldsymbol{H}_u=\boldsymbol{H}_u^i+\boldsymbol{H}$ , similarly for $\boldsymbol{H}_l$ . $\boldsymbol{H}$ is also called the bounded hyperplane of $\boldsymbol{H}_u^i$ and $\boldsymbol{H}_l^i$ .

The vector a is the normal of the hyperplane and it points to the upper-half in the direction which increases $(in which $\bold{a}^T\bold{x}$ fastest.

Properties of hyperplanes

Property 1: The normal vector a is orthogonal to all vectors in the hyperplane H.

Property 2: The normal vector is directed toward the upper half space.

Properties of feasible solution set

Property 3: The feasible domain of a standard form LP

P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

is a polyhedral set.

A polyhedral set or polyhedron is a set formed by the intersection of a finite number of closed half spaces. If it is nonempty or bounded, it is a polytope.

Properties of optimal solutions

Property 4:if $P\neq0$ and $\exist\beta\in\bold{R}$ such that

P\subset H_L:=\{\bold{x\in R^n|-c^Tx\leq\beta}\}

then $\min\limits_{x\in P}\bold{c^Tx\geq-\beta}$ .

Moreover, if $\bold{x^*}\in P\cap H$ , then $\bold{x^*}\in P^*$

Graphic method

Step 1:

Draw the feasible domain $P$

Step 2:

Use $-\bold{c}$ as normal vector at each vertex to see if $P\subset H_L:=\{\bold{x\in R^n|-c^Tx\leq\beta}\}$ for some $\beta\in \bold{R}$ . If yes, then we found the optimal solution.

Advantage: Intuitional / Geometrically simple
Disadvantage: Impractical for large LP problems / Algebraically difficult

Fundamental theorem of LP

Background knowledge

Definition: Let $\bold{x^1,x^2,\dotsb,x^p\in R^n,\; \lambda_1,\lambda_2,\dotsb,\lambda_p\in R^n}$ , for

\bold{x}=\sum_{i=1}^{p}\bold{\lambda_ix^i}

we say x is a linear combination of $\{\bold{x^1,\dotsb,x^p}\}$ .

If $\sum_{i=1}^{p}\lambda_i=1$ , w say x is an affine combination of $\{\bold{x^1,\dotsb,x^p}\}$ .
If the affine combination of any two points of S falls in S, then S is an affine set.
If $\lambda_i\geq0$ , we say x is a conic combination of $\{\bold{x^1,\dotsb,x^p}\}$ .
If $\lambda\bold{x}\in S$ for all $\bold{x}\in S$ and $\lambda\geq0$ , then S is a cone.
If $\sum_{i=1}^{p}\lambda_i=1,\;\lambda_i\geq0$ , w say x is an convex combination of $\{\bold{x^1,\dotsb,x^p}\}$ .
If the convex combination of ant two points of S falls in S, then S is a convex set.

The geometric meaning of the feasible domain

P=\{\bold{x\in R^n|Ax=b,x\geq0}\}

P is a polyhedral set.
P is a convex set
P is the intersection of m hyperplanes and the cone of the first orthant.

Seeing $\bold{Ax=b}$ in the column picture (see MIT 1806), if the vector b falls in the cone generated by the columns of constraint matrix A.

Interior and boundary points

Definition: Given a set $S\in\bold{R^n}$ , a point $\bold{x\in S}$ is an interior point of S if

\exist\;\epsilon>0\;\text{such that the ball}\;\bold{B=\{y\in R^n|\;||x-y||\leq\epsilon\}\subset S}

called $\bold{x\in}int(S)$

Otherwise, x is a boundary point of S, we say $\bold{x}\in bdry(S)$ .

How to define a convex set S

For all interior point, the segment between any two interior points falls in the convex set S.
For $\forall$ boundary point x, $\exist$ a hyperplane H that $\bold{x\in S}$ and S falls in $\bold{H_L}\;\text{or}\;\bold{H_U}$ .
For $\forall$ outside point x, $\exist$ a hyperplane H that x lies in $\bold{H_L}$ and S falls in the $\bold{H_U}$ , or vise versa.

Difference among boundary points

x is defined as an extreme point of a convex set S if x cannot be expressed as a convex combination of other points in S.

The extreme points are not exactly the same as the vertices according to their definition. But for the feasible domain P of an LP, it vertices are the extreme points.

Finding extreme points

Theorem: A point $\bold{x\in}P=\{\bold{x\in R^n|Ax=b,x\geq0}\}$ is an extreme point of P if and only if the columns of A corresponding to the positive components of x are linearly independent.

Proof:
Without loss of generality, we may assume that the first p components of x are positive and the rest are zero, i.e.,

\bold{x=\left(\begin{array}{c} \bar{x}\\0 \end{array}\right)} \;\text{where}\; \bold{\bar{x}}=\left(\begin{array}{c} x_1\\\vdots\\x_p \end{array}\right)\;>0

also denote the first p column of A by $\bold{\bar{A}}$ then $\bold{Ax=\bar{A}\bar{x}=b}$ .

Suppose that the columns of $\bold{\bar{A}}$ are not linearly independent, then $\bold{\exist\;\bar{w}\neq0}$ such that $\bold{\bar{A}\bar{w}=0}$ .

Notice that for $\epsilon$ is small enough

\bold{\bar{x}\plusmn\epsilon\bar{w}\;\text{and}\;\bar{A}(\bar{x}\plusmn\epsilon\bar{w})=\bar{A}\bar{x}=b}

Hence,

\bold{y_1=\left(\begin{array}{c} \bar{x}+\epsilon\bar{w}\\0 \end{array}\right)\in \textit{P}}

\bold{y_2=\left(\begin{array}{c} \bar{x}-\epsilon\bar{w}\\0 \end{array}\right)\in \textit{P}}

and $\bold{x}=\frac{1}{2}y_1+\frac{1}{2}y_2$ , i.e. x cannot be a vertex (extreme point) of P.

Thus, x is an extreme point when the columns of $\bold{\bar{A}}$ are linearly indeoendent.

Suppose that x is not an extreme point, then $\bold{x=\lambda y_1+\lambda y_2}$ for some $\bold{y_1,\;y_2\in P,\; y_1\neq y_2\;\text{and}\;}0<\lambda<1$ .

Since $\bold{y_1\geq0,\;y_2\geq0}$ and $0<\lambda<1$ , the last $n-p$ components of $\bold{y_1}$ must be zero, i.e.

\bold{y_1=\left(\begin{array}{c} \bar{\bold{y}}\\0 \end{array}\right)}

Now

\bold{x-y_1=\left(\begin{array}{c} \bar{\bold{x}}-\bar{\bold{y}}_1\\0 \end{array}\right)\neq0}

and $\bold{A(x-\bold{\bar{y}}_1)=Ax-A\bar{\bold{y}}_1=b-b=0}$ , which indicates that the columns of A are linearly dependent.

Thus, the theorem is proofed.

Managing extreme points algebraically

Full rank matrix: Let A be an m by n matrix with $m\leq n$ , we say A has full rank (since $m\leq n$ , full row rank) if A has m linearly independent columns.

Rearrange x as

\bold{ x=\left(\begin{array}{c} \bold{x_B}\\\bold{x_N} \end{array}\right) \begin{array}{c} - & \text{basic variabls}\\ - & \text{non-basic variables} \end{array} }\;\;\;\; \begin{aligned} \bold{A}= (&\bold{B}&|\text{\;\;\;\;}&\bold{N}&)\\ &\uparrow&\text{\;\;\;\;}& \uparrow&\\ &\text{basis}&\text{\;\;\;\;}&\text{non-basis}& \end{aligned}

If we set $\bold{x_N}=0$ and solve $\bold{x_B}$ for $\bold{Ax=Bx_B=b}$ then x is a $\underline{basic\ solution}$ (bs).
Furthermore, if $\bold{x_B}\geq0$ , then x is a $\underline{basic\ feasible\ solution}$ .

We can see, when A does not have full rank, then either

Ax = b has $\underline{no\ solution}$ and hence $P=0$ , or
some constraints are redundant (which can be avoided by preprocessing matrix A).

Thus,

A point x in P is an extreme point of P if and only if x is a basic feasible solution corresponding to some basis B.
The polyhedron P has only a finite number of extreme points (less than $C_m^n$ ).

What do extreme points bring us?

When $P=\{\bold{x\in R^n|Ax=b,x\geq0}\}$ is a nonempty polytope, then any point in P can be represented as a convex combination of the extreme points of P.

If P is unbounded, there exists an extremal direction.

Definition: A vector $\bold{d(\neq0)\in R^n}$ is an extremal direction of P, if $\{\bold{x\in R^n|x=x_0+\lambda d,\ \lambda\geq0\}\subset }P$ for all $\bold{x_0}\in P$ .
d satisfies $\bold{Ad=0}$ and $\bold{d\geq 0}$ .

Resolution theorem

Let $\bold{V = \{v_i\in R^n|}i\in I\}$ be a set of extreme points of P, then for $\forall x \in P$

\bold{x}=\sum\limits_{i\in I} \lambda_iv_i+\bold{d}

where

\sum\limits_{i\in I}\lambda_i=1,\ \lambda_i\geq0,\ \forall i\in I.

and either $\bold{d=0}$ or d is an extremal direction of P

Fundamental theorem of LP

For a standard form LP, if its feasible domain P is nonempty, then the optimal objective value of $\bold{z=c^Tx}\ over\ P$ is either unbounded below, or it is attained at (at least) an extreme point of P.

Lecture 4: Simplex Method

Basic Idea:

Phase 1:
- Step 1 (Starting): Find an initial extreme point (ep) or declare P in null.
Phase 2:
- Step 2 (Checking optimality): if the current eo is optimal, STOP.
- Step 3 (Pivoting): Move to a better ep, then return to Step 2.

If we do not repeat using the same extreme point, then the algorithm will always terminates in finite number of iterations.

But how to efficiently generate better extreme points?

In the algebra term, we just need to use basic feasible solution to replace extreme point above.

If there exists at least one basic variable becoming zero, the extreme point may correspond to more than one basic feasible solution, in this case we say the bfs is a degenerate one.

Nondegeneracy

Property 1:

If a bfs x is nondegenerate, then x is uniquely determined by n hyperplanes.

Let $\bold{M=}\left[\begin{array}{cc} \bold{B} & \bold{N}\\ \bold0 & \bold{I} \end{array}\right]$ , M is a nonsingular matrix.

x is uniquely determined by n linearly independent hyperplanes since

\bold{Mx}=\left[\begin{array}{cc} \bold{B} & \bold{N}\\ \bold{0} & \bold{I} \end{array}\right]\left[\begin{array}{c} \bold{x_B}\\ \bold{x_N} \end{array}\right]=\left[\begin{array}{c} \bold{b}\\ \bold{0} \end{array}\right]

\bold{x}=\left[\begin{array}{c} \bold{x_B}\\\bold{x_N} \end{array}\right]=\bold{M^{-1}}\left[\begin{array}{c} \bold{b}\\ \bold{0} \end{array}\right]

in which,

\bold{M^{-1}}=\left[\begin{array}{cc} \bold{B^{-1}} & \bold{-B^{-1}N}\\ \bold{0} & \bold{I} \end{array}\right]

We call $\bold{M^{-1}}$ (or M) the fundamental matrix of LP.

Property 2:

If a bfs x is degenerate, then x is overdetermined by more than n hyperplanes.

Other than n hyperplanes determined by non-basic variables, there exists at least one basic variable that $x_i=0$ , which indicates an extra hyperplane.

Simplex method under nondegeneracy

Basic idea:

Instead of considering all bfs at the same time, wo just consider some neighboring bfs, moving from one bfs to another with a simple pivoting scheme.

Definition

Two basic feasible solution are adjacent if they have $\textit{m-1}$ basic variables in common.

Each bfs has $\textit{m-n}$ adjacent neighbors, which can be reached by increasing one non basic variable from zero to positive and decrease one basic variable to 0, called pivoting.