Constraint Optimization - Exam Notes

This chapter presents constrained optimization in a clean, exam-ready structure: theory first, formulas second, then solved problems, intuition, and high-priority exam points.


SECTION 1: THEORY CONCEPTS - DEFINITIONS & RELATIONSHIPS

1.1 Constrained Optimization

A constrained optimization problem minimizes or maximizes an objective while satisfying constraints:

\[ \min_x f_0(x) \quad \text{subject to} \quad f_i(x) \le 0\;(i=1,\dots,m), \quad h_j(x)=0\;(j=1,\dots,p) \]

1.2 Linear Programming vs Quadratic Programming

  • Linear programming (LP): objective and constraints are linear.
  • Quadratic programming (QP): objective and/or constraints contain quadratic terms.

1.3 Feasible Region and Active Constraints

  • Feasible region: all points satisfying constraints.
  • Active (tight) constraint: holds with equality at optimum.
  • In many problems, the optimum lies on the boundary of the feasible region, where one or more constraints are active.

1.3A Geometric Visualization

[Figure: Constrained Optimization Geometry]

In this geometry:
  • the blue curve is the objective landscape,
  • the red vertical wall is the constraint boundary,
  • the unconstrained minimum may be infeasible,
  • the constrained optimum is the best feasible point.

1.4 Lagrangian Interpretation

The Lagrangian combines objective and constraints into one function:

\[ \mathcal{L}(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^{m}\lambda_i f_i(x) + \sum_{j=1}^{p}\nu_j h_j(x), \quad \lambda_i \ge 0 \]

Meaning:
  • feasible choices incur no violation penalty,
  • infeasible choices receive a larger penalty through the multipliers.
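This penalty reading can be checked numerically. A minimal sketch, using an assumed toy problem \(\min x^2\) s.t. \(x \ge 2\) (standard form \(2-x \le 0\)):

```python
def lagrangian(x, lam):
    """L(x, lam) = f0(x) + lam * f1(x) with f0(x) = x**2, f1(x) = 2 - x."""
    return x**2 + lam * (2 - x)

# Feasible point (x = 3): the constraint term 2 - x = -1 is negative,
# so the Lagrangian sits BELOW the raw objective f0(3) = 9.
feasible = lagrangian(3.0, lam=4.0)    # 9 + 4*(-1) = 5

# Infeasible point (x = 0): the constraint term 2 - x = 2 is positive,
# so the multiplier adds a penalty ABOVE the raw objective f0(0) = 0.
infeasible = lagrangian(0.0, lam=4.0)  # 0 + 4*2 = 8
```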

1.5 Primal and Dual

  • Primal: original optimization in variables \(x\).
  • Dual: transformed optimization in multipliers \((\lambda,\nu)\).

The dual is often easier to analyze and provides a lower bound on the primal optimum (for a minimization primal).

1.6 Weak Duality, Strong Duality, Duality Gap

  • Weak duality: dual optimum never exceeds primal optimum (for minimization primal).
  • Strong duality: primal and dual optima are equal.
  • Duality gap:
\[ p^\star - d^\star \ge 0 \]

For convex problems under suitable regularity (for example Slater condition), strong duality typically holds.

1.7 KKT Conditions (Optimality System)

KKT is the main condition set for constrained optimization:
  1. primal feasibility
  2. dual feasibility
  3. complementary slackness
  4. stationarity

For convex differentiable problems, the KKT conditions are sufficient for optimality; under a constraint qualification (Section 1.8) they are also necessary.

1.8 Constraint Qualifications

  • LICQ: active constraint gradients are linearly independent.
  • Slater condition (convex inequalities): at least one strictly feasible point exists.

These conditions support strong duality and reliable KKT use.

1.9 Sensitivity (Shadow Price Meaning)

Multiplier values indicate how optimal objective changes when constraints are tightened/relaxed.


SECTION 2: FORMULAE

2.1 Standard Constraint Conversion

Convert inequalities to \(\le 0\) form before writing Lagrangian:

\[ x \ge 2 \Rightarrow 2-x \le 0 \]
\[ x+y \ge 1 \Rightarrow 1-x-y \le 0 \]

2.2 Dual Function and Dual Problem

\[ g(\lambda,\nu) = \inf_x \mathcal{L}(x,\lambda,\nu) \]
\[ \max_{\lambda \ge 0,\nu} g(\lambda,\nu) \]

If primal is minimization, dual is maximization.
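As a concrete check, take \(\min_x x^2\) subject to \(2-x\le0\) (a toy problem chosen for illustration): minimizing \(\mathcal{L}(x,\lambda)=x^2+\lambda(2-x)\) over \(x\) gives \(x=\lambda/2\) and \(g(\lambda)=2\lambda-\lambda^2/4\). A sketch comparing this closed form against a brute-force infimum over a grid (grid range and resolution are assumptions):

```python
import numpy as np

def g_analytic(lam):
    # inf_x [x**2 + lam*(2 - x)] is attained at x = lam/2
    return 2*lam - lam**2 / 4

def g_numeric(lam, xs=np.linspace(-10, 10, 100001)):
    # brute-force infimum of the Lagrangian over a grid of x values
    return np.min(xs**2 + lam*(2 - xs))

for lam in [0.0, 1.0, 4.0]:
    assert abs(g_analytic(lam) - g_numeric(lam)) < 1e-6
```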

2.3 Minimax Inequality (Weak Duality Form)

\[ \max_{\lambda\ge0,\nu} \min_x \mathcal{L}(x,\lambda,\nu) \le \min_x \max_{\lambda\ge0,\nu} \mathcal{L}(x,\lambda,\nu) \]

Equivalent statement:

\[ d^\star \le p^\star \]

2.4 KKT Conditions (Differentiable Form)

Given candidate \((x^{\star},\lambda^{\star},\nu^{\star})\):

(A) Primal feasibility

\[ f_i(x^{\star}) \le 0,\quad i=1,\dots,m \]
\[ h_j(x^{\star}) = 0,\quad j=1,\dots,p \]

(B) Dual feasibility

\[ \lambda_i^{\star} \ge 0,\quad i=1,\dots,m \]

(C) Complementary slackness

\[ \lambda_i^{\star} f_i(x^{\star}) = 0,\quad i=1,\dots,m \]

(D) Stationarity

\[ \nabla f_0(x^{\star}) + \sum_{i=1}^{m}\lambda_i^{\star}\nabla f_i(x^{\star}) + \sum_{j=1}^{p}\nu_j^{\star}\nabla h_j(x^{\star}) = 0 \]

Quick reading rule:
  • If \(\lambda_i^{\star} > 0\), constraint \(i\) must be active.
  • If a constraint is strictly inactive, its multiplier is zero.
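The four conditions translate directly into a numerical residual check. A sketch (the helper kkt_residuals and its callable inputs are illustrative, not a library API), verified on the circle problem \(\min(-x-y)\) s.t. \(x^2+y^2\le2\) solved later in Section 3:

```python
import numpy as np

def kkt_residuals(grad_f0, fs, grad_fs, x, lams):
    """Return the four KKT residuals for min f0(x) s.t. f_i(x) <= 0."""
    fvals = np.array([f(x) for f in fs])
    stat = grad_f0(x) + sum(l * g(x) for l, g in zip(lams, grad_fs))
    return {
        "primal": float(np.max(np.maximum(fvals, 0.0))),    # f_i(x) <= 0
        "dual": float(np.max(np.maximum(-lams, 0.0))),      # lam_i >= 0
        "comp_slack": float(np.max(np.abs(lams * fvals))),  # lam_i * f_i(x) = 0
        "stationarity": float(np.linalg.norm(stat)),        # grad L = 0
    }

# Candidate (x, y) = (1, 1), lam = 1/2 for min -x - y s.t. x^2 + y^2 - 2 <= 0
res = kkt_residuals(
    grad_f0=lambda x: np.array([-1.0, -1.0]),
    fs=[lambda x: x[0]**2 + x[1]**2 - 2.0],
    grad_fs=[lambda x: np.array([2*x[0], 2*x[1]])],
    x=np.array([1.0, 1.0]),
    lams=np.array([0.5]),
)
# All four residuals are zero, so the candidate satisfies KKT.
```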

2.5 Three Exam Workflows

Workflow A: Solve a primal with the Lagrangian
  1. Convert constraints to standard form.
  2. Build \(\mathcal{L}\).
  3. Write the KKT equations.
  4. Solve the system and verify feasibility.

Workflow B: Convert a primal to its dual
  1. Write the Lagrangian \(\mathcal{L}(x,\lambda,\nu)\).
  2. Minimize \(\mathcal{L}\) w.r.t. \(x\): treat \((\lambda,\nu)\) as constants and find the minimizing \(x\)-values by setting derivatives to zero. Call this \(x(\lambda,\nu)\).
  3. Substitute back: plug \(x(\lambda,\nu)\) into the Lagrangian.
  4. Simplify: the result is \(g(\lambda,\nu)\). The dual problem is "maximize \(g(\lambda,\nu)\)" subject to multiplier feasibility constraints (for inequalities, \(\lambda \ge 0\)).

Workflow C: Solve a dual problem
  1. Differentiate \(g\) and solve \(\nabla g = 0\).
  2. Enforce multiplier feasibility.
  3. Evaluate the objective at the optimum.


SECTION 3: SOLVED PROBLEMS

Problem 1: Equality-Constrained Quadratic (Primal Solve)

Minimize:

\[ f(x,y)=2x^2+y^2 \quad \text{s.t.} \quad x+y=3 \]

Lagrangian:

\[ \mathcal{L}=2x^2+y^2+\nu(3-x-y) \]

Stationarity:

\[ 4x-\nu=0 \Rightarrow x=\nu/4, \qquad 2y-\nu=0 \Rightarrow y=\nu/2 \]

Constraint gives:

\[ \nu/4 + \nu/2 = 3 \Rightarrow \nu=4 \]

So:

\[ x^{\star}=1,\quad y^{\star}=2, \quad f^{\star}=2(1)^2+(2)^2=6 \]
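Because this is an equality-constrained quadratic, the KKT conditions form a linear system that can be solved directly; a quick numerical cross-check of the hand computation:

```python
import numpy as np

# KKT system for min 2x^2 + y^2 s.t. x + y = 3:
#   stationarity: 4x - nu = 0,  2y - nu = 0;  feasibility: x + y = 3
A = np.array([[4.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
b = np.array([0.0, 0.0, 3.0])
x, y, nu = np.linalg.solve(A, b)  # x = 1, y = 2, nu = 4
f_star = 2*x**2 + y**2            # 6
```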

Problem 2: Inequality-Constrained Circle Problem

Minimize:

\[ f(x,y)=-x-y \quad \text{s.t.} \quad x^2+y^2\le2 \]

Lagrangian:

\[ \mathcal{L}=-x-y+\lambda(x^2+y^2-2),\quad \lambda\ge0 \]

Stationarity:

\[ -1+2\lambda x=0 \Rightarrow x=\frac{1}{2\lambda}, \qquad -1+2\lambda y=0 \Rightarrow y=\frac{1}{2\lambda} \]

Stationarity rules out \(\lambda=0\) (it would force \(-1=0\)), so by complementary slackness the constraint is active:

\[ x^2+y^2=2 \]

Substitute:

\[ 2\left(\frac{1}{2\lambda}\right)^2=2 \Rightarrow \lambda=\frac{1}{2} \Rightarrow x^{\star}=1,\; y^{\star}=1 \]

Optimal value:

\[ f^{\star}=-2 \]
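Since the constraint is active, the optimum can be cross-checked by scanning the boundary in polar coordinates (the grid resolution is an assumption):

```python
import numpy as np

# On the active boundary x^2 + y^2 = 2, parameterize by angle theta.
theta = np.linspace(0, 2*np.pi, 1000001)
x = np.sqrt(2) * np.cos(theta)
y = np.sqrt(2) * np.sin(theta)
vals = -x - y

i = np.argmin(vals)
x_star, y_star, f_star = x[i], y[i], vals[i]  # approx (1, 1) and -2
```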

Problem 3: Three-Variable Equality Geometry

Minimize:

\[ f(x,y,z)=x^2+y^2+z^2 \quad \text{s.t.} \quad x+2y+3z=14 \]

Lagrangian:

\[ \mathcal{L}=x^2+y^2+z^2+\nu(14-x-2y-3z) \]

Stationarity:

\[ x=\nu/2,\quad y=\nu,\quad z=3\nu/2 \]

Use constraint:

\[ \nu/2+2\nu+3(3\nu/2)=14 \Rightarrow \nu=2 \]

Hence:

\[ x^{\star}=1,\;y^{\star}=2,\;z^{\star}=3, \quad f^{\star}=14 \]
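Geometrically, this is the projection of the origin onto the plane \(a^{\top}x=b\) with \(a=(1,2,3)\), \(b=14\), which has the closed form \(x^{\star}=\frac{b}{\|a\|^2}a\); a quick check:

```python
import numpy as np

a, b = np.array([1.0, 2.0, 3.0]), 14.0
x_star = (b / a.dot(a)) * a   # projection of the origin onto a^T x = b
f_star = x_star.dot(x_star)   # squared distance from the origin: 14
```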

Problem 4: Primal to Dual Conversion (Single Variable)

Primal:

\[ \min_x (x-3)^2 \quad \text{s.t.} \quad x\le1 \]

Lagrangian:

\[ \mathcal{L}=(x-3)^2+\lambda(x-1),\quad \lambda\ge0 \]

Minimize over \(x\):

\[ 2(x-3)+\lambda=0 \Rightarrow x=3-\lambda/2 \]

Substitute into \(\mathcal{L}\):

\[ g(\lambda)=2\lambda-\frac{\lambda^2}{4} \]

Dual:

\[ \max_{\lambda\ge0}\left(2\lambda-\frac{\lambda^2}{4}\right) \]
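Maximizing the concave dual gives \(g'(\lambda)=2-\lambda/2=0\), so \(\lambda^{\star}=4\) and \(d^{\star}=4\), which matches the primal optimum at the boundary \(x=1\); a quick numerical confirmation:

```python
# Dual: g(lam) = 2*lam - lam**2/4, g'(lam) = 2 - lam/2 = 0  ->  lam_star = 4
lam_star = 4.0
d_star = 2*lam_star - lam_star**2 / 4   # 4.0

# Primal: min (x-3)^2 s.t. x <= 1 is attained at the boundary x = 1
p_star = (1.0 - 3.0)**2                 # 4.0, so the duality gap is zero
```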

Problem 5: Primal to Dual Conversion (Two Constraints)

Primal:

\[ \min_{x,y}\left(\frac12x^2+\frac12y^2\right) \quad \text{s.t.} \quad x\ge1,\; y\ge2 \]

Standard form:

\[ 1-x\le0,\quad 2-y\le0 \]

Lagrangian:

\[ \mathcal{L}=\frac12x^2+\frac12y^2+\lambda_1(1-x)+\lambda_2(2-y) \]

Stationarity gives:

\[ x=\lambda_1,\quad y=\lambda_2 \]

Dual function:

\[ g(\lambda_1,\lambda_2) = \lambda_1-\frac12\lambda_1^2 + 2\lambda_2-\frac12\lambda_2^2 \]

Dual problem:

\[ \max_{\lambda_1,\lambda_2\ge0} \left(\lambda_1-\frac12\lambda_1^2+2\lambda_2-\frac12\lambda_2^2\right) \]
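The dual separates over \(\lambda_1\) and \(\lambda_2\), so each term can be maximized independently (each \(\lambda c - \frac12\lambda^2\) peaks at \(\lambda=c\), giving \(\lambda_1^{\star}=1\), \(\lambda_2^{\star}=2\)); a quick check that the dual optimum matches the primal value at \((x,y)=(1,2)\):

```python
# Each term lam*c - lam**2/2 is maximized at lam = c (derivative c - lam = 0)
lam1, lam2 = 1.0, 2.0
d_star = lam1 - 0.5*lam1**2 + 2*lam2 - 0.5*lam2**2   # 2.5

# Primal optimum: both constraints are active at (x, y) = (1, 2)
p_star = 0.5*1.0**2 + 0.5*2.0**2                     # 2.5, zero duality gap
```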

Problem 6: Solving a Dual Fully

Primal:

\[ \min_{x,y} x^2+y^2 \quad \text{s.t.} \quad 2x+y\le-5 \]

Write the constraint in standard form, \(2x+y+5\le0\); the Lagrangian is:

\[ \mathcal{L}=x^2+y^2+\lambda(2x+y+5),\;\lambda\ge0 \]

Minimize over \(x,y\):

\[ x=-\lambda,\quad y=-\lambda/2 \]

Dual function:

\[ g(\lambda)=5\lambda-1.25\lambda^2 \]

Maximize:

\[ g'(\lambda)=5-2.5\lambda=0 \Rightarrow \lambda^{\star}=2 \]

Dual optimum:

\[ g(2)=5 \]
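Back-substituting \(\lambda^{\star}=2\) gives \(x^{\star}=-2\), \(y^{\star}=-1\), which is primal feasible with the constraint active, and \(p^{\star}=5=d^{\star}\); a quick check:

```python
lam = 2.0
x, y = -lam, -lam/2            # stationarity: x = -lam, y = -lam/2
assert 2*x + y + 5 == 0.0      # constraint 2x + y + 5 <= 0 is active
p_star = x**2 + y**2           # 5.0
d_star = 5*lam - 1.25*lam**2   # 5.0, so the duality gap is zero
```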

Problem 7: Minimax Verification (No Duality Gap)

For:

\[ \min_x x^2 \quad \text{s.t.} \quad x\ge2 \]

Primal optimum:

\[ p^{\star}=4 \]

Following the Problem 4 pattern, \(\mathcal{L}=x^2+\lambda(2-x)\) gives \(x=\lambda/2\) and \(g(\lambda)=2\lambda-\lambda^2/4\), maximized at \(\lambda^{\star}=4\), so:

\[ d^{\star}=4 \]

Hence:

\[ d^{\star}=p^{\star} \]

Weak duality holds with equality, i.e. strong duality also holds here.


SECTION 4: TRICKS & MENTAL MODELS

4.1 Fence-Penalty Model

Think of constraints as fenced boundaries:
  • inside the boundary: no violation cost,
  • outside the boundary: the multiplier penalty increases the cost.

4.2 Active Constraint Fast Test

Use complementary slackness immediately:

\[ \lambda_i^{\star} f_i(x^{\star})=0 \]

If \(\lambda_i^{\star}>0\), constraint must be tight.

4.3 Game View (Min-Max Thinking)

Primal selects variables; dual selects penalty strengths. Dual maximization pushes the best valid lower bound on primal optimum.

4.4 Exam Speed Pattern

Most conversion questions follow one template:
  1. normalize constraints,
  2. write the Lagrangian,
  3. apply stationarity,
  4. substitute,
  5. state the final dual objective.

4.5 Geometry Memory Hook

At optimum, objective improvement direction is blocked by active constraints; this is exactly what stationarity encodes.

SVM training is a constrained optimization problem; the dual form is often preferred because it yields cleaner optimization and enables kernelized formulations.


SECTION 5: IMPORTANT POINTS FOR EXAM

  1. Always convert inequalities into standard \(\le0\) form before forming \(\mathcal{L}\).
  2. One inequality constraint corresponds to one nonnegative multiplier.
  3. For minimization primal, dual is maximization.
  4. Write weak duality correctly: \(d^{\star}\le p^{\star}\).
  5. KKT must be written as four parts: primal feasibility, dual feasibility, complementary slackness, stationarity.
  6. In convex problems with Slater condition, KKT is typically sufficient and duality gap is zero.
  7. If exam asks only "convert primal to dual," stop after writing dual objective with multiplier constraints.
  8. If exam asks to solve dual, optimize \(g\) and check multiplier feasibility.
  9. Use complementary slackness early to reduce algebra.
  10. Add one interpretation line (active constraint, penalty meaning, or duality gap meaning) after derivation for higher-quality answers.

SECTION 6: PRACTICAL IMPLEMENTATION, PSEUDOCODE, AND SMART TRICKS

6.1 Which Solver to Use (Practical Rule)

  • Convex problem with clear algebraic form: use CVXPY modeling.
  • Smooth nonlinear constraints with gradients/Jacobians: use scipy.optimize.minimize with trust-constr or SLSQP.
  • Derivative-free / noisy constraints: use COBYLA/COBYQA-style methods.
  • Large streaming ML setup: prefer first-order primal-dual or projected methods.

6.2 Pseudocode: Primal-Dual (Inequality Constraints)

Input: f(x), g_i(x) <= 0, step sizes eta_x, eta_lambda
Initialize x0, lambda0 >= 0
for t = 0..T-1:
    # Primal step (descent on Lagrangian wrt x)
    x_{t+1} = x_t - eta_x * grad_x L(x_t, lambda_t)

    # Dual step (ascent on multipliers)
    lambda_{t+1} = lambda_t + eta_lambda * g(x_{t+1})

    # Projection to dual-feasible region
    lambda_{t+1} = max(lambda_{t+1}, 0)   # elementwise
end
Return x_T, lambda_T
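The loop above can be run as-is on a small problem. A runnable sketch on an assumed test problem \(\min x^2+y^2\) s.t. \(x+y\ge1\) (standard form \(1-x-y\le0\), known solution \(x=y=1/2\), \(\lambda=1\)); step sizes and iteration count are assumptions:

```python
import numpy as np

def primal_dual(T=5000, eta_x=0.1, eta_lam=0.1):
    # min x0^2 + x1^2  s.t.  g(x) = 1 - x0 - x1 <= 0
    x = np.zeros(2)
    lam = 0.0
    for _ in range(T):
        # Primal step: descent on L = ||x||^2 + lam * (1 - x0 - x1)
        grad_x = 2*x + lam * np.array([-1.0, -1.0])
        x = x - eta_x * grad_x
        # Dual step: ascent on the multiplier, then project onto lam >= 0
        lam = max(lam + eta_lam * (1 - x[0] - x[1]), 0.0)
    return x, lam

x, lam = primal_dual()   # converges to x approx [0.5, 0.5], lam approx 1.0
```

With these step sizes the iteration contracts toward the saddle point of the Lagrangian; larger steps can cause the primal-dual pair to oscillate.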

6.3 Pseudocode: Augmented Lagrangian (Equality Constraints)

Input: f(x), c(x)=0, penalty rho>0
Initialize x0, lambda0
for k = 0..K-1:
    # Minimize augmented Lagrangian wrt x
    x_{k+1} = argmin_x [ f(x) + lambda_k^T c(x) + (rho/2)||c(x)||^2 ]

    # Multiplier update
    lambda_{k+1} = lambda_k + rho * c(x_{k+1})
end
Return x_K, lambda_K
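A runnable sketch of this scheme on Problem 1 from Section 3 (\(\min 2x^2+y^2\) s.t. \(x+y=3\), written as \(c=3-x-y\) to match \(\nu=4\)); since the augmented Lagrangian stays quadratic, the inner argmin reduces to a 2x2 linear solve (the penalty \(\rho\) and iteration count are assumptions):

```python
import numpy as np

def aug_lagrangian(rho=10.0, K=30):
    # min 2x^2 + y^2  s.t.  c(x, y) = 3 - x - y = 0
    lam, xy = 0.0, np.zeros(2)
    for _ in range(K):
        # Inner argmin of 2x^2 + y^2 + lam*c + (rho/2)*c^2:
        # setting the gradient to zero gives a 2x2 linear system.
        A = np.array([[4.0 + rho, rho],
                      [rho, 2.0 + rho]])
        b = np.array([lam + 3*rho, lam + 3*rho])
        xy = np.linalg.solve(A, b)
        # Multiplier update: lam <- lam + rho * c(x_{k+1})
        lam += rho * (3.0 - xy[0] - xy[1])
    return xy, lam

xy, lam = aug_lagrangian()   # converges to xy approx [1, 2], lam approx 4
```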

6.4 Python Blueprint (SciPy)

import numpy as np
from scipy.optimize import minimize, LinearConstraint, Bounds

def f(x):
    return (x[0]-1.0)**2 + (x[1]-2.0)**2

def grad_f(x):
    return np.array([2*(x[0]-1.0), 2*(x[1]-2.0)])

lin_con = LinearConstraint([[1.0, 1.0]], [-np.inf], [2.0])  # x0 + x1 <= 2
bounds = Bounds([0.0, 0.0], [np.inf, np.inf])

res = minimize(
    f, x0=np.array([0.5, 0.5]),   # feasible starting point
    jac=grad_f,                   # analytic gradient speeds convergence
    method="trust-constr",
    constraints=[lin_con],
    bounds=bounds
)
print(res.x)  # optimum is approximately [0.5, 1.5]; the constraint x0 + x1 <= 2 is active

6.5 Smart Technical Tricks

  1. Normalize constraints early into \(\le 0\) and \(=0\) forms before any derivation.
  2. Check convexity signs before assuming strong duality.
  3. Use active-constraint detection via complementary slackness to reduce equation count.
  4. In numerical solvers, provide gradients/Jacobians when possible; convergence improves significantly.
  5. Always verify primal feasibility after optimization; objective value alone is not enough.
  6. Track KKT residuals as a stopping diagnostic in iterative methods.

6.6 Common Failure Modes

  • Infeasible initial assumptions (no feasible point).
  • Ignoring dual feasibility (\(\lambda \ge 0\)).
  • Treating nonconvex problems as if strong duality must hold.
  • Reporting stationary points without feasibility checks.

6.7 Competition-Grade Workflow (Kaggle-Style)

  1. Start with a simple feasible baseline and verify constraint satisfaction.
  2. Lock validation protocol first (split strategy + metric), then optimize.
  3. Tune model hyperparameters and constraint penalties jointly.
  4. Track both objective metric and violation metric on validation folds.
  5. Ensemble only models that meet feasibility thresholds reliably.

6.8 Missing Advanced Point: Many-Constraint ML Problems

In fairness/ranking/safety settings, number of constraints can be very large.
A practical approach is to parameterize multipliers (multiplier models) rather than storing one multiplier per constraint; this improves scalability while preserving primal-dual structure.
