Least squares constraints and "soft" Lagrange multiplier method are equivalent.

weblog/

Often I end up minimizing a quadratic objective subject to linear equality constraints:

``````min ½x'Qx subject to Ax=b
``````

This can be achieved with a single linear system solve via the Lagrange multiplier method. You add an auxiliary variable for each constraint in a vector `λ` and find the solution to:

``````saddle_x,λ ½x'Qx + λ'(Ax - b)
``````

setting gradients with respect to each `x` and `λ` to zero leads to the linear system of Karush–Kuhn–Tucker conditions:

``````/ Q  A' \ / x \ = / 0 \
\ A  0  / \ λ /  \ b /
``````

The second equation ensures that the constraint `Ax = b` is satisfied and the top equation uses `A'λ` to pull the solution away from `Qx=0` (the solution of the unconstrained problem).

What happens if we change that `0` in the bottom right corner of the system matrix to an identity matrix? Or rather a scaled identity matrix?

I'll consider a special choice of placing an Identity matrix scaled by `-1/ω`:

``````/ Q  A'     \ / x \ = / 0 \                    (1)
\ A  -1/ω I / \ λ /   \ b /
``````

Solving the bottom equation for λ we have:

``````λ = ω(Ax - b)
``````

Plugging this into the top equation we have

``````(Q + ωA'A) x = ω A' b
``````

This looks familiar! Returning to our original minimization problem, we could have chosen to enforce the constraint Ax=b in a least squares sense, say, with a weight of `ω`:

``````min ½x'Qx + ω/2 ‖Ax - b‖²
``````

Finding the minimizer results from setting the gradient of this energy with respect to x to zero and solving the system:

``````(Q + ωA'A) x = ω A' b
``````

So, algebraically these methods are the same. Solving the direct penalty method involves a |x| by |x| system, where as solving the KKT system in `(1)` is a larger, |x|+|b| by |x|+|b| system.

However, if `ω` is very large or the conditioning on `A'A` is poor then adding `+ ωA'A` might blow the lid off your numerical solver (supposedly; I haven't actually confirmed this on my own examples, but read it).

Solving the larger system involving `1/ω I` and `A` instead should have better conditioning and stabler numerics.