# Least squares constraints and "soft" Lagrange multiplier method are equivalent.

## Alec Jacobson

## May 16, 2019

Often I end up minimizing a quadratic objective subject to linear equality constraints:

```
min ½x'Qx subject to Ax=b
```

This can be achieved with a single linear system solve via the Lagrange multiplier method. You add an auxiliary variable for each constraint in a vector `λ`

and find the solution to:

```
saddle_x,λ ½x'Qx + λ'(Ax - b)
```

setting gradients with respect to each `x`

and `λ`

to zero leads to the linear system of Karush–Kuhn–Tucker conditions:

```
/ Q A' \ / x \ = / 0 \
\ A 0 / \ λ / \ b /
```

The second equation ensures that the constraint `Ax = b`

is satisfied and the top equation uses `A'λ`

to pull the solution away from `Qx=0`

(the solution of the unconstrained problem).

What happens if we change that `0`

in the bottom right corner of the system matrix to an identity matrix? Or rather a scaled identity matrix?

I'll consider a special choice of placing an Identity matrix scaled by `-1/ω`

:

```
/ Q A' \ / x \ = / 0 \ (1)
\ A -1/ω I / \ λ / \ b /
```

Solving the bottom equation for λ we have:

```
λ = ω(Ax - b)
```

Plugging this into the top equation we have

```
(Q + ωA'A) x = ω A' b
```

This looks familiar! Returning to our original minimization problem, we could have chosen to enforce the constraint Ax=b in a least squares sense, say, with a weight of `ω`

:

```
min ½x'Qx + ω/2 ‖Ax - b‖²
```

Finding the minimizer results from setting the gradient of this energy with respect to x to zero and solving the system:

```
(Q + ωA'A) x = ω A' b
```

So, algebraically these methods are the same. Solving the direct penalty method involves a |x| by |x| system, where as solving the KKT system in `(1)`

is a larger, |x|+|b| by |x|+|b| system.

However, if `ω`

is very large or the conditioning on `A'A`

is poor then adding `+ ωA'A`

might blow the lid off your numerical solver (*supposedly; I haven't actually confirmed this on my own examples, but read it*).

Solving the larger system involving `1/ω I`

and `A`

instead should have better conditioning and stabler numerics.