Mosek, a commercial solver, writes in its documentation that it’s often advantageous to convert a separable quadratic program:

```
min_x 1/2 |Fx - c|^2
subject to Ax <= b
```

with `x`

a n-long vector, into a conic program of the form

```
min_{x,t,y} y - x'F'c
subject to Ax <= b
and Fx - t = 0
and 2y >= t't
```

introducing `t`

as a new vector variable and `y`

as a new *scalar* variable.

It’s rather straightforward to *verify* that these two problems are equivalent. But what’s the intuition behind it?

To understand this, we’ll need to be comfortable with two ways of visualizing an energy function: as a line plot or height-field surface and viewed *from above* as a topographic map of isolines.

Recall that an energy function `E(x)`

is just a scalar function of one or more variables in the vector `x`

. To be sure, this means `E(x)`

takes in a list of numbers in `x`

and spits out a single number.

In our original quadratic program, the energy was a *quadratic* function of `x`

. If `x`

were a only a single variable (scalar) then we can rewrite `1/2 |Fx - c|^2`

as `f^2/2 x^2 - 2fc x + c^2`

. This is simply a quadratic function `x`

, familiar after substitution as: `ax^2 + bx + c`

. We know that plotting this will reveal a parabola:

Minimizing `E(x)`

in this picture means walking along values of `x`

and tracing the curve until reaching the *vertex*, the point with minimal `E(x)`

and, notably, zero derivative.

Conceptually when we convert this simpler 1D quadratic problem `E(x) = ax^2 + bx + c`

into a conic program we introduce a new variable `y`

. It’s important that we really recognize `y`

as a *variable*. So far we don’t know its value. If we stopped here we’d have an optimization problem:

```
min_{x,y} ax^2 + bx + c
```

where `y`

spans an infinite *null space of solutions*: the energy only measures the effect of `x`

; any choice of `y`

is OK.

But here comes the trick, we place `y`

directly in the objective function: `E_C(x,y) = y`

. This new energy is linear in only `y`

. If we know plot it as a topographical map over the `xy`

-plane we see a function decreasing indefinitely in the downward direction:

Obviously if we stop here and minimize over `x`

and `y`

, we could get any value of `x`

and we’ll simply get `y=-∞`

. The final step is to add a *conic constraint* on `y`

according to the original quadratic function of `x`

. Namely we ask that `y >= ax^2 + bx + c`

: `y`

should be *above* the original quadratic function. Since we’re only working with scalar `x`

and scalar `y`

we can plot the feasible region as on the `xy`

-plane. `y`

must be above the red region:

Now we can imagine minimizing `y`

by traveling downward, we’ll eventually *hit* the red region’s wall. Since we have full freedom to move in the `x`

direction, too, we can continue downward only if we *slide* along the wall toward the minimum of the original quadratic function `E(x)`

.

**Higher dimensions:** This scenario is significantly easier to visualize for the 1d problem. However, the intuition extends to higher dimensions. If `x`

is a vector then `E(x)`

is a hyper-paraboloid over `x`

. When we introduce `y`

, we still just add a *single* new orthogonal dimension. Then minimizing the value of `y`

we’ll smash into this hyper-paraboloid and slide toward the minimum.

**Solvers:** This explanation is meant to give some intuition as to *why* this conversion works. It’s *not* meant as a explanation of how interior-point conic programming solvers work. I also do not explain why it’s sometimes more efficient to convert least squares quadratic programs to conic programs. Admittedly the conditions for a successful conversion in terms of efficiency still allude me.