# Partition of Unity Constraints for Sparse Quadratic Matrix Programs


This note demonstrates how to efficiently enforce partition of unity constraints for a specific class of convex quadratic matrix optimizations. Compared to vectorization with the Lagrange multiplier method, this approach requires solving against a smaller (positive definite) system. Compared to vectorization with the nullspace method, it maintains sparsity.

## Problem Statement

We consider equality constrained optimizations where we seek a matrix $$\mathbf{X}∈\mathbb{R}^{n×k}$$ that minimizes a sum of quadratic cost functions of its $$k$$ columns while ensuring partition of unity across its $$n$$ rows: $$\label{equ:sum} \min_\mathbf{X}\, ∑_{j=1}^k (\frac{1}{2}\mathbf{x}_j^\top \mathbf{Q}\mathbf{x}_j - \mathbf{x}_j^\top \mathbf{f}_j) \ \ \text{ subject to } ∑_{j=1}^k x_{ij} = 1 \ ∀ i∈[1,n], \tag{1}$$ where $$\mathbf{x}_j$$ is the $$j$$th column of $$\mathbf{X}$$. The linear coefficients $$\mathbf{f}_j ∈ \mathbb{R}^n$$ may vary for each column, but the quadratic coefficients must be the same symmetric positive definite matrix $$\mathbf{Q}∈ \mathbb{R}^{n×n}$$.

Without the constraints, each column of $$\mathbf{X}$$ could be solved for independently, reusing a single factorization of the (possibly sparse) $$n×n$$ matrix $$\mathbf{Q}$$ (i.e., $$\mathbf{x}_j = \mathbf{Q}^{-1} \mathbf{f}_j$$). However, the constraints in Equation (\ref{equ:sum}) couple the columns together.
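For concreteness, the unconstrained columnwise solves might look like the following sketch in Python with SciPy; the shifted 1D Laplacian $$\mathbf{Q}$$ and random $$\mathbf{F}$$ are stand-ins for an application's actual data:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n, k = 100, 4

# Stand-in data: a sparse symmetric positive definite Q (shifted 1D Laplacian)
# and arbitrary linear coefficients F whose columns are the f_j.
Q = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") \
    + sp.identity(n, format="csc")
F = rng.standard_normal((n, k))

# Factor Q once; reuse the factorization for every column solve x_j = Q^{-1} f_j.
solve_Q = spla.factorized(Q)
X = np.column_stack([solve_Q(F[:, j]) for j in range(k)])
```

(`scipy.sparse.linalg.factorized` uses a sparse LU; a true sparse Cholesky, e.g. from `scikit-sparse`, exploits symmetry further.)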

Let us write the problem in Equation (\ref{equ:sum}) in matrix form: \begin{aligned} \min_\mathbf{X} \, \text{tr}\left(\frac{1}{2}\mathbf{X}^\top\mathbf{Q}\mathbf{X}- \mathbf{X}^\top \mathbf{F}\right) \quad \text{subject to } \mathbf{X}\unicode{x1D7D9}_k = \unicode{x1D7D9}_n \end{aligned} where $$\unicode{x1D7D9}_k ∈ \mathbb{R}^k$$ is a vector of $$k$$ ones and the columns of $$\mathbf{F}∈\mathbb{R}^{n×k}$$ are the vectors $$\mathbf{f}_j$$.

If we write the Lagrangian \begin{aligned} \mathcal{L}(\mathbf{X},λ) = \text{tr}\left(\frac{1}{2}\mathbf{X}^\top\mathbf{Q}\mathbf{X}- \mathbf{X}^\top \mathbf{F}+ λ\left(\mathbf{X}\unicode{x1D7D9}_k - \unicode{x1D7D9}_n\right)^\top\right), \end{aligned} where $$λ∈\mathbb{R}^n$$ is a vector of Lagrange multipliers, then we can identify the optimal values of $$\mathbf{X}$$ and $$λ$$ as the solution to the system of equations formed by setting all partial derivatives to zero: \begin{align} \label{equ:dLdX} \frac{∂\mathcal{L}}{∂\mathbf{X}} & = \mathbf{Q}\mathbf{X}- \mathbf{F}+ λ \unicode{x1D7D9}_k^\top = 0, \tag{2}\\ \label{equ:dLdl} \frac{∂\mathcal{L}}{∂λ} & = \mathbf{X}\unicode{x1D7D9}_k - \unicode{x1D7D9}_n = 0. \tag{3} \end{align} We can solve Equation (\ref{equ:dLdX}) to express $$\mathbf{X}$$ in terms of $$λ$$: \begin{align} \label{equ:X} \mathbf{X}= \mathbf{Q}^{-1}\left( \mathbf{F}- λ \unicode{x1D7D9}_k^\top\right), \tag{4} \end{align} and then substitute Equation (\ref{equ:X}) into (\ref{equ:dLdl}) to solve for $$λ$$: \begin{align} \mathbf{Q}^{-1}\left( \mathbf{F}- λ \unicode{x1D7D9}_k^\top\right)\unicode{x1D7D9}_k - \unicode{x1D7D9}_n &= 0, \\ \label{equ:l} λ & = \frac{1}{k}\mathbf{F}\unicode{x1D7D9}_k - \frac{1}{k}\mathbf{Q}\unicode{x1D7D9}_n. 
\tag{5} \end{align} Substituting Equation (\ref{equ:l}) back into (\ref{equ:X}), we identify the optimal $$\mathbf{X}$$: \begin{aligned} \mathbf{X}&= \mathbf{Q}^{-1}\left( \mathbf{F}- \left( \frac{1}{k}\mathbf{F}\unicode{x1D7D9}_k - \frac{1}{k}\mathbf{Q}\unicode{x1D7D9}_n \right) \unicode{x1D7D9}_k^\top\right), \\ \text{or more simply} \\ \mathbf{X}&= \mathbf{Q}^{-1}\left(\mathbf{F}- \frac{1}{k} \mathbf{F}\unicode{x1D7D9}_{kk}\right) + \frac{1}{k} \unicode{x1D7D9}_{nk}, \end{aligned} where $$\mathbf{F}- \frac{1}{k} \mathbf{F}\unicode{x1D7D9}_{kk}$$ subtracts the average of the columns of $$\mathbf{F}$$ from each column and $$+\frac{1}{k} \unicode{x1D7D9}_{nk}$$ adds the scalar $$\frac{1}{k}$$ to each entry. A sparse (Cholesky) factorization of the $$n×n$$ positive definite matrix $$\mathbf{Q}$$ can be computed once and reused to solve against all $$k$$ columns of the right-hand side.
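Concretely, the closed-form solution amounts to a few lines of NumPy/SciPy. A sketch, again with a stand-in sparse SPD $$\mathbf{Q}$$ and random $$\mathbf{F}$$:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n, k = 100, 4
Q = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") \
    + sp.identity(n, format="csc")
F = rng.standard_normal((n, k))

# F - (1/k) F 1_{kk}: subtract the average of F's columns from each column.
B = F - F.mean(axis=1, keepdims=True)

# One factorization of Q, reused across all k right-hand-side columns,
# then shift every entry by the scalar 1/k.
solve_Q = spla.factorized(Q)
X = np.column_stack([solve_Q(B[:, j]) for j in range(k)]) + 1.0 / k
```

Since each column of $$\mathbf{B}$$ sums to zero across the row average and the final shift contributes $$k \cdot \frac{1}{k} = 1$$ per row, the rows of $$\mathbf{X}$$ sum to one as required.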

## Pitfalls of Vectorization

It is tempting to vectorize Equation (\ref{equ:sum}) by stacking the columns of $$\mathbf{X}$$: \begin{align} \label{equ:vec} \min_{\mathbf{x}_1, …, \mathbf{x}_k} & \frac{1}{2} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix}^\top \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix} - \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix}^\top \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \end{bmatrix} \tag{6} \\ \text{subject to } & \begin{bmatrix} \mathbf{I}_n & \dots & \mathbf{I}_n \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \end{bmatrix} = \unicode{x1D7D9}_n \end{align} where $$\mathbf{I}_n ∈ \mathbb{R}^{n×n}$$ is the identity matrix.

Applying the Lagrange multiplier method, we can identify the minimizer as a solution to the KKT system: $$\begin{bmatrix} \mathbf{Q}& & & \mathbf{I}_n \\ & \ddots & & \vdots \\ & & \mathbf{Q}& \mathbf{I}_n \\ \mathbf{I}_n & … & \mathbf{I}_n & \\ \end{bmatrix} \begin{bmatrix}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_k \\ λ \end{bmatrix} = \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \\ \unicode{x1D7D9}_n \end{bmatrix},$$ where $$λ∈\mathbb{R}^n$$ is the vector of auxiliary Lagrange multipliers. However, as written this requires solving against an $$n(k+1) × n(k+1)$$ matrix. Moreover, even if $$\mathbf{Q}$$ is positive definite, the KKT system is indefinite, requiring less efficient factorization methods (e.g., LDLᵀ instead of Cholesky).
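For comparison, this KKT approach can be sketched as follows (stand-in $$\mathbf{Q}$$ and $$\mathbf{F}$$ as before); SciPy's `spsolve` falls back on a general LU factorization of the large indefinite bordered matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(0)
n, k = 50, 3
Q = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") \
    + sp.identity(n, format="csc")
F = rng.standard_normal((n, k))
I = sp.identity(n, format="csc")

# Block KKT matrix: k diagonal copies of Q, bordered by identities.
blocks = [[Q if i == j else None for j in range(k)] + [I] for i in range(k)]
blocks.append([I] * k + [None])
K = sp.bmat(blocks, format="csc")

rhs = np.concatenate([F.T.ravel(), np.ones(n)])  # [f_1; ...; f_k; 1_n]
sol = spla.spsolve(K, rhs)                       # indefinite solve (LU, not Cholesky)
X = sol[: n * k].reshape(k, n).T                 # unstack the columns x_1, ..., x_k
lam = sol[n * k:]
```

The factored matrix here has $$n(k+1)$$ rows, versus the single $$n×n$$ Cholesky factorization needed by the closed-form approach.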

Another possibility would be to build a matrix $$\mathbf{N}∈ \mathbb{R}^{nk × n(k-1)}$$ whose columns span the nullspace of the vectorized constraints, so that \begin{align} \label{equ:y} \begin{bmatrix} \mathbf{I}_n & \dots & \mathbf{I}_n \end{bmatrix} \left(\mathbf{N}\mathbf{y}+ \tilde{\mathbf{x}}\right) = \unicode{x1D7D9}_n \quad ∀ \mathbf{y}∈ \mathbb{R}^{n(k-1)}, \tag{7} \end{align} where $$\tilde{\mathbf{x}} ∈ \mathbb{R}^{nk}$$ is any feasible solution, for example $$\tilde{\mathbf{x}} = \text{vec}(\frac{1}{k}\unicode{x1D7D9}_{nk})$$.

Substituting $$\mathbf{x}= \mathbf{N}\mathbf{y}+ \tilde{\mathbf{x}}$$ into Equation (\ref{equ:vec}) and solving the now-unconstrained problem for $$\mathbf{y}$$ reveals: $\mathbf{y}= \left( \underbrace{ \mathbf{N}^\top \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix}\mathbf{N}}_{\tilde{\mathbf{Q}}}\right)^{-1} \mathbf{N}^\top \left( \begin{bmatrix}\mathbf{f}_1 \\ \vdots \\ \mathbf{f}_k \end{bmatrix} - \begin{bmatrix} \mathbf{Q}& & \\ & \ddots & \\ & & \mathbf{Q} \end{bmatrix} \tilde{\mathbf{x}} \right).$ Although the matrix $$\tilde{\mathbf{Q}} ∈ \mathbb{R}^{n(k-1)×n(k-1)}$$ is smaller, it will generally be far denser than $$\mathbf{Q}$$ even when $$\mathbf{Q}$$ and $$\mathbf{N}$$ are sparse, ruling out efficient sparse direct solvers.
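The reduced solve can be sketched as follows. The particular $$\mathbf{N}$$ below (identity blocks stacked over a row of negated identities) is one possible choice of basis, not the only one; note that every $$(k-1)×(k-1)$$ block of $$\tilde{\mathbf{Q}}$$ ends up nonzero, so here we simply form it densely:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n, k = 50, 3
Q = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") \
    + sp.identity(n, format="csc")
F = rng.standard_normal((n, k))
f = F.T.ravel()  # stacked [f_1; ...; f_k]

# One sparse nullspace basis of [I ... I]: column block j has I in slot j
# and -I in the last slot, so the k blocks of N y always sum to zero.
N = sp.bmat(
    [[sp.identity(n * (k - 1), format="csc")],
     [-sp.hstack([sp.identity(n, format="csc")] * (k - 1), format="csc")]],
    format="csc",
)
x_tilde = np.full(n * k, 1.0 / k)  # a feasible point
Qblk = sp.block_diag([Q] * k, format="csc")

# Reduced system: y = (N^T Qblk N)^{-1} N^T (f - Qblk x_tilde).
Qt = (N.T @ Qblk @ N).toarray()  # every block of Qt is Q or 2Q -> dense solve
y = np.linalg.solve(Qt, N.T @ (f - Qblk @ x_tilde))
x = N @ y + x_tilde
X = x.reshape(k, n).T
```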