Archive for December, 2014

Alias for universal latex makefile

Wednesday, December 31st, 2014

A friend recently turned me on to latex-makefile. This little script generates a universal Makefile for building latex documents. I was pretty impressed. All you need to do is copy the generated Makefile into your working directory of tex files and issue:

make

or if you have many .tex files

make [main_document]

I caught myself copying this makefile into a few places, which seemed a bit silly, so instead I put it in one reasonable place and made an alias to find it there:

export LMAKEFILE="/usr/local/include/UniversalLatexMakefile"
alias lmake="make -f $LMAKEFILE"

Now I can type from anywhere:

lmake [main_document]

without copying the Makefile.

Export undeformed models from Big Buck Bunny character .blend files

Wednesday, December 31st, 2014

Every time I open Blender I want to cry. This time I struggled through the interface and found the way to export the raw, undeformed models from the Big Buck Bunny characters' .blend files.

Once you find it, the solution is simple:

File > Export > Wavefront (.OBJ)

Then, on the left side under “Export OBJ”, uncheck “Apply Modifiers”, “Selection Only”, and “Animation”. Then save.

Shallow depth of field rendering in MATLAB

Saturday, December 27th, 2014

[Figure: Cheburashka rendered with shallow depth of field]

I tried a little experiment with creating lens blur effects using MATLAB's simple 3D plotter.

Here's a little chunk of code to jiggle the camera around its current position while keeping it aimed at a target point in the scene, accumulating each rendered frame into a buffer.

% plot the mesh (V,F) with per-vertex colors C (fphong: gptoolbox shorthand
% for phong-shading plot arguments)
t = tsurf(F,V,'EdgeColor','none','SpecularStrength',0,'CData',C,fphong);
camproj('persp');
% accumulation buffer and sample count
im = [];
count = 0;
% number of samples along each lens direction
sam = 50;
% "size of the lens": scales how far the camera wanders from its original view
w = 0.2;
target = camtarget;
pos = campos;
% grid of camera orbit angles
[THETA,PHI] = meshgrid(w*linspace(-2*pi,2*pi,sam),w*linspace(-2*pi,2*pi,sam));
prev_theta = 0;
prev_phi = 0;
% visit the lens samples in random order
for s = randperm(numel(THETA))
  theta = THETA(s);
  phi = PHI(s);
  % undo the previous orbit, then apply the new one (the camera stays aimed at
  % the same target)
  camorbit(-prev_theta,-prev_phi);
  prev_theta = theta;
  prev_phi = phi;
  camorbit(theta,phi);
  % grab the rendered frame and add it to the buffer
  frame = getframe(gcf);
  frame = im2double(frame.cdata);
  count = count+1;
  if isempty(im)
    im = zeros(size(frame));
  end
  im = im+frame;
end
% average the accumulated frames and trim the constant border (imtrim is from
% gptoolbox)
im = imtrim(im/count,'Threshold',0);
close all;
imshow(im);

You can get away with fewer samples if the “size of the lens” is smaller. This one uses w=0.1; sam=10:

[Figure: Cheburashka rendered with less shallow depth of field (w=0.1, sam=10)]

Parallel Quadratic Programming for Image Processing: MATLAB implementation

Thursday, December 18th, 2014

A few years ago, we came across this interesting method for solving QPs, described in “Parallel Quadratic Programming for Image Processing” [Brand and Chen 2011]. I’ve implemented it in matlab. Save the following in a file called merl_quadprog.m:

function x = merl_quadprog(varargin)
  % MERL_QUADPROG Optimize problems of the form:
  % 
  % min_x 1/2 x' Q x - x' h
  % 
  % s.t. x>=0
  %
  % According to the method described in "PARALLEL QUADRATIC PROGRAMMING FOR
  % IMAGE PROCESSING" [Brand & Chen 11]
  %
  % x = merl_quadprog(Q,h,'ParameterName',ParameterValue)
  %
  % Inputs:
  %   Q  n by n symmetric positive semi-definite quadratic coefficients matrix
  %   h  n by 1 linear coefficients vector (note sign)
  % Outputs:
  %   x  n by 1 solution vector
  %
  %

  Q = varargin{1};
  h = varargin{2};

  n = size(Q,1);
  assert(size(Q,2) == n);
  assert(numel(h) == n);
  h = h(:);

  max_iter = inf;
  tol = 1e-7;

  % initial guess
  x0 = ones(n,1);

  v = 3;
  while v <= numel(varargin)
    switch varargin{v}
    case 'X0'
      assert((v+1)<=numel(varargin));
      v = v+1;
      x0 = varargin{v};
    otherwise
      error(['Unknown parameter: ' varargin{v}]);
    end
    v = v+1;
  end

  %% "some r ∈ R^n ≥ 0"
  %r = sparse(n,1);
  % ri = max(Qii , ∑ j Q−i j )
  r = max(diag(Q),max(-Q,[],2));

  % "similarly" ... "si>0"
  s = ones(n,1);

  Qp = max(Q,0) + diag(r);
  Qm = max(-Q,0) + diag(r);
  assert(issparse(Qp));
  hp = max(h,0) + s;
  hm = max(-h,0) + s;

  x = x0;
  it = 0;
  while true
    x_prev = x;
    x = x.*(hp+Qm*x)./(hm+Qp*x);
    it = it + 1;
    diff = max(abs(x-x_prev));
    %fprintf('%d: %g\n',it,diff);
    if it >= max_iter
      return;
    end
    if diff<=tol
      return;
    end
  end
end

This seems like an interesting way to solve non-negative least squares problems. Though the theory extends to general QPs, I’m not sure how efficient this method would be in the case of general linear inequality constraints.
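
For example, a non-negative least squares problem min_x ||A*x - b||^2 subject to x >= 0 fits this form with Q = A'*A and h = A'*b. Here's a minimal sketch of calling the function that way (the data here is random and purely illustrative):

% hypothetical example: non-negative least squares via merl_quadprog
A = sprand(200,100,0.1);   % random sparse over-determined system
b = rand(200,1);
Q = A'*A;                  % sparse, symmetric positive semi-definite
h = A'*b;                  % note the sign: we minimize 1/2 x'Qx - x'h
x = merl_quadprog(Q,h);
% sanity check against matlab's built-in non-negative least squares
x_lsq = lsqnonneg(full(A),b);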

I’ve also added this to the gptoolbox.

SuiteSparse vs MATLAB built-in solvers (a quick geometry processing benchmark)

Tuesday, December 16th, 2014

Recently, a student I’m working with installed the SuiteSparse toolbox for matlab and reported wild speedups (20x) over matlab’s built-in backslash (\ or mldivide) linear system solver. I thought surely precomputing a Cholesky factorization with a good ordering (chol with 3 output arguments) would fare better. But the student was claiming 5-10x speed ups even over that.

Too good to be true? I decided to do a quick benchmark on some common problems I (and others in geometry processing) run into.

I started with solving a Laplace equation over a triangle mesh in 2d. I generated Delaunay triangle meshes of increasing resolution in the unit square. Then I solved the system A x = b in three ways:

\ (mldivide):

x = A\b;

chol + solve:

[L,~,Q] = chol(A,'lower');
x = Q * (L' \ (L \ ( Q' * b)));

cholmod2 (SuiteSparse):

x = cholmod2(A,b);
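
Roughly, the timing harness looks like the sketch below. To keep the snippet self-contained, it stands in MATLAB's built-in delsq/numgrid finite-difference Laplacian for the actual cotangent (bi-)Laplacians of the meshes, and of course cholmod2 assumes SuiteSparse is installed and on the path:

% stand-in system: 5-point Laplacian of a grid on the unit square
% (the real tests use cotangent Laplacians and bi-Laplacians of meshes)
A = delsq(numgrid('S',200));
b = rand(size(A,1),1);
tic; x1 = A\b; t_mldivide = toc;
tic; [L,~,Q] = chol(A,'lower');
x2 = Q*(L'\(L\(Q'*b))); t_chol = toc;
tic; x3 = cholmod2(A,b); t_cholmod2 = toc;
fprintf('mldivide %g s, chol+solve %g s, cholmod2 %g s\n', ...
  t_mldivide,t_chol,t_cholmod2);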

For the Laplace equation with a “single right hand side” (b is an n-long column vector), I didn’t see any noticeable improvement:

[Figure: MATLAB vs SuiteSparse, 2D harmonic (Laplace) solve timings]

The only thing surprising here is actually how handily backslash is beating the other two.

Laplace operators in 2D are very sparse matrices (~7 entries per row). Next I tried solving a bi-Laplace equation, whose matrix has the sparsity pattern of the squared Laplace operator (~17 entries per row). Here we see a difference, but it doesn't seem to scale well.

[Figure: MATLAB vs SuiteSparse, 2D biharmonic solve timings]

The sparsity patterns of Laplace and bi-Laplace operators are denser in 3D. In particular, a nicely meshed unit cube might have upwards of 32 entries per row in its bi-Laplacian. Solving these systems we start to see a clear difference: cholmod2 from SuiteSparse sees a 4.7x speed up over \ (mldivide) and a 2.9x speed up over chol+solve. There is even a hint of a difference in slope in this log-log plot, suggesting SuiteSparse also has better asymptotic behavior.

[Figure: MATLAB vs SuiteSparse, 3D biharmonic solve timings]

So far we’re only considering a single right hand side. Matlab’s parallelism for multiple right hand sides (during “back substitution”) seems pretty pitiful compared to SuiteSparse. By increasing the number of right hand sides to 100 (b is now n by 100) we see a clear win for SuiteSparse over matlab: a 12x speed up over \ (mldivide) and an 8.5x speed up over chol+solve. Here the asymptotic differences are much more obvious.
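
For reference, the multiple right hand side variant of the harness above is just the following (reusing A, L, and Q; all three solvers accept a matrix right hand side directly):

% 100 right hand sides at once: b becomes an n by 100 matrix
B = rand(size(A,1),100);
tic; X1 = A\B; t_mldivide = toc;
tic; X2 = Q*(L'\(L\(Q'*B))); t_chol = toc;   % reuses the factorization from above
tic; X3 = cholmod2(A,B); t_cholmod2 = toc;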

[Figure: MATLAB vs SuiteSparse, 3D biharmonic solve timings with 100 right hand sides]

My cursory conclusion is that SuiteSparse is beating matlab for denser sparse matrices and multiple right-hand sides.

Coffee Shop Compiler

Monday, December 15th, 2014

Here’s a proposal for a compiler/build system for developers like myself who often find themselves coding in a coffee shop with a strong wifi connection but poor power-outlet availability and limited battery life.

The standard coding and debugging cycle looks like:

  1. Code for a bit
  2. Think for a bit
  3. Repeat steps 1 and 2 for a bit
  4. Compile (make)
  5. If compile failed return to step 2
  6. Run executable
  7. Return to step 2

The power bottleneck (for my typical cycle) is by far step 4. My idea is to outsource compiling to a server (plugged into a wall outlet). The naive implementation would change the cycle to:

  1. Code for a bit
  2. Think for a bit
  3. Repeat steps 1 and 2 for a bit
  4. compile on server
    1. send code to server
    2. compile on server
    3. if compile failed, send back errors and return to 2
    4. send back executable
  5. Run executable
  6. Return to step 2

A better implementation might roll git into the make command on the local machine:

  1. Code for a bit
  2. Think for a bit
  3. Repeat steps 1 and 2 for a bit
  4. special make
    1. git commit; git push on client
    2. git pull on server
    3. regular make on server
    4. if compile failed, send back errors and return to 2
    5. send back executable
  5. Run executable
  6. Return to step 2

An even fancier implementation might try to keep the changing files in sync during idle time in step 2.

I guess this is assuming my executable is relatively small. Here dynamic (shared) libraries would be especially handy.

MATLAB high idle CPU usage

Sunday, December 14th, 2014

Using my laptop at a coffee shop today, I was dismayed to find my battery down to 85% after just 15 minutes. The culprit seems to be that MATLAB’s idle IDE was keeping my CPU busy (6-15%); apparently the bug is well known. The fix was to issue:

com.mathworks.mlwidgets.html.HtmlComponentFactory.setDefaultType('HTMLRENDERER');

and restart matlab. Now it’s down to around 1% idle CPU usage.

Automatic Differentiation Intuition Dump

Saturday, December 13th, 2014

Every so often I re-read the wikipedia page for automatic differentiation and get confused about forward and reverse accumulation. Both are neat and have appropriate and inappropriate applications. There are many tutorials online, and in addition here’s my intuition.

Forward accumulation

At each step of computation, maintain a derivative value. We seed each initial variable with derivative 1 or 0 according to whether we’re differentiating with respect to it.

Augmenting numerical types with a “dual value” (X := x + x'ε, where ε*ε = 0) and overloading math operations is an easy way to implement this method.
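
For concreteness, here's a bare-bones (and purely hypothetical) MATLAB sketch of such a dual-number type; for simplicity it assumes both operands of every overloaded operation are already DualNum objects:

% save as DualNum.m: minimal forward-mode AD via operator overloading (sketch)
classdef DualNum
  properties
    val   % primal value x
    dval  % derivative value x'
  end
  methods
    function obj = DualNum(v,dv)
      obj.val = v;
      obj.dval = dv;
    end
    function c = plus(a,b)
      % sum rule: (a+b)' = a' + b'
      c = DualNum(a.val+b.val, a.dval+b.dval);
    end
    function c = mtimes(a,b)
      % product rule: (a*b)' = a'*b + a*b'
      c = DualNum(a.val*b.val, a.dval*b.val + a.val*b.dval);
    end
    function c = sin(a)
      % chain rule: sin(a)' = cos(a)*a'
      c = DualNum(sin(a.val), cos(a.val)*a.dval);
    end
  end
end

Seeding x = DualNum(2,1) and evaluating y = sin(x)*x then leaves d(x*sin(x))/dx at x = 2 in y.dval.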

For f:R→Rⁿ and n>>1 this is ideal since we end up computing and storing one derivative value at each computation step. If there are m computation variables then we track m derivatives. Work and memory are O(m) to get the n-long vector derivative.

For f:Rⁿ→R this is not ideal. To take a gradient we need to store n derivatives for each computation variable or sweep through the computation n times: O(mn) work.

Reverse accumulation

At each step of computation, maintain the current step’s gradient with respect to its inputs and pointers to its input variables. When retrieving derivatives, evaluate the outermost gradient and apply the chain rule recursively to its remembered arguments.

One can also implement this with a special numerical type and math operator overloading. This type should maintain the entire expression graph of the computation (or at least store the most immediate computation and live pointers to the previous computation variables of the same type), with gradients also provided by each mathematical operation. I suppose one way to implement this is by altering math operations to augment their output with handles to functions that compute gradients. Traditionally, compilers should be bad at evaluating this stored expression graph, but I wonder if modern compilers with inline function optimization couldn’t optimize this?
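
Again as a purely hypothetical sketch of that idea: a reverse-mode node can record its parent nodes and the local partial derivatives of the operation that produced it, and a backward pass can then recurse through the chain rule. As above, assume both operands are already nodes:

% save as ADNode.m: minimal reverse-mode AD node (sketch only)
classdef ADNode < handle
  properties
    val              % primal value
    grad = 0         % accumulated adjoint: d(output)/d(this node)
    parents = {}     % cell array of {parent_node, local_partial} pairs
  end
  methods
    function obj = ADNode(v)
      obj.val = v;
    end
    function c = plus(a,b)
      c = ADNode(a.val+b.val);
      c.parents = {{a,1},{b,1}};          % d(a+b)/da = d(a+b)/db = 1
    end
    function c = mtimes(a,b)
      c = ADNode(a.val*b.val);
      c.parents = {{a,b.val},{b,a.val}};  % product rule partials
    end
    function backward(obj,seed)
      % push the adjoint "seed" back through the recorded expression graph
      obj.grad = obj.grad + seed;
      for k = 1:numel(obj.parents)
        obj.parents{k}{1}.backward(seed*obj.parents{k}{2});
      end
    end
  end
end

For example, after x = ADNode(2); y = ADNode(3); z = x*y + x; z.backward(1); the gradients dz/dx = 4 and dz/dy = 2 end up in x.grad and y.grad. (This naive recursion revisits shared subexpressions once per path; a real implementation would sweep the graph in reverse topological order.)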

In any case, for f:Rⁿ→R and n>>1 this is ideal since a single derivative extraction traversal involving m computation variables will (re)visit each computation variable once: O(m).

For f:R→Rⁿ this is not ideal. At each computation variable we need to store n derivatives and keep them around until evaluation: O(m*n) memory and work. Whereas forward accumulation just tracks one derivative value per computation variable: O(m) memory.