Posts Tagged ‘math’

Automatic Differentiation Intuition Dump

Saturday, December 13th, 2014

Every so often I re-read the wikipedia page for automatic differentiation and get confused about forward and reverse accumulation. Both are neat and have appropriate and inappropriate applications. There are many tutorials online, and in addition here’s my intuition.

Forward accumulation

At each step of computation, maintain a derivative value. We seed each initial variable with derivative 1 or 0 according to whether we’re differentiating with respect to it.

Augmenting numerical types with a “dual value” (X := x + x'ε) such that ε*ε=0and overloading math operations is an easy way to implement this method.

For f:R→Rⁿ and n>>1 this is ideal since we end up computing and storing 1 value at
each computation step. If there are m computation variables then we track m derivatives. Work and memory is O(m) to get the n-long vector derivative.

For f:Rⁿ→R this is not ideal. To take a gradient we need to store n derivatives for each computation variable or sweep through the computation n times: O(mn) work.

Backward accumulation

At each step of computation, maintain the current step’s gradient with respect to its inputs and pointers to its input variables. When retrieving derivatives, evaluate the outermost gradient apply the chain rule recursively to its remembered arguments.

Can also implement this with a special numerical type and mathematics operation overloading. This type should maintain the entire expression graph of the computation (or at least store the most immediate computation and live pointers to previous computation variables of the same type), with gradients also provided by each mathematical operation. I suppose one way to implement this is by altering math operations to augment their output with handles to functions computing gradients. Traditionally compilers should be bad at evaluating this stored expression graph, but I wonder if modern compilers with inline function optimization couldn’t optimize this?

In any case, for f:Rⁿ→R and n>>1 this is ideal since a single derivative extraction traversal involving m computation
variables will (re)visit each computation variable once: O(m).

For f:R→Rⁿ this is not ideal. At each computation variable we need to store n derivatives and keep them around until evaluation: O(m*n) memory and work. Whereas forward accumulation just tracks n values across m computation variables: O(m) memory.

Recover Lagrange multiplier values for known values in quadratic energy minimization

Monday, March 5th, 2012

Often I’m minimizing energies of the form:

E = ½ xT yT Q x + xTyT b

where Q is a quadratic coefficient matrix, b is a vector of linear coefficients, x are unknown, and y are known. Now normally if I just want to minimize this energy I treat the known values (y) as constants and take the derivative with respect to the unknowns (x) and set it to zero. Then I just move all the terms involving y to the right hand side.

However if I’d like to know how setting these known values affected the energy minimization then its useful to treat the known values not as constant but as constraints. So I have the same energy as above, where now both x and y are variables, but subject to the constraints that

y = y*

where y* are the known values of y.
I can enforce these via Lagrange multipliers. Then once I’ve found my solution the values of the Lagrange multipliers corresponding to each y will tell me the effect of that y‘s constraint on the energy minimization. The problem now becomes finding the saddle point (equilibrium) of the following Lagrangian:

Λ = ½ xT yT Q x + xTyT b + λT y - λT y*

Or equivalently by repeating the λTy terms:

Λ = ½ xT yT λT Qxx Qxy 0 x + xTyT λT bx
              Qyx Qyy I y           by
              0   I   0 λ          -y*

Now, let’s start to take derivatives with respect to each of the sets of variables. We begin with λ. Setting ∂Λ/∂λT = 0 gives us:

∂Λ    0 I 0 x + -y*
--- =       y       = 0
∂λT         λ

which reduces simply to revealing our known values:

y = y*

Now look at setting ∂Λ/∂xT = 0 which gives us:

∂Λ    Qxx Qxy 0 x + bx
--- =           y       = 0
∂xT             λ

which after substituting our result y=y* reduces to the system of equations we are used to seeing when minimizing quadratic energies with fixed values:

Qxx x = -bx - Qxy y*

So we can invert Qxx and solve for x:

x* = Qxx-1 ( -bx - Qxy y* )

Now x and y are known, all that’s left is to solve for λ by taking the last set of derivatives. So we set ∂Λ/∂yT = 0 which gives us:

∂Λ    Qyx Qyy I x + by
--- =           y       = 0
∂yT             λ

Plugging in our known values and moving them to the right-hand side reveals the values of λ:

λ* = - Qyx x* - Qyy y* - by

The great thing about all of this is that you don’t have to do anything extra. If you’re already set up to immediately solve for x treating y as constant you’ve got everything you need to determine the values of λ without every actually implementing the equivalent Lagrangian.

LU decomposition using two Cholesky decompositions (part two)

Sunday, March 4th, 2012

I previously posted about how often you can compute a LU decomposition of a matrix Q using two Cholesky decompositions. Specifically when your matrix is of the form:

Q = / A  C \
    \ CT * /

In that post I assumed you had a subroutine that computes the Cholesky factorization of a symmetric positive-definite matrix A:


Where L is a lower-triangular matrix.

Often, routines exist to give you an even better factorization which includes a permutation matrix S:


This often results in a more sparse L.

Working through the LU decomposition using this type of Cholesky factorization is a bit tedious and requires some careful book keeping.

This time I want to fill in the missing parts of the following system:

/ ? * \ / A  C \ / ? * \ = / ? * \ / ? ? \
\ * ? / \ CT * / \ * ? /   \ ? ? / \ * ? /

Since A is symmetric positive definite we use our factorization STAS = LLT which gives us:

/ ST * \ / A  C \ / S * \ = / L * \ / LT ? \
\ *  ? / \ CT * / \ * ? /   \ ? ? / \ *  ? /

But this means that know we need that STC = L ?. But L is lower triangular so it’s easy to compute:

M = L-1STC

We can then place M into our LU decomposition:

/ ST * \ / A  C \ / S * \ = / L  * \ / LT M \
\ *  ? / \ CT * / \ * ? /   \ MT ? / \ *  ? /

Now at this point we could use cholesky factorization without the permutation matrix to complete the missing parts, but we might as well use our better decomposition again. To do this we must be a bit careful. We want to create the zeros in the lower right corner of the system. To do this we need a factorization of MTM. But our new routine gives us:


where K another lower triangular matrix and T another permutation matrix.

So we replace M with MT in our system and plug in K. This gives us:

/ ST * \ / A  C \ / S * \ = / L    * \ / LT  MT \
\ *  ? / \ CT * / \ * ? /   \ TTMT K / \ *  -KT /

Finally, compute the missing pieces by evaluating the right hand side and we indeed get our original system conjugated with a permutation matrix composed of the permutation matrices from both of the cholesky factorizations:

/ ST *  \ / A  C \ / S * \ = / L    * \ / LT  MT \
\ *  TT / \ CT * / \ * T /   \ TTMT K / \ *  -KT /

2D rotation matrix plus another 2D rotation matrix is a similarity matrix (scaled 2D rotation)

Wednesday, November 16th, 2011

Consider we have two rotation matrices:

R1 = / cos(θ1) -sin(θ1) \
     \ sin(θ1)  cos(θ1) /

R2 = / cos(θ2) -sin(θ2) \
     \ sin(θ2)  cos(θ2) /

Then we can show that:

R1 + R2 = s * R3

where s is a scalar and R3 is another 2D rotation.

First we just add the components to get:

R1 + R2 = / cos(θ1)+cos(θ2)   -(sin(θ1)+sin(θ2)) \
          \ sin(θ1)+sin(θ2)    cos(θ1)+cos(θ2)   /

Notice we already have the right pattern:

R1 + R2 =  / A -B \
           \ B  A /

we just need to show that we can pull out a common scaling term and angle from A and B.

We can use trigonometric identities to turn the sums of cosines and the sums of sines into products, namely:

cos(θ1)+cos(θ2) = 2*cos((θ12)/2)*cos((θ12)/2)
sin(θ1)+sin(θ2) = 2*sin((θ12)/2)*cos((θ12)/2)

Now we declare that:

s = 2*cos((θ12)/2)
θ3 = (θ12)/2

Then it follows that:

R1 + R2
  / s*cos((θ12)/2)  -s*sin((θ12)/2) \
  \ s*sin((θ12)/2)   s*cos((θ12)/2) /
  s * / cos(θ3)  -sin(θ3) \
      \ sin(θ3)   cos(θ3) /
  s * R3

Minimizing quadratic energies with constant constraints

Tuesday, June 21st, 2011

There are a many different ways to minimize a quadratic energy with constant constraints. I wrote out the exact steps for three such ways:

  • Separate the energy into knowns and unknowns, then solve for the unknowns by setting gradient with respect to those unknowns to zero.
  • Add additional term to energy that punishes not satisfying the known values of your constant constraints in the least squares sense, the solve for all variables by setting gradient with respect to all variables to zero.
  • Add additional variables called lagrange multipliers to energy so that the problem is no longer to find a minimum but a stationary point in the new energy. Such a stationary point guarantees that our constraints are met. Find the stationary point by the usual method of setting the gradient to zero, this time the gradient is with respect to all original variables and these new variables.

The pretty print type-up with MATLAB pseudocode.
A plain text, unicode version:

Quadratic energy
E = ZT * Q * Z + ZT * L + C

We want to fix some of the values of Z, we can rearrange the Z's that are still
free and the ones we want to fix so that Z = [X;Y], where X are the free values
and Y are the fixed values;

Split energy into knowns and unknowns
Z = [X;Y]
Z(unknown) = X and Z(known) = Y
E = [X;Y]T * Q * [X;Y] + [X;Y]T * L + C
E = [X;0]T * Q * [X;Y] + [0;Y]T * Q * [X;Y] + [X;0] * L + [X;Y]T * L + C
E = [X;0]T * Q * [X;0] + 
    [X;0]T * Q * [0;Y] + 
    [0;Y]T * Q * [X;0] + 
    [0;Y]T * Q * [0;Y] + 
    [X;0]T * L + 
    [0;Y]T * L + 
E = [X;0]T * [Q(unknown,unknown) 0; 0 0] * [X;0] + 
    [X;0]T * [0 Q(unknown,known); 0 0] * [0;Y] + 
    [0;Y]T * [0 0; Q(known,unknown) 0] * [X;0] + 
    [0;Y]T * [0 0; 0 Q(known,known)] * [0;Y] + 
    [X;0]T * L + 
    [0;Y]T * L + 
E = XT * Q(unknown,unknown) * X +
  XT * Q(unknown,known) * Y +
  YT * Q(known,unknown) * X +
  YT * Q(known,known) * Y + 
  XT * L(unknown) +
  YT * L(known) +

E = XT * Q(unknown,unknown) * X +
  XT * Q(unknown,known) * Y +
  (YT * Q(known,unknown) * X)T +
  YT * Q(known,known) * Y + 
  XT * L(unknown) +
  YT * L(known) +

E = XT * Q(unknown,unknown) * X +
  XT * Q(unknown,known) * Y +
  XT * Q(known,unknown)T * Y +
  YT * Q(known,known) * Y + 
  XT * L(unknown) +
  YT * L(known) +

E = XT * NQ * X + XT * NL + NC
NQ = Q(unknown,unknown)
NL = Q(unknown,known) * Y + Q(known,unknown)T * Y + L(unknown)
NC = YT * Q(known,known) * Y + YT * L(known) + C

∂E/∂XT = 2 * NQ * X + NL

Solve for X with:
X = (- 2 * NQ)^-1 * NL

Enforce fixed values via soft contraints with high weights
E = ZT * Q * Z + ZT * L + C 
Add new energy term punishing being far from fixed variables
NE = ZT * Q * Z + ZT * L + C + w * (Z(known) - Y)T * I * (Z(known) - Y)
where w is a large number, e.g. 10000
NE = ZT * Q * Z + ZT * L + C + w * (Z - [0;Y])T * [0 0;0 I(known,known)] * (Z - [0;Y])
where W is a diagonal matrix with Wii = 0 if i∈unknown and Wii = wii if j∈known
NE = ZT * Q * Z + ZT * L + C + (Z - [0;Y])T * W * (Z - [0;Y])
NE = ZT * Q * Z + ZT * L + C + ZT * W * Z - 2 * ZT * W * [0;Y] + [0;Y]T * W * [0;Y]
NE = ZT * NQ * Z + ZT * NL + NC
NQ = Q + W
NL = L - 2W * [0;Y]
NC = C + [0;Y]T * W * [0;Y]

Differentiate with respect to ZT
∂E/∂ZT = 2 * NQ * Z + NL

Solve for Z with:
Z = -0.5 * NQ^-1 * NL
or re-expanded
Z = -0.5 * (Q + W)^-1 * (L - 2 * W * [0;Y])

Discard known parts of Z to find X
X = Z(unknown)
But be careful to look at Z(known) - Y to be sure your fixed values are being
met, if not then w is too low. If the X values look like garbage then perhaps
you're getting numerical error because w is too high.

Lagrange multipliers
We want to minimize
E = ZT * Q * Z + ZT * L + C 
subject to the constant constraints:
Z(known) = Y
Find stationary point of new energy:
NE = E + ∑λi *(Zi - Yi)
NE = E + [Z;λ]T * QC * [Z;λ] - [Z;λ]T * [0;Y]
where QC = [0 0 0;0 0 I(known,known);0 I(known,known) 0]
NE = [Z;λ]T * NQ * [Z;λ] + [Z;λ]T * NL + C 
NQ = [Q 0;0 0] + QC =  [         Q        0             ]
                       [                  I(known,known)]
                       [ 0 I(known,known) 0             ]
NL = [L;0] + [0;Y];
Differentiate with respect to all variables, including lagrange multipliers
∂E/∂[Z;λ]T  = 2 * NQ * Z + NL
Solve with
[Z;λ] = -0.5 * NQ^-1 * NL

Discard fixed values and langrange multipliers.
X = Z(known)
The value of λi is the "force" of the constraint or how hard we had to pull the
energery minimum to meet the constraints. See

Common mesh energies, sum notation and matrix notation

Monday, June 20th, 2011

I’m always re-deriving the matrix notation of common triangle mesh energies like the Dirichlet energy, Laplacian energy, or local rigidity energy. Here’s my usual strategy. I’ll assume that if your energy came from a continuous integral you can at least get it to a sum over mesh elements. Here I write my sums over vertices and their neighbors, but surely you could also sum over triangles.

mesh energies sum to matrix form
Pretty-printed pdf document

Uglier plain text, unicode version:

∑i∑j∈N(i) vi  = ∑i * #N(i) * v = VT * D * 1, 
  where D is diagonal with Dii = #N(i)
∑i∑j∈N(i) vj  = VT * A * 1, where A is the adjacency matrix
∑i∑j∈N(i) vi - vj = VT * D * 1 - VT * A * 1 = VT * (D-A) * 1

∑i∑j∈N(i) wij  = 1 * Aw * 1, where Aw adjacency with Awij = wij, if j∈N(i)
∑i∑j∈N(i) wij*vi  = ∑i vi * ∑j∈N(i) wij  = VT * Dw * 1, 
  where Dw diagonal with Dwii = ∑j∈N(i) wij
∑i∑j∈N(i) wij*vj  = VT * Aw * 1

∑i mi * ∑j∈N(i) wij vi = ∑i mi * vi * ∑j∈N(i) wij = VT * M * Dw * 1,
  where M is diagonal with Mii = wi
∑i mi * ∑j∈N(i) wij vj = VT * Aw * M * 1

∑i∑j∈N(i) vi² = VT * D * V
∑i∑j∈N(i) vivj = VT * A * V
∑i∑j∈N(i) vj² = VT * RD * V, where RD is diagonal, RDjj = ∑i 1 if j∈N(i)
  if j∈N(i) implies i∈N(j) then
  D = RD
  so ∑i∑j∈N(i) vj² = ∑i∑j∈N(i) vi² = VT * D * V
∑i∑j∈N(i) (vi - vj)² = ∑i∑j∈N(i) vi² - 2 * ∑i∑j∈N(i) vivj + ∑i∑j∈N(i) vj²
                     = 2 * ∑i∑j∈N(i) vi² - 2 * ∑i∑j∈N(i) vivj
                     = 2 * VT * D * V - 2 * VT * A * V
                     = 2 * VT * (D - A) * V
                     = 2 * VT * L * V

∑i∑j∈N(i) wij*vi²  = ∑i vi² * ∑j∈N(i) wij = VT * Dw * V
∑i∑j∈N(i) wij*vivj = VT * Aw * V
∑i∑j∈N(i) wij*vj² = VT * RDw * V, where DW is diagonal, RDwjj = ∑i wij if j∈N(i)
  if j∈N(i) implies i∈N(j) and wij = wji then 
  Dw = RDw
  so ∑i∑j∈N(i) wij*vj² = ∑i∑j∈N(i) wij*vi² = VT * Dw * V
∑i∑j∈N(i) wij(vi - vj)² = 
  = ∑i∑j∈N(i) wij*vi² - 2 * ∑i∑j∈N(i) wij*vivj + ∑i∑j∈N(i) wij*vj²
  = 2 * ∑i∑j∈N(i) wij*vi² - 2 * ∑i∑j∈N(i) wij*vivj
  = 2 * VT * Dw * V - 2 * VT * Aw * V
  = 2 * VT * (Dw - Aw) * V
  = 2 * VT * Lw * V

∑i mi * ∑j∈N(i) wij vi² = ∑i mi * vi² * ∑j∈N(i) wij = VT * M * Dw * V
∑i mi * ∑j∈N(i) wij vj² = VT * RDw * M * V
∑i mi * ∑j∈N(i) wij vivj = VT * Aw * M * V
∑i mi * ∑j∈N(i) wij(vi - vj)² = 
  = ∑i mi * ∑j∈N(i) wij*vi² 
    - 2 * ∑i mi * ∑j∈N(i) wij*vivj 
    + ∑i mi *∑j∈N(i) wij*vj²
  = 2 * ∑i mi * ∑j∈N(i) wij*vi² - 2 * ∑i mi * ∑j∈N(i) wij*vivj
  = 2 * VT * M * Dw * V - 2 * VT * Aw * M * V
  = 2 * VT * M * (Dw - Aw) * V
  = 2 * VT * M * Lw * V
  Note: let wij = cwij = cot(αij) + cot(βij) / mi and let mi = voronoi area at 
  vi, then the familar cotangent form of the discrete Dirichlet energy is
  EDirichlet = 2 * VT * M * Lcw * V

Split triangular prism into three tetrahedra

Wednesday, June 15th, 2011

triangular prism split into three tetrahedra small

Recently I began looking into tet-meshes again. Some colleagues and I were trying to wrap our heads around how many tetrahedra result from splitting up a triangular prism. we were trying to draw out the cases on the white board. Occasionally some one would say, “I’m sure it’s 3.” Then somebody else would draw another set of pictures and’d say “No, I’m sure it’s 4”.

Well, to have a definitive answer, a triangular prism may be split into three tetrahedra. If you don’t believe me, see the nice description (which actually lists all possible splits) in “How to Subdivide Pyramids, Prisms and Hexahedra into Tetrahedra” by J. Dompierre et al. in 1999. There is another interesting discussion in “Space-filling Tetrahedra in Euclidean Space” by D.M.Y. Sommerville in 1923, in which they consider the conditions under which these tetrahedra are congruent.

Including cmath and math.h and using isfinite(x)

Wednesday, April 20th, 2011

I’ve been trying to compile a C++ program with a set of header files one a Solaris SPARC machine. It’s my first experience with GCC outside of the usual intel/unix/mac world and it’s been a bumpy ride.

One problem that I had was that gcc on this machine doesn’t seem to be using the same standards. This was easy to fix once I’d identified it as a problem: just add -ansi to the compiler flags.

The code I’m compiling uses the std::isfinite function. On Macs gets included with cmath, but on the solaris machine it does not. Instead isfinite (notice that this time there is no std namespace) can be included from math.h. So fine, I change the headers to include from math.h instead of cmath and use isfinite instead of std::isfinite.

But! I’m still getting the same old:

...: error: ‘isfinite’ was not declared in this scope

I looked and looked and finally found that earlier in the headers, before getting around to including math.h they include cmath. Including cmath then math.h apparently nukes any ability of retaining isfinite as a global function.

Getting rid of other headers, here’s a simple program that won’t compile:

#include <cstdio>
#include <cmath>
#include <math.h>

int main(int argc,char * argv[])
  double b = 1.0;
  bool foo = isfinite(b);
  printf("isfinite(%g): %s\n",b,(foo ? "true" : "false"));

  b = 1.0/0.0;
  foo = isfinite(b);
  printf("isfinite(%g): %s\n",b,(foo ? "true" : "false"));

  return 0;

I compile with:

g++ test.cpp -o test -ansi

Okay, at this point I could try to get rid of the cmath include(s), but now I’m starting to mess with these headers a lot and I hadn’t planned on having to edit them at all.

Instead I came up with this solution. Before isfinite is used, I insert:

#define isfinite(x) !std::isinf(x)

Here’s a test program that shows how it works:

#include <cstdio>
#include <cmath>
#include <math.h>
#define isfinite(x) !std::isinf(x)

int main(int argc,char * argv[])
  double b = 1.0;
  bool foo = isfinite(b);
  bool bar = isinf(b);
  printf("isfinite(%g): %s\n",b,(foo ? "true" : "false"));
  printf("isinf(%g): %s\n",b,(bar ? "true" : "false"));

  b = 1.0/0.0;
  foo = isfinite(b);
  bar = isinf(b);
  printf("isfinite(%g): %s\n",b,(foo ? "true" : "false"));
  printf("isinf(%g): %s\n",b,(bar ? "true" : "false"));

  return 0;

Now, of course, math.h is no longer necessary since we’re not using its isfinite anyway. But the point is to show that it works with both cmath and math.h in the includes.
Update: This doesn’t work… I’m hopefully going to get back to this with something that does.

Rotate a point around another point

Tuesday, February 22nd, 2011

Composing rotations and translations is one of the most important operations in computer graphics. One useful application is the ability to compose rotations and translations to rotate a point around another point.

rotate around other point

Here’s how I learned to do this. I find this method the most intuitive.

  1. Translate so that the other point is at the origin
  2. Rotate about the origin by however much you want
  3. Translate back so that the other point is back to its original position

rotate around other point

This works out nicely with matrix math since rotations around the origin are easily stored as 2×2 matrices:

                               / cos θ  -sin θ\
Rotation around origin by θ:  |                |
                               \ sin θ   cos θ/

If we call that matrix, R, then we can write the whole operation that rotates a point, a, around another point, b,, as:
R*(a-b) + b. Be careful to note the order of operations: (a-b) corresponds to step 1, then left multiply with R to step 2, and finally adding back b is step 3.

One convenient fact is that when we look at the transformation as a whole where T(a): a → R*(a-b) + b. Because T is just the composition of rotations and translations it can be decomposed into a single rotation followed by a single translation. Namely, T(a): a → R*(a-b) + b may be re-written as T(a): a → S(a) + t, for some rotation matrix S and some translation vector t. To see this just use the distributive law: R*(a-b) + b = R*a – R*b + b, then S = R and t = R*b + b.

So that gives a new derivation of the transformation that rotates a point, a, around another point, b,: R*a – R*b + b. Building an intuition as to why this works is a little tricky. We have just seen how it can be derived using linear algebra but actually seeing this version as each step is elusive. Turns out it helps to make a subtle change. Reverse the last two terms, so that you have: R*a + (b – R*b). Now intuitively we can follow the order of operations and build an intuition:

  1. Rotate first the point about the origin (since the other point is not the origin we’ve “missed” where we should have been rotated by a certain error amount)
  2. Rotate the other point about the origin by the same amout
  3. Translate by the difference between the original other point and the rotated other point (this is the error amount, because we know that rotating other pointabout itself shouldn’t change its position)

rotate around other point

Update: Here’s an image comparing these two compositions:

rotate around other point

Barycentric coordinates and point-triangle queries

Friday, February 4th, 2011

Recently I needed to test whether a bunch of points where on a given triangle in 3D. There are many ways of querying for this, but I found one that was for me both intuitive and convenient. The method uses barycentric coordinates.

Given a triangle made up of points A, B, and C you can describe any point, p, inside the triangle as a linear combination of A, B, and C: p = λa*A + λb*B + λc*C. That is to say, p can be reproduced as a weighted average of A, B and C. Where the λs are the weights.

One way to realize this is true is to notice that a triangle can be defined by two vectors, say A→B and A→C, and as long as these two vectors are not parallel we know from Linear Algebra that we can span the whole plane that they define (including the triangle A, B,C which lies on that plane).

abc) are called the “barycentric coordinates” of p. Another (more or less unused) name for these are the “area coordinates” of p. This is because λa, λb, and λc are very conveniently defined as:
λa = area(B,C,P)/area(A,B,C)
λb = area(C,A,P)/area(A,B,C)
λc = area(A,B,P)/area(A,B,C)

An important quality of barycentric coordinates is that they partition unity, that is: λabc = 1. To see this is true, notice that these sub triangles exactly cover the area of the whole triangle and since we divide be that whole triangle’s area we are normalizing the sub triangle areas such that they must sum to one: the barycentric coordinates are the percentages of the area covered by the sub triangles and the whole triangle area. Each sub triangle corresponds to the original triangle corner that is not a corner of the sub triangle. Thus, A corresponds to the sub triangle B, C, p and so forth.

As we move p inside the triangle, ABC, the barycentric coordinates shift. Notice that all coordinates are positive as long as p is inside the triangle.

If we move p onto an edge of the triangle then the area of the sub triangle corresponding to the corner opposite that edge becomes zero, thus the barycentric coordinate for p corresponding to that corner is zero.

If we move p onto a corner of the triangle then the sub triangle corresponding to that corner is the same (and thus has the as area) as the original triangle, so its corresponding barycentric coordinate to p is 1, the other areas and coordinates being 0.

If we move p outside of the triangle the total area covered by the sub triangles will be greater than the original triangle. This is easy to see as the sub triangles always cover the original triangle. We could use this to test whether p is inside the triangle ABC, but there actually a more elegant way that will be handy later.

If instead of computing regular (non-negative) area, we compute signed area then even when we move p outside of ABC the areas will sum to the original area of the triangle ABC. One way to think of signed area is by looking at the orientation of the triangles. Start with p inside ABC.

Imagine coloring the side of each triangle facing you green and the back side red. When we move p the triangles change shape, but as long as p stays inside the triangle we still see the same, green side.

When we pull p outside the triangle the triangle on the edge we crossed gets flipped, and now we see the back, red, side. Now we declare that if we can see the green side of the triangle, its area is positive and if we see the red side of the triangle its area is negative. Now, finally notice that the red side of the flipped triangle when p is outside ABC is always covered exactly by the other two, green triangles. Thus the negative area is cancelled out and we are only left with exactly the original ABC area.

So if we use signed area to compute barycentric coordinates of p, we can determine if p is inside, on, or outside a triangle by looking to see if all coordinates are positive, if any are 0 or if any are negative.

If our triangle is on the 2D plane then this sums up everything we might want to know about p and a given triangle ABC. But if ABC is an arbitrary triangle in 3D and p is also an arbitrary point, then we may also want to know if p whether p is on the plane of the triangle ABC, or if it is above or below the triangle.

We showed above that if p is on the plane of the triangle then the total un-signed area of the sub triangles is greater than the original triangle area and the total signed area of the sub triangles is always exactly equal to the area of the original triangle.

However, if we pull p off the plane of the sum of the total un-signed area of the sub triangles will necessarily be greater than the area of the original triangle. To see this notice that by pulling p off the triangle we are making a tetrahedron pABC with a certain amount of volume, where before pABC was flattened onto the base triangle ABC with no volume. Where ever p is in space it makes such a tetrahedron with ABC and we can always flatten (project) it to the plane of the triangle where we know the total area of the flattened triangles must cover (be greater than) the original triangle ABC. Now finally notice that “flattening” a triangle can never increase the triangle’s area (imagine holding a paper triangle in front of your face, your eyes project it onto the plane which you see. There’s no way of rotating the triangle so that the projected area (the area you see) is bigger than the actual area of the triangle)).

Finally, we just showed that pulling p away from the plane of the triangle increases the magnitude of each of the sub triangles’ area. We may also use this fact to see that the total signed area of these sub triangles is only ever equal to the area of the original triangle if we keep p on the triangle’s plane. If we pull p above the triangle (off its front side) then the total signed area is always greater than the original area, and if we pull p below the triangle (off its back side) then the total signed area is always less than the original area. This is harder to see and hopefully I will find a nice way of explaining it.

It’s important to note that the areas we get when we pull p off of the plane of the triangle ABC can no longer partition unity, in the sense that if we use these areas as above to define λa, λb, and λc, p will still equal λa*A + λb*B + λc*C, but λa + λb + λc will not equal 1.