Archive for May, 2013

Setting a row of a sparse matrix to zero in matlab

Friday, May 24th, 2013

I wanted to set a whole row of a sparse matrix A to zero:

A = sparse(1e8,1e8);
A(r,:) = 0;

But this was rather slow. About 1.7 seconds.

Turns out not all rows and columns are created equal in matlab. Because of how matlab stores sparse matrices, it’s much faster to set a whole column to zero:

A(:,c) = 0;

This is about 0.0002 seconds.

So if you can refactor your code you’re better off messing with columns. If you can’t refactor, you might think it would be fast to transpose before and after setting a row to zero:

A = A';
A(:,r) = 0;
A = A';

But this helps not at all. It seems transposes are just stored as flags, which is usually a good thing, but here it means we don’t gain anything: after the flagged transpose, the column assignment is internally still the expensive row operation. This also implies that refactoring could be nontrivial.

One might also think you could first find all the non-zero entries of the row and then set them each to zero. But just the row-wise find, find(A(r,:)), is expensive:

~1 second

In fact, just the row-wise access, A(r,:), is expensive:

~1 second

Seems like there’s no happy answer.

Geometric proof for sum of arctangent identity

Friday, May 24th, 2013

Today a colleague showed me an interesting trigonometric identity:

atan(α) + atan(β) = atan( (α + β) / (1 – αβ) )

We couldn’t come up with a geometric interpretation immediately, but I found a pretty nice understanding. Begin by recalling what atan(α) measures: the angle θα between the x-axis and the ray through some point (xα,yα):

atan(α) = atan(yα/xα) = θα

The distance of the point (xα,yα) to the origin doesn’t matter, since the angle will be the same. Thus we can always rescale (xα,yα), to unit length or any other length, without changing the angle.

When we look at the identity at the top then, we see that the left-hand side is really summing angles:

atan(α) + atan(β) = atan( (α + β) / (1 – αβ) )
atan(yα/xα) + atan(yβ/xβ) = atan(y?/x?)
θα + θβ = θ?

And this means that the numerator and denominator inside the atan on the right-hand side are telling us the y and x values, respectively, of some point (x?,y?) along the ray with angle θ? = θα + θβ.

To see how we arrive at these particular values for x? and y? we can make a couple of assumptions. Without loss of generality assume that xα = 1: if not, we can always divide both xα and yα by xα without changing θα or atan(yα/xα) or atan(α). Also, w.l.o.g. assume that xβ = sqrt( xα² + yα² ), that is, the length of (xα,yα) as a vector; for short let’s write this xβ = ‖(xα,yα)‖. This illustration shows vectors (xα,yα) and (xβ,yβ).

atan sum identity

Now, notice that we can sum θα and θβ by rotating the vector (xβ,yβ) by θα and reading off the angle θ? = θα + θβ. Because we’ve chosen our lengths carefully, this is very easy.

atan sum identity

Because xβ = ‖(xα,yα)‖, rotating (xβ,yβ) by θα lands its first coordinate exactly on the tip of (xα,yα); we then just need to move yβ units in the direction perpendicular to the vector (xα,yα). In other words:

(x?,y?) = (xα,yα) + yβ (xα,yα)⊥ / ‖(xα,yα)‖
        = (xα,yα) + yβ (-yα,xα) / ‖(xα,yα)‖
        = (1,yα) + yβ (-yα,1) / xβ

where we immediately simplify the known quantities.

Recall that:
α = yα / xα = yα
β = yβ / xβ,
and now consider each coordinate separately:

y? = yα + yβ/xβ = α + β
x? = 1 – yβyα/xβ = 1 – αβ

so indeed θα + θβ = θ? = atan(y?/x?) = atan( (α + β) / (1 – αβ) ).
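As a numerical sanity check, the identity holds whenever αβ < 1 (otherwise the right-hand side is off by ±π, since atan only returns values in (-π/2,π/2)). A quick Python check (function name is mine):

```python
import math

def atan_sum(a, b):
    # right-hand side of the identity; assumes a*b < 1
    return math.atan((a + b) / (1 - a * b))

# a few pairs with a*b < 1, including negative values
for a, b in [(0.3, 0.5), (1.2, 0.4), (-0.7, 0.9)]:
    assert math.isclose(math.atan(a) + math.atan(b), atan_sum(a, b))
```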

Smoothly parameterized ease curve

Thursday, May 23rd, 2013

Using polynomial ease curves, it’s easy to create a function f(x) which obtains f(0) = 0 and f(1) = 1 as well as reaching 0 derivatives f'(0) = 0, f'(1) = 0, f''(0) = 0, f''(1) = 0, etc. One can start with a simple cubic polynomial: f(x) = 3x²-2x³, which achieves first order smoothness (i.e. f'(0) = 0 and f'(1) = 0). Then we can compose this to get higher and higher continuity: f(f(x)), f(f(f(x))). This creates 9th and 27th order polynomials. Here’s a sequence of repeatedly composing this cubic function. The title shows the order:

polynomial ease curves

We can also derive such a smooth, symmetric polynomial ease curve of any order 2*k+1, k in ℕ+. Thus with k we can have finer, albeit incremental, control over the shape of the function. Here’s a ruby script to generate a maple script that will generate the necessary coefficients (there must be a pattern here, but I’m too lazy right now to find it):

# Script to generate Maple code for solving for a 2*k+1 order polynomial f(x)
# with values f(0) = 0, f(1) = 1 and vanishing derivatives
# d^c f/dx^c = 0 at x = 0 and x = 1, for c = 1,...,k
smoothness = 3
(1..smoothness).each do |k|
  as = (k+1..2*k+1).map{|c| "a#{c}"}.join(",")
  puts "f := (#{as},x) -> "
  puts "  #{(k+1..2*k+1).map{|c| "a#{c}*x^#{c}"}.join("+")};"
  puts "solve({"
  puts "  f(#{as},0)=0,"
  puts "  f(#{as},1)=1,"
  diffs = (1..k).map do |c|
    "  eval(diff(f(#{as},x),#{(1..c).map{|d| "x"}.join(",")}),x=0)=0,\n"+
    "  eval(diff(f(#{as},x),#{(1..c).map{|d| "x"}.join(",")}),x=1)=0"
  end
  puts "#{diffs.join(",\n")}},"
  puts "  {#{as}});"
end
Using these coefficients we can plot the first few:

polynomial incremental order smoothness ease fit

These sequences converge on a scaled and shifted Heaviside step function. But we only have an integer-valued parameter. It’d be nice to have a smooth parameter. If we only cared about the property of the curve getting steeper in the middle as we increase our parameter (while maintaining interpolation and at least C1 continuity), then we could try to do this with a cubic spline. Now we can smoothly increase the steepness, but before reaching the step function the spline curves too far, losing injectivity when treated as a function of x.

spline ease curve

The fact that we cannot reproduce the Heaviside function with a cubic spline should have been obvious anyway. The Heaviside function is loosely a degree ∞ polynomial, with ∞ smoothness at 0 and 1. We can achieve what we want by taking a function which indeed converges to the Heaviside step function: f(x) = 1/(1+e^(-2tx)). We can adapt this to our unit square domain and enforce interpolation for any t: g(x) = (f(2x-1)-1/2)/(f(1)-1/2)/2+1/2, or in MATLAB:

g = @(x,t) (1./(1+exp(-2*t*(2*x-1)))-1/2)/(1./(1+exp(-2*t*1))-1/2)/2+1/2;

This function behaves pretty nicely for t in (0,∞), going from a linear function to the Heaviside step function:

Exponential ease curve

To achieve kth order continuity at our endpoints we can just run this function through (i.e. compose with) one of the 2k+1 order polynomials above. That way, with t we smoothly span a whole space of kth order smooth ease curves. For example, if h(x) = 3x²-2x³ then this is h(g(x)):

Exponential smooth ease curve
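The interpolation constraints on g are easy to verify with a direct Python port of the MATLAB one-liner above (a sketch; the function name is mine):

```python
import math

def g(x, t):
    # logistic f adapted to the unit square; t controls steepness
    f = lambda u: 1.0 / (1.0 + math.exp(-2.0 * t * u))
    return (f(2.0 * x - 1.0) - 0.5) / (f(1.0) - 0.5) / 2.0 + 0.5

for t in (0.1, 1.0, 10.0):
    assert abs(g(0.0, t)) < 1e-9        # g(0) = 0
    assert abs(g(1.0, t) - 1.0) < 1e-9  # g(1) = 1
    assert abs(g(0.5, t) - 0.5) < 1e-9  # symmetric about the middle
```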

Compiling and running qslim on Mac OS X

Thursday, May 23rd, 2013

I recently successfully compiled and executed qslim on mac os x. It took a little tweaking but here’s how I did it.


Before anything else, be sure to install fltk. I did this using macports:

sudo port install fltk


Then, following the instructions in qslim-2.1/README.txt, I first compiled libgfx:

cd libgfx

The libpng support seems out of date and I don’t really need it so in raster-png.cxx I replaced:

//#ifdef HAVE_LIBPNG
#if false

memcpy was showing up undefined, so I added the following include to raster.cxx:

#include <cstring>

Similarly there was a missing include in time.cxx:

#elif defined(HAVE_TIMES)
#include <sys/times.h>

Now, while inside libgfx, I issue:

make -C src


Next, travel inside mixkit:

cd ../mixkit

Trying to make I got a bunch of errors of the form:

MxDynBlock.h:44:33: error: 'resize' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
MxDynBlock.h:44:33: note: declarations in dependent base 'MxBlock<MxStdModel*>' are not found by unqualified lookup
MxDynBlock.h:44:33: note: use 'this->resize' instead

There are at least two ways to fix this. You can add this-> everywhere gcc tells you to (edit MxStack.h and MxDynBlock.h), or you can add -fpermissive to the CXXFLAGS. This is awkwardly done not in mix-config but in ../libgfx/gfx-config, where you can edit the flags into something like:

CXXFLAGS = -g -O2  -I/Users/ajx/Downloads/qslim-2.1/libgfx/include -DHAVE_CONFIG_H $(WIN_FLAGS) -fpermissive

Alternatively you could have done this when configuring libgfx with:

cd ../libgfx
./configure CXXFLAGS='-fpermissive'
cd ../mixkit/

If you do it this way then you’ll get warnings instead of errors.

Now you can build with:

make -C src

qslim and qvis

Travel to the qslim tools directory:

cd ../tools/qslim

Here I need to add libfltk_gl to the linker commands in Makefile:

        $(CXX) -o qslim $(OBJS) $(LDFLAGS) $(LIBMIX) -lm -lfltk_gl
        $(CXX) -o qvis $(QVIS_OBJS) $(LDFLAGS) $(LIBMIX) $(GUI_LIBS) -lm -lfltk_gl

Then build with:

make

(those fpermissive warnings may show up again, but no problem)

Finally I can run the command line program qslim or the gui qvis. For some reason the gui immediately closes if I double click on it. But I can run it by issuing:

./qvis

where I’m then prompted to load a .smf file. You can also pass the file directly:

./qvis input.smf

You can call the command line program with:

./qslim -t 1000 -M smf -q 2>/dev/null

which spills the simplified .smf file to stdout.

It seems .smf format is the same as .obj, at least if there are only vertices and triangular faces. So, you can compose—a rather long—bash oneliner that simplifies an input.obj file to an output.obj file:

grep "^[vf] " input.obj | sed -e "s/^f *\([0-9][0-9]*\)\/[^ ]*  *\([0-9][0-9]*\)\/[^ ]*  *\([0-9][0-9]*\)\/.*/f \1 \2 \3/g" | ./qslim -t 1000 -M smf -q 2>/dev/null | grep "^[vf]" > output.obj
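To see what the grep/sed preprocessing does on its own, here’s a toy run on a hypothetical input.obj (file path and contents are made up for illustration; qslim itself is not invoked):

```shell
# A tiny hypothetical input.obj with vt/vn indices on the face line
cat > /tmp/demo_input.obj <<'EOF'
v 0 0 0
v 1 0 0
v 0 1 0
vt 0 0
f 1/1/1 2/2/2 3/3/3
EOF
# grep keeps only vertex and face lines (dropping "vt", "vn", comments);
# sed strips the texture/normal indices, leaving pure "f i j k" faces:
grep "^[vf] " /tmp/demo_input.obj \
  | sed -e "s/^f *\([0-9][0-9]*\)\/[^ ]*  *\([0-9][0-9]*\)\/[^ ]*  *\([0-9][0-9]*\)\/.*/f \1 \2 \3/g"
# prints:
# v 0 0 0
# v 1 0 0
# v 0 1 0
# f 1 2 3
```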


I also built the “filters” (converters?) with:

cd ../filters
make

Note that ply2smf only acts on stdin and stdout so call it with:

cat input.ply | ./ply2smf >output.smf

Update: I found an easier way to setup the configure script and ensure that all macports libraries are found:

env CPPFLAGS="-I/opt/local/include -fpermissive" LDFLAGS="-L/opt/local/lib" ./configure

Google search current page bookmarklet

Thursday, May 23rd, 2013

Here’s a javascript bookmarklet for googlin’ the title of the current page you’re viewing:


Drag me onto your bookmark bar.

Then when you’re at a site that blocks regular visits but allows entrance through a google search (cough, cough) you can quickly google the title and click the inevitable top link.

Normalize rows of a sparse matrix to sum to one in matlab

Monday, May 20th, 2013

I surprisingly found a bottleneck in my matlab program to be how I was normalizing rows of a sparse adjacency matrix. I was doing this using bsxfun:

A = bsxfun(@rdivide,A,sum(A,2));

This matlab forum lists a few other ways. I compared them with this script using the timeit function:

% A is 10000x10000 with roughly 6 nnz per row
a1 = @() bsxfun(@rdivide,A,sum(A,2));                          % 0.32 seconds
a2 = @() spdiags (sum (A,2), 0, size(A,1), size(A,1)) \ A ;    % 0.0023 seconds
a3 = @() spdiags (1./sum (A,2), 0, size(A,1), size(A,1)) * A ; % 0.0016 seconds

So I’m going with:

A = spdiags (1./sum (A,2), 0, size(A,1), size(A,1)) * A ;
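As a sanity check, the broadcasting and diagonal-scaling approaches are algebraically identical; here’s a quick NumPy confirmation on a small dense matrix (illustration only; the timings above concern MATLAB’s sparse type):

```python
import numpy as np

A = np.array([[1.0, 3.0, 0.0],
              [2.0, 0.0, 2.0]])
# bsxfun(@rdivide, A, sum(A,2)) analogue: broadcast against row sums
B1 = A / A.sum(axis=1, keepdims=True)
# spdiags(1./sum(A,2), ...) * A analogue: left-multiply by inverse row-sum diagonal
B2 = np.diag(1.0 / A.sum(axis=1)) @ A
assert np.allclose(B1, B2)
assert np.allclose(B1.sum(axis=1), 1.0)  # rows now sum to one
```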

Update: This observation is not valid if A is dense. The same methods above perform differently:

% A = rand(10000,10000);
% a1: 0.49 seconds
% a2: 1.02 seconds
% a3: 0.72 seconds

Looks like bsxfun is still the best bet for dense matrices.

Deriving the probability density function of a non-uniform sampling of the circle

Sunday, May 19th, 2013

I recently posted about non-uniformly sampling a sphere and visualizing the result. As I was examining the result, I thought it would be more convenient just to derive the probability density function analytically. This is an exercise in transforming the probability density function of one random variable into that of another.

As a warmup I followed this systematic approach on math.stackexchange for a 2d problem of non-uniformly sampling a circle, actually just the first quarter of the circle. The scheme under consideration is to first sample x and y from the unit square uniformly, then normalize (x,y) to unit length and read off the angle θ. This transformation, normalizing then converting to the polar angle, takes these two uniform samples to a non-uniform sampling of the quarter circle, θ in [0,π/2).

In other words the probability density functions of the first variables x and y are both 1 for x,y in [0,1) and we’d like to know what the probability density function of θ is.

We begin by acknowledging that for any reasonable function u we have the equality (writing 1[·] for the indicator function):

E(u(θ)) = ∫∫ u(θ(x,y)) 1[0≤x<1] 1[0≤y<1] dx dy.

Now let’s define θ = atan2(y/sqrt(x²+y²), x/sqrt(x²+y²)) = atan2(y,x).

Here we’ll perform a change of variables. Let y = t and let x = t/tan θ.

Since y = t does not depend on θ, the Jacobian determinant reduces to ∂x/∂θ · ∂y/∂t = -t/sin²(θ), so dx dy = |-t/sin²(θ)| dθ dt = t/sin²(θ) dθ dt, hence:

E(u(θ)) = ∫∫ u(θ) 1[0≤t/tan θ<1] 1[0≤t<1] t/sin²(θ) dθ dt

We also know that for every reasonable function u:

E(u(θ)) = ∫ u(θ) fΘ(θ) dθ,

where fΘ(θ) is the (so far unknown) probability density function of our random variable Θ. Setting this equal to the equation above we have:

fΘ(θ) = ∫ 1[0≤t/tan θ<1] 1[0≤t<1] t/sin²(θ) dt.

So far we’ve been integrating over the whole plane, but we can convert to a definite integral by evaluating the indicator functions, leaving:

fΘ(θ) = ∫₀^min(1,tan θ) t/sin²(θ) dt
       = 1/sin²(θ) ∫₀^min(1,tan θ) t dt
       = min(1,tan θ)² / (2 sin²(θ))

Finally we can also write this as a piecewise function:

fΘ(θ) = { (1/cos²θ)/2   θ ≤ π/4
        { (1/sin²θ)/2   θ > π/4

We can verify this with the following small matlab program:

sam = 4000000;
method = 'naive';
%method = 'uniform';
switch method
case 'naive'
  N = normalizerow(rand(sam,2));
case 'uniform'
  S = rand(sam,1)*pi/2;
  N = [cos(S) sin(S)];
end
S = atan2(N(:,2),N(:,1));

% analytic solution
f = @(th) ((th<pi/4).*(1./cos(th).^2) + (th>=pi/4).*(1./sin(th).^2))/2;

[h,b] = hist(S,100);
% normalize so integral is 1
h = h./sum(h*(pi/2/numel(b)));
hold on;
% draw empirical histogram against analytic pdf (plotting calls reconstructed)
bar(b,h);
plot(b,f(b),'r');
hold off;

which produces:
sample circle empirical vs analytic
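The same empirical-vs-analytic comparison can be scripted in NumPy (a sketch, not the code used for the figure; bin count and tolerance are my own choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y = rng.random(n), rng.random(n)
theta = np.arctan2(y, x)  # normalizing (x,y) first wouldn't change the angle

def f(th):
    # analytic pdf derived above
    return np.where(th <= np.pi / 4, 1 / np.cos(th)**2, 1 / np.sin(th)**2) / 2

# empirical density: histogram normalized so it integrates to 1
hist, edges = np.histogram(theta, bins=50, range=(0, np.pi / 2), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
assert np.allclose(hist, f(mid), rtol=0.05)  # matches the analytic pdf
```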

Actually, it’s also pretty straightforward to derive the probability density function geometrically rather than analytically.

Start with our uniform sampling of the unit square:
sample circle geometric 01

When we project this sampling to the quarter circle (green), we’re really integrating along each ray (blue) of angle θ (grey):
sample circle geometric 02

If we had only considered the points inside the circle then we would indeed have a uniform sampling, as this integral is then independent of θ. This invites a scheme called rejection sampling: choose x,y in the unit square but only keep those inside the circle.
sample circle geometric 03

It’s the points that lie outside the circle that cause this kind of sampling to be biased, non-uniform.
sample circle geometric 04

We can compute the integral along this ray geometrically. Since the sampling is uniform, it amounts to just computing the length of the line segment. We just need to know for a given angle θ where the ray intersects the unit square.

First let’s consider θ≤π/4. We know that x = 1, and by polar coordinates we have x = r cos θ, thus the radius, and hence the length of these line segments, is 1/cos θ. Likewise for θ>π/4 we have line segments of length 1/sin θ. These values are the lengths of the line segments, and since we’re integrating a uniform density function, they’re also the mass accumulated along each line segment. Finally we must account for the change in units, exchanging each small change in angle with a change in area: namely, exactly this length divided by two (two because the thin space between two close rays is a triangle, rather than a rectangle). Thus, multiplying our mass by this change we have:

fΘ(θ) = { (1/cos²θ)/2   θ ≤ π/4
        { (1/sin²θ)/2   θ > π/4

which matches the analytic result above.

Update: I’m not so confident that this geometric derivation tells the whole story anymore.

Visualizing samples on a sphere

Friday, May 17th, 2013

In my project, I need to uniformly sample directions, or equivalently points on the unit sphere. A correct way is to sample the azimuth and the cosine of the polar angle uniformly. Another correct way is to sample points in ℝ3 from a normal distribution with mean at the origin and variance 1, then normalize.

One naive, non-uniform way is to sample x, y and z uniformly in [-1,1] and normalize. I knew that this was biased, but I wanted to see the bias. Here’s a small matlab program that continuously splats random points onto a textured sphere:

% Can't use sphere(); have to use funny parameterization so tex-mapping works
n = 100;
theta = (-n:2:n)/n*pi;
sinphi = (-n:2:n)'/n*1;
cosphi = sqrt(1-sinphi.^2); cosphi(1) = 0; cosphi(n+1) = 0;
sintheta = sin(theta); sintheta(1) = 0; sintheta(n+1) = 0;
X = cosphi*cos(theta);
Y = cosphi*sintheta;
Z = sinphi*ones(1,n+1);
% square texture resolution
w = 1000;
im = ones(w,w);
s = surf(X,Y,Z,'FaceColor','texturemap','Cdata',im,'EdgeColor','none');
axis equal;
method = 'naive';
%method = 'uniform';
%method = 'normal-deviation';
it = 0;
sam = 2000000;
while true
  switch method
  case 'naive'
    % uniformly random 3d point, each coord in [-1,1] and normalize
    N = normalizerow(2*rand(sam,3)-1);
  case 'normal-deviation'
    N = normalizerow(normrnd(zeros(sam,3),1));
  case 'uniform'
    % random polar angle and azimuth
    Z = rand(sam,1)*2-1;
    A = rand(sam,1)*2*pi;
    R =  sqrt(1-Z.^2);
    N = [Z R.*cos(A) R.*sin(A)];
  end
  % project
  S = [(N(:,3)+1)/2 (atan2(N(:,2),N(:,1))+pi)/(2*pi)];
  % splat!
  S = round([mod(w*S(:,2),w-0.5) mod(w*S(:,1),w-0.5)]+1);
  im = im + full(1-sparse(S(:,2),S(:,1),1,w,w));
  it = it + 1;
  % redraw the texture with the accumulated splats (reconstructed)
  set(s,'CData',im/max(im(:)));
  drawnow;
end

This produces an incrementally improving image of a textured sphere where black means high probability and white means low probability:
naive sampling sphere matlab visualization

When this converges we can examine the bias around the sphere:
naive sampling sphere matlab visualization spin

The “converged” texture map looks like:
naive sampling sphere texture

Of course, a correct uniform sampling converges rather boringly to a uniformly black sphere.
uniform sampling sphere matlab visualization
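The bias can also be quantified without pictures: counting samples in two equal-area spherical caps, one around a cube corner and one around a coordinate axis, shows the naive scheme’s corner excess while the Gaussian scheme treats all directions alike. A NumPy sketch (cap size and thresholds are my own loose choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
normalize = lambda P: P / np.linalg.norm(P, axis=1, keepdims=True)
naive   = normalize(2 * rng.random((n, 3)) - 1)   # cube then normalize: biased
uniform = normalize(rng.standard_normal((n, 3)))  # gaussian then normalize: uniform

corner = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)
axis   = np.array([1.0, 0.0, 0.0])
# count samples inside a spherical cap of angular radius 0.3 rad around d
cap = lambda P, d: int(np.sum(P @ d > np.cos(0.3)))

# naive sampling piles up toward the cube's corners...
assert cap(naive, corner) > 1.3 * cap(naive, axis)
# ...while the gaussian scheme fills both caps about equally
assert abs(cap(uniform, corner) - cap(uniform, axis)) < 0.2 * cap(uniform, corner)
```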

In a different project we need not only a random sampling on the sphere, but also a Delaunay triangle mesh. Since all points lie on the sphere, they all lie on their convex hull:

V = normalizerow(normrnd(zeros(sam,3),1));
F = convhulln(V);

The result looks something like this:
random delaunay mesh on sphere

Lightness vs luma vs grayscale vs value

Friday, May 17th, 2013

In image processing it’s typical to convert an RGB input image into a more meaningful colorspace. For example it’s often argued that distances in CIE L*a*b* colorspace are better correlated with perceived differences. Other times people simply use luma (aka luminance). I want to see how different these are. Here’s a small MATLAB script to generate and compare different grayscale images:

im = imread('hans-hass.jpg');
imd = im2double(im);
% NTSC luminance == Matlab RGB2GRAY
% [Levin et al 2004]
ntsc = rgb2ntsc(im);
% L*a*b*  lightness
% ~[Lischinski et al 2006]
lab = im2double(applycform(im,makecform('srgb2lab')));
% HSV value
hsv = rgb2hsv(imd);
% simple rgb2gray
simple = imd(:,:,1)*0.3 + imd(:,:,2)*0.6 + imd(:,:,3)*0.1;
% very naive rgb2gray: mean of channels
naive = sum(imd,3)/3;
% matrix of differences
row = [hsv(:,:,3) naive simple ntsc(:,:,1) lab(:,:,1)];
col = [hsv(:,:,3);naive;simple;ntsc(:,:,1);lab(:,:,1)];
diff = 0.5*kron(ones(5,1),row)+0.5*(1-kron(ones(1,5),col));
comp = [imd repmat(row,[1 1 3]);repmat([col diff],[1 1 3])];

This produces an array of comparisons:
lightness comparison of grayscale images

Some interesting things I noted were that the different implementations of luma are quite similar and they’re all not so far from lightness. HSV value (the maximum of RGB) not surprisingly is very different.
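For reference, the Rec. 601 luma weights (the ones MATLAB’s rgb2gray uses, up to rounding) compared against the “simple” rounded weights and the naive channel mean, on one example pixel (Python sketch; the pixel value is made up):

```python
import numpy as np

pixel = np.array([0.2, 0.5, 0.8])                  # one RGB value in [0,1]
rec601 = pixel @ np.array([0.299, 0.587, 0.114])   # NTSC / rgb2gray weights
simple = pixel @ np.array([0.3, 0.6, 0.1])         # rounded weights from the script
naive  = pixel.mean()                              # equal weights
print(round(rec601, 4), round(simple, 4), round(naive, 4))  # 0.4445 0.44 0.5
```

The first two agree closely (as the comparison image shows), while the equal-weight mean drifts noticeably on blue-heavy pixels.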

Correct sse flags for compiling against embree with gcc

Tuesday, May 14th, 2013

I had trouble compiling against the embree raytracing library. I first tried compiling a small test program with gcc (g++-mp-4.7) and immediately got errors:

error: '_mm_abs_epi32' was not declared in this scope
error: '_mm_shuffle_epi8' was not declared in this scope
error: '_mm_abs_epi32' was not declared in this scope

I tried adding the gcc flag -msse to no avail. When I tried -mavx, I got tons of errors like:

/var/folders/6g/l7c387896dz565996h9tz6d40000gn/T//ccZ5EWof.s:20:no such instruction: `vxorps %xmm0, %xmm0,%xmm0'
/var/folders/6g/l7c387896dz565996h9tz6d40000gn/T//ccZ5EWof.s:21:no such instruction: `vmovaps %xmm0, __ZL17_mm_lookupmask_ps(%rip)'
/var/folders/6g/l7c387896dz565996h9tz6d40000gn/T//ccZ5EWof.s:22:no such instruction: `vmovss LC0(%rip), %xmm0'
/var/folders/6g/l7c387896dz565996h9tz6d40000gn/T//ccZ5EWof.s:23:no such instruction: `vmovaps %xmm0, 16+__ZL17_mm_lookupmask_ps(%rip)'
…and so on for a couple dozen more vmovaps lines.

Finally I looked into the commands embree’s makefile was issuing to compile it in the first place and found the correct flag: -msse4.2.