All pairs distances, matlab

Alec Jacobson

November 10, 2011

weblog/

Often I like to get list of distances between all pairs of points in two sets. That is, if I have a set of points V and another set of points U, I'd like to build D, a #V by #U matrix of distance, where D(i,j) gives me the distance from point V(i) to point U(i). For the sake of timings, construct some V and U values using:
V = rand(5000,3);
U = rand(5000,3);

Two for loops

You can achieve the all pairs distance computation between V and U in the most naive way using two for loops (like you would probably do it in C++):
  % Super-slow for loops method
  D = zeros(size(V,1),size(U,1));
  for i = 1:size(V,1)
    for j = 1:size(U,1)
      D(i,j) = sqrt(sum((V(i,:)-U(j,:)).^2,2));
    end
  end

Two repmats

You can also do this using two repmats in a one-liner, but this gets slow if you hit a memory wall:
  % Slow, especially if hitting memory paging
  D = ...
    squeeze(sqrt(sum((repmat(V,[1 1 size(U,1)])- ...
    permute(repmat(U,[1 1 size(V,1)]),[3 2 1])).^2,2)));

Single for loop, single repmat

You can try to alleviate this by using a single for loop and inside the loop a single repmat. This is faster, but will probably hit a memory wall later on, too:
  % slow single for loop, single repmat method, handles memory paging better
  D = zeros(size(V,1),size(U,1));
  for i = 1:size(V,1)
    D(i,:) = ...
      sqrt(sum((repmat(V(i,:),[size(U,1) 1]) - U).^2,2))';
  end

Single for loop, single bsxfun

The matlab function bsxfun will do the repeat virtually, so we can replace the minus operation inside the loop, giving us a pretty good speed up:
  % Fast single for loop, single bsxfun method, but matlab can handle both for
  % loops in bsxfun
  D = zeros(size(V,1),size(U,1));
  for i = 1:size(V,1)
    D(i,:) = ...
      sqrt(sum(bsxfun(@minus,V(i,:),U).^2,2));
  end

Single bsxfun

But there's no reason for the outer for loop either, bsxfun is powerful enough to virtually repeat both list of points, resulting in a very fast, very concise, one-liner:
  % Fastest
  D = squeeze(sqrt(sum(bsxfun(@minus,V,permute(U,[3 2 1])).^2,2)));
Method5000 points each50000 points each
Two for loops116.730218 secondsNA
Two repmats2.294543 seconds> 900 seconds
Single for loop, single repmat1.064348 seconds20.843250 seconds
Single for loop, single bsxfun0.801842 seconds15.608364 seconds
Single bsxfun0.617075 seconds131.936023 seconds
Note: Contrary to what I thought would happen, the single bsxfun does hit a memory wall for very large input, so the single for loop, single bsxfun/repmat succeed here (see second column). It would be nice to have an automatic way of determining which code to run based on the machine's memory.