Cache matlab function results based on input parameters, using md5

Alec Jacobson

May 09, 2011

I have a couple of matlab functions that take a long time to compute. Sometimes I remember to save the results, sometimes I forget. Here is the skeleton of a function that automatically caches every input state and output state it sees. That way, if the function is ever called again with the same input, the result is loaded directly from file rather than recomputed. It's written very generically, so you should be able to just paste in your existing function. I save the following in cache_test.m:

function C = cache_test(A,B)
  % CACHE_TEST Dummy program that computes C = A+B, but demonstrates how to use
  % md5 caching of function results on input parameters
  %
  % C = cache_test(A,B)
  %

  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % Check for cached result, do NOT edit variables until cache is checked,
  % your function code comes later. See below
  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  % get a list of current variables in this scope, this is the input "state"
  variables = who;
  % get a temporary file's name
  tmpf = [tempname('.') '.mat'];
  % save the "state" to file, so we can get a md5 checksum
  save(tmpf,'-regexp',sprintf('^%s$|',variables{:}));
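  % get md5 checksum on input "state", we append .cache.mat to the checksum
  % because we'll use the checksum as the cache file name. tail -c +117 skips
  % the first 116 bytes of the .mat file (the header text, which contains a
  % creation timestamp) so that identical inputs give identical checksums.
  % Note: md5 -r is the macOS/BSD command
  [s,cache_name] = system(['tail -c +117 ' tmpf ...
    ' | md5 -r | awk ''{printf "."$1".cache.mat"}''']);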
  % clean up
  delete(tmpf);
  clear s tmpf variables;

  % If the checksum cache file exists then we've seen this input "state"
  % before, load in cached output "state"
  if(exist(cache_name,'file'))
    fprintf('Cache exists. Using cache...\n');
    % use cache
    load(cache_name);
  % Otherwise this is the first time we've seen this input "state", so we
  % execute the function as usual and save the output "state" to the cache 
  else
    fprintf('First time. Creating cache...\n');

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Your function code goes here
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    C = A+B;

    % get list of variables present in this scope at finish of function code,
    % this is the output "state"
    variables = who;
    % save output "state" to file, using md5 checksum cache file name
    save(cache_name,'-regexp',sprintf('^%s$|',variables{:}));
  end
end

If I call:

cache_test(5,4)

I see

First time. Creating cache...

ans =

     9

And if I call it again, I see:

Cache exists. Using cache...

ans =

     9
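The checksum step in cache_test shells out to tail, md5 and awk, which ties it to systems where those commands exist. As a rough sketch of an alternative, the same digest could be computed inside matlab through the Java MessageDigest class that ships with it. The helper name mat_md5 is hypothetical (it is not part of the code above), and the sketch assumes, as the original does, that only the first 116 bytes of the .mat file vary between saves of identical variables:

```matlab
function hex = mat_md5(matfile)
  % MAT_MD5 Hypothetical helper: md5 checksum of a .mat file, ignoring the
  % first 116 bytes (the header text, which contains a creation timestamp)
  %
  % hex = mat_md5(matfile)
  %

  % read the whole file as raw bytes
  fid = fopen(matfile,'r');
  assert(fid ~= -1,'Could not open %s',matfile);
  bytes = fread(fid,inf,'*uint8');
  fclose(fid);
  % drop the header text, same effect as tail -c +117
  bytes = bytes(117:end);
  % hash the remaining bytes with Java's MessageDigest
  md = java.security.MessageDigest.getInstance('MD5');
  md.update(bytes);
  % digest comes back as int8, reinterpret as uint8 before converting to hex
  hash = typecast(md.digest(),'uint8');
  % force two hex digits per byte and flatten to one lowercase string
  hex = lower(reshape(dec2hex(hash,2)',1,[]));
end
```

With this saved as mat_md5.m, the system call in cache_test could be swapped for cache_name = ['.' mat_md5(tmpf) '.cache.mat']; and the s variable would no longer be needed. Since the same bytes are hashed, the cache file names should match those from the shell pipeline, though I have not checked this on every platform.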

Update: Weirdly, this doesn't seem to support logical variables or cells.

Update: I'm finally taking the advice below and saving to binary. For this you need to chop off the first 116 bytes of the saved file, so you need tail installed. This supports all variable types; in particular, it supports cells, including varargin. I've updated the code above. But now I recommend checking out the utility/find_cache.m and utility/create_cache.m functions in my gptoolbox. These allow caching with minimal change to your code. Suppose you have an expensive function saved in expensive_function.m looking like:

function X = expensive_function(varargin)
  ...  % DOING SOMETHING THAT TAKES A LONG TIME AND
  ...  % DOES NOT CHANGE FOR IDENTICAL INPUT
end

Then you can cache-ify this file by adding a little bit of code at the very beginning and one line at the very end:

function X = expensive_function(varargin)
  [cache_exists,cache_name] = find_cache();
  if cache_exists
    fprintf('Using expensive_function cache...\n');
    load(cache_name);
    return;
  end
  fprintf('Calling expensive_function...\n');

  ...  % DOING SOMETHING THAT TAKES A LONG TIME AND
  ...  % DOES NOT CHANGE FOR IDENTICAL INPUT

  create_cache(cache_name);
end

Warning: this will not work if expensive_function sets varargout rather than using named output variables. Probably you would just need some logic to separate loading, computing and setting varargout to make it work.
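One way to picture that separation is a small wrapper that keeps all outputs in a single named cell array, so loading, computing and setting varargout are three distinct steps. This is only a sketch: cached_call is a hypothetical name, it does not use find_cache or create_cache, and it reuses the md5-of-saved-inputs idea from cache_test with Java's MessageDigest in place of the shell pipeline. It also assumes fn is a plain function handle, since func2str does not record values captured by an anonymous function.

```matlab
function varargout = cached_call(fn,varargin)
  % CACHED_CALL Hypothetical wrapper (not part of gptoolbox): call fn with
  % the given inputs, caching however many outputs the caller requests
  %
  % [out1,out2,...] = cached_call(fn,in1,in2,...)
  %

  % input "state": the function's name together with its arguments
  key = {func2str(fn),varargin};
  tmpf = [tempname('.') '.mat'];
  save(tmpf,'key');
  fid = fopen(tmpf,'r');
  bytes = fread(fid,inf,'*uint8');
  fclose(fid);
  delete(tmpf);
  % md5 of everything after the 116-byte header text
  md = java.security.MessageDigest.getInstance('MD5');
  md.update(bytes(117:end));
  hex = lower(reshape(dec2hex(typecast(md.digest(),'uint8'),2)',1,[]));
  % different output counts get separate cache files
  cache_name = ['.' hex '.n' num2str(nargout) '.cache.mat'];

  if exist(cache_name,'file')
    % loading: pull the named cell array back out of the cache
    S = load(cache_name,'outputs');
    outputs = S.outputs;
  else
    % computing: collect every requested output in one named variable
    outputs = cell(1,max(nargout,1));
    [outputs{:}] = fn(varargin{:});
    save(cache_name,'outputs');
  end
  % setting varargout: the same code path for cached and fresh results
  varargout = outputs;
end
```

For example, [m,i] = cached_call(@max,[3 9 2]) computes and stores both outputs on the first call, and a second identical call loads them from the cache file instead of calling max again.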