Well, I couldn’t resist playing around. I created a Matlab mex C file called pdistc that implements pairwise Euclidean distance for single and double precision. On my machine, using Matlab R2012b and R2015a, it’s 20–25% faster than pdist (and the underlying pdistmex helper function) for large inputs (e.g., 60,000-by-300).
As has been pointed out, this problem is fundamentally bounded by memory, and you’re asking for a lot of it. My mex C code uses minimal memory beyond that needed for the output. Comparing its memory usage to that of pdist, the two look virtually the same. In other words, pdist is not using lots of extra memory. Your memory problem is likely in the memory used up before calling pdist (can you use clear to remove any large arrays?) or simply because you’re trying to solve a big computational problem on tiny hardware.
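To put a number on "a lot of it", here is a small sketch in C of the output-size arithmetic. The function names are made up for illustration and are not part of pdistc; the only fact used is that the condensed pairwise distance vector for m points has m*(m-1)/2 entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of entries in the condensed pairwise distance vector for m points:
   one entry per unordered pair, i.e. m*(m-1)/2. */
uint64_t pdist_num_pairs(uint64_t m)
{
    return (m < 2) ? 0 : m * (m - 1) / 2;
}

/* Bytes needed just to hold that output vector.
   elem_size is 8 for double precision and 4 for single precision. */
uint64_t pdist_output_bytes(uint64_t m, size_t elem_size)
{
    return pdist_num_pairs(m) * (uint64_t)elem_size;
}
```

For the 60,000-row example above, that is 1,799,970,000 distances, or roughly 14.4 GB in double precision and 7.2 GB in single precision, before counting X itself or anything else in the workspace.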
So, my pdistc function likely won’t be able to save you memory overall, but you may be able to use another feature I built in. You can calculate chunks of your overall pairwise distance vector. Something like this:
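m = 6e3;          % number of observations (rows of X)
n = 3e2;          % number of dimensions (columns of X)
X = rand(m,n);
sz = m*(m-1)/2;   % length of the full pairwise distance vector
for i = 1:m:sz-m
    D = pdistc(X', i, i+m); % mex C function, X is transposed relative to pdist
    % ... process chunk of pairwise distances
end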
This is considerably slower (10 times or so), and this part of my C code is not well optimized, but it allows much lower memory use, assuming that you don’t need the entire array at one time. Note that you could do the same thing much more efficiently with pdist (or pdistc) by writing a loop that passes in subsets of X directly, rather than all of it.
If you have a 64-bit Intel Mac, you won’t need to compile, as I’ve included the .mexmaci64 binary, but otherwise you’ll need to figure out how to compile the code for your machine. I can’t help you with that. It’s possible that you may not be able to get it to compile or that there will be compatibility issues that you’ll need to solve by editing the code yourself. It’s also possible that there are bugs and the code will crash Matlab. Also, note that you may get slightly different outputs relative to pdist, with differences between the two in the range of machine epsilon (eps). pdist may or may not do fancy things to avoid overflows for large inputs and other numeric issues, but be aware that my code does not.
Additionally, I created a simple pure Matlab implementation. It is massively slower than the mex code, but still faster than a naïve implementation or the code found in pdist.
All of the files can be found here; the ZIP archive includes everything. It’s BSD licensed. Feel free to optimize (I tried BLAS calls and OpenMP in the C code to no avail; maybe some pointer magic or GPU/OpenCL could further speed it up). I hope that it can be helpful to you or someone else.