Conda already does this. However, because it leverages hardlinks, it is easy to overestimate the space really being used, especially if one only looks at the size of a single env at a time.
To illustrate the case, let’s use du to inspect the real disk usage. First, if I count each environment directory individually, I get the uncorrected per-env usage:
$ for d in envs/*; do du -sh "$d"; done
2.4G envs/pymc36
1.7G envs/pymc3_27
1.4G envs/r-keras
1.7G envs/stan
1.2G envs/velocyto
which is what it might look like from a GUI.
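Instead, if I let du count them together in a single invocation (i.e., correcting for the hardlinks, since du counts each hardlinked file only once and charges it to the first directory where it appears), I get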
$ du -sh envs/*
2.4G envs/pymc36
326M envs/pymc3_27
820M envs/r-keras
927M envs/stan
548M envs/velocyto
One can see that a significant amount of space is already being saved here.
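To see why the two du runs disagree, here is a minimal sketch on a scratch directory rather than a real conda install. The layout and file names (pkgs/lib.bin, envs/a, envs/b) are made up for illustration; the only assumption is that the filesystem supports hardlinks.

```shell
#!/bin/sh
# Scratch layout standing in for conda's pkgs/ and envs/ (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/pkgs" "$tmp/envs/a" "$tmp/envs/b"

# One 1 MiB "package" file, hardlinked into both environments.
dd if=/dev/urandom of="$tmp/pkgs/lib.bin" bs=1024 count=1024 2>/dev/null
ln "$tmp/pkgs/lib.bin" "$tmp/envs/a/lib.bin"
ln "$tmp/pkgs/lib.bin" "$tmp/envs/b/lib.bin"

# Separate du run: envs/b is charged for the shared file in full.
separate_b=$(du -sk "$tmp/envs/b" | cut -f1)

# One du run over both: the file is charged to envs/a (the first argument),
# so envs/b only reports what du has not already seen.
together_b=$(du -sk "$tmp/envs/a" "$tmp/envs/b" | sed -n '2p' | cut -f1)

echo "envs/b counted alone:      ${separate_b}K"
echo "envs/b counted after envs/a: ${together_b}K"

rm -rf "$tmp"
```

The second number should be close to zero, which is the same effect that shrinks every env after the first in the listing above.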
Most of the hardlinks go back to the pkgs directory, so if we include that as well:
$ du -sh pkgs envs/*
8.2G pkgs
400M envs/pymc36
116M envs/pymc3_27
92M envs/r-keras
62M envs/stan
162M envs/velocyto
one can see that outside of the shared packages, the envs are fairly light. If you’re concerned about the size of my pkgs, note that I have never run conda clean on this system, so my pkgs directory is full of tarballs and superseded packages, plus some infrastructure I keep in base (e.g., Jupyter and Git).
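If you want a per-env number that does not depend on the order in which du visits the directories, one option is to sum only the files whose link count is 1, since those are not shared with pkgs or with any other env. This is a rough sketch, not something conda provides: private_kb is a hypothetical helper, and the scratch layout below is made up for the demonstration.

```shell
#!/bin/sh
# private_kb DIR: KiB used by regular files under DIR whose link count is 1,
# i.e. files that are not hardlinked to pkgs or to any other environment.
private_kb() {
    find "$1" -type f -links 1 -exec du -k {} + |
        awk '{ total += $1 } END { print total + 0 }'
}

# Scratch layout (hypothetical): one shared file plus one env-only file.
tmp=$(mktemp -d)
mkdir -p "$tmp/pkgs" "$tmp/envs/a"
dd if=/dev/urandom of="$tmp/pkgs/shared.bin" bs=1024 count=1024 2>/dev/null
ln "$tmp/pkgs/shared.bin" "$tmp/envs/a/shared.bin"
dd if=/dev/urandom of="$tmp/envs/a/private.bin" bs=1024 count=64 2>/dev/null

total_a=$(du -sk "$tmp/envs/a" | cut -f1)      # uncorrected size
private_a=$(private_kb "$tmp/envs/a")          # unshared files only

echo "envs/a uncorrected: ${total_a}K"
echo "envs/a unshared:    ${private_a}K"

rm -rf "$tmp"
```

On a real install you would call it as private_kb envs/pymc36. Treat the result as a lower bound on what the env costs: a file with a link count above 1 might be linked only from a package that no other env uses, in which case removing the env and cleaning pkgs would free it too.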