Nix memory usage


Nix is a great model whose fundamental abstraction gives us a lot of power and reproducibility. However, this abstraction requires evaluating a (simple) functional programming language, which can use more resources than, say, traditional package management systems.

BioNix sits on top of Nix and allows bioinformatics pipelines to be specified easily. It therefore inherits the weaknesses of the underlying Nix platform, notably the memory usage incurred by evaluating a functional language. This is noticeable in one of my projects, which applies a complicated pipeline to hundreds of samples: each sample takes multiple gigabytes to instantiate, and an expression involving the whole cohort can take up to 30 GiB (!) due to the use of multiple nixpkgs trees.
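For anyone wanting to put numbers on their own pipeline, peak memory during instantiation can be measured from the outside. The following is a minimal sketch, not part of BioNix: `peak_rss_of` is a hypothetical helper, and the file name `pipeline.nix` in the usage note is only a placeholder. It relies on Python's standard `resource` module, which reports `ru_maxrss` in KiB on Linux (other platforms may use different units).

```python
import resource
import subprocess


def peak_rss_of(cmd):
    """Run cmd to completion and return (exit_code, peak_rss).

    peak_rss is ru_maxrss for RUSAGE_CHILDREN: the largest resident set
    size of any child this process has waited for so far. On Linux the
    unit is KiB. Because it is a running maximum over all children, use
    a fresh Python process for each measurement.
    """
    proc = subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss
```

Something like `peak_rss_of(["nix-instantiate", "pipeline.nix"])` would then give the peak resident memory of a single instantiation, which makes it easier to compare one sample against the whole cohort.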

I don't currently have a good solution for this. However, since the same workflow is applied to multiple samples, there should be a great deal of redundancy. General memory page deduplication algorithms such as UKSM, which is very effective at reducing memory usage for virtual machines, might help here.
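To make the redundancy argument concrete, here is a toy model of page deduplication. It is only an illustration of the idea and not how UKSM is implemented: the 4096-byte page size is an assumption, `dedup_stats` is a hypothetical helper, and hashing with SHA-256 stands in for the kernel's comparison of page contents.

```python
import hashlib

PAGE_SIZE = 4096  # assumed page size in bytes


def dedup_stats(buffers, page_size=PAGE_SIZE):
    """Return (total_pages, unique_pages) across a list of byte buffers.

    Pages with identical contents are counted once in unique_pages,
    which is the best case a page deduplicator could reach.
    """
    seen = set()
    total = 0
    for buf in buffers:
        for off in range(0, len(buf), page_size):
            # Zero-pad a trailing partial page to a full page.
            page = bytes(buf[off:off + page_size]).ljust(page_size, b"\0")
            seen.add(hashlib.sha256(page).digest())
            total += 1
    return total, len(seen)


# Two "samples" sharing 8 pages of common state plus 1 private page each.
shared = b"".join(bytes([i]) * PAGE_SIZE for i in range(8))
sample_a = shared + b"a" * PAGE_SIZE
sample_b = shared + b"b" * PAGE_SIZE

total, unique = dedup_stats([sample_a, sample_b])
print(total, unique)  # 18 10
```

In this toy case 18 pages collapse to 10. Whether a real Nix evaluation shows anything like this depends on how much of the evaluator's heap is actually byte-identical at page granularity, which is the open question.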

UKSM