sea's blog → Algebra, Lisp, and miscellaneous thoughts

Watch out when using mmap(2)

It's probably happened to you before. You're running some simulation program on your GNU/Linux system, and suddenly you realize there's an error in your code: it allocates far too much memory.

You foresaw issues like this and thus disabled swap. Modern systems don't really need swap space anyway, unless you've got an SSD, in which case swap is still fast enough to be useful. However, even with swap disabled, you see kswapd in your process list consuming CPU cycles, and the whole system grinds to a halt. How? Why?

As it turns out, kswapd doesn't just work on swap space. It also pages out mmap'd pages: a file-backed page needs no swap space, because it can be written back to its file (or simply dropped, if it's clean) and read in again later. Now, mmap is one of the best tools available to a Linux programmer because it lets you operate on large files with essentially transparent caching. When your accesses have good locality, you get massive speedups, plus the security of working with datasets larger than your memory, knowing that only the pieces you're currently viewing will ever be paged in.
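To make the "only the pieces you're currently viewing" part concrete, here is a minimal sketch using Python's standard mmap module. The helper names read_slice and make_demo_file are made up for this illustration; the point is only that slicing the mapping touches the pages covering that slice, not the whole file.

```python
import mmap
import os
import tempfile


def read_slice(path, offset, length):
    """Return `length` bytes at `offset` from a read-only mapping of `path`."""
    with open(path, "rb") as f:
        # A length of 0 maps the whole file. Pages are faulted in lazily,
        # so only the pages covering the requested slice get read.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]


def make_demo_file(size=1 << 20):
    """Create a hypothetical 1 MiB file of zeros with b'needle' in the middle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
        f.seek(size // 2)
        f.write(b"needle")  # overwrites six of the zero bytes
    return path
```

Calling read_slice(path, (1 << 20) // 2, 6) on the demo file returns the six bytes in the middle. The same pattern is what makes mmap attractive for files far larger than this one.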

The fact that mmap is so awesome means that it's in use everywhere. That, of course, is a double-edged sword. If your system is under memory pressure (i.e., some program is using up almost all of the available memory), then kswapd will come into play and begin paging out those mmap'd regions. The problem, though, is that those mmap'd pages are, by design, for processes making active use of those files!

Thus, every mmap'd page that gets swapped out will be needed again, and soon. Your system grinds to a halt, attempting to swap the pages in and out endlessly. Even with swap space entirely disabled, your system will thrash purely because of the popularity of mmap.
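If you want to check this on your own machine, a rough sketch like the following reads /proc/meminfo and reports whether swap is off while file-backed mapped pages still exist for the kernel to reclaim. The function names are made up for this example, and the "Mapped" field is only a coarse indicator, not a measure of thrashing.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            # Most fields are reported in kB; a few (HugePages_*) are plain counts.
            info[name.strip()] = int(parts[0])
    return info


def evictable_without_swap(info):
    """True if there is no swap, yet mapped file pages are still present."""
    return info.get("SwapTotal", 0) == 0 and info.get("Mapped", 0) > 0
```

On a Linux box you would feed it open("/proc/meminfo").read(). With swap disabled and any ordinary set of programs running, expect it to return True.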

Checkmate. Linux under memory pressure has been beaten by its own efficient system features. The only real solution is to use ulimit to set hard memory limits, and prevent your rogue simulator from exerting memory pressure in the first place.
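As a sketch of the ulimit approach, the following launches a command with a hard cap on its address space, which is roughly what running ulimit -v in the shell before starting the program does. The name run_with_memory_cap is made up for this example, and it assumes Linux, where RLIMIT_AS is enforced.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(argv, max_bytes):
    """Run `argv` with its address space capped at `max_bytes`."""
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process cannot raise its hard limit, so never ask
    # for more than the limit we already have.
    cap = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)

    def apply_limit():
        # Runs in the child after fork and before exec, so the cap
        # applies to the launched program only, not to this script.
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(argv, preexec_fn=apply_limit,
                          capture_output=True, text=True)
```

With a 1 GiB cap, a simulator that tries to grab several gigabytes gets an allocation failure right away instead of dragging the rest of the system into memory pressure. Note that the shell's ulimit -v takes its argument in units of 1024 bytes, while this function takes bytes.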

At the time of writing, there hasn't been any real progress toward a solution, i.e., the ability to mark individual processes as 'unswappable no matter what'. That would allow you to mark the GUI-related processes and, even under intense load, retain some control over your system; just enough to summon a terminal and issue the kill commands for your rogue processes.

As it stands now, once the thrashing begins, you can't even summon the OOM-killer with the sysrq key in any useful way. Technically you can, but it will take minutes, perhaps hours, before the kernel is able to rise from its stupor to honor the OOM request.