to hold the malloc lock across mmap syscalls in all cases. dropping it
allows another thread to access the existing chunk cache if necessary.
could be improved to be a bit more aggressive, but i've been testing this
simple diff for some time now with good results.