TheMmapedFileLockupProblem

These two scenarios were described by Colin and Brian in a tech meeting on 2008-11-28.

Both involve the page faults taken while handling memory-mapped files.

Thread bob1 in user process Bob opens a file on resmgr nfs.
Thread bob2 in user process Bob opens a file on resmgr io-pkt.
bob1 mmaps its nfs fd, which causes proc to run and lock process Bob's aspace.
bob1 then reads from its memory-mapped nfs file, which faults.
proc runs to handle bob1's fault.
Meanwhile, bob2 does a read on io-pkt.
bob2's read causes io-pkt to run, but io-pkt faults on bob2's buffer and so enters page wait.
Note that io-pkt cannot exit page wait until Bob's address space is unlocked.
Remember bob1? bob1 had faulted reading its memory-mapped nfs file, and a proc thread handles the fault by reading from nfs.
nfs in turn needs to read from io-pkt, so it sends to io-pkt.
But io-pkt is single-threaded, so nfs goes send-blocked.
Note that proc will not release the aspace lock on Bob until io-pkt replies to nfs and nfs returns to proc.
io-pkt's single thread is in page wait for bob2, which cannot be processed because Bob's aspace is locked.
Bob's your uncle: nothing can make progress.
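
The steps above can be read as a wait-for chain. The following is a minimal sketch in C, not taken from the meeting: the enum names, the waits_for array, and both helper functions are hypothetical, and it assumes every blocked party waits on exactly one other party. proc stands in for "the holder of Bob's aspace lock".

```c
#include <assert.h>

/* Parties from the scenario above; the enum itself is illustrative. */
enum party { BOB1, BOB2, PROC, NFS, IO_PKT, N_PARTIES };

#define NOT_WAITING (-1)

/* waits_for[p] is the one party p is blocked on, or NOT_WAITING. */
void build_scenario(int waits_for[N_PARTIES])
{
    waits_for[BOB1]   = PROC;   /* bob1 faulted; proc is handling the fault */
    waits_for[BOB2]   = IO_PKT; /* bob2's read is being served by io-pkt */
    waits_for[PROC]   = NFS;    /* proc reads the page from nfs, aspace locked */
    waits_for[NFS]    = IO_PKT; /* nfs is send-blocked on io-pkt */
    waits_for[IO_PKT] = PROC;   /* io-pkt is in page wait until proc unlocks */
}

/* Returns 1 if following the chain from start leads back to start. */
int on_cycle(const int *waits_for, int n, int start)
{
    int cur = waits_for[start];
    for (int steps = 0; steps < n && cur != NOT_WAITING; steps++) {
        if (cur == start)
            return 1;
        cur = waits_for[cur];
    }
    return 0;
}

/* Returns 1 if the chain from start never reaches a party that can run. */
int blocked_forever(const int *waits_for, int n, int start)
{
    int cur = start;
    for (int steps = 0; steps <= n; steps++) {
        if (cur == NOT_WAITING)
            return 0;           /* chain ends at a runnable party */
        cur = waits_for[cur];
    }
    return 1;                   /* more than n hops: a cycle was entered */
}
```

In this model proc, nfs and io-pkt sit on the cycle, while bob1 and bob2 are not on it but can never be unblocked because they wait on members of it.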

See the whiteboard snapshot mmap_lockup_1.gif, attached.

User thread Bob mmaps an already open fd: ptr = mmap(..., fd, ...)
User thread Bob calls read() on the same fd, using the same ptr as the output buffer: read(fd, ptr, ...)
read() in the resmgr library locks the attribute structure of the file, as part of normal resmgr handling.
This is the first read, so the memory access to the mapped file faults.
A proc thread runs to handle the fault and issues a read on the same fd.
The proc thread's read() goes through the resmgr library, which tries to lock the same attribute.
The proc thread hangs forever, because the attribute is still locked by the original read(), which cannot finish until the fault is handled.
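
The lock conflict in these steps can be replayed with a toy model. This is only a sketch: struct attr_lock, attr_trylock(), attr_unlock() and replay_scenario() are hypothetical names, not the resmgr library's API, and the lock is non-blocking so that the conflict shows up as an error code instead of a hang. The early-release path is an assumption added for contrast, not something the steps above describe.

```c
#include <assert.h>
#include <errno.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for a per-file attribute lock. */
struct attr_lock { int owner; };

/* Non-blocking acquire: 0 on success, EDEADLK if the caller already
 * holds the lock, EBUSY if another thread holds it. */
int attr_trylock(struct attr_lock *l, int tid)
{
    if (l->owner == NO_OWNER) {
        l->owner = tid;
        return 0;
    }
    return l->owner == tid ? EDEADLK : EBUSY;
}

/* Release: 0 on success, EPERM if the caller is not the owner. */
int attr_unlock(struct attr_lock *l, int tid)
{
    if (l->owner != tid)
        return EPERM;
    l->owner = NO_OWNER;
    return 0;
}

/* Replays the steps above and returns the result of the proc thread's
 * lock attempt. release_before_fault selects the hypothetical variant
 * in which the reader drops the lock before touching the buffer. */
int replay_scenario(int release_before_fault)
{
    enum { BOB_TID = 1, PROC_TID = 2 };
    struct attr_lock attr = { NO_OWNER };
    int rc;

    attr_trylock(&attr, BOB_TID);       /* Bob's read() locks the attribute */
    if (release_before_fault)
        attr_unlock(&attr, BOB_TID);    /* assumed early release */

    /* Bob's buffer access faults here; proc issues its own read(). */
    rc = attr_trylock(&attr, PROC_TID);
    if (rc == 0)
        attr_unlock(&attr, PROC_TID);
    return rc;
}
```

Calling replay_scenario(0) reports EBUSY for the proc thread; in the real system that attempt blocks instead, and since the owner is itself waiting for the fault to complete, the block never ends. replay_scenario(1) returns 0 in this model.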

See the whiteboard snapshot mmap_lockup_2.gif attached.

So far, we have only a brief list of possible solutions (feel free to flesh these out):

Judiciously release the lock: may work for Colin's scenario.
Require resmgrs to have at least one thread for every mmapped file: requiring behavior changes in user code is difficult to manage.
Map the whole file into memory on open: works but is inefficient.
Clever deadlock detection.
Use multiple finer-granularity locks instead of a single resmgr attr lock and a single aspace lock.
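
To make the last idea more concrete, here is a rough sketch in C of how per-region locking of an address space could differ from a single aspace lock. Everything in it is an assumption for illustration: struct region, the function names, and the addresses in the usage note are hypothetical, and nothing here reflects how proc actually tracks mappings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of an address range inside one aspace. */
struct region { size_t start; size_t len; };

/* Returns 1 if the two half-open ranges share at least one byte. */
int regions_overlap(struct region a, struct region b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Single aspace lock: every fault waits while the lock is held. */
int fault_blocked_single_lock(int aspace_locked)
{
    return aspace_locked;
}

/* Per-region locks: a fault waits only if it touches a locked region. */
int fault_blocked_region_locks(const struct region *locked, size_t n_locked,
                               struct region fault)
{
    for (size_t i = 0; i < n_locked; i++)
        if (regions_overlap(locked[i], fault))
            return 1;
    return 0;
}
```

With one locked region standing for bob1's nfs mapping, say { 0x10000, 0x4000 }, a fault on an unrelated buffer such as { 0x80000, 0x1000 } is blocked under the single lock but not under region locks, which would let io-pkt's page wait for bob2 complete. A fault inside the mapping itself is still blocked under both schemes.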