wiki3046: PhysicalMemoryFragmentation (Version 3)

Physical Memory Defragmentation#

Overview#

This feature comes from the problem reported in PR16405. As time passes, our physical memory becomes fragmented. Eventually, even though there might be a significant amount of memory free in total, it is broken up in such a way that a request for a large piece of contiguous memory will fail.

Contiguous memory is often required for device drivers where the device uses DMA. The normal work-around is to ensure that all device drivers initialize early (before memory is fragmented) and hold onto their memory. This is a harsh restriction, particularly for embedded systems that might want to use different drivers depending on the actions of the user -- starting all possible device drivers simultaneously may not be feasible.

We did work on the physical memory allocation algorithms in the 6.4.0 release that significantly reduces the amount of fragmentation that occurs. However, no matter how good the algorithms might be, certain usage patterns can still lead to fragmentation.

The idea behind this feature is to defragment physical memory as necessary to support requests for contiguous memory.

Feature Description#

No matter how smart our memory allocation algorithms are, specific application behaviour can result in fragmented memory. That is, while there may be a significant amount of memory free, it is broken up into small pieces separated by memory that is still in use by an application. Consider a completely degenerate application that allocates 4K blocks of memory until system memory is exhausted and then frees every second one of those blocks. At this point, half of the system memory is free again, but the largest contiguous block is only 4K.
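As a purely illustrative sketch (not part of the original problem report), the degenerate pattern above might look like the following C program. The exact physical layout will depend on the allocator, but the effect is the same: roughly half of memory ends up free, yet no large contiguous run remains.

```c
/* Illustrative only: allocate 4K pages until memory runs out, then
 * free every second one.  Afterwards, lots of memory is free, but the
 * largest physically contiguous free run may be a single 4K page. */
#include <stdio.h>
#include <sys/mman.h>

#define PAGE_SIZE   4096
#define MAX_PAGES   (1024 * 1024)       /* enough slots for 4 GB of pages */

int main(void)
{
    static void *page[MAX_PAGES];
    int          n = 0;

    /* Allocate one page at a time until allocation fails. */
    while (n < MAX_PAGES) {
        void *p = mmap(0, PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
        if (p == MAP_FAILED) {
            break;
        }
        *(char *)p = 0;                 /* touch it so it is physically backed */
        page[n++] = p;
    }

    /* Free every second page. */
    for (int i = 0; i < n; i += 2) {
        munmap(page[i], PAGE_SIZE);
    }

    printf("allocated %d pages, freed %d of them\n", n, (n + 1) / 2);
    return 0;
}
```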

Thus, in order to satisfy a request for contiguous memory, it may be necessary to consolidate the free memory into contiguous runs -- that is, to defragment the available free memory.

When an application allocates memory, it is provided by the operating system in pages (a page is a 4K block of memory that exists on a 4K boundary). The operating system programs the MMU so that the application can reference the physical block of memory through a virtual address -- during operation the MMU translates a virtual address into a physical address. For example, a request for 16K of memory will be satisfied by four 4K blocks. The operating system will set aside the four physical blocks for the application and configure the MMU to ensure that the application can reference them through a 16K contiguous virtual address. However, these blocks may not be physically contiguous -- the operating system can arrange the MMU configuration (the virtual-to-physical mapping) so that non-contiguous physical addresses are accessed through contiguous virtual addresses.
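To make the distinction concrete, the following hedged example uses QNX's mem_offset() call to print the physical address behind each 4K page of an ordinary, virtually contiguous 16K allocation; the physical addresses reported are frequently not contiguous.

```c
/* Show that a virtually contiguous 16K allocation may be backed by
 * non-contiguous physical pages.  Uses QNX's mem_offset(), which
 * reports the physical address behind a virtual one. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t  len = 4 * 4096;                    /* 16K == four 4K pages */
    char   *buf = mmap(0, len, PROT_READ | PROT_WRITE,
                       MAP_ANON | MAP_PRIVATE, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    for (size_t i = 0; i < len; i += 4096) {
        off_t   paddr;
        size_t  contig;

        buf[i] = 0;                            /* touch the page so it is backed */
        if (mem_offset(buf + i, -1 /* NOFD */, 4096, &paddr, &contig) == -1) {
            perror("mem_offset");
            return 1;
        }
        printf("virtual %p -> physical 0x%llx\n",
               (void *)(buf + i), (unsigned long long)paddr);
    }
    return 0;
}
```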

The task of defragmentation consists of changing existing memory allocations and mappings to use different underlying physical pages. By swapping around the underlying physical pages, we can consolidate the fragmented free blocks into contiguous runs.
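Conceptually, this is page migration. The sketch below is pseudocode only, using hypothetical types (paddr_t, struct pte); it is not the actual kernel implementation, which must also keep the mapping quiescent and flush TLBs around the copy.

```c
#include <stdint.h>
#include <string.h>

typedef uint64_t paddr_t;          /* hypothetical type: a physical frame address */

/* Hypothetical page-table entry: which physical frame backs one
 * virtual page. */
struct pte {
    paddr_t frame;
};

/* Conceptual sketch of migrating one in-use page -- the basic step of
 * defragmentation.  Copy the page contents into a new physical frame,
 * repoint the virtual-to-physical mapping, and the old frame becomes
 * free to be coalesced into a contiguous run. */
void migrate_page(struct pte *pte, void *old_kva, void *new_kva,
                  paddr_t new_frame)
{
    memcpy(new_kva, old_kva, 4096);   /* copy contents to the new frame */
    pte->frame = new_frame;           /* update the mapping */
    /* caller returns the old frame to the free pool */
}
```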

Defragmentation will be done, if necessary, when an application allocates a piece of contiguous memory. The application does this through the mmap() call, providing the MAP_PHYS flag (or by calling mmap_device_memory(), which just turns around and calls mmap() specifying MAP_PHYS). Currently, if it is not possible to satisfy a MAP_PHYS allocation with contiguous memory, we fail the mmap() call. Instead, with this feature, we will trigger a memory defragmentation algorithm that attempts to rearrange memory mappings across the system in order to allow the MAP_PHYS allocation to be satisfied.
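For reference, here is a hedged sketch of the allocation path this feature targets, using the usual Neutrino recipe of MAP_PHYS | MAP_ANON with no file descriptor; the 256K size and PROT_NOCACHE are typical choices for a DMA buffer, not requirements.

```c
/* Sketch of the MAP_PHYS allocation this feature is about.  Today the
 * call fails if no contiguous run of physical memory is available;
 * with this feature, the call may instead block while the system
 * defragments memory, and then succeed. */
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 256 * 1024;                   /* e.g. a 256K DMA buffer */

    void *buf = mmap(0, len,
                     PROT_READ | PROT_WRITE | PROT_NOCACHE,
                     MAP_PHYS | MAP_ANON,      /* physically contiguous, anonymous */
                     -1 /* NOFD */, 0);
    if (buf == MAP_FAILED) {
        perror("mmap(MAP_PHYS|MAP_ANON)");     /* currently ENOMEM when fragmented */
        return 1;
    }

    /* ... hand the buffer's physical address to the DMA engine ... */
    return 0;
}
```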

During the memory defragmentation, the thread calling mmap() will be blocked. It is acceptable that the memory defragmentation take a significant amount of time. It is not acceptable to cause significant delay in other system tasks or to significantly impact thread or interrupt latency. The defragmentation algorithm must not change virtual-to-physical mappings where it is not safe, and should avoid breaking up large page mappings if possible.

Future Considerations#

We have discussed several alternatives and enhancements to the basic idea of defragmenting memory. In the end, we concluded that defragmenting memory on the mmap() call is necessary in any case, and that while the additional features might improve the system, they are not necessary.

Ideas discussed included modifying our memory allocation algorithm so that, by preference, we use existing memory fragments to satisfy allocation requests. This would slow the rate at which memory becomes fragmented. We would need to model the algorithm to get a better idea of how it might behave. However, it seems clear that this sort of algorithm change would favour certain systems (those where defragment-on-mmap happens frequently) at the expense of making some applications run more slowly, since they would not get as much benefit from large page sizes.

Another idea would be to defragment memory in the background, using the idle task. When the system is idle, we could run an algorithm that attempts to consolidate free memory into contiguous runs. We have talked about using the idle task for a number of different tasks in the past; in most cases it is simply a lack of resources that has prevented us from implementing the ideas. The notion of defragmenting memory while at idle is likely to follow the same path -- the importance and benefit of idle-time defragmenting is arguable, and we have many higher priorities.

Feature Design#

It is not always safe to change memory mappings. Usually it is safe, because of the nature of virtual memory systems -- the application typically does not know or care which physical memory blocks back the virtual addresses it is given when it allocates memory. However, for some applications the physical address of the underlying memory is important -- a case in point is the very DMA-based device drivers that this feature is intended to assist: their DMA frames cannot be moved in the physical address space without breaking the application.

Further, it is not always desirable to change memory mappings. Neutrino provides "large page" support on most hardware platforms. If an application has a large contiguous memory allocation, the MMU can be programmed to use a large page representation, which is more efficient than multiple small pages. So, even if an application doesn't require physically contiguous memory, having it can improve efficiency. If our memory defragmentation algorithm breaks up a large page, the application's performance will suffer.

Thus our defragmentation algorithm must take into account that not all memory mappings can be changed, and that even where it is possible to change a mapping, it may not be desirable to do so.