Nov 30, 2006: Putting Disk Spatial Locality Information on the OS Map to Speedup Disk Accesses

Dr. Xiaodong Zhang, The Ohio State University

Abstract



With the rapid advancement of processor and networking technology, and with the falling prices of memory and disks, computing resources such as CPU cycles, bandwidth at different levels of internal and external connections (for memory, I/O, and the Internet), and large memory and disk capacities are increasingly plentiful for many data-intensive applications. Unfortunately, improvements in data access latency, particularly disk access latency, have lagged significantly behind. The "memory wall" performance bottleneck has shifted to a "disk wall," which is a serious bottleneck for many applications. One reason is that the operating system is aware of data access patterns only in terms of temporal locality, and is not aware of the physical layout of data on disk. Under this system structure, the operating system has limited ability to exploit the spatial locality of disk accesses, which has very high performance potential: for the same amount of data, sequential accesses are several orders of magnitude faster than random accesses on disks.
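The gap between sequential and random disk access can be illustrated with a back-of-the-envelope cost model. The sketch below is only an illustration: the function name and all drive parameters (seek time, rotational delay, transfer rate, block size) are assumed round numbers, not measurements from the talk, and the size of the gap depends heavily on them.

```python
def access_time_ms(num_blocks, sequential, block_kb=4.0,
                   seek_ms=8.0, rotation_ms=4.0, transfer_mb_per_s=60.0):
    """Estimate the time to read num_blocks blocks from a rotating disk.

    All default parameters are illustrative assumptions, not measured values.
    """
    # Time spent actually moving data off the platter.
    transfer_ms = num_blocks * block_kb / 1024.0 / transfer_mb_per_s * 1000.0
    # Cost of positioning the head: one seek plus rotational delay.
    positioning_ms = seek_ms + rotation_ms
    if sequential:
        # One positioning step, then a continuous transfer.
        return positioning_ms + transfer_ms
    # Random access: assume every block needs its own positioning step.
    return num_blocks * positioning_ms + transfer_ms


if __name__ == "__main__":
    n = 10_000
    seq = access_time_ms(n, sequential=True)
    rnd = access_time_ms(n, sequential=False)
    print(f"sequential: {seq:.1f} ms, random: {rnd:.1f} ms, ratio: {rnd / seq:.0f}x")
```

In this model the random-access cost is dominated by head positioning rather than data transfer, which is why knowing the on-disk layout matters to the operating system.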

To address the "disk wall," we are building a system infrastructure called DiskSeen, which puts disk layout information on the OS map. With DiskSeen, we are able to exploit dual localities (both temporal and spatial), abbreviated as DULO. Specifically, we present two new buffer management techniques: DULO-Caching and DULO-Prefetching. DULO-Caching effectively holds frequently used, randomly accessed data in the buffer cache to avoid slow disk accesses, while promptly replacing sequentially accessed but less frequently used data to take advantage of fast sequential disk accesses. DULO-Prefetching adaptively preloads into the buffer cache data blocks that are stored sequentially on disk and may belong to multiple files, significantly improving prefetching efficiency. This talk will present preliminary results from a system prototype in Linux kernel 2.6.11 and its performance evaluation under various data-intensive workloads. We show the effectiveness and low overhead of DiskSeen in a practical system environment.
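The caching idea described above can be sketched with a toy buffer cache. This is not the DULO-Caching algorithm from the talk; it is a much simplified stand-in that assumes blocks are identified by integer disk block numbers and that adjacent numbers are adjacent on disk. The class name, the `window` parameter, and the eviction rule are all hypothetical. The rule: among the few least recently used blocks, evict first a block that has a disk neighbor in the same group (cheap to re-read sequentially), and fall back to plain LRU otherwise.

```python
from collections import OrderedDict
from itertools import islice


class DualLocalityCache:
    """Toy cache combining recency (LRU order) with on-disk adjacency.

    Hypothetical sketch only; not the DULO-Caching algorithm itself.
    """

    def __init__(self, capacity, window=4):
        self.capacity = capacity      # maximum number of cached blocks
        self.window = window          # how many LRU-end blocks to examine
        self.blocks = OrderedDict()   # block number -> None, oldest first

    def access(self, block):
        """Return True on a hit; on a miss, cache the block and return False."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self._evict()
        self.blocks[block] = None
        return False

    def _evict(self):
        # Look only at the least recently used blocks.
        candidates = list(islice(self.blocks, self.window))
        candidate_set = set(candidates)
        for b in candidates:
            # A block with a disk neighbor among the candidates is treated
            # as part of a sequential run, hence cheap to fetch again.
            if b - 1 in candidate_set or b + 1 in candidate_set:
                del self.blocks[b]
                return
        # No sequential run found: fall back to plain LRU eviction.
        del self.blocks[candidates[0]]


if __name__ == "__main__":
    cache = DualLocalityCache(capacity=4, window=4)
    for blk in (100, 7, 8, 500, 900):
        cache.access(blk)
    print(list(cache.blocks))
```

In this run, plain LRU would evict block 100, the oldest entry. The sketch instead evicts block 7, because it sits next to block 8 on disk, so the isolated block 100 stays cached.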

This is a collaborative research project with Song Jiang (Wayne State University), Xiaoning Ding and Feng Chen (Ohio State University), and Kei Davis (Los Alamos National Lab).

Bio


Xiaodong Zhang is the Robert M. Critchfield Professor in Engineering, and Chairman of the Department of Computer Science and Engineering at the Ohio State University.

His research interests cover a wide spectrum in the areas of high-performance and distributed systems. Several technical innovations and research results from his team have been adopted or are being developed in commercial products and open source systems with direct impact on our daily computing operations, including the permutation memory interleaving technique, first in Sun Microsystems' UltraSPARC IIIi processor and then in Sun's dual-core Gemini processor, and the token thrashing protection mechanism and the Clock-Pro page replacement algorithm for memory management in the Linux kernel.

Xiaodong Zhang was the Director of the Advanced Computational Research Program at the National Science Foundation from 2001 to 2004. He is the Associate Editor-in-Chief of IEEE Transactions on Parallel and Distributed Systems, and also serves on the editorial boards of IEEE Transactions on Computers, IEEE Micro, and the Journal of Parallel and Distributed Computing. He received his Ph.D. in Computer Science from the University of Colorado at Boulder.