ROMIO Optimizations

ROMIO implements two I/O optimization techniques that generally improve application performance. The first of these is data sieving [2], a technique for efficiently accessing noncontiguous regions of data in files when the file system does not provide noncontiguous access as a primitive. The naive approach is to issue a separate I/O call for each contiguous region in the file. This results in a large number of I/O operations, each of which is often for a very small amount of data. In parallel I/O systems, where every I/O operation crosses the network, the latency of each operation is often high, so the naive approach typically performs very poorly because of the overhead of the many operations. With data sieving, a number of noncontiguous regions are accessed by reading a single block of data that contains all of the regions, including the unwanted data between them (called "holes"). The client then extracts the regions of interest from this large block. This technique needs only a single I/O call, at the cost of reading additional data from the disk and passing it across the network.
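The read side of data sieving can be sketched in plain C. This is a minimal illustration under stated assumptions, not ROMIO's implementation: the type region_t and the function sieve_read are hypothetical names, the regions are assumed to be sorted by offset and non-overlapping, and the whole extent is fetched into one buffer, whereas a real implementation would bound the buffer size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One contiguous region of interest in the file (hypothetical type). */
typedef struct {
    long offset;   /* byte offset of the region in the file */
    size_t length; /* number of bytes wanted */
} region_t;

/*
 * Data sieving read: fetch the extent from the start of the first region
 * to the end of the last region with ONE fread (holes included), then copy
 * only the wanted pieces into `out`, packed back to back.
 * Assumes regions are sorted by ascending offset and do not overlap.
 * Returns the number of useful bytes copied, or -1 on error.
 */
long sieve_read(FILE *fp, const region_t *regions, size_t n, char *out)
{
    assert(fp != NULL && regions != NULL && out != NULL && n > 0);

    long start = regions[0].offset;
    long end = regions[n - 1].offset + (long)regions[n - 1].length;
    size_t extent = (size_t)(end - start);

    char *sieve = malloc(extent);
    if (sieve == NULL)
        return -1;

    /* Single I/O call covering wanted data and holes alike. */
    if (fseek(fp, start, SEEK_SET) != 0 ||
        fread(sieve, 1, extent, fp) != extent) {
        free(sieve);
        return -1;
    }

    /* Extract the regions of interest on the client side. */
    long copied = 0;
    for (size_t i = 0; i < n; i++) {
        memcpy(out + copied, sieve + (regions[i].offset - start),
               regions[i].length);
        copied += (long)regions[i].length;
    }

    free(sieve);
    return copied;
}
```

For a file containing AAAA....BBBB....CCCC and the regions (0,4), (8,4), (16,4), the function issues one 20-byte read and returns 12 useful bytes; the 8 bytes of holes are the extra data the technique trades for fewer I/O calls.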

There are four hints that can be used to control the application of data sieving in ROMIO: ind_rd_buffer_size, ind_wr_buffer_size, romio_ds_read, and romio_ds_write. These are discussed in Section 3.2.
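As a hedged illustration, the data sieving hints could be passed through an MPI_Info object when the file is opened. The file name "datafile" and all hint values below are illustrative assumptions, not recommendations; the values shown ("enable", "disable", and buffer sizes in bytes) should be checked against the hint descriptions for the ROMIO version in use.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);
    MPI_Info_create(&info);

    /* Hint values are always passed as strings (values are illustrative). */
    MPI_Info_set(info, "romio_ds_read", "enable");       /* sieve on reads   */
    MPI_Info_set(info, "romio_ds_write", "disable");     /* not on writes    */
    MPI_Info_set(info, "ind_rd_buffer_size", "4194304"); /* bytes, assumed   */
    MPI_Info_set(info, "ind_wr_buffer_size", "524288");  /* bytes, assumed   */

    /* "datafile" is a placeholder file name. */
    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
    MPI_Info_free(&info);

    /* ... independent noncontiguous reads and writes would go here ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

Hints are advisory: an implementation is free to ignore any hint it does not understand or cannot honor.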

The second optimization is two-phase I/O [1]. Two-phase I/O, also called collective buffering, is an optimization that applies only to collective I/O operations. In two-phase I/O, the collection of independent I/O operations that make up the collective operation is analyzed to determine which data regions must be transferred (read or written). These regions are then split up among a set of aggregator processes that will actually interact with the file system. In the case of a read, the aggregators first read their regions from disk and then redistribute the data to its final locations; in the case of a write, the data is first collected from the processes and then written to disk by the aggregators.
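The step in which the regions are split among aggregators can be sketched as follows. This is a simplified illustration, not ROMIO's actual partitioning logic: the names extent_t and split_file_domains are hypothetical, each process is assumed to describe its request by a single byte range, and the aggregate range is simply divided into near-equal contiguous pieces.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open byte range [start, end) in the file (hypothetical type). */
typedef struct {
    long start;
    long end;
} extent_t;

/*
 * Compute the aggregate range touched by all processes and split it into
 * `naggs` contiguous pieces of near-equal size. domains[a] receives the
 * range that aggregator `a` would read or write on behalf of everyone.
 * Aggregators beyond the end of the data receive an empty range.
 */
void split_file_domains(const extent_t *req, size_t nprocs,
                        size_t naggs, extent_t *domains)
{
    assert(req != NULL && domains != NULL && nprocs > 0 && naggs > 0);

    /* Phase "analysis": lowest start and highest end over all requests. */
    long lo = req[0].start;
    long hi = req[0].end;
    for (size_t i = 1; i < nprocs; i++) {
        if (req[i].start < lo) lo = req[i].start;
        if (req[i].end > hi) hi = req[i].end;
    }

    /* Ceiling division so the pieces cover the whole range. */
    long chunk = (hi - lo + (long)naggs - 1) / (long)naggs;

    for (size_t a = 0; a < naggs; a++) {
        long s = lo + (long)a * chunk;
        long e = s + chunk;
        if (s > hi) s = hi; /* clamp to the aggregate range */
        if (e > hi) e = hi;
        domains[a].start = s;
        domains[a].end = e;
    }
}
```

With requests covering bytes 0 to 400 and four aggregators, each aggregator is assigned a 100-byte piece. In a real collective operation, the data exchange between the requesting processes and the aggregators would follow this partitioning step.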

There are five hints that can be used to control the application of two-phase I/O: cb_config_list, cb_nodes, cb_buffer_size, romio_cb_read, and romio_cb_write. These are discussed in Section 3.2.
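A sketch of how the two-phase I/O hints might be set, followed by a query of the hints the implementation reports for the open file. The file name "datafile" and every hint value are illustrative assumptions; which values are honored depends on the ROMIO version and the file system.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info, used;
    int rank, nkeys;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Info_create(&info);
    /* All values below are illustrative assumptions. */
    MPI_Info_set(info, "romio_cb_read", "enable");
    MPI_Info_set(info, "romio_cb_write", "enable");
    MPI_Info_set(info, "cb_nodes", "4");              /* number of aggregators */
    MPI_Info_set(info, "cb_buffer_size", "16777216"); /* bytes per aggregator  */
    MPI_Info_set(info, "cb_config_list", "*:1");      /* one per host          */

    /* "datafile" is a placeholder file name. */
    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
    MPI_Info_free(&info);

    /* Ask which hints are associated with the open file. */
    MPI_File_get_info(fh, &used);
    MPI_Info_get_nkeys(used, &nkeys);
    if (rank == 0) {
        for (int i = 0; i < nkeys; i++) {
            char key[MPI_MAX_INFO_KEY + 1];
            char val[MPI_MAX_INFO_VAL + 1];
            int flag;
            MPI_Info_get_nthkey(used, i, key);
            MPI_Info_get(used, key, MPI_MAX_INFO_VAL, val, &flag);
            if (flag)
                printf("%s = %s\n", key, val);
        }
    }
    MPI_Info_free(&used);

    /* ... collective reads and writes (e.g. MPI_File_write_all) go here ... */

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

Printing the reported hints is a simple way to check whether a requested setting was accepted before measuring performance.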

Rob Latham 2016-08-01