Monday, 14 September 2009

When does InnoDB do disk IO?

When does InnoDB do disk IO?: "This is a list of most of the sources of disk IO in InnoDB. Most of the statements below are true for official MySQL 5.0 and the InnoDB plugin prior to 1.0.4. Versions 1.0.4+ of the InnoDB plugin and XtraDB have made InnoDB much better for IO bound workloads and eventually I will write more about them.

  • synchronous page read - User threads read pages synchronously on buffer cache misses. They may also generate readahead (prefetch) requests handled by other threads. A page is taken from the free list to cache the contents of the disk page and the page is usually moved to the head of the LRU list.
  • trx_purge - This is called by srv_master_thread to remove rows that have been marked as deleted when the rows are no longer visible to other transactions. The work uses synchronous IO. Dmitri has written about the benefits of using a separate thread to keep up with IO bound delete-intensive workloads.
  • buf_flush_free_margin - This is called from user threads to move pages to the free list from the end of the buffer pool LRU list. Dirty and clean pages are moved. Dirty pages are first written back to the database file(s) by the background write thread(s). This can be the source of a lot of mutex contention.
  • log_preflush_pool_modified_pages - This is called from user threads prior to modifying rows. It causes all dirty pages that are too old (all rather than a fixed number) to be written back to the database file(s). User threads wait for the writes to complete when the old dirty pages are too old. Adaptive checkpoint in the 1.0.4 plugin and XtraDB reduce the intermittent delays caused by this. The pages to write are found from the end of the buffer pool flush list.
  • buf_flush_batch - This is called by srv_master_thread when there are too many dirty pages. It attempts to enforce innodb_max_dirty_pages_pct. Prior to 1.0.4 and XtraDB, it writes 100 dirty pages at a time and otherwise writes up to innodb_io_capacity pages at a time. The pages to write are found from the end of the buffer pool flush list. For write-intensive workloads, enforcement of innodb_max_dirty_pages_pct was unlikely prior to the 1.0.4 plugin and XtraDB.
  • ibuf_contract_for_n_pages - This is called by by srv_master_thread to apply deferred changes from the insert buffer to secondary index pages. It schedules up to 5 reads to be handled by the background read thread(s) and waits for the reads to complete. On read completion, a callback method applies the changes from the insert buffer entries to the pages making the pages dirty in the buffer pool. On write intensive workloads, the insert buffer will get full, consume half of the buffer pool and provide no performance benefit because the insert buffer is not used when full. This is not a good state as it is likely to remain full at that point until the server becomes idle and while full will continue to use half of the pages in the buffer pool.
  • doublewrite buffer - When the InnoDB doublewrite buffer is enabled, dirty pages are first written to the double write buffer using sequential IO to write the N dirty pages with one large IO request. After that write has been forced to disk, the pages are updated in place in the database file(s). While each page is written twice, this does not cost 2 random IOs per page as the write for the doublewrite buffer uses one large sequential IO for many pages.
  • transaction log - InnoDB uses a write-ahead log file to provide ACID semantics.
  • buf_read_ahead_linear, buf_read_ahead_random - These generate readahead (prefetch) requests to be handled by the background read thread(s). The requests are generated by user threads. The SHOW STATUS variables Innodb_buffer_pool_read_ahead_rnd and Innodb_buffer_pool_read_ahead_seq count the number of calls to these functions. As each call can generate 0 or more readahead requests, I am skeptical that these counters are useful in official MySQL.

Note that when InnoDB schedules dirty pages to be written back to the database files, neighbor pages may also be scheduled to be written.

InnoDB maintains 3 lists for the buffer pool. The free list has pages that can be used immediately to cache the result of a database file read. The LRU list stores pages in order of usage. A page is usually moved to the front of the LRU each time it is used. Because of this, a scan of a large table can wipe out the buffer pool. Fortunately, a fix for bug 45015 should fix this in the InnoDB 1.0.5 plugin. The flush list maintains pages in order of the LSN for the oldest change to the page.
PlanetMySQL Voting:
Vote UP /
Vote DOWN"

No comments:

Sike's shared items