    [PATCH] io-accounting: core statistics · 7c3ab738
    Andrew Morton authored
    
    
    The present per-task IO accounting isn't very useful.  It simply counts the
    number of bytes passed into read() and write().  So if a process reads 1MB
    from an already-cached file, it is accused of having performed 1MB of I/O,
    which is wrong.
    
    (David Wright had some comments on the applicability of the present logical IO accounting:
    
      For billing purposes it is useless, but for workload analysis it is very
      useful.
    
      read_bytes/read_calls     average read request size
      write_bytes/write_calls   average write request size
    
      read_bytes/read_blocks    i.e. logical/physical; can indicate hit rate or thrashing
      write_bytes/write_blocks  i.e. logical/physical; a guess, since pdflush writes
                                can be missed
    
      I often look for logical larger than physical to see filesystem cache
      problems.  And the bytes/cpusec can help find applications that are
      dominating the cache and causing slow interactive response from page cache
      contention.
    
      I want to find the IO-intensive applications and make sure they are doing
      efficient IO.  Thus the acctcms (SysV) or csacms command would give the
      high-IO commands.)
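    The ratios in the list above are simple quotients of two counters.  What
    follows is a minimal C sketch, not part of this patch.  It assumes the
    logical and physical figures are both already in bytes, so any block count
    must be converted first.  It also assumes that a zero denominator yields
    0.0.  The helper names are hypothetical.

```c
#include <stdint.h>

/* Generic counter quotient; returns 0.0 when the denominator is zero
 * (a convention chosen for this sketch, not a kernel rule). */
static double io_ratio(uint64_t numerator, uint64_t denominator)
{
	return denominator ? (double)numerator / (double)denominator : 0.0;
}

/* read_bytes/read_calls or write_bytes/write_calls: average request size */
static double avg_request_size(uint64_t bytes, uint64_t calls)
{
	return io_ratio(bytes, calls);
}

/* Logical/physical ratio, both arguments in bytes.  A value above 1.0
 * means part of the logical I/O never reached storage (e.g. cache hits). */
static double logical_per_physical(uint64_t logical_bytes,
				   uint64_t physical_bytes)
{
	return io_ratio(logical_bytes, physical_bytes);
}
```

    For example, 1 MiB of read() traffic backed by 256 KiB of storage reads
    gives a logical/physical ratio of 4.0.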
    
    This patchset adds new accounting which tries to be more accurate.  We account
    for three things:
    
    reads:
    
      attempt to count the number of bytes which this process really did cause
      to be fetched from the storage layer.  Done at the submit_bio() level, so it
      is accurate for block-backed filesystems.  I also attempt to wire up NFS and
      CIFS.
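    The difference between the old and new read accounting can be modelled in
    userspace.  This is a sketch, not the patch's code.  It assumes a 4096-byte
    page and a toy page cache held in a bool array.  rchar counts every byte
    passed to read(); read_bytes counts only the cache misses, the point where
    a real kernel would call submit_bio().  All model_* names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size */
#define MODEL_NPAGES    1024u   /* a 4 MiB "file" */

struct model_task {
	uint64_t rchar;       /* old scheme: bytes passed into read() */
	uint64_t read_bytes;  /* new scheme: bytes fetched from storage */
};

/* true if the page is already in the simulated page cache */
static bool model_cache[MODEL_NPAGES];

/* Simulated read() of npages whole pages starting at page `first`. */
static void model_read(struct model_task *t, size_t first, size_t npages)
{
	for (size_t i = first; i < first + npages && i < MODEL_NPAGES; i++) {
		t->rchar += MODEL_PAGE_SIZE;          /* always charged */
		if (!model_cache[i]) {
			/* cache miss: stands in for the submit_bio() charge */
			t->read_bytes += MODEL_PAGE_SIZE;
			model_cache[i] = true;
		}
	}
}
```

    The first task to read 1 MiB (256 pages) is charged 1 MiB under both
    schemes.  A second task rereading the same, now cached, 1 MiB is still
    charged 1 MiB of rchar but zero read_bytes.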
    
    writes:
    
      attempt to count the number of bytes which this process caused to be sent
      to the storage layer.  This is done at page-dirtying time.
    
      The big inaccuracy here is truncate.  If a process writes 1MB to a file
      and then deletes the file, it will in fact perform no writeout.  But it will
      have been accounted as having caused 1MB of write.
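    The truncate inaccuracy follows directly from charging at dirtying time.
    The userspace sketch below assumes a 4096-byte page.  It also assumes that
    a page is charged only on its clean-to-dirty transition, which is a choice
    of this model.  model_disk_bytes stands for the writeout that actually
    reaches storage.  All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size */
#define MODEL_NPAGES    256u    /* a 1 MiB file */

struct model_task {
	uint64_t write_bytes;   /* charged when the task dirties a page */
};

static bool model_dirty[MODEL_NPAGES];
static uint64_t model_disk_bytes;   /* bytes that really reached storage */

/* write(): charge the task now, long before any writeout happens */
static void model_write(struct model_task *t, size_t first, size_t npages)
{
	for (size_t i = first; i < first + npages && i < MODEL_NPAGES; i++) {
		if (!model_dirty[i]) {
			model_dirty[i] = true;
			t->write_bytes += MODEL_PAGE_SIZE;
		}
	}
}

/* unlink()/truncate(): dirty pages are dropped without being written */
static void model_truncate(void)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++)
		model_dirty[i] = false;
}

/* writeback: only pages that are still dirty generate physical I/O */
static void model_writeback(void)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		if (model_dirty[i]) {
			model_disk_bytes += MODEL_PAGE_SIZE;
			model_dirty[i] = false;
		}
	}
}
```

    Consider a process that writes 1 MiB, after which the file is deleted
    before writeback runs.  That process keeps write_bytes == 1 MiB, while
    model_disk_bytes stays at zero.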
    
      So...
    
    cancelled_writes:
    
      account the number of bytes which this process caused to not happen, by
      truncating pagecache.
    
      We _could_ just subtract this from the process's `write' accounting.  But
      that means that some processes would be reported to have done negative
      amounts of write IO, which is silly.
    
      So we just report the raw number and punt this decision up to userspace.
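    The "negative write IO" problem arises when one process truncates pages
    that another process dirtied.  Below is a userspace sketch of that case.
    It assumes a 4096-byte page and charges the cancellation to the truncating
    task.  Two possible userspace policies are shown: a signed difference and
    a difference clamped at zero.  The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size */
#define MODEL_NPAGES    256u    /* a 1 MiB file */

struct model_task {
	uint64_t write_bytes;            /* bytes this task dirtied */
	uint64_t cancelled_write_bytes;  /* dirty bytes this task truncated away */
};

static bool model_dirty[MODEL_NPAGES];

/* Charge the writer on each clean-to-dirty page transition. */
static void model_write(struct model_task *t, size_t first, size_t npages)
{
	for (size_t i = first; i < first + npages && i < MODEL_NPAGES; i++) {
		if (!model_dirty[i]) {
			model_dirty[i] = true;
			t->write_bytes += MODEL_PAGE_SIZE;
		}
	}
}

/* Truncation charges the *truncating* task for every dirty page it drops. */
static void model_truncate(struct model_task *t)
{
	for (size_t i = 0; i < MODEL_NPAGES; i++) {
		if (model_dirty[i]) {
			model_dirty[i] = false;
			t->cancelled_write_bytes += MODEL_PAGE_SIZE;
		}
	}
}

/* Policy A: raw difference, which can go negative. */
static int64_t net_write_signed(const struct model_task *t)
{
	return (int64_t)t->write_bytes - (int64_t)t->cancelled_write_bytes;
}

/* Policy B: difference clamped at zero. */
static uint64_t net_write_clamped(const struct model_task *t)
{
	return t->write_bytes > t->cancelled_write_bytes
	       ? t->write_bytes - t->cancelled_write_bytes : 0;
}
```

    If one task writes 1 MiB and another task deletes the file, the deleting
    task has a signed net of -1 MiB.  That is the silly result the kernel
    avoids by exporting both raw counters.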
    
    Now, we _could_ account for writes at the physical I/O level.  But
    
    - This would require that we track memory-dirtying tasks at the per-page
      level (would require a new pointer in struct page).
    
    - It would mean that IO statistics for a process are usually only available
      long after that process has exited.  Which means that we probably cannot
      communicate this info via taskstats.
    
    This patch:
    
    Wire up the kernel-private data structures and the accessor functions to
    manipulate them.
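    The shape of "per-task counters plus accessors that charge the current
    task" can be approximated in userspace.  This is a sketch under stated
    assumptions, not this patch's code.  The struct fields are named after the
    three categories above.  model_task, model_current and the model_* helpers
    are hypothetical stand-ins for task_struct, the kernel's current task
    pointer, and the patch's real accessor functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-task counters for the three categories described above. */
struct task_io_accounting {
	uint64_t read_bytes;            /* fetched from storage for this task */
	uint64_t write_bytes;           /* dirtied by this task */
	uint64_t cancelled_write_bytes; /* dirty data this task truncated away */
};

/* Hypothetical stand-in for task_struct, reduced to the counters. */
struct model_task {
	struct task_io_accounting ioac;
};

/* Hypothetical stand-in for the kernel's `current` pointer. */
static struct model_task *model_current;

static void model_task_io_accounting_init(struct task_io_accounting *ioac)
{
	ioac->read_bytes = 0;
	ioac->write_bytes = 0;
	ioac->cancelled_write_bytes = 0;
}

/* Assumed call site: where reads are submitted to storage. */
static void model_task_io_account_read(size_t bytes)
{
	model_current->ioac.read_bytes += bytes;
}

/* Assumed call site: where the task dirties pagecache. */
static void model_task_io_account_write(size_t bytes)
{
	model_current->ioac.write_bytes += bytes;
}

/* Assumed call site: where the task truncates dirty pagecache. */
static void model_task_io_account_cancelled_write(size_t bytes)
{
	model_current->ioac.cancelled_write_bytes += bytes;
}
```

    The accessors take no task argument and always charge the current task,
    so a call site only needs to know how many bytes were involved.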
    
    Cc: Jay Lan <jlan@sgi.com>
    Cc: Shailabh Nagar <nagar@watson.ibm.com>
    Cc: Balbir Singh <balbir@in.ibm.com>
    Cc: Chris Sturtivant <csturtiv@sgi.com>
    Cc: Tony Ernst <tee@sgi.com>
    Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
    Cc: David Wright <daw@sgi.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>