    mtd: driver _read() returns max_bitflips; mtd_read() returns -EUCLEAN · 40462e54
    Paul Burton authored
    
    
    Linux modified the MTD driver interface in commit edbc4540 (with the
    same name as this commit). The effect is that calls to mtd_read will
    not return -EUCLEAN if the number of ECC-corrected bit errors is below
    a certain threshold, which defaults to the strength of the ECC. This
    allows -EUCLEAN to stop indicating "some bits were corrected" and begin
    indicating "a large number of bits were corrected, the data held in
    this region of flash may be lost soon". UBI makes use of this and when
    -EUCLEAN is returned from mtd_read it will move data to another block
    of flash. Without adopting this interface change, UBI on U-Boot attempts
    to move data between blocks every time a single bit is corrected using
    the ECC, which is a very common occurrence on some devices.
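
    The return-value convention described above can be sketched roughly as
    follows. This is a simplified illustration, not the code in this patch:
    struct mtd_sketch, sketch_driver_read and sketch_mtd_read are
    hypothetical names, and the per-region flip counts are passed in as an
    array to stand in for what a real ECC engine would report. It assumes,
    as in Linux, that the driver reports the largest number of bitflips
    corrected in any one ECC region.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EUCLEAN
#define EUCLEAN 117 /* Linux value; only used if the host errno.h lacks it */
#endif

/* Hypothetical, trimmed-down stand-in for struct mtd_info. */
struct mtd_sketch {
	unsigned int ecc_strength;      /* bits the ECC can correct per region */
	unsigned int bitflip_threshold; /* corrected bits at which -EUCLEAN is reported */
	/* Driver hook: negative errno on failure, otherwise max_bitflips. */
	int (*_read)(struct mtd_sketch *mtd, const unsigned int *flips,
		     size_t nregions);
};

/*
 * Driver side: return the largest number of bits corrected in any one
 * ECC region, or -EBADMSG if a region is beyond what the ECC can fix.
 * flips[i] simulates the correction count reported for region i.
 */
int sketch_driver_read(struct mtd_sketch *mtd, const unsigned int *flips,
		       size_t nregions)
{
	unsigned int max_bitflips = 0;

	for (size_t i = 0; i < nregions; i++) {
		if (flips[i] > mtd->ecc_strength)
			return -EBADMSG;
		if (flips[i] > max_bitflips)
			max_bitflips = flips[i];
	}
	return (int)max_bitflips;
}

/*
 * Core side: translate max_bitflips into 0 or -EUCLEAN so that callers
 * such as UBI only see -EUCLEAN once the threshold is reached.
 */
int sketch_mtd_read(struct mtd_sketch *mtd, const unsigned int *flips,
		    size_t nregions)
{
	int ret;

	assert(mtd != NULL && mtd->_read != NULL);

	ret = mtd->_read(mtd, flips, nregions);
	if (ret < 0)
		return ret;             /* real error, pass it through */
	if (mtd->ecc_strength == 0)
		return 0;               /* device has no ECC */
	return (unsigned int)ret >= mtd->bitflip_threshold ? -EUCLEAN : 0;
}
```

    With ecc_strength and bitflip_threshold both set to 4, a read in which
    the worst region needed one corrected bit returns 0, while a read in
    which some region needed four returns -EUCLEAN.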
    
    For some devices where bit errors are common enough, UBI can get stuck
    constantly moving data around because each block it attempts to use has
    a single bit error. This condition is hit when wear_leveling_worker
    attempts to move data from one PEB to another in response to an
    -EUCLEAN/UBI_IO_BITFLIPS error. When this happens ubi_eba_copy_leb is
    called to perform the data copy, and after the data is written it is
    read back to check its validity. If that read returns UBI_IO_BITFLIPS
    (in response to an MTD -EUCLEAN) then ubi_eba_copy_leb returns 1 to
    wear_leveling_worker, which then proceeds to schedule the destination
    PEB for erasure. This leads to erase_worker running on the PEB, and
    following a successful erase wear_leveling_worker is called which
    begins this whole cycle all over again. The end result is that (without
    UBI debug output enabled) the boot appears to simply hang whilst in
    reality U-Boot busily works away at destroying a block of the NAND
    flash. Debug output from this situation:
    
      UBI DBG: ensure_wear_leveling: schedule scrubbing
      UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083
      UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 1027
      UBI DBG: ubi_io_read: read 4096 bytes from PEB 1027:4096
      UBI DBG: ubi_eba_copy_leb: copy LEB 0:0, PEB 1027 to PEB 4083
      UBI DBG: ubi_eba_copy_leb: read 1040384 bytes of data
      UBI DBG: ubi_io_read: read 1040384 bytes from PEB 1027:8192
      UBI: fixable bit-flip detected at PEB 1027
      UBI DBG: ubi_io_write_vid_hdr: write VID header to PEB 4083
      UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:4096
      UBI DBG: ubi_io_read_vid_hdr: read VID header from PEB 4083
      UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:4096
      UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:8192
      UBI DBG: ubi_io_read: read 4096 bytes from PEB 4083:8192
      UBI: fixable bit-flip detected at PEB 4083
      UBI DBG: schedule_erase: schedule erasure of PEB 4083, EC 55, torture 0
      UBI DBG: erase_worker: erase PEB 4083 EC 55
      UBI DBG: sync_erase: erase PEB 4083, old EC 55
      UBI DBG: do_sync_erase: erase PEB 4083
      UBI DBG: sync_erase: erased PEB 4083, new EC 56
      UBI DBG: ubi_io_write_ec_hdr: write EC header to PEB 4083
      UBI DBG: ubi_io_write: write 4096 bytes to PEB 4083:0
      UBI DBG: ensure_wear_leveling: schedule scrubbing
      UBI DBG: wear_leveling_worker: scrub PEB 1027 to PEB 4083
      ...
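
    The cycle in the log above can be modelled with a toy sketch. This is
    not UBI code: struct peb_sketch and both functions are hypothetical,
    and the model assumes the destination PEB needs the same number of
    corrected bits on every read-back, which erasing does not change. A
    threshold of 1 stands in for the old behaviour, where any corrected
    bit is reported as -EUCLEAN.

```c
#include <stdbool.h>

/* Hypothetical model of the destination PEB of a scrub operation. */
struct peb_sketch {
	unsigned int flips;       /* bits corrected on each read-back */
	unsigned int erase_count; /* EC, incremented on every erase */
};

/* True if the read-back would be reported as UBI_IO_BITFLIPS. */
bool readback_reports_bitflips(const struct peb_sketch *peb,
			       unsigned int threshold)
{
	return peb->flips >= threshold;
}

/*
 * Repeat "copy, read back, erase on bitflips" until the read-back is
 * accepted or max_rounds is reached. Returns the number of erases done.
 */
unsigned int scrub_until_clean(struct peb_sketch *dst,
			       unsigned int threshold,
			       unsigned int max_rounds)
{
	unsigned int erases = 0;

	while (erases < max_rounds &&
	       readback_reports_bitflips(dst, threshold)) {
		dst->erase_count++; /* destination scheduled for erasure */
		erases++;
	}
	return erases;
}
```

    For a PEB with a single persistent bitflip, a threshold of 1 runs
    until max_rounds is exhausted, with the erase counter climbing each
    time, whereas a threshold of 4 accepts the copy without any erase.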
    
    This patch adopts the interface change as in Linux commit edbc4540 in
    order to avoid such situations. Given that none of the drivers under
    drivers/mtd return -EUCLEAN, this should only affect those using
    software ECC. I have tested that it works on a board which is
    currently out of tree, but which I hope to be able to begin
    upstreaming soon.
    
    Signed-off-by: Paul Burton <paul.burton@imgtec.com>
    Acked-by: Stefan Roese <sr@denx.de>