Skip to content
  • Brian Norris's avatar
    mfd: cros_ec: Retry commands when EC is known to be busy · 11799564
    Brian Norris authored
    Commit 001dde94 ("mfd: cros ec: spi: Fix "in progress" error
    signaling") pointed out some bad code, but its analysis and conclusion
    was not 100% correct.
    
    It *is* correct that we should not propagate result==EC_RES_IN_PROGRESS
    for transport errors, because this has a special meaning -- that we
    should follow up with EC_CMD_GET_COMMS_STATUS until the EC is no longer
    busy. This is definitely the wrong thing for many commands, because
    among other problems, EC_CMD_GET_COMMS_STATUS doesn't actually retrieve
    any RX data from the EC, so commands that expected some data back will
    instead start processing junk.
    
    For such commands, the right answer is to either propagate the error
    (and return that error to the caller) or resend the original command
    (*not* EC_CMD_GET_COMMS_STATUS).
    
    Unfortunately, commit 001dde94 forgets a crucial point: that for
    some long-running operations, the EC physically cannot respond to
    commands any more. For example, with EC_CMD_FLASH_ERASE, the EC may be
    re-flashing its own code regions, so it can't respond to SPI interrupts.
    Instead, the EC prepares us ahead of time for being busy for a "long"
    time, and fills its hardware buffer with EC_SPI_PAST_END. Thus, we
    expect to see several "transport" errors (or, messages filled with
    EC_SPI_PAST_END). So we should really translate that to a retryable
    error (-EAGAIN) and continue sending EC_CMD_GET_COMMS_STATUS until we
    get a ready status.
    
    IOW, it is actually important to treat some of these "junk" values as
    retryable errors.
    
    Together with commit 001dde94, this resolves bugs like the
    following:
    
    1. EC_CMD_FLASH_ERASE now works again (with commit 001dde94, we
       would abort the first time we saw EC_SPI_PAST_END)
    2. Before commit 001dde94, transport errors (e.g.,
       EC_SPI_RX_BAD_DATA) seen in other commands (e.g.,
       EC_CMD_RTC_GET_VALUE) used to yield junk data in the RX buffer; they
       will now yield -EAGAIN return values, and tools like 'hwclock' will
       simply fail instead of retrieving and re-programming undefined time
       values
    
    Fixes: 001dde94
    
     ("mfd: cros ec: spi: Fix "in progress" error signaling")
    Signed-off-by: default avatarBrian Norris <briannorris@chromium.org>
    Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
    11799564