    json: Reject invalid UTF-8 sequences · e59f39d4
    Markus Armbruster authored
    
    
    We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
    \xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
    invalid UTF-8 not containing these bytes, as demonstrated by
    check-qjson:
    
    * Malformed sequences
    
      - Unexpected continuation bytes
    
      - Missing continuation bytes after start bytes other than
        \xC0..\xC1, \xF5..\xFD.
    
    * Overlong sequences with start bytes other than \xC0..\xC1,
      \xF5..\xFD.
    
    * Invalid code points
    
    Fixing this in the lexer would be bothersome.  Fixing it in the parser
    is straightforward, so do that.
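
The categories listed above can be illustrated with a small standalone checker. This is only a sketch in Python, not the code from this patch: the function name `utf8_error` and the reason strings are hypothetical, and it assumes "invalid code points" means surrogates and values above U+10FFFF (the actual patch may reject more).

```python
def utf8_error(data: bytes):
    """Return None if data is valid UTF-8, else a short reason string.

    Hypothetical helper for illustration; not taken from the patch.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # plain ASCII
            i += 1
            continue
        if b <= 0xBF:                     # \x80..\xBF cannot start a sequence
            return "unexpected continuation byte"
        if b in (0xC0, 0xC1) or b >= 0xF5:
            return "byte that cannot occur in UTF-8"
        # Start byte: sequence length, payload bits, smallest legal code point
        if b < 0xE0:
            length, cp, min_cp = 2, b & 0x1F, 0x80
        elif b < 0xF0:
            length, cp, min_cp = 3, b & 0x0F, 0x800
        else:
            length, cp, min_cp = 4, b & 0x07, 0x10000
        for j in range(i + 1, i + length):
            if j >= n or not 0x80 <= data[j] <= 0xBF:
                return "missing continuation byte"
            cp = (cp << 6) | (data[j] & 0x3F)
        if cp < min_cp:                   # encoded with more bytes than needed
            return "overlong sequence"
        if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            return "invalid code point"   # assumption: surrogates, > U+10FFFF
        i += length
    return None
```

For example, `utf8_error(b"\xe0\x80\xaf")` reports an overlong sequence and `utf8_error(b"\xed\xa0\x80")` an invalid code point, even though neither input contains a byte from \xC0..\xC1 or \xF5..\xFF.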
    
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Message-Id: <20180823164025.12553-23-armbru@redhat.com>