Maximal subpart of an ill-formed subsequence

The longest code unit subsequence starting at an unconvertible offset that is either: a. the initial subsequence of a well-formed code unit sequence, or b. a subsequence of length one.

  • The term maximal subpart of an ill-formed subsequence can be abbreviated to maximal subpart when it is clear in context that the subsequence in question is ill-formed.
  • This definition can be trivially applied to the UTF-32 or UTF-16 encoding forms, but is primarily of interest when converting UTF-8 strings.
  • For example, in the ill-formed UTF-8 sequence <41 C0 AF 41 F4 80 80 41>, there are two ill-formed subsequences, <C0 AF> and <F4 80 80>, separated from each other by the well-formed <41>. Applying the definition of maximal subparts to these ill-formed subsequences: in the first case, <C0> is a maximal subpart, because that byte value can never be the first byte of a well-formed UTF-8 sequence. In the second subsequence, <F4 80 80> is a maximal subpart, because up to that point all three bytes match the specification for UTF-8. Only when it is followed by <41> can the sequence <F4 80 80> be determined to be ill-formed, because the specification requires a following byte in the range 80..BF instead.
The UTF-8 sequence <41 E0 9F 80 41> is ill-formed, because <9F> is not an allowed second byte of a UTF-8 sequence commencing with <E0>. In this case, there is an unconvertible offset at <E0>, and the maximal subpart at that offset is also <E0>. The subsequence <E0 9F> cannot be a maximal subpart, because it is not an initial subsequence of any well-formed UTF-8 code unit sequence.
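The examples above can be checked mechanically. The following is a minimal Python sketch, assuming the well-formed UTF-8 byte ranges defined in the Unicode Standard (for example, E0 must be followed by A0..BF, and F4 by 80..8F). At each unconvertible offset it keeps the longest prefix that still matches those ranges, falling back to a single byte. The helper names `maximal_subparts`, `decode_replacing`, `_scan` and `_trail_ranges` are hypothetical and not part of any library.

```python
"""Sketch: locate maximal subparts in ill-formed UTF-8 (hypothetical helpers)."""

TAIL = (0x80, 0xBF)  # ordinary continuation-byte range


def _trail_ranges(lead):
    """Allowed ranges for the bytes after `lead`, or None if `lead`
    can never begin a well-formed UTF-8 sequence (80..C1, F5..FF)."""
    if lead <= 0x7F:
        return []
    if 0xC2 <= lead <= 0xDF:
        return [TAIL]
    if lead == 0xE0:
        return [(0xA0, 0xBF), TAIL]  # rejects overlong 3-byte forms
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [TAIL, TAIL]
    if lead == 0xED:
        return [(0x80, 0x9F), TAIL]  # rejects surrogate code points
    if lead == 0xF0:
        return [(0x90, 0xBF), TAIL, TAIL]  # rejects overlong 4-byte forms
    if 0xF1 <= lead <= 0xF3:
        return [TAIL, TAIL, TAIL]
    if lead == 0xF4:
        return [(0x80, 0x8F), TAIL, TAIL]  # caps values at U+10FFFF
    return None


def _scan(data, i):
    """Return (well_formed, length) for the code unit sequence at offset i.
    When ill-formed, `length` is the size of the maximal subpart: the
    longest valid prefix, or 1 if the lead byte itself is invalid."""
    ranges = _trail_ranges(data[i])
    if ranges is None:
        return False, 1
    length = 1
    for lo, hi in ranges:
        j = i + length
        if j >= len(data) or not lo <= data[j] <= hi:
            return False, length
        length += 1
    return True, length


def maximal_subparts(data):
    """List (offset, bytes) for every maximal subpart in `data`."""
    found, i = [], 0
    while i < len(data):
        ok, n = _scan(data, i)
        if not ok:
            found.append((i, data[i:i + n]))
        i += n
    return found


def decode_replacing(data):
    """Decode UTF-8, emitting one U+FFFD per maximal subpart."""
    out, i = [], 0
    while i < len(data):
        ok, n = _scan(data, i)
        out.append(data[i:i + n].decode("utf-8") if ok else "\ufffd")
        i += n
    return "".join(out)


print(maximal_subparts(bytes.fromhex("41 C0 AF 41 F4 80 80 41")))
# [(1, b'\xc0'), (2, b'\xaf'), (4, b'\xf4\x80\x80')]
print(maximal_subparts(bytes.fromhex("41 E0 9F 80 41")))
# [(1, b'\xe0'), (2, b'\x9f'), (3, b'\x80')]
```

Note that the sketch reports <C0> and <AF> as two separate maximal subparts: once <C0> has been consumed, <AF> begins a new unconvertible offset and cannot start a well-formed sequence either. The same applies to <9F> and <80> in the second example, so `decode_replacing` turns <41 E0 9F 80 41> into "A" followed by three U+FFFD characters and a final "A".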
