Well-formed UTF-8 code unit sequence

A well-formed Unicode code unit sequence of UTF-8 code units.

  • The UTF-8 code unit sequence <41 C3 B1 42> is well-formed, because it can be partitioned into subsequences, all of which match the specification for UTF-8 in Table 3-7. It consists of the following minimal well-formed code unit subsequences: <41>, <C3 B1>, and <42>.
  • The UTF-8 code unit sequence <41 C2 C3 B1 42> is ill-formed, because it contains one ill-formed subsequence. There is no subsequence for the C2 byte that matches the specification for UTF-8 in Table 3-7. The code unit sequence is partitioned into one minimal well-formed code unit subsequence, <41>, followed by one ill-formed code unit subsequence, <C2>, followed by two minimal well-formed code unit subsequences, <C3 B1> and <42>.
  • In isolation, the UTF-8 code unit sequence <C2 C3> would be ill-formed, but in the context of the UTF-8 code unit sequence <41 C2 C3 B1 42>, <C2 C3> does not constitute an ill-formed code unit subsequence, because the C3 byte is actually the first byte of the minimal well-formed UTF-8 code unit subsequence <C3 B1>. Ill-formed code unit subsequences do not overlap with minimal well-formed code unit subsequences.
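
The partitioning described in the examples above can be sketched in Python. This is a minimal illustration, not a reference implementation: the names TABLE_3_7, match_length, partition, and is_well_formed are hypothetical, the byte ranges are transcribed from Table 3-7 of the Unicode Standard, and grouping adjacent unmatched bytes into a single ill-formed run is a choice made by this sketch.

```python
# Each row lists the allowed (low, high) range for every byte of one
# minimal well-formed UTF-8 subsequence, as transcribed from Table 3-7.
TABLE_3_7 = [
    [(0x00, 0x7F)],
    [(0xC2, 0xDF), (0x80, 0xBF)],
    [(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    [(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    [(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF0, 0xF0), (0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF1, 0xF3), (0x80, 0xBF), (0x80, 0xBF), (0x80, 0xBF)],
    [(0xF4, 0xF4), (0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)],
]


def match_length(data, start):
    """Length of the minimal well-formed subsequence at start, or 0 if none."""
    for row in TABLE_3_7:
        chunk = data[start:start + len(row)]
        if len(chunk) == len(row) and all(
            lo <= b <= hi for b, (lo, hi) in zip(chunk, row)
        ):
            return len(row)
    return 0


def partition(data):
    """Split data into (is_well_formed, subsequence) pairs, left to right."""
    parts = []
    i = 0
    while i < len(data):
        n = match_length(data, i)
        if n:
            parts.append((True, bytes(data[i:i + n])))
            i += n
        else:
            # Extend the ill-formed run until a byte starts a well-formed
            # subsequence, so the two kinds of subsequence never overlap.
            j = i + 1
            while j < len(data) and match_length(data, j) == 0:
                j += 1
            parts.append((False, bytes(data[i:j])))
            i = j
    return parts


def is_well_formed(data):
    """True if every subsequence in the partition is well-formed."""
    return all(ok for ok, _ in partition(data))
```

For example, partition(bytes.fromhex("41C2C3B142")) yields <41> as well-formed, <C2> as ill-formed, then <C3 B1> and <42> as well-formed, matching the second example above, while partition(bytes.fromhex("C2C3")) yields the single ill-formed subsequence <C2 C3>.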