Unicode encoding form

A Unicode encoding form assigns each Unicode scalar value to a unique code unit sequence. The Unicode Standard defines three Unicode encoding forms: UTF-8, UTF-16, and UTF-32.
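
As a rough illustration of the definition above, the following Python sketch lists the code units that each of the three encoding forms assigns to a single scalar value. The helper name `code_units` is hypothetical. Python's codecs expose encoding schemes (byte sequences) rather than encoding forms, so the sketch assumes the big-endian schemes and regroups their bytes into 16-bit or 32-bit code units.

```python
import struct

def code_units(ch, form):
    """Return the code units of one character in the given encoding form.

    Hypothetical helper: the big-endian encoding schemes stand in for the
    UTF-16 and UTF-32 encoding forms, and the bytes are regrouped into units.
    """
    if form == "utf-8":
        return list(ch.encode("utf-8"))                 # 8-bit code units
    if form == "utf-16":
        data = ch.encode("utf-16-be")                   # 16-bit code units
        return [u for (u,) in struct.iter_unpack(">H", data)]
    if form == "utf-32":
        data = ch.encode("utf-32-be")                   # 32-bit code units
        return [u for (u,) in struct.iter_unpack(">I", data)]
    raise ValueError(f"unknown encoding form: {form}")

# U+1F600 needs four, two, and one code unit(s) respectively.
for form in ("utf-8", "utf-16", "utf-32"):
    print(form, [f"{u:X}" for u in code_units("\U0001F600", form)])
# utf-8 ['F0', '9F', '98', '80']
# utf-16 ['D83D', 'DE00']
# utf-32 ['1F600']
```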

  • For historical reasons, the Unicode encoding forms are also referred to as Unicode (or UCS) transformation formats (UTF). That term is ambiguous, however, because it is used for both encoding forms and encoding schemes.
  • The mapping of the set of Unicode scalar values to the set of code unit sequences for a Unicode encoding form is one-to-one. This property guarantees that a reverse mapping can always be derived. Given the mapping of any Unicode scalar value to a particular code unit sequence for a given encoding form, one can derive the original Unicode scalar value unambiguously from that code unit sequence.
  • The mapping of the set of Unicode scalar values to the set of code unit sequences for a Unicode encoding form is not onto. In other words, for any given encoding form, there exist code unit sequences that have no associated Unicode scalar value.
  • To ensure that the mapping for a Unicode encoding form is one-to-one, all Unicode scalar values, including those corresponding to noncharacter code points and unassigned code points, must be mapped to unique code unit sequences. Note that this requirement does not extend to high-surrogate and low-surrogate code points, which are excluded by definition from the set of Unicode scalar values.
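
The properties in the list above can be checked informally with Python's built-in codecs. This is a minimal sketch, not a conformance test: the helper names `round_trips`, `is_well_formed`, and `can_encode` are hypothetical, and the big-endian encoding schemes are assumed as stand-ins for the UTF-16 and UTF-32 encoding forms.

```python
def round_trips(scalar, codec):
    """One-to-one: decoding the encoded sequence recovers the scalar value."""
    ch = chr(scalar)
    return ch.encode(codec).decode(codec) == ch

def is_well_formed(data, codec):
    """Not onto: some code unit sequences map back to no scalar value."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False

def can_encode(code_point, codec):
    """Surrogate code points are not scalar values, so they have no encoding."""
    try:
        chr(code_point).encode(codec)
        return True
    except UnicodeEncodeError:
        return False

CODECS = ("utf-8", "utf-16-be", "utf-32-be")

# Assigned (U+0041), noncharacter (U+FFFF), and supplementary (U+10FFFF)
# scalar values all round-trip in every form.
print(all(round_trips(s, c) for s in (0x41, 0xFFFF, 0x10FFFF) for c in CODECS))

# Sequences with no associated scalar value: a stray 0xFF byte in UTF-8,
# a lone high surrogate in UTF-16, and a value above 0x10FFFF in UTF-32.
print(is_well_formed(b"\xff", "utf-8"))
print(is_well_formed(b"\xd8\x00", "utf-16-be"))
print(is_well_formed(b"\x00\x11\x00\x00", "utf-32-be"))

# Surrogate code points U+D800 and U+DFFF are rejected by every form.
print(any(can_encode(cp, c) for cp in (0xD800, 0xDFFF) for c in CODECS))
```

The first line printed should be `True` and the remaining four `False`, reflecting that every scalar value has a reversible encoding while ill-formed sequences and surrogate code points do not.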