
Character set in HTML

Computers store text as numbers: every character a user types is represented by a character code. Together, these codes form a character set that both the author's and the reader's software must agree on. Declaring the character set of an HTML document tells the browser how to interpret the character codes in the page, so that symbols and characters are displayed correctly. This post describes the character sets used in HTML.

Character set in HTML

Several character sets have been used across the versions of HTML. This section discusses both the character sets supported previously and those supported today.

ASCII Character set

ASCII is a 7-bit character set that defines 128 characters, enough to represent English text in a form a computer can store. Some characteristics of ASCII are described below:

– The digits (0-9) and all 26 letters of the English alphabet, in both uppercase and lowercase, can be represented using ASCII.

– The other character sets discussed in this post, ISO-8859-1 and UTF-8, are based on ASCII.

The primary limitation of the ASCII character set is that its 128 characters cover only the English alphabet, the digits (0-9), punctuation marks, and control codes. This makes it language-dependent and limited.
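To make the 128-character limit concrete, here is a minimal sketch in Python (used here only for illustration; the sample strings and the helper name `fits_in_ascii` are made up for this example). It shows that plain English text maps to codes below 128, while an accented letter cannot be encoded as ASCII at all.

```python
# Every ASCII character has a code in the range 0-127.
text = "HTML 5!"
codes = [ord(ch) for ch in text]
print(codes)  # [72, 84, 77, 76, 32, 53, 33]


def fits_in_ascii(s: str) -> bool:
    """Return True if every character of s exists in ASCII (hypothetical helper)."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("cafe"))  # True
print(fits_in_ascii("café"))  # False: "é" is not one of the 128 ASCII characters
```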

ISO 8859-1 Character set

ISO-8859-1 was the usual default character set for HTML 4 documents. ISO-8859-1 is an extension of the ASCII character set that represents each character with an 8-bit code. It can therefore represent 256 characters, which is enough for many Western European languages but still makes it a language-dependent encoding scheme. A character set is declared in a meta tag of the HTML document, and the following line declares ISO-8859-1 for your HTML document:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">


Note: Although HTML5 uses UTF-8 by default, an HTML5 document can also declare ISO-8859-1 by placing the following line in its head tag.

<meta charset="ISO-8859-1">
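As a rough illustration of what the 8-bit code means in practice, the following Python sketch encodes a few sample strings with Python's built-in `iso-8859-1` codec. The sample strings and the helper name `fits_in_latin1` are assumptions made for this example, not part of HTML.

```python
# ISO-8859-1 uses exactly one byte per character.
word = "café"
data = word.encode("iso-8859-1")
print(len(word), len(data))  # 4 4
print(data[-1])              # 233, the code of "é" (0xE9)

# The first 128 codes are identical to ASCII.
same_as_ascii = "HTML".encode("iso-8859-1") == "HTML".encode("ascii")
print(same_as_ascii)  # True


def fits_in_latin1(s: str) -> bool:
    """Return True if every character of s exists in ISO-8859-1 (hypothetical helper)."""
    try:
        s.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_latin1("€"))  # False: the euro sign is not among the 256 codes
print(fits_in_latin1("π"))  # False: Greek letters are not covered either
```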

UTF-8 Character Set

The Unicode character set assigns a numeric value (a code point) to every character, symbol, and punctuation mark. These values are then converted to bytes that a computer can store using UTF-8, UTF-16, or UTF-32. UTF-8 is recommended for web pages, as it covers all the characters, symbols, and punctuation marks that may be used in an HTML document.
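The difference between a Unicode value and its encoded bytes can be sketched in Python as follows. The euro sign is an arbitrary sample character, and the big-endian codec variants (`utf-16-be`, `utf-32-be`) are chosen here only so that no byte order mark is added to the output.

```python
ch = "€"

# The Unicode value (code point) of the character is a plain number.
code_point = ord(ch)
print(code_point, hex(code_point))  # 8364 0x20ac

# The same code point becomes different bytes under each encoding.
utf8 = ch.encode("utf-8")
utf16 = ch.encode("utf-16-be")
utf32 = ch.encode("utf-32-be")

print(utf8.hex())   # e282ac    (3 bytes)
print(utf16.hex())  # 20ac      (2 bytes)
print(utf32.hex())  # 000020ac  (4 bytes)
```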

The following are the salient features of UTF-8:

– It is the latest of these character sets and the default for HTML5.

– It can be used in HTML4 as well.

– UTF-8 builds on ASCII: the first 128 characters in UTF-8 are the same as in ASCII.

– It supports the symbols, characters, and punctuation marks used around the globe.

– It encodes each character, digit, symbol, or punctuation mark in one to four bytes.
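The one-to-four-byte property and the ASCII compatibility listed above can be checked with a short Python sketch. The four sample characters are arbitrary picks, one for each possible length.

```python
samples = ["A", "é", "€", "😀"]

# Number of bytes each character occupies once encoded as UTF-8.
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
for ch, n in lengths.items():
    print(ch, n, ch.encode("utf-8").hex())
# A 1 41
# é 2 c3a9
# € 3 e282ac
# 😀 4 f09f9880

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
ascii_compatible = "HTML".encode("utf-8") == "HTML".encode("ascii")
print(ascii_compatible)  # True
```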

Given the features above, UTF-8 has made the ASCII and ISO-8859-1 character sets largely obsolete for web pages.

Note: unicode.org publishes the Unicode values of all characters, symbols, emojis, and punctuation marks.

How to use UTF-8 in HTML4

HTML4 came with ISO-8859-1 as the default character set. However, you can use the UTF-8 character set in HTML4 by adding the following meta tag.

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

How to use ISO-8859-1 in HTML5

The default character set of HTML5 is UTF-8, which outperforms the older character sets. However, you can use the ISO-8859-1 character set in your HTML document by adding the following line to your head tag.

<meta charset="ISO-8859-1">
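Whichever character set is declared, it has to match the encoding in which the file was actually saved. The following Python sketch imitates what happens when they disagree. It is a simplified model of a browser decoding a page, and the sample text is made up for illustration.

```python
page_text = "café"

# Suppose the editor saved the file as UTF-8 ...
saved_bytes = page_text.encode("utf-8")

# ... but the page declares ISO-8859-1, so the bytes are decoded with the wrong table.
wrong = saved_bytes.decode("iso-8859-1")
print(wrong)  # cafÃ©

# Decoding with the matching character set restores the original text.
right = saved_bytes.decode("utf-8")
print(right)  # café
```

The two bytes that UTF-8 uses for "é" are read as two separate ISO-8859-1 characters, which is why the text appears garbled.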


Conclusion

A character set maps symbols, characters, and keystrokes to a computer-readable format. This article provides an overview of the character sets used in HTML, from earlier versions to the latest HTML5. ASCII is the oldest character set, with limited support for characters and languages (English only). Later on, ASCII was extended to 256 characters with 8-bit codes to build a new character set named ISO-8859-1. ISO-8859-1 was the default character set for HTML4 but supports only a limited number of characters (256). UTF-8 addresses the deficiencies of ISO-8859-1, and thus became the default character set in HTML5.

About the author

Adnan Shabbir