Understanding Unicode

ASCII (American Standard Code for Information Interchange) was one of the first widely used character encoding schemes. It encodes the upper- and lower-case English alphabet, digits, and US punctuation symbols as 7-bit patterns of 1s and 0s, so that a computer can store text.

This works for languages written in the English alphabet, but it made it impossible to send a message in a language that uses a different alphabet or script, such as Greek or Chinese.

To resolve this, Unicode was developed to provide a single character set that covers the languages of the world. It assigns each character a unique number (a code point), which makes it possible to access and manipulate characters by number. Before Unicode, hundreds of different encoding systems represented these characters, and there was no universal standard.
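As a small illustration of characters being addressed by number, the sketch below prints the Unicode code point of a few characters. The helper name `code_point` is made up for this example; `ord` and `chr` are Python built-ins.

```python
def code_point(ch):
    """Return the Unicode code point of a single character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


# A Latin letter, a Greek letter, and a Chinese character each have their own number.
for ch in ["A", "Ω", "中"]:
    print(ch, ord(ch), code_point(ch))

# chr() goes the other way: from the number back to the character.
print(chr(0x03A9))
```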

GSM-7 encoding vs. UCS-2 encoding and Text Messaging Character Limits

Mobile carriers use GSM-7 encoding, which is defined in the GSM 03.38 standard, when sending text messages through the network.

When sending a text message, carriers limit a single SMS to a total of 140 bytes. Because GSM-7 encodes most characters of the English alphabet in 7 bits, up to 160 characters fit in one standard text message. Some GSM characters, such as ^ and ~, are encoded as two 7-bit characters, so they take 14 bits each and count as two characters toward the limit.

However, when a message contains a non-GSM character, it is instead encoded in UCS-2. Each character then takes 2 bytes (or 4 bytes for characters such as most emoji, which are sent as a pair of 2-byte units) instead of 7 bits, so at most 70 characters fit in a 140-byte SMS message.

This applies even if only one character is a non-GSM character, since the encoding applies to the ENTIRE message and NOT to individual characters.
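The whole-message rule above can be sketched as a simple check. This is a minimal sketch, not Ytel's or any carrier's implementation: the names `GSM7_BASIC`, `GSM7_EXTENDED`, `sms_encoding`, and `gsm7_length` are invented for illustration, and the character tables are transcribed from the GSM 03.38 default alphabet and its extension table, so they should be verified against the standard before being relied on.

```python
# GSM 03.38 default alphabet (transcribed for illustration; verify against the standard).
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension-table characters: each is sent as an escape plus one character (14 bits).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def sms_encoding(text):
    """Return "GSM-7" only if every character is a GSM character, else "UCS-2"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        return "GSM-7"
    return "UCS-2"


def gsm7_length(text):
    """Count 7-bit units for a GSM-7 message; extended characters count twice."""
    return sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)


print(sms_encoding("Your order has shipped."))         # all GSM characters
print(sms_encoding("Your order has shipped \u2013 thanks"))  # one en dash forces UCS-2
print(gsm7_length("a^b~"))                             # ^ and ~ count as two each
```

A single non-GSM character anywhere in the text is enough to make `sms_encoding` report UCS-2 for the entire message, which is the behavior the paragraph above describes.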

The chart below shows how character limits are calculated:

8 bits = 1 byte
1 standard SMS message = up to 140 bytes
1 standard SMS message = up to 160 characters
8 bits * 140 bytes = up to 1,120 bits per standard SMS message
GSM encoding: 1,120 bits / 7 bits (per char) = 160 characters per SMS
UCS-2 encoding: 1,120 bits / 16 bits (per char) = 70 characters per SMS

When sending a text message to your customers, keep your message within these limits so that it is not broken up into multiple text messages.

Ytel charges for each message that is sent. For example, if your SMS contains 200 characters, it will be broken into two parts (if using only GSM characters) or three parts (if using non-GSM characters), and you will be charged for two or three text messages even though the text message was sent with 1 API call.
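The 200-character example can be worked through in code. This is a rough sketch under one assumption that the text above does not state: when a message is split, each part is assumed to reserve 6 of its 140 bytes for a concatenation header, which leaves room for 153 GSM-7 characters or 67 UCS-2 characters per part. The function name `segment_count` and the constants are illustrative only, and actual billing may differ.

```python
import math

PAYLOAD_BYTES = 140  # size of a single SMS, as described above
UDH_BYTES = 6        # ASSUMPTION: header reserved in each part of a split message
BITS_PER_UNIT = {"GSM-7": 7, "UCS-2": 16}


def segment_count(units, encoding):
    """Estimate how many SMS parts a message needs.

    units: 7-bit units for GSM-7 (extended characters count as two),
           or 16-bit units for UCS-2 (4-byte characters count as two).
    """
    bits = BITS_PER_UNIT[encoding]
    single_limit = PAYLOAD_BYTES * 8 // bits  # 1,120 bits -> 160 or 70
    if units <= single_limit:
        return 1
    multi_limit = (PAYLOAD_BYTES - UDH_BYTES) * 8 // bits  # 1,072 bits -> 153 or 67
    return math.ceil(units / multi_limit)


print(segment_count(200, "GSM-7"))  # two parts
print(segment_count(200, "UCS-2"))  # three parts
```

Under that assumption, the results match the example: a 200-character message is two parts in GSM-7 and three parts in UCS-2.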

When you use Word or Google Docs to draft your SMS, some characters that have a GSM equivalent may be replaced with non-GSM characters because of how these editors represent them (for example, curly quotation marks in place of straight ones). When that text is copied and pasted into your text message, the SMS is encoded in UCS-2, which causes it to be split into multiple messages if the character count is above 70.
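One way to guard against this is to replace typographic punctuation with plain equivalents before sending. The sketch below is only an illustration: the mapping `SMART_PUNCTUATION` and the function `to_plain_punctuation` are hypothetical names, and the list covers a handful of common substitutions rather than everything a word processor might insert.

```python
# Common word-processor substitutions mapped to GSM-friendly equivalents.
# This list is illustrative, not exhaustive.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}
_TABLE = str.maketrans(SMART_PUNCTUATION)


def to_plain_punctuation(text):
    """Replace typographic punctuation with plain ASCII equivalents."""
    return text.translate(_TABLE)


pasted = "\u201cSale ends today\u201d \u2013 don\u2019t miss out\u2026"
print(to_plain_punctuation(pasted))
```

Other non-GSM characters, such as emoji or letters from other scripts, are left untouched by this sketch and would still cause the message to be sent as UCS-2.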

Using a third-party message checker, such as smssplit, will help you avoid this common mistake.

