Mojibake
Text that is decoded with the wrong character encoding turns into mojibake (文字化け): an "unintelligible sequence of characters".
Type | Example |
---|---|
NOT mojibake | Minä tykkään Unicodesta! |
UTF-8 shown as ISO-8859-1 | MinÃ¤ tykkÃ¤Ã¤n Unicodesta! |
ISO-8859-1 shown as UTF-8 | Min� tykk��n Unicodesta! |
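
The rows of the table can be reproduced with a short Python sketch. The variable names are illustrative only, and `errors="replace"` is assumed here to mimic what a lenient viewer (a browser or text editor) does with invalid UTF-8.

```python
text = "Minä tykkään Unicodesta!"

# UTF-8 bytes misread as ISO-8859-1: each of the two bytes of "ä"
# (0xC3 0xA4) is shown as its own character, "Ã" and "¤".
utf8_as_latin1 = text.encode("utf-8").decode("iso-8859-1")

# ISO-8859-1 bytes misread as UTF-8: the single byte 0xE4 is not valid
# UTF-8 here, so a lenient decoder substitutes U+FFFD for it.
latin1_as_utf8 = text.encode("iso-8859-1").decode("utf-8", errors="replace")

print(utf8_as_latin1)
print(latin1_as_utf8)

# The first kind of damage keeps every byte, so it can be undone by
# reversing the wrong step. The second kind has lost the original bytes.
repaired = utf8_as_latin1.encode("iso-8859-1").decode("utf-8")
```

In this sketch `repaired` equals the original text again, whereas the string containing '�' cannot be restored without the original bytes.
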
- An 8-bit encoding such as ISO-8859-1 maps every byte to some character, so decoding never fails: wrong input just shows up as wrong characters
- A UTF-8 decoder substitutes '�' (U+FFFD, the replacement character) for bytes that are not valid UTF-8
- Can be caused by smart apostrophes: the right (closing) single quotation mark ’ (U+2019) used as an apostrophe, which ISO-8859-1 does not contain.
- ISO-8859-1 was the default character set in HTML 4.01. The first part of ISO-8859-1 (entity numbers 0-127) is the original ASCII character set, which contains digits, upper- and lowercase English letters, and some special characters.
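
The points in this list can be checked with a minimal Python sketch. All names are illustrative. The last step decodes with Windows-1252 (`cp1252`), which is an assumption not taken from the notes above: it is the encoding that browsers actually apply to pages labelled ISO-8859-1, and it makes the garbled apostrophe visible.

```python
# 1. ISO-8859-1 assigns a character to every byte value 0x00-0xFF,
#    so decoding arbitrary bytes never raises an error.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("iso-8859-1")

# 2. The first 128 values are identical to ASCII.
ascii_part_matches = decoded[:128] == all_bytes[:128].decode("ascii")

# 3. Strict UTF-8 decoding rejects a lone 0xE4 byte; a lenient decoder
#    replaces it with U+FFFD instead.
try:
    b"Min\xe4".decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
replaced = b"Min\xe4".decode("utf-8", errors="replace")

# 4. The smart apostrophe U+2019 has no ISO-8859-1 byte at all.
apostrophe = "\u2019"
try:
    apostrophe.encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Its three UTF-8 bytes (0xE2 0x80 0x99), read as Windows-1252,
# become three separate characters.
garbled = apostrophe.encode("utf-8").decode("cp1252")
print(garbled)
```

The final line prints the three-character sequence that often appears in place of an apostrophe on mislabelled web pages.
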