Unicode

November 2, 2023
June 5, 2015

Unicode - Wikiwand
Universal Character Set characters - Wikiwand
Code point - Wikiwand
BMP (Basic Multilingual Plane, plane 0)
SMP (Supplementary Multilingual Plane, plane 1)
Astral planes (planes 1 to 16)

𝚚𝚠𝚎𝚛𝚝𝚢.𝚍𝚎𝚟

Plain Text • Dylan Beattie • GOTO 2023 - YouTube ❗!important, 43:11, ASCII history, code page, Unicode, sorting, normalization, encoding, emoji, ligatures
Plain Text - Dylan Beattie - NDC Copenhagen 2022 - YouTube
Code page - Wikiwand
In the ASCII days, a code page defined what the upper half of an 8-bit byte represents (values 0x80 to 0xFF, which 7-bit ASCII leaves unassigned). A code page is sometimes bound to a particular use case or application.
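A small sketch of the code page idea, assuming Python 3 and its bundled `cp437`, `cp1252`, and `cp1251` codecs (the choice of these three code pages is only for illustration): the same byte above 0x7F decodes to a different character under each, while bytes in the ASCII range agree.

```python
# One byte value, three legacy code pages (all codecs ship with CPython).
raw = bytes([0xE9])

as_cp437 = raw.decode("cp437")    # original IBM PC code page
as_cp1252 = raw.decode("cp1252")  # Windows Western European
as_cp1251 = raw.decode("cp1251")  # Windows Cyrillic

print(as_cp437, as_cp1252, as_cp1251)

# Bytes in the ASCII range (0x00-0x7F) decode identically in all three.
ascii_same = {bytes([0x41]).decode(cp) for cp in ("cp437", "cp1252", "cp1251")}
print(ascii_same)
```

Decoding with the wrong code page does not raise an error here; it silently produces a different character, which is how mojibake arises.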

Characters, Symbols and the Unicode Miracle - Computerphile - YouTube
EXTRA BITS - UTF-8 'nearly' works - Computerphile - YouTube
Unicode, in friendly terms: ASCII, UTF-8, code points, character encodings, and more - YouTube
These Keys Shouldn't Exist | Nostalgia Nerd - YouTube ASCII and broken pipe character, lingering as non-ASCII (Code page 437) for IBM PCs
Plain Text - Dylan Beattie - NDC Oslo 2021 - YouTube from encoding to Unicode, composition form, normalization form, UTF8, emoji
锟斤拷 �⊠ 是怎样炼成的——中文显示「⼊」门指南【柴知道】 - YouTube (how the mojibake 锟斤拷 and � come about: a beginner's guide to Chinese text display)

Alt + code point to input a Unicode character (Windows)

Special Characters Ø, ©, ±, °… [PC] | Tim Bird

Programming with Unicode — Programming with Unicode
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software !important
What every JavaScript developer should know about Unicode
Legacy Character Models and an Introduction to Unicode - Slide list

From Python PEP-261:

**Character**

Used by itself, means the addressable units of a Python Unicode string.

**Code point**

A code point is an integer between 0 and TOPCHAR. If you imagine Unicode as a mapping from integers to characters, each integer is a code point. But the integers between 0 and TOPCHAR that do not map to characters are also code points. Some will someday be used for characters. Some are guaranteed never to be used for characters.

**Codec**

A set of functions for translating between physical encodings (e.g. on disk or coming in from a network) into logical Python objects.

**Encoding**

Mechanism for representing abstract characters in terms of physical bits and bytes. Encodings allow us to store Unicode characters on disk and transmit them over networks in a manner that is compatible with other Unicode software.

**Surrogate pair**

Two physical characters that represent a single logical character. Part of a convention for representing 32-bit code points in terms of two 16-bit code points.

**Unicode string**

A Python type representing a sequence of code points with "string semantics" (e.g. case conversions, regular expression compatibility, etc.) Constructed with the unicode() function.
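The glossary above can be tried out in a minimal sketch. It assumes Python 3, where the built-in `str` plays the role of the PEP's Unicode string (the PEP itself describes the Python 2 `unicode()` type), and it picks U+1F600 as an arbitrary code point outside the BMP. The surrogate arithmetic is the standard UTF-16 rule, written out by hand.

```python
ch = "\U0001F600"  # one code point outside the BMP (arbitrary example)

code_point = ord(ch)            # the integer the glossary calls a code point
utf8 = ch.encode("utf-8")       # encoding: code point -> physical bytes
utf16 = ch.encode("utf-16-be")  # two 16-bit code units, big-endian

# Compute the surrogate pair by hand (UTF-16 rule for code points >= 0x10000).
offset = code_point - 0x10000
high = 0xD800 + (offset >> 10)   # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate

print(hex(code_point), utf8.hex(), utf16.hex(), hex(high), hex(low))
```

In Python 3, `len(ch)` is 1 because strings are indexed by code point, while languages that index by UTF-16 code unit (JavaScript, for example) report 2 for the same character.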

&what: Discover Unicode & HTML Character Entities
Math Unicode Entities

Unify – Unicode support on browsers and devices

Ideographic Research Group (表意文字小組) - Wikiwand
CJK Unified Ideographs (中日韓統一表意文字) - Wikiwand
UAX #38: Unicode Han Database (Unihan)

Combining Marks/Normalization

Combining character - Wikiwand
Zalgo Text Generator ― LingoJam 😄funny

FAQ - Normalization
Unicode equivalence - Wikiwand
String.prototype.normalize() - JavaScript | MDN

UAX #15: Unicode Normalization Forms

Normal Form Decomposed (NFD): é (U+00E9) = e + ́ (U+0065 U+0301).

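NFC: Normalization Form Canonical Composition, generally the smallest number of code points
NFD: Normalization Form Canonical Decomposition, generally the largest number of code points
NFKC: Normalization Form Compatibility Composition, which also folds compatibility characters (e.g. the ﬁ ligature becomes fi)
NFKD: Normalization Form Compatibility Decomposition, which also folds compatibility characters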
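A short sketch of the four forms on the é example, assuming Python 3 and its standard `unicodedata` module. The ﬁ ligature (U+FB01) is an added illustration of a compatibility character, not something from the notes above.

```python
import unicodedata

composed = "\u00e9"     # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

nfc = unicodedata.normalize("NFC", decomposed)  # composes to U+00E9
nfd = unicodedata.normalize("NFD", composed)    # decomposes to U+0065 U+0301

# Compatibility forms additionally replace characters such as the fi ligature.
ligature = "\ufb01"
nfkc = unicodedata.normalize("NFKC", ligature)
nfc_ligature = unicodedata.normalize("NFC", ligature)  # canonical form keeps it

print(len(nfc), len(nfd), nfkc, nfc_ligature == ligature)
```

The two spellings of é compare unequal as raw strings, so normalizing both sides to the same form before comparing or hashing is the usual precaution.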

Unicode Normalization forms - C# - OneCompiler
dotnet_summit_by.cs

Unicode compatibility characters (Unicode 相容字元, Chinese edition) - Wikiwand
Unicode compatibility characters - Wikiwand

Variation selectors allow multiple glyphs for one code point
Variation selector (異體字選擇器) - Wikiwand
Variant form (Unicode) - Wikiwand

Encoding

UTF-8 - Wikiwand
UTF-16 - Wikiwand
Surrogates

RFC 3629 - UTF-8, a transformation format of ISO 10646

Byte order mark - Wikiwand
FAQ - UTF-8, UTF-16, UTF-32 & BOM
UTR#17: Unicode Character Encoding Model

research!rsc: UTF-8: Bits, Bytes, and Benefits
Hello World or Καλημέρα κόσμε or こんにちは 世界

Punycode Domain Name

Punycode - Wikiwand

RFC 3492 - Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)
Punycode converter (IDN converter), Punycode to Unicode 🔧
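As a rough illustration of what the converters above do, here is a sketch using Python 3's built-in `punycode` and `idna` codecs. The label `bücher` is just a sample, and the `idna` codec implements the older IDNA 2003 rules, so treat it as a demonstration rather than a production-grade converter.

```python
label = "bücher"  # sample label, chosen only for illustration

puny = label.encode("punycode")  # raw Punycode (RFC 3492), no prefix
ace = label.encode("idna")       # ASCII-compatible form with the xn-- prefix
back = puny.decode("punycode")   # round trip back to Unicode

print(puny, ace, back)
```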

Phishing with Unicode Domains - Xudong Zheng
Internationalized Domain Names (IDN) in Google Chrome

Emoji

Emoji - Wikiwand
How emoji conquered the world | The Verge
The Oral History Of The Poop Emoji (Or, How Google Brought Poop To America) | Fast Company | Business + Innovation
Emoji and the Levitating Businessman - Computerphile - YouTube

Black Woman Astronaut = Woman (U+1F469) + Dark Skin Tone (U+1F3FF) + Zero Width Joiner (U+200D) + Rocket (U+1F680)
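The sequence can be assembled by hand in a small sketch, assuming Python 3 and using U+1F680, the code point Unicode assigns to ROCKET. Whether it renders as a single glyph depends on the font and platform; the counts below do not.

```python
WOMAN = "\U0001F469"
DARK_SKIN_TONE = "\U0001F3FF"
ZWJ = "\u200D"         # ZERO WIDTH JOINER
ROCKET = "\U0001F680"

astronaut = WOMAN + DARK_SKIN_TONE + ZWJ + ROCKET

code_points = [f"U+{ord(c):04X}" for c in astronaut]
utf8_len = len(astronaut.encode("utf-8"))              # bytes in UTF-8
utf16_units = len(astronaut.encode("utf-16-le")) // 2  # 16-bit code units

print(astronaut, code_points, utf8_len, utf16_units)
```

One visible emoji is therefore four code points, seven UTF-16 code units, and fifteen UTF-8 bytes, which is why length limits and string truncation need care around emoji.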

iEmoji.com
Emoji searcher
📙 Emojipedia — 😃 Home of Emoji Meanings 💁👌🎍😍
😋 Get Emoji — List of all Emojis to ✂ Copy and 📋 Paste 👌
emojidex - custom emoji service and apps
Full Emoji List, v14.0

🎁 Emoji cheat sheet for GitHub, Basecamp, Slack & more

Intro to Emoji URLs - DEV Community

Library

muan/mojibar: Emoji searcher but as a menubar app.

Twemoji
twitter/twemoji: Emoji for everyone. https://twemoji.twitter.com/
Open sourcing Twitter emoji for everyone
JoyPixels® - Freemium emoji icons. Emoji font licensing.

NeelShah18/emot: Open source Emoticons and Emoji detection library: emot

omnidan/node-emoji: simple emoji support for node.js projects
denosaurs/emoji: 🦄 Emojis for dinosaurs

Font

android - CSS reference to phone's Emoji font? - Stack Overflow

jslegers/emoji-icon-font: An experimental icon font
Twemoji Awesome | Like Font Awesome, but for Twitter Emoji.
EmojiSymbols Font
MorbZ/OpenSansEmoji: OpenSans based font which includes the full iOS Emoji set
Google Noto Fonts - Noto Emoji
Google Noto Fonts - Noto Color Emoji

Emoji on the Web – Making Faces (and Other Emoji) – Medium

Character Table

Unicode character table
Unicode/UTF-8-character table
Unicodinator
Find all Unicode characters from Hieroglyphs to Dingbats – Codepoints
Unicode codepoint lookup/search tool
&what: Discover Unicode & HTML Character Entities
Unicode Characters ☯ ⚡ ∑ ♥ 😄
Graphemica - For people who ♥ letters, numbers, punctuation, &c
Code Charts (Unicode official one, PDFs)
List of Unicode characters - Wikiwand
Unicode Table

Typography Cheatsheet → A Comprehensive Guide to Smart Quotes, Dashes & Other Typographic Characters → Typewolf
Keycodes - Javascript Keyboard Codes, Character Codes, Unicode, HTML Entities
HTML Symbols – HTML Icon and Entity Code List

Shapecatcher: Draw the Unicode character you want!

Guobiao

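Guobiao code (國家標準代碼) - Wikiwand
Guobiao code lookup; Chinese national standard encodings for Hanzi: GB2312, GBK, GB18030 (国标码查询)

In GB2312 (as commonly encoded, EUC-CN), each Chinese character takes 2 bytes, and both bytes have the leading bit set to 1. ASCII characters stay single-byte with the leading bit 0.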
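A quick check of the byte layout, assuming Python 3 and its built-in `gb2312` codec. The sample string is arbitrary, and the sketch covers only GB2312, not the extended ranges of GBK or the 4-byte forms of GB18030.

```python
text = "中文A"  # two Hanzi plus one ASCII letter (arbitrary sample)

encoded = text.encode("gb2312")

# True where the leading (high) bit of the byte is 1.
high_bits = [b >= 0x80 for b in encoded]

print(encoded.hex(" "), high_bits)
```

Because ASCII bytes never have the leading bit set, a decoder can tell from the first byte whether it is looking at a single-byte or a double-byte character.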

Sorting

UTS #10: Unicode Collation Algorithm sorting

Why is the dictionary order of the Chinese numerals 一二三四五六七八九十 different from their numeric order, coming out as 一七三九二五八六十四 instead? - Zhihu (知乎)

Hanzi and their Unicode code points (汉字UTF编码)
一 0x4e00
二 0x4e8c
三 0x4e09
四 0x56db
五 0x4e94
六 0x516d
七 0x4e03
八 0x516b
九 0x4e5d
十 0x5341
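The ordering in the question falls out of a plain code point sort, as this sketch shows (assuming Python 3, whose default string comparison is by code point). A locale-aware order would need an implementation of the Unicode Collation Algorithm, which is not shown here.

```python
numerals = "一二三四五六七八九十"

# Default sort compares code points, not numeric values.
by_code_point = "".join(sorted(numerals))

# Pair each character with its code point to see why.
table = [(c, hex(ord(c))) for c in sorted(numerals)]

print(by_code_point)
print(table)
```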