That was an interesting article, although I boggled at "the English language letter (the meaning)" in the second paragraph of section 1. Letters aren't meanings. Letters don't even have meanings. Words have meanings, often. (Not always; consider metasyntactic variables such as "foo.")
I'm surprised, in these days when memory is basically free, that they don't just use UTF-32 and forget about planes altogether. Although that wouldn't eliminate the need to handle combining forms when testing for equality. (And not just combining forms; glyphs with MATHEMATICAL in their names are often visually indistinguishable from the corresponding characters without the MATHEMATICAL. And there's been a smiley face (U+263A) in Unicode essentially from the beginning, long before emoji were a category, because it was in IBM's 8-bit extension of ASCII, the code page DOS used.)
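For the combining-forms case, here's a minimal sketch in JavaScript (the language the .normalize mentioned below comes from); the variable names are just mine:

    // Two encodings of "é": precomposed vs. "e" plus a combining accent.
    const composed   = "caf\u00E9";   // é as a single code point
    const decomposed = "cafe\u0301";  // e followed by U+0301 COMBINING ACUTE ACCENT

    console.log(composed === decomposed);              // false: different code point sequences
    console.log(composed.normalize("NFC") ===
                decomposed.normalize("NFC"));          // true: canonically equivalent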
Really there are several different kinds of equality testing. The article talks about the two ways to encode é in Unicode, but for some purposes, such as dictionary sorting, "café" and "cafe" should be considered equal, so that "caféx" sorts before "cafey." (Okay, here's a realistic example: "café" comes before "cafeteria" in the dictionary.) That's an issue that transcends the details of Unicode. The German "ß" is, for some purposes, equal to "ss" even though they don't look the same or have the same string length. (I think that one is handled by case folding rather than .normalize, but I wouldn't swear to it.) Similarly, there's the ongoing debate about whether lower case letters and upper case letters should be considered equal. (Hint: yes.)
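A hedged sketch of those distinctions, again in JavaScript; the "en" locale and "base" sensitivity are just the options I picked, not the only reasonable ones:

    // Dictionary-style comparison: Intl.Collator treats accents and case as
    // secondary/tertiary differences, so they can be ignored for matching.
    const base = new Intl.Collator("en", { sensitivity: "base" });
    console.log(base.compare("café", "cafeteria") < 0);  // true: "café" sorts before "cafeteria"
    console.log(base.compare("café", "cafe"));           // 0: accent ignored at "base" sensitivity
    console.log(base.compare("A", "a"));                 // 0: case ignored too

    // "ß" vs. "ss" is a case-mapping matter, not a normalization one:
    console.log("straße".toUpperCase());                 // "STRASSE"
    console.log("ß".normalize("NFKC"));                  // still "ß": normalize() leaves it alone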
And then there are the Delphic Unicode Consortium pronouncements. For example, they're very insistent that font styles such as italic and boldface aren't separate glyphs, except in MATHEMATICAL-land, where they are. And Elvish and Klingon characters aren't characters at all, but they have a non-Consortium semi-official position in the private use area. But emoji are characters. :~( There's a SUPERSCRIPT LATIN SMALL LETTER N (ⁿ), a superscript i, and superscript digits, but no other letters, not even a superscript k, which is almost as common in math as superscript n. And ligatures (ff etc.), another font variation, are Unicode characters.
There's a LATIN SMALL LETTER CHI (ꭓ, U+AB53) and a GREEK SMALL LETTER CHI (χ, U+03C7). The Latin one is way far away from all the other Latin letters, even the weird ones such as ƻ (LATIN LETTER TWO WITH STROKE, U+01BB, the only digit with a stroke). Who uses chi in Latin or Latin-derived languages?
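A quick sanity check of that pair (any modern JavaScript engine should do; the variable names are mine):

    // Visually near-identical, but distinct code points that no normalization form unifies.
    const latinChi = "\uAB53"; // ꭓ LATIN SMALL LETTER CHI
    const greekChi = "\u03C7"; // χ GREEK SMALL LETTER CHI
    console.log(latinChi === greekChi);                                      // false
    console.log(latinChi.normalize("NFKC") === greekChi.normalize("NFKC"));  // false: still different letters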
tl;dr: "looks the same" and "linguistically the same" are intersecting sets, but neither is a subset of the other.
P.S.: By the way, :~(