Sam Ruby is talking about problems with textual data on the web, specifically in the context of RSS, that come from bad assumptions about character encoding. As someone who once did a lot of work in localization, it's a subject near and dear to my heart.
I'm always amazed that, still to this day, people don't get that there is such a thing as character encoding.
Sam points out that an upper case A and the Greek Alpha show up in most fonts as the same glyph. However, they are different code points in Unicode.
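To make the glyph vs. code point distinction concrete, here is a minimal Python sketch (mine, not from Sam's slides) that looks up both characters using only the standard library. The variable names are just for illustration.

```python
import unicodedata

latin_a = "A"            # Latin capital letter A
greek_alpha = "\u0391"   # Greek capital letter Alpha, usually drawn with the same glyph

for ch in (latin_a, greek_alpha):
    # Show the code point, the official Unicode name, and the UTF-8 bytes.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  utf-8={ch.encode('utf-8').hex()}")

# They may look identical on screen, but they do not compare equal.
print(latin_a == greek_alpha)
```

The Latin letter is a single byte in UTF-8 while the Greek one takes two, so any tool that guesses the wrong encoding will read them very differently.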
He's moving this idea up the stack to show why there are so many conflicts between HTML/XML/RSS/etc. The rules for character encoding are different in all those systems and are enforced differently by different tools, which is what causes so many RSS feeds to be badly formed.
He started the presentation talking about how XML is an "attractive nuisance" with regard to the encoding issue, in that it leads people down the primrose path to thinking all their encoding issues are solved just because XML is supposed to take care of encoding.
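A rough illustration of that trap, assuming a feed whose bytes were written as ISO-8859-1 but whose XML declaration may or may not say so. The `parse_title` helper and the sample title are hypothetical, and I'm using Python's standard `xml.etree.ElementTree` parser rather than anything Sam showed.

```python
import xml.etree.ElementTree as ET

# A title containing one non-ASCII character, written out as ISO-8859-1 bytes.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")   # b'caf\xe9'

def parse_title(declared_encoding: str, body: bytes) -> str:
    """Wrap the raw bytes in a tiny XML document and return the parsed text."""
    declaration = f'<?xml version="1.0" encoding="{declared_encoding}"?>'
    doc = declaration.encode("ascii") + b"<title>" + body + b"</title>"
    return ET.fromstring(doc).text

# The declaration matches the bytes, so the parser decodes them correctly.
print(parse_title("iso-8859-1", latin1_bytes))

# The same bytes labeled as UTF-8 are not valid UTF-8, so the document
# is not well formed and the parser rejects it.
try:
    parse_title("utf-8", latin1_bytes)
except ET.ParseError as err:
    print("not well-formed:", err)
```

XML only takes care of encoding if the declaration is honest about the bytes that follow it; the parser can't fix a mismatch, it can only refuse it.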
All in all, the issues Sam is talking about are pretty obscure, and appeal mostly to XML wonks, but that doesn't make them any less valid. The reality is we've all learned to deal with it most of the time, just like we're used to IE fixing up our bad HTML tables.
The opinions expressed herein are my own personal opinions and do not represent
my employer's view in any way.
© Copyright 2013, Patrick Cauldwell