ICANN is meeting in Korea this week to discuss several domain management issues, including post-expiration domain name recovery, registration abuse policies, new gTLDs and IDN ccTLDs. While all of this is interesting, I started to wonder how many web users who speak only English are even aware of that last issue. Did you ever consider that while the Internet is dominated by English-focused websites, 60% of its users are non-English speakers? How many of you were aware that a URL could even be written in Chinese?

IDNs are internationalized domain names: domain names written in local language characters rather than being limited to Latin or ASCII-based script. Internationalized second-level domains such as “日本語ドメイン.com” have been available for some time, but support currently stops at the second level and below, leaving familiar ASCII TLDs (top-level domains) like “.com” attached as a foreign-language appendix. What we are likely to see very soon, thanks to the ICANN discussions, is domains constructed entirely in a single language.

ICANN has set up a test page at idn.icann.org. Here you can see the same example.test domain in Arabic, Greek, Cyrillic, and Hebrew.

So what does this mean for DNS, which performs its Q&A based on ASCII? For DNS to understand and interpret these IDNs, the Unicode domain string is encoded using Punycode, which transforms it into ASCII so that it can resolve properly. A full explanation of the Punycode bootstring algorithm can be found here.
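
To get a feel for what the encoding does to a single label, here is a minimal sketch using the "punycode" codec that ships with Python's standard library. The helper names punycode_label and from_punycode are my own, made up for illustration. The codec produces only the raw Punycode string, without the prefix that DNS adds on top.

```python
def punycode_label(label: str) -> str:
    """Encode one Unicode label as raw Punycode (no DNS prefix added)."""
    return label.encode("punycode").decode("ascii")


def from_punycode(encoded: str) -> str:
    """Decode a raw Punycode string back into its Unicode label."""
    return encoded.encode("ascii").decode("punycode")


if __name__ == "__main__":
    # ASCII characters are copied through first, then a hyphen, then the
    # encoded positions of the non-ASCII characters.
    print(punycode_label("bücher"))   # bcher-kva
    # A label with no ASCII characters has nothing to copy through.
    print(punycode_label("пример"))   # e1afmkfd
    print(from_punycode("e1afmkfd"))  # пример
```

Notice that the mixed label keeps its ASCII letters readable, while the all-Cyrillic label turns into an opaque string.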

Every domain has a label assigned to it. For Latin-based domain names the label stored in DNS is usually the same as the displayed label, but with IDNs and Punycode the two differ significantly. The displayed label is called a U-label (for Unicode) and its stored version is an A-label (for ASCII). The result, which most consumers will never notice, is that ‘example.test’ can be displayed in Cyrillic as ‘пример.испытание’ but stored as ‘xn--e1afmkfd.xn--80akhbyknj4f’ (example from ICANN). The Punycode version of every IDN label begins with “xn--”.
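
If you want to see the U-label and A-label forms side by side, the following is a rough sketch built on Python's built-in "idna" codec. The function names to_a_labels, to_u_labels and is_a_label are hypothetical helpers of mine, and the built-in codec follows the older IDNA 2003 rules, so treat this as an illustration rather than a production-grade converter.

```python
ACE_PREFIX = "xn--"  # prefix carried by every Punycode-encoded label


def to_a_labels(domain: str) -> str:
    """Convert a displayed (Unicode) domain into the ASCII form stored in DNS."""
    return domain.encode("idna").decode("ascii")


def to_u_labels(domain: str) -> str:
    """Convert a stored (ASCII) domain back into its displayed Unicode form."""
    return domain.encode("ascii").decode("idna")


def is_a_label(label: str) -> bool:
    """Rough check: does this single label carry the Punycode prefix?"""
    return label.lower().startswith(ACE_PREFIX)


if __name__ == "__main__":
    stored = to_a_labels("пример.испытание")
    print(stored)                # xn--e1afmkfd.xn--80akhbyknj4f
    print(to_u_labels(stored))   # пример.испытание
    # Plain ASCII domains pass through unchanged.
    print(to_a_labels("example.test"))  # example.test
    print([is_a_label(part) for part in stored.split(".")])  # [True, True]
```

The conversion works label by label, which is why each part of the domain gets its own “xn--” prefix.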

It’s great that ICANN is moving toward a more internationally conscious and applicable Internet, but the move seems long overdue. How much has an English-dominated Internet kept the rest of the world out of the loop? A couple of examples from ICANN documents describe everyday situations many of us take for granted. If I read a billboard or advertisement that has an accompanying web address, I go there for more information. But what if that URL were in Chinese or Hindi? I wouldn’t be able to remember the address, let alone reproduce it on my keyboard. I would of course prefer to have the web address in the same language as everything else I’m reading. This change will be especially advantageous for scripts such as Arabic that read from right to left. You can imagine how confusing it currently is to convey a URL properly to international consumers.

There are three programs for obtaining fully native-language IDNs. The proposed launch date for the IDN ccTLD Fast Track Process is November 16, 2009.

For some application and browser IDN handling issues, check out IDNnews.com.

Matt Sully
Threat Research & Analysis