
CVS libidn/doc/specifications


From: libidn-commit
Subject: CVS libidn/doc/specifications
Date: Wed, 18 May 2005 23:34:19 +0200

Update of /home/cvs/libidn/doc/specifications
In directory dopio:/tmp/cvs-serv9078

Added Files:
        draft-klensin-reg-guidelines-08.txt 
Log Message:
Add.


--- /home/cvs/libidn/doc/specifications/draft-klensin-reg-guidelines-08.txt     2005/05/18 21:34:19     NONE
+++ /home/cvs/libidn/doc/specifications/draft-klensin-reg-guidelines-08.txt     2005/05/18 21:34:19     1.1




Network Working Group                                         J. Klensin
Internet-Draft                                              May 17, 2005
Expires: November 18, 2005


 Suggested Practices for Registration of Internationalized Domain Names
                                 (IDN)
                  draft-klensin-reg-guidelines-08.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 18, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).

Abstract

   This document explores the issues in registration of
   internationalized domain names (IDNs).  The basic IDN definition
   potentially allows a very large number of possible characters in
   domain names, and this richness may lead to serious user confusion
   about similar-looking names.  To avoid this confusion, it is
   necessary for the IDN registration process to impose rules that
   disallow some otherwise-valid name combinations.  This document
   suggests a set of mechanisms that registries might use to define and



Klensin                 Expires November 18, 2005               [Page 1]

Internet-Draft              IDN Registration                    May 2005


   implement such rules, including adaptation of methods developed for
   Chinese, Japanese, and Korean domain names to other languages and
   scripts.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1   Background . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.2   The Nature and Status of these Recommendations . . . . . .  4
     1.3   Terminology  . . . . . . . . . . . . . . . . . . . . . . .  5
       1.3.1   Languages and Scripts  . . . . . . . . . . . . . . . .  5
       1.3.2   Characters, Variants, Registrations, and Other
               Issues . . . . . . . . . . . . . . . . . . . . . . . .  6
       1.3.3   Confusion, Fraud, and Cybersquatting . . . . . . . . .  8
     1.4   A Review of the JET Guidelines . . . . . . . . . . . . . .  8
       1.4.1   JET Model  . . . . . . . . . . . . . . . . . . . . . .  8
       1.4.2   Reserved Names and Label Packages  . . . . . . . . . .  9
     1.5   Languages, Scripts, and Variants . . . . . . . . . . . . .  9
       1.5.1   Languages and Scripts  . . . . . . . . . . . . . . . .  9
       1.5.2   Variant Selection  . . . . . . . . . . . . . . . . . . 11
     1.6   Variants are not a Universal Remedy  . . . . . . . . . . . 13
     1.7   Reservations and Exclusions  . . . . . . . . . . . . . . . 13
       1.7.1   Sequence Exclusions for Valid Characters . . . . . . . 13
       1.7.2   Character Pairing Issues . . . . . . . . . . . . . . . 13
     1.8   The Registration Bundle  . . . . . . . . . . . . . . . . . 14
       1.8.1   Definitions and Structure  . . . . . . . . . . . . . . 14
       1.8.2   Application of the Registration Bundle . . . . . . . . 14
   2.  Some Implications of This Approach . . . . . . . . . . . . . . 15
   3.  Required Modifications to JET Model Needed Under Some of
       the Models Above . . . . . . . . . . . . . . . . . . . . . . . 16
   4.  Conclusions and Recommendations About the General Approach . . 17
   5.  A Model Table Format . . . . . . . . . . . . . . . . . . . . . 18
   6.  A Model Label Registration Procedure: "CreateBundle" . . . . . 19
     6.1   Description of the CreateBundle Mechanism  . . . . . . . . 19
     6.2   The "no-variants" Case . . . . . . . . . . . . . . . . . . 20
     6.3   CreateBundle and Nameprep Mapping  . . . . . . . . . . . . 21
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
   8.  Internationalization Considerations  . . . . . . . . . . . . . 22
   9.  Security Considerations  . . . . . . . . . . . . . . . . . . . 23
   10.   Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 23
   11.   References . . . . . . . . . . . . . . . . . . . . . . . . . 24
       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 25
       Intellectual Property and Copyright Statements . . . . . . . . 26










1.  Introduction

1.1  Background

   The IDNA (Internationalized Domain Names in Applications)
   specification [RFC3490] defines the basic model for encoding non-
   ASCII strings in the DNS, and additional specifications ([RFC3491],
   [RFC3492]) define the mechanisms and tables needed to support it.  As
   work on these specifications neared completion, it became apparent
   that it would be desirable for registries to impose additional
   restrictions on the names that could actually be registered (e.g.,
   see [IESG-IDN] and [ICANN-IDN]) as a means of reducing potential
   confusion among characters that were similar in some way.  This
   document explores these IDN (internationalized domain name)
   registration issues and suggests a set of mechanisms that IDN
   registries might use.

   Registration restrictions are part of a long tradition.  For
   example, while the original DNS specifications [RFC1035] permitted
   any string of octets to be used in a DNS label, they also recommended
   the use of a much more restricted subset, one that was derived from
   the much older "hostname" rules [RFC0952] and defined by the "LDH"
   convention (for the three permitted types of characters: letters,
   digits, and the hyphen).  Enforcement of those restricted rules in
   registrations was the responsibility of the registry or domain
   administrator.  They were not embedded in the DNS protocol itself,
   although some application protocols, notably those concerned with
   electronic mail, did impose and then enforce similar rules.
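
   The LDH convention described above can be illustrated with a short
   check.  This is only a sketch: the function names are hypothetical,
   the 63-octet label limit comes from [RFC1035], and allowing a digit
   as the first character is an assumption that follows later hostname
   practice (RFC 1123) rather than the original [RFC0952] rule.

```python
import re

# Hypothetical helper, not taken from any RFC or library.
# One label: starts and ends with a letter or digit, may contain hyphens
# in between, and is at most 63 characters long (1 + up to 61 + 1).
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if `label` follows the LDH convention."""
    return _LDH_LABEL.fullmatch(label) is not None


def is_ldh_name(name: str) -> bool:
    """Apply the per-label check to every label of a dotted name."""
    labels = name.rstrip(".").split(".")
    return all(is_ldh_label(label) for label in labels)
```

   Under these assumptions, is_ldh_name("example.com") holds, while a
   label containing a non-ASCII character such as U+00FC is rejected.
   As the text notes, a check of this kind belongs to the registry or
   domain administrator, not to the DNS protocol.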

   If there are no constraints on registration in a zone, people can
   register names whose characters raise the risk of misunderstandings,
   cybersquatting, and other forms of confusion.  That a similar
   situation existed even before the introduction of IDNA is exemplified
   by domain names such as example.com and examp1e.com (note that the
   latter domain contains the digit "1" instead of the letter "l").
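
   The example.com and examp1e.com case can be made concrete with a
   small sketch.  The look-alike table and function names below are
   hypothetical and deliberately tiny; a real registry would define
   its own table for its own zone.

```python
# Hypothetical look-alike table for illustration only: each listed
# character is replaced by one representative character.
LOOKALIKES = {"1": "l", "0": "o"}


def skeleton(label: str) -> str:
    """Fold case, then replace each listed look-alike character."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


def may_be_confused(a: str, b: str) -> bool:
    """True if two distinct labels collapse to the same folded form."""
    return a.lower() != b.lower() and skeleton(a) == skeleton(b)
```

   With this table, may_be_confused("example", "examp1e") is true,
   because both labels fold to the same string.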

   For non-ASCII names (so-called "internationalized domain names" or
   "IDNs"), the problem was more complicated than that which led to the
   LDH (hostname) rules.  In the earlier situation, all protocols,
   hosts, and DNS zones used ASCII exclusively in practice, so the LDH
   restriction could reasonably be applied uniformly across the
   Internet.  With the introduction of a very large character
   repertoire, and with different geographical and political locations
   and languages having requirements for different collections of
   characters, the optimal registration restrictions became not a
   global matter but ones that differed from area to area and, hence,
   from one DNS zone to another.

   For some human languages, there are characters and/or strings that
   have equivalent or near-equivalent usages.  If someone is allowed to
   register a name with such a character or string, the registry might
   want to automatically associate all of the names that have the same
   meaning with the registered name.  The registry might also decide
   whether the names that are associated with, or generated by, one
   registration should, as a group or individually, go into the zone or
   be blocked from registration by different parties.

   To date, the best-developed system for handling registration
   restrictions for IDNs is the JET Guidelines for Chinese, Japanese,
   and Korean [RFC3743], the so-called "CJK" languages.  That system is
   limited to those languages and, in particular, to their common script
   base.  Those languages are also the best-known and most widely-used
   ones in the world whose writing system is constructed on
   "ideographic" or "pictographic" principles.  This document explores
   the principles behind the JET guidelines.  It then examines some of
   the issues that might arise in trying to adapt them to alphabetic
   languages, i.e., ones whose characters primarily represent sounds,
   rather than meanings.

   This document describes five things:

   1.  The general background and considerations for non-ASCII scripts
       in names.  Just as the JET Guidelines contain some suggestions
       that may not be applicable to alphabetic scripts, some of the
       suggestions here, especially the more specific ones, may be
       applicable to some scripts and not others.

   2.  Suggested practices for describing character variants.

   3.  A method for using a zone's character variants to determine which
       names should be associated with a registration.

   4.  A format for publishing a zone's table of character variants.
       Such tables are referred to below simply as "the table".

   5.  A model algorithm for name registration given the presence of
       language tables.

1.2  The Nature and Status of these Recommendations

   The document makes recommendations for consideration by registries
   and, where relevant, those who coordinate them and use their
   services.  None of the recommendations are intended to be normative.
   Indeed, the intent of the document is to illustrate a framework from
   which variations to meet the needs of particular registries and their
   processing of particular languages can be developed.  Of course, if
   registries make similar decisions and utilize similar tools, it may
   reduce costs and confusion -- both between registries and for users
   and registrars who have relationships with more than one domain.

1.3  Terminology

1.3.1  Languages and Scripts

   This document uses the term "language" in what may be, to many
   readers, an odd way.  Neither this specification, nor IDNA, nor the
   DNS is directly concerned with natural language, but only with the
   characters that make up a given label.  In some respects, the term
   "script", as used in the character coding community, might be more
   appropriate.  However, different subsets of the same script may be
   used with different languages and the same language may be written
   using different characters (or even completely different scripts) in
   different locations, so that term is not precisely correct either.
   Long-standing confusion has also resulted from the fact that most
   scripts are, informally at least, named after one of the languages
   written in them: "Chinese" describes both a language and a collection
   of characters also used in writing Japanese, Korean, and, at least
   historically, some other languages; "Latin" describes a language,
   the characters used to write that language, and, often, the
   characters used to write a number of contemporary languages that are
   derived from or similar to those used to write Latin; the script used
   to write the Arabic language is called "Arabic" but is also used
   (typically with some additions or deletions) to write a number of
   other languages, and so on.  Situations in which a script has a
   clearly-defined name independent of the name of a language are the
   exception, rather than the rule; examples include Hangul, used to
   write Korean, Katakana and Hiragana, used to write Japanese, and a
   few others.  And some scholars have historically used "Roman" or
   "Roman-derived" in an attempt to distinguish between a script and the
   Latin language.

   The term "language" is hence used in this document in the informal
   sense of a written language and is defined, for this purpose, by the
   characters used to write it.  In this context, a "language" is
   defined by the combination of a code (see Section 1.4.1) and an
   authority that has chosen to use that code and establish a character-
   listing for it.  Authorities are normally TLD registries (see
   Section 7 and [IANA-language-registry]), but it is expected that they
   will find appropriate experts and that advice from language and
   script experts selected by international neutral bodies will also
   become part of the registration system.  In addition, as discussed
   below in Section 7, registries may conclude that the best interests
   of registrants, stakeholders, and the Internet community would be
   served by constructing "language tables" that mix scripts and
   characters in ways that conform to no known language.  Conventions
   should be developed for such registrations that do not misleadingly
   reflect specific language codes.

1.3.2  Characters, Variants, Registrations, and Other Issues

   1.  Characters in this document are given as their Unicode codepoints
       in U+xxxx format, with their official names, or both.

   2.  The following terms are used in this document.

       *  A "string" is a sequence of one or more characters.

       *  This document discusses characters that may have equivalent or
          near-equivalent characters or strings.  The "base character"
          is the character that has zero or more equivalents.  In the
          JET Guidelines, base characters are referred to as "valid
          characters".  In a table with variants, as described in
          Section 5, the base characters occupy the first column.
          Normally (and always if the recommendation of Section 6.3 is
          adopted) the base characters will be the characters that
          appear in registration requests from registrants; all other
          characters will be considered to make the registration attempt
          invalid.

       *  The "variant(s)" are the character(s) and/or string(s) that
          are treated as equivalent to the base character.  Note that
          these might not be true equivalent characters: a particular
          original character may be a base character with a mapping to a
          particular variant character, but that variant character may
          not have a mapping to the original base character and, indeed,
          the variant character may not appear in the base character
          list, and hence may not be valid for use in a registration.
          Usually, characters or strings to be designated as variants
          are considered either equivalent or sufficiently similar (by
          some registry-specific definition) that confusion between them
          and the base character might occur.

       *  The "base registration" is the single name that the registrant
          requested from the registry.  The JET Guidelines use the term
          "label string" for this name.

       *  A label (or "name") is described as "registered" if it is
          actually entered into a domain (i.e., a zone file) by the
          registry, so that it can be accessed and resolved using
          standard DNS tools.  The JET Guidelines describe a
          "registered" label as "activated".  However, some domains use
          a slightly different registration logic in which a name can be
          registered with the registrar, if one is involved, and with
          the registry but not actually entered into the zone file until
          an additional activation or delegation step occurs.  This
          document does not make that distinction, but is compatible
          with it.

          As specified in the IDNA Standard, the name actually placed in
          the zone file is always the internal ("punycode") form.  There
          is no provision for actually entering any other form of an IDN
          into the DNS.  It remains controversial, with different
          registrars and registries having adopted different policies,
          as to whether the registration, as submitted by the
          registrant, is in the form of:

          +  The native-script name, either in UTF-8 or in some coding
             specified by the registrar.

          +  The internal-form ("punycode") name.

          +  Both forms of the name together, so that the registrar and
             registry can verify the intended translation.

          If some variant system is used, it is almost certain to be
          necessary that the native-script form of the requested name be
          available to the registry.

       *  A "registration bundle" is the set of all labels that comes
          from expanding the base characters for a single name into
          their variants.  The presence of a label in a registration
          bundle does not imply that it is registered.  In the JET
          Guidelines, a registration bundle is called an "IDN Package".

       *  A "reserved label" is a label in a registration bundle that is
          not actually registered.

       *  A "registry" is the administrative authority for a DNS zone.
          That is, the registry is the body that enforces, and typically
          makes, policies that are used in a particular zone in the DNS.

       *  "Coded Character Set" ("CCS") is a term for a list of
          characters and the code positions assigned to them.  ASCII and
          Unicode are CCSs.

       *  A "language" is something spoken by humans, independent of how
          it is written or coded.  ISO Standard 639 and IETF BCP 47 (RFC
          3066) [RFC3066] list and define codes for identifying
          languages.

       *  A "script" is a collection of characters (glyphs, independent
          of coding) that are used together, typically to represent one
          or more languages.  Note that the script for one language may
          heavily overlap the script for another.  This does not imply
          that they have identical scripts.






       *  "Charset" is an IETF-invented term to describe, more or less,
          the combination of a script, a CCS that encodes that script,
          and rules for serializing the bytes when those are stored on a
          computer or transmitted over the network.
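
   The terms "base character", "variant", "registration bundle", and
   "reserved label" defined above can be tied together with a small
   sketch.  Everything in it is illustrative: the variant table is
   hypothetical and is not proposed by this document, the function
   names are invented here, and the internal ("punycode") form is
   produced with Python's built-in "idna" codec, which implements the
   [RFC3490] conversion.

```python
from itertools import product

# Hypothetical variant table for illustration only.  Keys are the base
# characters; values list the variants (characters or strings) that the
# registry treats as equivalent to that base character.
VARIANT_TABLE = {ch: [] for ch in "abcdefghijklmnopqrstuvwxyz0123456789-"}
VARIANT_TABLE.update({"\u00e4": ["ae"], "\u00f6": ["oe"], "\u00fc": ["ue"]})


def registration_bundle(base_registration: str) -> set:
    """Expand every base character into itself plus its variants."""
    choices = []
    for ch in base_registration:
        if ch not in VARIANT_TABLE:
            # A character outside the base list invalidates the attempt.
            raise ValueError(f"U+{ord(ch):04X} is not a base character")
        choices.append([ch] + VARIANT_TABLE[ch])
    # One label per combination of choices, collected without duplicates.
    return {"".join(parts) for parts in product(*choices)}


def zone_form(label: str) -> str:
    """Return the internal ("punycode") form that would go into the zone."""
    return label.encode("idna").decode("ascii")


base = "b\u00fccher"
bundle = registration_bundle(base)
# Labels in the bundle other than the base registration are candidates
# for being reserved rather than registered.
reserved_candidates = bundle - {base}
```

   With this table, the bundle for the base registration U+0062 U+00FC
   U+0063 U+0068 U+0065 U+0072 contains that label and "buecher", and
   zone_form() of the base registration is "xn--bcher-kva".  Whether
   the other label is registered, reserved, or left available is a
   registry policy decision that the sketch does not make.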

[1058 lines skipped]



