CVS libidn/doc/specifications


From: libidn-commit
Subject: CVS libidn/doc/specifications
Date: Mon, 08 Nov 2004 12:21:21 +0100

Update of /home/cvs/libidn/doc/specifications
In directory dopio:/tmp/cvs-serv2262

Added Files:
        draft-klensin-reg-guidelines-04.txt 
        draft-xdlee-idn-cdnadmin-02.txt 
Log Message:
Add.


--- /home/cvs/libidn/doc/specifications/draft-klensin-reg-guidelines-04.txt     2004/11/08 11:21:21     NONE
+++ /home/cvs/libidn/doc/specifications/draft-klensin-reg-guidelines-04.txt     2004/11/08 11:21:21     1.1
Network Working Group                                         J. Klensin
Internet-Draft                                              July 6, 2004
Expires: January 4, 2005



  Registration of Internationalized Domain Names: Overview and Method
                  draft-klensin-reg-guidelines-04.txt


Status of this Memo


   By submitting this Internet-Draft, I certify that any applicable
   patent or other IPR claims of which I am aware have been disclosed,
   and any of which I become aware will be disclosed, in accordance with
   RFC 3668.


   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as
   Internet-Drafts.


   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."


   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.


   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.


   This Internet-Draft will expire on January 4, 2005.


Copyright Notice


   Copyright (C) The Internet Society (2004).  All Rights Reserved.


Abstract


   IETF has introduced standards-track mechanisms to enable the use of
   "internationalized", i.e., non-ASCII, names in the DNS and
   applications that use it.  This has led, in turn, to concerns that
   characters with similar meanings or appearances could cause user
   confusion and opportunities for deliberate deception and fraud.  Part
   of this problem can be addressed by limiting, on a per-zone (or
   per-registry) basis, the specific characters that can be used to be a
   subset of the list allowed by the standard and by creating
   "reservations" of labels that might create confusion with those that
   are permitted.  The model for doing this for languages that use




Klensin                 Expires January 4, 2005                 [Page 1]
Internet-Draft              IDN Registration                   July 2004



   characters that originated with Chinese has been extensively
   developed in another document.  This document discusses some of the
   issues in that design and relates them to considerations and
   mechanisms that might be appropriate for other languages and scripts,
   especially those involving alphabetic characters.


   In particular, it describes some suggested practices for registering
   internationalized domain names (IDNs) in a zone.  Before accepting
   such registrations of domain names, the zone's registry should decide
   which codepoints in the Unicode character set the zone will accept.
   The registry should also decide whether particular characters in a
   registered domain name should cause action with regard to other
   domain names which are considered equivalent; these domain names
   might be added to the zone or blocked from registration.  This
   document also describes the concept of character variants for
   registering IDNs, how they might be handled in the registration
   process, and how to publish tables that list the character variants.


   This document is intended to supply a basis for adapting methods
   developed for Chinese, Japanese, and Korean to other languages and
   scripts.  If these adaptations are made carefully and with due
   consideration for local issues, the likelihood of problematic DNS
   registrations will be significantly reduced.  A specific method is
   introduced that should be applicable, directly or with minor
   modifications, to many scripts.






























Table of Contents


   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.1   Background . . . . . . . . . . . . . . . . . . . . . . . .  4
     1.2   The Nature and Status of these Recommendations . . . . . .  5
     1.3   Terminology  . . . . . . . . . . . . . . . . . . . . . . .  5
       1.3.1   Languages and Scripts  . . . . . . . . . . . . . . . .  5
       1.3.2   Characters, Variants, Registrations, and Other
               Issues . . . . . . . . . . . . . . . . . . . . . . . .  6
       1.3.3   Confusion, Fraud, and Cybersquatting . . . . . . . . .  8
     1.4   A Review of the JET Guidelines . . . . . . . . . . . . . .  8
       1.4.1   JET Model  . . . . . . . . . . . . . . . . . . . . . .  8
       1.4.2   Reserved Names and Label Packages  . . . . . . . . . .  9
     1.5   Languages, Scripts, and Variants . . . . . . . . . . . . . 10
       1.5.1   Languages and Scripts  . . . . . . . . . . . . . . . . 10
       1.5.2   Variant Selection  . . . . . . . . . . . . . . . . . . 11
     1.6   Variants are not a Universal Remedy  . . . . . . . . . . . 13
     1.7   Reservations and Exclusions  . . . . . . . . . . . . . . . 13
       1.7.1   Sequence Exclusions for Valid Characters . . . . . . . 13
       1.7.2   Character Pairing Issues . . . . . . . . . . . . . . . 13
     1.8   The Registration Bundle  . . . . . . . . . . . . . . . . . 14
       1.8.1   Definitions and Structure  . . . . . . . . . . . . . . 14
       1.8.2   Application of the Registration Bundle . . . . . . . . 14
   2.  Some Implications of This Approach . . . . . . . . . . . . . . 15
   3.  Required Modifications to JET Model Needed Under Some of
       the Models Above . . . . . . . . . . . . . . . . . . . . . . . 16
   4.  Conclusions and Recommendations About the General Approach . . 17
   5.  A Model Table Format . . . . . . . . . . . . . . . . . . . . . 18
   6.  A Model Label Registration Procedure: "CreateBundle" . . . . . 19
     6.1   Description of the CreateBundle Mechanism  . . . . . . . . 19
     6.2   The "no-variants" Case . . . . . . . . . . . . . . . . . . 20
     6.3   CreateBundle and Nameprep Mapping  . . . . . . . . . . . . 21
   7.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 21
   8.  Security Considerations  . . . . . . . . . . . . . . . . . . . 22
   9.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23
   10.   References . . . . . . . . . . . . . . . . . . . . . . . . . 23
       Author's Address . . . . . . . . . . . . . . . . . . . . . . . 25
       Intellectual Property and Copyright Statements . . . . . . . . 26

















1.  Introduction


1.1  Background


   Once work on the basic model for encoding non-ASCII strings in the
   DNS with IDNA ([RFC3490], [RFC3491], [RFC3492]) was nearing
   completion, it became clear that it would be desirable for registries
   to impose additional restrictions on the names that could actually be
   registered (e.g., see [IESG-IDN] and [ICANN-IDN]) as a means of
   reducing potential confusion among characters that were similar in
   some way.  These restrictions were, in many respects, part of a long
   tradition.  For example, while the original DNS specifications
   [RFC1035] permitted any string of octets to be used in a DNS label,
   they also recommended the use of a much more restricted subset, one
   that was derived from the much older "hostname" rules [RFC0952] and
   defined by the "LDH" convention (for the three permitted types of
   characters, letters, digits, and the hyphen).  Enforcement of those
   restricted rules in registrations was the responsibility of the
   registry or domain administrator.  They were not embedded in the DNS
   protocol itself, although some application protocols, notably those
   concerned with electronic mail, did impose and then enforce similar
   rules.
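
   As an illustration only, the LDH convention described above can be
   sketched as a small check.  The function names are hypothetical.
   The 63-octet label limit and the rule that a label neither starts
   nor ends with a hyphen are assumptions added for this sketch; the
   text above mentions only the three permitted character classes.

```python
import re

# One LDH label: ASCII letters, digits, and hyphen only.
# Assumed extras (not stated in the text above): 1 to 63 characters,
# and no hyphen in the first or last position.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_ldh_label(label: str) -> bool:
    """Return True if the label uses only letters, digits, and hyphen."""
    return LDH_LABEL.fullmatch(label) is not None


def is_ldh_hostname(name: str) -> bool:
    """Return True if every dot-separated label of the name is LDH."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    return all(is_ldh_label(label) for label in name.split("."))
```

   Under these assumptions, "examp1e.com" passes the check, while a
   label containing a non-ASCII character such as U+00FC does not.
   As the text notes, a check of this kind belongs to the registry or
   domain administrator rather than to the DNS protocol.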


   If there are no constraints on registration in a zone, people can
   register names whose characters increase the risk of
   misunderstandings, cybersquatting, and other forms of confusion.
   That a similar situation existed even before the introduction of
   IDNA is exemplified
   by domain names such as example.com and examp1e.com (note that the
   latter domain contains the digit "1" instead of the letter "l").
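
   The example.com and examp1e.com case can be sketched as a simple
   comparison of names after folding look-alike characters together.
   The table below is a deliberately tiny, hypothetical one invented
   for this illustration, and the function names are assumptions; a
   real registry would define its own list.

```python
# Hypothetical table of ASCII look-alikes: digit one resembles the
# letter "l", and digit zero resembles the letter "o".
LOOKALIKES = str.maketrans({"1": "l", "0": "o"})


def lookalike_key(name: str) -> str:
    """Fold a name to a key in which look-alike characters coincide."""
    return name.lower().translate(LOOKALIKES)


def may_be_confused(first: str, second: str) -> bool:
    """Return True if two distinct names fold to the same key."""
    return first != second and lookalike_key(first) == lookalike_key(second)
```

   With this table, may_be_confused("example.com", "examp1e.com") is
   true, while two names that differ in an unrelated character are
   not flagged.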


   For non-ASCII names (so-called "internationalized domain names" or
   "IDNs"), the problem was more complicated than that which led to the
   LDH (hostname) rules.  In the earlier situation, all protocols,
   hosts, and DNS zones used ASCII exclusively in practice, so the LDH
   restriction could reasonably be applied uniformly across the
   Internet.  With the introduction of a very large character
   repertoire, and with different geographical and political locations
   and languages having requirements for different collections of
   characters, the optimal registration restrictions became not a
   global matter but one that differed from area to area and, hence,
   from one DNS zone to another.


   For some human languages, there are characters and/or strings that
   have equivalent or near-equivalent usages.  If someone is allowed to
   register a name with such a character or string, the registry might
   want to automatically associate all of the names that have the same
   meaning with the registered name.  The registry might also decide
   whether the names that are associated with, or generated by, one




Klensin                 Expires January 4, 2005                 [Page 4]
Internet-Draft              IDN Registration                   July 2004



   registration should, as a group or individually, go into the zone or
   be blocked from registration by different parties.
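
   A minimal sketch of how a registry might associate equivalent names
   with a registration follows.  The table contents, the function
   names, and the choice to block the associated names rather than add
   them to the zone are all assumptions made for illustration; they do
   not reflect any particular registry's policy.

```python
from itertools import product

# Hypothetical table of equivalents: each key is a character that may
# appear in a requested label; each value lists the strings that this
# imaginary registry treats as equivalent to it.
EQUIVALENTS = {
    "\u00f6": ["oe"],  # LATIN SMALL LETTER O WITH DIAERESIS
    "\u00fc": ["ue"],  # LATIN SMALL LETTER U WITH DIAERESIS
}


def associated_names(label):
    """Return every other label reachable by substituting equivalents."""
    choices = [[ch] + EQUIVALENTS.get(ch, []) for ch in label]
    names = {"".join(parts) for parts in product(*choices)}
    names.discard(label)
    return names


def register(label, zone, blocked):
    """Add the label to the zone and block its associated names."""
    if label in zone or label in blocked:
        raise ValueError(f"{label!r} is not available")
    zone.add(label)
    # Assumed policy: associated names are blocked, not activated.
    blocked.update(associated_names(label) - zone)
```

   In this sketch, registering a label that contains U+00FC causes the
   corresponding "ue" spelling to be blocked, so a later request for
   that spelling by a different party is refused.  A registry could
   equally decide to place some or all of the associated names in the
   zone instead.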


   To date, the best-developed system for handling registration
   restrictions for IDNs is the JET Guidelines for Chinese, Japanese,
   and Korean [RFC3743], the so-called "CJK" languages.  That system is
   limited to those languages and, in particular, to their common script
   base.  This document explores the principles behind those guidelines
   and some of the issues that might arise in trying to adapt them to
   alphabetic languages.


   This document describes five things:


   1.  The general background and considerations for non-ASCII scripts
       in names.  Just as the JET Guidelines contain some suggestions
       that may not be applicable to alphabetic scripts, some of the
       suggestions here, especially the more specific ones, may be
       applicable to some scripts and not others.
   2.  Suggested practices for describing character variants.
   3.  A method for using a zone's character variants to determine which
       names should be associated with a registration.
   4.  A format for publishing a zone's table of character variants.
   5.  A model algorithm for name registration given the presence of
       language tables.


1.2  The Nature and Status of these Recommendations


   The document makes recommendations for consideration by registries
   and, where relevant, those who coordinate them and use their
   services.  None of the recommendations are intended to be normative.
   Indeed, the intent of the document is to illustrate a framework from
   which variations to meet the needs of particular registries and their
   processing of particular languages can be developed.  Of course, if
   registries make similar decisions and utilize similar tools, it may
   reduce costs and confusion -- both between registries and for users
   and registrars who have relationships with more than one domain.


1.3  Terminology


1.3.1  Languages and Scripts


   This document uses the term "language" in what may be, to many
   readers, an odd way.  This specification, IDNA, and the DNS are not
   directly concerned with natural language, but only with the
   characters that make up a given label.  In some respects, the term
   "script", as used in the character coding community, might be more
   appropriate.  However, different subsets of the same script may be
   used with different languages and the same language may be written




Klensin                 Expires January 4, 2005                 [Page 5]
Internet-Draft              IDN Registration                   July 2004



   using different characters (or even completely different scripts) in
   different locations, so that term is not precisely correct either.
   Long-standing confusion has also resulted from the fact that most
   scripts are, informally at least, named after one of the languages
   written in them: "Chinese" describes both a language and a collection
   of characters also used in writing Japanese, Korean, and, at least
   historically, some other languages; "Latin" describes a language,
   the characters used to write that language, and, often, the
   characters, derived from or similar to those used to write Latin,
   that are used to write a number of contemporary languages; the script used
   to write the Arabic language is called "Arabic" but is also used
   (typically with some additions or deletions) to write a number of
   other languages, and so on.  Situations in which a script has a
   clearly-defined name independent of the name of a language are the
   exception, rather than the rule; examples include Hangul, used to
   write Korean, Katakana and Hiragana, used to write Japanese, and a
   few others.  And some scholars have historically used "Roman" or
   "Roman-derived" in an attempt to distinguish between a script and the
   Latin language.


   The term "language" is hence used in this document in the informal
   sense of a written language and is defined, for this purpose, by the
   characters used to write it.  In this context, a "language" is
   defined by the combination of a code (see Section 1.4.1) and an
   authority that has chosen to use that code and establish a
   character-listing for it.  Authorities are normally TLD registries
   (see Section 7 and [IANA-language-registry]), but it is expected that
   they will find appropriate experts and that advice from language and
   script experts selected by international neutral bodies will also
   become part of the registration system.  In addition, as discussed
   below in Section 7, registries may conclude that the best interests
   of registrants, stakeholders, and the Internet community would be
   served by constructing "language tables" that mix scripts and
   characters in ways that conform to no known language.  Conventions
   should be developed for such registrations that do not misleadingly
   reflect specific language codes.


1.3.2  Characters, Variants, Registrations, and Other Issues


   1.  Characters in this document are given as their Unicode codepoints
       in U+xxxx format, with their official names, or both.


   2.  The following terms are used in this document.


       1.  A "string" is a sequence of one or more characters.
       2.  This document discusses characters that may have equivalent
            or near-equivalent characters or strings.  The "base
            character" is the character that has zero or more
            equivalents.  In the JET Guidelines, base characters are
            referred to as "valid characters".  In a table with
            variants, as described in Section 5, the base characters
            occupy the first column.  Normally (and always if the
            recommendation of Section 6.3 is adopted) the base
            characters will be the characters that appear in
            registration requests from registrants; all other characters
            will be considered to make the registration attempt invalid.
       3.  The "variant(s)" are the character(s) and/or string(s) that
            are treated as equivalent to the base character.  Note that
            these might not be true equivalent characters: a particular
            original character may be a base character with a mapping to
            a particular variant character, but that variant character
            may not have a mapping to the original base character and,
            indeed, the variant character may not appear in the base
            character list, and hence may not be valid for use in a
            registration.  Usually, characters or strings to be
            designated as variants are considered either equivalent or
            sufficiently similar (by some registry-specific definition)
            that confusion between them and the base character might
            occur.
       4.  The "base registration" is the single name that the
            registrant requested from the registry.
       5.  A label (or "name") is described as "registered" if it is

[1229 lines skipped]
--- /home/cvs/libidn/doc/specifications/draft-xdlee-idn-cdnadmin-02.txt 2004/11/08 11:21:21     NONE
+++ /home/cvs/libidn/doc/specifications/draft-xdlee-idn-cdnadmin-02.txt 2004/11/08 11:21:21     1.1

[40730 lines skipped]



