Re: libidn2 0.13
From: Tim Rühsen
Subject: Re: libidn2 0.13
Date: Sat, 07 Jan 2017 19:48:42 +0100
User-agent: KMail/5.2.3 (Linux/4.8.0-2-amd64; KDE/5.28.0; x86_64; ; )
On Tuesday, 3 January 2017 10:00:53 CET Nikos Mavrogiannopoulos wrote:
> On Mon, Jan 2, 2017 at 10:17 PM, Tim Rühsen <address@hidden> wrote:
> >> * APIs more like libidn's that take a full domain name and do proper
> >>   operations on them. In several forms, UTF-8, UTF-32, locale encoded,
> >>   etc.
> >> * APIs to decode an IDNA2008 domain from ACE to Unicode format. That is
> >>   not described by the IDNA2008 RFCs, interestingly enough, but I
> >>   suspect people will want it, hah!
> >
> > Wget used to use ACE decoding from libidn, but only for logging/display
> > purposes. Since we switched to libidn2, the UTF-8/locale names are not
> > displayed any more :-). With such a function I would reactivate the
> > logging code.
>
> For gnutls unfortunately the reverse is really necessary and that's
> the reason we are stuck with libidn. We need to be able to print the
> actual name of the certificate and not only the punycode which is
> non-human readable for most languages.
Then let's define a function.
Let me start with a suggestion to get the ball rolling:
int idn2_fromASCII (const uint8_t *src, uint8_t **dst)
'src' is a UTF-8 encoded string (domain name).
'dst' is the punycode-decoded output, also UTF-8.
Examples:
foo.bar -> foo.bar
übel.de -> übel.de
xn--bel-goa.de -> übel.de
xn--bel-goa.größer.de -> übel.größer.de
Casing: we leave the input as it is. Only domain labels that start with xn--
will be converted, without any casing check.
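To make the suggestion concrete, here is a rough sketch of the per-label behaviour, not a proposed implementation. It uses the decoding procedure from RFC 3492 and no libidn2 calls. The names sketch_from_ascii, puny_decode, adapt, digit_of and put_utf8 are made up for illustration, it uses plain char instead of uint8_t, and it assumes the "xn--" prefix is matched case-insensitively and that a decoded label has at most 63 code points.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700, BIAS0 = 72, N0 = 128, MAXCP = 63 };

/* Bias adaptation as described in RFC 3492, section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t npoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / npoints;
    for (; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE)
        delta /= BASE - TMIN;
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Map a Punycode digit character to 0..35, or BASE if it is not a digit. */
static uint32_t digit_of(unsigned char c)
{
    if (c >= '0' && c <= '9') return (uint32_t)(c - '0') + 26;
    if (c >= 'a' && c <= 'z') return (uint32_t)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (uint32_t)(c - 'A');
    return BASE;
}

/* Decode Punycode in[0..len) into cp[]; returns the count, or -1 on error. */
static int puny_decode(const char *in, size_t len, uint32_t *cp)
{
    uint32_t n = N0, i = 0, bias = BIAS0, out = 0;
    size_t b = 0, pos;
    for (pos = 0; pos < len; pos++)          /* find the last delimiter */
        if (in[pos] == '-') b = pos;
    for (pos = 0; pos < b; pos++) {          /* copy the basic code points */
        if ((unsigned char)in[pos] >= 0x80 || out >= MAXCP) return -1;
        cp[out++] = (unsigned char)in[pos];
    }
    pos = b > 0 ? b + 1 : 0;
    while (pos < len) {                      /* one insertion per iteration */
        uint32_t oldi = i, w = 1, k, d, t;
        for (k = BASE;; k += BASE) {
            if (pos >= len || (d = digit_of((unsigned char)in[pos++])) >= BASE) return -1;
            if (d > (UINT32_MAX - i) / w) return -1;       /* overflow guard */
            i += d * w;
            t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
            if (d < t) break;
            if (w > UINT32_MAX / (BASE - t)) return -1;    /* overflow guard */
            w *= BASE - t;
        }
        if (out >= MAXCP) return -1;
        bias = adapt(i - oldi, out + 1, oldi == 0);
        n += i / (out + 1);
        i %= out + 1;
        if (n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF)) return -1;
        memmove(cp + i + 1, cp + i, (out - i) * sizeof *cp);
        cp[i++] = n;
        out++;
    }
    return (int)out;
}

/* Append code point c at p as UTF-8 and return the advanced pointer. */
static char *put_utf8(char *p, uint32_t c)
{
    if (c < 0x80) { *p++ = (char)c; return p; }
    if (c < 0x800) *p++ = (char)(0xC0 | (c >> 6));
    else {
        if (c < 0x10000) *p++ = (char)(0xE0 | (c >> 12));
        else { *p++ = (char)(0xF0 | (c >> 18)); *p++ = (char)(0x80 | ((c >> 12) & 0x3F)); }
        *p++ = (char)(0x80 | ((c >> 6) & 0x3F));
    }
    *p++ = (char)(0x80 | (c & 0x3F));
    return p;
}

/* Hypothetical stand-in for the suggested function: 0 on success, -1 on error.
 * On success *dst is a malloc()ed UTF-8 string that the caller must free(). */
int sketch_from_ascii(const char *src, char **dst)
{
    /* A decoded label never has more code points than it had input bytes. */
    char *buf = malloc(strlen(src) * 4 + 1), *p = buf;
    if (!buf) return -1;
    for (;;) {
        size_t len = strcspn(src, ".");
        if (len >= 4 && (src[0] | 0x20) == 'x' && (src[1] | 0x20) == 'n'
            && src[2] == '-' && src[3] == '-') {
            uint32_t cp[MAXCP + 1];
            int j, n = puny_decode(src + 4, len - 4, cp);
            if (n < 0) { free(buf); return -1; }
            for (j = 0; j < n; j++) p = put_utf8(p, cp[j]);
        } else {                             /* any other label: copy as is */
            memcpy(p, src, len);
            p += len;
        }
        src += len;
        if (*src == '\0') break;
        *p++ = *src++;                       /* copy the '.' separator */
    }
    *p = '\0';
    *dst = buf;
    return 0;
}
```

Calling sketch_from_ascii("xn--bel-goa.de", &out) would leave the UTF-8 string for übel.de in out, while labels without the prefix pass through byte for byte. No validation of the decoded result against IDNA2008 is done here; whether the real function should do that is an open question.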
Why UTF-8 in and UTF-8 out?
- Most applications already work with UTF-8 internally.
- It is easy to convert to UTF-16/UTF-32 (UCS-2/UCS-4).
- It leaves charset transcoding out of the library.
- ...
Do we need an additional 'flags' parameter for future use? Why not.
If we want charset transcoding, we also need input and output charsets, maybe
also a language (e.g. think of Turkish i/I casing). Do we want that?
Regards, Tim