Re: [Bug-wget] Support non-ASCII URLs

From:	Giuseppe Scrivano
Subject:	Re: [Bug-wget] Support non-ASCII URLs
Date:	Wed, 16 Dec 2015 10:53:51 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

Hi Eli,

Thanks for working on it. I have a few questions:

Eli Zaretskii <address@hidden> writes:

> This second part is the main part of the change.  It uses 'iconv',
> when available, to convert the file names to the local encoding,
> before saving the files.  Note that the same function I modified is
> used by ftp.c, so downloading via FTP should also work with non-ASCII
> file names now; however, I didn't test that.
>
> Thanks.
>
> diff --git a/src/url.c b/src/url.c
> index c62867f..d984bf7 100644
> --- a/src/url.c
> +++ b/src/url.c
> @@ -43,6 +43,11 @@ as that of the covered work.  */
>  #include "host.h"  /* for is_valid_ipv6_address */
>  #include "c-strcase.h"
>  
> +#if HAVE_ICONV
> +#include <iconv.h>
> +#include <langinfo.h>
> +#endif
> +
>  #ifdef __VMS
>  #include "vms.h"
>  #endif /* def __VMS */
> @@ -1531,6 +1536,90 @@ append_uri_pathel (const char *b, const char *e, bool escaped,
>    append_null (dest);
>  }
>  
> +static char *
> +convert_fname (const char *fname)
> +{
> +  char *converted_fname = (char *)fname;
> +#if HAVE_ICONV
> +  const char *from_encoding = opt.encoding_remote;
> +  const char *to_encoding = opt.locale;
> +  iconv_t cd;
> +  /* sXXXav : hummm hard to guess... */
> +  size_t len, done, inlen, outlen;
> +  char *s;
> +  const char *orig_fname = fname;;
> +
> +  /* Defaults for remote and local encodings.  */
> +  if (!from_encoding)
> +    from_encoding = "UTF-8";
> +  if (!to_encoding)
> +    to_encoding = nl_langinfo (CODESET);
> +
> +  cd = iconv_open (to_encoding, from_encoding);
> +  if (cd == (iconv_t)(-1))
> +    logprintf (LOG_VERBOSE, _("Conversion from %s to %s isn't supported\n"),
> +            quote (from_encoding), quote (to_encoding));
> +  else
> +    {
> +      inlen = strlen (fname);
> +      len = outlen = inlen * 2;
> +      converted_fname = s = xmalloc (outlen + 1);
> +      done = 0;
> +
> +      for (;;)
> +     {
> +       if (iconv (cd, &fname, &inlen, &s, &outlen) != (size_t)(-1))
> +         {
> +           /* Flush the last bytes.  */
> +           iconv (cd, NULL, NULL, &s, &outlen);

Shouldn't the return code be checked here?
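
To illustrate what I mean, here is a minimal standalone sketch, not a proposed replacement for the patch. The flush call (NULL input buffer) may have to write a shift sequence to return to the initial state, so it can fail with E2BIG just like the main call. The helper name convert_checked and the fixed-size output buffer are assumptions made for the example; the real code would keep the growing buffer and wget's own allocation wrappers.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: convert IN from encoding
   FROM to encoding TO.  Returns a malloc'ed string, or NULL if the
   conversion or the final flush fails.  */
static char *
convert_checked (const char *in, const char *from, const char *to)
{
  iconv_t cd = iconv_open (to, from);
  if (cd == (iconv_t)(-1))
    return NULL;

  size_t inlen = strlen (in);
  size_t outlen = inlen * 4 + 16;  /* generous fixed bound, sketch only */
  char *out = malloc (outlen + 1);
  if (!out)
    {
      iconv_close (cd);
      return NULL;
    }

  char *inp = (char *) in;  /* iconv takes a non-const input pointer */
  char *outp = out;

  size_t rc = iconv (cd, &inp, &inlen, &outp, &outlen);
  if (rc != (size_t)(-1))
    /* Flush the last bytes, and check this call as well.  */
    rc = iconv (cd, NULL, NULL, &outp, &outlen);
  iconv_close (cd);

  if (rc == (size_t)(-1))
    {
      free (out);
      return NULL;
    }

  *outp = '\0';
  return out;
}
```

In convert_fname the E2BIG case after a failed flush would presumably go through the same buffer-growing path as the main call, rather than giving up as this sketch does.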

> +           *(converted_fname + len - outlen - done) = '\0';
> +           iconv_close(cd);
> +           DEBUGP (("Converted file name '%s' (%s) -> '%s' (%s)\n",
> +                    orig_fname, from_encoding, converted_fname, to_encoding));
> +           return converted_fname;
> +         }
> +
> +       /* Incomplete or invalid multibyte sequence */
> +       if (errno == EINVAL || errno == EILSEQ)
> +         {
> +           logprintf (LOG_VERBOSE,
> +                      _("Incomplete or invalid multibyte sequence encountered\n"));
> +           xfree (converted_fname);
> +           converted_fname = (char *)orig_fname;
> +           break;
> +         }
> +       else if (errno == E2BIG) /* Output buffer full */
> +         {
> +           char *new;
> +
> +           done = len;
> +           outlen = done + inlen * 2;
> +           new = xmalloc (outlen + 1);
> +           memcpy (new, converted_fname, done);
> +           xfree (converted_fname);

What would be the extra cost in terms of copied bytes if we just replace
the three lines above with xrealloc?
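
For reference, a rough sketch of the realloc-based variant I have in mind. When the block has to move, realloc copies the same bytes the memcpy above does, and when it can be extended in place nothing is copied at all, so I would not expect any extra cost. The sketch uses plain realloc so that it compiles on its own; grow_buffer is a hypothetical name, and in wget the call would be xrealloc, which to my knowledge aborts on allocation failure, so the NULL check would go away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grow *BUF, which holds USED bytes of converted
   output, so that it has room for NEW_SIZE bytes plus a terminator.
   Returns a pointer to the first unused byte, or NULL on failure, in
   which case *BUF is left untouched and still valid.  */
static char *
grow_buffer (char **buf, size_t used, size_t new_size)
{
  assert (buf != NULL && new_size >= used);

  char *grown = realloc (*buf, new_size + 1);
  if (!grown)
    return NULL;

  /* The block may have moved, so the write pointer has to be
     recomputed from the new base address.  */
  *buf = grown;
  return grown + used;
}
```

The one thing to watch is the write pointer: it must be reset to the new base plus the number of bytes already converted, since the old address may no longer be valid.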

Regards,
Giuseppe
