bug-wget
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Bug-wget] [Bug-Wget] Misc. patches


From: Darshit Shah
Subject: Re: [Bug-wget] [Bug-Wget] Misc. patches
Date: Sat, 5 Jul 2014 16:01:02 +0530

On Sat, Jul 5, 2014 at 3:45 PM, Giuseppe Scrivano <address@hidden> wrote:
> Hi Darshit,
>
> few comments below:
>
> Darshit Shah <address@hidden> writes:
>
>> Hi,
>>
>> I've attached 3 patches to this mail. The first one eliminates some of
>> the error codes in Wget as a first step to cleaning up the error
>> reporting method.
>>
>> The second is siimply fixing indentation issues.
>>
>> The third updates our usage of libpsl according to the latest release
>> version and converts all domain names to lowercase before passing them
>> to libpsl for checking.
>>
>> --
>> Thanking You,
>> Darshit Shah
>>
>> From 64b57d26b6283fe1039999ad3aabbee8dbaa4c97 Mon Sep 17 00:00:00 2001
>> From: Darshit Shah <address@hidden>
>> Date: Sat, 5 Jul 2014 12:08:09 +0530
>> Subject: [PATCH 3/3] Convert domains to lowercase before libpsl checks
>>
>> ---
>>  src/ChangeLog |  5 +++++
>>  src/cookies.c | 26 ++++++++++++++++++++++++--
>>  2 files changed, 29 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/ChangeLog b/src/ChangeLog
>> index 5306cb2..6360303 100644
>> --- a/src/ChangeLog
>> +++ b/src/ChangeLog
>> @@ -1,5 +1,10 @@
>>  2014-07-05  Darshit Shah  <address@hidden>
>>
>> +     * cookies.c (check_domain_match): Libpsl requires that all domain names
>> +     passed to it be in utf8 lower case.
>> +
>> +2014-07-05  Darshit Shah  <address@hidden>
>> +
>>       * http.c (gethttp): Fix indentation of conditional block
>>       (gethttp): Remove unneeded variable
>>
>> diff --git a/src/cookies.c b/src/cookies.c
>> index a46aeee..b478060 100644
>> --- a/src/cookies.c
>> +++ b/src/cookies.c
>> @@ -501,7 +501,17 @@ numeric_address_p (const char *addr)
>>  /* Check whether COOKIE_DOMAIN is an appropriate domain for HOST.
>>     Originally I tried to make the check compliant with rfc2109, but
>>     the sites deviated too often, so I had to fall back to "tail
>> -   matching", as defined by the original Netscape's cookie spec.  */
>> +   matching", as defined by the original Netscape's cookie spec.
>> +
>> +   Wget now uses libpsl to check domain names against a public suffix
>> +   list to see if they are valid. However, since we don't provide a
>> +   psl on our own, if libpsl is compiled without a public suffix list,
>> +   fall back to using the original "tail matching" heuristic. Also if
>> +   libpsl is unable to convert the domain to lowercase, which means that
>> +   it doesnt have any runtime conversion support, we again fall back to
>> +   "tail matching" since libpsl states the results are unpredictable with
>> +   upper case strings.
>> +   */
>>
>>  static bool
>>  check_domain_match (const char *cookie_domain, const char *host)
>> @@ -509,6 +519,7 @@ check_domain_match (const char *cookie_domain, const 
>> char *host)
>>
>>  #ifdef HAVE_LIBPSL
>>    DEBUGP (("cdm: 1"));
>> +  char * cookie_domain_lower, * host_lower;
>
> please initialize them to NULL and format like char
> *cookie_domain_lower, *host_lower (no space between * and the variable
> name), otherwise...
>
>>    const psl_ctx_t *psl;
>>    int is_acceptable;
>>
>> @@ -519,7 +530,18 @@ check_domain_match (const char *cookie_domain, const 
>> char *host)
>>        goto no_psl;
>>      }
>>
>> -  is_acceptable = psl_is_cookie_domain_acceptable (psl, host, 
>> cookie_domain);
>> +  if (psl_str_to_utf8lower (cookie_domain, NULL, NULL, 
>> &cookie_domain_lower) != PSL_SUCCESS ||
>> +      psl_str_to_utf8lower (host, NULL, NULL, &host_lower) != PSL_SUCCESS)
>
> ...if the first "psl_str_to_utf8lower" fails then "host_lower" keeps
> some bogus value...
>
>> +    {
>> +        DEBUGP (("libpsl unable to parse domain name. "
>> +                 "Falling back to simple heuristics.\n"));
>> +        goto no_psl;
>> +    }
>> +
>> +  is_acceptable = psl_is_cookie_domain_acceptable (psl, host_lower, 
>> cookie_domain_lower);
>> +  xfree (cookie_domain_lower);
>> +  xfree (host_lower);
>
> ...and *boom* here.
>
Aah! I somehow managed not to get any "boom"s despite having a test
that saw psl_str_to_utf8lower() fail. However, your comment is correct
and I'll fix that. The general idea was that if the function fails, it
will fail on both the calls

>> +
>>    return true ? (is_acceptable == 1) : false;
>>
>>  no_psl:
>> --
>> 2.0.1
>>
>
> feel free to push after the change.
>
>
>
>
>> From eccb40cac9aa5a446a2b6c1eb61710930223106b Mon Sep 17 00:00:00 2001
>> From: Darshit Shah <address@hidden>
>> Date: Sat, 5 Jul 2014 11:54:46 +0530
>> Subject: [PATCH 2/3] Fix indentation and remove excess variable
>>
>> ---
>>  src/ChangeLog |  5 +++++
>>  src/http.c    | 21 ++++++++++-----------
>>  2 files changed, 15 insertions(+), 11 deletions(-)
>
> ACK.
>
>
>
>
>> From 91fe00af91825bd80f4b300ba45d5874c3c79a46 Mon Sep 17 00:00:00 2001
>> From: Darshit Shah <address@hidden>
>> Date: Thu, 3 Jul 2014 20:53:33 +0530
>> Subject: [PATCH 1/3] Remove usunused error codes
>>
>> ---
>>  src/ChangeLog |  5 +++++
>>  src/http.c    |  2 +-
>>  src/wget.h    | 14 ++++++--------
>>  3 files changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/src/ChangeLog b/src/ChangeLog
>> index d19dc8d..7962213 100644
>> --- a/src/ChangeLog
>> +++ b/src/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2014-07-03  Darshit Shah  <address@hidden>
>> +
>> +     * wget.h (uerr_t): Remove unused error codes
>> +     * http.c: (http_loop): Remove reference to unused error code
>> +
>>  2014-06-30  Giuseppe Scrivano  <address@hidden>
>>
>>       * convert.c (local_quote_string): Initialize newname.
>> diff --git a/src/http.c b/src/http.c
>> index 383e9f3..4fac3ba 100644
>> --- a/src/http.c
>> +++ b/src/http.c
>> @@ -3169,7 +3169,7 @@ Spider mode enabled. Check if remote file exists.\n"));
>>
>>        switch (err)
>>          {
>> -        case HERR: case HEOF: case CONSOCKERR: case CONCLOSED:
>> +        case HERR: case HEOF: case CONSOCKERR:
>>          case CONERROR: case READERR: case WRITEFAILED:
>>          case RANGEERR: case FOPEN_EXCL_ERR:
>>            /* Non-fatal errors continue executing the loop, which will
>> diff --git a/src/wget.h b/src/wget.h
>> index 331e8e2..3b6ff86 100644
>> --- a/src/wget.h
>> +++ b/src/wget.h
>> @@ -334,21 +334,19 @@ typedef enum
>>  {
>>    /*  0  */
>>    NOCONERROR, HOSTERR, CONSOCKERR, CONERROR, CONSSLERR,
>> -  CONIMPOSSIBLE, NEWLOCATION, NOTENOUGHMEM /* ! */,
>> -  CONPORTERR /* ! */, CONCLOSED /* ! */,
>> +  CONIMPOSSIBLE, NEWLOCATION,
>>    /* 10  */
>>    FTPOK, FTPLOGINC, FTPLOGREFUSED, FTPPORTERR, FTPSYSERR,
>> -  FTPNSFOD, FTPRETROK /* ! */, FTPUNKNOWNTYPE, FTPRERR, FTPREXC /* ! */,
>> +  FTPNSFOD, FTPUNKNOWNTYPE, FTPRERR,
>>    /* 20  */
>>    FTPSRVERR, FTPRETRINT, FTPRESTFAIL, URLERROR, FOPENERR,
>> -  FOPEN_EXCL_ERR, FWRITEERR, HOK /* ! */, HLEXC /* ! */, HEOF,
>> +  FOPEN_EXCL_ERR, FWRITEERR, HEOF,
>>    /* 30  */
>> -  HERR, RETROK, RECLEVELEXC, FTPACCDENIED /* ! */, WRONGCODE,
>> +  HERR, RETROK, RECLEVELEXC, WRONGCODE,
>>    FTPINVPASV, FTPNOPASV, CONTNOTSUPPORTED, RETRUNNEEDED, RETRFINISHED,
>>    /* 40  */
>> -  READERR, TRYLIMEXC, URLBADPATTERN /* ! */, FILEBADFILE /* ! */, RANGEERR,
>> -  RETRBADPATTERN, RETNOTSUP /* ! */, ROBOTSOK /* ! */, NOROBOTS /* ! */,
>> -  PROXERR,
>> +  READERR, TRYLIMEXC, FILEBADFILE, RANGEERR,
>> +  RETRBADPATTERN, PROXERR,
>>    /* 50  */
>>    AUTHFAILED, QUOTEXC, WRITEFAILED, SSLINITFAILED, VERIFCERTERR,
>>    UNLINKERR, NEWLOCATION_KEEP_POST, CLOSEFAILED, ATTRMISSING, UNKNOWNATTR,
>
> this change will screw all the numeric comments :(  I don't see anyway
> why we need to maintain these, so feel free to drop them in this same
> patch and push the result.
I'll do that. In fact I intended to do so already but don't know why I
didn't. I also wanted to regroup them based on functionality so that
we may start merging some of them and eliminating the others.
>
> Regards,
> Giuseppe



-- 
Thanking You,
Darshit Shah



reply via email to

[Prev in Thread] Current Thread [Next in Thread]