Re: [PATCH] Support for <meta charset=...> tag
From: Tim Rühsen
Subject: Re: [PATCH] Support for <meta charset=...> tag
Date: Sun, 26 Jul 2020 13:52:34 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0
Ah, sorry, just saw this email with your patch :-)
Could you send the patch as an attachment? Neither `git am` nor `git apply`
can apply it here.
Regards, Tim
On 15.07.20 12:55, Sho Amano wrote:
> Hi! I've been using the first version of wget for a long time and, first
> of all, I want to say thank you to all of the maintainers and contributors
> of this project!
>
> I was looking at the code recently and found that it doesn't support the
> "<meta charset=...>" tag yet.
> I don't see any related issues in the bug tracker, so I created a patch.
> I hope it helps.
>
> I also attach two HTML files for verification. One of them specifies a
> Japanese path in UTF-8, the other in Shift-JIS. Serve these files on
> localhost:8080 and let wget follow the link (e.g. `wget -d --recursive
> --level=2 http://localhost:8080/charset_test_shift_jis.html`). Verify that
> in both cases wget tries to download
> http://localhost:8080/%E6%97%A5%E6%9C%AC%E8%AA%9E.html.
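
For reference, the expected URL above is just the UTF-8 bytes of 日本語
percent-encoded, which is why the Shift-JIS page only produces it once its
charset is known and the link has been converted to UTF-8. A minimal sketch
of that encoding step follows; `percent_encode` is a hypothetical helper
written for this illustration, not a function from wget.

```c
#include <stddef.h>

/* Hypothetical helper (not part of wget): percent-encode every byte of SRC
   that is not an RFC 3986 unreserved character.  DST must have room for
   3 * strlen (src) + 1 bytes.  Returns DST. */
static char *
percent_encode (const char *src, char *dst)
{
  static const char hex[] = "0123456789ABCDEF";
  char *out = dst;

  for (const unsigned char *p = (const unsigned char *) src; *p; p++)
    {
      int unreserved = (*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z')
                       || (*p >= '0' && *p <= '9')
                       || *p == '-' || *p == '.' || *p == '_' || *p == '~';

      if (unreserved)
        *out++ = (char) *p;
      else
        {
          /* Emit %XX with uppercase hex digits. */
          *out++ = '%';
          *out++ = hex[*p >> 4];
          *out++ = hex[*p & 0x0F];
        }
    }

  *out = '\0';
  return dst;
}
```

Feeding it the nine UTF-8 bytes E6 97 A5 E6 9C AC E8 AA 9E followed by
".html" gives the path component of the URL quoted above.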
>
> Thanks!
> Sho Amano
>
> ---
> src/html-url.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/src/html-url.c b/src/html-url.c
> index b80cf269..5324d244 100644
> --- a/src/html-url.c
> +++ b/src/html-url.c
> @@ -182,6 +182,7 @@ static const char *additional_attributes[] = {
> "http-equiv", /* used by tag_handle_meta */
> "name", /* used by tag_handle_meta */
> "content", /* used by tag_handle_meta */
> + "charset", /* used by tag_handle_meta */
> "action", /* used by tag_handle_form */
> "style", /* used by check_style_attr */
> "srcset", /* used by tag_handle_img */
> @@ -191,7 +192,7 @@ static struct hash_table *interesting_tags;
> static struct hash_table *interesting_attributes;
>
> /* Will contains the (last) charset found in 'http-equiv=content-type'
> - meta tags */
> + or 'charset' meta tags */
> static char *meta_charset;
>
> static void
> @@ -574,6 +575,7 @@ tag_handle_meta (int tagid _GL_UNUSED, struct
> taginfo *tag, struct map_context *
> {
> char *name = find_attr (tag, "name", NULL);
> char *http_equiv = find_attr (tag, "http-equiv", NULL);
> + char *charset = find_attr (tag, "charset", NULL);
>
> if (http_equiv && 0 == c_strcasecmp (http_equiv, "refresh"))
> {
> @@ -673,6 +675,20 @@ tag_handle_meta (int tagid _GL_UNUSED, struct
> taginfo *tag, struct map_context *
> }
> }
> }
> + else if (charset)
> + {
> + /* Handle stuff like:
> + <meta charset="CHARSET">
> + If charset is acquired from http-equiv then it is overwritten. */
> +
> + /* Do a minimum check on the charset value */
> + if (check_encoding_name (charset))
> + {
> + char *mcharset = xstrdup (charset);
> + xfree (meta_charset);
> + meta_charset = mcharset;
> + }
> + }
> }
>
> /* Handle the IMG tag. This requires special handling for the srcset attr,
>
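
The new branch in the patch boils down to "validate the name, duplicate it,
then replace the previously stored charset". Below is a standalone sketch of
that pattern, assuming plain malloc/free in place of wget's xstrdup/xfree.
`plausible_encoding_name` and `remember_meta_charset` are hypothetical names,
and the check is only a guess at what a "minimum check" could look like; the
real check_encoding_name () may behave differently.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the file-scope variable of the same name in html-url.c. */
static char *meta_charset;

/* Hypothetical stand-in for check_encoding_name (): accept only non-empty
   names made of printable, non-space ASCII characters. */
static bool
plausible_encoding_name (const char *name)
{
  if (!name || !*name)
    return false;

  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    if (*p <= ' ' || *p >= 0x7F)
      return false;

  return true;
}

/* Remember CHARSET as the most recently seen meta charset, overwriting any
   earlier value.  The copy is made before the old string is freed, in the
   same order as the patch.  Returns false if the name was rejected or the
   allocation failed; the stored value is left unchanged in that case. */
static bool
remember_meta_charset (const char *charset)
{
  if (!plausible_encoding_name (charset))
    return false;

  char *copy = malloc (strlen (charset) + 1);
  if (!copy)
    return false;
  strcpy (copy, charset);

  free (meta_charset);
  meta_charset = copy;
  return true;
}
```

Calling remember_meta_charset ("Shift_JIS") and then
remember_meta_charset ("UTF-8") leaves "UTF-8" stored, matching the "last
one found wins" behaviour described in the patch comment.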