From: Ruijie Yu
Subject: bug#63125: 30.0.50; [BUG] last argument of libxml2-parse-html-region has no effect?
Date: Fri, 28 Apr 2023 09:30:30 +0800
User-agent: mu4e 1.9.22; emacs 30.0.50

Eli Zaretskii <eliz@gnu.org> writes:
>> Date: Fri, 28 Apr 2023 00:19:22 +0800
>> From: Ruijie Yu via "Bug reports for GNU Emacs,
>> the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>>
>> I'm trying out the function `libxml-parse-html-region' as recommended
>> by a thread in help-gnu-emacs. However, I discovered that the last
>> argument of this function does not help me normalize a relative URL.
>>
>> Reproducer:
>>
>> Visit the attached toy html file. I imagine that it is hosted at
>> "https://example.com/good/day".
>>
>> Run this snippet:
>>
>> (pp (libxml-parse-html-region
>>      (point-min) (point-max)
>>      "https://example.com/good/day"))
>>
>> Compare it with this snippet:
>>
>> (pp (libxml-parse-html-region
>>      (point-min) (point-max)))
>>
>> What I get is this result for both snippets (which is shown twice, once
>> "pretty-printed", and once returned as a string):
>>
>> --8<---------------cut here---------------start------------->8---
>> (html nil
>>       (body nil "\n "
>>             (a
>>              ((href . "/hello"))
>>              "1")
>>             "\n "
>>             (a
>>              ((href . "../world"))
>>              "2")
>>             "\n "
>>             (a
>>              ((href . "good"))
>>              "3")
>>             "\n "
>>             (a
>>              ((href . "morning/or/night"))
>>              "4")
>>             "\n "))
>> --8<---------------cut here---------------end--------------->8---
>>
>> Notice that the href values are not normalized: they are copied
>> verbatim from the original HTML file.
>>
>> If I understand the docstring correctly, the last argument of
>> `libxml-parse-html-region', when specified as a URL string, should be
>> used as the base for resolving relative paths found within the
>> HTML document. But the <a href=xxx> paths are currently not resolved.
>
> If you look at xml.c, you will see that we just call a libxml function
> passing it this URL. So if anything isn't as expected, the answer is
> in libxml, not in Emacs.
Thank you for pointing that out. I will take a look at the libxml2 source
in a day or two. I am also upgrading libxml2 from 2.10.3-2 to 2.10.4-2,
and will see if that changes anything.

If I end up deciding that it is a libxml2 bug, I'll file a bug there and
link to this bug.

For completeness, attached is the toy HTML file that I forgot to attach
in my initial report.
[Attachment: toy HTML file containing four links, labeled 1 through 4]
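
Independently of where the fault lies, the relative links can be resolved
on the Lisp side after parsing. The following is only a minimal sketch,
assuming the standard `dom.el' and `url-expand.el' libraries that ship with
Emacs; `my/resolve-hrefs' is a hypothetical helper name, not an existing
function, and it only handles the href attribute of <a> elements.

```elisp
(require 'dom)
(require 'url-expand)

(defun my/resolve-hrefs (dom base-url)
  "Expand the href of every <a> element in DOM against BASE-URL.
DOM is a parse tree as returned by `libxml-parse-html-region'.
The tree is modified in place and returned."
  (dolist (node (dom-by-tag dom 'a))
    (let ((href (dom-attr node 'href)))
      ;; Skip anchors that carry no href attribute at all.
      (when href
        (dom-set-attribute node 'href
                           (url-expand-file-name href base-url)))))
  dom)
```

Calling it as (my/resolve-hrefs (libxml-parse-html-region (point-min)
(point-max)) "https://example.com/good/day") in the buffer visiting the toy
file should then yield a tree whose href values are absolute URLs.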
--
Best,
RY
[Please note that this mail might go to spam due to some
misconfiguration in my mail server -- still investigating.]