Re: How to get title of web page by url?
From: Lennart Borgman
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 17:44:45 +0200
On Wed, Jul 28, 2010 at 5:34 PM, Thamer Mahmoud
<thamer.mahmoud@gmail.com> wrote:
> filebat Mark <filebat.mark@gmail.com> writes:
>
>> Thanks, Thamer. It works.
>>
>> Below is the code snippet.
>>
>> Well, I still have an encoding problem.
>> To get the title of "http://www.baidu.com", the title we get is displayed as
>> unrecognizable codes.
>>
>> I have tried to encode it, in the way of "(setq web_title_str
>> (encode-coding-string web_title_str 'utf-8-dos))", but it fails.
>
> I'm also new to Elisp (well sort of).
>
> But here is a modified version that should handle both charsets and
> newlines (and other issues noticed by Deniz Dogan. Thanks).
>
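> (defun www-get-page-title (url)
>   (let ((title))
>     (with-current-buffer (url-retrieve-synchronously url)
>       (goto-char (point-min))
>       ;; Leave TITLE nil when the page has no <title> element.
>       (when (re-search-forward "<title>\\([^<]*\\)</title>" nil t 1)
>         (setq title (match-string 1))
>         (goto-char (point-min))
>         ;; Decode only when a charset declaration is actually found;
>         ;; otherwise (match-string 1) would still hold the title.
>         (when (re-search-forward "charset=\\([-0-9a-zA-Z]+\\)" nil t 1)
>           (setq title (decode-coding-string
>                        title (intern (downcase (match-string 1)))))))
>       title)))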
>
> The robustness of this code would still depend on whether the HTML is
> well-formed, but it should be good enough I think.
Have a look at url-copy-file for how to get this correct. (Or
web-vcs-url-copy-file in nXhtml which is a little bit more careful.)
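
For illustration, here is a rough sketch of that idea. It assumes that
url-copy-file works by handing the response buffer to mm-dissect-buffer,
and it takes the charset from the Content-Type header rather than from the
HTML text. The names my-html-title and my-www-get-page-title are
hypothetical, and pages that declare their charset only in a <meta> tag
are not handled here.

```elisp
(require 'url)
(require 'mm-decode)   ; mm-dissect-buffer, mm-get-part, mm-destroy-parts
(require 'mail-parse)  ; mail-content-type-get

(defun my-html-title (html &optional charset)
  "Return the <title> text found in the string HTML, or nil.
If CHARSET names a known coding system, decode the title with it.
Hypothetical helper, not part of any library."
  (let ((case-fold-search t))
    (when (string-match "<title[^>]*>\\([^<]*\\)</title>" html)
      (let ((title (match-string 1 html))
            (coding (and charset (intern (downcase charset)))))
        ;; Fall back to the undecoded text for unknown charsets.
        (if (and coding (coding-system-p coding))
            (decode-coding-string title coding)
          title)))))

(defun my-www-get-page-title (url)
  "Fetch URL and return the page title, or nil.  Hypothetical name."
  (let ((buffer (url-retrieve-synchronously url)))
    (unless buffer
      (error "Could not retrieve %s" url))
    (with-current-buffer buffer
      ;; Let the MIME code split headers from body instead of
      ;; searching the raw response by hand.
      (let ((handle (mm-dissect-buffer t)))
        (unwind-protect
            (my-html-title
             (mm-get-part handle)
             ;; charset parameter of the Content-Type header, if any
             (mail-content-type-get (mm-handle-type handle) 'charset))
          (mm-destroy-parts handle)
          (kill-buffer buffer))))))
```

Something like (my-www-get-page-title "http://www.baidu.com") would then
be the call to try. The regexp is still fragile on unusual markup, so
treat this as a starting point only.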