

Re: "Readability" feature in eww

From: Rüdiger Sonderfeld
Subject: Re: "Readability" feature in eww
Date: Mon, 03 Nov 2014 10:37:47 +0100
User-agent: KMail/4.13.3 (Linux/3.13.0-37-generic; KDE/4.13.3; x86_64; ; )

On Monday 03 November 2014 01:41:14 Lars Magne Ingebrigtsen wrote:
> This is a heuristic, of course, so it can be tweaked endlessly.  The
> current algorithm just gives most words a positive score, HTML markup a
> negative score, and words inside <a> tags a negative score.  For such a
> simple algorithm, it seems to give pretty good results.
> But tweaking is necessary for it to be ... better.  If anybody has ideas
> for tweaks or better algorithms, please be my guest and have at it.
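A minimal sketch of the scoring described above, written in Python rather than Emacs Lisp for brevity. The exact weights (+1 per word, -1 per element, -1 per word inside <a>), the choice to ignore <script>/<style> text, and all names (Node, TreeBuilder, score, best_node) are assumptions for illustration, not eww's actual code.

```python
from html.parser import HTMLParser

# Elements that never have children, so they are never left "open".
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class Node:
    def __init__(self, tag, attrs=(), parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []  # mix of Node objects and text strings


class TreeBuilder(HTMLParser):
    """Builds a simple DOM-like tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID:
            self.current = node

    def handle_endtag(self, tag):
        # Walk up to the matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def score(node, in_link=False):
    """+1 per word, -1 per element, -1 per word inside <a> (assumed weights)."""
    if node.tag in ("script", "style"):
        return -1  # assumption: script/style text never counts as content
    in_link = in_link or node.tag == "a"
    total = 0 if node.parent is None else -1  # markup penalty, none for the root
    for child in node.children:
        if isinstance(child, str):
            words = len(child.split())
            total += -words if in_link else words
        else:
            total += score(child, in_link)
    return total


def best_node(root):
    """Return the (element, score) pair with the highest score."""
    best, best_score = None, float("-inf")
    stack = [root]
    while stack:
        node = stack.pop()
        if node is not root:
            s = score(node)
            if s > best_score:
                best, best_score = node, s
        stack.extend(c for c in node.children if isinstance(c, Node))
    return best, best_score
```

Because every element costs a point, a wrapper that adds nothing but markup scores below its only child, so the winner tends to be the tightest container around the prose, while link-heavy navigation blocks drag their ancestors down.  This sketch recomputes subtree scores for every node, which is quadratic; caching each node's score would avoid that.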

HTML5 introduced elements such as <main> and <article>, which can be used to
identify the important parts of a page.  I'm not sure how widely they are used
so far (I think org-mode already supports them if one enables the HTML5 export
option).  But adding them to the heuristic might at least help.

E.g., https://developer.mozilla.org/en-US/docs/Web/HTML/Element/main
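One way this suggestion could be folded in, sketched in Python: before running any word-scoring heuristic, check whether the page has a <main> element (or, failing that, an <article>) and use its text directly, returning None so the caller can fall back to scoring.  The class and function names are hypothetical, and preferring <main> over <article> is an assumption.

```python
from html.parser import HTMLParser


class SemanticExtractor(HTMLParser):
    """Collects the text inside <main> and <article> elements (hypothetical helper)."""

    PREFERRED = ("main", "article")  # assumed order of preference

    def __init__(self):
        super().__init__()
        self.depth = {tag: 0 for tag in self.PREFERRED}  # open-element count per tag
        self.text = {tag: [] for tag in self.PREFERRED}

    def handle_starttag(self, tag, attrs):
        if tag in self.depth:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.depth and self.depth[tag] > 0:
            self.depth[tag] -= 1

    def handle_data(self, data):
        # Record text for every preferred element we are currently inside.
        for tag in self.PREFERRED:
            if self.depth[tag] > 0:
                self.text[tag].append(data)


def semantic_content(html):
    """Return the text of <main> if present, else of <article>, else None."""
    parser = SemanticExtractor()
    parser.feed(html)
    parser.close()
    for tag in SemanticExtractor.PREFERRED:
        words = " ".join(parser.text[tag]).split()
        if words:
            return " ".join(words)
    return None  # no semantic markup: fall back to the scoring heuristic
```

For example, on a page whose body is a <nav>, a <main> holding "<h1>Title</h1><p>Body text.</p>", and a <footer>, this returns "Title Body text." and ignores the navigation and footer entirely.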

