Re: [Pan-users] Kill files
From: Duncan
Subject: Re: [Pan-users] Kill files
Date: Tue, 25 Apr 2017 07:41:24 +0000 (UTC)
User-agent: Pan/0.142 (He slipped to Sam a double gin; 505bd7027)
Dieter Britz posted on Mon, 24 Apr 2017 12:00:15 +0200 as excerpted:
> People talk about setting up a kill file for posters to news groups that
> annoy others, by off topic postings etc. Is it possible to do that with
> pan?
This repeats the same idea as the replies by HH, DG and Pedro in the
other subthread, but with a bit more explanation of what pan's actually
doing and why, and why it's like binary-choice killfiling (killfiled or
not) but better. =:^)
First, let's understand the difference between the two approaches. Pan
has a fine-grained scoring mechanism: if desired, the effects of many
scoring rules can be applied together to arrive at a final score for a
post, and that score can then be used to apply some action (simply hiding
the post, or marking it read, or deleting it, or on the other end,
highlighting it with various colors depending on how high it scores, or
automatically downloading the post to cache, or saving its attachments).
A hard binary or trinary filter mechanism instead acts immediately on the
first filter that applies. It either kills the post (generally hide and
mark-read, sometimes delete, depending on the implementation) or it
doesn't, possibly (the trinary case) with the addition of a watch flag
(and perhaps auto-download, depending on implementation) if the post
isn't killed.
So in pan, a score of -9999 is defined as ignored. That's what binary
filters would filter out, also known as killing, thus the term killfile.
And a score of +9999 is defined as watched.
Meanwhile, FWIW, there are a number of other preset score category levels
as well. These can be seen under the view menu, header pane. Here's the
full listing, lowest to highest:
-9999 (or lower): Ignored
Either multiple scoring rules combined to result in the message being
ignored, *OR* a single scoring rule set ignored/-9999 and stopped further
scoring-rule processing.
By default pan doesn't display these messages, but doesn't take any other
action (marking them read, deleting them, etc).
-9998 to -1: Low
One or more scoring rules lowered the message score into negative
territory, but not enough to make it ignored.
0: Default
Of course 0 is the default score, if no scoring rules apply, or if the
scoring rules exactly balance each other out.
1 to 4999: Medium
One or more scoring rules gave the message a moderate scoring boost, but
left it below 5000/high.
There's an option to display these in a different color, but I don't
believe it's on by default. (FWIW I've been running pan since 2002, a
decade and a half now, and long ago forgot what the defaults were for
many of the options I've customized.)
5000 to 9998: High
One or more scoring rules gave the message a higher scoring boost, more
than 4999, but less than 9999.
Again, there's an option to display these in a different color, but I
don't believe it's on by default.
9999 (or higher): Watched
Either multiple scoring rules resulted in a score at or above 9999, *OR*
a single scoring rule set it to watched/9999 and stopped further scoring
rule processing.
Pan should display these in a different color, by default I believe.
There are options (off by default) that allow auto-downloading or the
like.
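For those who find code easier to read than a list, the category bands
above can be sketched in a few lines of Python. This is only an
illustration of the listing as given here, not anything taken from pan's
sources, and the function name is made up for the example:

```python
def score_category(score):
    """Map a final article score to the category names listed above."""
    if score <= -9999:
        return "ignored"
    if score < 0:
        return "low"
    if score == 0:
        return "default"
    if score < 5000:
        return "medium"
    if score < 9999:
        return "high"
    return "watched"


print(score_category(-250))   # low
print(score_category(5000))   # high
print(score_category(12000))  # watched
```

Note the open ends: anything at or below -9999 is ignored, and anything
at or above 9999 is watched.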
As you should already see, scoring allows a far richer and more nuanced
setup than arbitrary binary kill/show or trinary kill/show/watch
filters. But by using the watched/ignored options only, which basically
set +9999/-9999 respectively and stop further score processing, you can
have a simpler binary or trinary setup if you wish.
It's up to you. =:^)
Meanwhile, as I already mentioned, there are choices under view, header
pane, to match (or not) each of these scoring categories separately.
Again under view, header pane, pan can then be set to display either
explicitly matched posts, matched posts and their subthreads, or matched
posts and their entire threads, as desired.
It's up to you. =:^)
And in the preferences dialog (edit menu, preferences), on the colors
tab, you can set the colors for each scoring category.
It's up to you. =:^)
(Tho do note that these days, pan only shows those colors in the score
column, not the entire line as it used to do. So you have to have the
score column in your listing or you won't see the colors. I preferred it
coloring the entire line, but oh, well, I'm a user, not a dev... and
unfortunately, that's NOT a user available option. As I'm writing this,
however, I'm wondering just how hard it might be to find that and patch
it to whole line, tho. I /am/ an advanced enough user that even tho I
don't claim to be a dev, I can /sometimes/ work out patches on my own,
and as I run gentoo, I normally build everything from sources and can and
often do apply my own patches or those I've picked up from others to
various packages, including pan. So I'll have to look into patching
this...)
OK, so you can set whether the various score categories are displayed or
not, and if displayed, you can set the color per category, but what about
more practical score-based actions? In particular, consider those who
track things via marked-read, and who haven't enabled pan's preference to
automatically mark everything in the group read when they fetch headers
or leave a group. For them, not displaying ignored posts AND not having
them automatically marked read is frustrating, because then the posts
hang around, still marked unread!
Of course if you've been paying attention, you already know the answer,
as I mentioned it above.
It is (of course) up to you! =:^)
(Noticing the trend yet? =:^)
Preferences dialog, actions tab.
One possible setup might be:
Delete articles scoring at: -9999 or less (ignored)
This would auto-delete ignored articles.
Mark articles read scoring at: -9998 to -1 (low/negative)
This would auto-mark-read negative/low-scoring articles, but wouldn't
delete them. The idea here is to let you hide them by default (by
showing only unread), but still keep them around in case you see a reply
and you want to see the message it's replying to.
(I /believe/ it'll mark anything read UNDER the named category as well,
so it would mark ignored articles read too, if they're not deleted with
the earlier option, above. But I'm not actually sure on this bit.)
Alternatively, if you don't delete ignored articles, you can simply mark
them read, and still show negative/low-scoring articles that aren't
entirely ignored.
Cache articles scoring at: 1 to 4999 (medium)
Of course you can set this to high/5000-9998 or watched/9999 instead, if
that fits your needs better.
The idea is that if an article is sufficiently highly scored, you want it
cached for you so it's already there when you would otherwise have to
download it to cache.
Do be aware that pan's cache size is pretty small, 10 MB by default, and
especially if you're doing binaries and using this setting, you'll
probably want a larger cache. That's set in preferences, on the behavior
tab.
(Again, I /believe/ it'll do the same with the higher categories, high
and watched, too, but I've not actually tested it to be sure.)
Download attachments of articles scoring at: Disabled
If you're doing binaries, you might want to set this instead of the cache
option.
Generally, people download binaries using one of two strategies.
Here, I prefer to have pan's cache set way big, and download messages to
cache first, so they're local. Then when they're already cached so I
won't be waiting for the download, I can go thru and sort out what I
really want, saving it where I want it, and deleting what I don't really
want. This works best for (relatively) small binaries that you will
download many hundreds or thousands of, like still images or audio clips
mostly under 10 minutes in length, with the occasional longer audio clip
or short video. It also requires a much larger cache setting (on the
order of gigabytes, for me), or pan will start deleting previously
downloaded to cache but still unread messages, to make room for the
newest still downloading to cache messages.
For that binaries strategy or for text messages, the
auto-download-to-cache action exists. Just be aware of the cache size
requirements and adjust the cache size accordingly.
The other strategy, which is obviously pan's default given the very small
10 MB default cache size, is to have pan download and save off the
binaries immediately, without caring at all about the messages they're
attached to. Because the attachments are saved immediately and the
messages they were attached to don't matter, those messages can be
deleted from cache as soon as the attachment is saved, so this requires a
far smaller cache and pan's default 10 MB cache suffices.
This works best for very large binaries, typically half-hour or longer
videos like TV series episodes or feature-length movies. It works best
if you don't care about the messages containing the attachments at all
(no discussion of the series, etc), since unless you increase the size of
the cache anyway, they'll be deleted effectively immediately after the
attachment processing is completed.
It is for this binaries strategy that the
auto-download-(and-save)-attachments action exists. Obviously this isn't
going to work too well if your interest is primarily text groups. People
post binaries there too, and if those messages score high enough for the
action to trigger, you'll end up with a bunch of random binaries, which
happened to be attached to watched (or whatever level) messages, saved
off to wherever you have pan saving them.
OK, but what about the scoring itself?
First of all, the watch (thread) and ignore (thread or author) entries on
the articles menu are the GUI method to create scoring rules that set the
+/-9999 score and abort further score processing.
Next, there are the edit article's watch/ignore/score and add a scoring
rule entries, again on the articles menu. These bring up a dialog,
either directly (for add) or indirectly (for edit, using the add button
there), that lets you set up a more detailed scoring rule. This is more
flexible than the arbitrary watch/ignore options above. You can match on
various criteria and, if they match, either set a specific score and
abort further scoring as the above watch/ignore options do, or
alternatively, simply add/subtract whatever score and continue
processing further scoring rules. You can also set an expiry for the
rule, if desired, or make it permanent.
It's this last option, to add/subtract some score value and continue
processing more scoring rules, where the real flexibility comes in. You
can match on multiple subject keywords in multiple rules, adding or
subtracting based on the match, then add/subtract based on author, then
do some more based on references (effectively thread, except that
sometimes message-ids are deleted from that header and the rule won't
match the thread any longer), then subtract points if it's
cross-posted/spammed to too many groups, and add or subtract more points
based on size in bytes or line count.
As long as no match sets an arbitrary score and stops further processing,
all these matches will result in a final score that combines the effects
and relative scoring weights of all the rules that matched. Pan uses
that final score to decide what scoring category the message belongs in,
and thus whether to show it and how, as well as what automated actions to
apply.
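The difference between "add/subtract and continue" and "set and stop"
can be sketched like this, in Python. It's a toy model under my own
assumptions, not pan's implementation: the Rule class, its field names,
the addresses and the sample article are all hypothetical, rules are
simply tried in order, and a "final" rule sets the score and ends
processing the way the watch/ignore entries do.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    header: str          # which header to test, e.g. "subject" or "from"
    pattern: str         # regular expression, matched case-insensitively
    value: int           # points to add, or the score to set if final
    final: bool = False  # True: set the score to value and stop scoring


def final_score(article, rules):
    """Combine all matching rules into one score (toy model only)."""
    score = 0
    for rule in rules:
        text = article.get(rule.header, "")
        if re.search(rule.pattern, text, re.IGNORECASE):
            if rule.final:
                return rule.value  # set-and-stop, like watch/ignore
            score += rule.value    # add/subtract and keep going
    return score


rules = [
    Rule("from", r"troll@example\.invalid", -9999, final=True),
    Rule("subject", r"scoring", 500),
    Rule("subject", r"off-?topic", -300),
    Rule("from", r"helpful@example\.invalid", 1000),
]

article = {
    "subject": "Scoring question (slightly off-topic)",
    "from": "Helpful Person <helpful@example.invalid>",
}

print(final_score(article, rules))  # 500 - 300 + 1000 = 1200
```

Three rules match the sample article and their values simply add up,
while the final rule for the hypothetical troll never fires. Had it
matched, nothing after it would have been considered.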
See how much richer a good scoring system is, compared to arbitrary
binary/trinary-choice filtering on just ONE match-factor?
Of course if that's too complex for you, just use the watch/ignore and be
done with it.
It's up to you. =:^)
Meanwhile, as the others suggested, the real advanced stuff is reserved
for those who choose to directly edit the scorefile itself. They posted
the link to the format description.
http://www.slrn.org/docs/score.txt
But, keep in mind that the link above is for a different news client,
slrn, which shares a general scorefile format with pan. Unfortunately,
however, pan's score-processing code isn't quite as advanced as slrn's,
so some of the more complex stuff described there doesn't work in pan.
Pan hasn't implemented the include statement, for instance, so don't try
to use it. The {} grouping logic isn't implemented either, AFAIK.
And, pan hasn't implemented the score keyword's single-colon AND logic,
so single or double colon doesn't matter, it's always interpreted as OR
(double-colon). This is unfortunate, but the effect can be partially
counteracted by simply creating multiple single-condition rules, each of
which gives partial points. So instead of an AND score with five
conditions to meet and a +1000 value, you can use pan's OR scoring on
each of the five conditions, with a +200 value on each. The total if all
match will still be +1000, but of course the result is harder to predict
when only some conditions match, since those partial points can then
combine with partial points from another would-be compound that also
only partly matched.
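The arithmetic of that workaround, as a quick Python sketch. The
function and constant names are made up for illustration, and the five
conditions are just placeholders:

```python
POINTS_PER_CONDITION = 200  # +1000 split evenly across five conditions


def partial_score(matches):
    """Sum the points from the single-condition rules that matched."""
    return POINTS_PER_CONDITION * sum(1 for m in matches if m)


print(partial_score([True, True, True, True, True]))    # 1000
print(partial_score([True, True, True, False, False]))  # 600
```

A true AND would give nothing at all in the second case, where the
workaround still hands out +600. That leftover is what can interact with
other partly matched rule sets.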
Another difference is that pan's scoring matches are always case
insensitive. So don't worry about John vs. JOHN vs. john vs. JoHN, the
same regex will match them all without any fancy regex footwork.
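To illustrate the effect only (Python's regex engine is not the one pan
uses, and the details of the two may differ), here is the same idea with
Python's case-insensitive flag:

```python
import re

# One plain pattern, compiled case-insensitively.
pattern = re.compile(r"john", re.IGNORECASE)

names = ["John", "JOHN", "john", "JoHN"]
matched = [name for name in names if pattern.search(name)]

print(matched == names)  # True: every variant matches
```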
Some additional scorefile format notes:
* Unfortunately for some, understanding regular expressions is really
necessary to take full advantage of scoring, particularly when editing
the scorefile itself, but it's worth it, and pan's GUI does allow simple
scoring even if you don't know regex.
It's up to you. =:^)
* The note in section 1.1 recommending that one stick to the overview
headers (typically subject/from/date/message-id/references/bytes/lines
and often xref), but allowing others, most definitely applies.
Unfortunately it's a technical limitation of the protocol, not something
pan (or slrn or any other news client) can do anything about.
The thing is that pan can score headers in the overview without
downloading the full message (or full headers). For the most part,
that's the headers needed to display the message in the headers pane,
author, subject, date, etc, plus message-id and references for threading
and tracking across multiple servers, etc. But for the more exotic
headers, pan won't get them, and thus can't score them, until the article
is downloaded to cache.
So if you have an abuser that keeps nym-shifting and otherwise
deliberately changing everything in the headers he has access to, in
order to try to avoid killfiling, but who always posts thru a provider
that adds an xtrace header with a consistent value you can score on, you
*CAN* score on it, but you'll have to download the messages to cache
first.
Take it from someone who was in the position of trying to killfile a
poster like that at one point, before pan could score such non-overview
headers. Being able to ignore-score it, but only after downloading to
cache, sucks, but it definitely sucks less than having to actually show
the message in order to see who it is and block it!
* Note that while you can set an expiry on the score in the pan GUI, and
at that point pan will indeed quit applying that score, it won't actually
remove it from the scorefile. The only way to actually remove the score
from the scorefile is to manually edit it.
Unfortunately, this does mean that if you actively add expiring scoring
rules and never manually remove them, eventually your scorefile will be
cluttered with perhaps hundreds or thousands of expired rules. They'll
begin to affect scorefile loading performance, as pan still has to
process each one at least far enough to see that it's expired, and then
to find where the next possibly still valid rule begins.
So you'll probably want to either clear out the scorefile and start new
occasionally, or manually edit it to at least clean out the expired rules
from time to time, or simply not use expiring scores at all, living with
the annoyance unless it's worth a permanent rule.
* Yes, an initial % on a line *DOES* mean it's a comment.
By implication, most of the lines pan adds when you add a score via the
GUI are comments and don't matter for the actual scoring at all. They're
only there to aid human readers.
Of course that means you can edit or delete them as you wish, without
affecting actual operation.
Here, I tend to delete pretty much all of pan's added comments, with the
exception of the date added comments for expiring scores, since that way
I can see how long I had set the expiry.
* If you do heavy scoring with lots of rules, the entries pan's GUI
writes aren't particularly efficient for machine processing. The example
in the linked documentation is somewhat more efficient, but it's too
short to really get the point across. If you're planning to do a lot of
manual scorefile editing or simply want to make your scorefile more
efficient, either check past scoring threads for this list/group (the
list is available as a newsgroup on news.gmane.org) where I've posted a
longer example from my scorefile, or ask for such an example.
* Similarly, if you're not good with regular expressions and need some
help designing a score that's more complex than you can easily do with
the pan GUI, or if something's just not working as you expected it to,
with scoring or something else, ask for help. We've dealt with a number
of such queries over the years. =:^)
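Two of the format notes above, sketched in Python for those who like to
see such things spelled out. This is a rough illustration under my own
assumptions, not a scorefile parser and nothing from pan itself: the
function names, the group name and the sample entry are all
hypothetical, the overview set is just the typical list given above
(xref is often, but not always, included), and the comment test is
simply "the line starts with %".

```python
# Headers typically available from the overview, per the note above.
OVERVIEW_HEADERS = {
    "subject", "from", "date", "message-id",
    "references", "bytes", "lines", "xref",
}


def scorable_before_download(header):
    """True if a rule on this header can be checked from the overview."""
    return header.strip().lower() in OVERVIEW_HEADERS


def split_comments(scorefile_text):
    """Separate % comment lines from the lines that affect scoring."""
    comments, active = [], []
    for line in scorefile_text.splitlines():
        if line.startswith("%"):
            comments.append(line)
        elif line.strip():
            active.append(line)
    return comments, active


sample_lines = [
    "% hypothetical entry, for illustration only",
    "[news.software.readers]",
    "Score: -9999",
    r"From: troll@example\.invalid",
]
comments, active = split_comments("\n".join(sample_lines))

print(scorable_before_download("Subject"))  # True
print(scorable_before_download("X-Trace"))  # False
print(len(comments), len(active))           # 1 3
```

Anything in the first list can be edited or dropped without changing
what gets scored, and a header outside the overview set means the
article has to be in cache before a rule on it can match.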
OK, so hope that's of help. Some people just want an answer to plug in
without understanding it. Others want to understand what's going on, so
next time they want to do something similar but not identical, they can
figure out how to do it themselves. I'm certainly in this latter group,
and my posts tend to go to the extreme in explaining things. That
frustrates the first group, but I've stacks of thanks from people who
preferred the better understanding my explanatory if extremely verbose
style gave them, and sometimes I get new insights or ideas (like possibly
patching the score coloring to the whole line instead of just the score
column, above) as I'm writing things down, and it's the combination of
both of those that's my motivation to keep posting as I do. =:^)
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman