Re: [Pan-users] Select Headers with RE in the Subject/Author Entryfield?
From: Duncan
Subject: Re: [Pan-users] Select Headers with RE in the Subject/Author Entryfield?
Date: Mon, 8 Jun 2015 11:31:48 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT af87825)
Heinz Mezera posted on Mon, 08 Jun 2015 09:16:22 +0200 as excerpted:
> I'd like to select Headers in the Header-Pan with a regular expression
> in the Subject/Author field and need your help. Is this possible and how
> do I do it.
>
> I want to select all headers
> - starting with three alphabetic characters
> - followed by an underscore
> - two digits after the underscore
> - and any number of characters afterwards.
>
> PAN Info:
> Pan 0.139 Sexual Chocolate (GIT bf56508 git://git.gnome.org/pan2;
> i686-pc-linux-gnu)
** Note that after changing the search expression, you may have to toggle
to something else (say subject), then back to regex, in order to get it
to "take". I noticed it would dynamically refilter part of the time, but
sometimes it would appear to stall and not update without the toggle.
Given that hint, and the caveat that I tested the components separately
but not together, as I didn't have posts handy that matched that specific
pattern...
One way to do it:
^[[:alpha:]]{3}_[[:digit:]]{2}.*$
^ = zero-width match at the beginning/left
$ = same at the end/right
Non-special characters match themselves. Letters, digits, _, etc, are
non-special.
. matches exactly one occurrence of any character (and *, mentioned again
below, is any number including zero, so .* is a full wildcard, including
matching nothing).
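For anyone who wants to try the anchors and the wildcard outside pan, here is a minimal Python sketch. It assumes Python's re module as a stand-in for pan's regex engine (the two may differ in details), and the test strings are made up for illustration.

```python
import re

# '.' consumes exactly one character; '.*' consumes any run, including none.
one_char = re.compile(r"^a.c$")
any_run = re.compile(r"^a.*c$")

def matches(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None

# Each value is (result for ^a.c$, result for ^a.*c$).
dot_results = {
    "abc": (matches(one_char, "abc"), matches(any_run, "abc")),
    "ac": (matches(one_char, "ac"), matches(any_run, "ac")),
    "abbbc": (matches(one_char, "abbbc"), matches(any_run, "abbbc")),
}
print(dot_results)
```

Only "abc" satisfies the single-dot pattern, while all three satisfy the .* version, since .* is happy to match nothing at all.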
[] encloses a "character class". Such character classes can include
ranges of characters [a-z], individual lists [123], and/or category
classes (I seem to have forgotten the proper term ATM) like the above,
enclosed in further [:xxx:] marks, thus the nesting.
So [[:alpha:][:digit:]] and [a-zA-Z0-9] would both match alphanumeric
characters in ASCII, tho pan's regex is case insensitive so both a-z and
A-Z wouldn't be needed for pan, only one or the other. You can also do
things like [[:digit:]abc._], to match digits, abc, and the individual
characters . and _. The significance of the [:xxx:] matches, however, is
that they work across character sets, so [:alpha:] matches letters that
would be skipped in character-sets where a-z doesn't include all letters
due to strange ordering or something.
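A quick way to experiment with character classes is a short Python sketch like the one below. One loud assumption: Python's re module does not understand the [:alpha:] and [:digit:] forms, so explicit ASCII ranges stand in for them here, and re.IGNORECASE is used to mimic pan's case-insensitive matching. The sample strings are invented.

```python
import re

# Assumption: [a-z0-9] plus IGNORECASE stands in for [[:alpha:][:digit:]],
# because Python's re has no POSIX [:xxx:] classes.
alnum = re.compile(r"[a-z0-9]", re.IGNORECASE)

# Digits, the letters a, b, c, and the literal characters . and _
# (stand-in for [[:digit:]abc._]).
mixed = re.compile(r"[0-9abc._]")

alnum_hits = alnum.findall("Ab-9_z!")
mixed_hits = mixed.findall("a1.x_c-Z")
print(alnum_hits)
print(mixed_hits)
```

Note that inside the brackets the . is just a literal dot, not the any-character wildcard.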
To match a - in a character-class, put it at the beginning so it can't
specify a range. The \ char is the escape char, both inside and outside
a character-class, so you can use \] to match a literal ] for instance,
and of course \\ to match a literal \.
Additionally, you can specify a /negative/ character-class with ^ as the
first character (outside a character-class, it means match the beginning,
inside, as the first character of the class, it negates the class, inside
as anything other than the first char, it matches itself normally). So
[^abc] means any character /but/ abc.
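The hyphen, escape, and negation rules can be checked with a small Python sketch. It assumes Python's re module behaves like pan's engine for these cases, which seems likely but is not verified here; the sample strings are made up.

```python
import re

dash_first = re.compile(r"[-ab]")    # '-' first: a literal hyphen, a, or b
escaped = re.compile(r"[\]\\]")      # a literal ] or a literal backslash
negated = re.compile(r"[^abc]")      # any one character except a, b, c
caret_later = re.compile(r"[a^]")    # '^' not first: a or a literal caret

dash_hits = dash_first.findall("a-b-c")
escaped_hits = escaped.findall(r"x]y\z")
negated_hits = negated.findall("abcd-a")
caret_hits = caret_later.findall("a^b")
print(dash_hits, escaped_hits, negated_hits, caret_hits)
```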
Significantly, character classes normally only match *ONE* character. To
match more than one you can repeat, [a-z][a-z] will match TWO letters, or
use frequency specifiers inside of {} as I did, above. {1,3} would be
one, two, or three matches, {1,} would be at least one match.
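As a rough illustration of repeating a class versus using a {} count, here is a Python sketch. It assumes Python's re module and uses re.fullmatch so the whole string has to fit the pattern; the helper name full is made up for this example.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# Repeating the class and using {2} are equivalent ways to ask for two letters.
two_by_repeat = full(r"[a-z][a-z]", "ab")
two_by_count = full(r"[a-z]{2}", "ab")

# {1,3}: one, two, or three letters. {1,}: at least one letter.
one_to_three = [full(r"[a-z]{1,3}", s) for s in ("", "a", "abc", "abcd")]
at_least_one = [full(r"[a-z]{1,}", s) for s in ("", "a", "abcdef")]
print(two_by_repeat, two_by_count, one_to_three, at_least_one)
```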
In addition to the {}-delimited frequency range specifiers, there's:
* = zero or more (*NOT* one or more, it doesn't have to be there!)
? = zero or one (may or may not be there, but matches only once)
+ = 1 or more
Again in case it didn't sink in above, \ is the escape char, so to match
a literal *, you'd use \*
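The three shorthand specifiers and the escaped star can be compared side by side in a short Python sketch. This assumes Python's re module; the helper name full and the test strings are invented for illustration.

```python
import re

def full(pattern, text):
    """Hypothetical helper: True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

samples = ("ac", "abc", "abbbc")
star = [full(r"ab*c", s) for s in samples]       # zero or more b
question = [full(r"ab?c", s) for s in samples]   # zero or one b
plus = [full(r"ab+c", s) for s in samples]       # one or more b

# Escaped, the star is just a literal character.
literal_star = full(r"a\*b", "a*b")
print(star, question, plus, literal_star)
```

The "ac" case is the one that trips people up: it fits both ab*c and ab?c because the b is allowed to be absent, but not ab+c.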
() are the grouping characters, and | indicates alternatives (or). So
((cat)|(horse)) will match "cat" or "horse" but will NOT match "cah", for
instance. Note that the alternatives do NOT need to be the same length,
and that the inner groupings help clarify the scope of the match but
aren't absolutely required, so (cat|horse) should have the same effect.
So there are two ways to match a "cat" that may or may not be there:
(cat)?
(cat|)
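A small Python sketch can confirm that the nested and flat groupings behave the same, and that both spellings of the optional "cat" agree. It assumes Python's re module; the anchors and the "big " prefix are added only to make the test strings unambiguous.

```python
import re

nested = re.compile(r"^((cat)|(horse))$")
flat = re.compile(r"^(cat|horse)$")

words = ["cat", "horse", "cah"]
nested_results = [bool(nested.search(w)) for w in words]
flat_results = [bool(flat.search(w)) for w in words]

# Two ways to say "cat may or may not be there".
optional_a = re.compile(r"^big (cat)?$")
optional_b = re.compile(r"^big (cat|)$")
optional_results = [
    (bool(optional_a.search(s)), bool(optional_b.search(s)))
    for s in ("big cat", "big ", "big dog")
]
print(nested_results, flat_results, optional_results)
```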
That's the basics. FWIW for non-pan usage, some regex uses make things
like {} special characters, so {3} is a frequency and \{3\} are the
literal characters, while others don't unless they're escaped, so {3}
would be the literal characters and the backslash-escaped version would
be frequency. And of course the shell has its own special chars and \
escape char, so sometimes you need to play with the number of \\\ a bit
in order to get it to work like you want, but once you understand the
basics, even /just/ the basics, regex can really be quite powerful.
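To see which camp a given tool falls into, a quick test helps. The Python sketch below assumes Python's re module, which belongs to the first group (bare {} is special, backslash-escaped is literal); other tools may behave the opposite way, so this shows one engine only.

```python
import re

as_quantifier = re.compile(r"^a{3}$")    # {3} is a frequency here
as_literal = re.compile(r"^a\{3\}$")     # \{3\} is the literal text {3}

# Each value is (result for a{3}, result for a\{3\}).
brace_results = {
    "aaa": (bool(as_quantifier.search("aaa")), bool(as_literal.search("aaa"))),
    "a{3}": (bool(as_quantifier.search("a{3}")), bool(as_literal.search("a{3}"))),
}

# re.escape adds the backslashes needed to match a string literally.
escaped_pattern = re.escape("a{3}")
escaped_ok = re.fullmatch(escaped_pattern, "a{3}") is not None
print(brace_results, escaped_ok)
```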
Of course there's far FAR more. Just a couple quick examples. First, ()
not only groups, but stores for later use. So if for instance you are
trying to match quotes but don't know if it's single-quotes or double-
quotes, you can use (['"]) for the first match (possibly as (['"])? or
('|"|) if you don't know if it'll be quoted or not), and \1 or possibly
$1 to automatically match the same thing at the other end of the quote.
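Here is a rough Python sketch of the quote-matching idea. It assumes Python's re module, where the back-reference is written \1, and it uses a negated class for the quoted text. The function name quoted_text and the sample strings are made up.

```python
import re

# Group 1 remembers which quote opened; \1 demands the same one to close.
quoted = re.compile(r"""(['"])([^'"]*)\1""")

def quoted_text(s):
    """Hypothetical helper: return the text between matching quotes, or None."""
    m = quoted.search(s)
    return m.group(2) if m else None

double_quoted = quoted_text('say "hello" now')
single_quoted = quoted_text("say 'hi' now")
mismatched = quoted_text('''say "oops' now''')
print(double_quoted, single_quoted, mismatched)
```

The mismatched case returns None because a quote opened with " is never closed with the same character.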
Second, there's what's called look-ahead and look-behind matching, which
can be positive or negative. So for instance if you want to match "pro"
but not "gopro", there's a way to say "look behind (to the left of) the
pro and don't match if the preceding letters are 'go'". I don't use them
enough to be sure of my memory, however, so generally have to look that
sort of advanced stuff up, if I need it. And for this advanced stuff,
you usually have to either look up or test whether whatever you're trying
to work with actually supports it or not. I'm not sure whether pan does,
for instance, tho it wouldn't surprise me if it did.
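For reference, the "pro but not gopro" case looks like this in Python's re syntax, where a negative look-behind is written (?<!...). This is only a sketch of that one engine's syntax; whether pan accepts the same form is untested here.

```python
import re

# (?<!go) is a zero-width check: "the two characters to the left are not 'go'".
not_gopro = re.compile(r"(?<!go)pro")

samples = ("pro", "gopro", "a pro tip", "gopro and pro")
lookbehind_results = [bool(not_gopro.search(s)) for s in samples]
print(lookbehind_results)
```

The last sample still matches because its second "pro" is preceded by a space rather than "go".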
So back to the specific case in point:
^[[:alpha:]]{3}_[[:digit:]]{2}.*$
Given the above, we can parse that as:
^             left anchor (begin the line with what follows)
[[:alpha:]]   one alphabetic character
{3}           match the previous exactly three times
_             (matches itself)
[[:digit:]]   one digit
{2}           match the previous exactly twice
.             any character
*             match the previous any number (including none) of times
$             right anchor (end of line)
Of course the .*$ aren't actually needed, since without them the match is
simply left-anchored only, but I like the explicit "the rest of the line
doesn't matter for the match" that .*$ provides. And in non-pan usages
where you're matching to delete or replace the match, it COULD matter, as
failing to include the .*$ would leave any other junk on the line still
there, while including it would match and thus delete/replace the entire
line.
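Since the full expression above was tested only in pieces, here is a Python sketch that exercises an equivalent pattern as a whole, including the delete/replace difference. Loud assumptions: Python's re module lacks [[:alpha:]] and [[:digit:]], so [a-z] and [0-9] are swapped in; re.IGNORECASE mimics pan's case-insensitive matching; and the subject lines are invented test data, not real headers.

```python
import re

# Assumption: ASCII ranges stand in for the POSIX classes used in pan.
anchored_full = re.compile(r"^[a-z]{3}_[0-9]{2}.*$", re.IGNORECASE)
anchored_left = re.compile(r"^[a-z]{3}_[0-9]{2}", re.IGNORECASE)

# Hypothetical subject lines.
subjects = [
    "abc_12 some post",
    "ABC_99",
    "ab_12 too short",
    "abcd_12 too long",
    "abc_1x one digit",
    "re: abc_12",
]

# For selecting, the trailing .*$ makes no difference.
selected_full = [s for s in subjects if anchored_full.search(s)]
selected_left = [s for s in subjects if anchored_left.search(s)]

# For deleting the match, it does: only the .*$ version removes the whole line.
deleted_full = anchored_full.sub("", "abc_12 some post")
deleted_left = anchored_left.sub("", "abc_12 some post")
print(selected_full, repr(deleted_full), repr(deleted_left))
```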
--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman