Re: [Pan-users] Re: Search messages
From: Alan Meyer
Subject: Re: [Pan-users] Re: Search messages
Date: Mon, 27 Apr 2009 20:30:38 -0700 (PDT)

Duncan <address@hidden> wrote:
...
> For search, however, there's a workaround, provided the message
> is still in the (10 MB by default) cache, reasonably likely for
> text-only users but not so much for those doing binaries where
> the cache is tiny. Just do a filesystem (not pan) search of
> the cache (a subdir of ~/.pan2 by default, named article-cache
> or article_cache I'm not sure which as it changed at some time
> in the past and I ended up symlinking one to the other, here).
>
> Filesearching the cache, you'll come up with a file or list of
> files with names matching (as closely as easily possible on a
> filesystem, a few strange message-id characters may be replaced
> by more commonly allowed chars) the message-ids of the messages
> in question. You can then open those files, basically the raw
> text format of the messages in question, in a normal text
> editor, or use pan's message-id search as mentioned above to
> find them in pan.
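
The filesystem search described above could be sketched in Python 3 roughly as follows. This is only an illustration: the function name search_cache is made up here, it assumes the cached articles are stored as plain text files, and the cache directory is whichever of article-cache or article_cache exists under ~/.pan2 on your system.

```python
import os

def search_cache(cache_dir, needle):
    """Return sorted paths of files under cache_dir whose text contains needle.

    The match is case-insensitive. cache_dir is an assumption: pass whatever
    directory your Pan installation actually uses for its article cache.
    """
    needle = needle.lower()
    hits = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                # Undecodable bytes are replaced rather than raising an error
                with open(path, "r", errors="replace") as fp:
                    if needle in fp.read().lower():
                        hits.append(path)
            except OSError:
                # Skip files that vanish or cannot be read
                continue
    return sorted(hits)
```

Calling search_cache(os.path.expanduser("~/.pan2/article-cache"), "some phrase") would then list the matching files, whose names correspond to the message-ids.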
Here's a little utility program that makes searching the files
easier. I use it to reduce a large article list to the bare
essentials I want for searching. Each output line has the form:
    YYYYMMDD HHMMSS Subject (num parts)
Then I search the output with less or grep.
The files on my system are in ~/.pan2/groups.
Alan
---------------------- cut here ------------------------
#!/usr/bin/python
########################################################
# uniqpan.py
#
# Filter a Pan newsgroup articles file to find article subjects and dates.
#
# Author: Alan Meyer
# License: Free under the GNU GPL.
########################################################
import sys, time, re

if len(sys.argv) != 2:
    sys.stderr.write("""
usage: uniqpan.py article_filename
  e.g.,
    cd ~/.pan2/groups
    uniqpan.py article_filename | sort > whatever.txt
""")
    sys.exit(1)

# A message-id line starts with "<" and contains a closing ">"
idPat = re.compile("^<.*>")

try:
    fp = open(sys.argv[1], "r")
except IOError as info:
    sys.stderr.write("%s: %s\n" % (sys.argv[1], str(info)))
    sys.exit(1)

while True:
    line = fp.readline()
    if not line:
        break
    idMatch = idPat.match(line)
    if idMatch is not None:
        # Found an article message-id
        # Get succeeding lines
        title = fp.readline().strip()
        authorCode = fp.readline()
        timePosted = fp.readline()
        # All-zero fallback, used if no timestamp can be parsed
        dateTm = (0, 0, 0, 0, 0, 0)
        try:
            # Next line may be time or may be references
            dateTm = time.gmtime(float(timePosted))
        except Exception:
            # Try the line after
            try:
                timePosted = fp.readline()
                dateTm = time.gmtime(float(timePosted))
            except Exception:
                # Give up and keep the all-zero fallback
                pass
        # Date and time (UTC) as YYYYMMDD HHMMSS
        dateTime = "%04d%02d%02d %02d%02d%02d" % \
            (dateTm[0], dateTm[1], dateTm[2], dateTm[3], dateTm[4], dateTm[5])
        # Output what we've got
        sys.stdout.write("%s %s\n" % (dateTime, title))

fp.close()
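
If you would rather stay in Python than pipe the output through less or grep, the search step could be sketched like this (Python 3). It is only a sketch: filter_lines and its since/until parameters are names invented here, and it assumes every input line has the "YYYYMMDD HHMMSS title" layout that the script above writes.

```python
import re

def filter_lines(lines, pattern, since=None, until=None):
    """Yield lines whose title matches pattern, optionally limited by date.

    since and until are inclusive YYYYMMDD strings; because the date field
    is zero-padded, plain string comparison orders dates correctly.
    """
    rx = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        date = line[:8]            # YYYYMMDD
        if since and date < since:
            continue
        if until and date > until:
            continue
        # The title starts after "YYYYMMDD HHMMSS " (16 characters)
        if rx.search(line[16:]):
            yield line.rstrip("\n")
```

For example, passing the lines of whatever.txt with pattern "search" and since="20090101" would keep only the 2009-and-later articles whose subject mentions "search".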