[Ifile-discuss] Re: Updated ifile writeup
From: Karl Vogel
Subject: [Ifile-discuss] Re: Updated ifile writeup
Date: 17 Feb 2003 20:56:51 -0500
>> In an earlier message, I said:
K> I've updated my ifile installation notes at
K> http://www.dnaco.net/~vogelke/Software/Internet/Servers/Mail/Spam/Ifile/
K> if you're interested.
>> On 17 Feb 2003 13:23:12 +0100,
>> "clemens fischer" <address@hidden> said:
C> this is a good and comprehensive resource.
Thanks.
C> one question: you seem to store email messages one per (numbered) file in a
C> common directory of type maildir or mh-box, but the page mentions
C> locking of the mailbox or securely appending to it. these terms would
C> rather make sense for mbox type mailboxes. do you keep both formats?
C> if yes, why?
Partly old habits, partly convenience.
* My disk space used to be very limited, so I couldn't store each
  message in its own file; hence, my use of mbox-style files for
  collections. Gzip does a better job on one large mailbox than a
  bunch of smaller messages. I could tar them up, but I don't wanna.
* I use an old Emacs package (RMAIL) to read my mail, and I'm
  not sure if it will handle maildir or MH format. I know, VM is
  probably better, but I've already put some effort into messing with
  the RMAIL Lisp stuff.
* I really hate programs which store thousands of files in one
  directory. People may not be running the latest and greatest
  OS release, and even if they are, Un*x tends to reward small
  directories, so I stick with no more than 1000 or so files per
  directory unless I have a really good reason to do otherwise.
  This way, I can use "*" within a subdirectory and know that I
  won't get some stupid "arg list too long" message, or I can use
  "find ... | xargs" if I really have to mess with 20-30 thousand
  messages at once.
* I use a program called "spmail" to split up mbox-style files when I
  want to do something to each message. Since spmail makes sure that
  each message ends with a trailing newline, I can do my greps or
  seds or whatever and cat everything together to create a new mbox
  file. spmail will also split by day, week, month, or year.
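For anyone without spmail, the split-and-rejoin idea and the 1000-files-per-directory habit can be sketched in a few lines of Python. This is not spmail and makes no claim to match its behavior. The names split_mbox, bucket_path and explode are made up for illustration, and the sketch assumes that any line starting with "From " begins a new message (that is, body lines have already been escaped as ">From ").

```python
import io
import os


def split_mbox(data):
    """Split raw mbox bytes into messages, each ending with a newline."""
    messages, current = [], []
    for line in io.BytesIO(data):
        # Assumption: a line starting with "From " opens a new message.
        if line.startswith(b"From ") and current:
            messages.append(b"".join(current))
            current = []
        current.append(line)
    if current:
        messages.append(b"".join(current))
    # Guarantee the trailing newline so the pieces can be cat'ed back together.
    return [m if m.endswith(b"\n") else m + b"\n" for m in messages]


def bucket_path(root, index, per_dir=1000):
    """Return root/<bucket>/<index>; each bucket holds at most per_dir files."""
    return os.path.join(root, f"{index // per_dir:04d}", f"{index:07d}")


def explode(data, root):
    """Write each message to its own numbered file and return the paths."""
    paths = []
    for i, msg in enumerate(split_mbox(data)):
        path = bucket_path(root, i)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(msg)
        paths.append(path)
    return paths
```

Concatenating the files in path order gives back an mbox-style file, and a "*" inside any one bucket directory never expands to more than 1000 names.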
C> other than that i particularly like the spam-corpus and the additional
C> links to other sources of spam, which is the first thing people would have to
C> get to train ifile.
Glad you like it. My spam corpus is too big to put on my ISP homepage:
     18.0M  credit
      0.5M  diploma
      6.0M  fraud
     12.0M  gtaylor
      0.2M  license
     28.0M  local
    204.0M  net-abuse
      3.0M  uk-corpus
    ------------------
    271.7M  TOTAL
It gzips down to about 70 Mbytes, which still puts me over my quota.
I need a better ISP (suggestions welcome). If anyone wants to mirror
the corpus, let me know; maybe we can work out some type of FTP thing.
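On the gzip point (one large mailbox compressing better than a pile of small messages), here is a rough Python sketch for checking it on your own mail. The function name gzip_sizes and the sample messages are invented for illustration; real numbers depend entirely on the mail being compressed.

```python
import gzip


def gzip_sizes(messages):
    """Return (total bytes gzipped one by one, bytes gzipped as one file)."""
    separate = sum(len(gzip.compress(m)) for m in messages)
    combined = len(gzip.compress(b"".join(messages)))
    return separate, combined


# Made-up messages that share most of their headers, as real mail tends to.
messages = [
    b"From sender@example.invalid Mon Feb 17 20:56:51 2003\n"
    b"Received: from mail.example.invalid by mx.example.invalid\n"
    b"Subject: message %d\n\nThis is the body of message %d.\n\n" % (i, i)
    for i in range(200)
]

separate, combined = gzip_sizes(messages)
print("one by one:", separate, "bytes; as one file:", combined, "bytes")
```

Each small file pays for its own gzip header and cannot reuse strings seen in other messages, which is why the single large file comes out ahead.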
--
Karl Vogel                      I don't speak for the USAF or my company
address@hidden                  http://www.pobox.com/~vogelke
I drive way too fast to worry about cholesterol. --unknown