[Ifile-discuss] Re: Updated ifile writeup


From: Karl Vogel
Subject: [Ifile-discuss] Re: Updated ifile writeup
Date: 17 Feb 2003 20:56:51 -0500

>> In an earlier message, I said:

K> I've updated my ifile installation notes at
K> http://www.dnaco.net/~vogelke/Software/Internet/Servers/Mail/Spam/Ifile/
K> if you're interested.

>> On 17 Feb 2003 13:23:12 +0100, 
>> "clemens fischer" <address@hidden> said:

C> this is a good and comprehensive resource.

   Thanks.

C> one question: you seem to store email messages one per (numbered) file
C> in a common directory of type maildir or mh-box, but the page mentions
C> locking of the mailbox or securely appending to it.  these terms would
C> rather make sense for mbox type mailboxes.  do you keep both formats?
C> if yes, why?

   Partly old habits, partly convenience.

   * My disk space used to be very limited, so I couldn't store each
     message in its own file; hence, my use of mbox-style files for
     collections.  Gzip does a better job on one large mailbox than a
     bunch of smaller messages.  I could tar them up, but I don't wanna.

   * I use an old Emacs package (RMAIL) to read my mail, and I'm
     not sure if it will handle maildir or MH format.  I know, VM is
     probably better, but I've already put some effort into messing with
     the RMAIL Lisp stuff.

   * I really hate programs which store thousands of files in one
     directory.  People may not be running the latest and greatest
     OS release, and even if they are, Un*x tends to reward small
     directories, so I stick with no more than 1000 or so files per
     directory unless I have a really good reason to do otherwise.
     This way, I can use "*" within a subdirectory and know that I
     won't get some stupid "arg list too long" message, or I can use
     "find ... | xargs" if I really have to mess with 20-30 thousand
     messages at once.

   * I use a program called "spmail" to split up mbox-style files when I
     want to do something to each message.  Since spmail makes sure that
     each message ends with a trailing newline, I can do my greps or
     seds or whatever and cat everything together to create a new mbox
     file.  spmail will also split by day, week, month, or year.
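
   As a rough illustration of the last two points, here is a minimal
   Python sketch.  It is not spmail, whose interface isn't described
   here; it uses Python's standard "mailbox" module instead, and the
   names split_mbox, join_messages and BUCKET_SIZE are made up for the
   example.  It writes each message of an mbox file to its own numbered
   file, makes sure each one ends with a newline, and keeps no more than
   1000 files in any one directory.

```python
import mailbox
import os

BUCKET_SIZE = 1000  # assumed cap, following the "1000 or so files" rule of thumb


def split_mbox(mbox_path, out_root, bucket_size=BUCKET_SIZE):
    """Write each message to out_root/<bucket>/<number>; return the paths in order."""
    paths = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for index, key in enumerate(box.iterkeys()):
            # from_=True keeps the leading "From " separator line.
            raw = box.get_bytes(key, from_=True)
            if not raw.endswith(b"\n"):
                raw += b"\n"  # guarantee a trailing newline
            bucket = os.path.join(out_root, "%04d" % (index // bucket_size))
            os.makedirs(bucket, exist_ok=True)
            path = os.path.join(bucket, "%06d" % index)
            with open(path, "wb") as handle:
                handle.write(raw)
            paths.append(path)
    finally:
        box.close()
    return paths


def join_messages(paths, mbox_path):
    """Concatenate per-message files back into a single mbox file."""
    with open(mbox_path, "wb") as out:
        for path in paths:
            with open(path, "rb") as handle:
                out.write(handle.read())
            out.write(b"\n")  # blank line between mbox entries
```

   After grepping or editing the per-message files, something like
   join_messages(paths, "new.mbox") puts them back together.  Because
   each bucket stays small, "*" inside one of the subdirectories remains
   safe to use.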

C> other than that i particularly like the spam-corpus and the additional
C> links to other sources of spam, which is the first thing people would
C> have to get to train ifile.

   Glad you like it.  My spam corpus is too big to put on my ISP homepage:

      18.0M   credit
       0.5M   diploma
       6.0M   fraud
      12.0M   gtaylor
       0.2M   license
      28.0M   local
     204.0M   net-abuse
       3.0M   uk-corpus
     ------------------
     271.7M   TOTAL

   It gzips down to about 70 Mbytes, which still puts me over my quota.
   I need a better ISP (suggestions welcome).  If anyone wants to mirror
   the corpus, let me know; maybe we can work out some type of FTP thing.
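
   For anyone checking the figures, the short Python sketch below adds up
   the listing and works out the compression ratio, which comes to
   roughly 26 percent of the original size.  It also includes a toy
   helper, compare_gzip (a made-up name), for trying out the earlier
   remark that gzip does better on one large mailbox than on many small
   messages; any messages fed to it are the reader's own test data, not
   the real corpus.

```python
import gzip

# Sizes in Mbytes, copied from the listing above.
CORPUS_MB = {
    "credit": 18.0, "diploma": 0.5, "fraud": 6.0, "gtaylor": 12.0,
    "license": 0.2, "local": 28.0, "net-abuse": 204.0, "uk-corpus": 3.0,
}
GZIPPED_MB = 70.0  # "about 70 Mbytes", as stated in the text

total_mb = round(sum(CORPUS_MB.values()), 1)
ratio = GZIPPED_MB / total_mb  # fraction of the original size left after gzip


def compare_gzip(messages):
    """Return (total bytes gzipping each message alone, bytes gzipping them joined)."""
    separate = sum(len(gzip.compress(m)) for m in messages)
    together = len(gzip.compress(b"".join(messages)))
    return separate, together


if __name__ == "__main__":
    print("total: %.1f Mbytes, gzipped to %.0f%%" % (total_mb, ratio * 100))
```

   Each gzip stream carries its own header and starts with an empty
   dictionary, so many small compressed files cannot share redundancy
   the way one large mailbox can.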

-- 
Karl Vogel                      I don't speak for the USAF or my company
address@hidden                          http://www.pobox.com/~vogelke

I drive way too fast to worry about cholesterol.  --unknown




