savannah-hackers

Re: [Savannah-help-public] request for freetype mailing lists


From: Werner LEMBERG
Subject: Re: [Savannah-help-public] request for freetype mailing lists
Date: Mon, 28 Feb 2005 16:26:22 +0100 (CET)

> > > If you want to process the mboxes, such as spam filtering,
> > > please do so and then submit another tarball.  I will then
> > > import the mboxes.
> > 
> > Uh, oh, I've never done that.  Can you give me a pointer how to
> > proceed?  This is, which program do you recommend for this task?
> > Something like `cat mbox.old | spamfilter > mbox.new'.
> 
> Neither did I. I suppose there is a way to combine formail and
> spamassassin (plus ClamAV, Vipul's Razor...) to do the job. That's
> something I do not even do for my own spam filtering yet..

In the last few days I've written a rough Perl script which scans an
mbox file using ClamAV and SpamAssassin (the latter can automatically
use Vipul's Razor).  It's ugly, but it works more or less.  Maybe you
can use it too.  See below.

The cleaned-up mailing list archives are here:

  http://groff.ffii.org/mbox/

Please do an import.  When you are done, please drop me a mail so that
I can remove the mbox files.

I tried hard to convert links (which refer to mails in the mailing
list archive) to the new location, but I now think that this is
impossible to do automatically: The HTML files which hold the emails
in the archive get names which can't be deduced algorithmically.
After mail `1.html' the next one may be `4.html', then `6.html',
etc. :-(

Consequently I ask you to put the following remark (or something
similar) at the top of the `freetype-devel' mailing list archive:


  Mails in this archive before January 2005 have been imported from
  another mailing list.  If you find a reference like this

    http://www.freetype.org/pipermail/devel/2002-April/003133.html

  you have to manually search through the files of the corresponding
  month:

    http://lists.nongnu.org/archive/html/freetype-devel/2002-04/

  Sorry for the inconvenience.


Here is the text for the `freetype' archive:


  Mails in this archive before January 2005 have been imported from
  another mailing list.  If you find a reference like this

    http://www.freetype.org/pipermail/freetype/2002-April/003133.html

  you have to manually search through the files of the corresponding
  month:

    http://lists.nongnu.org/archive/html/freetype/2002-04/

  Sorry for the inconvenience.


If you know a better solution please tell me.
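
What can be derived mechanically is only the month directory, not the
mail itself.  A minimal sketch of that mapping, based on the two
example links above, might look as follows.  The function name
`month_url' and the two hash tables are made up for illustration, and
the sketch assumes that all old links have the pipermail form shown in
the remarks.

```perl
use strict;
use warnings;

# Month names as they appear in the old pipermail links.
my %month_num = (
  January => 1, February => 2, March     => 3, April    => 4,
  May     => 5, June     => 6, July      => 7, August   => 8,
  September => 9, October => 10, November => 11, December => 12,
);

# Old pipermail list name => list name on lists.nongnu.org
# (taken from the two example links in the remarks).
my %list_name = (
  devel    => 'freetype-devel',
  freetype => 'freetype',
);

# Return the month index URL for an old pipermail link, or undef if
# the link does not have the expected form.
sub month_url {
  my ($old) = @_;

  return undef
    unless $old =~ m{^http://www\.freetype\.org/pipermail/([^/]+)/(\d{4})-([A-Za-z]+)/};
  my ($list, $year, $month) = ($1, $2, $3);

  return undef
    unless (exists $list_name{$list}) && (exists $month_num{$month});

  return sprintf("http://lists.nongnu.org/archive/html/%s/%s-%02d/",
                 $list_name{$list}, $year, $month_num{$month});
}

# Prints `http://lists.nongnu.org/archive/html/freetype-devel/2002-04/'.
print month_url('http://www.freetype.org/pipermail/devel/2002-April/003133.html'),
      "\n";
```

The reader still has to search the month index by hand, since the
number of the mail (`003133' in the example) is not preserved.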


     Werner


======================================================================


#!/usr/bin/perl
#
# scanmbox.pl -- remove spam and infected mails from an mbox
#
# written by Werner Lemberg  <address@hidden>
#
# (C) 2005, Public Domain.
#
# History:
#
#   2005-02-24  First version.
#
# Usage:
#
#   perl scanmbox.pl [-i] [-x] <inbox> <outbox>
#
# The removed mails are replaced with a small dummy mail to leave the
# number of emails unchanged.  Additionally, the removed mails are
# collected in the directories `<inbox>-spam' and `<inbox>-virus'.
#
# The following packages (or newer ones) are necessary:
#
#   clamav-0.83.tar.gz
#   Mail-SpamAssassin-3.0.2
#   Mail-Box-2.059
#   Mail-ClamAV-0.13
#
# If you don't want the dummy mails, add option `-x'.  Note that the
# removed mails are then not saved in the directories above either.
#
# SpamAssassin has the possibility to automatically access Vipul's Razor
# (for on-line spam filtering).  Use option `-i' to do that.  Note that
# you must have a permanent internet connection; additionally, the script
# becomes very slow.
#
# There is a memory leak somewhere in the script which makes it a real
# memory monster -- my Perl capabilities are too limited to analyze this
# further.  In case you want to process mboxes with more than 1000 mails
# I suggest splitting them manually first.


use strict;

use Getopt::Std;
use Mail::Box::Manager;
use Mail::SpamAssassin;
use Mail::ClamAV qw/:all/;

getopts('ix');

our ($opt_i, $opt_x);

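die "Usage: perl scanmbox.pl [-i] [-x] <inbox> <outbox>\n" if @ARGV != 2;

die "Error: Mbox file `$ARGV[1]' already exists!\n" if -f $ARGV[1];
# Create an empty output mbox so that messages can be appended to it.
open(FH, '>', $ARGV[1]) or die "Error: Cannot create `$ARGV[1]': $!\n";
close(FH);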

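die "Error: Directory for spam mail already exists!\n" if -d "$ARGV[0]-spam";
mkdir "$ARGV[0]-spam" or die "Error: Cannot create `$ARGV[0]-spam': $!\n";
die "Error: Directory for virus mail already exists!\n" if -d "$ARGV[0]-virus";
mkdir "$ARGV[0]-virus" or die "Error: Cannot create `$ARGV[0]-virus': $!\n";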

my $mgr = new Mail::Box::Manager;
my $spamtest = Mail::SpamAssassin->new({local_tests_only => !$opt_i});
my $clamav = new Mail::ClamAV(retdbdir());

my $in = $mgr->open(folder => $ARGV[0],
                    access => 'r');

$clamav->buildtrie;
$clamav->maxreclevel(6);
$clamav->maxfiles(1001);
$clamav->maxfilesize(1024 * 1024 * 20);    # 20 MByte

my $num_mails = 0;
my $total_mails = @$in;
my $mail;
my $spamstatus;
my $virusstatus;

foreach my $msg (@$in) {
  $num_mails++;

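  # ClamAV scans files, so write the mail to a temporary file first.
  open(TMP, '>', 'scanmail.tmp')
    or die "Error: Cannot write `scanmail.tmp': $!\n";
  print TMP $msg->string;
  close(TMP);
  $virusstatus = $clamav->scan('scanmail.tmp', CL_SCAN_MAIL());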

  if ($virusstatus->virus) {
    print "Found virus ($virusstatus, $num_mails/$total_mails)\n";
    if (!$opt_x) {
      my $newmsg =
        Mail::Message->build
          (From => 'Clam AntiVirus',
           Date => $msg->head->get('Date'),
           data => "The original message contains a virus and has been deleted.\n");
      $mgr->appendMessage($ARGV[1], $newmsg);

      open(OUT, '>', "$ARGV[0]-virus/" . sprintf("%06d", $num_mails));
      print OUT $msg->string;
      close(OUT);
    }
  }
  else {
    $mail = $spamtest->parse($msg->string);
    $spamstatus = $spamtest->check($mail);

    if ($spamstatus->is_spam()) {
      print "Found spam ($num_mails/$total_mails)\n";
      if (!$opt_x) {
        my $newmsg =
          Mail::Message->build
            (From => 'SpamAssassin',
             Date => $msg->head->get('Date'),
             data => "The original message is spam and has been deleted.\n");
        $mgr->appendMessage($ARGV[1], $newmsg);

        open(OUT, '>', "$ARGV[0]-spam/" . sprintf("%06d", $num_mails));
        print OUT $msg->string;
        close(OUT);
      }
    }
    else {
      if (!($num_mails % 10)) {
        print "$num_mails/$total_mails\n";
      }
      $mgr->appendMessage($ARGV[1], $msg);
    }

    $mail->finish();
    $spamstatus->finish();
  }
}

$in->close();

unlink 'scanmail.tmp';

# EOF
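
The header of the script suggests splitting large mboxes before
scanning them.  A minimal sketch of such a splitter, using only core
Perl, might look as follows.  The name `split_mbox', the names of the
output files (`<mbox>.000', `<mbox>.001', ...), and the rule that a mail
starts at a `From ' line which follows a blank line are assumptions for
illustration; none of this is part of scanmbox.pl.

```perl
use strict;
use warnings;

# Split the mbox file $mbox into pieces of at most $chunk mails each,
# named `$mbox.000', `$mbox.001', ...  Returns the number of pieces.
sub split_mbox {
  my ($mbox, $chunk) = @_;

  die "Error: Chunk size must be a positive integer!\n"
    unless $chunk =~ /^[1-9]\d*$/;

  open(my $in, '<', $mbox) or die "Error: Cannot read `$mbox': $!\n";

  my $out;
  my $mails = 0;
  my $pieces = 0;
  my $prev_blank = 1;    # the start of the file counts as a blank line

  while (my $line = <$in>) {
    # Assumption: a new mail starts with a `From ' line after a blank line.
    if ($prev_blank && $line =~ /^From /) {
      if ($mails % $chunk == 0) {
        # The current piece is full (or there is none yet); start a new one.
        close($out) if $out;
        my $name = sprintf("%s.%03d", $mbox, $pieces++);
        open($out, '>', $name) or die "Error: Cannot write `$name': $!\n";
      }
      $mails++;
    }

    # Lines before the first `From ' line are dropped.
    print {$out} $line if $out;
    $prev_blank = ($line =~ /^\s*$/) ? 1 : 0;
  }

  close($out) if $out;
  close($in);

  return $pieces;
}

# Called as `perl splitmbox.pl <mbox> <mails-per-piece>'.
if (@ARGV == 2) {
  my $pieces = split_mbox($ARGV[0], $ARGV[1]);
  print "Wrote $pieces piece(s)\n";
}
```

Each piece can then be passed to scanmbox.pl separately, and the
resulting output mboxes concatenated afterwards.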



