diff -bur ifile-1.3.4.orig/ifile.1 ifile-1.3.4/ifile.1
--- ifile-1.3.4.orig/ifile.1	2004-05-01 15:59:44.000000000 +0200
+++ ifile-1.3.4/ifile.1	2004-11-19 23:51:23.000000000 +0100
@@ -1,5 +1,5 @@
 .\" This is a comment
-.TH IFILE "1" "August 2002" "ifile 1.1.1" "User Commands"
+.TH IFILE "1" "November 2004" "ifile 1.3.4" "User Commands"
 .SH NAME
 ifile \- core executable for the ifile mail filtering system
 .SH SYNOPSIS
@@ -67,12 +67,44 @@
 For each of the files, output rating scores and add
 statistics for the folder with the highest score
 .TP
-\fB\-T\fR, \fB\-\-threshold\fR=\fITHRESH\fR
-With more than one folder, for top two folders (f0,f1)
-and ratings (r0,r1), if \fITHRESH\fR > 0 then with -q
-print also 'diff[f0,f1](%) x.xx' and with
--c -q, if R=(r0-r1)/(r0+r1), R*1000 < THRESH, print 'f0,f1'
-instead of just 'f0'; disabled if \fITHRESH\fR=0.
+\fB\-T\fR, \fB\-\-threshold\fR=\fIthreshold\fR
+When used with both \fB-c\fR and \fB-q\fR,
+output the two highest ranking categories if
+their score differs by at most \fIthreshold\fR / 1000,
+which can be used to detect border cases.
+When used with \fB-q\fR only and any \fIthreshold\fR > 0,
+output the score difference percentage.
+For example,
+.RS
+.RS
+\fBifile -T\fR1 \fB-q\fR foo.txt
+.RE
+might result in
+.RS
+.br
+spam -15570.48640776
+.br
+non-spam -18728.00272369
+.br
+diff[spam,non-spam](%) 9.21
+.RE
+If so, then
+.RS
+\fBifile -T\fR93 \fB-q -c\fR foo.txt
+.RE
+will result in
+.RS
+foo.txt spam,non-spam
+.RE
+whereas
+.RS
+\fBifile -T\fR92 \fB-q -c\fR foo.txt
+.RE
+will result in
+.RS
+foo.txt spam
+.RE
+.RE
 .TP
 \fB\-r\fR, \fB\-\-reset\-data\fR
 Erases all currently stored information
@@ -140,9 +172,168 @@
 .I ~/.idata
 ifile database (default location). See \fIFAQ\fR included in ifile package for description of database format.
 .SH AUTHOR
-Jason Rennie and many others. See the ChangeLog for the full list.
+Jason Rennie and many others. See the ChangeLog for the full list.
 .\".SH "SEE ALSO"
 .\".BR ifilter_mh (1),
 .\".BR irefile_mh (1),
 .\".BR knowledge_base.mh (1),
 .\".BR news2mail (1).
+.SH EXAMPLES
+Before using
+.BR ifile ,
+you need to train it.
+Let's say that you have three folders, "spam", "ifile" and "friends",
+and the following directory structure:
+
+.RS
+.NF 100
+/--+--spam----+--1
+   |          +--2
+   |          +--3
+   |
+   +--ifile---+--1
+   |          +--2
+   |          +--3
+   |
+   +--friends-+--1
+              +--2
+              +--3
+.NF 0
+.RE
+
+The following commands build the ifile database in ~/.idata (use the
+.B -d
+option to specify a different location for the database):
+
+.RS
+.br
+.BR "ifile \-h \-i" " spam /spam/*"
+.br
+.BR "ifile \-h \-i" " ifile /ifile/*"
+.br
+.BR "ifile \-h \-i" " friends /friends/*"
+.RE
+
+The
+.B -h
+option strips off headers besides "Subject:", "From:" and "To:".
+I find that
+.B -h
+improves ifile's performance, but you may find otherwise for
+your personal collection.
+
+Note that we have made the argument to
+.B -i
+the same as the corresponding folder name. This is not necessary. The
+argument to
+.B -i
+can be any word you want to use to identify a category of e-mails. The
+argument to
+.B -i
+must not include space characters (including tab, linefeed, etc.).
+
+At this point, your ~/.idata file should look something like this:
+
+.RS
+.br
+spam ifile friends
+.br
+662 1020 6451
+.br
+3 3 3
+.br
+jrennie 9 0:3 1:18 2:16
+.br
+mindspring 6 1:7 2:5
+.br
+make 9 0:5 1:3
+.br
+yahoo 9 0:1 1:22 2:2
+.RE
+
+The first line is the space-separated list of folders. Their ordering
+specifies a numbering (spam=0, ifile=1, friends=2). The second line is a
+token count for each folder (e.g. 662 tokens observed in the three spam
+messages). The third line is an e-mail count for each folder (e.g. 3
+e-mails for each of spam, ifile and friends). Each following line
+specifies statistics for a word. The format of a line is
+
+.RS
+\fIword age folder\fR:\fIcount\fR [\fIfolder\fR:\fIcount\fR ...]
+.RE
+
+where \fIfolder\fR is the folder number determined by the first line
+ordering. Folders with a count of zero are not listed. So, the line
+beginning with "jrennie" indicates that "jrennie" appeared 3 times in
+"spam" e-mails, 18 times in "ifile" e-mails and 16 times in "friends"
+e-mails. The \fIage\fR is the number of e-mails that have been processed
+since the word was added to the database. Very infrequent words are
+pruned from the database to keep the database size down.
+
+Now that you have a database, you might want to filter some e-mails. Say
+you have the following incoming e-mails:
+
+.RS
+.NF 100
+/--inbox--+--1
+          +--2
+          +--3
+.NF 0
+.RE
+
+To find out what folders ifile thinks these e-mails belong in, run
+
+.RS
+.br
+.BR "ifile -c -q" " /inbox/1"
+.br
+.BR "ifile -c -q" " /inbox/2"
+.br
+.BR "ifile -c -q" " /inbox/3"
+.RE
+
+Let's say that 1 is about ifile, 2 is spam and 3 is from a
+friend. Assuming ifile does its job correctly, you'll see output like
+this:
+
+.RS
+.br
+/inbox/1 ifile
+.br
+/inbox/2 spam
+.br
+/inbox/3 friends
+.RE
+
+With so little training data, ifile is unlikely to get the labels
+correct, but you should get the idea :-)
+
+Now, if you move the e-mails to the folders suggested by ifile, you'll
+want to update the database accordingly. You can do this with the
+.B -i
+option, like before. Or, you can simply use
+.B -Q
+in place of
+.B -q
+above. This automatically adds the e-mail to the folder ifile suggests.
+
+Now, assume for a moment that e-mail 1 was actually spam. We've added 1
+to ifile and put it in the ifile folder. We need to move it to the spam
+folder and update the ifile database accordingly. We can update the
+database with the following command:
+
+.RS
+.BR "ifile -d" " ifile "
+.BR "-i" " spam /inbox/1"
+.RE
+
+This deletes the e-mail from "ifile" and adds it to "spam".
+.SH "SEE ALSO"
+Examples of how to use
+.B ifile
+together with
+.BR procmail (1)
+and
+.BR metamail (1)
+can be found in the directory
+.B /usr/share/doc/ifile/examples.
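
The patched man page describes two things that are easy to check by hand: the -T threshold arithmetic and the layout of the ~/.idata file. The Python sketch below is only an illustration of both and is not taken from ifile's source. All function and variable names are hypothetical. The relative difference uses the formula R=(r0-r1)/(r0+r1) from the removed -T text; taking absolute values is an assumption, made so that the negative scores in the example give the positive 9.21 shown there. The parser assumes exactly the line layout given in the EXAMPLES section.

```python
# Illustrative sketch only; names are hypothetical, not ifile internals.

def relative_diff(r0, r1):
    """Relative difference of the two best scores.

    Formula from the old -T text: R = (r0 - r1) / (r0 + r1).
    Using abs() is an assumption so that negative scores give R >= 0.
    """
    return abs(r0 - r1) / abs(r0 + r1)


def concise_label(ranked, threshold):
    """Mimic the described 'ifile -T N -q -c' label for one message.

    ranked: list of (folder, score) pairs, best folder first.
    Returns 'f0,f1' when R * 1000 < threshold, otherwise just 'f0'.
    A threshold of 0 disables the check.
    """
    (f0, r0), (f1, r1) = ranked[0], ranked[1]
    if threshold > 0 and relative_diff(r0, r1) * 1000 < threshold:
        return f"{f0},{f1}"
    return f0


def parse_idata(text):
    """Parse the ~/.idata layout shown in the EXAMPLES section.

    Line 1: folder names; line 2: token counts; line 3: e-mail counts;
    remaining lines: 'word age folder:count [folder:count ...]'.
    """
    lines = text.strip().splitlines()
    folders = lines[0].split()
    token_counts = [int(n) for n in lines[1].split()]
    email_counts = [int(n) for n in lines[2].split()]
    words = {}
    for line in lines[3:]:
        word, age, *pairs = line.split()
        counts = {}
        for pair in pairs:
            index, count = pair.split(":")
            counts[folders[int(index)]] = int(count)
        words[word] = (int(age), counts)
    return folders, token_counts, email_counts, words


# Scores and database contents copied from the man page examples.
SCORES = [("spam", -15570.48640776), ("non-spam", -18728.00272369)]
SAMPLE_IDATA = """\
spam ifile friends
662 1020 6451
3 3 3
jrennie 9 0:3 1:18 2:16
mindspring 6 1:7 2:5
make 9 0:5 1:3
yahoo 9 0:1 1:22 2:2
"""

if __name__ == "__main__":
    r = relative_diff(SCORES[0][1], SCORES[1][1])
    print(f"diff[spam,non-spam](%) {r * 100:.2f}")
    print("T=93:", concise_label(SCORES, 93))
    print("T=92:", concise_label(SCORES, 92))
    print(parse_idata(SAMPLE_IDATA)[3]["jrennie"])
```

With the example scores, R * 1000 is about 92.06, which is why -T 93 reports both folders and -T 92 reports only "spam".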