Subject line processing in Gnats 4.0

From: Paul Traina
Subject: Re: Subject line processing in Gnats 4.0
Date: Tue, 11 Dec 2001 21:42:51 -0800

Negative Dirk,
I had it behave that way before, and it sucked.  Every time someone sent in
anything like "OS/2" it would append it to PR #2.
Instead, I wrote a script called refile_pr that solved this problem (I think
it may still be in my home directory or under bin/...).
If it's gone, I can tell you how to recreate it.  With gnats 4.0, it's
trivial, about 5 lines of real code.

From: "Dirk Bergstrom" <address@hidden>
To: "Milan Zamazal" <address@hidden>; "Michael Richardson"
Cc: <address@hidden>
Sent: Tuesday, December 11, 2001 6:10 PM
Subject: RE: Subject line processing in Gnats 4.0

> i've been wrestling with this problem for some time (we've been running
> 4.0a for over a year...), and i've been meaning to send this to the list
> for a while:
> summary:
> at my company, we have been having some difficulties with gnats'
> handling of email replies to PRs.  gnats checks the subject of incoming
> mail, and if it matches a fairly restrictive pattern (must look like a
> reply), the message is appended to the relevant PR.  if there is no
> match, a new PR is created.  this often leads to creation of bogus PRs,
> because the matching expression is too restrictive.  i propose that we
> remove the restriction, and append *all* messages with subjects matching
> a PR to that PR.
> detailed explanation:
> when a message is filed by queue-pr (via file-pr), gnats scans the
> subject line for the following regex (line 575 in file-pr.c):
> "(.*re[ \t]*(\\[[0-9]+\\])?:)?[ \t]*([-a-z0-9_+.]*[:/][ \t]*([0-9]+))"
> which (sort of) translates to a regex of /.*(re:)? <category>/<pr_num>/.
> if it finds this pattern, the message is appended to pr_num; if not,
> gnats creates a new PR, using the subject as synopsis, and body as
> description.
> this only works if people using gnats pay close attention to the subject
> of their emails.  in the real world, however, developers forward
> messages, they use mailers that insert extra cruft in the subject line
> (email addresses, names, etc), and sometimes gnats is used in concert
> with a different bug-tracking system, which inserts its own tracking
> number in the subject.  whatever the cause, the result is a bogus PR,
> which clutters up the system, confuses developers and support staff, and
> takes time and energy to close/expunge.
> these subjects do the Right Thing:
> *) re: compiler/9876: compiled code runs backwards
> *) serial-port/1234: cannot process frosted flakes
> but these subjects all do the Wrong Thing (create new PR, instead of
> appending):
> *) Re: FW: sw-kern-sanders/16543: kernel panic on gunshot
> *) PR 12345
> *) Fw: sw-rip-vnwnkl/10795: The routing subsystem is asleep
> *) RE: 2001-0821-021: hw-pr0n/16413 FPC shows pictures of naked ladies on reboot
> (basically, anything in front of the category that's not "re:" bombs.)
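
A rough way to try the subjects above against the quoted pattern is to transcribe it into Python.  This is only a sketch: it assumes the pattern is applied case-insensitively and anchored at the start of the subject, which fits the behaviour described but is not confirmed here, and SUBJECT_RE and pr_number are names made up for the illustration, not anything in file-pr.c.

```python
import re

# Transcription of the pattern quoted from file-pr.c.
# Assumptions (not confirmed by the thread): case-insensitive matching,
# anchored at the start of the subject line.
SUBJECT_RE = re.compile(
    r"(.*re[ \t]*(\[[0-9]+\])?:)?[ \t]*([-a-z0-9_+.]*[:/][ \t]*([0-9]+))",
    re.IGNORECASE,
)


def pr_number(subject):
    """Return the PR number the pattern extracts, or None if it does not match."""
    m = SUBJECT_RE.match(subject)  # match() only tries position 0
    return int(m.group(4)) if m else None


if __name__ == "__main__":
    subjects = [
        "re: compiler/9876: compiled code runs backwards",
        "serial-port/1234: cannot process frosted flakes",
        "Re: FW: sw-kern-sanders/16543: kernel panic on gunshot",
        "PR 12345",
        "Fw: sw-rip-vnwnkl/10795: The routing subsystem is asleep",
        "RE: 2001-0821-021: hw-pr0n/16413 FPC shows pictures of naked ladies on reboot",
    ]
    for s in subjects:
        print(pr_number(s), "<-", s)
```

Under these assumptions the two Right Thing subjects yield 9876 and 1234, and the first three Wrong Thing subjects yield None.  The last one is not rejected by the pattern itself: "RE: 2001" is read as a category/number pair, so the sketch returns 2001 rather than 16413.  Either way the reply does not reach PR 16413; what gnats does with the extracted number afterwards is outside this sketch.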
> i believe that the current behavior violates the principle of least
> surprise -- if i send a message to bugs with a subject "re: fw:
> sw-foo/1234", i expect that gnats will append it to PR 1234.  opening a
> new PR is *not* what i intend.
> solution:
> it's not clear to me why the regex is so restrictive.  i think it would
> make more sense for gnats to assume that any message with a parseable PR
> identifier (<category>/<pr_num>) in the subject should be appended to
> that PR.  that would allow for all the real-world messages to be
> processed in a manner that would not surprise the sender.
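
The relaxed matching proposed here could look something like the following Python sketch.  RELAXED_RE and find_pr are hypothetical names, and the exact pattern (a category-like token, a slash, a number, found anywhere in the subject, case-insensitively) is one guess at what "any parseable PR identifier" would mean, not something taken from gnats.

```python
import re

# Hypothetical relaxed pattern: <category>/<pr_num> anywhere in the subject.
# The character class for the category is borrowed from the quoted gnats regex.
RELAXED_RE = re.compile(r"([-a-z0-9_+.]+)/([0-9]+)", re.IGNORECASE)


def find_pr(subject):
    """Return (category, pr_num) for the first identifier found, else None."""
    m = RELAXED_RE.search(subject)  # search() scans the whole subject
    return (m.group(1), int(m.group(2))) if m else None


if __name__ == "__main__":
    for s in [
        "Re: FW: sw-kern-sanders/16543: kernel panic on gunshot",
        "Fw: sw-rip-vnwnkl/10795: The routing subsystem is asleep",
        "RE: 2001-0821-021: hw-pr0n/16413 FPC shows pictures of naked ladies on reboot",
        "PR 12345",
        "Support for OS/2 clients",
    ]:
        print(find_pr(s), "<-", s)
```

With this pattern the forwarded and prefixed subjects resolve to PRs 16543, 10795 and 16413, while "PR 12345" still finds nothing.  A subject that merely mentions "OS/2" also matches, as category "OS" and PR 2, which is the failure mode reported at the top of this thread.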
> furthermore, i think it should also accept "PR<pr_num>", if it is the
> first reasonable text in the subject, using a regex along the lines of
> "^[ \t]*((re|fw):)?[ \t]*pr([0-9]+)".  i often see bogus PRs with this
> synopsis, which were clearly intended to be part of <pr_num>, not
> separate PRs.  however, this would be a more extreme change, and might
> have unforeseen consequences...
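
The "PR<pr_num>" pattern suggested above can be tried out the same way.  The regex below is the one from the message; case-insensitive matching is an assumption, and PR_RE and bare_pr_number are made-up names for this sketch.

```python
import re

# The regex suggested in the message, applied case-insensitively (assumption).
PR_RE = re.compile(r"^[ \t]*((re|fw):)?[ \t]*pr([0-9]+)", re.IGNORECASE)


def bare_pr_number(subject):
    """Return the number from a leading "PR<pr_num>", or None if absent."""
    m = PR_RE.match(subject)
    return int(m.group(3)) if m else None


if __name__ == "__main__":
    for s in ["PR12345", "Re: pr678", "PR 12345", "FW: see PR12345"]:
        print(bare_pr_number(s), "<-", s)
```

As written, the pattern accepts "PR12345" and "Re: pr678" but not "PR 12345", because it allows no whitespace between "pr" and the digits; inserting [ \t]* before ([0-9]+) would be one way to cover that form.  It also ignores a PR reference that is not the first text in the subject, such as "FW: see PR12345".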
> does anyone see a reason to keep the restrictive regex?
