From MAILER-DAEMON Sun Feb 13 23:22:05 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0XkK-0004aU-Pi for mharc-nmh-workers@gnu.org; Sun, 13 Feb 2005 23:22:04 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0Xk6-0004Vv-MV for nmh-workers@nongnu.org; Sun, 13 Feb 2005 23:21:51 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0Xjz-0004RB-WE for nmh-workers@nongnu.org; Sun, 13 Feb 2005 23:21:45 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0Xju-0004Ml-T2 for nmh-workers@nongnu.org; Sun, 13 Feb 2005 23:21:38 -0500 Received: from [64.46.156.66] (helo=colo.heeltoe.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0XBJ-0004Ca-PB for nmh-workers@nongnu.org; Sun, 13 Feb 2005 22:45:54 -0500 Received: (qmail 14540 invoked from network); 14 Feb 2005 03:41:11 -0000 Received: from foxharp.ne.client2.attbi.com (HELO mail.foxharp.boston.ma.us) (24.61.85.42) by 64.46.156.66 with SMTP; Mon, 14 Feb 2005 03:41:11 +0000 Received: (qmail 26716 invoked from network); 14 Feb 2005 03:45:23 -0000 Received: from unknown (HELO grass.foxharp.boston.ma.us) (192.168.111.11) by 0 with SMTP; 14 Feb 2005 03:45:23 -0000 Received: (qmail 23509 invoked from network); 14 Feb 2005 03:45:23 -0000 Received: from unknown (HELO foxharp.boston.ma.us) (192.168.111.11) by grass.foxharp.boston.ma.us with SMTP; 14 Feb 2005 03:45:23 -0000 To: nmh-workers@nongnu.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <23505.1108352723.1@foxharp.boston.ma.us> Date: Sun, 13 Feb 2005 22:45:23 -0500 Message-ID: <23507.1108352723@foxharp.boston.ma.us> From: Paul Fox Subject: [Nmh-workers] scan or show of UTF-encoded headers? 
X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 04:21:57 -0000 can nmh decode UTF or otherwise-encoded headers? it's not that i _want_ to be able to read all of the UTF-encoded spam i get, but i recently, for the very first time, got a legitimate piece of mail with encoded Subject:, From:, and To: lines. i'd like to be better prepared for next time... paul =--------------------- paul fox, pgf@foxharp.boston.ma.us (arlington, ma, where it's 19.8 degrees) From MAILER-DAEMON Mon Feb 14 05:24:44 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0dOx-0005XS-0m for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 05:24:23 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0dNc-0004VQ-W0 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 05:23:01 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0dN2-00049r-M6 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 05:22:27 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0dMg-0003Vi-IE for nmh-workers@nongnu.org; Mon, 14 Feb 2005 05:22:02 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0d2O-0000Hr-Oy for nmh-workers@nongnu.org; Mon, 14 Feb 2005 05:01:05 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-20.tower-36.messagelabs.com!1108375258!13926931!1 X-StarScan-Version: 5.4.8; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 2433 invoked from network); 14 Feb 2005 10:00:58 -0000 Received: from iris.logica.co.uk (158.234.9.163) by server-20.tower-36.messagelabs.com with SMTP; 14 Feb 2005 10:00:58 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with 
ESMTP id j1EA0w61002562; Mon, 14 Feb 2005 10:00:58 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id 6E6AE2F96F; Mon, 14 Feb 2005 11:00:38 +0100 (CET) X-VirusChecked: Checked X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <23507.1108352723@foxharp.boston.ma.us> From: Oliver Kiddle References: <23507.1108352723@foxharp.boston.ma.us> To: Paul Fox Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Date: Mon, 14 Feb 2005 11:00:38 +0100 Message-ID: <5222.1108375238@trentino.logica.co.uk> Cc: nmh-workers@nongnu.org X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 10:24:16 -0000 Paul Fox wrote: > > can nmh decode UTF or otherwise-encoded headers? it's not that Yes. See the decode function in the mh-format manual page. It has a few limitations however. It doesn't use iconv or similar to convert headers to the current encoding. So you need to use a UTF-8 locale and set the MM_CHARSET environment variable to UTF-8. That means that it then won't decode a ISO-8859-1 header anymore. > i _want_ to be able to read all of the UTF-encoded spam i get, but The thing I find with spam is that they always seem to break the rfc by including space characters in the encoded section of the header. I don't know whether this is also common in legitimate mails but nmh doesn't decode such headers. The relevant code is in sbr/fmt_rfc2047.c if you're interested in looking. 
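For reference, an RFC 2047 encoded word has the form =?charset?encoding?text?= where the encoding is B or Q, and the word may not contain whitespace. The following is only a minimal sketch of that well-formedness check, not the code in sbr/fmt_rfc2047.c; the function name looks_like_encoded_word is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of nmh: return 1 if s is exactly one
 * RFC 2047 encoded word of the form =?charset?encoding?text?= with no
 * whitespace inside it, otherwise 0.
 */
int
looks_like_encoded_word (const char *s)
{
    size_t len = strlen (s);
    const char *charset_end, *encoding_end, *text_end, *p;

    /* Must start with "=?" and end with "?=". */
    if (len < 8 || s[0] != '=' || s[1] != '?'
            || s[len - 2] != '?' || s[len - 1] != '=')
        return 0;

    /* Whitespace anywhere inside makes the word invalid. */
    for (p = s; *p; p++)
        if (isspace ((unsigned char) *p))
            return 0;

    /* The charset runs up to the next '?' and must not be empty. */
    charset_end = strchr (s + 2, '?');
    if (!charset_end || charset_end == s + 2)
        return 0;

    /* The encoding is a single letter, B or Q, followed by '?'. */
    if (charset_end[1] == '\0' || !strchr ("BbQq", charset_end[1]))
        return 0;
    encoding_end = charset_end + 2;
    if (*encoding_end != '?')
        return 0;

    /* The text must be non-empty and must not contain a '?' itself. */
    text_end = s + len - 2;
    if (text_end <= encoding_end + 1)
        return 0;
    return strchr (encoding_end + 1, '?') == text_end;
}
```

A subject such as "=?UTF-8?Q?free money?=" fails this check because of the embedded space, which is the kind of header described above as left undecoded.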
Oliver From MAILER-DAEMON Mon Feb 14 09:58:27 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0hgB-0000HZ-Ky for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 09:58:27 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0hfn-00009y-7l for nmh-workers@nongnu.org; Mon, 14 Feb 2005 09:58:03 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0hfY-0008Ux-4G for nmh-workers@nongnu.org; Mon, 14 Feb 2005 09:57:51 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0hfV-0008RD-1B for nmh-workers@nongnu.org; Mon, 14 Feb 2005 09:57:45 -0500 Received: from [64.46.156.66] (helo=colo.heeltoe.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0hOc-0001fi-6u for nmh-workers@nongnu.org; Mon, 14 Feb 2005 09:40:22 -0500 Received: (qmail 30250 invoked from network); 14 Feb 2005 14:35:34 -0000 Received: from foxharp.ne.client2.attbi.com (HELO mail.foxharp.boston.ma.us) (24.61.85.42) by 64.46.156.66 with SMTP; Mon, 14 Feb 2005 14:35:34 +0000 Received: (qmail 32007 invoked from network); 14 Feb 2005 14:39:46 -0000 Received: from unknown (HELO grass.foxharp.boston.ma.us) (192.168.111.11) by 0 with SMTP; 14 Feb 2005 14:39:46 -0000 Received: (qmail 32683 invoked from network); 14 Feb 2005 14:39:46 -0000 Received: from unknown (HELO foxharp.boston.ma.us) (192.168.111.11) by grass.foxharp.boston.ma.us with SMTP; 14 Feb 2005 14:39:46 -0000 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? In-reply-to: okiddle's message of Mon, 14 Feb 2005 11:00:38 +0100. 
<5222.1108375238@trentino.logica.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <32679.1108391986.1@foxharp.boston.ma.us> Date: Mon, 14 Feb 2005 09:39:46 -0500 Message-ID: <32681.1108391986@foxharp.boston.ma.us> From: Paul Fox X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 14:58:25 -0000 > Paul Fox wrote: > > > > can nmh decode UTF or otherwise-encoded headers? it's not that > > Yes. See the decode function in the mh-format manual page. It has a few > limitations however. It doesn't use iconv or similar to convert headers > to the current encoding. So you need to use a UTF-8 locale and set the > MM_CHARSET environment variable to UTF-8. That means that it then won't > decode a ISO-8859-1 header anymore. hmmm. i'll play with it. does anyone have any clever scripts to wrap this up into a nice solution? > > > i _want_ to be able to read all of the UTF-encoded spam i get, but somehow when you remove the line that preceded that one, it makes me sound like a nutcase, eh? 
:-) paul =--------------------- paul fox, pgf@foxharp.boston.ma.us (arlington, ma, where it's 27.1 degrees) From MAILER-DAEMON Mon Feb 14 11:31:48 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0j54-0001E6-O2 for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 11:28:15 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0j4x-0001Bf-9K for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:28:09 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0j4q-00019M-Ux for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:28:04 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0j4n-0000vB-NT for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:27:59 -0500 Received: from [131.130.221.38] (helo=imap.unet.univie.ac.at) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D0idw-0001Vp-Ck for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:00:12 -0500 X-Spam-Flags: <.FIVETEN-SPAMSRC.@imap.unet.univie.ac.at>[80.108.70.152:chello080108070152.13.11.univie.teleweb.at] Received: from chello080108070152.13.11.univie.teleweb.at (Debian-exim@chello080108070152.13.11.univie.teleweb.at [80.108.70.152]) by imap.unet.univie.ac.at (8.12.10/8.12.10) with ESMTP id j1EFxuPC107268 for ; Mon, 14 Feb 2005 17:00:06 +0100 Received: from rldprog (helo=chello080108070152.13.11.univie.teleweb.at) by chello080108070152.13.11.univie.teleweb.at with local-esmtp (Exim 4.34) id 1D0idy-0000JB-Aq for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:00:14 +0100 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Mail-Followup-To: nmh-workers@nongnu.org In-reply-to: <32681.1108391986@foxharp.boston.ma.us> References: <32681.1108391986@foxharp.boston.ma.us> Comments: In-reply-to Paul Fox message dated "Mon, 14 Feb 2005 09:39:46 -0500." 
Date: Mon, 14 Feb 2005 17:00:14 +0100 From: Harald Geyer Message-Id: X-DCC-ZID-Univie-Metrics: mx7.univie.ac.at 4248; Body=1 Fuz1=1 Fuz2=1 X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 16:28:11 -0000 > > Paul Fox wrote: > > > > > > can nmh decode UTF or otherwise-encoded headers? it's not that > > > > Yes. See the decode function in the mh-format manual page. It has a few > > limitations however. It doesn't use iconv or similar to convert headers > > to the current encoding. So you need to use a UTF-8 locale and set the > > MM_CHARSET environment variable to UTF-8. That means that it then won't > > decode a ISO-8859-1 header anymore. > > hmmm. i'll play with it. does anyone have any clever scripts to > wrap this up into a nice solution? What do you consider a nice solution? I use the method as described by Oliver (actually that's the default of the debian package). It works satisfactory but unfortunately we have a wild mixture of latin1 and latin9 in europe (thanks to MS windows not being able or willing to adapt to the new situation in the past four years) so half of the mails I get isn't decoded at all. If anybody has a patch or an other solution, I would be interested as well. 
Harald From MAILER-DAEMON Mon Feb 14 11:51:54 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0jRx-0001yG-Qj for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 11:51:53 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0jRj-0001ta-Ll for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:51:40 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0jRd-0001rE-JB for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:51:34 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0jRb-0001nx-Fb for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:51:31 -0500 Received: from [64.46.156.66] (helo=colo.heeltoe.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0j2m-00048v-S3 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 11:25:53 -0500 Received: (qmail 1689 invoked from network); 14 Feb 2005 16:21:13 -0000 Received: from foxharp.ne.client2.attbi.com (HELO mail.foxharp.boston.ma.us) (24.61.85.42) by 64.46.156.66 with SMTP; Mon, 14 Feb 2005 16:21:13 +0000 Received: (qmail 938 invoked from network); 14 Feb 2005 16:25:25 -0000 Received: from unknown (HELO grass.foxharp.boston.ma.us) (192.168.111.11) by 0 with SMTP; 14 Feb 2005 16:25:25 -0000 Received: (qmail 13667 invoked from network); 14 Feb 2005 16:25:25 -0000 Received: from unknown (HELO foxharp.boston.ma.us) (192.168.111.11) by grass.foxharp.boston.ma.us with SMTP; 14 Feb 2005 16:25:25 -0000 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? In-reply-to: Harald.Geyer's message of Mon, 14 Feb 2005 17:00:14 +0100. 
MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <13663.1108398325.1@foxharp.boston.ma.us> Date: Mon, 14 Feb 2005 11:25:25 -0500 Message-ID: <13665.1108398325@foxharp.boston.ma.us> From: Paul Fox X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 16:51:47 -0000 > > > Yes. See the decode function in the mh-format manual page. It has a few > > > limitations however. It doesn't use iconv or similar to convert headers > > > to the current encoding. So you need to use a UTF-8 locale and set the > > > MM_CHARSET environment variable to UTF-8. That means that it then won't > > > decode a ISO-8859-1 header anymore. > > > > hmmm. i'll play with it. does anyone have any clever scripts to > > wrap this up into a nice solution? > > What do you consider a nice solution? I use the method as described > by Oliver (actually that's the default of the debian package). It works > satisfactory but unfortunately we have a wild mixture of latin1 and latin9 > in europe (thanks to MS windows not being able or willing to adapt to > the new situation in the past four years) so half of the mails I > get isn't decoded at all. If anybody has a patch or an other solution, i guess i was thinking of a wrapper for scan or show that took care of setting up the locale and charset, either via argument for manually choosing, or maybe even by examining the message and then figuring out what locale/charset it should probably use, this time. (i confess i don't exchange a lot of mail with non-english/ ascii-speaking correspondents, and being american/english/ascii guy myself, have never really had to adjust locales or charsets etc. which is to say, i may not fully understand what i'm asking for. 
:-) paul =--------------------- paul fox, pgf@foxharp.boston.ma.us (arlington, ma, where it's 32.5 degrees) From MAILER-DAEMON Mon Feb 14 13:50:24 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0lIe-0005Ge-08 for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 13:50:24 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1CzLNx-0006iO-Mj for nmh-workers@nongnu.org; Thu, 10 Feb 2005 15:58:02 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1CzLNv-0006hZ-O0 for nmh-workers@nongnu.org; Thu, 10 Feb 2005 15:58:00 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1CzLNv-0006h1-7W for nmh-workers@nongnu.org; Thu, 10 Feb 2005 15:57:59 -0500 Received: from [139.78.100.219] (helo=dc.cis.okstate.edu) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.34) id 1CzLAO-0007Df-K2 for nmh-workers@nongnu.org; Thu, 10 Feb 2005 15:44:00 -0500 Received: from dc.cis.okstate.edu (localhost.cis.okstate.edu [127.0.0.1]) by dc.cis.okstate.edu (8.12.6/8.12.6) with ESMTP id j1AKTT5P098617 for ; Thu, 10 Feb 2005 14:29:29 -0600 (CST) (envelope-from martin@dc.cis.okstate.edu) Message-Id: <200502102029.j1AKTT5P098617@dc.cis.okstate.edu> To: nmh-workers@nongnu.org Date: Thu, 10 Feb 2005 14:29:29 -0600 From: Martin McCormick X-Mailman-Approved-At: Mon, 14 Feb 2005 13:50:22 -0500 Subject: [Nmh-workers] refile Sometimes totally Shreds a Message X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 10 Feb 2005 20:58:02 -0000 I use nmh-1.0.4 in FreeBSD UNIX and have noticed that the refile function occasionally eats a message. It moves it from one folder to another all right, but what ends up in the receiving folder is a file containing all 0xFF's. 
I have tried to capture a message that triggers this behavior but it is difficult since most messages do not self-destruct and refile correctly. When one does shred, I can't get it back to experiment with because, by definition of the problem, it is simply gone. Martin McCormick WB5AGZ Stillwater, OK OSU Information Technology Division Network Operations Group From MAILER-DAEMON Mon Feb 14 13:58:28 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0lQS-0006SC-D0 for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 13:58:28 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0lQP-0006RL-Ur for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:58:26 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0lJm-0005QB-ND for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:51:46 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0lJk-0005NP-9b for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:51:32 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0l3h-00079C-M2 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:34:58 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-5.tower-36.messagelabs.com!1108406096!13966065!1 X-StarScan-Version: 5.4.8; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 8388 invoked from network); 14 Feb 2005 18:34:56 -0000 Received: from iris.logica.co.uk (158.234.9.163) by server-5.tower-36.messagelabs.com with SMTP; 14 Feb 2005 18:34:56 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with ESMTP id j1EIYu61030323 for ; Mon, 14 Feb 2005 18:34:56 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id E75D02FC1F for ; Mon, 14 Feb 2005 19:34:35 +0100 (CET) To: nmh-workers@nongnu.org X-VirusChecked: Checked 
X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <13665.1108398325@foxharp.boston.ma.us> From: Oliver Kiddle References: <13665.1108398325@foxharp.boston.ma.us> Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Date: Mon, 14 Feb 2005 19:34:35 +0100 Message-ID: <31941.1108406075@trentino.logica.co.uk> X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 18:58:26 -0000 You wrote: > i guess i was thinking of a wrapper for scan or show that took care > of setting up the locale and charset, either via argument for manually > choosing, or maybe even by examining the message and then figuring out > what locale/charset it should probably use, this time. It's probably easier to hack the C code. I've had a quick go at producing something which uses iconv to convert stuff to the native character set (patch is below). Would be good if you could try this out and look for ways to improve it. I've not thought through what the between_encodings stuff is doing and if that is affected at all. If this is going to be turned into something we can commit to CVS, we also need to work out the necessary configure stuff for iconv. As it is, you may need to fiddle the Makefile to get this to compile. 
Oliver Index: h/prototypes.h =================================================================== RCS file: /cvsroot/nmh/nmh/h/prototypes.h,v retrieving revision 1.9 diff -u -r1.9 prototypes.h --- h/prototypes.h 27 Jan 2005 16:26:24 -0000 1.9 +++ h/prototypes.h 14 Feb 2005 18:18:38 -0000 @@ -61,6 +61,7 @@ char **getans (char *, struct swit *); int getanswer (char *); char **getarguments (char *, int, char **, int); +char *get_charset(); char *getcpy (char *); char *getfolder(int); int lkclose(int, char*); Index: sbr/fmt_rfc2047.c =================================================================== RCS file: /cvsroot/nmh/nmh/sbr/fmt_rfc2047.c,v retrieving revision 1.2 diff -u -r1.2 fmt_rfc2047.c --- sbr/fmt_rfc2047.c 2 Jul 2002 22:09:14 -0000 1.2 +++ sbr/fmt_rfc2047.c 14 Feb 2005 18:18:38 -0000 @@ -10,6 +10,7 @@ */ #include +#include static signed char hexindex[] = { -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, @@ -97,6 +98,10 @@ /* Check for initial =? */ if (*p == '=' && p[1] && p[1] == '?' && p[2]) { + int use_iconv = 0; + iconv_t cd; + char *saveq, *convbuf; + startofmime = p + 2; /* Scan ahead for the next '?' 
character */ @@ -106,9 +111,14 @@ if (!*pp) continue; - /* Check if character set is OK */ - if (!check_charset(startofmime, pp - startofmime)) - continue; + /* Check if character set can be handled natively */ + if (!check_charset(startofmime, pp - startofmime)) { + use_iconv = 1; + *pp = '\0'; + cd = iconv_open(get_charset(), startofmime); + *pp = '?'; + if (cd == (iconv_t)-1) continue; + } startofmime = pp + 1; @@ -159,6 +169,12 @@ if (between_encodings) q -= whitespace; + if (use_iconv) { + saveq = q; + if (!(q = convbuf = (char *)malloc(endofmime - startofmime))) + continue; + } + /* Now decode the text */ if (quoted_printable) { for (pp = startofmime; pp < endofmime; pp++) { @@ -218,6 +234,15 @@ } } + if (use_iconv) { + size_t inbytes = q - convbuf, outbytes = BUFSIZ; + char *start = convbuf; + iconv(cd, &start, &inbytes, &saveq, &outbytes); + q = saveq; + iconv_close(cd); + free(convbuf); + } + /* * Now that we are done decoding this particular * encoded word, advance string to trailing '='. 
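For anyone trying the patch above and looking for ways to improve it, here is a standalone sketch of the same iconv round trip with the return values checked and the output buffer size passed in explicitly. It is not part of the patch; convert_charset and its parameters are hypothetical names, and it assumes the glibc-style iconv() prototype that takes char ** for the input buffer.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper, not from the patch above: convert inlen bytes of
 * in from charset `from` to charset `to`, writing at most outsize - 1
 * bytes to out and NUL-terminating it.  Returns the number of bytes
 * written, or (size_t) -1 on any failure (unknown charset, invalid
 * input, or output buffer too small).
 */
size_t
convert_charset (const char *from, const char *to,
                 const char *in, size_t inlen,
                 char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *) in;    /* iconv() wants char **; input is not modified */
    char *outp = out;
    size_t inleft = inlen, outleft;
    size_t rc;

    if (outsize == 0)
        return (size_t) -1;
    outleft = outsize - 1;      /* reserve room for the terminating NUL */

    /* Note the argument order: destination charset first, then source. */
    cd = iconv_open (to, from);
    if (cd == (iconv_t) -1)
        return (size_t) -1;

    rc = iconv (cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t) -1)
        /* Flush any pending shift state for stateful encodings. */
        rc = iconv (cd, NULL, NULL, &outp, &outleft);
    iconv_close (cd);

    if (rc == (size_t) -1)
        return (size_t) -1;

    *outp = '\0';
    return (size_t) (outp - out);
}
```

In a decoder like the one being patched, a failed conversion would probably best fall back to leaving the encoded word as it was, rather than emitting a half-converted header.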
From MAILER-DAEMON Mon Feb 14 13:59:06 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0lR4-0006iX-1P for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 13:59:06 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0lJw-0005Vj-J5 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:51:47 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0lJk-0005Pd-Au for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:51:32 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0lJj-0005NP-Du for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:51:31 -0500 Received: from [131.130.221.38] (helo=imap.unet.univie.ac.at) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D0l4F-0007CN-0c for nmh-workers@nongnu.org; Mon, 14 Feb 2005 13:35:31 -0500 X-Spam-Flags: <.FIVETEN-SPAMSRC.@imap.unet.univie.ac.at>[80.108.70.152:chello080108070152.13.11.univie.teleweb.at] Received: from chello080108070152.13.11.univie.teleweb.at (Debian-exim@chello080108070152.13.11.univie.teleweb.at [80.108.70.152]) by imap.unet.univie.ac.at (8.12.10/8.12.10) with ESMTP id j1EIZHPC022040 for ; Mon, 14 Feb 2005 19:35:25 +0100 Received: from rldprog (helo=chello080108070152.13.11.univie.teleweb.at) by chello080108070152.13.11.univie.teleweb.at with local-esmtp (Exim 4.34) id 1D0l4K-0000Nb-G7 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 19:35:36 +0100 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Mail-Followup-To: nmh-workers@nongnu.org In-reply-to: <13665.1108398325@foxharp.boston.ma.us> References: <13665.1108398325@foxharp.boston.ma.us> Comments: In-reply-to Paul Fox message dated "Mon, 14 Feb 2005 11:25:25 -0500." 
Date: Mon, 14 Feb 2005 19:35:36 +0100 From: Harald Geyer Message-Id: X-DCC-ZID-Univie-Metrics: mx7.univie.ac.at 4249; Body=1 Fuz1=1 Fuz2=1 X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 18:59:04 -0000 > > What do you consider a nice solution? I use the method as described > > by Oliver (actually that's the default of the debian package). It works > > satisfactory but unfortunately we have a wild mixture of latin1 and latin9 > > in europe (thanks to MS windows not being able or willing to adapt to > > the new situation in the past four years) so half of the mails I > > get isn't decoded at all. If anybody has a patch or an other solution, > > i guess i was thinking of a wrapper for scan or show that took care > of setting up the locale and charset, either via argument for manually > choosing, or maybe even by examining the message and then figuring out > what locale/charset it should probably use, this time. If one wants to do it manually 'export MM_CHARSET="ISO-8859-1"' works for me, but usually you don't do that, because having a correctly decoded subject isn't worth typing that in. Also of course the terminal must be able to handle the charset. With latin1 and latin9 there is no problem, but if you want UTF-8 you need to change your terminal too, with whatever tool your OS provides for that. With scan that wouldn't work at all, because you can have any number of different charsets in the headers of the many messages in one folder. Obviously any script which tries to do the above runs into the same problem that prevents nmh from doing it itself: The script would need to know which charsets the terminal can handle and how to tell it. Also changing the terminal might confuse other programs. 
I guess it would be much easier and less prone to error to just implement transcoding of messages through iconv instead of trying to adapt the display on a per message basis. I remember the gnus people using big sets of tables to do a mixture of transcoding and unifying between character sets which led to messages being split into several parts of different character sets, when it didn't work correctly. I don't know what their reason was for not using iconv. Another approach, less universal but easier to implement, would be to add the possibility to tell nmh that the terminal can handle more than one character set. In case of messages which are almost plain ascii one would get the correct result interspersed with "broken" glyphs. Anything that would need heavy transcoding is unlikely to be displayable on an ascii terminal at all. > (i confess i don't exchange a lot of mail with non-english/ > ascii-speaking correspondents, and being american/english/ascii > guy myself, have never really had to adjust locales or charsets etc. > which is to say, i may not fully understand what i'm asking for. :-) Locales are a nice thing, but only as long as everybody is using the same as you do ... 
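On the question of knowing which charset to convert to: a program can at least ask the C library which character set the current locale declares. The sketch below uses the POSIX calls setlocale() and nl_langinfo(CODESET); the wrapper name locale_charset_name is hypothetical and this is not how nmh's get_charset() is known to work. It only reports what the locale claims, which may still differ from what the terminal can really display.

```c
#include <langinfo.h>
#include <locale.h>

/*
 * Hypothetical helper, not part of nmh: return the name of the character
 * set of the current locale (for example "UTF-8" or "ISO-8859-1"), as
 * selected by LC_ALL, LC_CTYPE or LANG in the environment.
 */
const char *
locale_charset_name (void)
{
    /* Adopt the locale from the environment instead of the default "C". */
    setlocale (LC_CTYPE, "");
    return nl_langinfo (CODESET);
}
```

The returned name could then serve as the destination charset for iconv_open(), with MM_CHARSET kept as a manual override.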
Harald From MAILER-DAEMON Mon Feb 14 15:52:28 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0nCZ-000460-0w for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 15:52:15 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0nCH-00040M-O8 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 15:51:57 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0nCC-0003vo-0Y for nmh-workers@nongnu.org; Mon, 14 Feb 2005 15:51:54 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0nCA-0003sj-24 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 15:51:50 -0500 Received: from [128.173.38.235] (helo=h80ad26eb.async.vt.edu) by monty-python.gnu.org with esmtp (TLSv1:DES-CBC3-SHA:168) (Exim 4.34) id 1D0mvl-0000lx-Vf for nmh-workers@nongnu.org; Mon, 14 Feb 2005 15:34:55 -0500 Received: from turing-police.cc.vt.edu (turing-police.cc.vt.edu [127.0.0.1]) by turing-police.cc.vt.edu (8.13.3/8.13.3) with ESMTP id j1EKYZdc027718; Mon, 14 Feb 2005 15:34:35 -0500 Message-Id: <200502142034.j1EKYZdc027718@turing-police.cc.vt.edu> X-Mailer: exmh version 2.7.2 01/07/2005 with nmh-1.1-RC3 To: Harald Geyer Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? In-Reply-To: Your message of "Mon, 14 Feb 2005 19:35:36 +0100." 
From: Valdis.Kletnieks@vt.edu References: <13665.1108398325@foxharp.boston.ma.us> Mime-Version: 1.0 Content-Type: multipart/signed; boundary="==_Exmh_1108413274_3236P"; micalg=pgp-sha1; protocol="application/pgp-signature" Content-Transfer-Encoding: 7bit Date: Mon, 14 Feb 2005 15:34:34 -0500 Cc: nmh-workers@nongnu.org X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 20:52:08 -0000 --==_Exmh_1108413274_3236P Content-Type: text/plain; charset=us-ascii On Mon, 14 Feb 2005 19:35:36 +0100, Harald Geyer said: > Obviously any script which tries to do the above runs into the same > problem that prevents nmh from doing it itself: The script would need > to know which charsets the terminal can handle and how to tell it. > Also changing the terminal might confuse other programs. > > I guess it would be much easier und less prone to error to just > implement transcoding of messages through iconv instead of trying > to adapt the display on a per message basis. In general, you *can't* do a good job of using iconv to mash things between the various iso8859-* charsets. There *will* be lossage - after all, there is a *reason* they're up to -15, namely that one isn't sufficient. So whichever one you're in, there *will* be lossage for the other 14. On the flip side, it's possible to do lossless conversion *from* any 8859-* into the UTF-8 space. So teaching the code that currently does MM_CHARSET that if the user is in a UTF-8 environ, it should use iconv to convert 8859 to utf-8 is a better solution. And yes, it's possible that the user is in a utf-8 environment, but doesn't have actual font glyphs for all the planes (so, for instance Hebrew or Cyrillic characters don't display). 
This is actually a non-issue, for 2 reasons: 1) If they don't have the Hebrew glyphs installed, there's nothing you could have done anyhow. 2) On the other hand, it's fairly safe to assume that if they're in a UTF-8 locale, that their software has at least enough smarts to put up an "unknown character" box at that position. > I remember the gnus people using big sets of tables to do a mixture > of transcoding and unifying between character sets which led to > messages being split into several parts of different character sets, > when it didn't work correctly. I don't know what had been their reason > to not use iconv. At least in the MULE-ized versions of Emacs and XEmacs, the basic reason for the big sets of tables is because they're using their own internal encoding instead of UTF-mumble (which is also why they couldn't use iconv). As a result, the big tables are visible to you. If it used iconv instead, the big tables are still there - just hidden off in /usr/lib/iconv where you don't usually see them. 
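To illustrate why conversion into UTF-8 loses nothing, here is a small sketch for the simplest case. In ISO-8859-1 every byte value is also the Unicode code point of that character, so no table is needed at all; the other 8859 parts need a per-charset table for the upper half, which is the part iconv carries. The function name latin1_to_utf8 is hypothetical and this code is not from nmh.

```c
#include <stddef.h>

/*
 * Hypothetical helper for illustration: convert inlen bytes of ISO-8859-1
 * text to UTF-8.  out must have room for 2 * inlen + 1 bytes.  Returns the
 * number of bytes written, not counting the terminating NUL.
 */
size_t
latin1_to_utf8 (const unsigned char *in, size_t inlen, unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < inlen; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* ASCII is unchanged. */
            out[n++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte UTF-8 sequence. */
            out[n++] = (unsigned char) (0xC0 | (c >> 6));
            out[n++] = (unsigned char) (0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Every input byte has exactly one output sequence, so nothing is dropped; going the other way, from UTF-8 to a single 8859 charset, has no such guarantee.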
--==_Exmh_1108413274_3236P Content-Type: application/pgp-signature -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (GNU/Linux) Comment: Exmh version 2.5 07/13/2001 iD8DBQFCEQtacC3lWbTT17ARAh5MAJ9bCCdv7S1mN/ekhnnPcnS6v3UETwCgg+LD 5KsPgG3OIFdodEEmjUdqk8U= =w+5u -----END PGP SIGNATURE----- --==_Exmh_1108413274_3236P-- From MAILER-DAEMON Mon Feb 14 16:21:54 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0nfF-0000qZ-D2 for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 16:21:53 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0nf3-0000l4-EQ for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:21:41 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0nev-0000fb-QZ for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:21:35 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0nev-0000f1-Eu for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:21:33 -0500 Received: from [131.130.221.38] (helo=imap.unet.univie.ac.at) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D0nQ2-0003jx-LX for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:06:11 -0500 X-Spam-Flags: <.FIVETEN-SPAMSRC.@imap.unet.univie.ac.at>[80.108.70.152:chello080108070152.13.11.univie.teleweb.at] Received: from chello080108070152.13.11.univie.teleweb.at (Debian-exim@chello080108070152.13.11.univie.teleweb.at [80.108.70.152]) by imap.unet.univie.ac.at (8.12.10/8.12.10) with ESMTP id j1EL5tPC160098 for ; Mon, 14 Feb 2005 22:06:02 +0100 Received: from rldprog (helo=chello080108070152.13.11.univie.teleweb.at) by chello080108070152.13.11.univie.teleweb.at with local-esmtp (Exim 4.34) id 1D0nQ6-0000R6-0k for nmh-workers@nongnu.org; Mon, 14 Feb 2005 22:06:14 +0100 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? 
Mail-Followup-To: nmh-workers@nongnu.org In-reply-to: <200502142034.j1EKYZdc027718@turing-police.cc.vt.edu> References: <13665.1108398325@foxharp.boston.ma.us> <200502142034.j1EKYZdc027718@turing-police.cc.vt.edu> Comments: In-reply-to Valdis.Kletnieks@vt.edu message dated "Mon, 14 Feb 2005 15:34:34 -0500." Date: Mon, 14 Feb 2005 22:06:13 +0100 From: Harald Geyer Message-Id: X-DCC-ZID-Univie-Metrics: mx8 4247; Body=1 Fuz1=1 Fuz2=1 X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 21:21:46 -0000 > > I guess it would be much easier and less prone to error to just > > implement transcoding of messages through iconv instead of trying > > to adapt the display on a per message basis. > > In general, you *can't* do a good job of using iconv to mash things between > the various iso8859-* charsets. There *will* be lossage - after all, there > is a *reason* they're up to -15, namely that one isn't sufficient. So whichever > one you're in, there *will* be lossage for the other 14. > > On the flip side, it's possible to do lossless conversion *from* any 8859-* > into the UTF-8 space. So teaching the code that currently does MM_CHARSET > that if the user is in a UTF-8 environ, it should use iconv to convert 8859 > to utf-8 is a better solution. Actually it is the same solution: If the user is in a UTF-8 environment, you can't/shouldn't convert to iso8859-* anyway. The best solution is to convert to the most powerful charset available - be it lossless or not. > > I remember the gnus people using big sets of tables to do a mixture > > of transcoding and unifying between character sets which led to > > messages being split into several parts of different character sets, > > when it didn't work correctly. I don't know what had been their reason > > to not use iconv. 
> > At least in the MULE-ized versions of Emacs and XEmacs, the basic reason for > the big sets of tables is because they're using their own internal encoding > instead of UTF-mumble (which is also why they couldn't use iconv). I think I didn't use MULE but I guess you are right - it's a long time since I switched to vim and nmh ... Harald From MAILER-DAEMON Mon Feb 14 17:00:47 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0oGt-0005iD-23 for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 17:00:47 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0oGq-0005h9-Ks for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:00:44 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0oGo-0005ff-Nz for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:00:44 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0oC9-00045V-1t for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:55:53 -0500 Received: from [67.98.153.98] (helo=loco.ccr.org) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D0nfm-00050W-EB for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:22:26 -0500 Received: from [67.98.153.113] (113.ccr.org [67.98.153.113]) by loco.ccr.org (Postfix) with ESMTP id 6BB1AEB1; Mon, 14 Feb 2005 16:28:28 -0500 (EST) Message-ID: <42111691.6020309@ccr.org> Date: Mon, 14 Feb 2005 16:22:25 -0500 From: Mike O'Dell User-Agent: Mozilla Thunderbird 1.0RC1 (Macintosh/20041201) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Martin McCormick , nmh-workers@nongnu.org Subject: Re: [Nmh-workers] refile Sometimes totally Shreds a Message References: <200502102029.j1AKTT5P098617@dc.cis.okstate.edu> In-Reply-To: <200502102029.j1AKTT5P098617@dc.cis.okstate.edu> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 22:00:44 -0000 i've never seen the behavior you describe since adopting MH back in 1982 or so. at least in The Good Ol' Days, the refile operation was done with a link-unlink syscall pair and there is essentially no way for that to impact the contents of a file. in fact, i doubt seriously that the contents of the file matter one whit as to whether this happens. you didn't mention which version of FreeBSD you were running or whether you might have the ~/Mail directory mounted via NFS. i know there is considerable angst over filesystem weirdness in the 5.x branch of FreeBSD (hence my server running 4.works for the last several years). the 0xFFs sound suspiciously like the block erasure that's done when file blocks are freed, usually when the link count goes to zero. if i were guessing (which I am), it sounds like there's a race condition between the link and unlink and the inode ref count getting updated incorrectly, thereby triggering an erasure. i believe that all the blocks are cleaned before the blocks are released back to the freelists. i haven't looked at the code in a long time, but given the new locking code in 5.x, it's not impossible to imagine that getting hosed and producing the result you are seeing. again i'm speculating, but if you are running 5.x, you may have found a way to provoke a bug with some regularity, if not precisely reproduce it. in that case, i suggest you raise the issue on the 5.x kernel list. -mo Martin McCormick wrote: > I use nmh-1.0.4 in FreeBSD UNIX and have noticed that the > refile function occasionally eats a message. It moves it from one > folder to another all right, but what ends up in the receiving folder > is a file containing all 0xFF's. > > I have tried to capture a message that triggers this behavior > but it is difficult since most messages do not self-destruct and refile correctly. 
> When one does shred, I can't get it back to experiment with because, > by definition of the problem, it is simply gone. > > Martin McCormick WB5AGZ Stillwater, OK > OSU Information Technology Division Network Operations Group > > > _______________________________________________ > Nmh-workers mailing list > Nmh-workers@nongnu.org > http://lists.nongnu.org/mailman/listinfo/nmh-workers From MAILER-DAEMON Mon Feb 14 17:00:54 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0oH0-0005nV-GX for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 17:00:54 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0oGz-0005mm-8p for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:00:53 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0oGx-0005li-16 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:00:52 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0oCD-00045V-Da for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:55:57 -0500 Received: from [131.130.221.38] (helo=imap.unet.univie.ac.at) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D0nYi-0004ah-WD for nmh-workers@nongnu.org; Mon, 14 Feb 2005 16:15:09 -0500 X-Spam-Flags: <.FIVETEN-SPAMSRC.@imap.unet.univie.ac.at>[80.108.70.152:chello080108070152.13.11.univie.teleweb.at] Received: from chello080108070152.13.11.univie.teleweb.at (Debian-exim@chello080108070152.13.11.univie.teleweb.at [80.108.70.152]) by imap.unet.univie.ac.at (8.12.10/8.12.10) with ESMTP id j1ELEoPC078456; Mon, 14 Feb 2005 22:14:58 +0100 Received: from rldprog (helo=chello080108070152.13.11.univie.teleweb.at) by chello080108070152.13.11.univie.teleweb.at with local-esmtp (Exim 4.34) id 1D0nYj-0000Ri-En; Mon, 14 Feb 2005 22:15:09 +0100 To: Martin McCormick Subject: Re: [Nmh-workers] refile Sometimes totally Shreds a Message Mail-Followup-To: nmh-workers@nongnu.org In-reply-to: <200502102029.j1AKTT5P098617@dc.cis.okstate.edu> 
References: <200502102029.j1AKTT5P098617@dc.cis.okstate.edu> Comments: In-reply-to Martin McCormick message dated "Thu, 10 Feb 2005 14:29:29 -0600." Date: Mon, 14 Feb 2005 22:15:09 +0100 From: Harald Geyer Message-Id: X-DCC-ZID-Univie-Metrics: mx9.univie.ac.at 4247; Body=2 Fuz1=2 Fuz2=2 Cc: nmh-workers@nongnu.org X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 22:00:53 -0000 > I use nmh-1.0.4 in FreeBSD UNIX and have noticed that the > refile function occasionally eats a message. It moves it from one > folder to another all right, but what ends up in the receiving folder > is a file containing all 0xFF's. Sounds like it fails to open it without noticing or something like that. > I have tried to capture a message that triggers this behavior > but it is difficult since most messages do not self-destruct and refile correctly. > When one does shred, I can't get it back to experiment with because, > by definition of the problem, it is simply gone. Doesn't refile leave ",XXXX" files in the original folder, with XXXX the message number? 
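For reference, the link-then-unlink style of refile mentioned earlier in this thread can be sketched as below. This is only an illustration of the technique, not nmh's actual refile code; the function name refile_by_link is made up for the example, and it assumes both folders live on the same filesystem (link() cannot cross filesystems).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from nmh): move a message file by
 * giving it a second name and then removing the first one.  Neither
 * call reads or writes the file's data blocks.
 * Returns 0 on success, -1 on failure.
 */
int refile_by_link(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    /* Create a second directory entry pointing at the same inode. */
    if (link(src, dst) != 0) {
        perror("link");
        return -1;
    }

    /* Remove the old name; the inode survives because dst still refers to it. */
    if (unlink(src) != 0) {
        perror("unlink");
        return -1;  /* the message is now reachable under both names */
    }

    return 0;
}
```

Since nothing here touches the message contents, a file full of 0xFF bytes after such a move would point at the filesystem rather than at the message itself.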
Harald From MAILER-DAEMON Mon Feb 14 17:27:13 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0ogR-0000LJ-5b for mharc-nmh-workers@gnu.org; Mon, 14 Feb 2005 17:27:12 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0ogE-0000Gm-JN for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:26:58 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0ogC-0000EY-7I for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:26:56 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0ogB-0000Dq-7e for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:26:55 -0500 Received: from [64.46.156.66] (helo=colo.heeltoe.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0oRj-0000qv-P9 for nmh-workers@nongnu.org; Mon, 14 Feb 2005 17:11:59 -0500 Received: (qmail 11884 invoked from network); 14 Feb 2005 22:07:20 -0000 Received: from foxharp.ne.client2.attbi.com (HELO mail.foxharp.boston.ma.us) (24.61.85.42) by 64.46.156.66 with SMTP; Mon, 14 Feb 2005 22:07:20 +0000 Received: (qmail 5205 invoked from network); 14 Feb 2005 22:11:32 -0000 Received: from unknown (HELO grass.foxharp.boston.ma.us) (192.168.111.11) by 0 with SMTP; 14 Feb 2005 22:11:32 -0000 Received: (qmail 1128 invoked from network); 14 Feb 2005 22:11:32 -0000 Received: from unknown (HELO foxharp.boston.ma.us) (192.168.111.11) by grass.foxharp.boston.ma.us with SMTP; 14 Feb 2005 22:11:32 -0000 To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? In-reply-to: okiddle's message of Mon, 14 Feb 2005 19:34:35 +0100. 
<31941.1108406075@trentino.logica.co.uk> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <1124.1108419092.1@foxharp.boston.ma.us> Date: Mon, 14 Feb 2005 17:11:32 -0500 Message-ID: <1126.1108419092@foxharp.boston.ma.us> From: Paul Fox X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 14 Feb 2005 22:27:03 -0000 it seems that this patch relies on other local changes you've made to your tree -- i went searching for "get_charset" with google, and came up with a message you sent to this list on january 24 which contained such a routine, in different mime-related patch. :-) paul > > i guess i was thinking of a wrapper for scan or show that took care > > of setting up the locale and charset, either via argument for manually > > choosing, or maybe even by examining the message and then figuring out > > what locale/charset it should probably use, this time. > > It's probably easier to hack the C code. I've had a quick go at > producing something which uses iconv to convert stuff to the native > character set (patch is below). Would be good if you could try this out > and look for ways to improve it. > > I've not thought through what the between_encodings stuff is doing and > if that is affected at all. If this is going to be turned into something > we can commit to CVS, we also need to work out the necessary configure > stuff for iconv. As it is, you may need to fiddle the Makefile to get > this to compile. 
> > Oliver > > Index: h/prototypes.h > =================================================================== > RCS file: /cvsroot/nmh/nmh/h/prototypes.h,v > retrieving revision 1.9 > diff -u -r1.9 prototypes.h > --- h/prototypes.h 27 Jan 2005 16:26:24 -0000 1.9 > +++ h/prototypes.h 14 Feb 2005 18:18:38 -0000 > @@ -61,6 +61,7 @@ > char **getans (char *, struct swit *); > int getanswer (char *); > char **getarguments (char *, int, char **, int); > +char *get_charset(); > char *getcpy (char *); > char *getfolder(int); > int lkclose(int, char*); > Index: sbr/fmt_rfc2047.c > =================================================================== > RCS file: /cvsroot/nmh/nmh/sbr/fmt_rfc2047.c,v > retrieving revision 1.2 > diff -u -r1.2 fmt_rfc2047.c > --- sbr/fmt_rfc2047.c 2 Jul 2002 22:09:14 -0000 1.2 > +++ sbr/fmt_rfc2047.c 14 Feb 2005 18:18:38 -0000 > @@ -10,6 +10,7 @@ > */ > > #include > +#include > > static signed char hexindex[] = { > -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, > @@ -97,6 +98,10 @@ > > /* Check for initial =? */ > if (*p == '=' && p[1] && p[1] == '?' && p[2]) { > + int use_iconv = 0; > + iconv_t cd; > + char *saveq, *convbuf; > + > startofmime = p + 2; > > /* Scan ahead for the next '?' 
character */ > @@ -106,9 +111,14 @@ > if (!*pp) > continue; > > - /* Check if character set is OK */ > - if (!check_charset(startofmime, pp - startofmime)) > - continue; > + /* Check if character set can be handled natively */ > + if (!check_charset(startofmime, pp - startofmime)) { > + use_iconv = 1; > + *pp = '\0'; > + cd = iconv_open(get_charset(), startofmime); > + *pp = '?'; > + if (cd == (iconv_t)-1) continue; > + } > > startofmime = pp + 1; > > @@ -159,6 +169,12 @@ > if (between_encodings) > q -= whitespace; > > + if (use_iconv) { > + saveq = q; > + if (!(q = convbuf = (char *)malloc(endofmime - startofmime))) > + continue; > + } > + > /* Now decode the text */ > if (quoted_printable) { > for (pp = startofmime; pp < endofmime; pp++) { > @@ -218,6 +234,15 @@ > } > } > > + if (use_iconv) { > + size_t inbytes = q - convbuf, outbytes = BUFSIZ; > + char *start = convbuf; > + iconv(cd, &start, &inbytes, &saveq, &outbytes); > + q = saveq; > + iconv_close(cd); > + free(convbuf); > + } > + > /* > * Now that we are done decoding this particular > * encoded word, advance string to trailing '='. 
> > > _______________________________________________ > Nmh-workers mailing list > Nmh-workers@nongnu.org > http://lists.nongnu.org/mailman/listinfo/nmh-workers =--------------------- paul fox, pgf@foxharp.boston.ma.us (arlington, ma, where it's 31.3 degrees) From MAILER-DAEMON Tue Feb 15 05:34:40 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D0zyV-0007z0-Bv for mharc-nmh-workers@gnu.org; Tue, 15 Feb 2005 05:30:37 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D0zy7-0007yV-FG for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:30:13 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D0zpi-0007mI-H5 for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:21:53 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D0zpZ-0007m4-2Q for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:21:23 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0zLz-0002bN-HP for nmh-workers@nongnu.org; Tue, 15 Feb 2005 04:50:47 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-10.tower-36.messagelabs.com!1108461048!12890404!1 X-StarScan-Version: 5.4.8; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 19399 invoked from network); 15 Feb 2005 09:50:48 -0000 Received: from iris.logica.co.uk (158.234.9.163) by server-10.tower-36.messagelabs.com with SMTP; 15 Feb 2005 09:50:48 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with ESMTP id j1F9oi61018726 for ; Tue, 15 Feb 2005 09:50:44 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id 332762FC5A for ; Tue, 15 Feb 2005 10:50:24 +0100 (CET) X-VirusChecked: Checked X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <1126.1108419092@foxharp.boston.ma.us> From: Oliver Kiddle References: 
<1126.1108419092@foxharp.boston.ma.us> To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Date: Tue, 15 Feb 2005 10:50:24 +0100 Message-ID: <6365.1108461024@trentino.logica.co.uk> X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Feb 2005 10:30:22 -0000 Paul Fox wrote: > it seems that this patch relies on other local changes you've > made to your tree -- i went searching for "get_charset" with google, > and came up with a message you sent to this list on january 24 > which contained such a routine, in different mime-related patch. :-) The patch is against what is currently in the CVS repository which includes my patch from last month. If you can check the code out of CVS, it should apply against that. Otherwise, let me know and I'll put a tarball somewhere you can download it from. Having experimented with the patch a little myself, I've found that in a UTF-8 locale it has problems with the PUTS macro in fmt_scan.c. The macro strips out control characters but isgraph() doesn't work with multibyte characters. The following patch is a hack to make it basically work for UTF-8. For a proper fix, we will need to adjust the formatting code to handle multibyte characters properly. The width padding/truncating also breaks with multibyte characters. Does anyone know if the mh interfaces (mh-e etc) rely on the width truncating to avoid overflowing fixed width buffers? 
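To make the isgraph() problem concrete, here is a small standalone sketch of the same copy loop with both tests selectable. It is not the PUTS macro itself; squeeze_copy and keep_byte are names invented for this example. It assumes that isgraph() rejects bytes >= 0x80 (true in the "C" locale and, as far as I know, in glibc's UTF-8 locales) while iscntrl() and isspace() do not claim them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: decide whether a byte is copied through as-is. */
static int keep_byte(unsigned char c, int utf8_hack)
{
    if (utf8_hack)
        return !iscntrl(c) && !isspace(c);  /* lets UTF-8 bytes through */
    return isgraph(c);                      /* drops UTF-8 bytes */
}

/*
 * Copy src into dst (at most dstlen - 1 bytes plus a NUL).  Bytes that
 * are not kept are replaced by a single space, and any control or
 * whitespace bytes that follow are skipped, as in the loop the patch
 * touches.  Returns the number of bytes written, excluding the NUL.
 */
size_t squeeze_copy(char *dst, size_t dstlen, const char *src, int utf8_hack)
{
    const unsigned char *sp = (const unsigned char *)src;
    char *cp = dst;
    char *ep;

    assert(dst != NULL && src != NULL && dstlen > 0);
    ep = dst + dstlen - 1;

    while (*sp && cp < ep) {
        unsigned char c = *sp++;

        if (keep_byte(c, utf8_hack)) {
            *cp++ = (char)c;
        } else {
            while (*sp && (iscntrl(*sp) || isspace(*sp)))
                sp++;
            *cp++ = ' ';
        }
    }
    *cp = '\0';
    return (size_t)(cp - dst);
}
```

With utf8_hack set to 0, each byte of a multibyte character turns into a space; with it set to 1 the bytes survive, but the loop still counts bytes rather than characters, which is why width padding and truncation remain wrong.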
Oliver Index: sbr/fmt_scan.c =================================================================== RCS file: /cvsroot/nmh/nmh/sbr/fmt_scan.c,v retrieving revision 1.13 diff -u -r1.13 fmt_scan.c --- sbr/fmt_scan.c 30 Sep 2003 19:55:12 -0000 1.13 +++ sbr/fmt_scan.c 15 Feb 2005 09:01:35 -0000 @@ -130,7 +130,7 @@ sp++;\ }\ while ((c = (unsigned char) *sp++) && --i >= 0 && cp < ep)\ - if (isgraph(c)) \ + if (!iscntrl(c) && !isspace(c)) \ *cp++ = c;\ else {\ while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\ @@ -148,7 +148,7 @@ while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\ sp++;\ while((c = (unsigned char) *sp++) && cp < ep)\ - if (isgraph(c)) \ + if (!iscntrl(c) && !isspace(c)) \ *cp++ = c;\ else {\ while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\ From MAILER-DAEMON Tue Feb 15 05:38:30 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D1066-0008Fk-Nr for mharc-nmh-workers@gnu.org; Tue, 15 Feb 2005 05:38:27 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D105v-0008FD-3G for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:38:16 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D105n-0008En-CO for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:38:09 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D102n-000860-KE for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:35:03 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D0zoC-0004fR-Ak for nmh-workers@nongnu.org; Tue, 15 Feb 2005 05:19:56 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-20.tower-36.messagelabs.com!1108462794!13970646!1 X-StarScan-Version: 5.4.8; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 22087 invoked from network); 15 Feb 2005 10:19:54 -0000 Received: from iris.logica.co.uk (158.234.9.163) by 
server-20.tower-36.messagelabs.com with SMTP; 15 Feb 2005 10:19:54 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with ESMTP id j1FAJs61006177 for ; Tue, 15 Feb 2005 10:19:54 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id 215FE2FC5D for ; Tue, 15 Feb 2005 11:19:34 +0100 (CET) X-VirusChecked: Checked X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <200502142034.j1EKYZdc027718@turing-police.cc.vt.edu> From: Oliver Kiddle References: <13665.1108398325@foxharp.boston.ma.us> <200502142034.j1EKYZdc027718@turing-police.cc.vt.edu> To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of UTF-encoded headers? Date: Tue, 15 Feb 2005 11:19:34 +0100 Message-ID: <6705.1108462774@trentino.logica.co.uk> X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 15 Feb 2005 10:38:19 -0000 Valdis.Kletnieks@vt.edu wrote: > > In general, you *can't* do a good job of using iconv to mash things between > the various iso8859-* charsets. There *will* be lossage - after all, there > is a *reason* they're up to -15, namely that one isn't sufficient. So whichever > one you're in, there *will* be lossage for the other 14. I'd agree that you can't do a perfect job but would argue that the result of using iconv is better than doing nothing. Even in an iso8859-* locale. I'd prefer to see spaces, question marks or unknown character boxes to the raw quoted-printable/base64 encoding. Many headers in real e-mails will convert without problems. By the way, the patch I posted yesterday currently stops on reaching a character it can't convert. It should be easy to make it insert a '?' or similar. 
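A rough sketch of the '?' substitution idea, kept separate from the patch: when iconv() stops with EILSEQ, skip the offending input character, write a '?', and carry on. The function name convert_lossy is made up for this example, and the EILSEQ behaviour for unconvertible characters is what glibc does; other iconv implementations may substitute a character of their own instead of failing.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/*
 * Hypothetical helper (not the code from the patch): convert the
 * NUL-terminated string `in` from charset `from` to charset `to`,
 * writing at most outlen - 1 bytes plus a NUL into `out`.
 * Returns the number of '?' substitutions made, or -1 if the
 * conversion could not be set up.
 */
int convert_lossy(const char *from, const char *to,
                  const char *in, char *out, size_t outlen)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() wants char ** but only reads */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft;
    int replaced = 0;
    /* Only this exact spelling is recognised in the sketch. */
    int from_utf8 = (strcmp(from, "UTF-8") == 0);

    assert(out != NULL && outlen > 0);
    outleft = outlen - 1;

    cd = iconv_open(to, from);   /* note the order: destination first */
    if (cd == (iconv_t)-1)
        return -1;

    while (inleft > 0) {
        if (iconv(cd, &inp, &inleft, &outp, &outleft) != (size_t)-1)
            break;               /* everything was converted */
        if (errno != EILSEQ || outleft == 0)
            break;               /* output full or truncated input: give up */

        /* Skip one input character: a single byte, or for UTF-8
         * the lead byte plus its continuation bytes (10xxxxxx). */
        inp++;
        inleft--;
        if (from_utf8) {
            while (inleft > 0 && ((unsigned char)*inp & 0xC0) == 0x80) {
                inp++;
                inleft--;
            }
        }
        *outp++ = '?';
        outleft--;
        replaced++;
    }

    *outp = '\0';
    iconv_close(cd);
    return replaced;
}
```

Skipping a whole UTF-8 sequence rather than one byte keeps a single unconvertible character from turning into several question marks.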
GNU iconv can actually do some slightly more intelligent mappings if you append //translit to the destination encoding name. This means that, for example, the euro symbol becomes "EUR". Oliver From MAILER-DAEMON Mon Feb 21 09:52:10 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D3Euu-0006WQ-Lk for mharc-nmh-workers@gnu.org; Mon, 21 Feb 2005 09:52:09 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D3EtQ-0006A4-DO for nmh-workers@nongnu.org; Mon, 21 Feb 2005 09:50:36 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D3Et8-00062X-Eg for nmh-workers@nongnu.org; Mon, 21 Feb 2005 09:50:27 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D3Et7-0005yH-NK for nmh-workers@nongnu.org; Mon, 21 Feb 2005 09:50:17 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D3EOl-0007d7-O6 for nmh-workers@nongnu.org; Mon, 21 Feb 2005 09:18:56 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-5.tower-36.messagelabs.com!1108995532!14236015!1 X-StarScan-Version: 5.4.11; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 10762 invoked from network); 21 Feb 2005 14:18:52 -0000 Received: from iris.logica.co.uk (158.234.9.163) by server-5.tower-36.messagelabs.com with SMTP; 21 Feb 2005 14:18:52 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with ESMTP id j1LEIr61023132 for ; Mon, 21 Feb 2005 14:18:53 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id BF68A30F8A for ; Mon, 21 Feb 2005 15:18:32 +0100 (CET) X-VirusChecked: Checked X-StarScan-Version: 5.1.13; banners=.,-,- From: Oliver Kiddle To: nmh-workers@nongnu.org Date: Mon, 21 Feb 2005 15:18:32 +0100 Message-ID: <20265.1108995512@trentino.logica.co.uk> Subject: 
[Nmh-workers] inc bug X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Feb 2005 14:52:05 -0000 A little while back, I noticed that inc crashes if I have logged in using su from another user. Having investigated, it is the code which reopens the terminal to read the password. When logged in using su, this is still owned by the original user so the fopen fails. It ought to be opening /dev/tty instead of whatever ttyname() returns for stdin. Fix is below. Oliver Index: sbr/getpass.c =================================================================== RCS file: /cvsroot/nmh/nmh/sbr/getpass.c,v retrieving revision 1.4 diff -u -r1.4 getpass.c --- sbr/getpass.c 9 May 2000 21:44:16 -0000 1.4 +++ sbr/getpass.c 21 Feb 2005 14:05:26 -0000 @@ -35,7 +35,7 @@ #include #include -#include /* for ttyname() */ +#include /* for isatty() */ #include "h/mh.h" /* for adios() */ /* We don't use MAX_PASS here because the maximum password length on a remote @@ -52,21 +52,21 @@ { struct termios oterm, term; int ch; - char *p, *ttystring; + char *p; FILE *fout, *fin; static char buf[MAX_PASSWORD_LEN + 1]; + int istty = isatty(fileno(stdin)); /* Find if stdin is connect to a terminal. If so, read directly from * the terminal, and turn off echo. Otherwise read from stdin. 
*/ - if((ttystring = (char *)ttyname(fileno(stdin))) == NULL) { + if (!istty || !(fout = fin = fopen("/dev/tty", "w+"))) { fout = stderr; fin = stdin; } else /* Reading directly from terminal here */ { - fout = fin = fopen(ttystring, "w+"); (void)tcgetattr(fileno(fin), &oterm); term = oterm; /* Save original info */ term.c_lflag &= ~ECHO; @@ -81,7 +81,7 @@ *p++ = ch; *p = '\0'; - if(ttystring != NULL) { + if (istty) { (void)tcsetattr(fileno(fin), TCSANOW, &oterm); rewind(fout); (void)fputc('\n', fout); From MAILER-DAEMON Tue Feb 22 11:12:57 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D3cee-0007JQ-WB for mharc-nmh-workers@gnu.org; Tue, 22 Feb 2005 11:12:57 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D1mcL-0007hz-4D for nmh-workers@nongnu.org; Thu, 17 Feb 2005 09:26:57 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D1mcB-0007cQ-74 for nmh-workers@nongnu.org; Thu, 17 Feb 2005 09:26:51 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D1mc8-0007Zf-IG for nmh-workers@nongnu.org; Thu, 17 Feb 2005 09:26:44 -0500 Received: from [193.61.29.2] (helo=rhea.dcs.bbk.ac.uk) by monty-python.gnu.org with esmtp (Exim 4.34) id 1D1mGk-0005mp-6M for nmh-workers@nongnu.org; Thu, 17 Feb 2005 09:04:38 -0500 Received: from penguin.dcs.bbk.ac.uk by rhea.dcs.bbk.ac.uk (8.8.8+Sun/) id OAA06635; Thu, 17 Feb 2005 14:04:29 GMT Received: from dcs.bbk.ac.uk (mick@localhost) by penguin.dcs.bbk.ac.uk (8.12.8/8.12.8/Submit) with ESMTP id j1HE4UIR019059 for ; Thu, 17 Feb 2005 14:04:30 GMT Message-Id: <200502171404.j1HE4UIR019059@penguin.dcs.bbk.ac.uk> X-Authentication-Warning: penguin.dcs.bbk.ac.uk: mick owned process doing -bs To: nmh-workers@nongnu.org Organization: School of Computer Science & Information Systems, Birkbeck, University of London Postal-Address: Malet Street, London, WC1E 7HX, England X-Face: j`yK\"D[C7([1kA*7qHyH; bs`JSWtI; 
N0O02hsN([X$Z%Bq>wz3*VW*F,G/I=xvz]3SIT2:`|`t; M2?Cz-6+\m`so#Ev@Z6*lGj"8"ACE:}>-^,+W\1%B/W!&~8V`:Djd+ MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <19057.1108649070.1@dcs.bbk.ac.uk> Date: Thu, 17 Feb 2005 14:04:30 +0000 From: Mick Farmer X-Mailman-Approved-At: Tue, 22 Feb 2005 11:12:55 -0500 Subject: [Nmh-workers] Show and OpenOffice X-BeenThere: nmh-workers@nongnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: nmh-workers.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 17 Feb 2005 14:27:02 -0000 Dear NMHers, When someone sends me a Word document (typically as an attachment) I want to read it using OpenOffice. To this end I have the following in my .mh_profile. mhshow-show-application/msword: %pooffice '%F' OpenOffice starts and gets as far as a blank document before failing with the following error. signal 11 (segmentation fault) Needless to say, if I unpack the message using mhstore and use OpenOffice to read the document there's no problem. I'm running nmh-1.0.4-18 on RedHat 9. 
Regards, Mick /"\ \ / Linux Registered X ASCII Ribbon Campaign User #287765 / \ Against HTML Mail From MAILER-DAEMON Tue Feb 22 13:20:33 2005 Received: from mailman by lists.gnu.org with archive (Exim 4.43) id 1D3ee8-0007mY-Mu for mharc-nmh-workers@gnu.org; Tue, 22 Feb 2005 13:20:32 -0500 Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43) id 1D3edl-0007gh-Ot for nmh-workers@nongnu.org; Tue, 22 Feb 2005 13:20:10 -0500 Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43) id 1D3edd-0007c7-Rp for nmh-workers@nongnu.org; Tue, 22 Feb 2005 13:20:04 -0500 Received: from [199.232.76.173] (helo=monty-python.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1D3edd-0007bx-N2 for nmh-workers@nongnu.org; Tue, 22 Feb 2005 13:20:01 -0500 Received: from [193.109.254.211] (helo=mail36.messagelabs.com) by monty-python.gnu.org with smtp (Exim 4.34) id 1D3eNL-0007BV-0n for nmh-workers@nongnu.org; Tue, 22 Feb 2005 13:03:11 -0500 X-VirusChecked: Checked X-Env-Sender: okiddle@yahoo.co.uk X-Msg-Ref: server-3.tower-36.messagelabs.com!1109095389!14245796!1 X-StarScan-Version: 5.4.11; banners=-,-,- X-Originating-IP: [158.234.9.163] Received: (qmail 11976 invoked from network); 22 Feb 2005 18:03:09 -0000 Received: from iris.logica.co.uk (158.234.9.163) by server-3.tower-36.messagelabs.com with SMTP; 22 Feb 2005 18:03:09 -0000 Received: from trentino.logica.co.uk ([158.234.142.59]) by iris.logica.co.uk (8.12.3/8.12.3/Debian -4) with ESMTP id j1MI3861014182 for ; Tue, 22 Feb 2005 18:03:08 GMT Received: from trentino.logica.co.uk (localhost [127.0.0.1]) by trentino.logica.co.uk (Postfix) with ESMTP id 8145832EE7 for ; Tue, 22 Feb 2005 19:02:48 +0100 (CET) X-VirusChecked: Checked X-StarScan-Version: 5.0.7; banners=.,-,- In-reply-to: <31941.1108406075@trentino.logica.co.uk> From: Oliver Kiddle References: <13665.1108398325@foxharp.boston.ma.us> <31941.1108406075@trentino.logica.co.uk> To: nmh-workers@nongnu.org Subject: Re: [Nmh-workers] scan or show of 
 UTF-encoded headers?
Date: Tue, 22 Feb 2005 19:02:48 +0100
Message-ID: <31831.1109095368@trentino.logica.co.uk>
X-BeenThere: nmh-workers@nongnu.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: nmh-workers.nongnu.org
X-List-Received-Date: Tue, 22 Feb 2005 18:20:15 -0000

On 14 Feb, I wrote:
> It's probably easier to hack the C code. I've had a quick go at
> producing something which uses iconv to convert stuff to the native
> character set (patch is below). Would be good if you could try this out
> and look for ways to improve it.

I've now produced something good enough that I'll put it in CVS unless
someone complains first. This now includes configure tests for finding
iconv and a feature to put a question mark in place of any characters
iconv failed to convert.

I've not been able to test the configure changes on many systems so it
would be good if you could have a go at compiling this on any systems
you have access to.

If you try this out from a UTF-8 locale, you're likely to notice that
nmh can't yet handle multibyte characters when truncating/padding
strings to fit a particular width. I've put fixing that on my todo list.

Oliver

Index: configure.in
===================================================================
RCS file: /cvsroot/nmh/nmh/configure.in,v
retrieving revision 1.66
diff -u -r1.66 configure.in
--- configure.in 27 Jan 2005 16:26:24 -0000 1.66
+++ configure.in 22 Feb 2005 17:47:30 -0000
@@ -445,7 +445,7 @@
 AC_CHECK_HEADERS(string.h memory.h stdlib.h unistd.h errno.h fcntl.h \
                  limits.h crypt.h termcap.h termio.h termios.h locale.h \
                  langinfo.h netdb.h sys/param.h sys/time.h sys/utsname.h \
-                 arpa/inet.h arpa/ftp.h)
+                 iconv.h arpa/inet.h arpa/ftp.h)
 
 AC_CACHE_CHECK(POSIX termios, nmh_cv_sys_posix_termios,
@@ -547,6 +547,46 @@
 done
 AC_SUBST(TERMLIB)dnl
 
+dnl ---------------
+dnl CHECK FOR ICONV
+dnl ---------------
+
+dnl Find iconv. It may be in libiconv and may be iconv() or libiconv()
+if test "x$ac_cv_header_iconv_h" = "xyes"; then
+  AC_CHECK_FUNC(iconv, ac_found_iconv=yes, ac_found_iconv=no)
+  if test "x$ac_found_iconv" = "xno"; then
+    AC_CHECK_LIB(iconv, iconv, ac_found_iconv=yes)
+    if test "x$ac_found_iconv" = "xno"; then
+      AC_CHECK_LIB(iconv, libiconv, ac_found_iconv=yes)
+    fi
+    if test "x$ac_found_iconv" != "xno"; then
+      LIBS="-liconv $LIBS"
+    fi
+  fi
+fi
+if test "x$ac_found_iconv" = xyes; then
+  AC_DEFINE(HAVE_ICONV, 1, [Define if you have the iconv() function.])
+fi
+
+dnl Check if iconv uses const in prototype declaration
+if test "x$ac_found_iconv" = "xyes"; then
+  AC_CACHE_CHECK(for iconv declaration, ac_cv_iconv_const,
+    [AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[#include <stdlib.h>
+        #include <iconv.h>]],
+        [[#ifdef __cplusplus
+          "C"
+          #endif
+          #if defined(__STDC__) || defined(__cplusplus)
+          size_t iconv (iconv_t cd, char * *inbuf, size_t *inbytesleft, char * *outbuf, size_t *outbytesleft);
+          #else
+          size_t iconv();
+          #endif]])],
+      [ac_cv_iconv_const=],
+      [ac_cv_iconv_const=const])])
+  AC_DEFINE_UNQUOTED([ICONV_CONST], $ac_cv_iconv_const,
+    [Define as const if the declaration of iconv() needs const.])
+fi
+
 dnl --------------
 dnl CHECK FOR NDBM
 dnl --------------
Index: h/prototypes.h
===================================================================
RCS file: /cvsroot/nmh/nmh/h/prototypes.h,v
retrieving revision 1.9
diff -u -r1.9 prototypes.h
--- h/prototypes.h 27 Jan 2005 16:26:24 -0000 1.9
+++ h/prototypes.h 22 Feb 2005 17:47:30 -0000
@@ -61,6 +61,7 @@
 char **getans (char *, struct swit *);
 int getanswer (char *);
 char **getarguments (char *, int, char **, int);
+char *get_charset();
 char *getcpy (char *);
 char *getfolder(int);
 int lkclose(int, char*);
Index: sbr/fmt_rfc2047.c
===================================================================
RCS file: /cvsroot/nmh/nmh/sbr/fmt_rfc2047.c,v
retrieving revision 1.2
diff -u -r1.2 fmt_rfc2047.c
--- sbr/fmt_rfc2047.c 2 Jul 2002 22:09:14 -0000 1.2
+++ sbr/fmt_rfc2047.c 22 Feb 2005 17:47:30 -0000
@@ -10,6 +10,10 @@
  */
 
 #include <h/mh.h>
+#ifdef HAVE_ICONV
+#  include <iconv.h>
+#  include <errno.h>
+#endif
 
 static signed char hexindex[] = {
     -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
@@ -61,6 +65,12 @@
     int between_encodings = 0; /* are we between two encodings? */
     int equals_pending = 0; /* is there a '=' pending? */
     int whitespace = 0; /* how much whitespace between encodings? */
+#ifdef HAVE_ICONV
+    int use_iconv = 0; /* are we converting encoding with iconv? */
+    iconv_t cd;
+    int fromutf8;
+    char *saveq, *convbuf;
+#endif
 
     if (!str)
         return 0;
@@ -73,6 +83,14 @@
         return 0;
 
     for (p = str, q = dst; *p; p++) {
+
+        /* reset iconv */
+#ifdef HAVE_ICONV
+        if (use_iconv) {
+            iconv_close(cd);
+            use_iconv = 0;
+        }
+#endif
         /*
          * If we had an '=' character pending from
          * last iteration, then add it first.
@@ -106,9 +124,20 @@
             if (!*pp)
                 continue;
 
-            /* Check if character set is OK */
-            if (!check_charset(startofmime, pp - startofmime))
+            /* Check if character set can be handled natively */
+            if (!check_charset(startofmime, pp - startofmime)) {
+#ifdef HAVE_ICONV
+                /* .. it can't. We'll use iconv then. */
+                *pp = '\0';
+                cd = iconv_open(get_charset(), startofmime);
+                fromutf8 = !strcasecmp(startofmime, "UTF-8");
+                *pp = '?';
+                if (cd == (iconv_t)-1) continue;
+                use_iconv = 1;
+#else
                 continue;
+#endif
+            }
 
             startofmime = pp + 1;
 
@@ -159,6 +188,14 @@
             if (between_encodings)
                 q -= whitespace;
 
+#ifdef HAVE_ICONV
+            if (use_iconv) {
+                saveq = q;
+                if (!(q = convbuf = (char *)malloc(endofmime - startofmime)))
+                    continue;
+            }
+#endif
+
             /* Now decode the text */
             if (quoted_printable) {
                 for (pp = startofmime; pp < endofmime; pp++) {
@@ -218,6 +255,35 @@
                 }
             }
 
+#ifdef HAVE_ICONV
+            /* Convert to native character set */
+            if (use_iconv) {
+                size_t inbytes = q - convbuf;
+                size_t outbytes = BUFSIZ;
+                ICONV_CONST char *start = convbuf;
+
+                while (inbytes) {
+                    if (iconv(cd, &start, &inbytes, &saveq, &outbytes) ==
+                            (size_t)-1) {
+                        if (errno != EILSEQ) break;
+                        /* character couldn't be converted. we output a `?'
+                         * and try to carry on which won't work if
+                         * either encoding was stateful */
+                        iconv (cd, 0, 0, &saveq, &outbytes);
+                        *saveq++ = '?';
+                        /* skip to next input character */
+                        if (fromutf8) {
+                            for (start++;(*start & 192) == 128;start++)
+                                inbytes--;
+                        } else
+                            start++, inbytes--;
+                    }
+                }
+                q = saveq;
+                free(convbuf);
+            }
+#endif
+
             /*
              * Now that we are done decoding this particular
              * encoded word, advance string to trailing '='.
@@ -229,6 +295,9 @@
             whitespace = 0; /* re-initialize amount of whitespace */
         }
     }
+#ifdef HAVE_ICONV
+    if (use_iconv) iconv_close(cd);
+#endif
 
     /* If an equals was pending at end of string, add it now. */
     if (equals_pending)
Index: sbr/fmt_scan.c
===================================================================
RCS file: /cvsroot/nmh/nmh/sbr/fmt_scan.c,v
retrieving revision 1.13
diff -u -r1.13 fmt_scan.c
--- sbr/fmt_scan.c 30 Sep 2003 19:55:12 -0000 1.13
+++ sbr/fmt_scan.c 22 Feb 2005 17:47:30 -0000
@@ -130,7 +130,7 @@
             sp++;\
         }\
         while ((c = (unsigned char) *sp++) && --i >= 0 && cp < ep)\
-            if (isgraph(c)) \
+            if (!iscntrl(c) && !isspace(c)) \
                 *cp++ = c;\
             else {\
                 while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\
@@ -148,7 +148,7 @@
         while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\
             sp++;\
         while((c = (unsigned char) *sp++) && cp < ep)\
-            if (isgraph(c)) \
+            if (!iscntrl(c) && !isspace(c)) \
                 *cp++ = c;\
             else {\
                 while ((c = (unsigned char) *sp) && (iscntrl(c) || isspace(c)))\
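
A note on the "skip to next input character" step in the EILSEQ branch of the fmt_rfc2047.c hunk: when the source charset is UTF-8, the patch steps over the lead byte and then over every following byte whose top two bits are 10 (a continuation byte). The sketch below isolates that step so it can be tried on its own. It is not part of the patch: skip_utf8_char() is a hypothetical helper name, and it makes two assumptions the posted loop does not spell out, namely that every skipped byte (including the lead byte) should be subtracted from the remaining count, and that the scan should stop at the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch: skip one undecodable
 * character in a UTF-8 buffer. Advances *s past the first byte and any
 * continuation bytes (10xxxxxx) that follow it, never looking beyond
 * *remaining bytes, and decrements *remaining once per byte skipped.
 * Returns the number of bytes skipped.
 */
size_t
skip_utf8_char(const char **s, size_t *remaining)
{
    size_t skipped = 0;

    assert(s != NULL && *s != NULL && remaining != NULL);

    if (*remaining == 0)
        return 0;

    /* The first byte (lead byte or stray byte) is always consumed. */
    (*s)++;
    (*remaining)--;
    skipped++;

    /* Continuation bytes have their top two bits set to 10. */
    while (*remaining > 0 && ((unsigned char) **s & 0xC0) == 0x80) {
        (*s)++;
        (*remaining)--;
        skipped++;
    }
    return skipped;
}
```

In the conversion loop this would stand in for the fromutf8 branch, roughly as skip_utf8_char(&start, &inbytes), with a cast if ICONV_CONST is empty. Whether the lead byte should also be counted against inbytes in the posted loop is worth a second look; as written there, only the continuation bytes seem to be subtracted.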