bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird

From: Eli Zaretskii
Subject: bug#35507: Gnus mojibakifies UTF-8 text/x-patch attachments from Thunderbird
Date: Wed, 01 May 2019 20:32:22 +0300

> From: Paul Eggert <address@hidden>
> Date: Tue, 30 Apr 2019 12:20:58 -0700
> Although Internet RFC 2046 section 4.1.2 says the default charset for
> text/* media types is US-ASCII, Internet RFC 6557 section 3 amends this
> to say that registered text/* media types should require a charset
> specification (or should say it's not needed because the payload has
> that info, which obviously doesn't apply here). It later says that if
> there is a strong reason to have a charset default, the default should
> be UTF-8.

(You meant RFC 6657, I believe.)

That's not exactly my reading of the RFC language.  First, it sounds
like the text there is primarily intended for the sending MUA, not for
the receiving MUA.  And second, this text:

     In order to improve interoperability with deployed agents, "text/*"
     media type registrations SHOULD either

     a.  specify that the "charset" parameter is not used for the defined
         subtype, because the charset information is transported inside
         the payload (such as in "text/xml"), or

     b.  require explicit unconditional inclusion of the "charset"
         parameter, eliminating the need for a default value.

     In accordance with option (a) above, registrations for "text/*" media
     types that can transport charset information inside the corresponding
     payloads (such as "text/html" and "text/xml") SHOULD NOT specify the
     use of a "charset" parameter, nor any default value, in order to
     avoid conflicting interpretations should the "charset" parameter
     value and the value specified in the payload disagree.

     Thus, new subtypes of the "text" media type SHOULD NOT define a
     default "charset" value.  If there is a strong reason to do so
     despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
     the default.

     Regardless of what approach is chosen, all new "text/*" registrations
     MUST clearly specify how the charset is determined; relying on the
     default defined in Section 4.1.2 of [RFC2046] is no longer permitted.
     However, existing "text/*" registrations that fail to specify how the
     charset is determined still default to US-ASCII.

seems to say that:

  . it is preferable, for new types of text/* media, not to have any
    default charset, unless there's a strong reason to the contrary

  . all new text/* registrations must specify how the charset is
    determined, and not rely on the default from RFC 2046
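A receiving agent that follows this reading might determine a part's charset roughly as in the sketch below. The two subtype sets are illustrative assumptions, not a copy of the IANA registry. `effective_charset` is a hypothetical helper, not anything in Gnus, and Python's standard `email.message.Message` is used only to parse the Content-Type header.

```python
from email.message import Message
from typing import Optional

# Illustrative subsets only (assumed, not taken from the IANA registry).
# Subtypes whose payload carries its own charset, per RFC 6657 option (a):
CHARSET_IN_PAYLOAD = {"text/html", "text/xml"}
# Older registrations that still fall back to the RFC 2046 default:
LEGACY_US_ASCII_DEFAULT = {"text/plain"}


def effective_charset(part: Message) -> Optional[str]:
    """Return the charset to decode `part` with, or None when the headers
    alone do not determine it (payload inspection or a user choice is needed).
    Hypothetical helper for illustration; not part of Gnus."""
    explicit = part.get_content_charset()  # lowercased "charset" param, or None
    if explicit is not None:
        return explicit
    ctype = part.get_content_type()  # lowercased "maintype/subtype"
    if ctype in CHARSET_IN_PAYLOAD:
        return None  # e.g. read the encoding from an XML declaration instead
    if ctype in LEGACY_US_ASCII_DEFAULT:
        return "us-ascii"  # RFC 2046 section 4.1.2 default still applies
    # Anything else, such as the unregistered text/x-patch: no default assumed.
    return None


part = Message()
part["Content-Type"] = "text/x-patch"
print(effective_charset(part))  # None
```

Returning None instead of guessing keeps the decision explicit: under this reading a subtype like text/x-patch has no default, so the caller has to inspect the data or ask the user.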

Is text/x-patch a "new media type" or not?  If it is not new, then
where is it defined?  I couldn't find it on the IANA site.

If it _is_ "new", my reading of the RFC is that we should not define
or expect any defaults, which means this bug is squarely in
Thunderbird's yard, and we shouldn't change Gnus to arbitrarily assume
UTF-8 as the default.

> I have filed a Thunderbird bug report for this, as Thunderbird should
> specify a charset; see
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1167982>. However, Gnus
> should be a polite citizen and handle these attachments nicely rather
> than converting the non-ASCII UTF-8 characters to mojibake.
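The mojibake described above can be reproduced in a few lines. This sketch assumes the receiver falls back to ISO-8859-1 (Latin-1) for the undeclared part, which may not be what Gnus actually does; `show_as` and `redecode` are hypothetical helpers. Because Latin-1 maps every byte to some character, such a wrong decoding loses nothing and can be reversed. A decoder that substitutes U+FFFD for bytes it cannot handle discards them instead.

```python
def show_as(raw: bytes, charset: str) -> str:
    """Decode raw part bytes the way a receiver assuming `charset` would,
    substituting U+FFFD for any byte the charset cannot represent."""
    return raw.decode(charset, errors="replace")


def redecode(shown: str, assumed: str, actual: str) -> str:
    """Undo a wrong decoding: re-encode with the charset that was assumed,
    then decode the recovered bytes with the correct one.  This only works
    if the wrong decoding was lossless (true for Latin-1)."""
    return shown.encode(assumed).decode(actual)


raw = "naïve résumé".encode("utf-8")  # what the sender actually wrote
shown = show_as(raw, "latin-1")       # receiver guessed Latin-1
print(shown)                          # naÃ¯ve rÃ©sumÃ©
print(redecode(shown, "latin-1", "utf-8"))  # naïve résumé
# With a US-ASCII guess every non-ASCII byte becomes U+FFFD, so the
# original text can only be recovered from the raw bytes, not from `shown`.
```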

Does Gnus have a command to re-decode an already decoded MIME part?
If not, it should.  But other than that, I don't see why we should
change Gnus in this regard, certainly not unconditionally assuming
