[Gnu-arch-users] Patch Logs vs. character sets

From: Tom Lord
Subject: [Gnu-arch-users] Patch Logs vs. character sets
Date: Tue, 25 May 2004 11:44:01 -0700 (PDT)

Aaron mentioned his belief that patch logs will eventually be UTF-8.

I don't think so -- I think that would be a mistake.


~ Patch log entries can be in any character set and encoding
  form which is a superset of ASCII

~ All header data which arch wants to be parsable will be 
  in ASCII, using Pika escaping and Unicode for non-ASCII
  character data.

~ A header name is any non-empty string containing no ':' and no
  whitespace (meaning no _ascii_ whitespace).   Header names that
  arch cares about will be ASCII.

~ An optional header will be used to specify the encoding form,
  for example:

        Encoding: iso-8859-1

  or:

        Encoding: utf-8

~ Some commands produce as output non-parsed fragments from
  patch logs.   One example is the "--summary" option
  that many commands take (e.g., `tla missing --summary').
  Another example is an automatically constructed ChangeLog.

  Most of these (ChangeLogs being the exception) should infer
  the user's preferred character set from the locale and 
  transcode log message data appropriately.   For example,
  if a log message is encoded in iso-8859-4 but my terminal
  understands utf-8, `tla missing --summary' should recode
  the summary line into utf-8 before printing it.

  (If transcoding isn't possible because the destination set
  can't represent a particular character or because arch
  doesn't know how, then Pika escaping can be used for 
  non-ASCII characters.)

  Log excerpts injected into ChangeLogs should also be automatically
  transcoded, but in that case the target encoding should be taken
  from a comment in the ChangeLog.
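
To make the rules above concrete, here is a rough sketch in Python (not
arch code, and not how tla does or will do it).  Everything named here is
made up for illustration: `is_valid_header_name', `header_value',
`log_encoding' and `transcode' are hypothetical helpers, the "ascii"
default for a log with no Encoding header is my assumption, and so is the
exact-case match on header names.  Python's `backslashreplace' error
handler stands in for Pika escaping; it is NOT the Pika syntax.

```python
import locale

ASCII_WHITESPACE = " \t\n\r\f\v"


def is_valid_header_name(name: str) -> bool:
    """Non-empty, and contains neither ':' nor ASCII whitespace."""
    return bool(name) and not any(
        c == ":" or c in ASCII_WHITESPACE for c in name
    )


def header_value(raw_log: bytes, name: bytes):
    """Return the raw value of the first header called `name`, or None.

    Headers are scanned as bytes: the log's encoding is a superset of
    ASCII, so ':' and newline bytes can be found without decoding.
    Assumption: the headers end at the first blank line.
    """
    for line in raw_log.split(b"\n"):
        if not line.strip():
            break  # blank line: end of headers
        field, sep, value = line.partition(b":")
        if sep and field == name:
            return value.strip()
    return None


def log_encoding(raw_log: bytes, default: str = "ascii") -> str:
    """Value of the optional Encoding header (default is an assumption)."""
    value = header_value(raw_log, b"Encoding")
    return value.decode("ascii") if value is not None else default


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Recode `data` from `source` to `target`, escaping what cannot go."""
    try:
        text = data.decode(source)
    except (LookupError, UnicodeDecodeError):
        # Unknown or mismatched source encoding: keep the ASCII bytes and
        # escape the rest (stand-in for Pika escaping).
        return data.decode("ascii", errors="backslashreplace").encode("ascii")
    # Characters the target cannot represent are escaped as well.
    return text.encode(target, errors="backslashreplace")


if __name__ == "__main__":
    raw = ("Summary: labas r\u012btas\n".encode("iso-8859-4")
           + b"Encoding: iso-8859-4\n\nbody\n")
    target = locale.getpreferredencoding(False)  # from the user's locale
    summary = header_value(raw, b"Summary")
    print(transcode(summary, log_encoding(raw), target))
```

For the ChangeLog case, the same `transcode' call would apply; only the
target would come from a comment in the ChangeLog instead of the locale.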

