bug-gnu-utils

Re: Word wrapping in PO


From: Dwayne Bailey
Subject: Re: Word wrapping in PO
Date: Tue, 18 Dec 2007 11:53:22 +0200

On Mon, 2007-12-17 at 12:35 +0100, Bruno Haible wrote:
> Hi,
> 
> Dwayne Bailey wrote:
> > Although in most cases translators do not read the PO format
> > directly but use a PO editor, there are cases where a
> > translator wants to work directly with the format.
> > ...
> > However, in a few cases the way that Gettext currently wraps words in
> > the PO format makes it hard to read in the raw format.
> 
> You can always avoid this wrapping by preprocessing your PO file with
> "msgcat --width=10000".

Unfortunately, that makes editing even harder, as a five-line message
becomes a single line.  Not ideal.
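To make the trade-off concrete, here is a small sketch that uses Python's standard `textwrap` module as a stand-in for gettext's own wrapping. It is not msgcat's algorithm, it counts characters rather than screen columns, and it drops the trailing spaces that PO segments keep; the width of 76 is an arbitrary choice. The Afrikaans msgstr from the diff quoted below spreads over several short lines at a conventional width but collapses to one line at a width of 10000:

```python
import textwrap

# The msgstr from the pippki.dtd.po diff, as one logical string.
msg = (
    "Indien u u meesterwagwoord terugstel, sal al u gestoorde web- en "
    "e-poswagwoorde, vormdata, persoonlike sertifikate en private sleutels "
    "vergeet word. Is u seker dat u die meesterwagwoord wil teruglaai?"
)

# A conventional width gives several short, editable lines; a huge width,
# comparable to "msgcat --width=10000", puts the whole message on one line.
narrow = textwrap.wrap(msg, width=76)
wide = textwrap.wrap(msg, width=10000)

print(len(narrow), "lines at width 76")
print(len(wide), "line at width 10000")
```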

> > The following
> > diff snippet shows how a good wrapping was converted to a much
> > harder-to-read form:
> > 
> > --- manager/chrome/pippki/pippki.dtd.po (revision 9338)
> > +++ manager/chrome/pippki/pippki.dtd.po (working copy)
> > @@ -95,9 +95,9 @@
> >  "form data, personal certificates, and private keys will be forgotten. Are "
> >  "you sure you want to reset your master password?"
> >  msgstr ""
> > -"Indien u u meesterwagwoord terugstel, sal al u gestoorde web- en "
> > -"e-poswagwoorde, vormdata, persoonlike sertifikate en private sleutels "
> > -"vergeet word. Is u seker dat u die meesterwagwoord wil teruglaai?"
> > +"Indien u u meesterwagwoord terugstel, sal al u gestoorde web- en e-"
> > +"poswagwoorde, vormdata, persoonlike sertifikate en private sleutels vergeet "
> > +"word. Is u seker dat u die meesterwagwoord wil teruglaai?"
> > 
> >  #. Values for getpassword.xul
> >  #: getPassword.title
> > 
> > This might not be clear with wrapping in my email.  In summary:
> > 
> > ...web- en "
> > "e-poswagwoorde, ...
> > 
> > becomes
> > 
> > ...web- en e-"
> > "poswagwoorde...
> > 
> > The second is much harder to read.
> 
> I don't see how to do line breaking here that would avoid this special case,
> without using a dictionary-based approach. I don't want a dictionary-based
> line breaking in gettext since gettext has to support many languages, the
> installed size of > 30 dictionaries would be huge, and the line breaking
> feature in gettext is of minor usefulness anyway.
> 
> > Suggested solution: wrap only on spaces and do fancy word breaking only
> > in cases where it is needed.
> 
> Sorry, this does not work for Chinese. The algorithm that gettext uses is
> the Unicode line breaking algorithm [1], chosen because it produces
> acceptable results in most scripts and most languages.

Yes, Unicode works well for most things.  I don't know the Chinese case
well enough to understand the issues, although breaking on Chinese
line-break characters still seems feasible to me, as they lie outside
Latin-1.
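The Latin-1 criterion suggested above can be sketched in a few lines of Python. This is a rough illustration under a stated assumption, not anything gettext does: the hypothetical helper `latin1_fallback_breaks` allows a break between two characters only when both code points lie above U+00FF, which covers runs of Chinese ideographs. It ignores the punctuation rules that UAX #14 enforces (for example, never starting a line with "。"), and it would also permit breaks inside Greek or Cyrillic words, so a real implementation would need to restrict it to specific ranges.

```python
def latin1_fallback_breaks(text: str) -> list[int]:
    """Return the indices i where a break before text[i] is allowed because
    both neighbouring characters lie outside Latin-1 (above U+00FF).

    Hypothetical helper illustrating the criterion; not part of gettext.
    """
    return [
        i
        for i in range(1, len(text))
        if ord(text[i - 1]) > 0xFF and ord(text[i]) > 0xFF
    ]


# Between Chinese ideographs every position qualifies; inside a Latin-1
# word such as "e-poswagwoorde" none does.
print(latin1_fallback_breaks("你好世界"))
print(latin1_fallback_breaks("e-poswagwoorde"))
```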

I'm afraid Unicode line breaking is still not ideal for much of what
goes into PO files.  Here are two more cases I've now seen that make
things rather confusing:

Variables: a directive such as %s can be broken across two lines.
Tags: the algorithm treats it as perfectly permissible to break a tag
such as </b> across lines.

Unfortunately, the Unicode algorithm is very focused on natural-language
text and will break wherever it is allowed to, not always in ideal places.
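One way a wrapper could avoid these two cases is to treat format directives and markup tags as unbreakable spans and reject any candidate break that falls inside one. The sketch below is hypothetical: the helper name `is_safe_break` and both regular expressions are illustrative assumptions (they do not cover every printf or markup form) and are not part of gettext.

```python
import re

# Illustrative "no-break" spans (assumptions, not gettext behaviour):
#   printf-style directives such as %s, %d or %1$s
#   markup tags such as <b> or </b>
PROTECTED = re.compile(r"%(?:\d+\$)?[-+#0]*\d*(?:\.\d+)?[a-zA-Z]|<[^<>]*>")


def is_safe_break(text: str, pos: int) -> bool:
    """Return True if breaking the line before text[pos] would not split
    a protected span."""
    return not any(m.start() < pos < m.end() for m in PROTECTED.finditer(text))


sample = "Click <b>%s</b> to continue"
# Breaking between "</" and "b>" (pos 13) or between "%" and "s" (pos 10)
# is rejected; breaking just before the opening tag (pos 6) is allowed.
print(is_safe_break(sample, 13), is_safe_break(sample, 10), is_safe_break(sample, 6))
```

A line breaker would call such a check on each candidate position and skip the ones it rejects, falling back to the next permitted position.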

But this is a trade-off against implementation simplicity, so it's your
call.  I would still say that breaking on spaces, with a fallback to more
specific breaking for other Unicode character ranges, would produce more
readable results.
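A minimal sketch of that proposal in Python, under stated assumptions: the hypothetical `wrap_po_string` breaks only after spaces and keeps the trailing space on each segment, as the PO segments in the diff do; widths count characters rather than screen columns and ignore the surrounding quotes; and the fallback for a word longer than the width is a plain hard split, standing in for whatever script-specific breaking (for example UAX #14 for Chinese) a real implementation would use.

```python
import re


def wrap_po_string(text: str, width: int = 79) -> list[str]:
    """Split text into PO-style segments, breaking only after spaces.

    Each segment keeps its trailing space. A single word longer than
    `width` is hard-split every `width` characters; this fallback is a
    placeholder for script-specific breaking (an assumption of this sketch).
    """
    segments: list[str] = []
    current = ""
    # Each token is a run of non-spaces plus any spaces that follow it
    # (or a run of leading spaces at the very start of the text).
    for token in re.findall(r"\S+\s*|\s+", text):
        word_len = len(token.rstrip())
        # Start a new segment if this word would overflow the current one.
        if current and len(current) + word_len > width:
            segments.append(current)
            current = ""
        # Fallback for a word that cannot fit on any line by itself.
        while len(token.rstrip()) > width:
            segments.append(token[:width])
            token = token[width:]
        current += token
    if current:
        segments.append(current)
    return segments


msgstr = (
    "Indien u u meesterwagwoord terugstel, sal al u gestoorde web- en "
    "e-poswagwoorde, vormdata, persoonlike sertifikate en private sleutels "
    "vergeet word. Is u seker dat u die meesterwagwoord wil teruglaai?"
)
for segment in wrap_po_string(msgstr, width=70):
    print(f'"{segment}"')
```

At a width of 70 this reproduces the pre-change ("-") wrapping from the diff, because "e-poswagwoorde, " is a single space-delimited token and can only move to the next line as a whole.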

> Bruno
> 
> [1] http://www.unicode.org/reports/tr14/
-- 
Dwayne Bailey
Translate.org.za

+27-12-460-1095 (w)
+27-83-443-7114 (cell)




