bug-gnu-utils

Re: [xliff-comment] XLIFF vs. PO vs. Trolltech


From: Oswald Buddenhagen
Subject: Re: [xliff-comment] XLIFF vs. PO vs. Trolltech
Date: Mon, 19 May 2008 13:44:48 +0200
User-agent: KMail/1.9.9

Hello Asgeir, *,

thanks for the replies.

On Saturday 17 May 2008 06:39:32 Asgeir Frimannsson wrote:
> On Saturday 17 May 2008 03:05:11 am Oswald Buddenhagen wrote:
> > Trolltech is looking into implementing/improving XLIFF support in Qt's
> > Linguist tool chain. Interoperability with PO files is an item, too.
> > This is what I've come up with. Please sanity-check it, so we don't set
> > a faulty de-facto standard in case we go for it. ;)
>
> In terms of PO interoperability and the representation of TS in PO, it
> would probably be wise to discuss this on the GNU gettext mailinglist
> (address@hidden) see http://savannah.gnu.org/projects/gettext/ .
>
OK, on CC now.

> Also, the Translate Toolkit (translate.sf.net) has some existing ts<->po
> converters but I'm not sure what the status of these is.
>
Somewhat rudimentary, it seems after a quick test.

> > - The PO representation guide says that everything should be put into one
> >   <file> element and PO references should be represented as <context
> >   context-type="sourcefile">. This is in accordance with the XLIFF spec
> >   (see "sourcefile" value doc). However, that means that if I create an
> >   .xlf file directly from sources I get a different representation than if
> >   I create a .po file and convert it to .xlf later. I find this
> >   inconsistency not justified, so I think I would opt for the "native"
> >   representation with multiple <file> elements. Only if the PO message has
> >   additional references to other files, sourcefile contexts would be used.
>
> The main issue with representing this as multiple <file> elements is that
> in XLIFF, there is no concept of meta-data above the <file> level.
>
Right. I just mapped the .po file header to a message with an empty source 
coming from a file with no name, i.e., basically doing what .po does.
This is sort of hacky, but OTOH it requires no special support from tools, so I 
expect less trouble from this approach than some more or less arbitrary other 
mapping.
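
To make the mapping concrete, here is a rough Python sketch of it. This is not lconvert code: the function name and the tuple layout are made up for illustration, and the attributes XLIFF requires on <file> (source-language, datatype, ...) are left out for brevity. Each message goes into the <file> of its first reference, further references become sourcefile contexts, and the .po header becomes a message with an empty source in a file with no name.

```python
import xml.etree.ElementTree as ET


def po_entries_to_xliff(header, entries):
    """Build an <xliff> tree with one <file> per source file.

    header  -- the PO header text (the msgstr of the empty msgid)
    entries -- list of (msgid, msgstr, [referenced source files])
    """
    xliff = ET.Element("xliff", version="1.1")

    # The PO header: an empty source in a file with no name.
    head_file = ET.SubElement(xliff, "file", original="")
    head_body = ET.SubElement(head_file, "body")
    unit = ET.SubElement(head_body, "trans-unit", id="header")
    ET.SubElement(unit, "source").text = ""
    ET.SubElement(unit, "target").text = header

    bodies = {}  # source file name -> its <body> element
    for number, (msgid, msgstr, refs) in enumerate(entries, 1):
        primary, extra = refs[0], refs[1:]
        if primary not in bodies:
            file_el = ET.SubElement(xliff, "file", original=primary)
            bodies[primary] = ET.SubElement(file_el, "body")
        unit = ET.SubElement(bodies[primary], "trans-unit", id=str(number))
        ET.SubElement(unit, "source").text = msgid
        ET.SubElement(unit, "target").text = msgstr
        if extra:
            # Only additional references are kept as sourcefile contexts.
            group = ET.SubElement(unit, "context-group", purpose="location")
            for ref in extra:
                ctx = ET.SubElement(group, "context",
                                    {"context-type": "sourcefile"})
                ctx.text = ref
    return xliff
```

The result can be written out with ET.tostring(xliff, encoding="unicode") to compare it against what a .po -> .xlf conversion following the guide produces.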

> We used a single <file> element for representing a PO, as a PO is a single
> file. If e.g. gettext implemented support natively for XLIFF, the data model
> would be very different, as the source would be a set of source-files with
> extracted translatable text, rather than a single resource file.
>
This is basically what I proposed, right?

> (this might be a bit Qt/Trolltech specific from here:)
>
> From what I understand from your mail you are trying to accomplish
> something like
>
> # generates a single .xlf for the project with multiple <file> elements
> lupdate -xlf myproject.pro
>
> # generates a single .po for the project
> lupdate -po myproject.pro
>
> # generates a single .ts for the project
> lupdate -ts myproject.pro
>
> So you are saying that if you take the PO generated above and create an
> XLIFF from it using the representation guide, it will be different from the
> XLIFF created by lupdate directly?
>
Yes.

> If so, I don't see anything wrong with that, as they are technically
> representing two rather different data-models.
>
Yes ... however, one of our aims is having lossless conversion between the 
formats (*) for smooth integration into existing systems (and to simplify 
internal testing :). This should happen as naturally as possible, without 
introducing magic meta data unless unavoidable.

(*) OK, so converting from XLIFF to something else and back to XLIFF is not 
going to work losslessly, but you get the idea. :)

> As a side-note: In some of my work, I've found it more beneficial to
> represent PO files as a hierarchy of <group> elements based on the PO
> references rather than the flat structure we have defined in the PO
> representation guide. This structure gives a much better contextual
> hierarchy for both translators and processing tools. This approach takes
> more processing though, as you have inter-trans-unit references, and the PO
> would have to be fully read before starting to write the XLIFF file.
> However, you might find this representation closer to what you're trying
> to accomplish,
>
Yes.

> although I'm not sure how it matches with the ts <context> element.
>
That's fine - .ts contexts are basically nested into files (well, actually, it 
is not unlikely to have the same context both in a .ui file and in the 
associated .cpp file, but that's not really a tragedy).

> PO:
> #: src/MyDialog.cpp:23 src/MyOtherDialog.cpp:12
> msgid "Hello World"
> msgstr ""
>
> XLIFF representation:
> <group restype='x-directory' resname='src'>
>   <group restype='x-file' resname='MyDialog.cpp'>
>     <trans-unit id='1'>
>       <source>Hello World</source>
>     </trans-unit>
>   </group>
>   <group restype='x-file' resname='MyOtherDialog.cpp'>
>     <trans-unit id='2' translate='no'>
>       <source><ph id='x' xid='1'/></source>
>     </trans-unit>
>   </group>
> </group>
>
Hmm, this approach didn't occur to me, as it basically contradicts the expected 
usage of <file> elements, no? Something to change for XLIFF 2.0?

> > - Gettext's new msgctxt keyword was brought up before. Incidentally, the
> >   <comment> element in Qt's own .ts files maps pretty well to it. There
> >   is no standardized mapping for .xlf yet, though. I would pick up a
> >   previously suggested approach and do it like that:
> >
> >       <trans-unit>
> >         <source>foobar</source>
> >         <target>irgendwas</target>
> >         <context-group purpose="match information">
> >           <context context-type="x-gettext-msgctxt"
> >                    match-mandatory="yes">some context info</context>
> >         </context-group>
> >       </trans-unit>
> >
> >   For plural forms, the context would be attached to the plural group.
> >   The exact value for purpose= is not clear to me - the values suggested
> >   seem to refer to TM only. I think I would simply skip the purpose ...
>
> Translator editors can e.g. display the context to the translator only
> if 'purpose' is set to 'information', and hide it otherwise.
>
Oh, right - I misread the spec. So "information" is definitely correct.

> Similarly, a TM processor can choose to perform additional 'context
> matching' based on the 'match' purpose-value. This would e.g. be useful if
> you had two identical translation units, but with different contexts, and
> the TM processor could automatically match better based on these.
>
Yes, except that I need it to apply not only to the TM processor, but also to
the tool that generates the output for the translator library in the program. I
suppose it won't hurt if I slightly stretch the definition for the linguist
tools, but it seems to me that something formally approved would be cleaner.
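
For illustration, the msgctxt mapping proposed above could be produced and read back roughly like this. It is only a sketch: both function names are invented here, and x-gettext-msgctxt is the proposed context-type, not something the spec defines.

```python
import xml.etree.ElementTree as ET

MSGCTXT_TYPE = "x-gettext-msgctxt"  # proposed value, not standardized


def trans_unit_with_msgctxt(unit_id, source, target, msgctxt=None):
    """Build a <trans-unit>, attaching msgctxt as a mandatory match context."""
    unit = ET.Element("trans-unit", id=unit_id)
    ET.SubElement(unit, "source").text = source
    ET.SubElement(unit, "target").text = target
    if msgctxt is not None:
        group = ET.SubElement(unit, "context-group",
                              purpose="match information")
        ctx = ET.SubElement(group, "context",
                            {"context-type": MSGCTXT_TYPE,
                             "match-mandatory": "yes"})
        ctx.text = msgctxt
    return unit


def msgctxt_of(unit):
    """Return the msgctxt stored in a <trans-unit>, or None if there is none."""
    for ctx in unit.iter("context"):
        if ctx.get("context-type") == MSGCTXT_TYPE:
            return ctx.text
    return None
```

A release tool would then key its lookup on the (msgctxt, source) pair instead of on the source alone, which is the stretch of the 'match' definition mentioned above.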

> > - .ts files know a <context> element. I consider it stronger than
> >   msgctxt: it is not optional; every message is in a context. Therefore I
> >   would map it to nested groups:
> >
> >       <group restype="x-trolltech-ts-context">
> >         <context-group purpose="match information">
> >           <context context-type="x-trolltech-ts-context"
> >                    match-mandatory="yes">the context</context>
> >         </context-group>
> >         <trans-unit .../>
> >       </group>
> >
> >   FWIW, the mapping to PO would be via a magic extracted comment:
> >   #. ts:context <the context>
>
> This sounds sensible to me.
>
Good.
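
The PO side of that mapping is simple enough to sketch in a few lines of Python. The helper names are hypothetical, and the input is assumed to be the list of raw comment lines belonging to one message.

```python
TS_CONTEXT_PREFIX = "#. ts:context "


def context_to_po_comment(context):
    """Encode a .ts context as the magic extracted comment."""
    return TS_CONTEXT_PREFIX + context


def po_comments_to_context(comment_lines):
    """Split the magic comment off a message's comment lines.

    Returns (context, remaining_lines); context is None if no
    ts:context comment is present.
    """
    context = None
    rest = []
    for line in comment_lines:
        if context is None and line.startswith(TS_CONTEXT_PREFIX):
            context = line[len(TS_CONTEXT_PREFIX):].rstrip("\n")
        else:
            rest.append(line)
    return context, rest
```

Ordinary extracted comments pass through untouched, so only the first ts:context line is treated as magic.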

> > - As the repr. guide says, .po files do not encode the (target) language.
> >   Therefore I would add an X-Language: header to the initial msgstr. It
> >   would be implanted and extracted during conversion. When converting from
> >   an .xlf file which does not have a first message that seems to be a .po
> >   file header, a message would be generated and marked with
> >   X-Virgin-Header:; if this header is found on converting back, the message
> >   would be zapped.
>
> Not sure I understand the use-case for this.
>
That's again for the lossless conversion. Simply because .ts needs the target 
language for the same purpose that .po uses the "Plural-Forms:" header - 
unfortunately, no unambiguous reverse mapping is possible.
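
As a sketch of the implant/extract step (plain Python, hypothetical function names, and assuming the header is available as the usual "Key: value" lines of the initial msgstr):

```python
LANGUAGE_KEY = "X-Language:"


def implant_language(header, language):
    """Return the PO header text with an X-Language: line added or replaced."""
    lines = [line for line in header.splitlines()
             if not line.startswith(LANGUAGE_KEY)]
    lines.append(LANGUAGE_KEY + " " + language)
    return "\n".join(lines) + "\n"


def extract_language(header):
    """Return (language, header_without_x_language).

    language is None when the header carries no X-Language: line.
    """
    language = None
    kept = []
    for line in header.splitlines():
        if line.startswith(LANGUAGE_KEY):
            language = line.split(":", 1)[1].strip()
        else:
            kept.append(line)
    stripped = "\n".join(kept) + "\n" if kept else ""
    return language, stripped
```

Implanting and then extracting gives back the original header text, which is the property needed for the round trip.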

> > - Gettext's #| msgid (previous source in fuzzy translation) would be
> >   mapped to <alt-trans> elements as suggested on this list before: Each
> >   previous source is tacked onto a current source. If more previous sources
> >   than current sources exist (plural to singular "downgrade"), the source
> >   gets two alt-trans elements, the second one with an empty target marked
> >   with restype="x-dummy".
> > - Gettext's #| msgctxt would get mapped just like msgctxt, only that the
> >   context-type would be x-gettext-previous-msgctxt.

> > - Contrary to the guide, I would store obsolete messages, marking the
> >   <trans-unit> or, for plurals, the containing <group> with translate="no".
> >   I see no harm in doing this and it yields a more faithful conversion.
> >   The messages would go into a <file> with the imaginary original name
> >   Obsolete_PO_entries.
>
> I'm not sure if we really need to go to this extent. I guess it's more a
> design-question if XLIFF was really meant to be a replacement for all
> features that a format supports, rather than an extraction-format. E.g.
> obsolete entries in PO are a way of storing translations that were used in
> previous versions of the project, but are no longer used (however they may
> pop up in later versions of the project, that's why they are stored). XLIFF
> was not intended to be a storage container for these (I guess TMs replace
> this functionality), and I'm not sure if trying to mold XLIFF into such a
> storage container would break processing tools etc (wrong statistics, word
> counts, file counts etc).
>
Good point. But we need it for the lossless roundtrips again. :)
Luckily, lupdate has an option -noobsolete already - I guess adding that to the 
anticipated lconvert would not be exceedingly hard. :-)

> > - The guide does not specify how to map fuzzy plurals. I guess one should 
> >   require approval of all <trans-unit>s in the <group> for non-fuzziness.
>
> Yes, this is a design-limitation of the current XLIFF specification. This
> approach sounds reasonable to me.
>
OK
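
Spelled out as a small Python check (a sketch only, the function name is made up, and the plural forms are assumed to be the direct <trans-unit> children of the plural <group>):

```python
import xml.etree.ElementTree as ET


def plural_group_is_fuzzy(group):
    """A plural <group> is non-fuzzy only if every <trans-unit> is approved."""
    units = group.findall("trans-unit")
    return not all(unit.get("approved") == "yes" for unit in units)
```

So a single unapproved plural form is enough to make the whole PO message fuzzy on the way back.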

Regards,
-- 
Oswald Buddenhagen

Trolltech GmbH
Rudower Chaussee 13 
12489 Berlin
Germany

Fon:    +49 (030) 6392 3255
Fax:    +49 (030) 6392 3256 

