Re: plists in UTF8
From: David Ayers
Subject: Re: plists in UTF8
Date: Wed, 14 Jun 2006 14:12:23 +0200
User-agent: Mozilla Thunderbird 1.0.2 (X11/20060423)
Richard Frith-Macdonald wrote:
>
> On 14 Jun 2006, at 11:43, David Wetzel wrote:
>
>> Hi folks,
>>
>> plparse does not work with plists that contain UTF8 Cyrillic chars.
>>
>> Property List Editor.app on Mac OS X does.
>>
>> File says: ru.plist: UTF-8 Unicode C program text
>>
>> May we change this behaviour?
>
>
> Well is it a bug? ... plparse is intended to provide a check that a
> file contains a valid property list ... but it could easily be the case
> that 'Property List Editor.app' will edit invalid property lists (fault
> tolerance makes sense in an editor but not in a checker) ... so what
> you probably need to determine is if there is a bug in plparse.
>
> A valid property list may ...
>
> 1. Be ASCII data (with \U escapes for unicode)
> 2. Be UTF-16 with a leading BOM to identify it
> 3. Be UTF-8 with a leading BOM to identify it
>
> I guess in theory an XML property list could also specify its character
> encoding in the header but we don't have support for that.
>
> Anything else is invalid ... because it's non-portable and the meaning
> of the data in the file would change if you opened the file using
> another locale.
>
> I guess if you want plparse to accept non-portable files (ie guess that
> the encoding is that of the current locale), you could provide a patch
> to add a command-line option to get it to do that.
> eg. plparse -PermitNonPortable YES filename
>
> I don't think that would cause problems for anyone.
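The three accepted forms listed in the quoted message can be sketched as a simple check on the leading bytes. This is only an illustration in Python, not the plparse implementation; the function name and the returned labels are made up for the example.

```python
import codecs

def classify_plist_encoding(data: bytes) -> str:
    """Label raw plist bytes according to the three rules quoted above.

    Hypothetical helper for illustration; the labels are not plparse output.
    """
    # Rule 3: UTF-8 identified by a leading BOM (EF BB BF).
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    # Rule 2: UTF-16 identified by a leading BOM (FF FE or FE FF).
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-bom"
    # Rule 1: pure ASCII (non-ASCII characters would appear as \U escapes).
    if all(byte < 0x80 for byte in data):
        return "ascii"
    # Anything else has no marker, so its meaning depends on the locale.
    return "non-portable"
```

Under these rules a BOM-less UTF-8 file with Cyrillic text, like the ru.plist in the original report, falls into the last category.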
The issue is whether a UTF-8 plist without a BOM is a valid plist (or
should be considered non-portable).

I've often read that BOMs in UTF-8 files cause issues (e.g.:
http://en.wikipedia.org/wiki/Byte_Order_Mark). They become a problem
when multiple text files are concatenated, and someone (I think it was
you) told me that BOMs within files have been deprecated. (I wonder
whether cat(1) or its underlying facilities would be patched to handle
this.)

I think that one could argue that a plain UTF-8 file should be
considered valid/portable by plparse. But for that to be of any value,
UTF-8 files would also have to be parsed correctly in non-UTF-8
locales, which I suppose is the reason that UTF-8 without BOM is
currently considered non-portable.
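
To make the locale point concrete: a strict UTF-8 decode looks only at the bytes, so it gives the same answer in any locale. Below is a minimal sketch in Python, assuming a parser were willing to try UTF-8 on BOM-less input before giving up; the function name is hypothetical and this is not how plparse behaves today.

```python
def decode_if_utf8(data: bytes):
    """Return the decoded text if data is well-formed UTF-8, else None.

    Hypothetical helper: the decode is strict and never consults the locale.
    """
    try:
        return data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # Malformed sequences: the file is in some other encoding.
        return None

# Cyrillic text saved as UTF-8 validates; the same text in KOI8-R does not.
utf8_result = decode_if_utf8("привет".encode("utf-8"))
koi8_result = decode_if_utf8("привет".encode("koi8-r"))
```

Text in a legacy 8-bit encoding usually fails such a check, but not always, so this is a heuristic and not a proof of the file's encoding.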
Cheers,
David