Re: [pdf-devel] Proposal of API for the Encoded Text module


From: Aleksander Morgado
Subject: Re: [pdf-devel] Proposal of API for the Encoded Text module
Date: Mon, 28 Jan 2008 15:42:35 +0100
User-agent: Thunderbird 2.0.0.9 (Macintosh/20071031)


> Just one other thing to remember is that PDF Names are either a subset
> of PDFDocEncoding _OR_ they are valid UTF-8 strings.  (See PDFRef 1.7,
> 3.2.4).
>
> Leonard

In fact, I think this is one of the reasons to have built-in UTF-8 support in the library. I suppose that the PDF Name and PDF String types in the `object library' will be based on pdf_text_t from the `base library', which directly supports UTF-8.

Anyway, are you sure that these two encodings are the only ones allowed for PDF Names? In older Acrobat versions PDF Names could be encoded in specific `host encodings', like Shift-JIS or Big Five for Asian languages (PDFRef 1.7, H.3).

If this is the case, how can we detect the encoding used in a PDF Name? Consider, for example, a PDF whose names use a Japanese host encoding and which is read on a US-localized system. So far, the text module API provides a function to detect the best encoding for a given Unicode string, but not one to detect the encoding used in a given multibyte string. Something like the latter may also be needed, but I am not sure it is possible to implement.
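To illustrate the kind of detection I mean, here is a minimal sketch of a partial heuristic: checking whether the bytes of a name form well-formed UTF-8. The function name looks_like_utf8 is hypothetical and is not part of the text module API. Note that this can only rule UTF-8 out: plain ASCII passes trivially, and a host-encoded string may happen to be valid UTF-8 by coincidence, so it does not tell us which host encoding was actually used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of the text module API).
 * Returns true if buf[0..len) is well-formed UTF-8.
 * A true result does NOT prove the data was written as UTF-8. */
bool
looks_like_utf8 (const unsigned char *buf, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = buf[i];
      unsigned long cp;
      size_t extra, k;

      if (c < 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      else if (c >= 0xC2 && c <= 0xDF)
        {
          extra = 1;            /* 2-byte sequence */
          cp = c & 0x1F;
        }
      else if (c >= 0xE0 && c <= 0xEF)
        {
          extra = 2;            /* 3-byte sequence */
          cp = c & 0x0F;
        }
      else if (c >= 0xF0 && c <= 0xF4)
        {
          extra = 3;            /* 4-byte sequence */
          cp = c & 0x07;
        }
      else
        return false;           /* stray continuation or invalid lead byte */

      if (len - i - 1 < extra)
        return false;           /* sequence truncated at end of buffer */

      for (k = 1; k <= extra; k++)
        {
          if ((buf[i + k] & 0xC0) != 0x80)
            return false;       /* expected a continuation byte */
          cp = (cp << 6) | (unsigned long) (buf[i + k] & 0x3F);
        }

      /* Reject overlong forms, UTF-16 surrogates and out-of-range values. */
      if ((extra == 2 && cp < 0x800)
          || (extra == 3 && cp < 0x10000)
          || (cp >= 0xD800 && cp <= 0xDFFF)
          || cp > 0x10FFFF)
        return false;

      i += extra + 1;
    }

  return true;
}
```

For example, the Shift-JIS bytes 0x82 0xA0 are rejected because 0x82 cannot start a UTF-8 sequence, while the UTF-8 bytes 0xE3 0x81 0x82 are accepted. Telling Shift-JIS apart from Big Five would still need some other source of information.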

--
Aleksander