On Sun, Mar 27, 2011 at 2:24 PM, Aleksander Morgado
<address@hidden> wrote:
Hi Krishnan,
> I am working on implementing
> pdf_status_t
> pdf_fsys_disk_get_parent (const pdf_text_t path_name,
> pdf_text_t parent_path);
> as part of FS#39.
>
> I have a (partial) patch that does the necessary string (ASCII)
> manipulation
> to get the canonicalized path name for the relative path supplied to
> the function.
> I would need some pointers on what unicode utility functions I could
> use to get this patch
> working for unicode filenames as well.
I'll review the patch with more detail in the following days, just a
small comment now regarding unicode strings. The path is always in host
encoding:
* in gnu systems paths will all be UTF-8 compatible, ending in a single
NUL byte, so all standard string utility methods are enough (e.g.
strlen()). If your patch assumes an ASCII-encoded path, it's very
probable that you don't need to do anything specific to support UTF-8.
* in win32, the path may come as wide chars, in UTF-16. Thus, you'll
need to use windows-specific wide-char versions of the same string
utility methods in the implementation (like wcslen()).
Another note regarding the fsys module API. Currently this API uses
pdf_text_t objects as input path arguments. This ends up being far from
optimal. Also, pdf_text_t to host encoding conversion doesn't include
trailing NUL bytes by default, which ends up requiring an additional
realloc() after the conversion to add these end-of-string NUL bytes. I
think its reasonable to change the API so that the fsys API gets as
input always host-encoded strings represented by NUL-terminated
(pdf_char_t *) instead of (pdf_text_t *) (or 2-NUL terminated wide char
strings in windows). Will be discussing this with the list in the
following weeks.
Cheers,
--
Aleksander