Re: [pdf-devel] Updated tokeniser/parser patch

From: Michael Gold
Subject: Re: [pdf-devel] Updated tokeniser/parser patch
Date: Fri, 23 Jan 2009 19:27:47 -0500
User-agent: Mutt/1.5.18 (2008-05-17)

On Fri, Jan 23, 2009 at 23:54:58 +0100, address@hidden wrote:
>    For the basic type, something like pdf_obj_t is needed by the tokeniser
>    anyway even if it's never exported publicly.
> Could we use something like 'pdf_token_t'?

Sure, and parts of the pdf_obj code could be used for that.

But several types of tokens are basically the same as objects (int,
real, string, and name) -- we should keep this in mind when designing
the object layer, and maybe find some way to share code.
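
One way the sharing might look is a common payload type that both the
token and the object embed, so routines such as comparison are written
once. This is only a sketch: every name below (scalar, token, object,
scalar_equal) is hypothetical and not part of the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; none of these are the library's API. */

/* The four value kinds that tokens and objects have in common. */
enum scalar_kind { SCALAR_INT, SCALAR_REAL, SCALAR_STRING, SCALAR_NAME };

/* Payload shared by both layers. */
struct scalar
{
  enum scalar_kind kind;
  union
  {
    long i;                                          /* SCALAR_INT */
    double r;                                        /* SCALAR_REAL */
    struct { const char *data; size_t size; } buf;   /* STRING, NAME */
  } u;
};

/* A token wraps the payload; token-only kinds would be added here. */
struct token { struct scalar value; };

/* An object wraps the same payload; dicts and arrays would be added here. */
struct object { struct scalar value; };

/* One comparison routine serves both layers. Returns 1 if equal, else 0. */
static int
scalar_equal (const struct scalar *a, const struct scalar *b)
{
  if (a->kind != b->kind)
    return 0;
  switch (a->kind)
    {
    case SCALAR_INT:
      return a->u.i == b->u.i;
    case SCALAR_REAL:
      return a->u.r == b->u.r;
    case SCALAR_STRING:
    case SCALAR_NAME:
      return a->u.buf.size == b->u.buf.size
        && memcmp (a->u.buf.data, b->u.buf.data, a->u.buf.size) == 0;
    }
  return 0;
}
```

Whether the payload is embedded like this or converted between layers is
a design choice; embedding avoids a copy but ties the two layouts
together.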

> The more I think about
> moving the tokeniser to the base layer, the more convinced I am that
> it is a good idea:
> - We could use it in the type 4 functions parser in the fp module.
> - We could let users of the library use the tokeniser,
>   publishing it as a module in the base layer: some applications may
>   find it useful.

I was thinking of making it public too, and had the functions exported
in pdf.h with the latest patch.

At some point we'll need code to write tokens to a stream, which would
be another base layer task.
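
As a rough illustration of the writing side, the sketch below formats a
single token into a caller-supplied buffer, which a stream module could
then write out. The type and function names (wtok, wtok_format) are
hypothetical. It covers only integers, reals and names, and it does not
escape special characters in names or handle strings, which a real
writer would need to do.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical token type covering three kinds only. */
enum wtok_kind { WTOK_INT, WTOK_REAL, WTOK_NAME };

struct wtok
{
  enum wtok_kind kind;
  long i;             /* used when kind == WTOK_INT */
  double r;           /* used when kind == WTOK_REAL */
  const char *name;   /* used when kind == WTOK_NAME; NUL-terminated,
                         without the leading slash */
};

/* Format one token into buf.  Returns the number of bytes written
   (excluding the terminating NUL), or -1 if the token does not fit
   or its kind is unknown. */
static int
wtok_format (const struct wtok *t, char *buf, size_t size)
{
  int n;

  switch (t->kind)
    {
    case WTOK_INT:
      n = snprintf (buf, size, "%ld", t->i);
      break;
    case WTOK_REAL:
      /* "%f" never produces an exponent, which PDF reals do not allow.
         Note that the decimal separator depends on the C locale. */
      n = snprintf (buf, size, "%f", t->r);
      break;
    case WTOK_NAME:
      n = snprintf (buf, size, "/%s", t->name);
      break;
    default:
      return -1;
    }

  return (n < 0 || (size_t) n >= size) ? -1 : n;
}
```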

>    The paper gives this example:
>      struct the_struct
>      {
>        int foo;
>        // ...and more fields
>        uintptr_t filler[8];
>      };
> That is like a "fat" opaque pointer :) I think that we can use that
> approach when publishing opaque little structures (such as cartesian
> points, list iterators, etc).

Yes, it would make sense for structures that don't require us to
allocate any memory, and that we don't expect clients to keep around
after exiting a function.

Structures like this probably won't need a destructor either, which
makes things a bit simpler for the client.
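
To make the idea concrete, here is a minimal sketch of such a structure
for a cartesian point. Unlike the quoted example it hides every field
behind the filler. The names point_t, point_impl, point_init, point_x
and point_y are hypothetical. The private layout is copied in and out
with memcpy to stay clear of aliasing problems.

```c
#include <stdint.h>
#include <string.h>

/* Public view: the client sees only storage, so a point can live on
   the stack and be copied by value.  No allocation, no destructor. */
typedef struct { uintptr_t filler[8]; } point_t;

/* Private view, known only to the implementation. */
struct point_impl { double x; double y; };

/* Compile-time check that the private layout fits in the public storage
   (a negative array size is a compile error). */
typedef char point_impl_fits[sizeof (struct point_impl) <= sizeof (point_t)
                             ? 1 : -1];

static void
point_init (point_t *p, double x, double y)
{
  struct point_impl impl;
  impl.x = x;
  impl.y = y;
  memset (p, 0, sizeof *p);
  memcpy (p, &impl, sizeof impl);
}

static double
point_x (const point_t *p)
{
  struct point_impl impl;
  memcpy (&impl, p, sizeof impl);
  return impl.x;
}

static double
point_y (const point_t *p)
{
  struct point_impl impl;
  memcpy (&impl, p, sizeof impl);
  return impl.y;
}
```

A client would declare a point_t as a local variable, call point_init on
it, and simply let it go out of scope when done.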

>    I don't have a problem with it, but it will need something like
>    pdf_obj_t to store these types:
>      int, real, string, name, comment, keyword
>    as well as the valueless types corresponding to
>      "{", "}", "<<", ">>", "[", "]"
>    And the parser will want to put these objects inside dicts and
>    arrays, though it could convert them or wrap them if necessary.
> What about the 'pdf_token_t' that I mentioned above? The parser in the
> object layer would still be able to use pdf_token_t if needed.

It's no problem for the parser to read pdf_token_t objects, but I'm not
sure what type of object it should output. Some component will need to
replace tokens with the objects that will be visible to the client (and
it would be nice not to have to rewrite every dict/array at that time).
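
One possible option, sketched below, is for the parser to convert each
token into an object at the moment it is inserted into a container, so
arrays and dicts never hold tokens and never need a rewrite pass. The
sketch handles only a flat array of integers, and all names (ptok, pobj,
parse_int_array) are hypothetical.

```c
#include <stddef.h>

/* Hypothetical token and object types, reduced to what the sketch needs. */
enum ptok_kind { PTOK_INT, PTOK_ARRAY_START, PTOK_ARRAY_END };
struct ptok { enum ptok_kind kind; long i; };

enum pobj_kind { POBJ_INT };
struct pobj { enum pobj_kind kind; long i; };

/* Parse "[ int ... ]" from toks, storing converted objects in out.
   Returns the number of objects stored, or -1 on malformed input or
   if out is too small. */
static int
parse_int_array (const struct ptok *toks, size_t ntoks,
                 struct pobj *out, size_t cap)
{
  size_t i;
  size_t n = 0;

  if (ntoks < 2 || toks[0].kind != PTOK_ARRAY_START)
    return -1;

  for (i = 1; i < ntoks; i++)
    {
      if (toks[i].kind == PTOK_ARRAY_END)
        return (int) n;
      if (toks[i].kind != PTOK_INT || n == cap)
        return -1;
      /* Convert the token to a client-visible object as it is inserted. */
      out[n].kind = POBJ_INT;
      out[n].i = toks[i].i;
      n++;
    }

  return -1;   /* no closing "]" was seen */
}
```

The cost of this approach is that the parser depends on the object type
directly, which may or may not be acceptable depending on where the
layer boundary ends up.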

-- Michael
