
[pdf-devel] Sources organization


From: jemarch
Subject: [pdf-devel] Sources organization
Date: Mon, 21 Jan 2008 17:33:02 +0100
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/22.0.93 (x86_64-unknown-linux-gnu) MULE/5.0 (SAKAKI)

Hi.

Due to the new library architecture documented in
http://www.gnupdf.org/Lib:Architecture we should decide what kind of
sources organization we are going to use. So I would like to discuss
some points.

1. Underscores vs. hyphens in file names

   We already discussed this issue on this list. Following Behdah's
   and Karl's advice we are going to use hyphens instead of
   underscores in source file names.

2. One library or Four libraries?

   The envisioned architecture for libgnupdf is a layered one,
   featuring four distinct layers (from top to bottom):

   - Page Layer

     This layer implements several abstractions that represent the
     contents of a page in a PDF document: text, lines, arcs, bitmaps,
     etc. This layer also provides rasterized bitmaps with page
     contents, using some graphics library. An API is provided to both
     read and write page contents.

     The interaction with the graphics library (libcairo) will be
     implemented in this layer.

   - Document Layer

     This layer implements the concept of PDF documents as a
     collection of pages, annotations, fonts, sounds, 3d artwork,
     discussion threads, forms, etc. It is implemented on top of the
     object layer. An API is provided to manipulate those
     abstractions.

   - Object Layer

     This layer implements the concepts of PDF objects and PDF
     documents as a structured hierarchy of objects. An API is
     provided to manipulate that structure and the objects that
     compose it. Since the object hierarchy can be quite complex, a
     garbage collection mechanism is provided to the client of the
     layer.

   - Base Layer

     This layer implements basic functionality such as memory
     allocation, fixed-point arithmetic, interpolation functions,
     geometry routines, character encoding and access to the
     filesystem. The base layer is responsible for providing common
     system-independent abstractions to other parts of the library.
      
   There are several alternatives for organizing the library:

   2.a) One library with conditional compilation of some layers

        In this scheme a single library called libgnupdf.(so|a) would
        be constructed. The user would select the layers to be
        included in the library using a `configure' switch like:
        --with-pdf-level=(page|document|object|base). 

        Then if the `document' level is selected the library will
        contain the code of the base, object and document layers.

        In this way we would achieve flexibility: a client application
        may want to manipulate the contents of a PDF document in order
        to extract text or metadata (like libextractor). The `page'
        layer is not needed in this scenario (nor is the dependency on
        libcairo, for example). That client application would want to
        link against a libgnupdf compiled with
        `--with-pdf-level=document'.

        The library would then distribute a single `pdf.h' file
        granting access to the functionality of all the enabled
        layers.
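
        As a rough sketch only, such a `pdf.h' could guard each
        layer with preprocessor conditionals. Everything named here
        is an assumption made for illustration: the PDF_LEVEL macro
        stands for whatever `configure' would record, and the
        PDF_LEVEL_*, PDF_HAVE_* and pdf_layer_enabled names do not
        exist in the current sources.

```c
/* Hypothetical fragment of a single pdf.h under option 2.a).
   None of these names exist in the sources yet. */

/* Layer levels, ordered from bottom to top. */
#define PDF_LEVEL_BASE     1
#define PDF_LEVEL_OBJECT   2
#define PDF_LEVEL_DOCUMENT 3
#define PDF_LEVEL_PAGE     4

/* Assumed to be set by configure from --with-pdf-level=...
   Defaults here to the `document' level for the example. */
#ifndef PDF_LEVEL
#define PDF_LEVEL PDF_LEVEL_DOCUMENT
#endif

/* A layer is present when the configured level is at or above it. */
#define PDF_HAVE_BASE     (PDF_LEVEL >= PDF_LEVEL_BASE)
#define PDF_HAVE_OBJECT   (PDF_LEVEL >= PDF_LEVEL_OBJECT)
#define PDF_HAVE_DOCUMENT (PDF_LEVEL >= PDF_LEVEL_DOCUMENT)
#define PDF_HAVE_PAGE     (PDF_LEVEL >= PDF_LEVEL_PAGE)

#if PDF_HAVE_DOCUMENT
/* Declarations of the document layer would go here. */
#endif

#if PDF_HAVE_PAGE
/* Declarations of the page layer would go here. Only this part
   would bring in the libcairo dependency. */
#endif

/* Returns nonzero if the given layer was compiled into the library. */
static int
pdf_layer_enabled (int level)
{
  return level <= PDF_LEVEL;
}
```

        With the `document' level selected, the base, object and
        document parts are visible to the client and the page part
        is left out entirely.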

   2.b) Four libraries with chained dependencies

        An alternative to 2.a) is to build four libraries:

        - libgnupdf-page that depends on
        - libgnupdf-doc that depends on
        - libgnupdf-obj that depends on
        - libgnupdf-base

        Each library would then distribute a header file:
        `pdf_base.h', `pdf_doc.h', `pdf_obj.h' and `pdf_page.h'.
        
3. Single sources directory vs. several sources directories

   There are two possibilities:

   3.a) Single directory containing all the source files

        This is the scheme currently used in src/

   3.b) A directory for each layer

        In this scheme we would have four source directories:
   
        - src/base
        - src/object
        - src/document
        - src/page

4. Symbol Names and File Names

   Since the distinction between layers is quite clear, it would be
   quite useful to carry information about the layer in both file
   names and symbol names.

   In this way files pertaining to the LAYER module would use the
   `pdf-LAYER_LETTER' prefix in their names. Symbols (variables,
   constants, functions and data types) would also use the same
   prefix, with underscores in place of hyphens.

   Several examples:

   - The `pdf-b-stm.h' file would define the `pdf_b_stm_t' data type.
   - The `pdf-o-doc.h' file would define the `pdf_o_doc_t' data type.
   - The `pdf-o-obj.h' file would define the `pdf_o_obj_get_XXX' function.
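
   To make the convention concrete, here is a small sketch of what
   the contents of two such headers might look like. The struct
   fields and the `size' and `type' accessors are invented for
   illustration (the latter fills in the XXX above); only the
   prefixes follow the proposal.

```c
#include <stddef.h>

/* Would live in pdf-b-stm.h: base layer (`b'), stm module.
   The `size' field is a hypothetical placeholder. */
typedef struct pdf_b_stm_s
{
  size_t size;
} pdf_b_stm_t;

static size_t
pdf_b_stm_get_size (const pdf_b_stm_t *stm)
{
  return stm->size;
}

/* Would live in pdf-o-obj.h: object layer (`o'), obj module.
   The `type' field is a hypothetical placeholder. */
typedef struct pdf_o_obj_s
{
  int type;
} pdf_o_obj_t;

/* One possible instance of the pdf_o_obj_get_XXX family. */
static int
pdf_o_obj_get_type (const pdf_o_obj_t *obj)
{
  return obj->type;
}
```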
 
   I see the following alternatives regarding this issue:
  
   4.a) Carry layer information in both file and symbol names
   4.b) Carry layer information in symbol names only
   4.c) Carry layer information in file names only
   4.d) Do not carry layer information in file names or symbol names,
        and use different prefixes only in the presence of a collision.


My thoughts about these points are the following:

- I am not sure about 2.a) vs 2.b)
- I would choose 3.a) only if 2.a)
- I would choose 3.b) only if 2.b)
- I would choose 4.a) 

  That scheme may lead to quite long names (such as
  `pdf_b_stm_f_pred_dealloc') and/or redundant ones (such as
  `pdf-o-obj.h'). But I think these are not serious problems. We would
  need to use different prefixes for `pdf_doc.h' (object level) and
  `pdf_doc.h' (document level) anyway.

So, what do you think about these points? Maybe there is a better
2.c)? Or a more suitable 3.c)? 

It is important to address this issue now: we are close to completing
the base layer architecture and thus to beginning an intense
implementation period.

Thanks.




