[pdf-devel] Sources organization
From: jemarch
Subject: [pdf-devel] Sources organization
Date: Mon, 21 Jan 2008 17:33:02 +0100
User-agent: Wanderlust/2.14.0 (Africa) SEMI/1.14.6 (Maruoka) FLIM/1.14.8 (Shijō) APEL/10.6 Emacs/22.0.93 (x86_64-unknown-linux-gnu) MULE/5.0 (SAKAKI)
Hi.
Given the new library architecture documented in
http://www.gnupdf.org/Lib:Architecture we should decide how we are
going to organize the sources. So I would like to discuss
some points.
1. Underscores vs. hyphens in file names
We already discussed this issue on this list. Following Behdah's
and Karl's advice, we are going to use hyphens instead of
underscores in source file names.
2. One library or four libraries?
The envisioned architecture for libgnupdf is a layered one,
featuring four distinct layers (from top to bottom):
- Page Layer
This layer implements several abstractions that represent the
contents of a page in a PDF document: text, lines, arcs, bitmaps,
etc. This layer also provides rasterized bitmaps with page
contents, using some graphics library. An API is provided to both
read and write page contents.
The interaction with the graphics library (libcairo) will be
implemented in this layer.
- Document Layer
This layer implements the concept of PDF documents as a
collection of pages, annotations, fonts, sounds, 3d artwork,
discussion threads, forms, etc. It is implemented on top of the
object layer. An API is provided to manipulate those
abstractions.
- Object Layer
This layer implements the concepts of PDF objects and PDF
documents as a structured hierarchy of objects. An API is
provided to manipulate that structure and the objects that
compose it. Since the object hierarchy can be quite complex, a
garbage collection mechanism is provided to clients of the
layer.
- Base Layer
This layer implements basic functionality such as memory
allocation, fixed-point arithmetic, interpolation functions,
geometry routines, character encoding and access to the
filesystem. The base layer is responsible for providing common
system-independent abstractions to other parts of the library.
There are several alternatives for organizing the library:
2.a) One library with conditional compilation of some layers
In this scheme a single library called libgnupdf.(so|a) would
be constructed. The user would select the layers to be
included in the library using a `configure' switch like:
--with-pdf-level=(page|document|object|base).
Then if the `document' level is selected the library will
contain the code of the base, object and document layers.
In this way we would achieve flexibility: a client application
may want to manipulate the contents of a PDF document in order
to extract text or metadata (like libextractor). The `page'
layer is not needed in this scenario (nor is the dependency on
libcairo, for example). That client application would want to
link against a libgnupdf compiled with `--with-pdf-level=document'.
The library would then distribute a single `pdf.h' file
granting access to the functionality of all the enabled
layers.
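As a rough sketch of how a single `pdf.h' could gate layers under
2.a): every macro and function name below is hypothetical, and it
assumes `configure' would define PDF_LEVEL from the
--with-pdf-level switch (here it defaults to `document' for
illustration).

```c
#include <assert.h>

/* Hypothetical numeric ranks for the four layers, bottom to top. */
#define PDF_LEVEL_BASE     1
#define PDF_LEVEL_OBJECT   2
#define PDF_LEVEL_DOCUMENT 3
#define PDF_LEVEL_PAGE     4

/* Assumption: configure defines PDF_LEVEL (for example in config.h)
   according to --with-pdf-level.  Default to `document' here.  */
#ifndef PDF_LEVEL
#define PDF_LEVEL PDF_LEVEL_DOCUMENT
#endif

/* A layer is compiled in when the selected level is at or above it.
   The base layer is always present.  */
#define PDF_HAVE_OBJECT   (PDF_LEVEL >= PDF_LEVEL_OBJECT)
#define PDF_HAVE_DOCUMENT (PDF_LEVEL >= PDF_LEVEL_DOCUMENT)
#define PDF_HAVE_PAGE     (PDF_LEVEL >= PDF_LEVEL_PAGE)

#if PDF_HAVE_PAGE
/* Page layer declarations, and with them the libcairo dependency,
   would only be visible inside this block.  */
#endif

/* Run-time query: returns non-zero if LAYER was compiled in.  */
static int
pdf_layer_enabled (int layer)
{
  assert (layer >= PDF_LEVEL_BASE && layer <= PDF_LEVEL_PAGE);
  return PDF_LEVEL >= layer;
}
```

A client that needs the page layer could then test PDF_HAVE_PAGE at
compile time and fail early instead of hitting a link error.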
2.b) Four libraries with chained dependencies
An alternative to 2.a) is to build four libraries:
- libgnupdf-page that depends on
- libgnupdf-doc that depends on
- libgnupdf-obj that depends on
- libgnupdf-base
Each library would then distribute a header file:
`pdf_base.h', `pdf_doc.h', `pdf_obj.h' and `pdf_page.h'.
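To illustrate what the chained dependencies in 2.b) would mean for a
client, here is a small sketch that computes the linker flags needed
for a given library. The library names come from the list above; the
helper `pdf_link_flags' is purely hypothetical and not part of any
build tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Library names from the proposal, ordered top to bottom: each entry
   depends on the one that follows it.  */
static const char *const pdf_libs[] = {
  "gnupdf-page", "gnupdf-doc", "gnupdf-obj", "gnupdf-base"
};
#define PDF_N_LIBS (sizeof pdf_libs / sizeof pdf_libs[0])

/* Hypothetical helper: write into BUF the -l flags needed to link
   against LIB, including everything below it in the chain.
   Returns 0 on success, -1 if LIB is unknown or BUF is too small.  */
static int
pdf_link_flags (const char *lib, char *buf, size_t size)
{
  size_t i, used = 0;
  int found = 0;

  assert (lib != NULL && buf != NULL);
  if (size == 0)
    return -1;
  buf[0] = '\0';

  for (i = 0; i < PDF_N_LIBS; i++)
    {
      int n;

      /* Skip the layers above the requested library.  */
      if (!found && strcmp (lib, pdf_libs[i]) != 0)
        continue;

      n = snprintf (buf + used, size - used, "%s-l%s",
                    found ? " " : "", pdf_libs[i]);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
      found = 1;
    }

  return found ? 0 : -1;
}
```

For "gnupdf-doc" this yields the doc, obj and base libraries but not
the page one, so a metadata extractor would not pull in libcairo.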
3. Single sources directory vs. several sources directories
There are two possibilities:
3.a) Single directory containing all the source files
This is the current scheme, used in src/.
3.b) A directory for each layer
In this scheme we would have four source directories:
- src/base
- src/object
- src/document
- src/page
4. Symbol names and file names
Given the quite clear distinction between layers, it would be
useful to carry information about the layer in both file
names and symbol names.
In this way files pertaining to the LAYER module would use the
`pdf-LAYER_LETTER' prefix in their names. Symbols (variables,
constants, functions and data types) would use the corresponding
`pdf_LAYER_LETTER' prefix.
Several examples:
- The `pdf-b-stm.h' file would define the `pdf_b_stm_t' data type.
- The `pdf-o-doc.h' file would define the `pdf_o_doc_t' data type.
- The `pdf-o-obj.h' file would define the `pdf_o_obj_get_XXX' function.
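A minimal sketch of how these prefixes could look in code. Only the
type names `pdf_b_stm_t' and `pdf_o_doc_t' come from the examples
above; the struct members, the functions, and the `d' letter for the
document layer are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* pdf-b-stm.h (base layer): hypothetical in-memory stream.  */
typedef struct pdf_b_stm_s
{
  const char *data;   /* backing buffer, not owned by the stream */
  size_t size;        /* number of bytes in DATA */
  size_t pos;         /* current read position */
} pdf_b_stm_t;

/* Create a stream that reads from an existing memory buffer.  */
static pdf_b_stm_t
pdf_b_stm_from_mem (const char *data, size_t size)
{
  pdf_b_stm_t stm;

  stm.data = data;
  stm.size = size;
  stm.pos = 0;
  return stm;
}

/* Return the next byte as an unsigned value, or -1 at end of data.  */
static int
pdf_b_stm_read_char (pdf_b_stm_t *stm)
{
  assert (stm != NULL);
  if (stm->pos >= stm->size)
    return -1;
  return (unsigned char) stm->data[stm->pos++];
}

/* pdf-o-doc.h (object layer): a document as a hierarchy of objects.  */
typedef struct pdf_o_doc_s
{
  size_t n_objects;
} pdf_o_doc_t;

/* pdf-d-doc.h (document layer, letter assumed): a document as a
   collection of pages.  The layer letter keeps it from colliding
   with the object-level type.  */
typedef struct pdf_d_doc_s
{
  pdf_o_doc_t *objects;
  size_t n_pages;
} pdf_d_doc_t;
```

The two `doc' types can live in the same translation unit because the
layer letter is part of each name.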
I see the following alternatives regarding this issue:
4.a) Carry layer information in both file and symbol names
4.b) Carry layer information in symbol names only
4.c) Carry layer information in file names only
4.d) Do not carry layer information in file names or in symbol names,
and use different prefixes only in the presence of a collision.
My thoughts about these points are the following:
- I am not sure about 2.a) vs 2.b)
- I would choose 3.a) only if 2.a)
- I would choose 3.b) only if 2.b)
- I would choose 4.a)
That scheme may lead to quite long names (such as
`pdf_b_stm_f_pred_dealloc') and/or redundant ones (such as
`pdf-o-obj.h'). But I think these are not serious problems. We would
need to use different prefixes for `pdf_doc.h' (object level) and
`pdf_doc.h' (document level) anyway.
So, what do you think about these points? Maybe there is a better
2.c)? Or a more suitable 3.c)?
It is important to address this issue now: we are close to completing
the base layer architecture and thus to beginning an intense
implementation period.
Thanks.