[Gm2] On the gm2 build process and dependencies


From: Martin Hofmann
Subject: [Gm2] On the gm2 build process and dependencies
Date: Mon, 18 Jun 2012 23:33:17 +0200
User-agent: Mozilla/5.0 (X11; FreeBSD i386; rv:12.0) Gecko/20120506 Thunderbird/12.0.1

I've been trying to get my head around what gm2 actually does to build
a program, what gets linked in and why ...

Apart from understanding what goes on, my interest is to tailor the building of executables - from "all statically linked in" to "as much dynamically linked in as possible".

I've written up what I've collected so far, as a reminder to myself and maybe as a reference for discussion. Some questions are included
in the last section below.

There are for sure errors and omissions in the following description, so
I'd be glad if someone "in the know" would point them out ...

Thank you for any comments!

Regards
Martin


*************


Build process of gm2 Modula-2
=============================

What follows is a step-by-step description of the insides of building a simple "hello world" type program. This is mostly derived from the output of the `-v` option during a run of gm2.

1  Compile modules to assembly files
------------------------------------

    cc1gm2: hello.mod --> hello.s

This is the "real" compilation step:

Apart from the symbols for defined Modula-2 variables and procedures of
the module, the `hello.s` will also contain function entry points

    _M2_Hello_init
    _M2_Hello_finish

which make the module's initialization and termination code callable from outside.

Furthermore, a reference to

    __gxx_personality_v0

is included because a pointer to this function is placed into a table
which is used for exception handling.


2  Assemble modules into object files
-------------------------------------

    as: hello.s --> helloprog.o

Nothing special here, this is done by all GCC compilers.

But note that the object file for the main program module is named
`helloprog.o` instead of `hello.o`. (This is of course only relevant if
non-temporary files are created via the `-save-temps` option to gm2.)


3  Collect a list of all modules to be initialized
--------------------------------------------------

    gm2l: hello.mod --> hello.l

This starts from the main program module and collects all modules which
are directly or indirectly imported - including the standard library
modules.

It does scan `.def` and `.mod` files, as long as it finds them.

QUESTION: What if a module `A` imports a module `B` in its
implementation module, but the source `A.mod` is not available?

The resulting text file `hello.l` basically contains one name per line
(without the `.mod` or `.def` extension).

Source is `gm2build/gcc/gm2/gm2l.mod`.
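
To make the collection step concrete, here is a minimal C++ sketch of chasing imports from the program module. It is not taken from `gm2l.mod`; the names `ImportGraph` and `collect_modules` are made up for illustration, and the import table is assumed to be already parsed from the `.def`/`.mod` files.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import table: module name -> modules it imports directly.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Return every module reachable from `root`, in order of first discovery.
std::vector<std::string> collect_modules(const ImportGraph& imports,
                                         const std::string& root) {
    assert(!root.empty());  // a program module name is required
    std::vector<std::string> found{root};
    std::set<std::string> seen{root};
    // `found` doubles as the work queue; it grows while we walk it by index.
    for (std::size_t i = 0; i < found.size(); ++i) {
        const std::string current = found[i];  // copy: push_back may reallocate
        auto it = imports.find(current);
        if (it == imports.end()) continue;  // no source seen: imports unknown
        for (const std::string& dep : it->second) {
            if (seen.insert(dep).second) found.push_back(dep);
        }
    }
    return found;
}
```

A module without an entry in the table is still listed, but nothing behind it is discovered. That is the situation the QUESTION above asks about.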


3  Determine a possible initialization order
--------------------------------------------

    gm2lorder: hello.l --> hello.lst

This reorders the list of modules so that the most "basic" module, which
does not depend on any other module, comes first.

QUESTION: Or does it only care about run-time system modules to be
first, and the other modules are ordered by the search algorithm of
`gm2l`?

Source is `gm2-compiler/gm2lorder.mod`.
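
If the ordering really is a dependency order (which is exactly what the QUESTION above is unsure about), it could be computed with a depth-first walk that emits a module only after everything it imports. The following C++ sketch shows that technique; `ImportGraph`, `visit` and `init_order` are hypothetical names and this is not a transcription of `gm2lorder.mod`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import table: module name -> modules it imports directly.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Append `module` to `order` after everything it imports (post-order DFS).
void visit(const ImportGraph& imports, const std::string& module,
           std::set<std::string>& done, std::vector<std::string>& order) {
    // Already placed, or currently being visited (an import cycle): skip.
    if (!done.insert(module).second) return;
    auto it = imports.find(module);
    if (it != imports.end()) {
        for (const std::string& dep : it->second) {
            visit(imports, dep, done, order);
        }
    }
    order.push_back(module);
}

// Initialization order for a program whose main module is `root`.
std::vector<std::string> init_order(const ImportGraph& imports,
                                    const std::string& root) {
    std::set<std::string> done;
    std::vector<std::string> order;
    visit(imports, root, done, order);
    assert(!order.empty() && order.back() == root);  // program module is last
    return order;
}
```

With this approach the program module always ends up last, which matches the call sequence generated in the next step.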


4  Generate a scaffolding main program
--------------------------------------

    gm2lgen: hello.lst --> hello.cpp

This generates a short C++ program which:

a) provides the entry point function `main()` for C++-like program
   startup (all following actions take place during execution of this
   main function),

b) calls all module initialization functions in the given order,

c) calls the program module initialization function last - this starts
   the Modula-2 program itself,

d) calls all module finalization functions (in reverse order) after the
   Modula-2 program module returns,

e) catches any exceptions thrown (or "raised") during all this
   initialization and finalization business, and gives an appropriate
   final message if so (via `RTExceptions_DefaultErrorCatch()`),

f) or else returns 0 in case of normal termination (via `exit(0)`).

The sequence of function calls to the outside of `hello.cpp` is thus
like this:

    _M2_Storage_init (argc, argv);
    _M2_SYSTEM_init (argc, argv);
    _M2_M2RTS_init (argc, argv);
    _M2_RTExceptions_init (argc, argv);
    _M2_IOLink_init (argc, argv);
    // ... other init functions ...
    M2RTS_ExecuteInitialProcedures (); /* sic, no '_' prefix? */
    _M2_hello_init (argc, argv);

    _M2RTS_ExecuteTerminationProcedures ();
    _M2_hello_finish ();
    // ... other finish functions, reverse order ...
    _M2_Storage_finish ();

Source is `gm2-compiler/gm2lgen.mod`.
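
Put together, the generated file has roughly the following shape. This is a hand-written sketch, not actual `gm2lgen` output. The module entry points are replaced by stand-ins that only record that they ran, their names drop the leading underscore, only two library modules are shown, and the entry point is called `scaffold_main` here where the generated file would define `main()`. The sketch also returns a status instead of calling `exit(0)`.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

static std::vector<std::string> trace;  // records the call order

// Hypothetical stand-ins for the entry points in the Modula-2 objects.
void M2_Storage_init(int, char**) { trace.push_back("Storage.init"); }
void M2_M2RTS_init(int, char**)   { trace.push_back("M2RTS.init"); }
void M2RTS_ExecuteInitialProcedures() { trace.push_back("initial procs"); }
void M2_hello_init(int, char**)   { trace.push_back("hello.init"); }
void M2RTS_ExecuteTerminationProcedures() { trace.push_back("term procs"); }
void M2_hello_finish()            { trace.push_back("hello.finish"); }
void M2_M2RTS_finish()            { trace.push_back("M2RTS.finish"); }
void M2_Storage_finish()          { trace.push_back("Storage.finish"); }

// Shape of the generated entry point; in hello.cpp this would be main().
int scaffold_main(int argc, char** argv) {
    assert(argv != nullptr);  // startup code always passes an argv array
    try {
        // Library modules first, in the order computed by the previous step.
        M2_Storage_init(argc, argv);
        M2_M2RTS_init(argc, argv);
        M2RTS_ExecuteInitialProcedures();
        // The program module comes last: this runs the Modula-2 program.
        M2_hello_init(argc, argv);

        // Finalization in reverse order.
        M2RTS_ExecuteTerminationProcedures();
        M2_hello_finish();
        M2_M2RTS_finish();
        M2_Storage_finish();
    } catch (...) {
        // Stands in for the call to RTExceptions_DefaultErrorCatch().
        trace.push_back("default error catch");
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

The `try`/`catch` around the whole sequence is the reason a C++ compiler and the C++ runtime are involved at all in this step.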


5  Compile the scaffolding main program into an object file
-----------------------------------------------------------

    gm2cc: hello.cpp --> hellostart.o

This uses the C++ compiler `cc1plus` (the `gm2cc` generates the command
line). The resulting object is again not named `hello.o` but
`hellostart.o`.

Of course, this is done again in two steps, compilation and assembly.

QUESTION: Where does `gm2cc` come from?


6  Packing the object files into a library
------------------------------------------

    gm2lcc: hello.lst helloprog.o hellostart.o --> hello.a

I'm not quite sure why this is done, but this seems to collect all the
imported modules and the two modules generated from the program module
into one static library.

Source is `gm2-compiler/gm2lcc.mod`.


7  Linking it all together
--------------------------

This uses `collect2` (as a disguised `ld` command?) to link the stuff in
the static library with the run-time support objects and libraries, and
also the required Modula-2 libraries.

Here is the command line for `collect2`, with comments interspersed:


/usr/home/mh/opt/bin/../libexec/gcc/i386-unknown-freebsd9.0/4.1.2/collect2
    -V -dynamic-linker /libexec/ld-elf.so.1
    -o hello

- Now the objects and libs to include:

      /usr/lib/crt1.o

- `crt1.o` provides the real entry point into the executable, sets up
  `argc`/`argv`, calls the `main()` function (and `_init`, `_fini`?)

      /usr/lib/crti.o

- `crti.o` defines sections `.init` and `.fini`, which contain the
  prologue (initial part) of the `_init` and `_fini` functions,
  respectively. QUESTION: Where are `_init` and `_fini` called? I think
  in `crt1.o`?

  NOTE: These two objects correspond to the system's `libc`, that's why
  they come from `/usr/lib`.

      /usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/crtbegin.o

- `crtbegin.o` starts lists of constructors/destructors for global C++
  objects (`__CTOR_LIST__` and `__DTOR_LIST__`), starts sections
  `.ctors` and `.dtors`. (`collect2` arranges for a list of ctors and
  dtors to be placed in these sections.)

  NOTE: This is concerned with C++, and thus taken from gm2's
  installation.

      -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/gm2/iso
      -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/gm2/pim
      -L/usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2
      -L/usr/home/mh/opt/bin/../lib/gcc
      -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2
      -L/usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/../../..
      -L/home/mh/opt/lib/gcc/i386-unknown-freebsd9.0/4.1.2/../../..

- Lib paths for gm2 and C++ libs.

      hello.a

- Objects for the program's modules, except libraries.

      -lgm2iso
      -lgm2

- Modula-2 libraries - the ISO library needs the basic library.

      -lm

- Math C lib - Modula-2 numerics are implemented on top of it.

      -lstdc++

- For the sake of `hello.cpp`, the scaffolding program, the C++ library
  is needed ...

      -lgcc_eh

- Provides `_Unwind_RaiseException` and other ABI EH functions, also
  `__gcc_personality_v0`, the C "personality" function (the C++ one,
  `__gxx_personality_v0`, lives in `libstdc++`).

      -lgcc_s

- Also provides `_Unwind_RaiseException` and other EH stuff, plus low-
  level arithmetic functions like in `libgcc`, plus threading ...

      -lgcc

- Low-level arithmetic functions to emulate architecture's missing
  capabilities. (Also provides a `__main` function which is called at
  the start of a C++ `main()` function, this function calls all the
  constructors listed in `__CTOR_LIST__`.)

      -lc

- The C lib (from the system, e.g. `/usr/lib`).

      -lgcc_s
      -lgcc

- Don't know why they are mentioned twice.

      /usr/home/mh/opt/bin/../lib/gcc/i386-unknown-freebsd9.0/4.1.2/crtend.o

- Counterpart to `crtbegin.o`, finishes the `.ctors` and `.dtors`
  sections.

      /usr/lib/crtn.o

- Counterpart to `crti.o`, finishes the `_init` and `_fini` functions
  and their sections.
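
The visible effect of the constructor/destructor lists set up by `crtbegin.o` and `crtend.o` can be shown with a few lines of C++. This sketch only demonstrates the language-level behaviour (global objects are constructed before `main()` is entered); how the startup code walks the lists is not shown, and the names `startup_log`, `Announcer` and `check_constructed_before_main` are made up for the example.

```cpp
#include <cassert>
#include <string>

// Function-local static: exists from first use on, so it is safe to call
// from the constructor of a global object.
std::string& startup_log() {
    static std::string log;
    return log;
}

struct Announcer {
    Announcer()  { startup_log() += "ctor;"; }
    ~Announcer() { /* runs after main() returns */ }
};

// Constructed by the startup code before main() is entered.
Announcer global_announcer;

// Call this from main(): the constructor has already run by then.
void check_constructed_before_main() {
    assert(startup_log() == "ctor;");
}
```

A pure Modula-2 program has no such global objects of its own, so as far as I can see only the C++ scaffolding and `libstdc++` could put entries into these lists.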


8  Discussion (and questions ...)
---------------------------------

Apart from compiling source code into object files, the whole build
process is concerned with three (or four?) issues. I wonder if some of
this could be simplified.

1. Program startup (and termination).

   Wouldn't it be nicer and easier to generate a Modula-2 scaffolding
   program containing a `main()` function?

   Could this also avoid the need to link `libstdc++` in?

   Since this scaffolding program would only vary in the list of
   functions to be called (init, main, finish), we could even re-use a
   fixed object module which references just a list of function pointers
   outside in a build-time generated (assembly?) object ...?

2. Initialization and finalization of modules (this is not a problem in
   C, but it is related to C++ global object construction/destruction
   and very similar to Ada's elaboration order issues).

   Could the chasing of imports for an initialization order be deferred
   to runtime, thus making the build process simpler (and the use of
   modules in shared libraries effortless)? I think of a scheme like
   this:

   Two procedures (and two variables) are generated into every module
   along the following lines (shown for a module A which imports B and
   C); the procedures need to be exported. They keep track of a
   reference count, initialize the module the first time it is needed,
   and finalize it once it is no longer needed.

       VAR _M2_importCount : CARDINAL;   (* Assume BSS init to 0 *)
           _M2_isInitializing : BOOLEAN; (* Assume init to FALSE *)

       PROCEDURE _M2_A_import;
       BEGIN
         INC(_M2_importCount);
         IF _M2_importCount = 1 THEN
           (* First time import - initialize now *)
           _M2_isInitializing := TRUE; (* Protect from cyclic init. *)
           _M2_B_import;               (* Need B initialized *)
           _M2_C_import;               (* Need C initialized *)
           _M2_A_init;                 (* Initialize A itself *)
           _M2_isInitializing := FALSE;
         ELSIF _M2_isInitializing THEN
           (* Cyclic dependency - bad thing! *)
           HALT
         END
       END _M2_A_import;

       PROCEDURE _M2_A_release;
       BEGIN
         (* Assert _M2_importCount > 0 *)
         DEC(_M2_importCount);
         IF _M2_importCount = 0 THEN
           (* Last release - finalize now, in reverse order *)
           _M2_A_finish;  (* Finalize A itself *)
           _M2_C_release; (* Don't need C any more *)
           _M2_B_release; (* Don't need B any more *)
         END
       END _M2_A_release;

   This way every module would itself arrange for the initialization of
   its imported modules. (A more elaborate variant would distinguish
   between import of procedures and variables - which needs initialized
   provider modules - and import of types and constants only - which
   doesn't.)

3. Exception handling setup.

   As far as I can tell now (this is rather new stuff to me), gm2 uses
   gcc's "zero exception cost" model throughout. There are roughly three
   components to it:

   a) Tables of frame info in the compiled objects;

   b) language-independent runtime functions like
      `_Unwind_RaiseException`, these reside in `libgcc_eh` (and/or
      other places?);

   c) a language-dependent "personality" function, this gets called
      during stack unwinding and is responsible to find an appropriate
      exception handler in a given frame.

   I'm not sure how much of `libgcc`, `libgcc_eh`, `libgcc_s`,
   `libstdc++` is actually needed to implement this kind of exception
   handling - assuming a "pure" Modula-2 program, not a mixed-language
   beast.

   I'd very much like to get rid of the dependency on both the C++
   compiler and the C++ library ...

4. Threading setup.

   Here gm2 uses the GNU `libpth` library. Can't say much about it now.
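
The reference-counting scheme from point 2 can be tried out without touching the compiler. The following C++ model mirrors the two Modula-2 procedures. It is only a simulation under assumptions: `Module`, `import_module` and `release_module` are hypothetical names, one `Module` object stands for the per-module variables, and "init" and "finish" merely record an event.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

static std::vector<std::string> events;  // records init/finish order

struct Module {
    std::string name;
    std::vector<Module*> imports;  // direct imports, in declaration order
    unsigned importCount;          // corresponds to _M2_importCount
    bool isInitializing;           // corresponds to _M2_isInitializing

    Module(std::string n, std::vector<Module*> deps)
        : name(std::move(n)), imports(std::move(deps)),
          importCount(0), isInitializing(false) {}
};

void import_module(Module& m) {
    ++m.importCount;
    if (m.importCount == 1) {
        // First import: initialize the imports, then the module itself.
        m.isInitializing = true;
        for (Module* dep : m.imports) import_module(*dep);
        events.push_back(m.name + ".init");
        m.isInitializing = false;
    } else if (m.isInitializing) {
        std::abort();  // cyclic dependency, HALT in the Modula-2 version
    }
}

void release_module(Module& m) {
    assert(m.importCount > 0);
    --m.importCount;
    if (m.importCount == 0) {
        // Last release: finalize the module, then release its imports
        // in reverse order.
        events.push_back(m.name + ".finish");
        for (auto it = m.imports.rbegin(); it != m.imports.rend(); ++it) {
            release_module(**it);
        }
    }
}
```

For a module A importing B and C, where C also imports B, a single `import_module(a)` initializes B, C, A in that order, and the matching `release_module(a)` finalizes A, C, B. B is initialized only once because its second import just bumps the counter.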


*********************

So far I have twiddled with linking a sample program against shared

    libstdc++
    libgm2
    libgm2iso

(the gm2 shared libs were cobbled together from the object files in the
`SO` dirs ...).

This kind of worked, except that

- lots of complex number functions (`ccos` and friends) are not in my
  `libm` and generate undefined references - it would be a labor of love
  to reimplement them based on a C90 (not C99) library ...

- the path and name for the `libpth` had to be given explicitly to
  satisfy references to it (from the Modula-2 libraries).

It also seems that exception handling is somewhat brittle in these
circumstances, but this can well be my fault :-)






