
Target native layer


From: Dr. Torsten Rupp
Subject: Target native layer
Date: Tue, 10 Aug 2004 13:30:09 +0200

Dear Classpath-developers,

I have read many emails on the list about the TARGET_* layer. I'm not very happy about the discussion, because we already discussed this around a year ago, and at that time it seemed everybody was happy with the TARGET_* abstraction layer. And of course I'm not happy because there is now a discussion about removing it completely without, imho, understanding the idea behind the layer. After exchanging some private emails I would like to post some of my thoughts in public.

1. Advantages and disadvantages

Advantages of TARGET_*:
- efficient code (independent of the compiler)
- easy to port (just override the macros which differ from the
  generic implementation for the specific target)
- macros are usually small (only 3-5 lines)
- functions can also be used (e. g. for complex native code)
- there is no "dead code"
- no need for extensive ifdef-elif-else-endif constructs
- the target-dependent implementation is located in a single file
  (a header file with macros)

Disadvantages of TARGET_*:
- debugging is more complicated
- not type-safe

Advantages of target-layer functions like do_*():
- debugging is easier
- type-safe like other C code

Disadvantages of target-layer functions do_*():
- less efficient (additional function call)
- "dead code" if some function in an object file is not used
  (a problem with the linker; see comments below)
- code cluttered with ifdef-elif-else-endif constructs
- running autoconf for embedded systems is difficult, thus
  many "hard-wired" predefines are needed
- target-dependent implementations are located in several
  files (configure, header file with predefines, C file)

2. Naming and other "cosmetic" things

The naming convention of the TARGET_* is like the following:

TARGET_NATIVE_<module>_<function>

<module> stands for some group of functions, e. g. file functions.
<function> stands for the name of a function, usually the name of the
corresponding OS function from Linux extended by some suffix, e. g.
OPEN_READ, which stands for open(...,O_RDONLY). There is (or at
least: there should be) no exception, thus some names can become
a little bit long. But there are no "arbitrary" abbreviations
which are difficult to understand.
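To illustrate the convention, here is a minimal sketch of what such a macro could look like (this is my illustration, not the actual Classpath header; the macro name follows the convention, and TARGET_NATIVE_OK/TARGET_NATIVE_ERROR stand in for the real result constants):

```c
#include <fcntl.h>
#include <unistd.h>

#define TARGET_NATIVE_OK    1
#define TARGET_NATIVE_ERROR 0

/* Module FILE, function OPEN_READ: maps to the OS function open()
   with the O_RDONLY flag, following TARGET_NATIVE_<module>_<function>. */
#define TARGET_NATIVE_FILE_OPEN_READ(filename,filedescriptor,result) \
  do { \
    filedescriptor = open(filename, O_RDONLY); \
    result = (filedescriptor >= 0) ? TARGET_NATIVE_OK : TARGET_NATIVE_ERROR; \
  } while (0)
```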

For the math-macros the naming is also like this, e. g.

TARGET_NATIVE_MATH_FLOAT_DOUBLE_ISNAN

module: MATH_FLOAT (floating point macros)
function: DOUBLE_ISNAN (check if a double is NaN)

The prefix TARGET_NATIVE_ is always used to avoid naming conflicts with
existing names in the OS includes. E. g. using only OPEN_READ() would be
dangerous, because OPEN_READ could be an already existing constant,
macro or function in the specific target OS.

OF COURSE.... this naming is _not_ fixed and I do _not_ claim it is the
best possible solution (but it is some solution; before that we had many
conflicts with different OSs because of the so "convenient" short names). If needed this can be changed without too much confusion and pain (it would mean some pain for aicas, of course, but that would be acceptable).

Cosmetics are:
 - length of macro names (I usually never type a macro name;
   instead I use cut+paste. Emacs users can use auto-completion,
   and Eclipse users also get some help from the IDE). By the way:
   the longest name is currently 55 characters long.
 - the prefix TARGET_NATIVE: some other prefix would also be fine
 - length of lines

3. Complexity of macros, debugging

It is true that #defines are difficult to debug. They are also difficult
to write, but of course it always depends on the specific macro. Usually
the macros have the following form:

INCLUDES (placed above the macro in the generic header file)

#define TARGET_NATIVE_<name>(...) \
  do { \
    FUNCTION \
    RESULT \
  } while (0)

e. g.:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define TARGET_NATIVE_FILE_OPEN(filename,filedescriptor,flags,permissions,result) \
  do { \
    filedescriptor=open(filename, \
                        flags, \
                        permissions \
                        ); \
    result=(filedescriptor>=0)?TARGET_NATIVE_OK:TARGET_NATIVE_ERROR; \
  } while (0)

The standard (generic) implementation contains 138 of these macros. Only 9 are more complex, because of different possible implementations (selected by autoconf) or transformations of values. Thus in most cases the macros are only "wrappers" for some OS-specific call, including adaptation of parameters (e. g. types or units, result value).

Imho the complexity of the macros is usually not very high (if a macro becomes complex, a function can be implemented instead; the macro is then only an "alias"). They are multi-lined to make them more readable. The "include" statements are needed in the generic implementation, and the
"do...while" is a construct for safe usage of the macro.
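The "do...while" construct deserves a short illustration (my sketch, with made-up macro names): a multi-statement macro without it is not a single statement, so it misbehaves in an unbraced "if".

```c
/* Unsafe: two statements. Used as "if (cond) SET_RESULT_UNSAFE(v,r); else ...",
   only the first statement would be conditional, and the "else" would
   not even compile. */
#define SET_RESULT_UNSAFE(value,result) value = 0; result = 1

/* Safe: do { ... } while (0) makes the expansion behave like exactly
   one statement, which can be followed by a semicolon or an "else". */
#define SET_RESULT(value,result) \
  do { \
    value = 0; \
    result = 1; \
  } while (0)
```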

Debugging of macros is difficult - if they are complex. If a macro is only a wrapper, then only an OS-specific function is called, plus some additional calculation, e. g. evaluation of the return value. It is a good idea to keep the macros as simple as possible. And it is possible, because the TARGET_* layer does not add additional functionality; it only "maps" existing functionality.

4. autoconf, POSIX - porting

autoconf is a nice tool which is also used heavily at aicas (half of my
time I'm doing "autoconf"). But autoconf is also limited in its usage. For Unix-like systems autoconf is a good solution (that is what it was written for, I assume), but for non-Unix-like systems it can become a problem. We discussed at aicas around a year ago whether we should use autoconf only, but we found that this is not really possible. Especially for embedded systems and "strange" systems (e. g. MinGW or embOS) it is a big challenge to "trim" autoconf so that the right configuration is selected. I ran into some intricacies of autoconf and the specific OSes which make it very difficult to use autoconf only. I will give you a few examples:

- for embedded systems it is not always feasible to check whether a function
exists by compiling and linking a small example program, because sometimes linkage is done only partially on the host. Final linkage is done on the target when loading the program or when creating the system image with the included application. Thus AC_CHECK_FUNC is not feasible. The same problem occurs for other checks, e. g. for constants or datatypes.
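As a sketch of what the "hard-wired predefines" mentioned above could look like (the TARGET_HAVE_* names here are my invention, not the actual Classpath configuration): instead of an autoconf check that cannot run on the target, the per-target header simply states which OS function exists and selects the macro implementation accordingly.

```c
#include <stdio.h>
#include <unistd.h>

/* Hand-set per target, replacing an AC_CHECK_FUNC result that a
   cross-compiled target cannot produce. */
#define TARGET_HAVE_FTRUNCATE 1

#if TARGET_HAVE_FTRUNCATE
  #define TARGET_NATIVE_FILE_TRUNCATE(fd,length,result) \
    do { result = (ftruncate(fd, length) == 0); } while (0)
#else /* a target like MinGW would select chsize() here instead */
  #define TARGET_NATIVE_FILE_TRUNCATE(fd,length,result) \
    do { result = (chsize(fd, length) == 0); } while (0)
#endif
```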

- some systems have very strange and even wrong header files. E. g. for
Windows/MinGW the headers sys/stat.h, io.h, windows.h, winbase.h are
needed for checking chsize() (truncate) or mkdir(). For embOS some
header files cannot even be included, because they are wrong (but cannot
be changed/fixed by aicas). These things make autoconf very complicated to
use: if there are a lot of possible functions which could implement some feature, it is not clear which function autoconf will detect for a
specific system. There can even be very bad
side effects if more than one function is available (e. g. f1() and f2()) and at some point f2() is used instead of f1() (with different behavior or limitations) because of a change made for another target system. E. g. you add some changes for RTEMS, but these also have effects on e. g. embOS; you will not detect this problem until you test all targets again after every change to autoconf. It is a little bit "non-deterministic" which features are detected and whether they are usable.

- some features are not detectable by autoconf at all, e. g. the ordering of
parameters for functions like inb() and outb() (we had that problem), or
additional parameters (which usually only produce a warning, which is
discarded), e. g. gethostbyname_r() under Solaris.
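The inb()/outb() case can be sketched as follows (my illustration with stand-in functions: real inb()/outb() access I/O ports and cannot be demonstrated portably, so two stub functions with swapped parameter order play their role here). The point is that the target header fixes the parameter order once, and callers never see the difference:

```c
#include <stdint.h>

static uint8_t g_port, g_value;

/* Stand-ins for two targets whose out-byte functions take their
   parameters in opposite order. */
static void outb_port_first(uint8_t port, uint8_t value)
{ g_port = port; g_value = value; }

static void outb_value_first(uint8_t value, uint8_t port)
{ g_port = port; g_value = value; }

/* Each target's header resolves the ordering exactly once; callers
   always write TARGET_NATIVE_IO_WRITE8(port, value). */
#ifdef TARGET_USES_VALUE_FIRST
  #define TARGET_NATIVE_IO_WRITE8(port,value) outb_value_first(value, port)
#else
  #define TARGET_NATIVE_IO_WRITE8(port,value) outb_port_first(port, value)
#endif
```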

There are many more difficulties which occur with autoconf. Replacing a target layer (e. g. TARGET_*) by autoconf only will imho make implementations for non-Unix-like systems very difficult and will only shift the so-called "complex" C-macro implementation into "complex"
autoconf-macro implementations (imho M4 is not much better than a
C preprocessor and is difficult to debug).

5. Multiple code - some statistics:

In the current implementation we use at aicas, we have the following
systems. The numbers below count the total number of macros (functions
and constants) which differ from the standard (generic)
implementation:

generic macros: 220

Linux: 0
Solaris: 11
RTEMS: 3
MinGW: 46
embOS: 10 (only partially implemented)

There are 0 (Linux) up to 46 (MinGW) special-case macros. Some have to be implemented because of different OS functions; some are implemented for efficiency (e. g. some OSes offer a POSIX thread interface, but the native thread interface is usually more efficient, and it can be used with the macro technique without any overhead). Thus for some targets 0..20% of the macros have to be reimplemented to cover special cases. For "Unix"-like systems it is usually less than 5%.

Some additional comments:

Efficient code: wrapper functions are nice, but in some cases overkill, e. g. when calling a simple function like sin(). In general C compilers do not optimize away such calls (imho that is one reason why "inline" was introduced). If "inline" can be used, macros are almost not needed anymore.

autoconf: autoconf is a good idea and aicas is also using it heavily, but there are some limitations, especially for embedded systems: because
autoconf cannot run test programs on embedded target systems, some tests
cannot be done with autoconf (see above). I had to replace many
autoconf test functions by special versions which can be used for embedded systems. And still there are many things which are difficult to handle, e. g. which include files have to be included for some native function. Some target systems make it really hard to use autoconf in the right way.

dead code: the standard GNU linker does not remove functions which are not used (dead code). Thus if at least one function is needed from an object file, all other functions from that object file are linked into the application, too. There is only one automatic way to remove dead code (-ffunction-sections), but this has other disadvantages; even the man page does not recommend it. Thus to remove the dead code of a function, some #ifdef-#endif around the function is needed. On the other side: a non-used macro does not produce any dead code.
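The #ifdef-#endif guard mentioned above would look roughly like this (my sketch; TARGET_NEED_FILE_EXISTS and do_file_exists() are invented names): the function is only compiled at all when a target configuration asks for it, so the linker can never drag it in as dead code.

```c
#include <stdio.h>

/* Hypothetical per-target switch: set to 0 for targets that never
   call this function, and no code for it ends up in the object file. */
#define TARGET_NEED_FILE_EXISTS 1

#if TARGET_NEED_FILE_EXISTS
static int do_file_exists(const char *path)
{
  FILE *f = fopen(path, "r");
  if (f != NULL) { fclose(f); return 1; }
  return 0;
}
#endif
```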

Some personal view:

By the way: I like "long" names and I hate uncommon abbreviations, e. g.
"fnctn" instead of "function". I also like prefixes which indicate where something belongs, e. g. "file_open()" instead of "open()". I usually have no problems with long names if the naming is consistent and useful. I also have no problems with lines longer than 80 characters, because my editor does not have an "optimal" line length.



These are my thoughts on this topic. I hope all developers who are interested in a target native layer will reconsider the current discussion. And I hope we will find a solution which can satisfy everybody.

Sincerely,

Torsten




