bug-gnu-utils

Re: gettext cvs, woe32 dlls


From: Charles Wilson
Subject: Re: gettext cvs, woe32 dlls
Date: Fri, 12 May 2006 16:35:18 -0400
User-agent: Thunderbird 1.5.0.2 (Windows/20060308)

Bruno Haible wrote:

Thank you for your long explanations. I believe that I have committed to
the gettext CVS a solution that, like yours, supports building DLLs on
Cygwin, but also satisfies the following additional goals:

Overall response: very nice solution. I look forward to seeing it. I've got a typically long-winded detailed response below, and at times it may seem overly critical. However, the takeaway message remains: very nice, and I believe it will work for gettext on cygwin and mingw. A similar methodology could be adopted by other C libraries on those platforms. I don't think a similar solution will work, in general, for C++ libraries -- but gnu::autosprintf is coded in such a way that on cygwin, with g++-3.4.4, it "squeaks by".

  A) No source code in .h and .c files change. Only some infrastructure is
     added in separate files and in Makefile.ams and configure.acs.

Nice.

  B) For public libraries, the same .h file is valid regardless whether
     the user will link with the shared or with the static library. No
     STATIC_LIBRARY_FOO flags.

That's a neat trick. <g>

  C) The boundaries between private libraries (here: between libgettextlib
     and libgettextsrc) does not require source code changes. I.e. a
     module can be moved from libgettextsrc to libgettextlib or vice
     versa without source code changes. (This is important because most of
     libgettextlib is shared code from gnulib. It must not carry the name
     of the library into which it gets compiled.) So no
     LIBGETTEXTSRC_DLL_VARIABLE etc. macros.

Yeah, I was kinda worried about that -- I saw the ChangeLog comments concerning some of the files my patch modified: "Reimported from gnulib". I could just imagine a new re-import clobbering my changes...

Let me explain, because some things would be simpler if libtool had the
adequate support for it.

(This lack of meshing well with libtool is why most other packages -- and libtool -- signed on to the auto-import bandwagon)

GNU ld's --enable-auto-import has three fatal drawbacks:
  - It produces executables and shared libraries with relocations in the
    .text segment, defeating the principles of virtual memory.

I agree this is a serious drawback, but only because it means every process has its own true copy of the *client's* .text segment. I'm not sure how this "defeats the principles of virtual memory" unless .bss segments ALSO "defeat" them (.rdata is still "shared", now that it is actually used [circa gcc-3.4 on pei386]). It just means that the *client's* .text segment in each process is backed by separate real memory, instead of one block of memory shared by all of them. Given that the .text segment is usually the largest segment by far, that's bad enough, isn't it?

It means that there is (virtually) NO memory savings in having multiple processes use the same DLL, if that DLL is a client of another and auto-imports data from it. (Recall that for DLLs which do NOT auto-import from somewhere else, their .text is still read-only; see code snippet below.) Now, eventually this is a distinction without a difference: in a large framework like gnome or kde, almost all DLLs are clients of at least one low-level DLL that exports DATA items. In an auto-import regime like cygwin, that means almost all DLLs will have a writable .text, so you lose the *physical* memory advantages of DLLs for *most* of them.

Thus, the only remaining benefit to shared libraries is the ability to slipstream in an updated version without relinking client apps.

But "virtual memory" defeated? There's a lot more to "virtual memory" than simply the mechanics of loading certain code objects from disk! Say rather that auto-import defeats (most of) the _physical_ memory savings expected by users of DSOs on virtual memory systems.


From pe-dll.c:
void pe_create_import_fixup (rel)
{
  ...
  if (!name_thunk_sym || name_thunk_sym->type != bfd_link_hash_defined)
  {
     ...
     /* If we ever use autoimport, we have to cast
        text section writable */
     config.text_read_only=false;
  }
}

And, of course, in the case of your packages, in an auto-import regime both libgettextsrc and libgettextlib force their clients to auto-import data. Thus, the .text segment *of the client app* for all the utilities msg* etc will be writable and subject to the physical memory disadvantages above -- if you were going to simultaneously run a bunch of msg* applications on a windows box!

However, you *might* simultaneously run a bunch of applications that all rely on libintl (like bash shells, for instance). Since libintl (in an auto-import regime) forces ITS clients to auto-import -- then the .text segment of bash.exe will be marked writable, and all those open shells will each incur a physical memory penalty. (Ditto for the libgettextsrc/libgettextlib .text segments, since they are clients of libintl: so the per-process physical memory penalty for all those simultaneously-running msg* applications is bigger than just the msg* app's .text; it also includes the libgettextsrc/libgettextlib .text)

  - For some constructs such as
        extern int var;
        int * const b = &var;
    it creates an executable that will give an error at runtime, rather
    than either a compile-time or link-time error or a working executable.
    (This is with both gcc and g++.) Whereas this code, not relying on
    auto-import:
        extern __declspec (dllimport) int var;
        int * const b = &var;
    gives a compile-time error with gcc and works with g++.

I'm wondering if the magic code in g++ that allows this (necessary for proper initialization of C++ objects) should be re-implemented in gcc -- or moved from the C++ frontend to the pe[i]386 backend -- specifically to enable this to work in both cases. But with PRESENT capabilities of gcc/g++, you're right.
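
To make the construct concrete, here is a minimal sketch of the usual source-level workaround in C: instead of initializing a file-scope const pointer with the address of an imported variable, the address is taken at run time on first use. The DEMO_LINK_SHARED switch, the get_b() helper, and the stand-in definition of var are assumptions added so the file builds by itself; they are not part of gettext.

```c
#include <stddef.h>

#if (defined __CYGWIN__ || defined __MINGW32__) && defined DEMO_LINK_SHARED
/* var lives in a DLL; its address is only known once the DLL is loaded. */
extern __declspec(dllimport) int var;
#else
/* Stand-in definition (assumption) so the sketch links on its own. */
int var = 42;
#endif

/* Replaces `int * const b = &var;` at file scope:
   the address is fetched on first use instead of at link time. */
static int *b_cache = NULL;

int *get_b(void)
{
    if (b_cache == NULL)
        b_cache = &var;
    return b_cache;
}
```

Callers use get_b() wherever they would have used b. The price is one extra branch per access, plus the source change itself.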

  - It doesn't work in some cases (references to a member field of an
    exported struct variable, or to a particular element of an exported
    array variable), requiring code modifications.  One platform dictates
    code modifications on all platforms.

This happens all the time: #if HAS_PROPERTY_FOO ... #else ... #endif is a source code modification that appears in common code seen by all platforms, even if the preprocessor removes it on most of them. I'm thinking here of AC_C_CONST or my Nov2005 attempt with CONST_PROBLEMATIC_WIN32.

Further, in 0.14.5's po-lex.h:

#if !(__STDC__ && \
((defined __STDC_VERSION__ && __STDC_VERSION__ >= 199901L && !defined __DECC) \
   || (defined __GNUC__ && __GNUC__ >= 2 && !defined __APPLE_CC__)))

(Now, granted, the above ugliness has disappeared from 0.15pre2 -- because now ALL platforms use the po_gram_error* functions rather than the optimized macros. But even there, isn't that a case of less-capable platforms finally "forcing" more capable ones to do it the same way as their weaker cousins?)

Plus, the newer --enable-runtime-pseudo-reloc option (enabled by default, but with no effect if auto-import is disabled) usually takes care of this drawback. It relies on pre-main startup code linked in from the platform runtime: cygwin1.dll on cygwin, crt0.o on mingw.
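
For reference, the kind of code modification under discussion looks roughly like this. The GNU ld documentation suggests routing a constant-offset access through a pointer variable, so that the offset is applied at run time rather than folded into the import reference. The names demo_settings, demo_exported_struct and demo_exported_array are hypothetical, and they are defined locally only so the sketch builds by itself; in real use they would be defined in the DLL and merely declared extern here.

```c
struct demo_settings { int verbose; int depth; };

/* Stand-ins (assumption): in real use these live in the DLL. */
struct demo_settings demo_exported_struct = { 0, 7 };
int demo_exported_array[4] = { 10, 20, 30, 40 };

/* Instead of `demo_exported_struct.depth`: take the base address first,
   then apply the member offset through the pointer. */
int read_depth(void)
{
    volatile struct demo_settings *t = &demo_exported_struct;
    return t->depth;
}

/* Instead of `demo_exported_array[2]`: same idea for a fixed index. */
int read_third_element(void)
{
    volatile int *t = demo_exported_array;
    return t[2];
}
```

The volatile qualifier is what keeps the compiler from folding the offset back into a single address constant.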

This is unacceptable.  Therefore I disable this option, through the
woe32-dll.m4 autoconf macro.

That's fine, and it's your decision as maintainer. We've actually already had, and settled, this argument last November, when you announced this decision. However, I would like to point out the benefits to my platform that auto-import has generated, notwithstanding the drawbacks above -- not in an attempt to change your mind about your package(s), but simply to explain why those who continue to rely on auto-import are not all benighted heathens (which is the impression I get from your comments about that feature, both this round and last November):

(1) How many hours of my time and yours has it taken to get to this point? Including my patch last November, your re-implementation of (parts of) it last December, and now this round? At that level of effort, do you think ANY libraries would have been built as DLLs on cygwin -- by anyone other than people as stubborn as I?

(2) On the other hand, the drawbacks #2 and #3 that you mention rarely seem to occur in practice, and #3 is handled by runtime-pseudo-relocs. Further, #2 was non-existent until gcc-3.4.x -- because until that time readonly vars were NOT stored in .rdata on pei386. Even now, the most common occurrence of #2 is exactly what it is in gettext: popt or getopt_long const structs containing addresses of local flag variables.


So, until very recently, the only oft-encountered true drawback to auto-import on cygwin/mingw was extra memory usage per-process. The benefits of DLLs (smaller on-disk executable size, modularization, and the ability to update (e.g.) the zlib DLL after a security flaw was discovered WITHOUT having to relink every application known to mankind [*]) far outweighed this ONE memory consumption drawback.

[*] not to mention the inevitable mailing list traffic a static-lib-only cygwin distribution would generate: "Why don't you make libz.a a DLL? libpng? libX11? libXpm? ...." -- just look at the complaints we DO get about the C++ runtime library! And, DLLs are absolutely necessary for decent and manageable operation/distribution of plugin-based packages like apache.

====

So, given the old physical memory drawback and the newly-revealed const struct issues, it is an admirable goal to try to reduce reliance on auto-import -- as long as no additional burden is imposed upon the users. However, those who set other priorities for their package development and continue to rely on auto-import can be forgiven, IMO.


----------------------------------------------------------------

gettext has 3 kinds of libraries:
1) Public libraries which export only functions.
2) Public libraries which export also variables.
3) Private libraries which export functions and variables.
Namely
  1) libasprintf
  2) libintl, libgettextpo
  3) libgettextlib, libgettextsrc

1) This case is well handled by libtool and ld already. The .h file doesn't
   need modifications; __declspec(dllimport) on functions is not needed.
   The function names are exported because GNU ld does an implicit
   --export-all-symbols if no symbols are explicitly exported.

Not exactly. This is true for C functions and C++ functions, but not for C++ classes: classes *are* data. That's why there is no such beast as a C++ library that doesn't export some data (even if just a vtable or type_info object, in .rdata).

Sadly, I don't think your solution below will work for these C++ objects *in general*, because "manually" (e.g. using a script) generating the _imp__* pointer variables for C++ mangled names of vtables and such is...err, non-trivial.

---
I could envision a scenario where you must always build shared if you want static -- and that somehow you use the shared libraries' import lib as a hint for which _imp__* pointers you need to create for the static lib. (e.g. --disable-shared is not allowed on cygwin).
---

In this particular case, autosprintf has no virtual functions -- so there is no vtable. It has no public or protected data members which would need to be exposed to (imported by) clients or derived classes. It has no private data members *that are non-POD types*, whose class-type would need to be exposed to clients.

Therefore, with ONE exception, the libasprintf DLL can be treated as having a functional-only interface, even though it is C++.

The exception, missing type_info for gnu::autosprintf, doesn't appear to be a problem. You'll see by doing an

  objdump -x -t ./cygasprintf-0.dll |\
     sed -e 's/.rdata\$_/.rdata$ _/' \
         -e 's/.text\$_/.text$ _/' \
         -e 's/ _Z/ __Z/' |\
     c++filt

on the un-stripped cygasprintf-0.dll that it DOES, actually, expose quite a few data items -- vtables, type_info, and guard objects -- for C++ stdlib stuff. But these are mostly in .rdata ["(sec 4)" == section with Idx = 3 thanks to 0-based/1-based idiocy] and .text ["(sec 1)" == Idx 0].

There's one thing missing: the type_info object for the gnu::autosprintf class. (If autosprintf had virtual functions, then unless the class were explicitly marked declspec(dllexport), the vtable would also be "missing". However, you can't miss what doesn't exist, so that's not a problem here).

Now, the missing type_info OUGHT to be a problem. And, the fact that g++ has some issues with exporting vtables and type_info is a known problem in g++-4.x up to current 4.2.CVS. (I think, but do not KNOW, that if I tried to compile foo.cc below using g++-4.x on cygwin, it would fail due to the issues mentioned above. I don't have a 4.x compiler on _this_ computer.)

However, for whatever reason, and even though I do NOT see a type_info representation for autosprintf in the library, the following code actually works when compiled with g++-3.4.4 on cygwin:

#include <iostream>
#include <cstdlib>      // free()
#include <autosprintf.h>
#include <typeinfo>
#include <cxxabi.h>

using namespace gnu;
using namespace std;

int main(int argc, char* argv[])
{

  const char* directory = "/c/bob";
  const char* filename  = "alice.txt";
  const int line = 27;
  const char* errstring = "a message";

  autosprintf as("%s/%s", directory, filename);
  char *pathname = as;
  cerr << autosprintf("syntax error in %s:%d: %s",
                      pathname, line, errstring)
       << endl;

  char* demang = abi::__cxa_demangle(typeid(as).name(),  0, 0, NULL);
  cerr << demang << endl;
  free(demang);
}

So, you're OK with this treatment of libasprintf -- but I wouldn't draw any conclusions about this method with regard to OTHER C++ libraries or classes. And no promises when (if) the cygwin/mingw guys ever release a 4.x compiler.

2) For this case, when --enable-shared is specified, I preprocess the .h file
   so that exported variables are marked with __declspec(dllimport). A simple
   sed statement:

     sed -e 's/export \([^()]*\);/export __declspec(dllimport) \1;/'

   After this header file is installed, it must be valid for both the
   shared and the static library. (You cannot expect that users of the
   library really think of setting a STATIC_XYZ flag when using the static
   library.) When a user compiles code that accesses a variable, the compiler
   will generate a reference to _imp__variable. These _imp__* pointer
   variables are normally generated for the DLL by the compiler or linker when
   __declspec(dllexport) is used. But we need them also in the static
   library! So I create a C file that generates these _imp__* pointer
   variables:

        #include "cygwin/export.h"
        VARIABLE(variable1)        // defines _imp__variable1
        VARIABLE(variable2)        // defines _imp__variable2
        ...

   and compile this into both the static and the shared library. So I don't
   need to provide the __declspec(dllexport) alternative in the header file;
   it's ok to use __declspec(dllimport) always.

   The linker needs to be given the --export-all-symbols flag explicitly in
   this case.

Nice. I like it. (I assume that on other platforms, the 'export' prefix is simply removed?)
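
A rough sketch of the idea behind such a pointer file, for readers who have not seen the committed version: each exported variable gets a companion pointer that holds its address, and a client compiled against the dllimport header reads the variable through that pointer. The DEMO_VARIABLE macro, the demo_imp__ prefix and the variable values are assumptions for illustration; the real macro in gettext CVS may be written quite differently.

```c
/* Stand-ins for two exported variables (hypothetical values). */
int variable1 = 1;
int variable2 = 2;

/* Assumption: one pointer per exported variable, holding its address.
   The demo_ prefix avoids colliding with the toolchain's own _imp__*
   symbols. */
#define DEMO_VARIABLE(name) void *demo_imp__##name = &name;

DEMO_VARIABLE(variable1)   /* defines demo_imp__variable1 */
DEMO_VARIABLE(variable2)   /* defines demo_imp__variable2 */

/* What a client built against the dllimport header effectively does:
   load the pointer, then dereference it. */
int read_variable1_via_imp(void)
{
    return *(int *) demo_imp__variable1;
}
```

Because the pointer objects exist in the static library as well, the same client object code resolves against either the .a or the DLL's import library.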

Q: what about the installed files in /usr/share/gettext/ ? If <libintl.h>/"gettext.h" is "munged" on cygwin, then can cygwin's gettextize make an un-munged client source package -- so that the "munging" occurs when the client package gets built?

3) For this case, where no .h file needs to be installed, the same approach
   can be used. However, a small modification is possible: Since the
   library is a private one, no .h file is installed, and the static
   library doesn't need to be installed. If --enable-shared was specified, the
   static library is not even used. (The programs and the testsuite do not
   link statically.) The .h file contains

         export PRIVATE_DLL_VARIABLE int variable1;
         export PRIVATE_DLL_VARIABLE struct { ... } variable2;
         ...

   and PRIVATE_DLL_VARIABLE is defined in config.h through

         #if defined __CYGWIN__ && (--enable-shared was specified)
          #define PRIVATE_DLL_VARIABLE __declspec(dllimport)
         #else
          #define PRIVATE_DLL_VARIABLE
         #endif

I believe you should use (__CYGWIN__ || __MINGW32__) && ... not just __CYGWIN__.
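
Spelled out, the suggested condition might look like the sketch below. DEMO_ENABLE_SHARED is a hypothetical stand-in for "--enable-shared was specified" (the actual configure-generated name is not given in this thread), and demo_private_counter with its local definition exists only so the fragment builds by itself.

```c
/* DEMO_ENABLE_SHARED is a hypothetical stand-in for
   "--enable-shared was specified". */
#if (defined __CYGWIN__ || defined __MINGW32__) && defined DEMO_ENABLE_SHARED
# define PRIVATE_DLL_VARIABLE __declspec(dllimport)
#else
# define PRIVATE_DLL_VARIABLE
#endif

extern PRIVATE_DLL_VARIABLE int demo_private_counter;

#ifndef DEMO_ENABLE_SHARED
/* Stand-in definition so the sketch links without the private library. */
int demo_private_counter = 5;
#endif

int read_private_counter(void)
{
    return demo_private_counter;
}
```

On every other platform the macro expands to nothing, so the declaration is an ordinary extern.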

Notes:

- You see that DLL_EXPORT (set by libtool) is never used: in case 2 because
  we don't need/want a LIBFOO_DLL_VARIABLE macro for every library, in case 3
  because the code compiled without DLL_EXPORT is not used at all.

Right -- by adding the redirection pointers to the static lib, you don't care about DLL_EXPORT any more.

- The process of adding the _imp__* pointer variables to the .a and .dll.a
  file, and the --export-all-symbols flag, could be done by libtool.

Hmm. Maybe -- I'll have to think about that, especially as relating to projects which, even then, will continue to rely on auto-import. Would this break that? Also, would the ease-of-use advantage now shift the other way: --disable-auto-import now becomes preferred although not required -- that might be a good thing. Also, would the behavior of libtool need to be different for C, vs C++/objC/etc?

Like I said, I'll have to give it some thought.

- The only drawback I can see of this technique is that a static library
  built alone (with --disable-shared) will be slightly more efficient
  than the static library built together with the shared one - due to the
  _imp__* indirections. But hey, if it has taken 11 years to port gettext
  with shared libraries to Cygwin, it is because the Woe32 DLLs are
  optimized excessively for performance at the expense of standards compliance
  and ease of use. (Like the floating-point hardware that was in use before
  IEEE 754: it produced wrong results but did so very efficiently.)

Optimized? Hah! The Windows386 developers said "hey, let's just reuse the PE386 exe format for dlls. We only need to change one thing here, call it PEI386, and we're done!" Never mind that whole no-unresolved symbols thing...They were LAZY, not clever.

- Unlike --enable-auto-import, which operates on the code that _uses_
  a shared library, this technique operates on the library itself; the
  code that uses the library sees the dllimports in the header file and
  does not need further fixup.

Yep, that part I like.

Yes, in general you are right: we have four states (per library)
    building the library as shared : declspec(dllexport)
    building the library as static : <no decorator> (*)
    building a client of the library, intending to link shared :
       extern declspec(dllimport)
    building a client of the library, intending to link static :
       extern (*)

With the technique above, the last two states are collapsed into one.

Right -- so then there is no burden on the client. The first two states remain, but that's the library builder's (i.e. yours and mine) problem -- which we can handle.
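
For contrast, this is roughly what the conventional header looks like when all four states are kept apart, and where the client-side burden comes from. The library name "foo", the FOO_VAR macro and the FOO_BUILDING_DLL flag are hypothetical; STATIC_LIBRARY_FOO follows the naming used earlier in this thread. The sketch picks the static-client state so that it builds by itself.

```c
/* The client has to remember to set this when linking statically. */
#define STATIC_LIBRARY_FOO 1

#if defined __CYGWIN__ || defined __MINGW32__
# if defined FOO_BUILDING_DLL
#  define FOO_VAR __declspec(dllexport)   /* building the shared library */
# elif defined STATIC_LIBRARY_FOO
#  define FOO_VAR                         /* static build or static client */
# else
#  define FOO_VAR __declspec(dllimport)   /* client linking shared */
# endif
#else
# define FOO_VAR                          /* other platforms: no decoration */
#endif

extern FOO_VAR int foo_counter;

/* Stand-in for the library's own definition (assumption). */
int foo_counter = 3;

int foo_read(void)
{
    return foo_counter;
}
```

If the client forgets the flag while linking statically, the reference is compiled as an import and fails to resolve at link time. That failure mode is what collapsing the last two states removes.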

Overall, I like it. Not sure if it is completely generalizable to other (non-gettext) libraries and languages on cygwin/mingw -- and we may run into issues with C++ when/if there is ever an official g++-4.x release for cygwin/mingw. But that's a g++ bug, not a gettext bug. For right now and the medium-term future, your solution looks good to me.

--
Chuck



