coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2] build: Option for building all tools in a single binary


From: Pádraig Brady
Subject: Re: [PATCH v2] build: Option for building all tools in a single binary
Date: Tue, 10 Jun 2014 15:36:17 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 06/10/2014 04:50 AM, Alex Deymo wrote:
> New in v2:
>  * More comments on coreutils.c
>  * Added some documentation to the gen-single-binary.sh
>  * Added a flag to get-lists-of-program.sh to just print the list
>    of programs. It feels less dirty that the previous stuff.
>  * Fixed/improved gen-single-binary.sh to include all the required
>    flags.
> 
> Tested this patch with:
> ./configure --prefix=`pwd`/foo --enable-single-binary && make && make install
> ...and "it works for me".
> 
> I think this is ready for patch review if you are not against having this
> feature on coreutils. I'd like to remind you that this is a configure
> flag that's disabled by default, so the tradeoff between having
> "true" linking against pthreads and the 5x disk space required is what
> determines if you want to enable this. In our case (ChromeOS) we will
> probably enable it. So here you have the patch =). I also think that
> having this feature will increase performance rather than decrease it
> because less disk space means faster access/less cached data. But
> again, that's up to the user.

We usally send patches around in git am format to ease processing.
  I.E. format-patch --stdout -1 > to-attach.patch

Here is a suggested git commit message for the patch:

    build: support building all tools in a single binary

    Add the --enable-single-binary option to the configure file.
    When enabled, this option builds a single binary file containing
    the selected tools.  Which tool gets executed depends on the value
    of argv[0] which can be set implicitly through symlinks to the
    single program.

    This setup reduces significantly the size of a complete coreutils
    install, since code from lib/libcoreutils.a is not duplicated in
    every one of the more than 100 binaries.  Runtime overhead is
    increased due to more dynamic libraries being loaded, and extra
    initialization being performed for all utils.  Also initially
    a larger binary is loaded from storage, though this is usually
    alleviated due to caching, and in fact the single binary should
    have better caching characteristics.

    Comparing the size of the individual versus single binary on x86_64:
      $ cd src
      $ size coreutils
      $ size -t $(../build-aux/gen-lists-of-programs.sh --list-progs |
                  | grep -Ev '(coreutils|libstdbuf)') | tail -n1
         text    data     bss     dec     hex filename
      1196629    6892   94352 1297873  13cdd1 coreutils
      4901010  124964  163768 5189742  4f306e (TOTALS)

    Storage requirements are reduced similarly:
      $ cd src
      $ du -h coreutils
      $ du -ch $(../build-aux/gen-lists-of-programs.sh --list-progs |
                 grep -Ev '(coreutils|libstdbuf)') | tail -n1
      1.2M    coreutils
      5.3M    total

    When installing, the makefile will create symlinks from each
    configured tool to a single "coreutils" binary installed on the
    same directory.

    * Makefile.am: Detail changes...
    * bootstrap: Likewise.
    * build-aux/gen-lists-of-programs.sh: Likewise.
    * build-aux/gen-single-binary.sh: Likewise.
    * configure.ac: Likewise.
    * src/coreutils.c: Likewise.
    * src/getlimits.c: Likewise.
    * src/local.mk: Likewise.
    * NEWS: Mention the new feature

Further comments below...

> 
>  Makefile.am                        |  6 +++
>  bootstrap                          |  3 ++
>  build-aux/gen-lists-of-programs.sh |  7 +++
>  build-aux/gen-single-binary.sh     | 99 
> ++++++++++++++++++++++++++++++++++++++
>  configure.ac                       | 26 ++++++++++
>  gnulib                             |  2 +-
>  src/coreutils.c                    | 82 +++++++++++++++++++++++++++++++
>  src/getlimits.c                    |  2 +
>  src/local.mk                       | 31 ++++++++++++
>  9 files changed, 257 insertions(+), 1 deletion(-)
>  create mode 100755 build-aux/gen-single-binary.sh
>  create mode 100644 src/coreutils.c
> 
> diff --git a/Makefile.am b/Makefile.am
> index e88dc9c..eb467c7 100644
> --- a/Makefile.am
> +++ b/Makefile.am
> @@ -180,6 +180,12 @@ check-git-hook-script-sync:
>       rm -rf $$t;                                                     \
>       test $$fail = 0
>  
> +# If we are building a single-binary, create symlinks for them.
> +install-exec-hook:
> +     @for i in $(single_binary_progs); do \
> +             $(LN_S) -s coreutils $(DESTDIR)$(bindir)/$$i$(EXEEXT); \
> +     done

Only doing this at install time is problematic.
Ideally we would generate either combined + symlinks _or_ separate binaries
in the build dir. That avoids redundant building in the combined case,
but more importantly allows running the test suite with `make check`.

Note one might be able to leverage the following in configure.ac:

  AM_CONDITIONAL([SINGLE_BINARY], [test "$gl_single_binary" = yes])

Then the local.mk could be conditionalized with "if SINGLE_BINARY" etc.

>  noinst_LIBRARIES =
>  MOSTLYCLEANFILES =
>  CLEANFILES =
> diff --git a/bootstrap b/bootstrap
> index a3e68f0..87f97a2 100755
> --- a/bootstrap
> +++ b/bootstrap
> @@ -807,6 +807,9 @@ version_controlled_file() {
>    fi
>  }
>  
> +# Regenerate src/single-binary.sh
> +./build-aux/gen-single-binary.sh src/local.mk

Where we generate this file is important and the crux of the patch.
Now bootstrap is shared between various gnulib using projects so
editing that script directly is not the best place I think.
Without checking, it seems like updating the coreutils specific
bootstrap_post_import_hook() in bootstrap.conf would be more appropriate.

Things that should be checked when generating single-binary.mk early like this
is that it handles and avoids builds for --enable-no-install-program=PROG_LIST
and that the generated .mk file is included in the `make dist` tarball.
If the .mk does need to be conditionalized at configure time then the
generation could move to configure.ac, but note the much greater portability
requirements in that case (including within gen-single-binary.sh).

> diff --git a/build-aux/gen-single-binary.sh b/build-aux/gen-single-binary.sh
> new file mode 100755
> index 0000000..4df1180
> --- /dev/null
> +++ b/build-aux/gen-single-binary.sh
> @@ -0,0 +1,99 @@
> +#!/bin/sh
> +
> +# Generate the list of rules for the single-binary option based on all the 
> other
> +# binaries found in src/local.mk.
> +#
> +# We need to duplicate the specific rules to build each program into a new
> +# static library target. We can't reuse the existing target since we need to
> +# create a .a file instead of linking the program. We can't do this at
> +# ./configure since the file names need to available when automake runs to 
> let
> +# it generate all the required rules in Makefile.in. The configure step will
> +# select which ones will be used to build, but they need to be generated
> +# beforehand.
> +#
> +# Instead of maintaining a duplicated list of rules, we generate the
> +# single-binary required rules based on the normal configuration found on
> +# src/local.mk with this script.
> +
> +if [ "x$1" == "x" ]; then
> +  echo "Usage: $0 path/to/src/local.mk" >&2
> +  exit 1
> +fi
> +
> +set -e
> +
> +LOCAL_MK=$1
> +SINGLE_BINARY_MK="`dirname $1`/single-binary.mk"
> +GEN_LISTS_OF_PROGRAMS="`dirname $0`/gen-lists-of-programs.sh"

"$0"
It's worth checking with a space in your build path

> +
> +ALL_PROGRAMS=$($GEN_LISTS_OF_PROGRAMS --list-progs \
> +    | grep -v -F -e coreutils -e libstdbuf.so \
> +    | tr '.[' '_')

safer to match up the number of chars in tr's FROM and TO.
What's the '.' for anyway?

> +# Replace all the programs by the single binary and simlinks if specified.
> +single_binary_progs=
> +single_binary_libs=
> +if test "$gl_single_binary" = yes; then
> +  single_binary_progs=`echo $optional_bin_progs`
> +  single_binary_libs=`
> +    for p in $single_binary_progs; do
> +      echo -n "src/libsinglebin_$p.a " | tr "\133" _

`echo -n`, and to a lesser extent `tr` wouldn't
be portable enough for configure. How about:

-      echo -n "src/libsinglebin_$p.a " | tr "\133" _
+      # Convert '[' to '_'
+      test x"$p" = x'@<:@' && p='_'
+      printf 'src/libsinglebin_%s.a ' "$p"

> +    done`
> +  optional_bin_progs="coreutils"
> +  echo $single_binary_libs
> +fi
> +AC_SUBST([single_binary_progs], [$single_binary_progs])
> +AC_SUBST([single_binary_libs], [$single_binary_libs])

> diff --git a/gnulib b/gnulib
> index a10acfb..a008d62 160000
> --- a/gnulib
> +++ b/gnulib
> @@ -1 +1 @@
> -Subproject commit a10acfb1d2118f9a180181d3fed5399dbbe1df3c
> +Subproject commit a008d625b7854d2c08c6606d90f6f2f48263f973

I see you're running the latest gnulib.
I'll disregard this bit

> diff --git a/src/coreutils.c b/src/coreutils.c
> new file mode 100644

> +int
> +main (int argc, char **argv, char** envp)
> +{
> +  char *name = argv[0];
> +  char *p;
> +
> +  /* Replace argv[0] by the tool name, removing the path from it. */
> +  for (p = name; *p; ++p) {
> +    if (*p == '/')
> +      name = p + 1;
> +  }

Probably more general to use last_component() above.

> +  argv[0] = name;

Do we need to change argv[0] away from the full path?
It would be best to not do that because for example if
running `/opt/gnu/ls --help` we want that full path in the output.

> +  /* If this program is called directly as "coreutils" instead of using a
> +   * symlink, we use argv[1] as the name of the tool, shifting all the
> +   * arguments one position. */
> +  if (!strcmp(argv[0], "coreutils")) {

if STREQ (name, "coreutils")
  {
    argv++;
    argc--;
  }

> +    argv++;
> +    argc--;
> +  }
> +  if (argc < 1 || !argv[0])
> +    usage(EXIT_FAILURE);
> +
> +  /* Lookup the right main program */
> +#define SINGLE_BINARY_PROGRAM(prog_name_str, main_name) \
> +  if (!strcmp(prog_name_str, argv[0])) \
> +    return _single_binary_main_##main_name(argc, argv, envp);
> +#include "coreutils.h"
> +#undef SINGLE_BINARY_PROGRAM
> +
> +  fprintf(stderr, "%s: program not found.\n", argv[0]);

error (EXIT_ENOENT, 0, "%s: program not found.\n", argv[0]);

> +  usage(EXIT_FAILURE);
> +}
> diff --git a/src/getlimits.c b/src/getlimits.c
> index 597efd8..62da2bb 100644
> --- a/src/getlimits.c
> +++ b/src/getlimits.c
> @@ -167,4 +167,6 @@ main (int argc, char **argv)
>    print_float (FLT);
>    print_float (DBL);
>    print_float (LDBL);
> +
> +  exit (EXIT_SUCCESS);
>  }

A vestigial attempt at suppressing the no return warning?
Anyway getlimits is not distributed so we can leave that out?

> diff --git a/src/local.mk b/src/local.mk
> index 865dd74..43a9923 100644
> --- a/src/local.mk
> +++ b/src/local.mk
> @@ -42,6 +42,7 @@ noinst_PROGRAMS =           \
>  noinst_HEADERS =             \
>    src/chown-core.h           \
>    src/copy.h                 \
> +  src/coreutils.h            \
>    src/cp-hash.h                      \
>    src/dircolors.h            \
>    src/fiemap.h                       \
> @@ -150,6 +151,7 @@ src_mv_LDADD = $(LDADD)
>  src_nice_LDADD = $(LDADD)
>  src_nl_LDADD = $(LDADD)
>  src_nohup_LDADD = $(LDADD)
> +src_numfmt_LDADD = $(LDADD)

Ah we were relying on automake auto deps for that :)

>  src_od_LDADD = $(LDADD)
>  src_paste_LDADD = $(LDADD)
>  src_pathchk_LDADD = $(LDADD)
> @@ -395,6 +397,19 @@ src_libstdbuf_so_LDADD = $(LIBINTL)
>  src_libstdbuf_so_LDFLAGS = -shared
>  src_libstdbuf_so_CFLAGS = -fPIC $(AM_CFLAGS)
>  
> +# Single binary dependencies
> +src_coreutils_SOURCES = src/coreutils.c src/coreutils.h
> +src_coreutils_CFLAGS = $(AM_CFLAGS) -Wno-suggest-attribute=noreturn

One can't add -Wno-suggest-attribute=noreturn unconditionally.
Perhaps we should be marking all these as __attribute__ ((noreturn))
and explicitly calling exit() in all main() functions rather than
doing an implicit or explicit return. The noreturn attribute could
be added easily to the declaration at the top of coreutils.c I expect.
If that's no possible then I'd suggest adjusting configure.ac to
checkout the new option a little earlier, and then:

  if test "$gl_single_binary" = yes; then
     gl_WARN_ADD([-Wno-suggest-attribute=noreturn])
  fi


Some more general points...

It seems that vdir/dir/ls are not handled appropriately.
You probably have to set the ls_mode based on the name being compiled.

I'd really like the configure and build logic to not
change unless the --enable-single-binary option is specified
coreutils has to be portable to all sorts of shells/compilers/linkers/...

Also it would be useful to be able to exclude certain programs from this.
I don't mean to give user the choice of particular programs to build seperately,
rather hardcoding `sort` as being a separate program for example due to
threading constraints for example.

thanks,
Pádraig.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]