Re: [BLOG] rust blog post
Tue, 26 Nov 2019 04:26:43 -0800
In case someone likes a narrower style:
On +2019-11-26 12:27:37 +0200, Efraim Flashner wrote:
> Hopefully this is better. I added a new line between each paragraph
> On Tue, Nov 26, 2019 at 10:58:41AM +0100, Pierre Neidhardt wrote:
> > I think the attachment broke the formatting of the file (there is no
> > paragraph break). Could you resend it?
> Efraim Flashner <address@hidden> אפרים פלשנר
> GPG key = A28B F40C 3E55 1372 662D 14F7 41AA E7DC CA3D 8351
> Confidentiality cannot be guaranteed on emails sent or received unencrypted
It's easy to think of Rust as a new programming language,
but it has already been around for five years. Rust has made
it past its 1.0 release and the compiler is written in Rust.
We even have mrustc to act as a secondary method to
bootstrap new Rust releases without falling back to
downloading precompiled tarballs. So what is the state of
Rust in Guix today?
Truthfully, Rust in Guix could be better. The developer
story for Rust is pretty straightforward: write your
program, declare your dependencies in a Cargo.toml file, and
```cargo foo``` will figure out your dependency chain.
```cargo build``` will download any missing dependencies,
even using a cache directory to reduce downloads, and
compile the bits of the dependencies that are needed.
But what about for distro maintainers?
Obviously we can't download dependencies at build time; they
need to be packaged ahead of time. So we package those
dependencies. But wait, those dependencies have dependencies
that are needed, and those ones too. It's dependencies all
the way down, hidden in 5 years of iterative development, a
party we're late to, as we try to capture snapshots in time
where specific versions of libraries were built using
previous generations. All of this goes all the way back to
the beginning, whenever that is.
Humans are prone to errors, so to work around this when
packaging Rust crates, Guix effectively has two importers
for crates: one that imports a specific version and lists
its dependencies, and one that takes a crate and recursively
imports all the packages it depends on. Some work is still
needed before the recursive importer can interpret version
numbers, but for now it works quite well.
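For a feel of what the recursive importer has to do, here is
a minimal Rust sketch of the traversal, assuming a toy
in-memory index in place of crates.io and ignoring versions
entirely. `import_order` and the crate names are
hypothetical; the real importer is written in Guile Scheme.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical toy index: crate name -> names of its direct dependencies.
type Index = HashMap<&'static str, Vec<&'static str>>;

// Visit `name` and everything it depends on, appending each crate to
// `order` only after all of its dependencies have been appended.
fn import_recursively(
    name: &'static str,
    index: &Index,
    seen: &mut HashSet<&'static str>,
    order: &mut Vec<&'static str>,
) {
    // A crate shared by several dependents is imported only once.
    if !seen.insert(name) {
        return;
    }
    if let Some(deps) = index.get(name) {
        for &dep in deps {
            import_recursively(dep, index, seen, order);
        }
    }
    order.push(name);
}

// Return every crate reachable from `root`, dependencies first.
fn import_order(root: &'static str, index: &Index) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut order = Vec::new();
    import_recursively(root, index, &mut seen, &mut order);
    order
}

fn main() {
    let mut index: Index = HashMap::new();
    index.insert("foo", vec!["bar", "qux"]);
    index.insert("bar", vec!["baz"]);
    index.insert("qux", vec!["baz"]);
    index.insert("baz", vec![]);
    // prints ["baz", "bar", "qux", "foo"]
    println!("{:?}", import_order("foo", &index));
}
```

Emitting each crate only after its dependencies means every
new package can refer to ones already written, and the
`seen` set keeps a shared dependency like `baz` from being
imported twice.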
Taking a break from Rust for a moment, let's look at some of
the other languages that are packaged. Packages written in
C/C++, processed with autotools or cmake or meson, are the
easiest. Dependencies are declared, source code is provided,
and there's a clear distinction between source code and
compiled binary; source code is for hacking on, binaries are
for executing. The closest to a middle ground are libraries
which allow programs to use features from other programs. In
order to use a package, all of its dependencies must be
packaged and the libraries linked.
ends up in the same problem as we saw with Rust, recursive
dependencies all the way down, iterative versions depending
on previous ones, and a misty past from whence everything
sprang forth, which must be recreated in order to bring us
back to the present day. But there's more difficulty, often
even after a 'build' phase has been run and tests have been
it's no longer source, it's a binary... or something. So
just what did we build and test?
boundary between source code and binaries.
So how about Python? Python is a scripting language and can
be run without being compiled, but it can also be compiled
(pre-interpreted?) to bytecode and installed either locally
or globally. That leaves us with source code, which can
double as a binary, and bytecode, which is clearly a
binary. Given these two states, we declare the uncompiled
version as source code, ignore that it can be run as a
script except when testing the code, and we never return to
How about Go? Go is another language that defies packaging
efforts, primarily because build instructions often use the
HEAD of other projects' git branches rather than tagged and
released versions. That the names of the libraries are long
and cumbersome is mostly a secondary issue. On the developer
side a binary is a ```go build``` away. Go will download
missing source and compile libraries as needed. On the
packager side the libraries are gathered one by one,
precompiled, and placed carefully in a directory hierarchy
for use in future builds. What could be a long build of a
program is replaced by an intermediate series of packages
where libraries are pre-compiled, and at each stage only the
new code has to be compiled.
For all except the distro maintainer, the similarities are
strong between Rust and Go. In both cases dependencies are
downloaded as part of the build process, there's a cache for
the downloaded sources and the compiled libraries, and build
artifacts can be reused between different programs with
overlapping dependencies. For the distro maintainer many of
these similarities are thrown out. Dependencies are packaged
ahead of time, and the previously packaged libraries are
literally a cache. Libraries can be reused for other
packages, yes, but for Rust they're not.
Why not? If they're already compiled why not reuse them?
Previously we've discussed source code and compiled binaries
(or libraries), but in Rust there are two types of
libraries. There are dynamic libraries, packaged as
```libfoo.so```, and there are Rust libraries, packaged as
```libfoo.rlib``` or ```libfoo-MAGICHASH.rlib```. When a
Rust package declares a dependency on a Rust library, it
doesn't declare a dependency on the whole library but rather
just on the parts that it needs. This means that we can get
away with packaging only a portion of the library it depends
on, or the library with only some of its features or its own
dependencies. When a final binary is compiled, it doesn't
link to an rlib; it takes just the part that it needs and
incorporates it into the binary. As far as package
maintainers are concerned, this isn't ideal, but it is
something we can live with; we already have this case with
static libraries from other languages. If we were to compile
the binary manually, the command would be something like
```rustc --crate-type bin --crate-name foo --extern
bar=/path/to/libbar.rlib foo.rs``` and we'd continue on.
However, when bar depends on baz, the similar command,
```rustc --crate-type lib --crate-name bar --extern
baz=/path/to/libbaz.rlib bar.rs```, _doesn't_ link libbaz
into libbar. This leaves us in a pickle; we know which
libraries we need, but we're unable to compile them
individually and build them up iteratively until we reach
the binary end goal.
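To see where this bites, here is a minimal Rust sketch that
only builds and prints the rustc invocations for a
hypothetical baz, bar and foo chain; nothing is executed,
and the file names and the `/build/out` directory are made
up. It assumes all rlibs land in one shared directory, which
is exactly what separate Guix packages, each in its own
store directory, don't give us.

```rust
// One crate in a hypothetical baz <- bar <- foo chain.
struct Crate {
    name: &'static str,
    crate_type: &'static str, // "lib" (an rlib) or "bin"
    src: &'static str,
    deps: Vec<&'static str>, // direct dependencies only
}

// Build the rustc arguments for one crate, assuming every rlib is
// written to, and looked up in, the single directory `out_dir`.
fn rustc_args(c: &Crate, out_dir: &str) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "--crate-type".into(),
        c.crate_type.into(),
        "--crate-name".into(),
        c.name.into(),
        "--out-dir".into(),
        out_dir.into(),
    ];
    // --extern names only the crates this one uses directly.
    for dep in &c.deps {
        args.push("--extern".into());
        args.push(format!("{}={}/lib{}.rlib", dep, out_dir, dep));
    }
    // An rlib records which crates it needs but does not contain them,
    // so indirect dependencies (baz, when building foo) must still be
    // findable on a `-L dependency=DIR` search path.
    args.push("-L".into());
    args.push(format!("dependency={}", out_dir));
    args.push(c.src.into());
    args
}

fn main() {
    let chain = [
        Crate { name: "baz", crate_type: "lib", src: "baz.rs", deps: vec![] },
        Crate { name: "bar", crate_type: "lib", src: "bar.rs", deps: vec!["baz"] },
        Crate { name: "foo", crate_type: "bin", src: "foo.rs", deps: vec!["bar"] },
    ];
    // Print the commands in build order; nothing is run.
    for c in &chain {
        println!("rustc {}", rustc_args(c, "/build/out").join(" "));
    }
}
```

Note that foo's command mentions only bar, yet baz's rlib
still has to be on the search path; that is the sense in
which libbaz is not linked into libbar.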
One of our packaged Rust programs, rust-cbindgen, is used by
Icecat. Rust-cbindgen declares 8 (TODO: check this number)
dependencies. When run outside of the build environment,
```cargo build``` downloads a total of 58 (TODO: check this
number) packages, compiles them and produces a binary. Our
recursive importer created more than 300 new packages before
it was told to stop. Returning to our build process for Rust
libraries, since we couldn't link one rlib to another rlib,
we opted to compile one rlib and then place its source in
the build directory of the next one where it was recompiled.
Baz would be built, then baz's source would be put in bar's
vendor directory where baz and bar would be built. After
this baz's and bar's sources would be put in foo's vendor
directory, where all three would be compiled. This sounds
like Go, except that we're throwing away all the results of
our builds each time we start a new package.
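The cost of this arrangement is easy to count. A minimal
Rust sketch, assuming every crate in a hypothetical index
becomes its own package and each package recompiles the
vendored sources of everything it depends on:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical index: crate name -> names of its direct dependencies.
// Every crate is assumed to appear as a key (one package per crate).
type Index = HashMap<&'static str, Vec<&'static str>>;

// Collect `name` and everything it transitively depends on into `out`.
fn collect_closure(name: &'static str, index: &Index, out: &mut HashSet<&'static str>) {
    if out.insert(name) {
        if let Some(deps) = index.get(name) {
            for &dep in deps {
                collect_closure(dep, index, out);
            }
        }
    }
}

// Crate compilations when every package copies its dependencies'
// sources into its vendor directory and compiles the whole lot again.
fn builds_with_vendoring(index: &Index) -> usize {
    index
        .keys()
        .map(|&name| {
            let mut set = HashSet::new();
            collect_closure(name, index, &mut set);
            set.len()
        })
        .sum()
}

// Crate compilations if each compiled rlib could be reused downstream.
fn builds_with_reuse(index: &Index) -> usize {
    index.len()
}

fn main() {
    let mut index: Index = HashMap::new();
    index.insert("foo", vec!["bar"]);
    index.insert("bar", vec!["baz"]);
    index.insert("baz", vec![]);
    println!(
        "vendoring: {} crate builds, reuse: {}",
        builds_with_vendoring(&index),
        builds_with_reuse(&index)
    );
}
```

For a straight chain of n crates that is
1 + 2 + ... + n = n(n+1)/2 compilations instead of n, so the
waste grows quadratically rather than linearly.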
Since we were just copying the sources from package to
package, the simplest solution was to treat Rust
dependencies as shared sources and not as shared libraries.
Yes, the same source would be used by multiple programs,
but each package already took only the small portion of
the shared source that it needed, so there was no benefit
to compiling the entire package ahead of time, especially
with the mounting recursive dependencies, whose compiled
libraries were being thrown away anyway.
Rust-cbindgen ships with a Cargo.toml listing 8 dependencies.
It also ships with a Cargo.lock, detailing the 8
dependencies and the bits of other libraries that are
needed. By packing the sources of the 58 enumerated
libraries and placing them in the vendor directory, where
the necessary parts could be compiled, we ended up at the
same place we were headed anyway: only the sources were
propagated from package build to package build, only the
source was the relevant part, only the source is shared.
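Because Cargo.lock already enumerates every crate a build
needs, the list of sources to vendor can be read straight
out of it. Here is a minimal Rust sketch, assuming each
`[[package]]` table carries quoted `name` and `version`
keys. `locked_packages` is a hypothetical helper, and its
line matching is deliberately naive; a real tool should use
a proper TOML parser.

```rust
// A (name, version) pair taken from one `[[package]]` table.
#[derive(Debug, PartialEq)]
struct Locked {
    name: String,
    version: String,
}

// Pull the quoted value out of a line like `name = "bar"`.
fn quoted_value<'a>(line: &'a str, key: &str) -> Option<&'a str> {
    let rest = line.strip_prefix(key)?.trim_start();
    let rest = rest.strip_prefix('=')?.trim();
    rest.strip_prefix('"')?.strip_suffix('"')
}

// List every locked package, in file order.
fn locked_packages(lock: &str) -> Vec<Locked> {
    lock.split("[[package]]")
        .skip(1) // anything before the first table is not a package
        .filter_map(|table| {
            let mut name = None;
            let mut version = None;
            for line in table.lines().map(str::trim) {
                if let Some(v) = quoted_value(line, "name") {
                    name = Some(v.to_string());
                } else if let Some(v) = quoted_value(line, "version") {
                    version = Some(v.to_string());
                }
            }
            // Tables missing either key are skipped.
            Some(Locked { name: name?, version: version? })
        })
        .collect()
}

fn main() {
    // A made-up, trimmed-down lock file with two packages.
    let lock = r#"
[[package]]
name = "bar"
version = "0.1.0"
dependencies = [
 "baz",
]

[[package]]
name = "baz"
version = "0.2.0"
"#;
    for p in locked_packages(lock) {
        println!("{} {}", p.name, p.version);
    }
}
```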