Re: Optionally using more advanced CPU features

From: Dave Love
Subject: Re: Optionally using more advanced CPU features
Date: Wed, 23 Aug 2017 14:59:23 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.5 (gnu/linux)

address@hidden (Ludovic Courtès) writes:

> Hi,
> Ricardo Wurmus <address@hidden> skribis:
>> I was wondering how we should go about optionally building software for
>> more advanced CPU features.  Currently, we build software for the lowest
>> common feature set among x86_64 CPUs.  That’s good for portability, but
>> not so good for performance.
>> Enabling CPU features often happens through configure flags, but
>> expressing support at that level in our package definitions seems bad.
>> How can we make it possible for users to build their software for
>> different CPUs?
> To some extent, I think this is a compiler/OS/upstream issue.  By that I
> mean that the best way to achieve use of extra CPU features is by using
> the “IFUNC” feature of GNU, which is what libc does (it has
> variants of strcmp etc. tweaked for various CPU extensions like SSE, and
> the right one gets picked up at load time.)  Software like GMP, Nettle,
> or MPlayer also does this kind of selection at run time, but using
> custom mechanisms.

That may be the best way to handle it, but it's not widely available,
and isn't possible generally (as far as I know), e.g. for Fortran code.
See also below.  This issue surfaced again recently in Fedora.
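
To make the "custom mechanisms" in the quoted text concrete, here is a
minimal sketch of hand-rolled run-time selection in C.  It assumes GCC or
Clang on x86_64 (elsewhere it just falls back to the generic loop), and
the names dot, dot_generic, dot_avx2 and select_dot are made up for
illustration; they are not taken from GMP, Nettle or MPlayer.  IFUNC does
essentially the same selection, but in the dynamic loader rather than in
the program.

```c
#include <stddef.h>

/* Portable fallback: plain scalar loop. */
static double dot_generic(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
/* Same source, but the compiler may use AVX2 inside this one function. */
__attribute__((target("avx2")))
static double dot_avx2(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
#endif

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Choose an implementation from what the running CPU reports. */
static dot_fn select_dot(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return dot_avx2;
#endif
    return dot_generic;
}

/* Public entry point: select on first use, then call through the pointer.
   (Not synchronized; concurrent first calls would select the same function.) */
double dot(const double *x, const double *y, size_t n)
{
    static dot_fn impl = NULL;
    if (impl == NULL)
        impl = select_dot();
    return impl(x, y, n);
}
```

Callers only ever see dot(); which variant runs depends on the machine
the binary is started on, not the one it was built on.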

In cases that don't dispatch on cpuid (or whatever), I think the
relevant missing OS/tool support is SIMD-specific hwcaps in the loader.
Hwcaps seem to be essentially undocumented, but there is, or has been,
support for instruction set capabilities on some architectures, just not
x86_64 apparently.  (An ancient example was for missing instructions on
some SPARC systems which greatly affected crypto operations in ssh et al.)

>> We can cross-compile for other architectures on the command line with
>> “--target” and “--system”; can we allow for compilation with special CPU
>> features across the graph with “--features”?  Build system abstractions
>> or package definitions would then be changed to recognize these features
>> and modify the corresponding flags as needed.
> I’ve considered this, but designing this would be tricky, and not quite
> right IMO.
> There’s probably scientific software out there that can benefit from
> using the latest SSE/AVX/whatever extension, and yet doesn’t use any of
> the tricks above.  When we find such a piece of software, I think we
> should investigate and (1) see whether it actually benefits from those
> ISA extensions, and (2) see whether it would be feasible to just use
> ‘target_clones’ or similar on the hot spots.

One example which has been investigated, and where you can't do that, is BLIS.  You
need it for vaguely competitive avx512 linear algebra.  (OpenBLAS is
basically fine for previous Intel and AMD SIMD.)  See, e.g.,
et seq.  I don't know if there's any good reason to, but if you want
ATLAS you have the same issue -- along with extra issues building it.
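
For contrast, this is roughly what the 'target_clones' suggestion in the
quoted text amounts to on a simple loop where it does apply.  It is only
a sketch: the function axpy and the macro HOT_CLONES are hypothetical,
and the attribute is enabled only for GCC on x86_64 GNU/Linux, where the
compiler emits one clone per listed target plus a resolver that picks one
at load time.  On other setups the macro expands to nothing and a single
baseline version is built.

```c
#include <stddef.h>

/* Enable function multi-versioning only where it is assumed to be available. */
#if defined(__x86_64__) && defined(__gnu_linux__) && \
    defined(__GNUC__) && !defined(__clang__)
#define HOT_CLONES __attribute__((target_clones("avx2", "sse4.2", "default")))
#else
#define HOT_CLONES
#endif

/* y := a*x + y.  With HOT_CLONES active, the compiler builds AVX2, SSE4.2
   and baseline copies of this function from the same source. */
HOT_CLONES
void axpy(double a, const double *x, double *y, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

This only helps when the compiler can generate good code for the hot
spot from portable source in the first place.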

Related, I argue, as on the Fedora list, that BLAS (and LAPACK)
should be handled the way they are in Debian, with shared libraries built
compatibly with the reference BLAS.  They should be selectable at run
time, typically according to compute node type by flipping the
search path; you should be able to substitute BLIS or a GPU
implementation for OpenBLAS.  That likely applies in other cases, but
I'm most familiar with the linear algebra ones.

[By the way, you do have to be careful with ISA-specific libraries on
heterogeneous systems if you use checkpoint-restart, as you probably
should on an HPC cluster -- you need to restart on compatible hardware.]
