bug-gzip

[PATCH] deflate.c: identify slide_Pos() for later optimization


From: John Reiser
Subject: [PATCH] deflate.c: identify slide_Pos() for later optimization
Date: Mon, 23 Jul 2012 11:06:31 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:14.0) Gecko/20120717 Thunderbird/14.0

Modern "multimedia" vectorized hardware instructions can speed deflate().
For higher-end x86* CPUs the speedup might be 2% to 3% of total CPU time.
On a slower CPU, or with a compiler and instruction decoder that suffer
longer latency after a branch (such as gcc for some PowerPC chips),
the improvement might be 5% to 8%.

The attached patch introduces a new subroutine slide_Pos() in deflate.c
which identifies the operation that is subject to optimization.
The opportunity arises when sliding the window.  The vectors head[]
and prev[] of substring indices are adjusted using saturating subtraction.
A very good compiler should be able to recognize and vectorize the operation
from the patched source.  If not, then any compiler which can inline a local
subroutine should give code which is no worse than the unmodified version.
A compiler which does not inline slide_Pos might introduce a penalty
approximately equal to the cost of two internal subroutine calls.
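
As an illustration only (this is a minimal sketch, not the text of the
attached patch), a routine of this kind might look like the following.
The typedef Pos, the constant WSIZE and the use of 0 as the "no entry"
value are assumptions modeled on the usual deflate.c conventions, and
the signature is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for this sketch: 16-bit positions and a 32 KiB window. */
typedef uint16_t Pos;
#define WSIZE 0x8000u

/* Slide n substring indices down by WSIZE with saturating subtraction:
 * any index below WSIZE would go negative, so it is clamped to 0.
 * The loop body has no side effects other than the store, which gives
 * a compiler the chance to vectorize it.
 */
static void slide_Pos(Pos *p, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        Pos m = p[i];
        p[i] = (Pos)(m >= WSIZE ? m - WSIZE : 0);
    }
}
```

The window-sliding code would then call such a routine once for head[]
and once for prev[], which is where the two subroutine calls mentioned
above come from when the compiler does not inline it.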

If there is interest, then I will follow with assembly-language versions
of slide_Pos for i686/x86_64 (with runtime selection among several variants
according to actual hardware capabilities), PowerPC AltiVec (compile-time
selection), and ARM NEON (compile-time selection).

-- 
John Reiser, address@hidden

Attachment: 0002-slide_Pos-identify-for-future-optimization.patch
Description: Text Data

