[Groff] Interesting statistics from a full test of doclifter


From: Eric S. Raymond
Subject: [Groff] Interesting statistics from a full test of doclifter
Date: Wed, 27 Dec 2006 12:32:56 -0500
User-agent: Mutt/1.4.2.2i

Out of 13,117 man pages in a full FC6 installation with netpbm removed:

78 (0.59%) use .ti

32 (0.24%) use .EX/.EE or .Ex/.Ee with no local definition

91 (0.69%) use .DS/.DE or .Ds/.De with no local definition.

14 (0.10%) use mdoc .Xo/.Xc.

29 (0.22%) use \w.

All these statistics are after applying 388 (8.2%) error-fixup patches
which should have had no effect on the numbers above.  I have
successfully pushed 213 patches upstream to man-page maintainers.

The numbers for .EX/.EE are much lower than I expected, and the
numbers for .DS/.DE a bit higher.  The way I instrumented this doesn't pick
up instances where .EX/.EE has been locally defined by the page author
(the check is after macroexpansion time), which is not too uncommon and
probably accounts for the difference.
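The distinction between a macro that is used and one that is used without a
local definition can be illustrated at the source level.  The following is a
minimal sketch only, not doclifter's instrumentation (which checks after
macroexpansion); the function name undefined_uses and the two sample pages are
hypothetical, and the check is purely textual.

```python
import re

# A roff request line: control character ('.' or "'"), optional blanks,
# the request or macro name, then optionally a first argument.
REQUEST = re.compile(r"^[.'][ \t]*(\S+)(?:[ \t]+(\S+))?")


def undefined_uses(source, names=("EX", "EE")):
    """Return the macros in `names` that `source` calls but never defines.

    A macro counts as locally defined if the page contains a .de, .de1
    or .am request naming it.  This is a textual approximation (an
    assumption of this sketch), not a real troff parse.
    """
    defined, used = set(), set()
    for line in source.splitlines():
        m = REQUEST.match(line)
        if not m:
            continue
        request, arg = m.group(1), m.group(2)
        if request in ("de", "de1", "am") and arg:
            defined.add(arg)
        elif request in names:
            used.add(request)
    return used - defined


# Hypothetical page that relies on the macro package to supply .EX/.EE.
PAGE_PLAIN = ".TH FOO 1\n.SH EXAMPLE\n.EX\nls -l\n.EE\n"

# Hypothetical page that defines .EX/.EE itself before using them.
PAGE_LOCAL = (
    ".de EX\n.nf\n.ft CW\n..\n"
    ".de EE\n.ft R\n.fi\n..\n"
    ".TH FOO 1\n.EX\nls -l\n.EE\n"
)

print(sorted(undefined_uses(PAGE_PLAIN)))
print(sorted(undefined_uses(PAGE_LOCAL)))
```

To run something like this over an installed tree, each page would first be
decompressed, for example with gzip.open(path, "rt", errors="replace").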

It's not trivial to instrument to check this, but I am certain from
eyeball inspections that the most commonly used low-level troff
request cliche is .nf/.fi in combination with .RS/.RE and .ft CW/.ft R
to format example command lines and that sort of thing.  I am also
certain that .EX/.EE, encapsulating that cliche, is the single local 
extension most often defined by man-page authors.
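For readers who have not seen the cliche, here is a rough illustration of what
it looks like and of one crude way to spot it.  This is a sketch under stated
assumptions, not a measurement technique used for the numbers above: it
assumes the .ft CW request sits inside the .nf/.fi span, ignores .RS/.RE, and
the name count_example_cliches and the sample text are hypothetical.

```python
# A typical hand-rolled example display: indent, no-fill, constant-width font.
SAMPLE = """\
.RS
.nf
.ft CW
$ ls -l /tmp
.ft R
.fi
.RE
"""


def count_example_cliches(source):
    """Count .nf ... .fi spans that contain a .ft CW request.

    Heuristic only: pages that switch fonts before .nf, or that use
    inline font escapes instead of .ft, are not detected.
    """
    count = 0
    in_nofill = False
    saw_cw = False
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue
        if words[0] == ".nf":
            in_nofill, saw_cw = True, False
        elif words[0] == ".ft" and words[1:2] == ["CW"] and in_nofill:
            saw_cw = True
        elif words[0] == ".fi":
            if in_nofill and saw_cw:
                count += 1
            in_nofill = False
    return count


print(count_example_cliches(SAMPLE))
```

A locally defined .EX/.EE pair usually just wraps the same requests in two
macros, which is why a heuristic like this would miss pages that use the
macros instead of the raw requests.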

In deciding whether or not to implement a feature in doclifter, I normally
consider anything below 0.5% to be statistical noise.  That's also my
target for its error rate, which is currently 3.4 percent but which I
am confident will drop to below 2.5% when the netpbm and groff pages
are cleaned up.

You may also find these lists interesting. In each, the number to the
right of '=' is a doclifter error status, and the number in parens is
a processing time in seconds.  Again, these are after 388 error fixups
and thus represent the hard core of pages that are intractable to
translation.  Also note that most of the large number of Perl-related
pages in section 3 are actually empty; the fact that they are shipped
at all is an artifact of some bug in their build system.
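The list format described above is easy to process mechanically.  The
following is a minimal sketch of a parser that groups entries by status; the
name parse_results and the regular expression are assumptions made for
illustration, based only on the line layout shown in the lists below.

```python
import re
from collections import defaultdict

# One entry looks like: "! /path/to/page.1.gz=1 (0.35)"
#   path    = the man page file
#   status  = doclifter error status (right of '=')
#   seconds = processing time in seconds (in parentheses)
ENTRY = re.compile(
    r"^!\s+(?P<path>\S+)=(?P<status>\d+)\s+\((?P<seconds>\d+(?:\.\d+)?)\)\s*$"
)


def parse_results(lines):
    """Group (path, seconds) pairs by status; skip lines that do not match."""
    by_status = defaultdict(list)
    for line in lines:
        m = ENTRY.match(line)
        if m:
            by_status[int(m.group("status"))].append(
                (m.group("path"), float(m.group("seconds")))
            )
    return dict(by_status)


# Three entries copied from the lists in this message.
SAMPLE_LINES = [
    "! /usr/share/man/man7/groff_char.7.gz=4 (0.49)",
    "! /usr/share/man/man1/tar.1.gz=1 (8.72)",
    "! /usr/share/man/man1/sox.1.gz=1 (1.49)",
]

for status, entries in sorted(parse_results(SAMPLE_LINES).items()):
    slowest = max(entries, key=lambda e: e[1])
    print(status, len(entries), slowest[0])
```

Grouping by status makes it straightforward to count failures per category or
to pick out the pages that take longest to process.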

Pages that return status 4 (internal error in doclifter):

! /usr/share/man/man7/groff_char.7.gz=4 (0.49)
! /usr/share/man/man7/groff_mdoc.7.gz=4 (2.11)
! /usr/share/man/man7/mdoc.samples.7.gz=4 (1.47)

Pages that return status 1 (cannot be translated):

! /usr/share/man/man1/groffer.1.gz=1 (1.77)
! /usr/share/man/man1/kwordtrans.1.gz=1 (0.09)
! /usr/share/man/man1/nc.1.gz=1 (0.35)
! /usr/share/man/man1/pdl.1.gz=1 (0.02)
! /usr/share/man/man1/sntp.1.gz=1 (0.48)
! /usr/share/man/man1/solterm.1.gz=1 (0.15)
! /usr/share/man/man1/sox.1.gz=1 (1.49)
! /usr/share/man/man1/soxmix.1.gz=1 (1.47)
! /usr/share/man/man1/tar.1.gz=1 (8.72)
! /usr/share/man/man1/wordtrans.1.gz=1 (0.15)
! /usr/share/man/man3/B::Stash.3pm.gz=1 (0.03)
! /usr/share/man/man3/Carp::Heavy.3pm.gz=1 (0.03)
! /usr/share/man/man3/DBI::ProfileSubs.3pm.gz=1 (0.03)
! /usr/share/man/man3/DBM_Filter::compress.3pm.gz=1 (0.04)
! /usr/share/man/man3/DBM_Filter::encode.3pm.gz=1 (0.05)
! /usr/share/man/man3/DBM_Filter::int32.3pm.gz=1 (0.04)
! /usr/share/man/man3/DBM_Filter::null.3pm.gz=1 (0.04)
! /usr/share/man/man3/DBM_Filter::utf8.3pm.gz=1 (0.04)
! /usr/share/man/man3/Encode::CJKConstants.3pm.gz=1 (0.03)
! /usr/share/man/man3/Encode::CN::HZ.3pm.gz=1 (0.03)
! /usr/share/man/man3/Encode::Config.3pm.gz=1 (0.03)
! /usr/share/man/man3/Encode::JP::H2Z.3pm.gz=1 (0.03)
! /usr/share/man/man3/Encode::JP::JIS7.3pm.gz=1 (0.03)
! /usr/share/man/man3/Encode::KR::2022_KR.3pm.gz=1 (0.03)
! /usr/share/man/man3/Mail::SpamAssassin::PluginHandler.3pm.gz=1 (0.03)
! /usr/share/man/man3/ModPerl::Code.3pm.gz=1 (0.07)
! /usr/share/man/man3/PDL::BAD2_demo.3pm.gz=1 (0.02)
! /usr/share/man/man3/PDL::BAD_demo.3pm.gz=1 (0.01)
! /usr/share/man/man3/PDL::Config.3pm.gz=1 (0.01)
! /usr/share/man/man3/PDL::Doc::Config.3pm.gz=1 (0.01)
! /usr/share/man/man3/Roadmap.3pm.gz=1 (0.25)
! /usr/share/man/man3/XML::DOM-ecmascript.3pm.gz=1 (0.11)
! /usr/share/man/man3/XML::SAX::PurePerl::Reader.3pm.gz=1 (0.03)
! /usr/share/man/man3/libbind-gethostbyname.3.gz=1 (0.17)
! /usr/share/man/man3/mod_perl2.3pm.gz=1 (0.04)
! /usr/share/man/man5/groff_out.5.gz=1 (1.05)
! /usr/share/man/man5/groff_tmac.5.gz=1 (0.66)
! /usr/share/man/man7/groff.7.gz=1 (1.66)
! /usr/share/man/man7/groff_trace.7.gz=1 (0.43)
! /usr/share/man/man8/hwclock.8.gz=1 (0.42)
! /usr/share/man/man8/ifenslave.8.gz=1 (0.06)
! /usr/share/man/man8/ip.8.gz=1 (1.25)
! /usr/share/man/man8/ipsec_eroute.8.gz=1 (0.49)
! /usr/share/man/man8/ipsec_spi.8.gz=1 (0.47)
! /usr/share/man/man8/isdnctrl_conf.8.gz=1 (0.03)
! /usr/share/man/man8/netstat.8.gz=1 (0.45)
! /usr/share/man/man8/sg_wr_mode.8.gz=1 (0.16)
! /usr/share/man/man8/t1libconfig.8.gz=1 (0.11)
! /usr/share/man/man8/tc-pfifo.8.gz=1 (0.02)

Pages that return status 6 (XML validation of output failed):

! /usr/share/man/man1/perlop.1.gz=6 (1.91)
! /usr/share/man/man1/perlre.1.gz=6 (1.17)
! /usr/share/man/man3/AdmBlackBoxMethods.3.gz=6 (0.13)
! /usr/share/man/man3/AdmBlackBoxModuleVector.3.gz=6 (0.12)
! /usr/share/man/man3/CGI.3pm.gz=6 (3.28)
! /usr/share/man/man3/Inline::C-Cookbook.3pm.gz=6 (1.44)
! /usr/share/man/man3/XKanjiControl.3.gz=6 (0.88)
! /usr/share/man/man3/XLookupKanjiString.3.gz=6 (0.87)
! /usr/share/man/man3/XML::DOM::Element.3pm.gz=6 (0.22)
! /usr/share/man/man3/adm_bb_cap_t.3.gz=6 (0.13)
! /usr/share/man/man3/alchemist.h.3.gz=6 (1.96)
! /usr/share/man/man3/blackbox.h.3.gz=6 (0.12)
! /usr/share/man/man3/glGetHistogram.3gl.gz=6 (0.18)
! /usr/share/man/man3/glTexImage2D.3gl.gz=6 (0.43)
! /usr/share/man/man3/gluNurbsCallback.3gl.gz=6 (0.30)
! /usr/share/man/man3/gluTessBeginPolygon.3gl.gz=6 (0.15)
! /usr/share/man/man3/jrKanjiControl.3.gz=6 (0.86)
! /usr/share/man/man3/jrKanjiString.3.gz=6 (0.88)
! /usr/share/man/man3/uilib.3.gz=6 (0.88)
! /usr/share/man/man5/elinkskeys.5.gz=6 (0.88)
! /usr/share/man/man7/mdoc.7.gz=6 (0.42)
 
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



