Re: [gpsd-dev] Clarifications needed for the time-service HOWTO

From: Gary E. Miller
Subject: Re: [gpsd-dev] Clarifications needed for the time-service HOWTO
Date: Mon, 21 Oct 2013 12:53:48 -0700

Yo Eric!

I've read, and appreciate where you are going with it.  There are
a few incorrect things though, and questions to answer.  I will
need to spend an hour to do so and can not do that during $DAYJOB.

I'll get to it tonight.

The first problem is that you assume Stratum 1 operators know what they
are doing (known and compensated-for latency).  Any fool can be a
Stratum 1, and there are a lot of really bad ones.  How I know that, and
how I can show you how to tell so you can explain it in this doc, is
what will take me a while to write.

On Mon, 21 Oct 2013 13:34:07 -0400
"Eric S. Raymond" <address@hidden> wrote:

> Other experts, feel free to chime in.
> Gary E. Miller <address@hidden>:
> > > Is it really true that most public NTP servers are Stratum 2, or
> > > are there more layers in normal use?
> > 
> > Maybe most, but you'll see a lot of 1's and 3's.
> I've been doing research. Revised text near the end of first section:
>     You will hear time service people speak of "Stratum 0" (the
>     reference clocks), "Stratum 1" (NTP servers directly connected to
>     reference clocks over a path with known and compensated-for
>     latency), and "Stratum 2" (publicly accessible servers that get
>     time from Stratum 1 over a network link). Stratum 3 chimers
>     redistribute time from Stratum 2, and so forth. There are defined
>     higher strata up to 15, but you will probably never see a public
>     chimer higher than Stratum 3.
>
>     Ordinary client computers are normally configured to get time
>     from one or more Stratum 2 (or less commonly Stratum 3) servers.
>     With GPSD and a suitable GPS, you can easily condition your clock
>     to higher accuracy than typical Stratum 2; with a little effort
>     you can do better than public Stratum 1 servers.
> If this is misstating the facts in any way - for example, if Stratum 3
> and up servers are more common than we are implying here - someone
> please speak up.
> > > More generally: what can I discover about the quality of the
> > > chimers I listen to?
> > 
> > Just compare several. 
> "Just compare several".  How delightfully vague!  What I need to
> document for the HOWTO is *how to do this*.  Concrete procedure.
> (1) What reporting tool do I run?  
> (2) Where among the numbers it will display for each chimer is
> the figure of merit I should be paying attention to? 
> (3) What do reasonable values of that figure look like?  What
> do weird outliers look like?
> It would be illuminating if you replied with a transcript of how
> the report looks on your system and pointed out which numbers are
> the significant ones.  If you can include a contrasting report 
> from a system with bad chimers, please do.
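
One hedged way to make "compare several" concrete: `ntpq -p` prints one
row per configured chimer, with `delay`, `offset`, and `jitter` columns
in milliseconds and a one-character tally code in the first column. The
sketch below parses that table and sorts chimers by jitter. The sample
text, the hostnames and addresses in it, and the helper name
`rank_peers` are illustrative assumptions, not output from any system in
this thread, and sorting by jitter is only one possible figure of merit.

```python
# Illustrative sample in the layout `ntpq -p` prints; all values are made up.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*chimer-a.example .GPS.            1 u   33   64  377   12.345    0.123   0.456
+chimer-b.example 192.0.2.10       2 u   12   64  377   35.210   -1.870   2.310
-chimer-c.example 198.51.100.7     3 u   48   64  377  140.900   22.400  35.700
"""


def rank_peers(ntpq_output):
    """Parse `ntpq -p` text and return peer records sorted by jitter (ms)."""
    peers = []
    for line in ntpq_output.splitlines():
        if not line.strip():
            continue
        # Column 0 is the tally code ('*' system peer, '+' candidate,
        # '-' outlier, 'x' falseticker, ' ' none); the rest is whitespace-split.
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # separator row or anything that is not a peer row
        try:
            stratum = int(fields[2])
            delay, offset, jitter = (float(x) for x in fields[7:10])
        except ValueError:
            continue  # header row: column names are not numbers
        peers.append({
            "remote": fields[0],
            "tally": tally,
            "stratum": stratum,
            "delay": delay,
            "offset": offset,
            "jitter": jitter,
        })
    return sorted(peers, key=lambda p: p["jitter"])


if __name__ == "__main__":
    for p in rank_peers(SAMPLE):
        print(f'{p["tally"]} {p["remote"]:<20} st={p["stratum"]} '
              f'offset={p["offset"]:+.3f} ms  jitter={p["jitter"]:.3f} ms')
```

On a live system the same function could be fed the captured text of
`ntpq -p`; a chimer whose offset and jitter sit far from the rest is the
kind of outlier the question is about.
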
> > You should have at least 2, more likely 5 in
> > at least one of your ntp.conf.  
> Yup, I got that.  It's at the beginning of the new section on NTP
> performance tuning.  Which we are now writing...
> >                       Then the bad (to you) ones will just
> > stand out.  Some are just bad, some will not have a good network
> > connection to you and will appear bad.
> That second sentence is *useful*.  New text:
>     A chimer can be a poor performer (what the inventor of NTP
>     whimsically calls a "falseticker") for either of two reasons. It
>     may be shipping bad time, or the best routes between you and it
>     may have large latency variations.  (Large but fixed latencies
>     can be compensated out using a fudge.)
> > > How specific can we be about time jitter?  Is this a topic for the
> > > HOWTO at all?
> > 
> > We can describe it, but since it is the error part, it will be 
> > specific to chimers, time sources, networks and clients.
> What sorts of jitter are produced by different parts of the 
> delivery chain?  What do typical magnitudes look like?
> On to a different topic...
> > >    Those hotplug devices may, however, be able to use plain,
> > >    non-kernel PPS. gpsd tries to automatically fall back to this
> > > when absence of root permissions makes KPPS unavailable. This
> > > fallback is complicated by the fact that gpsd needs to
> > > communicate to ntpd in a different way in root and non-root
> > > mode.  This complicates the configuration in ways beyond the
> > > scope of this document and is strongly discouraged in practice.
> > > 
> > > This paragraph troubles me. I'm not sure, but I think it may be
> > > conflating two different issues and two sets of constraints. 
> > 
> > Yes, two related issues.  KPPS to PPS fallback, and the problems of
> > fallback to non-root.  In general we should just discourage
> > non-root and say it is bad, do not do that.
> I understand that you want to discourage non-root operation, and I'm
> not arguing that we shouldn't.  But...
> We are writing a ground-truth document here. In these it's bad
> practice to mix policy and mechanism.  We should be clear about "what
> happens if you do X" even if (perhaps especially if) we think X is
> a bad idea.
> There are several reasons for this, but at least one sufficient one
> is that it helps the reader build an adaptable mental model rather 
> than merely following instructions semi-blindly.
> Here's how you do this sort of thing right.  First, supply 
> motivation - why privilege-dropping happens:
>     In order to present the smallest possible attack surface to
>     privilege-escalation attempts, gpsd run as root drops its root
>     privileges very soon after startup - just after it has opened any
>     serial device paths passed on the command line.
>     Thus, KPPS can only be used with devices passed that way, not with
>     GPSes that are later presented to gpsd by the hotplug system.
>     Those hotplug devices may, however, be able to use plain,
>     non-kernel PPS. gpsd tries to automatically fall back to this when
>     absence of root permissions makes KPPS unavailable.
> (Here comes the don't-do-that.)
>     In general, if you start gpsd as other than root, the following
>     things will happen that slightly degrade the accuracy of reported
>     time:
>     1. Devices passed on the command line will be unable to use KPPS
>     and will fall back to the same plain PPS that all hotplug devices
>     must use, increasing the associated error from about 1 uSec to
>     about 5 uSec.
>     2. gpsd will be unable to renice itself to a higher priority.
>     This action helps protect it against jitter induced by variable
>     system load. It's particularly important if your NTP server is a
>     general-use computer that's also handling mail or web service or
>     development.
>     3. The way you have to configure ntpd and chrony will change away
>     from what we show you here; ntpd will need to be told different
>     shared-memory segment numbers, and chrony will need a different
>     socket location.
>     You may also find gpsd can't open serial devices at all if your
>     OS distribution has done "secure" things with the permissions.
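
Item 3 can be illustrated with a minimal sketch. It assumes the two
cases nobody in this thread disputes: gpsd started as root uses ntpd
shared-memory units 0 and 1, and gpsd started as non-root uses units 2
and 3. The disputed root-plus-hotplug case is deliberately left out.
The function names, the `minpoll`/`maxpoll` values, and the `refid`
strings are illustrative assumptions; `127.127.28.u` is ntpd's address
form for shared-memory unit `u`.

```python
def shm_units(started_as_root):
    """Return the (time, PPS) shared-memory unit pair for the two undisputed cases."""
    return (0, 1) if started_as_root else (2, 3)


def ntp_conf_lines(started_as_root):
    """Build ntp.conf lines pointing ntpd at the units gpsd would feed."""
    time_unit, pps_unit = shm_units(started_as_root)
    return [
        # In-band GPS time (coarse); poll and refid values are assumptions.
        f"server 127.127.28.{time_unit} minpoll 4 maxpoll 4",
        f"fudge 127.127.28.{time_unit} refid GPS",
        # PPS-derived time (fine), marked as the preferred source.
        f"server 127.127.28.{pps_unit} minpoll 4 maxpoll 4 prefer",
        f"fudge 127.127.28.{pps_unit} refid PPS",
    ]


if __name__ == "__main__":
    for as_root in (True, False):
        print("# gpsd started as", "root" if as_root else "non-root")
        print("\n".join(ntp_conf_lines(as_root)))
        print()
```

The only difference between the two generated fragments is the unit
number at the end of each `127.127.28.` address, which is the ntp.conf
change item 3 refers to.
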
> (Notice that the don't-do-that is presented in a way that increases 
> the reader's options rather than decreasing them.  Now we transition
> to "here is best practice".) 
>     When in doubt, the preferred method to start your timekeeping is:
>     $ su -
>     # killall -9 gpsd ntpd
>     # ntpd -gN
>     # sleep 2
>     # gpsd -n /dev/ttyXX
>     # sleep 2
>     # cgps
>     where /dev/ttyXX is whatever 1PPS-capable device you have.  The
>     rest of these setup instructions will assume that you are starting
>     gpsd as root, with occasional glances at the non-root case.
> > > Which set of ntpd segments GPSD can use is constrained by whether
> > > it started up as root or not.
> > 
> > Worse, by whether it is root or not when initialized, which may be
> > at hot plug time.
> I believe this is incorrect. All shared-memory segments are opened in
> ntpshm_init(), which is called before privilege-dropping and well
> before gpsd begins accepting hotplug notifications.  Please review the
> code to either verify this or point out where and why I'm full of
> crap.
> > > 2) GPSD started as root; device is hotplugged. GPSD
> > > will use privileged ntpd segments 0 and 1,
> > 
> > No.  It will use units 2 and 3.  Which is likely not what is in
> > ntp.conf and in practice is not a fail.
> Again, I believe this is incorrect.  
> > > 3) GPSD started as non-root; device path either passed on command
> > > line *or* hotplugged.  GPSD will use unprivileged ntpd segments 2
> > > and 3; KPPS will not work but plain PPS will.
> > 
> > Sort of, the ntp.conf must be changed to use units 2 and 3.
> Understood, and covered in the revised language.
> > The problem with just keeping the first sentence is the user is not
> > left with an idea of the severity of the problems he will encounter.
> Which is why the right thing to do is *document those problems
> explicitly*. As I have done.
> > We have seen that in the past where users try to run as non-root and
> > have not understood that the instructions to run as non-root are
> > incomplete and problematic.  So if you keep the first sentence,
> > then say if you are not root (hot plug or initialization) that is
> > bad, unsupported and outside the scope, that could work.
> I've refuted this in a couple of subtle ways above; here's where I hit
> you over the head with a 2x4 to get your attention, ya ornery
> mule. :-)
> What you have just enunciated is a recipe for documentation that
> *sucks*.  I won't do it, and I *will* teach you how and why not to
> fuck up like this if you're not utterly impervious.
> When your content is "Do A and B and C, and if you wander off the
> narrow path *dragons will eat you*", you are stiffing your users.
> You are, among other things, not supporting their ability to cope if
> reality wanders outside of the scenarios you imagined when you were
> documenting.
> *Good* documentation doesn't merely teach facts and procedures, it
> nurses the ability to adapt and improvise intelligently.  It does this
> by presenting a causal model that can be applied not merely when
> things go right but when they go wrong - and not merely in the
> exact circumstances the author had in mind but in conditions the
> author didn't anticipate.  It conveys not just operation but
> understanding.
> Saying that a mode of operation is "unsupported" is justified when
> that mode yields results that are random or dangerous.  It is *not*
> justified when you are trying to avoid the discomfort of describing
> options that you think are bad policy.  The reader's priorities may
> be different than yours!  
> Now reread my new text and notice how at every step it *creates
> options*.  It doesn't say "don't do that!", it says "here are the
> consequences if you do". Instead of walling the user in, each warning
> gives him additional context with which to understand normal
> operation - and with which to troubleshoot if things don't go as
> expected.

Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
        address@hidden  Tel:+1(541)382-8588
