Re: [gpsd-dev] Clarifications needed for the time-service HOWTO


From: Gary E. Miller
Subject: Re: [gpsd-dev] Clarifications needed for the time-service HOWTO
Date: Mon, 21 Oct 2013 19:10:06 -0700

Yo Eric!

On Mon, 21 Oct 2013 13:34:07 -0400
"Eric S. Raymond" <address@hidden> wrote:

> Other experts, feel free to chime in.
> 
> Gary E. Miller <address@hidden>:
> > > Is it really true that most public NTP servers are Stratum 2, or
> > > are there more layers in normal use?
> > 
> > > Maybe most, but you'll see a lot of 1's and 3's.
> 
> I've been doing research. Revised text near the end of first section:
> 
>      You will hear time service people speak of "Stratum 0" (the
>      reference clocks), "Stratum 1" (NTP servers directly connected to
>      reference clocks over a path with known and compensated-for
>      latency), and "Stratum 2" (publicly accessible servers that get
>      time from Stratum 1 over a network link). Stratum 3 chimers
>      redistribute time from Stratum 2, and so forth. There are defined
>      higher strata up to 15, but you will probably never see a public
>      chimer higher than Stratum 3.
> 
>      Ordinary client computers are normally configured to get time
>      from one or more Stratum 2 (or less commonly Stratum 3) servers.
>      With GPSD and a suitable GPS, you can easily condition your clock
>      to higher accuracy than typical Stratum 2; with a little effort
>      you can do better than public Stratum 1 servers.
> 
> If this is misstating the facts in any way - for example, if Stratum 3
> and up servers are more common than we are implying here - someone
> please speak up.

As mentioned earlier today, the stratum is more like a hop count.  Any
one server may be stratum 1, stratum 2 and stratum 3 at different times
of the day.
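
To make the hop-count idea concrete, here is a minimal Python sketch. The
host names and the sync_source mapping are made up for illustration; real
ntpd also caps the value at 15 and treats 16 as unsynchronized, which this
sketch ignores.

```python
def stratum(host, sync_source):
    """Count hops from host back to a reference clock.

    sync_source maps each server to whatever it is currently synced to.
    Anything that is not a key is treated as a reference clock (stratum 0).
    """
    hops = 0
    seen = set()
    while host in sync_source:
        if host in seen:
            raise ValueError("synchronization loop at " + host)
        seen.add(host)
        host = sync_source[host]
        hops += 1
    return hops


# Hypothetical topology: server "c" changes its upstream during the day.
morning = {"a": "gps", "b": "a", "c": "b"}
evening = {"a": "gps", "b": "a", "c": "a"}
print(stratum("c", morning), stratum("c", evening))
```

The same server reports a different stratum as soon as it picks a different
upstream, which is why stratum alone says little about quality.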

In the example I cut into the HOWTO just now there are 2 well respected
time servers differing by 12 mSec!

> > > More generally: what can I discover about the quality of the
> > > chimers I listen to?
> > 
> > Just compare several. 
> 
> "Just compare several".  How delightfully vague!  What I need to
> document for the HOWTO is *how to do this*.  Concrete procedure.

I have started to add this to the HOWTO.  Notice the new ntpq output.
The 'jitter' numbers are a good indication of stability.  The offset
relative to other servers is also a good indication of problems.

> (1) What reporting tool do I run?  

I like to do this:
        watch ntpq -p

> (2) Where among the numbers it will display for each chimer is
> the figure of merit I should be paying attention to? 

Jitter and Offset.  A large offset is probably indicative of an asymmetric
net path, something ntpd has a hard time compensating for.  An offset
of 16 is probably bad leap seconds.
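
If you would rather not eyeball the table, something like the following
Python sketch could pull out the offset and jitter columns and flag the
chimers that stand out.  It assumes the usual ten-column peer table that
`ntpq -p` prints (remote, refid, st, t, when, poll, reach, delay, offset,
jitter, with the tally code in the first character).  The function names
and the 10 mSec thresholds are placeholders of mine, not recommendations.

```python
from statistics import median


def parse_ntpq_peers(text):
    """Turn the peer table from `ntpq -pn` into a list of dicts."""
    peers = []
    for line in text.splitlines():
        # Column 0 holds the tally code (*, +, -, x, ... or a blank).
        tally, fields = line[:1], line[1:].split()
        if len(fields) != 10:
            continue  # separator row or something unexpected
        try:
            delay, offset, jitter = (float(f) for f in fields[7:10])
        except ValueError:
            continue  # header row: the last columns are words, not numbers
        peers.append({
            "tally": tally,
            "remote": fields[0],
            "stratum": fields[2],
            "delay": delay,      # milliseconds
            "offset": offset,    # milliseconds
            "jitter": jitter,    # milliseconds
        })
    return peers


def flag_outliers(peers, max_jitter_ms=10.0, max_spread_ms=10.0):
    """Name peers whose jitter is high or whose offset is far from the
    median offset of all peers.  Thresholds are arbitrary placeholders."""
    if not peers:
        return []
    mid = median(p["offset"] for p in peers)
    return [p["remote"] for p in peers
            if p["jitter"] > max_jitter_ms
            or abs(p["offset"] - mid) > max_spread_ms]
```

Feed it the text of `ntpq -pn`, for example from
subprocess.run(["ntpq", "-pn"], capture_output=True, text=True).stdout, and
tighten the thresholds to match how picky you need to be.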

> (3) What do reasonable values of that figure look like?  What
> do weird outliers look like?

Reasonable depends on your needs.  If you just want to know what second
it is now then any will usually do.  If you are trying to see how good your
1 uSec PPS clock is then you will be picky indeed.

> It would be illuminating if you replied with a transcript of how the
> report looks on your system and pointed out which numbers are the
> significant ones.  If you can include a contrasting report from a
> system with bad chimers, please do.

See the just updated HOWTO.  I suggest that you pick some servers on the
other side of the planet from an ntp pool, put them in your ntp.conf
and see.

> >                       Then the bad (to you) ones will just stand
> > out.  Some are just bad, some will not have a good network
> > connection to you and will appear bad.
>
> That second sentence is *useful*.  New text:
>
>     A chimer can be a poor performer (what the inventor of NTP
> whimsically calls a "falseticker") for either of two reasons. It may
> be shipping bad time, or the best routes between you and it have large
> latency variations.  (Large but fixed latencies can be compensated out
> using a fudge.)

Sure.

> > > How specific can we be about time jitter?  Is this a topic for the
> > > HOWTO at all?
> >
> > We can describe it, but since it is the error part, it will be
> > specific to chimers, time sources, networks and clients.
>
> What sorts of jitter are produced by different parts of the delivery
> chain?  What do typical magnitudes look like?

The jitter from NMEA on a local GPS can be a few hundred mSec.  Run a
traceroute to a server and you see how bad your network can be.  Cable
modems could be up to 500 mSec (and they are wildly asymmetric).  The
jitter from a local GPS is defined by the USB transfer, around 1 mSec
for USB 1.1.  The jitter from interrupt processing can be 100 uSec and
sometimes much worse on a loaded system.
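
For a rough feel of what a jitter number summarizes, here is a small
Python sketch that takes a series of offset samples and returns the RMS of
the differences between consecutive samples.  This is a simplification for
illustration, not ntpd's exact clock-filter computation, and the function
name is mine.

```python
import math


def rms_jitter(offsets):
    """RMS of successive differences in a list of offset samples.

    The result is in the same unit as the input (mSec, uSec, ...).
    Fewer than two samples gives 0.0 because there is nothing to compare.
    """
    diffs = [b - a for a, b in zip(offsets, offsets[1:])]
    if not diffs:
        return 0.0
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

A source with a large but steady offset scores 0.0 here, which matches the
point that fixed latency is much easier to deal with than varying latency.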

> > >    Those hotplug devices may, however, be able to use plain,
> > > non-kernel PPS. gpsd tries to automatically fall back to this when
> > > absence of root permissions makes KPPS unavailable. This fallback
> > > is complicated by the fact that gpsd needs to communicate to ntpd
> > > in a different way in root and non-root mode.  This complicates
> > > the configuration in ways beyond the scope of this document and is
> > > strongly discouraged in practice.
> > >
> > > This paragraph troubles me. I'm not sure, but I think it may be
> > > conflating two different issues and two sets of constraints.
> >
> > Yes, two related issues.  KPPS to PPS fallback, and the problems of
> > fallback to non-root.  In general we should just discourage non-root
> > and say it is bad, do not do that.
> Here's how you do this sort of thing right.  First, supply motivation
> - why privilege-dropping happens:
>
>     In order to present the smallest possible attack surface to
>     privilege-escalation attempts, gpsd run as root drops its root
>     privileges very soon after startup - just after it has opened any
>     serial device paths passed on the command line.

Which is a tad too soon.  It also needs to nice() itself, open
needed serial and PPS devices, open required SHM or sockets.

>     Thus, KPPS can only be used with devices passed that way, not with
>     GPSes that are later presented to gpsd by the hotplug system.  Those
>     hotplug devices may, however, be able to use plain, non-kernel
>     PPS. gpsd tries to automatically fall back to this when absence of
>     root permissions makes KPPS unavailable.

Before saying this I would like to see it tested.

> (Here comes the don't-do-that.)
>
>     In general, if you start gpsd as other than root, the following
>     things will happen that slightly degrade the accuracy of reported
>     time:
>
>     1. Devices passed on the command line will be unable to use KPPS
>     and will fall back to the same plain PPS that all hotplug devices
>     must use, increasing the associated error from ~1 uSec to ~5 uSec.

Or potentially MUCH worse on a single core CPU.

>     2. gpsd will be unable to renice itself to a higher priority.
>     This action helps protect it against jitter induced by variable
>     system load. It's particularly important if your NTP server is a
>     general-use computer that's also handling mail or web service or
>     development.
>
>     3. The way you have to configure ntpd and chrony will change away
>     from what we show you here; ntpd will need to be told different
>     shared-memory segment numbers, and chrony will need a different
>     socket location.
>
>     You may also find gpsd can't open serial devices at all if your OS
>     distribution has done "secure" things with the permissions.
>
> (Notice that the don't-do-that is presented in a way that increases
> the reader's options rather than decreasing them.  Now we transition
> to "here is best practice".)
>
>     When in doubt, the preferred method to start your timekeeping is:
>
>     $ su -
>     # killall -9 gpsd ntpd
>     # ntpd -gN
>     # sleep 2
>     # gpsd -n /dev/ttyXX
>     # sleep 2
>     # cgps
>
>     where /dev/ttyXX is whatever 1PPS-capable device you have.  The
>     rest of these setup instructions will assume that you are starting
>     gpsd as root, with occasional glances at the non-root case.
>
> > > Which set of ntpd segments GPSD can use is constrained by whether
> > > it started up as root or not.
> >
> > Worse, by whether it is root or not when initialized, which may be
> > at hot plug time.
>
> I believe this is incorrect. All shared-memory segments are opened
> in ntpshm_init(), which is called before privilege-dropping and well
> before gpsd begins accepting hotplug notifications.  Please review
> the code to either verify this or point out where and why I'm full of
> crap.

Assuming gpsd was started as root.  But the most common usage of gpsd
as non-root is being started as a common user or a restricted system 
user.

> > > 2) GPSD started as root; device is hotplugged. GPSD will use
> > > privileged ntpd segments 0 and 1,
> >
> > No.  It will use units 2 and 3.  Which is likely not what is in
> > ntp.conf and in practice is not a fail.
>
> Again, I believe this is incorrect.

Maybe in your exact case.  But the common case is where gpsd is not started,
a GPS is plugged in, udevd starts gpsd as non-root.
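
To spell out what that means for ntp.conf, here is a small Python sketch
that prints the SHM refclock lines for each case.  It assumes the unit
numbers discussed in this thread (0 and 1 when gpsd starts as root, 2 and
3 otherwise) and ntpd's SHM driver addressing of 127.127.28.<unit>.  The
function names are mine, and a real configuration would also need fudge
and refid options that are left out here.

```python
def shm_units(started_as_root):
    """SHM units gpsd is expected to use for its first device,
    per the unit numbers discussed in this thread."""
    return (0, 1) if started_as_root else (2, 3)


def ntp_conf_servers(started_as_root):
    """Bare `server` lines for ntp.conf matching those units.

    127.127.28.u is ntpd's address form for SHM refclock unit u.
    """
    return ["server 127.127.28.%d" % unit
            for unit in shm_units(started_as_root)]


for as_root in (True, False):
    label = "root" if as_root else "non-root"
    print(label + ":", ", ".join(ntp_conf_servers(as_root)))
```

If ntp.conf lists units 0 and 1 but udevd started gpsd as non-root, ntpd
is listening on segments gpsd never writes to.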

> > The problem with just keeping the first sentence is the user is not
> > left with an idea of the severity of the problems he will encounter.
>
> Which is why the right thing to do is *document those problems
> explicitly*. As I have done.

The problem is that a beginner's document has gotten scary long.  Maybe
best to split in two.  A beginner's document: do it this way and it will work.
And an advanced document:  here are some knobs you can adjust to explore
the edge cases.

> What you have just enunciated is a recipe for documentation that
> *sucks*.  I won't do it, and I *will* teach you how and why not to
> fuck up like this if you're not utterly impervious.

Well, we'll prolly disagree a bit, but I respect the process.

> When your content is "Do A and B and C, and if you wander off the
> narrow path *dragons will eat you*", you are stiffing your users.
> You are, among other things, not supporting their ability to cope if
> reality wanders outside of the scenarios you imagined when you were
> documenting.

I'm not saying to stiff them.  Just to keep them away from the dragons
until after the user sees some success.  Then on to the dragon taming
stuff.

> *Good* documentation doesn't merely teach facts and procedures, it
> nurses the ability to adapt and improvise intelligently.  It does
> this by presenting a causal model that can be applied not merely when
> things go right but when they go wrong - and not merely in the exact
> circumstances the author had in mind but in conditions the author
> didn't anticipate.  It conveys not just operation but understanding.

Agreed, but if the doc is so long the users start skipping parts we
are back to the Millspeak problem.

Maybe we need to hash all the stuff out, then call this the advanced
version and come up with a quickstart version as a next phase.

RGDS
GARY
---------------------------------------------------------------------------
Gary E. Miller Rellim 109 NW Wilmington Ave., Suite E, Bend, OR 97701
        address@hidden  Tel:+1(541)382-8588
