Re: [Gnu-arch-users] Re: MD5 is broken


From:	Tom Lord
Subject:	Re: [Gnu-arch-users] Re: MD5 is broken
Date:	Mon, 21 Mar 2005 13:42:59 -0800 (PST)


I realize that this is a very long essay but the point is: the idea of
adding more hash functions is based on an architectural misconception.
Instead of adding SHA1 in addition to MD5, it would be far better
to make signing not depend on the checksum data at all and just leave
the use of MD5 alone.

   From: Jan Hudec <address@hidden>

   I agree here. While the hash was not broken in a way that actually
   allows an attack on Arch, it is quite likely that such a breach will
   appear in the not-so-distant future (a year or two). Thus I would
   advocate adding the extra hash now too.

The essential role of MD5 in arch is to provide a checksum which
protects against non-malicious corruption of archive data.  MD5 was
chosen because it is formally specified, widely reimplemented, widely
studied, and more than ample for the requirements of the application.
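A minimal sketch of that essential role, in Python and assuming nothing about arch's real on-disk formats: record an MD5 digest when data is written, and compare it when the data is read back. The function names `md5_hex` and `verify` are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest, used purely as an integrity checksum."""
    return hashlib.md5(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """True if data still matches the checksum recorded when it was written."""
    return md5_hex(data) == expected_hex


original = b"patch-1 tarball contents"
recorded = md5_hex(original)  # stored alongside the data at write time

print(verify(original, recorded))       # True: data came back intact
print(verify(original[:-3], recorded))  # False: data came back truncated
```

Nothing here defends against someone who can rewrite both the data and its checksum; that is exactly the limited, non-malicious job described above.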

[Earlier discussions, especially around the time MD5 was added, were
not sharply focussed on protection against non-malicious corruption.
Later, the signing mechanism made a pretty sketchy use of the MD5
data and added to the confusion.  Nevertheless, that first paragraph
about the "essential role" of MD5 in arch remains true.  See below
for a discussion that makes this clearer.]

From this perspective, the addition of more hash functions is entirely
superfluous and, indeed, reckless.   They would do little to improve
robustness against non-malicious archive corruption.

The accidental role of MD5 in arch is that the signing mechanism was
quickly built to take advantage of it.  Accidentally, archive security
against malicious attack (as typically implemented) depends on the
signing mechanism, which in turn depends on MD5.  Although arch
checksums shouldn't have a deep role in security, because of the
structure of the signing mechanism, they do.  

So it looks at first glance like the solution is to add more hash
functions but, on deeper examination, the solution is to generalize
the signing mechanism instead.

Consider this picture:


           Administrative Schematic for an Arch Deployment

                                                     +-----------+
    +------------------------+                      +-----------+|
    |                        |                     +-----------+||
    |   <your favorite       |  <-- std. hooks --> | <security |||
    |    implementation of   |                     |  "plugin">||+
    |    arch goes here>     |                     | (multiple)|+
    |                        |                     +-----------+
    +------------------------+  <-.                  +-----------+
                                   `- std.          +-----------+|
                                      protocols -> +-----------+|+
                                                   | <servers> |+
                                                   +-----------+

  (of course, other kinds of standard hooks for other things 
  clients ought to agree on)

which users might conceive of as:


               Logical Schematic for an Arch Deployment

                                                     +-----------+
    +------------------------+                      +-----------+|
    |                        |                     +-----------+||
    |   <core arch>          |  <-- std. hooks --> | <security |||
    |                        |                     |  "plugin">||+
    |                        |                     | (multiple)|+
    |                        |                     +-----------+
    +------------------------+  <-.                  +------------+
                                   `- archive       +------------+|
                                      registry ->  +------------+||
                                                   | <archive   |||
                                                   |  locations,|||
                                                   |  params,   ||+
                                                   |  servers>  |+
                                                   +------------+


The box on the left in those diagrams -- core arch -- has an internal
structure that is related to these pictures:


        +------------------+
        |  high level arch |
        |  algorithms      |
        +------------------+
                ^
                |
         least-common-denominator
         "global transactional
          filesystem" API
                |
                v
        +------------------+
        | protocol         |
        |  translator      |
        +------------------+
          ^             ^
          |             |
         std.          std.
         hooks         protocols
          |             |
          v             v
       (user supplied security
        plugins and servers)



The l.c.d. filesystem cannot be a *robust medium*: real disks and
network filesystems silently corrupt data at measurable rates.  A
simple "protocol translator" can't erase that property.  Network
protocols (e.g., ftp) inherit that non-robustness property from the
disk at the server end and sometimes make it worse depending on how
they use the network.  (None of this has anything to do with malicious
attacks on databases or with access protection for databases -- this
is all about "corrupt blocks" and the like.)
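To make the non-robustness concrete, here is a hypothetical model of the l.c.d. filesystem as a bare put/get interface, plus an in-memory implementation that silently truncates every third read. Neither `LcdFilesystem` nor `FlakyMemoryFs` exists in arch. They only illustrate a medium that corrupts data without reporting any error.

```python
from typing import Dict, Protocol


class LcdFilesystem(Protocol):
    """Hypothetical least-common-denominator API: whole-file put and get."""

    def put_file(self, path: str, data: bytes) -> None: ...

    def get_file(self, path: str) -> bytes: ...


class FlakyMemoryFs:
    """In-memory stand-in for a real disk or network filesystem.

    Every Nth read returns only the first half of the file and raises
    nothing, so the caller cannot tell that anything went wrong.
    """

    def __init__(self, truncate_every: int = 3) -> None:
        self._files: Dict[str, bytes] = {}
        self._reads = 0
        self._truncate_every = truncate_every

    def put_file(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get_file(self, path: str) -> bytes:
        self._reads += 1
        data = self._files[path]
        if self._reads % self._truncate_every == 0:
            return data[: len(data) // 2]  # silent corruption
        return data


fs: LcdFilesystem = FlakyMemoryFs()
fs.put_file("archive/patch-1.tar.gz", b"0123456789")
reads = [fs.get_file("archive/patch-1.tar.gz") for _ in range(3)]
print(reads)  # the third read comes back as b'01234'
```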

The high level algorithms of arch need a more robust medium
than the l.c.d. filesystem can natively be counted on for.
That is why MD5 is used, and it explains exactly where MD5 fits
in the diagram:


        +------------------+
        |  high level arch |
        |  algorithms      |
        +------------------+
               ^
               |
         "arch-shaped global
          transactional 
          filesystem" API
               |
               v
        +------------------+
        |  arch txn engine |
        |  including       |
        |  checksum        |
        |  verification    |
        |  (MD5)           |
        +------------------+
                ^
                |
         "least-common-denominator
          global transactional
          filesystem" API
                |
                v
        +------------------+
        | protocol         |
        |  translator      |
        +------------------+
          ^             ^
          |             |
         std.          std.
         hooks         protocols
          |             |
          v             v
       (user supplied security
        plugins and servers)



MD5's essential role is just to turn unrobust file storage into robust
file storage, at least to the degree needed for storing arch data.
Its strength as a hash function is ample for this use -- it just has
to reliably trap things like truncated files and "NFS 0 blocks" in
blobs of binary data retrieved by high-level arch.
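A sketch of the checksum layer's job, assuming a hypothetical `ChecksummedStore` that keeps a `<name>.md5` entry beside each stored file (this layout is illustrative, not arch's actual checksum format). It traps both a truncated file and a block of zeroed bytes, the two failure shapes just mentioned, while accepting untouched data.

```python
import hashlib
from typing import Callable, Dict


class CorruptDataError(Exception):
    """Raised when stored bytes no longer match their recorded checksum."""


class ChecksummedStore:
    """Hypothetical txn-engine layer adding MD5 checks over a raw store.

    `raw` is a plain dict standing in for the unreliable l.c.d. filesystem.
    """

    def __init__(self, raw: Dict[str, bytes]) -> None:
        self.raw = raw

    def put(self, name: str, data: bytes) -> None:
        self.raw[name] = data
        self.raw[name + ".md5"] = hashlib.md5(data).hexdigest().encode("ascii")

    def get(self, name: str) -> bytes:
        data = self.raw[name]
        expected = self.raw[name + ".md5"].decode("ascii")
        if hashlib.md5(data).hexdigest() != expected:
            raise CorruptDataError(name)
        return data


def is_trapped(corrupt: Callable[[bytes], bytes]) -> bool:
    """Store a file, damage it on the raw medium, and see if get() notices."""
    raw: Dict[str, bytes] = {}
    store = ChecksummedStore(raw)
    store.put("patch-2.tar.gz", b"\x1f\x8b" + b"payload" * 100)
    raw["patch-2.tar.gz"] = corrupt(raw["patch-2.tar.gz"])
    try:
        store.get("patch-2.tar.gz")
    except CorruptDataError:
        return True
    return False


def truncate(data: bytes) -> bytes:
    return data[:50]  # file cut short


def zero_block(data: bytes) -> bytes:
    return data[:64] + bytes(64) + data[128:]  # one 64-byte block reads as zeros


def untouched(data: bytes) -> bytes:
    return data


print(is_trapped(truncate), is_trapped(zero_block), is_trapped(untouched))
# True True False
```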

The l.c.d. filesystem *also* lacks a uniform security model
and, worse, most public nodes in the system ought to be 
regarded as insecure.

That presents a second problem to arch: revision control
data is an obvious target for attack and historical 
examples of attacks exist.  How can arch store revision
control data on an insecure file system?

The security requirements for a revision control system are
difficult to state in a general way.  Different deployments
are likely to have different user identity and authentication
realms and mechanisms;  different deployments are likely to 
impose different requirements about the structure and function
of administrative controls over access requirements.

At the same time there's a powerful need for a default "best 
practices" solution which public FOSS projects can adopt as
a de facto standard.

Because the security requirements are so open-ended, we envision
a "plugin-oriented" approach to solutions, with the current
signing hooks standing in as the first approximation of a 
framework for those plugins.
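One way to picture that plugin-oriented approach, as a hypothetical sketch only: a plugin is a predicate over a path and its data, each deployment configures its own plugin list per archive, and a default list plays the role of the "best practices" standard. The names (`accept`, `ARCHIVE_PLUGINS`, the toy plugins) are invented for illustration and are far simpler than arch's signing hooks.

```python
from typing import Callable, Dict, List

# Hypothetical plugin shape: a predicate that approves or rejects data
# read from (or about to be written to) a given path in an archive.
SecurityPlugin = Callable[[str, bytes], bool]


def reject_empty(path: str, data: bytes) -> bool:
    """Toy plugin: refuse empty files."""
    return len(data) > 0


def only_under(prefix: str) -> SecurityPlugin:
    """Toy plugin factory: accept only paths inside one subtree."""

    def plugin(path: str, data: bytes) -> bool:
        return path.startswith(prefix)

    return plugin


# Stand-in for a de facto "best practices" default.
DEFAULT_PLUGINS: List[SecurityPlugin] = [reject_empty]

# Each deployment configures its own policy per archive.
ARCHIVE_PLUGINS: Dict[str, List[SecurityPlugin]] = {
    "example-archive": [reject_empty, only_under("public/")],
}


def accept(archive: str, path: str, data: bytes) -> bool:
    """Accept data only if every plugin configured for the archive approves."""
    plugins = ARCHIVE_PLUGINS.get(archive, DEFAULT_PLUGINS)
    return all(plugin(path, data) for plugin in plugins)


print(accept("example-archive", "public/patch-1", b"diff"))   # True
print(accept("example-archive", "private/patch-1", b"diff"))  # False
print(accept("other-archive", "private/patch-1", b"diff"))    # True (default policy)
```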

How does this fit into the architectural conceptions we're 
developing of core arch?

One choice is to add it above the protocol translator:

        +------------------+
        |  high level arch |
        |  algorithms      |
        +------------------+
               ^
               |
         "arch-shaped global
          transactional 
          filesystem" API
               |
               v
        +------------------+
        |  arch txn engine |
        |  including       |
        |  checksum and    |
        |  authentication  |
        |  verification    |
        |  (MD5 + security |
        |   plugins )      |
        +------------------+
                ^
                |
         "least-common-denominator
          global transactional
          filesystem" API
                |
                v
        +------------------+
        | protocol         |
        |  translator      |
        +------------------+
          ^             ^
               ...

Another choice is to add it *to* the protocol translator

        +------------------+
        |  high level arch |
        |  algorithms      |
        +------------------+
               ^
               |
         "arch-shaped global
          transactional 
          filesystem" API
               |
               v
        +------------------+
        |  arch txn engine |
        |  including       |
        |  checksum and    |
        |  verification    |
        |  (MD5)           |
        +------------------+
                ^
                |
         "least-common-denominator
          global transactional
          filesystem" API
                |
                v
        +--------------------+
        | protocol           |
        |  translator        |
        |  (including        |
        |   security plugins)|
        +--------------------+
          ^             ^
               ...


The first diagram most closely resembles the current signing mechanism.

One could argue, somewhat twistedly, that the second diagram is also
perfectly consistent with the current signing mechanism.  That's a win
in the sense that staying consistent with the current mechanism
precludes neither diagram.

I am fairly certain (:-) that the second diagram is the better
architecture because it allows for transport-based security in
addition to data-based security.  For example, a security plugin might
go through elaborate authentication of a server *connection* rather
than of arch data -- the local policy where such a plugin is used
being that data to and from that server is trustworthy.
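A toy sketch of such a transport-based plugin, assuming a shared secret between client and server: the plugin authenticates the server with an HMAC challenge-response and then, per local policy, trusts whatever that server returns without checking individual files. `Server` and `TrustedConnectionPlugin` are hypothetical. A real plugin would also have to protect the channel itself (for instance with TLS), because this sketch does not bind the returned data to the authenticated exchange.

```python
import hashlib
import hmac
import os
from typing import Dict


class Server:
    """Hypothetical archive server that shares a secret key with the client."""

    def __init__(self, key: bytes, files: Dict[str, bytes]) -> None:
        self._key = key
        self._files = files

    def answer_challenge(self, nonce: bytes) -> bytes:
        return hmac.new(self._key, nonce, hashlib.sha256).digest()

    def get_file(self, path: str) -> bytes:
        return self._files[path]


class TrustedConnectionPlugin:
    """Transport-based policy: authenticate the server, then trust its data."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def authenticate(self, server: Server) -> bool:
        nonce = os.urandom(16)  # fresh challenge, so old answers can't be replayed
        expected = hmac.new(self._key, nonce, hashlib.sha256).digest()
        return hmac.compare_digest(server.answer_challenge(nonce), expected)

    def fetch(self, server: Server, path: str) -> bytes:
        if not self.authenticate(server):
            raise PermissionError("server failed authentication")
        # Local policy: data from an authenticated server is trusted as-is,
        # so no per-file signature is checked here.
        return server.get_file(path)


key = b"shared-secret"
plugin = TrustedConnectionPlugin(key)
print(plugin.fetch(Server(key, {"patch-1": b"diff"}), "patch-1"))  # b'diff'
print(plugin.authenticate(Server(b"wrong-key", {})))                # False
```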

The lower-level placement of security plugins leaves arch resting on
top of a non-robust and insecure global transactional filesystem --
fixing non-robustness with checksums and being unconcerned with
security because the protocol translator can be configured to carve
out "secure regions" within the overall insecure global filesystem.

In practical terms, that means that the current linkage between the
signing mechanism and the MD5 code is a layering violation:  code
that should be part of the protocol translator (signing) is depending
on code in a higher layer (checksum code in the arch transaction engine).

As a strawman -- an improved signing system might allow me to configure
a particular archive such that every regular file written to the 
server is accompanied by a detached signature (".sig") file, and that 
these are always verified as files are read back.
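The strawman can be sketched as a hypothetical `SigningTranslator` living in the protocol translator: every `put_file` also writes a detached `<name>.sig`, and every `get_file` verifies it. To stay self-contained, the sketch uses HMAC-SHA256 with a per-archive key as a stand-in for a real public-key detached signature (for example one made with GnuPG); unlike a real signature, anyone able to verify an HMAC could also forge one. Note that it signs the raw bytes and never consults any MD5 checksum, which is the decoupling argued for above.

```python
import hashlib
import hmac
from typing import Dict


class BadSignatureError(Exception):
    """Raised when a file's detached signature does not verify."""


class SigningTranslator:
    """Hypothetical protocol-translator layer for one archive.

    Every file written gets a detached "<name>.sig" companion and every
    read verifies it. HMAC-SHA256 stands in for a public-key signature.
    """

    def __init__(self, raw: Dict[str, bytes], key: bytes) -> None:
        self.raw = raw  # the server's storage, as seen through the transport
        self._key = key

    def _sign(self, name: str, data: bytes) -> bytes:
        # Cover the name too, so a valid file can't be replayed under another name.
        return hmac.new(self._key, name.encode() + b"\0" + data, hashlib.sha256).digest()

    def put_file(self, name: str, data: bytes) -> None:
        self.raw[name] = data
        self.raw[name + ".sig"] = self._sign(name, data)

    def get_file(self, name: str) -> bytes:
        data = self.raw[name]
        sig = self.raw.get(name + ".sig", b"")
        if not hmac.compare_digest(sig, self._sign(name, data)):
            raise BadSignatureError(name)
        return data


server_storage: Dict[str, bytes] = {}
archive = SigningTranslator(server_storage, b"archive-key")
archive.put_file("patch-3.tar.gz", b"contents")
print(archive.get_file("patch-3.tar.gz"))  # b'contents'

server_storage["patch-3.tar.gz"] = b"tampered"  # rewritten on the server
try:
    archive.get_file("patch-3.tar.gz")
except BadSignatureError as err:
    print("rejected:", err)  # rejected: patch-3.tar.gz
```

Because the translator works on raw bytes, the checksum layer above it could change hash functions without touching this code.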

As a tinman -- "dumb archive" archives should not rely on accurate
directory listings from the server.   "readdir" should, in some sense,
be eliminated from the least-common-denominator global transactional
filesystem interface (one could blame HTTP but, in truth, "readdir"
is also unreliable in other systems).

Here is an approach to designing the tinman:

If we begin with a system *without* "readdir", but with transactional
"mkdir", "rename", and "putfile" we can implement a robust "readdir"
which reports only files and directories created by authorized clients
and which provides validating clients with some measure of protection
against files and directories maliciously removed.  (This is 
more or less a matter of "transactionalizing" the management of 
.listing files).
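A rough sketch of that idea, with every name hypothetical: `NoReaddirFs` offers only `put_file`, `get_file`, and an assumed-atomic `rename` (directories are reduced to path prefixes, so `mkdir` is omitted), and `SignedListing` keeps a `.listing` file that is updated by writing a temporary file and renaming it over the old one. The listing carries an HMAC, again standing in for a real signature, so only key holders can produce a listing that verifies, and a validating client can report listed entries that have gone missing. Serializing concurrent writers is left out of this sketch.

```python
import hashlib
import hmac
from typing import Dict, List


class NoReaddirFs:
    """Hypothetical l.c.d. filesystem with put/get/rename but no readdir."""

    def __init__(self) -> None:
        self._files: Dict[str, bytes] = {}

    def put_file(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def get_file(self, path: str) -> bytes:
        return self._files[path]  # KeyError if the file is absent

    def rename(self, src: str, dst: str) -> None:
        self._files[dst] = self._files.pop(src)  # assumed atomic on the server

    def remove(self, path: str) -> None:
        del self._files[path]  # what an attacker with server access could do


class SignedListing:
    """Robust readdir for one directory, kept in an HMAC-tagged .listing file."""

    def __init__(self, fs: NoReaddirFs, directory: str, key: bytes) -> None:
        self.fs = fs
        self.dir = directory
        self._key = key

    def _tag(self, body: bytes) -> bytes:
        return hmac.new(self._key, body, hashlib.sha256).hexdigest().encode("ascii")

    def readdir(self) -> List[str]:
        try:
            blob = self.fs.get_file(self.dir + "/.listing")
        except KeyError:
            return []  # nothing has been added yet
        tag, _, body = blob.partition(b"\n")
        if not hmac.compare_digest(tag, self._tag(body)):
            raise ValueError("unauthorized or corrupt .listing")
        return body.decode("utf-8").split("\n") if body else []

    def add(self, name: str, data: bytes) -> None:
        self.fs.put_file(f"{self.dir}/{name}", data)
        body = "\n".join(sorted(set(self.readdir()) | {name})).encode("utf-8")
        # Write the new listing under a temporary name, then rename it into
        # place, so readers see either the old listing or the new one.
        self.fs.put_file(self.dir + "/.listing.new", self._tag(body) + b"\n" + body)
        self.fs.rename(self.dir + "/.listing.new", self.dir + "/.listing")

    def missing(self) -> List[str]:
        """Listed entries whose files have disappeared from the server."""
        gone = []
        for name in self.readdir():
            try:
                self.fs.get_file(f"{self.dir}/{name}")
            except KeyError:
                gone.append(name)
        return gone


fs = NoReaddirFs()
patches = SignedListing(fs, "archive/patches", b"archive-key")
patches.add("patch-1", b"first")
patches.add("patch-2", b"second")
print(patches.readdir())  # ['patch-1', 'patch-2']

fs.remove("archive/patches/patch-1")  # malicious removal
print(patches.missing())  # ['patch-1']
```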

In a scenario like that, it would be very convenient if the interface
to security plugins were called from the protocol translators in
the diagram above rather than in the transaction engine.

------------

I realize that that's a very long essay but the point is: the idea of
adding more hash functions is based on an architectural misconception.
Instead of adding SHA1 in addition to MD5, it would be far better
to make signing not depend on the checksum data.


-t