[Gnu-arch-users] systems software research is irrelevant

From: Thomas Lord
Subject: [Gnu-arch-users] systems software research is irrelevant
Date: Tue, 23 Aug 2005 15:32:28 -0700

The list of my dead projects and their statuses upon death.


   * UIOS -- a naturalistic "universal I/O API"
   * ULOCK -- a basis for transactional IPC on UIOS
   * UMSG -- a transaction control protocol for UIOS
   * UMBOX -- a persistent queue for UMSG messages
   * UFLDR -- a database storage manager for UIOS
   * REVC -- a revision control system storage manager
   * Awiki -- towards a generic wiki-syntax translator
   * Miscellaneous

* UIOS -- a naturalistic "universal I/O API"

  Experience from unix and plan 9 demonstrates the fruitfulness of
  using a simple hierarchical namespace and bytestream-oriented data
  as a generic IPC infrastructure.  In this light, a unix-like
  filesystem is just a special case: it can be understood as a service
  which (more or less) persistently preserves the messages (file
  contents) it is sent.

  Traditionally, this model has been realized in the context of
  an operating system kernel and/or specialized network protocols:
  unix and Plan 9 filesystems; "special files" on both of those
  systems; the 9P network protocol.   There are significant advantages
  to this approach especially in the areas of simplicity of
  implementation, performance, and the freedom to design a
  semantically optimal API.

  There is also a significant disadvantage to the traditional
  approach: code written against a "native" HFS-IPC API is not
  easily ported to a host which does not provide that API.
  The traditional approach of implementing the HFS-IPC idea
  at the OS level reinforces rather than transcends the 
  zero-sum game about "which OS will be dominant".

  UIOS is based on the observation that a globally distributed 
  unix-like filesystem already exists, albeit with initially awkward
  performance characteristics.   All modern operating systems
  provide hierarchical, byte-stream-oriented filesystems although
  they differ critically in some details.   Network protocols such
  as FTP, WebDAV, SFTP, and NFS make these filesystems available
  over the network.

  UIOS asks the question:  "what if the HFS-IPC API is based
  on a least-common-denominator of the services people popularly
  deploy, rather than on a greatest-common-denominator idea
  of what form those services should take?"

  In other words, UIOS aims to create an API which maps 
  cleanly and simply onto unix filesystems, Windows filesystems,
  FTP, WebDAV, SFTP, NFS, etc.

  As a least-common-denominator API, UIOS is necessarily anemic.  For
  example, random access to file contents cannot be included because
  not all of the network protocols we wish to subsume provide that
  capability.  (At the same time, the design of UIOS -- a handful of
  "system calls" -- takes reasonable liberties, unjustified by the
  specifications for the protocols, especially in assuming that the
  `rename' service has certain transactional properties.)  The anemia
  here is, at least in principle, surmountable by stacking a
  higher-level filesystem on top, using UIOS as a back-end.

  The UIOS question is not "what ideal HFS-IPC API can we construct"
  but rather "What HFS-IPC API can we construct over services that are
  already universally deployed?  Is the resulting API a useful-enough
  `universe' in which to write wildly portable and interoperable
  applications?"   An ideal outcome would allow the UIOS API to be
  made available by a trivial amount of code in multiple environments
  (e.g., Javascript, C, every scripting language) and for that API to
  be useful enough to write applications which treat the API as the
  *sole* mechanism for IPC, including persistent data storage.

  UIOS grew out of GNU Arch, which contains a crude realization of it:
  the "PFS" API, supporting access to unix, FTP, SFTP, WebDAV, and
  (read-only) HTTP filesystem-like services.

  I built a C implementation of UIOS, initially speaking only to
  native unix system calls.  The initial API to UIOS contains 12 basis
  functions (and a smattering of convenience functions built on top of
  those).  Because I was rushed by financial conditions, my UIOS
  implementation for C lacks the unit tests I would like and I haven't
  yet plugged in support for FTP, SFTP, etc.   The documentation is
  half-decent but unpolished.
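
  To make the least-common-denominator idea concrete, here is a
  minimal sketch in C of what such a basis plus one pluggable back-end
  can look like.  The names and the toy in-memory back-end are
  illustrative, not the real UIOS API.  Note what is deliberately
  absent: no seek, no partial read -- only whole-file get and put --
  and a `rename' that refuses to overwrite an existing target (the
  transactional liberty mentioned above):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical least-common-denominator I/O surface, UIOS-style.
 * Deliberately anemic: whole-file get/put only, no random access. */
typedef struct uios_fs uios_fs;
struct uios_fs {
    int (*put)(uios_fs *fs, const char *path, const char *data, size_t len);
    int (*get)(uios_fs *fs, const char *path, char **data, size_t *len);
    int (*rename)(uios_fs *fs, const char *from, const char *to);
};

/* Toy in-memory back-end; real back-ends would speak unix, FTP, ... */
enum { MAX_FILES = 16 };
typedef struct { char path[64]; char *data; size_t len; int used; } mem_file;
typedef struct { uios_fs ops; mem_file files[MAX_FILES]; } mem_fs;

static mem_file *mem_find(mem_fs *m, const char *path)
{
    for (int i = 0; i < MAX_FILES; i++)
        if (m->files[i].used && strcmp(m->files[i].path, path) == 0)
            return &m->files[i];
    return NULL;
}

static int mem_put(uios_fs *fs, const char *path, const char *data, size_t len)
{
    mem_fs *m = (mem_fs *)fs;
    mem_file *f = mem_find(m, path);
    if (!f) {
        for (int i = 0; i < MAX_FILES && !f; i++)
            if (!m->files[i].used) f = &m->files[i];
        if (!f) return -1;
        f->used = 1;
        f->data = NULL;
        strncpy(f->path, path, sizeof f->path - 1);
        f->path[sizeof f->path - 1] = 0;
    }
    free(f->data);
    f->data = malloc(len ? len : 1);
    memcpy(f->data, data, len);
    f->len = len;
    return 0;
}

static int mem_get(uios_fs *fs, const char *path, char **data, size_t *len)
{
    mem_file *f = mem_find((mem_fs *)fs, path);
    if (!f) return -1;
    *data = f->data;       /* borrowed pointer; fine for a sketch */
    *len = f->len;
    return 0;
}

static int mem_rename(uios_fs *fs, const char *from, const char *to)
{
    mem_fs *m = (mem_fs *)fs;
    mem_file *f = mem_find(m, from);
    if (!f || mem_find(m, to))  /* no-overwrite: the property ULOCK needs */
        return -1;
    strncpy(f->path, to, sizeof f->path - 1);
    f->path[sizeof f->path - 1] = 0;
    return 0;
}

uios_fs *mem_fs_new(void)
{
    mem_fs *m = calloc(1, sizeof *m);
    m->ops.put = mem_put;
    m->ops.get = mem_get;
    m->ops.rename = mem_rename;
    return &m->ops;
}
```

  The point of the vtable shape is that a trivial amount of code per
  environment suffices: an FTP back-end fills in the same three slots.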

* ULOCK -- a basis for transactional IPC on UIOS

  The ULOCK API (initially implemented in C) is a pure
  UIOS client -- it relies on no other system services.

  Assuming only that UIOS `rename' behaves reasonably, ULOCK provides
  a flexible system for robust, persistent locks for use by processes
  communicating via a UIOS filesystem.   Software can leverage ULOCKs
  to build sophisticated, distributed, transactional systems.

  Because I was rushed by financial conditions, my ULOCK
  implementation for C lacks the unit tests I would like.  The
  documentation is half-decent but unpolished.
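
  The core trick can be sketched in a few lines of C.  Caveats: the
  real ULOCK speaks UIOS rather than POSIX, and it assumes a `rename'
  that fails on an existing target; plain POSIX rename() silently
  replaces its target, so this sketch substitutes link(), which does
  fail atomically in that case:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch of rename-as-test-and-set locking over a shared filesystem.
 * link() stands in for a no-overwrite rename: it fails atomically
 * with EEXIST when the lock is already held. */

int ulock_acquire(const char *lockpath, const char *owner)
{
    char tmp[512];
    snprintf(tmp, sizeof tmp, "%s.%ld.tmp", lockpath, (long)getpid());
    int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    (void)write(fd, owner, strlen(owner)); /* record who holds the lock */
    close(fd);
    int rc = link(tmp, lockpath);          /* atomic test-and-set */
    unlink(tmp);                           /* unique name no longer needed */
    return rc == 0 ? 0 : -1;
}

int ulock_release(const char *lockpath)
{
    return unlink(lockpath);
}
```

  Because the lock file records its owner, crashed holders can be
  detected and the lock broken by policy -- the "robust, persistent"
  part of the design.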

* UMSG -- a transaction control protocol for UIOS

  UMSG allows the exchange of reliably ordered and transactionally
  delivered "packets" between processes communicating over UIOS.
  It expands on the capabilities of ULOCK to allow an ordered list of 
  message payloads to be delivered.

  Each UMSG connection is a stream of packets, permitting multiple
  readers and writers.

  Because I was rushed by financial conditions, my UMSG implementation
  for C lacks the unit tests I would like.  The documentation is
  half-decent but unpolished.
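
  A sketch of the delivery discipline (illustrative names, not the
  real UMSG format): a packet is written under a scratch name and then
  renamed to its zero-padded sequence number, so a reader sees either
  the whole packet or nothing, and the sequence numbers give a total
  order:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Transactional packet delivery over a shared directory: a packet
 * becomes visible only via rename, so readers never observe a
 * partially written payload. */

int umsg_send(const char *dir, long seq, const char *payload)
{
    char tmp[512], fin[512];
    snprintf(tmp, sizeof tmp, "%s/,inflight-%ld", dir, seq);
    snprintf(fin, sizeof fin, "%s/%08ld", dir, seq);
    FILE *f = fopen(tmp, "w");
    if (!f) return -1;
    fputs(payload, f);
    fclose(f);
    return rename(tmp, fin);   /* commit: the packet appears atomically */
}

/* Fetch packet `seq' into buf; -1 if it has not been delivered yet. */
int umsg_recv(const char *dir, long seq, char *buf, size_t buflen)
{
    char fin[512];
    snprintf(fin, sizeof fin, "%s/%08ld", dir, seq);
    FILE *f = fopen(fin, "r");
    if (!f) return -1;
    size_t n = fread(buf, 1, buflen - 1, f);
    buf[n] = 0;
    fclose(f);
    return 0;
}
```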

* UMBOX -- a persistent queue for UMSG messages

  Consider a communications service, implemented over a UIOS
  filesystem using UMSG, with the additional property that
  each message sent in a stream is preserved -- a mailbox.

  A mailbox-like data structure is obviously useful for representing
  any append-only list-like data, but it is also more subtly useful.
  For example, consider a database in which the state of each "page"
  (loosely speaking) can be inferred by a client if that client is
  given, say, the last 10 messages which modified that page -- a
  mailbox is also useful for maintaining the state of an atomic unit
  of data.

  UMBOX is, as usual, fully transactional.

  And, as usual, because I was rushed by financial conditions, my
  UMBOX implementation for C lacks the unit tests I would like.  The
  documentation is half-decent but unpolished.
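
  The page-state idea can be sketched trivially: if every message that
  modified a "page" is preserved in order, a client reconstructs the
  page by replaying them.  Here the page is just an integer and each
  message a signed delta -- illustrative only, not the UMBOX message
  format:

```c
#include <assert.h>
#include <stdlib.h>

/* Reconstructing page state from a mailbox: replay the preserved
 * messages oldest-first to infer the current state of the page. */
long page_replay(const long *deltas, size_t n)
{
    long state = 0;                 /* the page's initial state */
    for (size_t i = 0; i < n; i++)
        state += deltas[i];         /* apply one preserved message */
    return state;
}
```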

* UFLDR -- a database storage manager for UIOS

  UFLDR is a "database storage manager" meaning that it implements
  persistent (disk-resident) indexes and records.  UFLDR database
  structures are stored on UIOS, using UMBOX.

  UFLDR provides ACID transactions permitting multiple readers
  concurrent with a single writer (the writer may vary over time,
  but there is only one at a time -- ULOCK tech is used to exclude
  concurrent writers).

  UFLDR provides a client/server message-based interface with a
  built-in queue for incoming requests, a server API in support
  of programs which translate requests into index updates, and
  a read-only API for clients performing queries rather than
  updates.

  UFLDR uses write-ahead techniques to achieve transactional
  robustness, and functional (immutable) data structures to achieve
  high throughput via greater concurrency.

  UFLDR read-only clients -- those performing simple queries -- do
  not need write access to the database at all: read-transaction locks
  are shared and passively maintained.  Thus, any client able to
  implement the UIOS system calls `connect', `disconnect', `get',
  and `list' can perform UFLDR queries.

  UFLDR is especially useful for communication hubs such as
  net-news servers or shared revision control archives.  It was
  intended to complement REVC (see below).

  UFLDR is working but less "exercised" than the earlier-listed
  modules.  It suffers from the usual rush-induced lack of unit-tests
  and unpolished documentation.
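
  The functional-data-structure point can be sketched as follows (this
  is illustrative, not UFLDR's actual structures): the writer publishes
  a new version by allocating fresh nodes that share the old ones, and
  since nodes are never mutated, a reader holding an older root keeps a
  consistent snapshot for free -- no read locks to maintain actively:

```c
#include <assert.h>
#include <stdlib.h>

/* An immutable (persistent) list standing in for a database index.
 * Each push creates a new version; old versions remain valid. */
typedef struct node node;
struct node { long value; const node *next; };

const node *version_push(const node *root, long value)
{
    node *n = malloc(sizeof *n);
    if (!n) abort();
    n->value = value;
    n->next = root;        /* old version is shared, never modified */
    return n;
}

size_t version_len(const node *root)
{
    size_t count = 0;
    for (; root; root = root->next)
        count++;
    return count;
}
```

  This is why read-only clients need so little: they pin a root and
  walk it, while the single writer races ahead on new versions.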

* REVC -- a revision control system storage manager

  Historically, revision control systems have had to make
  use of "delta compression" -- the storing of committed
  revisions to a source tree as (roughly speaking) "diffs",
  in order to save space.

  The success of Arch proved that, at least on the client side for
  people using contemporary development machines, delta-compressed
  storage was not terribly important (cf. revision libraries).

  The design of git blew that door wide open: delta-compression is 
  no longer economically justified.

  REVC (so far) provides only *some* of the functionality of a
  complete revision control system (for example, I ran out of time
  before I could port merging technology from Arch 1.x) but it cleanly
  and simply implements high-integrity, easily P2P-able storage for
  revision control data.  It improves on git by being far less
  vulnerable to future cracking of SHA1 and by choosing a storage
  model which maps to the distribution problem more easily.  It
  improves on Arch 1.x by liberalizing the namespace and choosing a
  more parsimonious archive format (e.g., a revc archive would work
  quite well if overlaid with an ordinary FTP site for distributing
  "tar-balls" of source).  REVC is self-hosting and appears to be
  quite scalable to very large trees.  It eliminates the need for a
  "revision library".  It permits a controlled "editing of history".
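
  One way to be less vulnerable to a future cracking of SHA1 is
  hash-agile object naming: prefix every content address with the
  algorithm that produced it, so the archive format survives the
  retirement of any one hash.  A sketch, with FNV-1a standing in for a
  real cryptographic hash (REVC's actual naming scheme may differ):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hash-agile content addressing: the algorithm name travels with the
 * digest, so "sha1:..." objects can coexist with successors. */

static uint64_t fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Object name: "<algo>:<digest>". */
void object_name(const char *data, char *out, size_t outlen)
{
    snprintf(out, outlen, "fnv1a:%016llx",
             (unsigned long long)fnv1a(data, strlen(data)));
}
```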

* Awiki -- towards a generic wiki-syntax translator

  I dissent from the popular opinion that the holy grail of
  wiki-syntax parsers is a single-pass parser.

  Wiki texts are, almost by definition, short documents.  They
  are legible (and ideally *attractive*) when viewed in source form.
  It is *just fine* to spend a few extra cycles parsing them if the
  result is a syntax which "puns" nicely -- being reasonable both in
  source form and in multiple output forms.

  Moreover, as various wiki syntax authors have finally come to
  (re-)realize: it is critical that at least subsets of the syntax be
  highly structured (e.g., so that users can enter database records
  using wiki syntax).

  Moreover, as is being (re-)realized, wiki syntaxes need to be openly
  extensible so that application and domain-specific syntaxes can be
  added (math, chemical equations, database records, etc.).

  Awiki is (currently) a "mature, functional prototype".  I developed
  the idea of "recursive decomposition" as a strategy for parsing
  wiki-text:  parsing examines the broad structure of an entire text
  and breaks it up into pieces (say, "sections").  Each piece is
  recursively broken up using different parsing rules (say,
  "paragraphs").   Recursive decomposition continues until the entire
  text is resolved.  I was able to make the code implementing these
  levels of parsing very regular, which is the critical point:
  my hard-coded parser can be replaced by one which is driven by
  a data-only syntax specification.

  A pleasing and application-appropriate side-effect of this approach,
  *especially* when combined with the idea that source texts should be
  at least legible and ideally attractive, is error recovery:  an
  unparsable subtext can be rendered as its source without
  invalidating the parse of subsequent parts of the text.

  I coded Awiki in highly stylized C for a particular syntax.  I
  hard-coded that syntax.   In the process, I believe I've figured out
  roughly how to, instead -- in the next major revision of Awiki --
  make the recursive parsing process nicely data-driven:  one can
  describe a wiki syntax (or its extensions) through static
  declarations, much as one describes a grammar to YACC.

  Awiki thus needs a complete rewrite but it is the system I've used
  for a couple of years now to generate Arch documentation and all of
  my web pages.  (Substituting ".txt" for ".html" will reveal source
  for most of those pages.)

* Miscellaneous

  - VU: a hook-based layer over the basic unix filesystem calls.

  - Rx: a very good Unicode-capable regexp engine, about ready for 
    another round of overhaul

  - Hackerlab Unicode string foo: An "adaptive representation"
    (choosing UTF-8, UTF-16 or other encoding forms on-the-fly to
    ensure O(1) random access to string contents) string library.

  - Pika Scheme: Finally, a Scheme run-time system based on a
    GC-strategy-neutral API

  - XL: A high-level language based on the composition of 
    concurrently operating finite state machines each specified
    in a referentially transparent language.

  - many stale things: I've omitted various other thwarted projects
    over the years.

"Like tears in the rain.  Time to die."
