[Gnu-arch-users] systems software research is irrelevant
From: Thomas Lord
Subject: [Gnu-arch-users] systems software research is irrelevant
Date: Tue, 23 Aug 2005 15:32:28 -0700
The list of my dead projects and their statuses upon death.
Contents:
* UIOS -- a naturalistic "universal I/O API"
* ULOCK -- a basis for transactional IPC on UIOS
* UMSG -- a transaction control protocol for UIOS
* UMBOX -- a persistent queue for UMSG messages
* UFLDR -- a database storage manager for UIOS
* REVC -- a revision control system storage manager
* Awiki -- towards a generic wiki-syntax translator
* Miscellaneous
* UIOS -- a naturalistic "universal I/O API"
Experience from unix and Plan 9 demonstrates the fruitfulness of
using a simple hierarchical namespace and bytestream-oriented data
as a generic IPC infrastructure. In this light, a unix-like
filesystem is just a special case: it can be understood as a service
which (more or less) persistently preserves the messages (file
contents) it is sent.
This model has traditionally been realized in the context of
an operating system kernel and/or specialized network protocols:
unix and Plan 9 filesystems; "special files" on both of those
systems; the 9P network protocol. There are significant advantages
to this approach especially in the areas of simplicity of
implementation, performance, and the freedom to design a
semantically optimal API.
There is also a significant disadvantage to the traditional
approach: code written against a "native" HFS-IPC API is not
easily ported to a host which does not provide that API.
The traditional approach of implementing the HFS-IPC idea
at the OS level reinforces rather than transcends the
zero-sum game about "which OS will be dominant".
UIOS is based on the observation that a globally distributed
unix-like filesystem already exists, albeit with initially awkward
performance characteristics. All modern operating systems
provide hierarchical, byte-stream-oriented filesystems although
they differ critically in some details. Network protocols
(FTP, WebDAV, SFTP, NFS, and others) make these filesystems
available over the network.
UIOS asks the question: "what if the HFS-IPC API is based
on a least-common-denominator of the services people popularly
deploy, rather than on a greatest-common-denominator idea
of what form those services should take?"
In other words, UIOS aims to create an API which maps
cleanly and simply onto unix filesystems, Windows filesystems,
FTP, WebDAV, SFTP, NFS, etc.
As a least-common-denominator API, UIOS is necessarily anemic. For
example, random access to file contents cannot be included because
not all of the network protocols we wish to subsume provide that
capability. (At the same time, the design of UIOS -- a handful of
"system calls" -- takes reasonable liberties, unjustified by the
specifications for the protocols, especially in assuming that the
`rename' service has certain transactional properties.) The anemia
here is, at least in principle, surmountable by stacking a
higher-level filesystem on top, using UIOS as a back-end.
The UIOS question is not "what ideal HFS-IPC API can we construct"
but rather "What HFS-IPC API can we construct over services that are
already universally deployed? Is the resulting API a useful-enough
`universe' in which to write wildly portable and interoperable
applications?" An ideal outcome would allow the UIOS API to be
made available by a trivial amount of code in multiple environments
(e.g., Javascript, C, every scripting language) and for that API to
be useful enough to write applications which treat the API as the
*sole* mechanism for IPC, including persistent data storage.
UIOS grew out of GNU Arch, which contains a crude realization of it:
the "PFS" API, supporting access to unix, FTP, SFTP, WebDAV, and
(read-only) HTTP filesystem-like services.
I built a C implementation of UIOS, initially speaking only to
native unix system calls. The initial API to UIOS contains 12 basis
functions (and a smattering of convenience functions built on top of
those). Because I was rushed by financial conditions, my UIOS
implementation for C lacks the unit tests I would like and I haven't
yet plugged in support for FTP, SFTP, etc. The documentation is
half-decent but unpolished.
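The 12 basis functions are not reproduced in this note, but the flavor of the API can be sketched. The following is a hypothetical illustration, not the real UIOS interface: it implements just two of the calls named here (`get' and `rename') against native unix, as whole-file operations with no random access -- the FTP-style lower bound described above. All signatures are invented.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of a least-common-denominator UIOS-style
 * vtable.  Only `get' and `rename' are shown; signatures are
 * invented for illustration. */
typedef struct uios_ops {
    char *(*get)(const char *path, size_t *len_out);   /* whole-file read */
    int   (*rename)(const char *from, const char *to); /* atomic rename */
} uios_ops_t;

/* Native-unix backend: read a whole file into a fresh buffer.
 * No seek/offset entry points are offered -- deliberately. */
static char *local_get(const char *path, size_t *len_out)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long n = ftell(f);
    rewind(f);
    char *buf = malloc((size_t)n + 1);
    if (fread(buf, 1, (size_t)n, f) != (size_t)n) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    buf[n] = '\0';
    if (len_out) *len_out = (size_t)n;
    return buf;
}

static int local_rename(const char *from, const char *to)
{
    return rename(from, to);
}

static const uios_ops_t local_ops = { local_get, local_rename };
```

Backends for FTP, SFTP, WebDAV, and the rest would each supply their own such vtable behind the same interface.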
* ULOCK -- a basis for transactional IPC on UIOS
The ULOCK API (initially implemented in C) is a pure
UIOS client -- it relies on no other system services.
Assuming only that UIOS `rename' behaves reasonably, ULOCK provides
a flexible system for robust, persistent locks for use by processes
communicating via a UIOS filesystem. Software can leverage ULOCKs
to build sophisticated, distributed, transactional systems.
Because I was rushed by financial conditions, my ULOCK
implementation for C lacks the unit tests I would like. The
documentation is half-decent but unpolished.
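The real ULOCK API is not shown in this note, but the rename assumption it rests on supports a classic trick worth sketching: represent a free lock as a token file and acquire it by atomically renaming the token to a private name. rename removes its source atomically, so if two clients race, exactly one rename succeeds and the loser simply fails. All file-naming conventions below are invented.

```c
#include <stdio.h>

/* Hypothetical token-passing lock built from nothing but atomic
 * rename (NOT the real ULOCK API).  A free lock is "<lock>.free";
 * a held lock is "<lock>.held-by-<holder>". */

static int ulock_init(const char *lock)
{
    char path[512];
    snprintf(path, sizeof path, "%s.free", lock);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fclose(f);
    return 0;
}

/* Acquire: rename the free token to a holder-private name.
 * Exactly one racing client can win; losers get -1. */
static int ulock_acquire(const char *lock, const char *holder)
{
    char from[512], to[512];
    snprintf(from, sizeof from, "%s.free", lock);
    snprintf(to, sizeof to, "%s.held-by-%s", lock, holder);
    return rename(from, to);
}

/* Release: rename the token back to the free name. */
static int ulock_release(const char *lock, const char *holder)
{
    char from[512], to[512];
    snprintf(from, sizeof from, "%s.held-by-%s", lock, holder);
    snprintf(to, sizeof to, "%s.free", lock);
    return rename(from, to);
}
```

Because the lock's state is ordinary persistent files, it survives crashes and is inspectable by any UIOS client -- which is what makes it a usable basis for distributed transactions.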
* UMSG -- a transaction control protocol for UIOS
UMSG allows the exchange of reliably ordered and transactionally
delivered "packets" between processes communicating over UIOS.
It expands on the capabilities of ULOCK to allow an ordered list of
message payloads to be delivered.
Each UMSG connection is a stream of packets, permitting multiple
readers and writers.
Because I was rushed by financial conditions, my UMSG implementation
for C lacks the unit tests I would like. The documentation is
half-decent but unpolished.
* UMBOX -- a persistent queue for UMSG messages
Consider a communications service, implemented over a UIOS
filesystem using UMSG, with the additional property that
each message sent in a stream is preserved -- a mailbox.
A mailbox-like data structure is obviously useful for representing
any append-only list-like data, but it is also more subtly useful.
For example, consider a database in which the state of each "page"
(loosely speaking) can be inferred by a client if that client is
given, say, the last 10 messages which modified that page -- a
mailbox is also useful for maintaining the state of an atomic unit
of data.
UMBOX is, as usual, fully transactional.
And, as usual, because I was rushed by financial conditions, my
UMBOX implementation for C lacks the unit tests I would like. The
documentation is half-decent but unpolished.
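A minimal sketch of the mailbox idea, assuming a single coordinated writer (the ULOCK layer that would arbitrate among multiple writers is not shown) and invented file-naming conventions: each message is a numbered file, and a writer publishes by writing a temp file and then atomically renaming it into the next slot, so a reader never observes a half-written message.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical UMBOX-like persistent mailbox: messages are files
 * named msg-000000, msg-000001, ... in a directory. */

/* Count messages by probing sequence numbers in order. */
static int umbox_count(const char *dir)
{
    int n = 0;
    char path[512];
    for (;;) {
        snprintf(path, sizeof path, "%s/msg-%06d", dir, n);
        FILE *f = fopen(path, "rb");
        if (!f) break;
        fclose(f);
        n++;
    }
    return n;
}

/* Append: write aside, then atomically rename into the next slot. */
static int umbox_append(const char *dir, const char *payload)
{
    char tmp[512], final[512];
    snprintf(tmp, sizeof tmp, "%s/tmp-incoming", dir);
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    fputs(payload, f);
    fclose(f);
    snprintf(final, sizeof final, "%s/msg-%06d", dir, umbox_count(dir));
    return rename(tmp, final); /* the publish is atomic */
}

/* Read message `seq'; returns a static buffer for brevity. */
static char *umbox_get(const char *dir, int seq)
{
    char path[512];
    snprintf(path, sizeof path, "%s/msg-%06d", dir, seq);
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    static char buf[4096];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return buf;
}
```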
* UFLDR -- a database storage manager for UIOS
UFLDR is a "database storage manager" meaning that it implements
persistent (disk-resident) indexes and records. UFLDR database
structures are stored on UIOS, using UMBOX.
UFLDR provides ACID transactions permitting multiple readers
concurrent with a single writer (the writer may vary over
time, but only one is active at a time -- ULOCK is used to exclude
concurrent writers).
UFLDR provides a client/server message-based interface with a
built-in queue for incoming requests, a server API in support
of programs which translate requests into index updates, and
a read-only API for clients performing queries rather than
updates.
UFLDR uses write-ahead techniques to achieve transactional
robustness and functional (immutable) data structures to achieve
high throughput via greater concurrency.
UFLDR read-only clients -- those performing simple queries -- do
not need write access to the database at all: read-transaction locks
are shared and passively maintained. Thus, any client able to
implement the UIOS system calls `connect', `disconnect', `get',
and `list' can perform UFLDR queries.
UFLDR is especially useful for communication hubs such as
net-news servers or shared revision control archives. It was
intended to complement REVC (see below).
UFLDR is working but less "exercised" than the earlier-listed
modules. It suffers from the usual rush-induced lack of unit-tests
and unpolished documentation.
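UFLDR's on-disk format is not described here; the following sketch illustrates only the general immutable-structure technique named above, with all names invented. Each update writes a brand-new version file and commits by atomically renaming a tiny root pointer into place, so a reader always loads a consistent snapshot -- and needs nothing beyond `get'-style access to do so.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical immutable-page store: pages are never overwritten;
 * the commit point is a single atomic rename of the root pointer. */

static int db_write_version(const char *dir, int version, const char *data)
{
    char page[512], tmp[512], root[512];
    FILE *f;

    /* 1. Write the new page under a never-reused name. */
    snprintf(page, sizeof page, "%s/page-v%d", dir, version);
    f = fopen(page, "wb");
    if (!f) return -1;
    fputs(data, f);
    fclose(f);

    /* 2. Write the new root pointer aside, then rename it into
     *    place: readers see either the old root or the new one. */
    snprintf(tmp, sizeof tmp, "%s/root.new", dir);
    f = fopen(tmp, "wb");
    if (!f) return -1;
    fprintf(f, "page-v%d", version);
    fclose(f);
    snprintf(root, sizeof root, "%s/root", dir);
    return rename(tmp, root);
}

/* Read-only path: follow the root pointer to the current page.
 * Returns a static buffer for brevity. */
static char *db_read(const char *dir)
{
    static char buf[4096];
    char root[512], page[1024];
    FILE *f;

    snprintf(root, sizeof root, "%s/root", dir);
    f = fopen(root, "rb");
    if (!f) return NULL;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    snprintf(page, sizeof page, "%s/%s", dir, buf);
    f = fopen(page, "rb");
    if (!f) return NULL;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return buf;
}
```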
* REVC -- a revision control system storage manager
Historically, revision control systems have had to make
use of "delta compression" -- storing committed revisions
to a source tree as (roughly speaking) "diffs" -- in order
to save space.
The success of Arch proved that, at least on the client side for
people using contemporary development machines, delta-compressed
storage was not terribly important (cf. revision libraries).
The design of git blew that door wide open: delta-compression is
no longer economically justified.
REVC (so far) provides only *some* of the functionality of a
complete revision control system (for example, I ran out of time
before I could port merging technology from Arch 1.x) but it cleanly
and simply implements high-integrity, easily P2P-able storage for
revision control data. It improves on git by being far less
vulnerable to future cracking of SHA1 and by choosing a storage
model which maps to the distribution problem more easily. It
improves on Arch 1.x by liberalizing the namespace and choosing a
more parsimonious archive format (e.g., a revc archive would work
quite well if overlaid with an ordinary FTP site for distributing
"tar-balls" of source). REVC is self-hosting and appears to be
quite scalable to very large trees. It eliminates the need for a
"revision library". It permits a controlled "editing of history".
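REVC's actual archive format is not shown in this note, but the storage idea it shares with git -- content addressing -- can be sketched: a blob is stored under a name derived from a hash of its contents, so identical blobs coincide automatically and any corruption changes the name. The FNV-1a hash below is only a stand-in for a real cryptographic hash; everything else is invented for illustration.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* 64-bit FNV-1a: a stand-in hash, NOT suitable for integrity
 * guarantees -- a real store would use a cryptographic hash. */
static uint64_t fnv1a(const char *data, size_t n)
{
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= (unsigned char)data[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Store a blob under the hash of its contents; the name is
 * returned in `name_out'.  Identical content -> identical name. */
static int blob_store(const char *dir, const char *data,
                      char *name_out, size_t cap)
{
    snprintf(name_out, cap, "%016llx",
             (unsigned long long)fnv1a(data, strlen(data)));
    char path[512];
    snprintf(path, sizeof path, "%s/%s", dir, name_out);
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    fputs(data, f);
    fclose(f);
    return 0;
}
```

A store of this shape is what makes the archive naturally P2P-able: any mirror of the files is a usable archive, and names can be verified against contents by any reader.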
* Awiki -- towards a generic wiki-syntax translator
I dissent from the popular opinion that the holy grail of
wiki-syntax parsers is a single-pass parser.
Wiki texts are, almost by definition, short documents. They
are legible (and ideally *attractive*) when viewed in source form.
It is *just fine* to spend a few extra cycles parsing them if the
result is a syntax which "puns" nicely -- being reasonable both in
source form and in multiple output forms.
Moreover, as various wiki syntax authors have finally come to
(re-)realize: it is critical that at least subsets of the syntax be
highly structured (e.g., so that users can enter database records
using wiki syntax).
Moreover, as is being (re-)realized, wiki syntaxes need to be openly
extensible so that application and domain-specific syntaxes can be
added (math, chemical equations, database records, etc.).
Awiki is (currently) a "mature, functional prototype". I developed
the idea of "recursive decomposition" as a strategy for parsing
wiki-text: parsing examines the broad structure of an entire text
and breaks it up into pieces (say, "sections"). Each piece is
recursively broken up using different parsing rules (say,
"paragraphs"). Recursive decomposition continues until the entire
text is resolved. I was able to make the code implementing these
levels of parsing very regular, which is the critical point:
my hard-coded parser can be replaced by one which is driven by
a data-only syntax specification.
A pleasing and application-appropriate side-effect of this approach,
*especially* when combined with the idea that source texts should be
at least legible and ideally attractive, is error recovery: an
unparsable subtext can be rendered as its source without
invalidating the parse of subsequent parts of the text.
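The strategy described above can be sketched as follows. The two-level rule table (split at section markers, then at blank lines) and the leaf wrapping are invented stand-ins for Awiki's real rules; the point is the shape: each level only splits and recurses, and whatever reaches a leaf passes through as-is -- the error-recovery property.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical "recursive decomposition" parser.  Level 0 splits at
 * section markers ("\n* "), level 1 at blank lines (paragraphs);
 * leaves are emitted wrapped in <p>...</p>.  A real table would be
 * richer and, in the planned rewrite, data-driven. */

static const char *delims[] = { "\n* ", "\n\n" };

static void decompose(char *text, int level, char *out, size_t cap)
{
    if (level >= 2) {
        /* Leaf: emit the piece verbatim inside its wrapper.  An
         * unparsable subtext passes through as source, without
         * invalidating the rest of the parse. */
        size_t used = strlen(out);
        snprintf(out + used, cap - used, "<p>%s</p>", text);
        return;
    }
    char *piece = text;
    for (;;) {
        char *next = strstr(piece, delims[level]);
        if (next) *next = '\0';             /* cut off this piece */
        decompose(piece, level + 1, out, cap);
        if (!next) break;
        piece = next + strlen(delims[level]); /* continue past delim */
    }
}
```

Note that the per-level code is identical except for the delimiter -- which is exactly why a data-only syntax specification can replace the hard-coded parser.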
I coded Awiki in highly stylized C for a particular syntax. I
hard-coded that syntax. In the process, I believe I've figured out
roughly how to, instead -- in the next major revision of Awiki --
make the recursive parsing process nicely data-driven: one can
describe a wiki syntax (or its extensions) through static
declarations, much as one does with YACC.
Awiki thus needs a complete rewrite but it is the system I've used
for a couple of years now to generate Arch documentation and all of
my web pages. (Substituting ".txt" for ".html" will reveal source
for most of those pages.)
* Miscellaneous
- VU: a hook-based layer over the basic unix filesystem calls.
- Rx: a very good Unicode-capable regexp engine, about ready for
another round of overhaul
- Hackerlab Unicode string foo: An "adaptive representation"
(choosing UTF-8, UTF-16 or other encoding forms on-the-fly to
ensure O(1) random access to string contents) string library.
- Pika Scheme: Finally, a Scheme run-time-system based on a
GC-strategy-neutral API
- XL: A high-level language based on the composition of
concurrently operating finite state machines each specified
in a referentially transparent language.
- many stale things
I've omitted various other thwarted projects over the years.
"Like tears in the rain. Time to die."
-t