guile-devel

CapC: making C programs safe


From: Mark Seaborn
Subject: CapC: making C programs safe
Date: Mon, 31 Dec 2001 14:48:04 GMT

Hello Guile developers,

Back in September I e-mailed this list about a scheme I had been
working on for translating C programs into a memory-safe language.
Since then I've been implementing this, writing a C compiler which I'm
calling `CapC' (as it implements C using capabilities).  It's now at a
point where it will run a demonstration program.  I've put a copy of
it at:

  <http://www.srcf.ucam.org/~mrs35/comp/safe-c/>

The relevance of this to Guile is that it lets C programs be subjected
to precise garbage collection.  So a C program using Guile could be
recompiled with this scheme to give a potential performance
improvement, as well as improving reliability.  Also, Guile itself
could be recompiled with CapC, after stripping out its garbage
collector, to use the garbage collector provided by whatever backend
CapC uses.  This could be applied to other interpreters, like Emacs
Lisp.

Here's the longer introduction from the Web site:

-----------------

CapC is a C compiler which aims to convert any C program into one that
is memory-safe, without human intervention.  The behaviour of a C
program will only change in the cases where it is buggy or malicious,
in which case it will throw an error at run-time.

How is this done? It is not enough to check that every memory access
is for a location in an allocated block (as various debuggers do),
since it is easy to overflow an array bound and get a pointer to the
wrong memory block.
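
To make the problem concrete, here is a minimal sketch in Python of
the weaker check described above.  The flat integer address space,
the back-to-back block layout, and the names `blocks`, `malloc` and
`in_some_block` are assumptions made up for this illustration; they
are not taken from CapC or from any particular debugger.

```python
# Hypothetical flat-memory model: addresses are plain integers.
blocks = {}        # base address -> size, one entry per live allocation
next_free = 1000   # arbitrary starting address


def malloc(size):
    """Allocate `size` bytes; blocks are laid out back to back (an assumption)."""
    global next_free
    base = next_free
    blocks[base] = size
    next_free += size
    return base


def in_some_block(addr):
    """The naive check: is addr inside *any* allocated block?"""
    return any(base <= addr < base + size for base, size in blocks.items())


buf = malloc(8)      # an 8-byte buffer
secret = malloc(8)   # an unrelated block that happens to follow it

overflowed = buf + 10                                 # two bytes past the end of buf
naive_ok = in_some_block(overflowed)                  # the naive check accepts this
lands_in_secret = secret <= overflowed < secret + 8   # ...although it hits `secret`
```

The overflowing access passes the naive check because it lands in
allocated memory; the check has no way to know that the pointer was
derived from `buf` rather than from `secret`.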

CapC assumes that a block of memory should only be accessed through
pointer values that are dependent on the block's address (a number
which is arbitrarily chosen by `malloc').  This can be approximated by
storing, with each word-sized value, the set of blocks whose address
the value is dependent on.  A numeric constant would evaluate to a
value associated with an empty set, while a call to `malloc' would
return a value associated with a set containing one block.  These sets
are merged on arithmetic operations, and checked on pointer accesses.
Effectively, an abstract `word' data type is provided, through which
blocks containing further words can be indirectly accessed.
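
The scheme in the paragraph above can be sketched as follows.  This
is only an illustrative model in Python, not CapC's actual OCaml
implementation: the names `Word`, `Block`, `load`, `store` and
`MemorySafetyError`, and the choice to count addresses in words, are
all assumptions made for the example.

```python
class MemorySafetyError(Exception):
    """Raised when a pointer is used on a block it does not depend on."""


class Word:
    """A word-sized value plus the set of blocks its value depends on."""

    def __init__(self, value, deps=frozenset()):
        self.value = value
        self.deps = frozenset(deps)

    def __add__(self, other):
        other = other if isinstance(other, Word) else Word(other)
        # Arithmetic merges the dependency sets of its operands.
        return Word(self.value + other.value, self.deps | other.deps)


class Block:
    """A malloc'd block holding further Words (addresses count words here)."""

    _next_base = 0x1000  # arbitrary, standing in for malloc's choice of address

    def __init__(self, nwords):
        self.base = Block._next_base
        Block._next_base += nwords + 64  # exact layout does not matter
        self.slots = [Word(0) for _ in range(nwords)]


def malloc(nwords):
    block = Block(nwords)
    return Word(block.base, {block})  # set containing exactly one block


def _resolve(ptr):
    # An access is allowed only inside a block the pointer depends on.
    for block in ptr.deps:
        offset = ptr.value - block.base
        if 0 <= offset < len(block.slots):
            return block, offset
    raise MemorySafetyError(hex(ptr.value))


def load(ptr):
    block, offset = _resolve(ptr)
    return block.slots[offset]


def store(ptr, word):
    block, offset = _resolve(ptr)
    block.slots[offset] = word  # the stored word keeps its own set


p = malloc(4)
q = malloc(4)
store(p + 1, Word(42))          # in bounds and reached through p: allowed
in_bounds = load(p + 1).value
# load(Word(q.value)) raises MemorySafetyError: a bare constant has an
# empty set, even though its numeric value equals q's address.
```

In this model a pointer stored into a block keeps its set, so
data structures built out of blocks remain reachable only through
values that were legitimately derived from them.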

I am optimistic that the resulting safe programs can be made fast
enough for CapC to be used not just as a debugging aid, but as a way
to compile programs normally.  Static analysis should make this
possible.

What are the implications of this?

 * It can be used as a debugging aid, to catch bugs during testing.

 * It can be used to improve security, preventing buffer overrun
   exploits.

 * It lets precise garbage collection be applied to C programs.

 * It lets C programs be made portably persistent, allowing the state
   of a program to be saved to disc periodically so that it could be
   restored after a system crash or migrated to a different machine --
   and without relying on operating-system specific features for
   implementing persistence.

 * It makes it easier for languages to interoperate:

    * A C library and a program in a high-level language can run under
      the same run-time system (once the C code is recompiled).  The
      high-level language's run-time system no longer has to be
      designed to work with traditionally-compiled C code.
    * CapC could be used to process C header files to provide
      high-level programs with direct access to C structs and
      functions.
    * An ageing language implementation, such as Emacs Lisp, could be
      rejuvenated by removing its garbage collector and compiling it
      with CapC, which can provide a more efficient garbage collector.

 * CapC turns C's pointers into references (which memory-safe
   languages like Scheme, Java and ML have), also known as
   capabilities.  The same approach can be used to turn filenames into
   capabilities, which can eliminate the Confused Deputy Problem.
   That problem is caused by Unix's setuid feature and, more
   generally, by principal/ACL-based security systems (such as Unix).

 * The CapC compiler can serve as a basis for experimenting with
   extensions to C: it is hopefully easier to understand than a large
   compiler like gcc because it is small and written in a high-level
   language (OCaml).
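
As an illustration of the garbage-collection point in the list above:
if every word records which blocks it refers to, a collector can
trace exactly those references and never has to guess whether an
integer is a pointer.  The sketch below is a minimal Python model
under that assumption; `Word`, `Block` and `reachable` are
hypothetical names, and this is not a description of how any CapC
backend collects garbage.

```python
class Word:
    """A value plus the set of blocks it refers to (empty for plain integers)."""

    def __init__(self, value, deps=()):
        self.value = value
        self.deps = frozenset(deps)


class Block:
    """A heap block containing further Words."""

    def __init__(self, nwords):
        self.slots = [Word(0) for _ in range(nwords)]


def reachable(roots):
    """Mark phase: follow only recorded references, starting from root words."""
    seen = set()
    stack = [block for word in roots for block in word.deps]
    while stack:
        block = stack.pop()
        if block in seen:
            continue
        seen.add(block)
        for word in block.slots:
            stack.extend(word.deps)
    return seen


a, b, c = Block(2), Block(1), Block(1)
a.slots[0] = Word(0x2000, {b})   # a genuine pointer into b
a.slots[1] = Word(0x3000)        # an integer that merely looks like an address
live = reachable([Word(0x1000, {a})])
```

Here `a` and `b` are live, while `c` is garbage regardless of what
integer values happen to be stored in `a`.  A conservative collector
working on raw machine words could not make that distinction.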

CapC is based on a formal semantics of C written by Nikolaos
Papaspyrou.

Current progress:  CapC is an interpreter at present, and it can run a
test program, slowly.  Some language features (e.g. gotos, declaration
initializers) have not been implemented yet.  It is slow because the
Word type hasn't been optimised yet and run-time variable lookups are
done via a binary tree.  I'm going to add a compiler backend.

-- 
         Mark Seaborn
   - address@hidden - http://www.srcf.ucam.org/~mrs35/ -

                A bad tool blames its workman.


