

From: Thomas Lord
Subject: [Gnu-arch-users] user space page-oriented, persistent transactional memory
Date: Wed, 11 Jan 2006 11:59:53 -0800

  More on building a user-space file system....

  `vudev', in the last message, simulates a raw disk which abstracts
  away most details of disk geometry.

  Real disks are increasingly likely to have very powerful
  controllers, a large chunk of fairly fast non-volatile ram, and of
  course a slow but huge capacity raw disk.  What should the
  controller be doing with its abundant compute capacity and
  non-volatile ram?

  I think that controllers ought to be doing more
  to help implement fast, ACID transactions.


* `vumnd' - page-oriented, ACID transactional memory

  The `vumnd' data structure is an array of fixed-size
  pages of bytes.   A client can transiently map 
  page-aligned, page-size-increment regions of data.

  (`vumnd' can be implemented in a bit under 2KLOC; having a
  good bitset library helps.  So, total size so far is about 2.5KLOC
  (vumnd + vudev).)

  vumnd divides the address space of pages into two parts:
  _ctrl pages_ and _heap pages_:


** ctrl pages

  For some constant, K, pages 0..K-1 are "ctrl pages".

  In a single write transaction a vumnd client can arbitrarily modify
  the ctrl pages.  All of these changes atomically become visible
  to other clients only at the successful completion of the
  transaction.
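
  As a rough illustration of the ctrl-page rule above, here is a
  minimal single-process sketch.  It is not the vumnd code: every
  `toy_' name, the tiny page size, and the shadow-copy strategy are
  assumptions made only for this example.  A plain memcpy stands in
  for whatever mechanism a storage host would use to publish the
  pages atomically.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not the vumnd implementation. */
#define TOY_PAGE_SIZE 16   /* assumed page size, tiny for illustration */
#define TOY_N_CTRL    4    /* plays the role of the constant K */

struct toy_ctrl
{
  unsigned char committed[TOY_N_CTRL][TOY_PAGE_SIZE]; /* what other clients see */
  unsigned char shadow[TOY_N_CTRL][TOY_PAGE_SIZE];    /* the writer's private copy */
  int in_txn;                                         /* nonzero while a write transaction is open */
};

/* Begin a write transaction: the shadow starts as a copy of the committed pages. */
void
toy_ctrl_begin (struct toy_ctrl * c)
{
  assert (!c->in_txn);
  memcpy (c->shadow, c->committed, sizeof c->committed);
  c->in_txn = 1;
}

/* Pre-transaction view: always the committed pages. */
const unsigned char *
toy_ctrl_pre (const struct toy_ctrl * c, int page)
{
  assert (page >= 0 && page < TOY_N_CTRL);
  return c->committed[page];
}

/* Post-transaction view: writable, invisible to others until commit. */
unsigned char *
toy_ctrl_post (struct toy_ctrl * c, int page)
{
  assert (c->in_txn && page >= 0 && page < TOY_N_CTRL);
  return c->shadow[page];
}

/* Commit: publish every modified ctrl page together.  In this
   single-threaded model one memcpy stands in for the atomic step. */
void
toy_ctrl_commit (struct toy_ctrl * c)
{
  assert (c->in_txn);
  memcpy (c->committed, c->shadow, sizeof c->committed);
  c->in_txn = 0;
}

/* Abort: forget the shadow; the committed pages are untouched. */
void
toy_ctrl_abort (struct toy_ctrl * c)
{
  assert (c->in_txn);
  c->in_txn = 0;
}
```

  A writer would call toy_ctrl_begin, scribble on pages obtained from
  toy_ctrl_post, then call toy_ctrl_commit; until that commit,
  toy_ctrl_pre keeps returning the old contents.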


** Heap Pages

  All pages K..MAX are "heap pages".

  At all times, every heap page is either allocated or free.

  Allocated pages can be mapped for read-only purposes.  It is
  an error to try to map a non-allocated page.

  New pages are allocated by specifying their contents.  Thus,
  heap pages are write-once (until unallocated) and are 
  written at allocation time.

  In a single write transaction a vumnd client can allocate
  arbitrarily many new pages.  The newly allocated pages will
  atomically become visible to other clients only at the
  successful completion of the transaction.
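
  The heap-page rules can be sketched the same way.  Again this is a
  toy under stated assumptions, not the vumnd allocator: the `toy_'
  names are hypothetical, a byte-per-page map stands in for a real
  bitset, pages are indexed from 0 rather than from K, and freeing is
  left out.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy model, not the vumnd implementation. */
#define TOY_PAGE_SIZE 16   /* assumed page size, tiny for illustration */
#define TOY_N_HEAP    8    /* assumed number of heap pages */

struct toy_heap
{
  unsigned char data[TOY_N_HEAP][TOY_PAGE_SIZE];
  unsigned char allocated[TOY_N_HEAP]; /* allocated and visible to every client */
  unsigned char pending[TOY_N_HEAP];   /* allocated by the open transaction only */
};

/* Allocate one page by supplying its contents (write-once, at
   allocation time).  Returns the page index, or -1 if the heap is full. */
int
toy_heap_alloc (struct toy_heap * h, const unsigned char * contents)
{
  int i;

  for (i = 0; i < TOY_N_HEAP; ++i)
    if (!h->allocated[i] && !h->pending[i])
      {
        memcpy (h->data[i], contents, TOY_PAGE_SIZE);
        h->pending[i] = 1;
        return i;
      }
  return -1;
}

/* Map a page read-only, as other clients see the heap.  Mapping a
   page that is out of range or not visibly allocated is an error,
   reported here as NULL. */
const unsigned char *
toy_heap_map (const struct toy_heap * h, int page)
{
  if (page < 0 || page >= TOY_N_HEAP || !h->allocated[page])
    return NULL;
  return h->data[page];
}

/* Commit: every page allocated in the transaction becomes visible together. */
void
toy_heap_commit (struct toy_heap * h)
{
  int i;

  for (i = 0; i < TOY_N_HEAP; ++i)
    if (h->pending[i])
      {
        h->allocated[i] = 1;
        h->pending[i] = 0;
      }
}
```

  Because a page's bytes are written before it is ever visible, and
  never rewritten while allocated, readers need no coordination with
  the writer beyond the commit itself.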

** Rationale

  We schematically conceive of the economic sweet spot for tertiary
  storage devices to be:

  


                storage host       <--->  .... to host system
                (on controller
                 g.p. system)
               ^              ^
               |              |
            fast             low-level
            non-volatile <-> controller
            RAM               |
                              |
                              v
                           raw storage

  vumnd "ctrl pages" are regions of the non-volatile ram made
  accessible to host system control.   The storage host gets
  to make sure that ctrl pages behave transactionally.

  vumnd "heap pages" are regions of the raw storage.   We
  simply give up on hard questions like concurrent writes
  or writes concurrent with reads.  The write-once-at-alloc-time
  rule is as much as we actually need for a file-system and is
  certainly a lowest common denominator of how real disks
  will be viewable.  The allocator can be implemented
  storage-host side but, if not, is easy to implement
  host system side.   (If storage-host side, device/host
  i/o bandwidth is conserved.)


* API


    * int vumnd_create (const t_uchar ** const err,
                         const t_uchar * const uri,
                         t_vudev_page_addr n_ctrl_pages);
    * t_vumnd_connection vumnd_connect (const t_uchar ** const err,
                                         const t_uchar * const uri);
    * t_vumnd_connection vumnd_dup (const t_uchar ** const err,
                                     t_vumnd_connection cxn);
    * int vumnd_disconnect (const t_uchar ** const err,
                             t_vumnd_connection cxn);

        Create, connect to, duplicate a connection to, or
        disconnect from a vumnd-formatted vudev virtual disk.



    * t_vudev_page_addr vumnd_n_ctrl_pages (const t_uchar ** const err,
                                             t_vumnd_connection cxn);

        The number of ctrl (transactional) pages (addressed 0..N-1).


    * int vumnd_write_lock (const t_uchar ** const err,
                             t_vumnd_connection const cxn);
    * int vumnd_have_write_lock (const t_uchar ** const err,
                                  t_vumnd_connection const cxn);
    * int vumnd_write_unlock (const t_uchar ** const err,
                               t_vumnd_connection const cxn);
    * int vumnd_read_lock (const t_uchar ** const err,
                            t_vumnd_connection const cxn);
    * int vumnd_have_read_lock (const t_uchar ** const err,
                                 t_vumnd_connection const cxn);
    * int vumnd_read_unlock (const t_uchar ** const err,
                              t_vumnd_connection const cxn);

        Acquire, test for, or release a write or read lock.


    * t_vumnd_chunk vumnd_pre_ctrl (const t_uchar ** const err,
                                     t_vumnd_connection const cxn,
                                     t_vudev_page_addr page,
                                     t_vudev_page_addr n_pages);
    * t_vumnd_chunk vumnd_pre_heap (const t_uchar ** const err,
                                     t_vumnd_connection const cxn,
                                     t_vudev_page_addr page,
                                     t_vudev_page_addr n_pages);

        Return a chunk of memory for reading the pre-transaction
        state of the indicated pages.   Heap pages must have been
        allocated at the start of the transaction.   


    * t_vumnd_chunk vumnd_post_ctrl (const t_uchar ** const err,
                                      t_vumnd_connection const cxn,
                                      t_vudev_page_addr page,
                                      t_vudev_page_addr n_pages);

        Return a chunk of memory for writing to a ctrl page.
        All changes made to a ctrl page will be made atomically
        visible to other clients only at the successful end of the
        transaction.

    * t_vudev_page_addr vumnd_post_alloc (const t_uchar ** const err,
                                           t_vumnd_connection const cxn,
                                           t_uchar * data,
                                           t_vudev_page_addr n_pages);

        Allocate `n_pages' heap pages and fill them with `n_pages'
        pages of bytes copied from `data'.  Return the address of
        the new pages.


    * int vumnd_post_free (const t_uchar ** const err,
                           t_vumnd_connection const cxn,
                           t_vudev_page_addr page,
                           t_vudev_page_addr n_pages);

        Free heap pages.






