bug#1028: Improvement: Persistent Hash Store with GDBM


From: barry
Subject: bug#1028: Improvement: Persistent Hash Store with GDBM
Date: Thu, 25 Sep 2008 22:44:56 -0400
User-agent: Thunderbird 1.5 (X11/20051201)

From: barry <barry.krofchick@sympatico.ca>
To: bug-gnu-emacs@gnu.org
Subject: Improvement: Persistent Hash Store with GDBM

This is not a bug but an enhancement: it implements a persistent
hash store, using gdbm for the storage.

The code is in file gdbm.c (below), with some minor
changes to Makefile.in (in the emacs/src directory) and
emacs.c to include the new functions into the emacs build.

The new functions are as follows and mirror the
equivalent functions in the gdbm package itself:

1. gdbm-open
2. gdbm-close
3. gdbm-fetch
4. gdbm-store
5. gdbm-delete
6. gdbm-exists
7. gdbm-firstkey
8. gdbm-nextkey
9. gdbm-sync
10. gdbm-reorganize

The doc strings for each of these functions are
included below.

The vector gdbm-open-files contains a cons cell for each open
gdbm file, indexed by an integer file ID number. Its size is set
by the configuration parameter max_gdbm_open_files, which
syms_of_gdbm (in gdbm.c below) sets to 10.
The car of the cons cell is the gdbm data file pointer, and
the cdr is the name of the opened hash file.
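
To picture the bookkeeping described above, here is a minimal sketch
in plain C, independent of Emacs and of gdbm. It is an illustration
only, not the code in gdbm.c: the names slot, open_files, slot_open,
slot_lookup and slot_close are hypothetical, a void pointer stands in
for the gdbm file pointer, and MAX_OPEN mirrors the default of 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_OPEN 10   /* mirrors the default of 10 described above */

/* Hypothetical stand-in for one entry of gdbm-open-files. */
struct slot {
    void *handle;     /* stands in for the gdbm file pointer (the car) */
    char name[256];   /* stands in for the file name (the cdr) */
};

static struct slot open_files[MAX_OPEN];

/* Register HANDLE under ID. Returns ID, or -1 if ID is out of range. */
int slot_open(int id, void *handle, const char *name)
{
    assert(handle != NULL && name != NULL);
    if (id < 0 || id >= MAX_OPEN)
        return -1;
    open_files[id].handle = handle;
    /* snprintf always NUL-terminates and truncates over-long names */
    snprintf(open_files[id].name, sizeof open_files[id].name, "%s", name);
    return id;
}

/* Look up the handle for ID; NULL if out of range or nothing is open. */
void *slot_lookup(int id)
{
    if (id < 0 || id >= MAX_OPEN)
        return NULL;
    return open_files[id].handle;
}

/* Free the slot. Returns 0 on success, -1 if nothing was open there. */
int slot_close(int id)
{
    if (slot_lookup(id) == NULL)
        return -1;
    open_files[id].handle = NULL;
    open_files[id].name[0] = '\0';
    return 0;
}
```

Unlike gdbm-open in gdbm.c, this sketch does not close a database
that is already open at the given ID; it simply overwrites the slot.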

The values written to the hash files are all strings.
Normally one would use prin1-to-string to store arbitrary
lisp expressions and read-from-string to recover them.

This mechanism allows for a simple and fast persistent
hash storage for lisp data, directly within emacs lisp
code, without the need to resort to external databases.

The doc strings for the functions are shown below:

1. gdbm-open is a built-in function in `C source code'.
(gdbm-open IDNO FILE ACCESS &optional MODE)

Open FILE as a gdbm database and assign it
the ID IDNO, where IDNO is an integer in the range
0 to max-gdbm-open-files - 1.
ACCESS specifies the access rights as one of the strings:
r for read
w for read/write
c for create (if none exists)
n to force creation of a new one even if one exists
MODE, if present when a new database is created, specifies
the file permissions as a number a la chmod.
Returns: gdbm file reference ID on success or
nil on failure

2. gdbm-close is a built-in function in `C source code'.
(gdbm-close DBF)

Close a gdbm database of the specified number.

3. gdbm-fetch is a built-in function in `C source code'.
(gdbm-fetch DBF KEY)

Fetch data from a gdbm database.
Returns: string data stored under KEY or nil
if no data under that key.

4. gdbm-store is a built-in function in `C source code'.
(gdbm-store DBF KEY DATA)

Store data in a gdbm database.
KEY and DATA must be strings
(to save binary data use prin1-to-string on
key and/or data)
If KEY already exists in the database it will
be replaced with the new DATA
If DATA is nil or empty then KEY will be deleted.
Returns: 0 on successful insert
-1 if the database was opened read-only.

5. gdbm-delete is a built-in function in `C source code'.
(gdbm-delete DBF KEY)

Delete data from a gdbm database.
KEY must be a string
Returns: 0 on successful delete
-1 if key not in database

6. gdbm-exists is a built-in function in `C source code'.
(gdbm-exists DBF KEY)

Returns t if KEY is in the hash otherwise nil

7. gdbm-firstkey is a built-in function in `C source code'.
(gdbm-firstkey DBF)

Fetch first key data from a gdbm database.
Returns: first key in GDBM hash or nil if none

8. gdbm-nextkey is a built-in function in `C source code'.
(gdbm-nextkey DBF KEY)

Fetch next key data from a gdbm database.
Returns: the key following KEY in the gdbm hash table
or nil if KEY is the last key.

9. gdbm-sync is a built-in function in `C source code'.
(gdbm-sync DBF)

Sync a gdbm database.
Writes all buffered data to disk.

10. gdbm-reorganize is a built-in function in `C source code'.
(gdbm-reorganize DBF)

Reorganize a gdbm database.
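
The ACCESS handling documented for gdbm-open above can be sketched in
plain C as follows. This is an illustration under assumptions, not the
code from the patch: parse_access and the ACC_* names are hypothetical
stand-ins for the GDBM_READER, GDBM_WRITER, GDBM_WRCREAT and GDBM_NEWDB
flags from <gdbm.h>, so that the fragment compiles without the gdbm
headers. As in gdbm.c, only the first character of the string is
examined, case is ignored, and anything unrecognized falls back to
read access.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-ins for GDBM_READER, GDBM_WRITER, GDBM_WRCREAT
   and GDBM_NEWDB; the real constants come from <gdbm.h>. */
enum access_mode { ACC_READER, ACC_WRITER, ACC_WRCREAT, ACC_NEWDB };

/* Map an ACCESS string to a mode by looking at its first character. */
enum access_mode parse_access(const char *access)
{
    assert(access != NULL);
    switch (tolower((unsigned char)access[0])) {
    case 'w': return ACC_WRITER;
    case 'c': return ACC_WRCREAT;
    case 'n': return ACC_NEWDB;
    case 'r':
    default:  return ACC_READER;   /* empty or unknown: read-only */
    }
}
```

Falling back to read access for unknown input is the conservative
choice here, since a mistyped ACCESS string can then never create or
truncate a database.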

-------------------------------------------------------------
The file emacs-22.1/src/gdbm.c, which implements the
functions above, follows
-------------------------------------------------------------
/* GDBM Library Interface */
#include <config.h>
#include "lisp.h"
#include "blockinput.h"
#include "commands.h"
#include "keyboard.h"
#include "dispextern.h"
#include "charset.h"
#include "coding.h"
#include <gdbm.h>
#include <string.h>

//DEFVAR_INT takes the address of an EMACS_INT, not of an int
EMACS_INT max_gdbm_open_files;
Lisp_Object Qgdbm_open_files,Vgdbm_open_files;

DEFUN ("gdbm-open", Fgdbm_open, Sgdbm_open, 3, 4,
      0,
  "Open FILE as a gdbm database and assign it \n\
the ID IDNO, where IDNO is an integer in the range \n\
0 to max-gdbm-open-files - 1.\n\
ACCESS specifies the access rights as one of the strings: \n\
r for read \n\
w for read/write \n\
c for create (if none exists)\n\
n to force creation of a new one even if one exists\n\
MODE, if present when a new database is created, specifies\n\
the file permissions as a number a la chmod.\n\
Returns: gdbm file reference ID on success or\n\
nil on failure")
 (idno,file,access,mode)
    Lisp_Object idno, file, access, mode;
{
 int imode,iaccess;
 GDBM_FILE dbf;
 unsigned char *caccess;

 struct gcpro gcpro1, gcpro2, gcpro3;
 Lisp_Object ef, ef1, val;

 //validate all arguments before touching any open file
 CHECK_NUMBER(idno);
 CHECK_STRING (file);
 CHECK_STRING (access);
 if(!NILP (mode)){
   CHECK_NUMBER(mode);
   imode = XINT (mode);
 } else imode = 0666;
 //ensure id number is in range
 if((XINT(idno) < 0) || XINT(idno) >= max_gdbm_open_files)
   error("gdbm ID out of range");

 //both must hold valid objects before they are GC-protected
 ef = Qnil;
 ef1 = Qnil;
 GCPRO3 (file, ef, ef1);
 //if we haven't yet set up the open files vector
 //do it now
 if(!VECTORP (Vgdbm_open_files))
   Vgdbm_open_files=Fmake_vector(make_number(max_gdbm_open_files),
                 Qnil);
 //max-gdbm-open-files may have been raised after the vector was made
 if(XINT(idno) >= XVECTOR (Vgdbm_open_files)->size)
   error("gdbm ID out of range");
 //see if there is an open file at the idno
 ef = AREF(Vgdbm_open_files, XINT(idno));
 if(!NILP (ef)){
   if(!CONSP(ef) || !NUMBERP(CAR(ef)))
      error("gdbm-open-files corrupted");
   //if so close it
   gdbm_close((GDBM_FILE) XPNTR(CAR(ef)));
   ASET(Vgdbm_open_files,XINT(idno),Qnil);
 }

 ef = Fexpand_file_name (file, Qnil);
 ef1 = ENCODE_FILE (ef);

 //only the first character of ACCESS is examined;
 //anything unrecognized falls back to read-only
 caccess = XSTRING (access)->data;
 switch (caccess[0])
   {
   case 'r':
   case 'R':
     iaccess = GDBM_READER;
     break;
   case 'w':
   case 'W':
     iaccess = GDBM_WRITER;
     break;
   case 'c':
   case 'C':
     iaccess = GDBM_WRCREAT;
     break;
   case 'n':
   case 'N':
     iaccess = GDBM_NEWDB;
     break;
   default:
     iaccess = GDBM_READER;
   }

 dbf = gdbm_open((char *)XSTRING(ef1)->data,0,iaccess,imode,0);

 if(!dbf){
   UNGCPRO;
   return(Qnil);
 }

 //the handle is kept as a raw pointer disguised as a Lisp integer;
 //this relies on the masked pointer looking like a number, which
 //is not guaranteed on every platform
 val = XPNTR((EMACS_UINT)dbf);

 ASET(Vgdbm_open_files,XINT(idno),Fcons(val,ef1));
 UNGCPRO;

 return idno;
}

static Lisp_Object idToGdbmKey(Lisp_Object dbf)
{
 Lisp_Object val;

 //ensure id number is in range
 CHECK_NUMBER(dbf);
 if((XINT(dbf) < 0) || XINT(dbf) >= max_gdbm_open_files)
   error("gdbm ID out of range");
 if(!VECTORP (Vgdbm_open_files))
   error("no open files");
 //see if there is an open file at the idno
 val = AREF(Vgdbm_open_files, XINT(dbf));
 if(NILP(val))error("operation but no gdbm file open");
 if(!CONSP(val) || !NUMBERP(CAR(val)))
      error("gdbm-open-files corrupted");
 return(XPNTR(CAR(val)));
}


DEFUN ("gdbm-close", Fgdbm_close, Sgdbm_close, 1, 1,
      0,
      "Close a gdbm database of the specified number.")
    (dbf)
    Lisp_Object dbf;
{

 GDBM_FILE idbf;
 int ival;
 Lisp_Object val;

 val = idToGdbmKey(dbf);
 gdbm_close((GDBM_FILE) val);
 ASET(Vgdbm_open_files,XINT(dbf),Qnil);

 return (Qt);
}

DEFUN ("gdbm-delete", Fgdbm_delete, Sgdbm_delete, 2, 2,
      0,
 "Delete data from a gdbm database.\n\
KEY must be a string\n\
Returns: 0 on successful delete \n\
-1 if key not in database")
    (dbf, key)
    Lisp_Object dbf, key;
{
 Lisp_Object val;
 int oval;
 GDBM_FILE odbf;
 datum okey;

 val = idToGdbmKey(dbf);
 CHECK_STRING (key);

 odbf = (GDBM_FILE)XPNTR (val);
 okey.dptr = (char*)XSTRING (key)->data;
 //okey.dsize = XINT(Flength(key));
 okey.dsize = STRING_BYTES(XSTRING (key));

 oval = gdbm_delete(odbf, okey);
 val = make_number(oval);
 return(val);
}

DEFUN ("gdbm-store", Fgdbm_store, Sgdbm_store, 3, 3,
      0,
 "Store data in a gdbm database.\n\
KEY and DATA must be strings \n\
(to save binary data use prin1-to-string on \n\
key and/or data)\n\
If KEY already exists in the database it will\n\
be replaced with the new DATA \n\
If DATA is nil or empty then KEY will be deleted.\n\
Returns: 0 on successful insert\n\
-1 if the database was opened read-only.")
    (dbf, key, data)
    Lisp_Object dbf, key, data;
{
 Lisp_Object val;
 datum okey, odata;
 GDBM_FILE odbf;
 int ival;

 val = idToGdbmKey(dbf);
 CHECK_STRING (key);
 //nil DATA deletes KEY, as documented
 if(NILP(data))return(Fgdbm_delete(dbf, key));
 CHECK_STRING (data);
 //so does empty DATA
 if(STRING_BYTES(XSTRING (data)) == 0)return(Fgdbm_delete(dbf, key));

 odbf = (GDBM_FILE)XPNTR (val);
 okey.dptr = (char *)XSTRING (key)->data;
 okey.dsize = STRING_BYTES(XSTRING (key));
 odata.dptr = (char *)XSTRING (data)->data;
 odata.dsize = STRING_BYTES(XSTRING (data));

 if(okey.dsize == 0)ival=0;
 else ival = gdbm_store(odbf,okey,odata,GDBM_REPLACE);

 //ival is a plain C int, not a Lisp_Object, so XUINT must not be used
 val = make_number(ival);
 return val;
}

DEFUN ("gdbm-fetch", Fgdbm_fetch, Sgdbm_fetch, 2, 2,
      0,
 "Fetch data from a gdbm database.\n\
Returns: string data stored under KEY or nil \n\
if no data under that key.")
    (dbf, key)
    Lisp_Object dbf, key;
{
 Lisp_Object val;
 GDBM_FILE odbf;
 datum okey,oval;

 val = idToGdbmKey(dbf);
 CHECK_STRING (key);

 odbf = (GDBM_FILE)XPNTR (val);
 okey.dptr = (char *)XSTRING (key)->data;
 okey.dsize = STRING_BYTES(XSTRING (key));

 oval = gdbm_fetch(odbf, okey);
 if(oval.dptr == NULL)return Qnil;
 val = make_string(oval.dptr, oval.dsize);
 free(oval.dptr);
 return val;
}

DEFUN ("gdbm-firstkey", Fgdbm_firstkey, Sgdbm_firstkey, 1, 1,
      0,
 "Fetch first key data from a gdbm database.\n\
Returns: first key in GDBM hash or nil if none")
    (dbf)
    Lisp_Object dbf;
{
 Lisp_Object val;
 GDBM_FILE odbf;
 datum oval;

 val = idToGdbmKey(dbf);
 odbf = (GDBM_FILE)XPNTR (val);

 oval = gdbm_firstkey(odbf);
 if(oval.dptr == NULL)return Qnil;
 val = make_string(oval.dptr, oval.dsize);
 free(oval.dptr);
 return val;
}

DEFUN ("gdbm-nextkey", Fgdbm_nextkey, Sgdbm_nextkey, 2, 2,
      0,
 "Fetch next key data from a gdbm database.\n\
Returns: the key following KEY in the gdbm hash table\n\
or nil if KEY is the last key.")
    (dbf, key)
    Lisp_Object dbf, key;
{
 Lisp_Object val;
 GDBM_FILE odbf;
 datum okey,oval;

 //val holds a raw handle, not a heap object, so it needs no GCPRO
 val = idToGdbmKey(dbf);
 CHECK_STRING (key);

 odbf = (GDBM_FILE)XPNTR (val);
 okey.dptr = (char *)XSTRING (key)->data;
 okey.dsize = STRING_BYTES(XSTRING (key));

 oval = gdbm_nextkey(odbf, okey);
 if(oval.dptr == NULL)return Qnil;
 val = make_string(oval.dptr, oval.dsize);
 free(oval.dptr);
 return val;
}

DEFUN ("gdbm-exists", Fgdbm_exists, Sgdbm_exists, 2, 2,
      0,
 "Returns t if KEY is in the hash otherwise nil")
    (dbf, key)
    Lisp_Object dbf, key;
{
 Lisp_Object val;
 GDBM_FILE odbf;
 datum okey;
 int oval;

 val = idToGdbmKey(dbf);

 CHECK_STRING (key);

 odbf = (GDBM_FILE) XPNTR (val);
 okey.dptr = (char *)XSTRING (key)->data;
 okey.dsize = STRING_BYTES(XSTRING (key));

 oval = gdbm_exists(odbf, okey);
 if(oval)return Qt;
 return Qnil;
}

DEFUN ("gdbm-reorganize", Fgdbm_reorganize, Sgdbm_reorganize, 1, 1,
      0,
      "Reorganize a gdbm database.")
    (dbf)
    Lisp_Object dbf;
{
 Lisp_Object val;
 int ival;
 GDBM_FILE odbf;

 val = idToGdbmKey(dbf);

 odbf = (GDBM_FILE)XPNTR (val);

 ival = gdbm_reorganize(odbf);

 val = make_number(ival);
 return val;
}

DEFUN ("gdbm-sync", Fgdbm_sync, Sgdbm_sync, 1, 1,
      0,
 "Sync a gdbm database.\n\
Writes all buffered data to disk.")
    (dbf)
    Lisp_Object dbf;
{
 Lisp_Object val;
 int ival;
 GDBM_FILE odbf;

 val = idToGdbmKey(dbf);

 odbf = (GDBM_FILE)XPNTR (val);
 gdbm_sync(odbf);

 return Qt;
}

void
syms_of_gdbm ()
{
 DEFVAR_INT ("max-gdbm-open-files", &max_gdbm_open_files,
   "*Maximum number of open gdbm files.");
 max_gdbm_open_files=10;
 DEFVAR_INT ("gdbm_errno",(int *)&gdbm_errno,
         "*GDBM returned error number");

 DEFVAR_LISP ("gdbm-open-files", &Vgdbm_open_files,
          "Vector of open GDBM files");
 Vgdbm_open_files = Fmake_vector(make_number(max_gdbm_open_files),Qnil);
 Qgdbm_open_files = intern("gdbm-open-files");
 staticpro(&Qgdbm_open_files);

 defsubr (&Sgdbm_open);
 defsubr (&Sgdbm_close);
 defsubr (&Sgdbm_store);
 defsubr (&Sgdbm_fetch);
 defsubr (&Sgdbm_delete);
 defsubr (&Sgdbm_firstkey);
 defsubr (&Sgdbm_nextkey);
 defsubr (&Sgdbm_exists);
 defsubr (&Sgdbm_reorganize);
 defsubr (&Sgdbm_sync);
}

-------------------------------------------------------------
Following are the changes to Makefile.in in emacs-22.1/src
to include the gdbm.c module and the gdbm library in the
build. (Note that this could be handled better, along with
max_gdbm_open_files, as a configuration parameter/option.)
--------------------------------------------------------------

diff -r emacs-22.1/src/Makefile.in /users/barry/emacs-special/emacs-22.1/src/Makefile.in
589c589
<     minibuf.o fileio.o dired.o filemode.o \
---
>     minibuf.o fileio.o dired.o filemode.o gdbm.o\
938c938
<    LIBS_DEBUG $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
---
> LIBS_DEBUG -lgdbm $(GETLOADAVG_LIBS) $(GNULIB_VAR) LIB_MATH LIB_STANDARD \
1141a1142
> gdbm.o: gdbm.c $(config_h) blockinput.h commands.h keyboard.h dispextern.h charset.h coding.h

---------------------------------------------------------------
---------------------------------------------------------------

Following are the changes to emacs.c to reference the gdbm.c module
in the build:

----------------------------------------------------------------
diff -r emacs-22.1/src/emacs.c /users/barry/emacs-special/emacs-22.1/src/emacs.c
1562a1563
>       syms_of_gdbm ();

----------------------------------------------------------------
Changelog entry:

2008-09-21  Barry Krofchick <barry.krofchick@sympatico.ca>

       * gdbm.c: New file.  Built-in gdbm-based persistent hash
       tables for lisp and other data.

----------------------------------------------------------------


That's it.

Thanks for all the great work on emacs, a beautiful piece of
software.

I hope you can include the gdbm hash tables in future releases.
They are extremely useful for managing large persistent lisp
knowledge bases, quickly and easily from within emacs lisp
code.

Prior to this implementation I used an external custom
server to do the same job, with significantly worse
performance.

Thanks,

Barry
barry.krofchick@sympatico.ca

------------------------------------------------------------------------

In GNU Emacs 22.1.1 (i686-pc-linux-gnu, X toolkit)
of 2008-01-23 on benny
Windowing system distributor `The XFree86 Project, Inc', version 11.0.40500000
Important settings:
 value of $LC_ALL: nil
 value of $LC_COLLATE: nil
 value of $LC_CTYPE: nil
 value of $LC_MESSAGES: nil
 value of $LC_MONETARY: nil
 value of $LC_NUMERIC: nil
 value of $LC_TIME: nil
 value of $LANG: nil
 locale-coding-system: nil
 default-enable-multibyte-characters: t

Major mode: Info

Minor modes in effect:
 shell-dirtrack-mode: t
 tooltip-mode: t
 tool-bar-mode: t
 mouse-wheel-mode: t
 menu-bar-mode: t
 file-name-shadow-mode: t
 global-font-lock-mode: t
 font-lock-mode: t
 blink-cursor-mode: t
 unify-8859-on-encoding-mode: t
 utf-translate-cjk-mode: t
 auto-compression-mode: t
 line-number-mode: t
 abbrev-mode: t







