Re: Long-name/short-name complexity


From: Ben Pfaff
Subject: Re: Long-name/short-name complexity
Date: Mon, 25 Apr 2005 19:48:23 -0700
User-agent: Gnus/5.1007 (Gnus v5.10.7) Emacs/21.4 (gnu/linux)

John Darrington <address@hidden> writes:

> My conclusion is that SPSS does indeed keep its long-short name map,
> and does not allow short names to magically change.  So I think we
> should do the same.  I don't think it adds too much extra complexity.
> Variables need only to have one name (the long one).  The map needs to
> be a member of the dictionary. The only modules which will need to use
> it however will be sfm-read and sfm-write.
>
> I suppose the question still remains about what should happen if the
> variables are renamed.  Tom Watson's comments seem to suggest that
> SPSS simply ignores the short names and renames only the long ones.
> We can probably do better than this.

Thank you very much for your investigation.  It clarifies SPSS
behavior significantly.

I think that there is a simple implementation model that accounts
for the observed SPSS behavior.  Suppose every variable has a
primary name, which is up to 64 bytes and the real name of the
variable for all normal purposes.  Then suppose that variables
that have been read from (or written to?) a system file also have
an 8-byte (at most) short name that is not changed by any normal
operation, even renaming.  When a system file is written, short
names are assigned to any variable that doesn't have one, and any
duplicates among those that already had short names are given new
short names (if duplicates are possible--I'm not sure that they
are).
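
Here is a minimal sketch of that model in Python.  It is not PSPP
code: the names Variable and assign_short_names, and the scheme
for generating a fresh short name (truncate to 8, else 6 letters
plus "_" and a letter), are assumptions made up for illustration.
It also assumes ASCII names, so bytes and characters coincide.

```python
from dataclasses import dataclass
from string import ascii_uppercase
from typing import List, Optional, Set


@dataclass
class Variable:
    long_name: str                    # primary name, up to 64 bytes
    short_name: Optional[str] = None  # set only once a system file is involved


def _fresh_short_name(long_name: str, taken: Set[str]) -> str:
    """Pick an unused short name of at most 8 bytes (hypothetical scheme)."""
    candidate = long_name.upper()[:8]
    if candidate not in taken:
        return candidate
    for letter in ascii_uppercase:
        candidate = long_name.upper()[:6] + "_" + letter
        if candidate not in taken:
            return candidate
    raise ValueError("no free short name for " + long_name)


def assign_short_names(variables: List[Variable]) -> None:
    """Run at SAVE time: keep existing short names, fill in the rest."""
    taken: Set[str] = set()
    pending = []
    for var in variables:
        # The first variable to claim a short name keeps it; later
        # duplicates and variables with no short name are deferred.
        if var.short_name is not None and var.short_name not in taken:
            taken.add(var.short_name)
        else:
            pending.append(var)
    for var in pending:
        var.short_name = _fresh_short_name(var.long_name, taken)
        taken.add(var.short_name)
```

Nothing here runs until SAVE, so variables created during a
session carry no short name at all.  The sketch does not attempt
any special case where a long name matches another variable's
short name.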

I did my own experiment today, too.  I decided afterward that it
was invalid because of a detail, but I'll present it anyhow
because it does seem to clarify one question.  I fed the
following syntax, based on your original suggestion, into SPSS
12:

    data list /ABCDEFGHIJ 1.
    begin data.
    1
    2
    3
    end data.
    save outfile='foo.sav'.

    get /file='foo.sav'.
    aggregate /outfile=*
      /break=abcdefghij/abcdefghi=max(abcdefghij)/abcdefgh=min(abcdefghij).
    save outfile='bar.sav'.

Part of foo.sav looked like this:

00000150: 1300 0000 4142 4344 4546 4748 3d41 4243  ....ABCDEFGH=ABC
00000160: 4445 4647 4849 4ae7 0300 0000 0000 0065  DEFGHIJ........e

Part of bar.sav looked like this:

000001a0: 0d00 0000 0100 0000 3800 0000 4142 4344  ........8...ABCD
000001b0: 4546 5f41 3d41 4243 4445 4647 4849 4a09  EF_A=ABCDEFGHIJ.
000001c0: 4142 4344 4546 5f42 3d61 6263 6465 6667  ABCDEF_B=abcdefg
000001d0: 6869 0941 4243 4445 4647 483d 6162 6364  hi.ABCDEFGH=abcd
000001e0: 6566 6768 e703 0000 0000 0000 6565 6566  efgh........eeef

In other words, long name abcdefgh has a stronger claim on short
name ABCDEFGH than does long name ABCDEFGHIJ, even though the
latter long name had that short name first.
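
To make the bar.sav dump easier to read, here is a small Python
sketch that decodes it.  It assumes the layout that the bytes
appear to show: a 32-bit little-endian byte count (0x38) followed
by tab-separated SHORT=LONG pairs.  The function name
parse_long_names is made up for illustration.

```python
import struct

# Bytes copied from the bar.sav dump above, from offset 0x1a8 up to
# (not including) the e7 03 00 00 that follows the text.
record = bytes.fromhex(
    "38000000"                                  # byte count of the text
    "4142434445465f413d4142434445464748494a09"  # ABCDEF_A=ABCDEFGHIJ<TAB>
    "4142434445465f423d61626364656667686909"    # ABCDEF_B=abcdefghi<TAB>
    "41424344454647483d6162636465666768"        # ABCDEFGH=abcdefgh
)


def parse_long_names(rec: bytes) -> dict:
    """Return a {short name: long name} mapping from the record bytes."""
    (count,) = struct.unpack_from("<I", rec, 0)
    text = rec[4:4 + count].decode("ascii")
    pairs = {}
    for item in text.split("\t"):
        short, long_name = item.split("=", 1)
        pairs[short] = long_name
    return pairs


mapping = parse_long_names(record)
for short, long_name in mapping.items():
    print(short, "->", long_name)
```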

(At that point, I walked away from the computer, assuming that
short names were reassigned on each SAVE, but it's clear from
your results that that's not the case--this is a special case for
conflicting long and short names.  If it were not a special case,
then I would expect that the first long name in dictionary order,
that is, ABCDEFGHIJ, would be entitled to short name ABCDEF_A.)

> Another question is the geometry of the long-short name map --- should
> it be indexed by shortname or by longname.  I remember wondering if I
> made the right choice when I was implementing it.

I think that adopting the model I describe above makes sense.  I
don't think there's any need to maintain any map, because we can
resolve any conflicts at SAVE (EXPORT, etc.) time.  When we
create a new variable without reference to a system file, we
don't have to assign it a short name at all; again, that can be
delayed until SAVE.

Renaming is easy with this model.  If we want SPSS-compatible
behavior for renames, renaming variables doesn't change the short
name.  If we want "enhanced" behavior, renaming variables deletes
the short name, because that will cause those variables to
receive new and appropriate short names at SAVE time.  It'll be
easy to support either behavior with our usual command-line
switch.
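
As a rough sketch of the two rename behaviors (Python, not PSPP
code; the Variable class, the rename function, and the enhanced
flag standing in for the command-line switch are all
hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variable:
    long_name: str
    short_name: Optional[str] = None


def rename(var: Variable, new_long_name: str, enhanced: bool) -> None:
    """Rename a variable; only 'enhanced' mode touches the short name."""
    var.long_name = new_long_name
    if enhanced:
        # Dropping the short name means a new one gets assigned at
        # SAVE time, derived from the new long name.
        var.short_name = None
```

In the SPSS-compatible mode the old short name simply rides
along with the variable until the next SAVE.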

Comments?
-- 
"In the PARTIES partition there is a small section called the BEER.
 Prior to turning control over to the PARTIES partition,
 the BIOS must measure the BEER area into PCR[5]."
--TCPA PC Specific Implementation Specification



