I don't think the widest variable should be saved - the same old
rules should be followed (e.g., in MATCH, if the var is already
present, leave it alone). The point is that in 99% of cases the
incompatible-width vars are completely irrelevant for the MATCH at
hand, which will fail only because a feeding file happens to have
the same var somewhere but with different width.
To ftr: people create .sav files from Excel data, or using import
wizards. And even trained staff may need to cope with a name that is
longer than expected.
Cheers
frans
On 26/03/2015 12:10, Alan Mead wrote:
On 3/26/2015 3:59 AM, ftr wrote:
So this means that the programs that produce the CSV files
produce output with different string variable widths?
Is this due to the programs or to the people who use them?
In general, when you import text files you fix the variable
width in the DATA LIST.
Or you use GET DATA/TYPE=
http://www.gnu.org/software/pspp/manual/pspp.html#GET-DATA
And why don't you set FORMAT on each of the separate files
before you integrate them?
When I worked on a project that sounds similar to yours, we did
serious pre-fieldwork training of the local data producers. It
succeeded in making the local projects aware of what was at stake
(motivation), and it made the local heads check the consistency
of the data to be sent. We could not do that ourselves: we had no
direct access to the local projects, the local heads knew them
better, and data control would have cost us too much. The
training also ensured that data were sent in a coherent format
and on time.
Maybe you have to train your local people?
Just some ideas for local problem solving. I am happy that we
have volunteers doing the programming work, so we should not
overburden them with work that we can do on our side.
ftr,
It sounds like you don't run into this problem, so maybe this
discussion isn't relevant for you.
But to repeat the reasons why this change is a good idea: (1) it
would still be EASIER to have PSPP deal with this problem
automatically, rather than forcing me to deal with this issue; and
(2) it would be a simple way to create another point
distinguishing PSPP as superior to SPSS.
I have given some thought to why SPSS has this limitation. One
possibility is that it's simply an old limitation due to some
original hardware or software issues. I speculate below that at
the time of SPSS's inception, string data was neither particularly
common nor important, and that varying lengths would have been rare.
Also, it could be due to performance issues, but if so I'm sure it
would be faster for PSPP to resolve this issue than for me to do
so manually; I assume that fixing this issue wouldn't generally
slow down merging/joining files?
I cannot imagine a situation where having this restriction on
matching string length would be a feature. But if PSPP solves the
problem by truncating longer strings, then some data would be lost
and sometimes that will be unacceptable so it would be good to
issue a warning or force people to turn on this feature. If the
solution can be to change the final string length to the longest
encountered string length (and, I assume, therefore truncate no
data) then I cannot see a problem arising from this feature.
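The two options above (truncate with a warning, or widen to the
longest declared width) can be sketched roughly as follows. This is
only an illustration in Python, not PSPP code or PSPP's actual
behaviour; the names merged_width and fit and the sample data are
made up for the example.

```python
import warnings


def merged_width(widths, policy="widen"):
    """Pick the width of a string variable that appears in several files.

    widths: declared widths of the same variable, in file order.
    policy: "widen" uses the longest width, so nothing is truncated;
            "first" keeps the first file's width, which may truncate.
    """
    if policy == "widen":
        return max(widths)
    if policy == "first":
        return widths[0]
    raise ValueError(f"unknown policy: {policy!r}")


def fit(value, width):
    """Pad or truncate a value to a fixed width, warning on data loss."""
    if len(value.rstrip()) > width:
        warnings.warn(f"truncating {value!r} to width {width}")
    return value[:width].ljust(width)


# Hypothetical data: the same 'email' variable declared with width 12
# in one file and width 20 in another.
file_a = {"width": 12, "values": ["al@x.org"]}
file_b = {"width": 20, "values": ["someone@example.org"]}

width = merged_width([file_a["width"], file_b["width"]], policy="widen")
merged = [fit(v, width) for f in (file_a, file_b) for v in f["values"]]
```

With policy="widen" no value is shortened, so no warning is needed;
with policy="first" the second value would not fit in width 12, and
fit() warns before truncating it.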
I also speculate that this problem is far more of an issue today
than when SPSS was first created, because string data is easier
(sometimes more natural) to collect today. SPSS would have
originally (i.e., circa 1970) been fed punch cards and most string
data would have been generated either by the researcher (like a
coding) or by something like a Scantron or a Scantron-like
response grid. I'm sure someone had participants respond by
writing something in but it would have been keyed into the
computer into a fixed width. Using a physical storage medium
(cards) would have discouraged strings unless they were necessary
and encouraged researchers to use the shortest possible length.
Compare that to now: my web-based surveys often have
variable-length strings like email, user agent and other
string-based metadata, and often the survey includes
fill-in-the-blank or short
answer questions. Often I get datasets where responses are
strings, rather than numeric codes (e.g., "male" and "female").
Even if they are the same data (e.g., email), it would be natural
for these variables to have different lengths across different
surveys. I don't foresee these conditions changing.
-Alan
--
Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.
science + technology = better workers
+815.588.3846 (Office)
+267.334.4143 (Mobile)
http://www.alanmead.org
Announcing the Journal of Computerized Adaptive Testing (JCAT), a
peer-reviewed electronic journal designed to advance the science and
practice of computerized adaptive testing: http://www.iacat.org/jcat