Re: find duplicate in dataset

pspp-users

From:	Alan Mead
Subject:	Re: find duplicate in dataset
Date:	Mon, 25 Feb 2019 08:20:43 -0600
User-agent:	Mozilla/5.0 (Windows NT 6.1; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.5.0

I would construct a new variable with values A1, A2, A2, B1, etc., but you could also do something like this (from memory, untested):

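* Sort so that cases with identical var1 and var2 values are adjacent.
sort cases by var1 var2.
compute dup=0.
execute.
* Flag each case that matches the case before it on both variables.
if (lag(var1)=var1 and lag(var2)=var2) dup = lag(dup)+1.
execute.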

Sometimes lag() surprises me, but I think the above should work.
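
To then remove the flagged cases, a minimal sketch (equally untested, and assuming the variables really are named var1 and var2 as in the example) could flag each repeat with 1 instead of a running count, so it does not depend on how lag() behaves for a variable that is being recomputed, and then keep only the unflagged cases:

```spss
* Sketch only, assuming the key variables are named var1 and var2.
sort cases by var1 var2.
compute dup = 0.
* A case is a duplicate if it matches the previous case on both keys.
if (lag(var1) = var1 and lag(var2) = var2) dup = 1.
execute.
* Keep only the first case of each var1 and var2 combination.
select if (dup = 0).
execute.
```

Running LIST before the SELECT IF is a cheap way to check which cases are flagged, since SELECT IF permanently drops the unselected cases from the active dataset.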

-Alan

On 2/25/2019 4:56 AM, Matteo Ga wrote:

Hi,

I have a dataset with duplicated cases that can be identified by two variables.

EX:

Case  var1  var2
1     A     1
2     A     2
3     A     2
4     B     1
5     B     2

I want to find (and then remove) any cases like case 3, which duplicates case 2.

I searched online but couldn't find a way to do that.

Any help?

Thank you

-- 

Alan D. Mead, Ph.D.
President, Talent Algorithms Inc.

science + technology = better workers

http://www.alanmead.org

"You're an interesting species. An interesting mix. 
You're capable of such beautiful dreams, and such 
horrible nightmares. You feel so lost, so cut off, 
so alone, only you're not. See, in all our 
searching, the only thing we've found that makes 
the emptiness bearable, is each other."

-- Carl Sagan, Contact
