emacs-bug-tracker
[debbugs-tracker] bug#14988: closed (sort enhancement request)


From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#14988: closed (sort enhancement request)
Date: Wed, 31 Jul 2013 14:00:10 +0000

Your message dated Wed, 31 Jul 2013 07:59:23 -0600
with message-id <address@hidden>
and subject line Re: bug#14988: sort enhancement request
has caused the debbugs.gnu.org bug report #14988,
regarding sort enhancement request
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
14988: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=14988
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: sort enhancement request
Date: Tue, 30 Jul 2013 20:51:08 +0000

Hi guys,

I am presently using version 7.1 on a Solaris box.  I downloaded 8.21 and really love the improvement in speed (almost 50% in some tests).  I am looking to replace the commercial product NSORT and would like this feature in the source instead of a wrapper.  If I have a file

XXXX300001XXXX
XXXX300002XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300003XXXX
XXXX300004XXXX
XXXX300005XXXX
XXXX300006XXXX
XXXX300007XXXX

NSORT keeps the 4 300003 records together in entry sequence.   My present work-around is to use a Python script that reads in the whole file and creates a pseudo-key that is 30000X plus an 8-digit sequence number (I process millions of records).  What I am thinking of is an -es (--entry-sequence) option that would add a hidden -k to process on this internal sequence.  If I figure out how to do this on my own, I will submit it to you.

Thanks,

Danny Nicholas

Applications Programmer
Pinnacle Data Systems L.L.C.

Office: (205) 307-6874

daaddress@hidden

www.pinnacledatasystems.com

 


 


--- End Message ---
--- Begin Message ---
Subject: Re: bug#14988: sort enhancement request
Date: Wed, 31 Jul 2013 07:59:23 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130625 Thunderbird/17.0.7

tag 14988 notabug
thanks

[re-adding the list; and please don't top-post on technical lists]

On 07/31/2013 07:19 AM, Danny Nicholas wrote:
> Thank you Eric.  We have two sorts on our system.  Our /usr/bin/sort does not 
> support the -s option,

Makes sense - the '-s' option is a GNU extension, and your /usr/bin/sort
is probably not GNU sort.  If you want stable sorting using only POSIX
features, then you have to supply enough sort keys so that no two lines
ever compare equal (since POSIX has no way to disable the full-line sort
of last resort).  Depending on your input, this may indeed require a
pre-filter run that adds line numbers (by the way, sed's '=' command
can do this much more efficiently than a Python script), then the sort,
then a post-filter run that removes the line numbers.
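
The pre-filter, sort, post-filter pipeline described above could be
sketched roughly as follows.  This is only an illustration, not a tested
recipe for Solaris: it uses awk rather than sed for the numbering, the
function name and the key-width argument are made up here, and it
assumes that the key bytes never contain a tab.

```shell
# Hypothetical helper: stable sort of FILE on its first WIDTH bytes,
# using only POSIX sort options (no -s).
# Usage: stable_sort_posix FILE WIDTH
stable_sort_posix() {
    tab=$(printf '\t')
    # Pre-filter: emit "key<TAB>line number<TAB>original line".
    LC_ALL=C awk -v w="$2" \
        '{ printf "%s\t%010d\t%s\n", substr($0, 1, w), NR, $0 }' "$1" |
        # Sort by key, then by line number, so no two lines compare equal.
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2n |
        # Post-filter: drop the key and the line number again.
        cut -f3-
}
```

For the data shown later in this thread, something like
`stable_sort_posix file 38 > file2` would be the intended call.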

> but our /usr/local/bin/sort does.

Indeed - life is simpler if you can write your script to ensure that it
always sets PATH to use the full power of the GNU tools.
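
If setting PATH up front is not an option, a script could instead probe
each candidate for -s before relying on it.  A minimal sketch, using the
two paths mentioned in this thread; the function and variable names are
hypothetical:

```shell
# Hypothetical helper: succeed if the given sort program accepts -s.
sort_supports_stable() {
    printf 'b\na\n' | "$1" -s >/dev/null 2>&1
}

# Pick the first candidate that supports stable sorting.
SORT=
for candidate in /usr/local/bin/sort /usr/bin/sort sort; do
    if sort_supports_stable "$candidate"; then
        SORT=$candidate
        break
    fi
done
```

If SORT is still empty after the loop, none of the candidates accepted
-s and the script would have to fall back to a POSIX-only approach.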

>  Unfortunately, that did not resolve the issue. Here is a portion of the file 
> I'm trying to sort

Thank you - THIS makes much more sense for understanding your problem.

> 010_000001_0000731_00001_200000081610_<Customer>
> 010_000001_0000731_00002_200000081610_     <CCODEPAGE>4102 LANGUAGE EN</CCODEPAGE>
> 010_000001_0000731_00003_200000081610_     <FirstCopy>YES</FirstCopy>
> 010_000001_0000731_00003_200000081610_     <eapprovetype>010</eapprovetype>
> 010_000001_0000731_00003_200000081610_     <lastpaymentdate>06/12/2013</lastpaymentdate>
> 010_000001_0000731_00003_200000081610_     <lastpaymentamount>           277.59</lastpaymentamount>
> 010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
> 010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>
> 010_000001_0000731_00004_200000081610_     <DG_BILL_LAYOUT>REGULAR</DG_BILL_LAYOUT>
> 010_000001_0000731_00005_200000081610_     <DC-DEVICE>PRINTER</DC-DEVICE>
> 010_000001_0000731_00006_200000081610_     <DC-RDI>S</DC-RDI>
> 010_000001_0000731_00007_200000081610_     <DC-SENDTYPE>PRINTER</DC-SENDTYPE>
> 010_000001_0000731_00008_200000081610_     <DSY-SYSID>R3P</DSY-SYSID>
> 
> What I am executing is /usr/local/bin/sort -k 1,36 -s file -o file2

So, with "-k1,36" you asked sort to use as its sort key the portion of
the line from the start of the first field to the end of the 36th field.
I see only 2 fields in most of the lines (a few have more, but none has
36), so you are effectively sorting by the entire line.  You didn't
provide any other keys, and since your only key already spans the
ENTIRE line, no two distinct lines compare equal, so -s makes no
difference.  Again, sort --debug makes this clear (using a subset of
just two lines of your input):

>> $ printf '010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>\n' \
>>    | LC_ALL=C sort --debug -k1,36 -s
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>
>> _______________________________________________________________________
>> 010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ________________________________________________________________________________________________________

But it appears that what you WANTED was to sort on just the first 36
bytes, with a stable sort of the results.  If so, then ASK for that, by
using the correct -k option:

>> $ printf '010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>\n' \
>>    | LC_ALL=C sort --debug -k1,1.36 -s
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ____________________________________
>> 010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>
>> ____________________________________

Note how I asked for a sort key -k1,1.36, which says to start in the
first field, and end 36 bytes into the first field (hmm, it looks like
you actually want 38 bytes - but I'll leave that for you to decide).
Also note that -s now makes a difference: when the first sort key is
identical on both lines, the last-resort full-line comparison reorders
them unless -s is used:

>> $ printf '010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>\n010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>\n' \
>>    | LC_ALL=C sort --debug -k1,1.36
>> sort: using simple byte comparison
>> 010_000001_0000731_00003_200000081610_     <CPAGENAME>PAGE1</CPAGENAME>
>> ____________________________________
>> _______________________________________________________________________
>> 010_000001_0000731_00003_200000081610_     <SuppressOutBadVariableCopies></SuppressOutBadVariableCopies>
>> ____________________________________
>> ________________________________________________________________________________________________________
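
Applied to a file shaped like the one in the original request, the same
approach might look like the sketch below.  It assumes GNU sort (for -s)
and a key in bytes 5 through 10 of each line; the records and the
function name are made up for illustration.

```shell
# Hypothetical helper: sort on bytes 5-10 only, keeping lines with
# equal keys in their input order (-s disables the last-resort compare).
keep_entry_sequence() {
    LC_ALL=C sort -s -k1.5,1.10 "$@"
}

# Made-up records: the two 300003 lines keep their relative input order,
# so ZZZZ300003AAAA is still printed before AAAA300003BBBB.
printf '%s\n' ZZZZ300003AAAA BBBB300001CCCC AAAA300003BBBB DDDD300002BBBB |
    keep_entry_sequence
```

Without -s, the full-line comparison would move AAAA300003BBBB ahead of
ZZZZ300003AAAA.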

As this is a case of you not passing the correct command line arguments,
rather than a bug in sort, I am marking this bug as closed.  However,
feel free to continue to comment on the topic (preferably on-list) if
you have more questions.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



--- End Message ---
