bug-coreutils

Re: you are not going to be able to sort this by the fifth field.


From: Pádraig Brady
Subject: Re: you are not going to be able to sort this by the fifth field.
Date: Fri, 05 Mar 2010 00:38:40 +0000
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100216 Thunderbird/3.0.2

On 04/03/10 19:59, address@hidden wrote:
> Try as you might, there is no way you are going to sort by this field,
> $ LC_CTYPE=zh_TW.UTF-8 w3m -dump \
>   http://www.tcb-bank.com.tw/tcb/servicesloc/atm_location/taichung_county_atm.htm |
>   perl -anlwe 'print $F[4] if exists $F[4]'|LC_CTYPE=C sort
> without ripping it out of the table first using perl. Go ahead, try -t ... -k ...,...
> You won't be able to order that field in the same way one can after
> ripping it out of the table.

This seems to work for me: LC_CTYPE=C sort -sb -k5,5
I confirmed this by extracting the field, as the perl above does, but _after_ the sort, using:
sed 's/^ *//; s/  */ /g; s/\r$//' | cut -d ' ' -s -f5 | sed '/^$/d'
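A quick way to repeat that check on any input is sketched below. The rows are made up and stand in for the w3m dump, and the sample and field5 helper names are only illustrative. It lists field 5 after sorting whole lines with -sb -k5,5 and compares that with the list you get by extracting the field first and sorting it alone. It assumes space-separated columns and the C locale.

```shell
#!/bin/sh
# Sketch: does sorting whole lines with -sb -k5,5 order field 5 the same
# way as ripping the field out first and sorting it on its own?
# The rows below are made up; they stand in for the w3m table dump.
export LC_ALL=C

# Hypothetical sample input: variable padding, one row with no field 5.
sample() {
  printf '%s\n' \
    '  a  b c  d   pear  x' \
    'a b c d apple' \
    'short line' \
    '    1 2 3 4 fig 9 9'
}

# Extract field 5 as in the message: trim leading spaces, squeeze runs
# of spaces, drop a trailing CR (\r is a GNU sed escape), cut field 5,
# then delete the empty lines left by rows that have no field 5.
field5() {
  sed 's/^ *//; s/  */ /g; s/\r$//' | cut -d ' ' -s -f5 | sed '/^$/d'
}

by_key=$(sample | sort -sb -k5,5 | field5)  # sort lines, then extract
ripped=$(sample | field5 | sort)            # extract, then sort

printf '%s\n' "$by_key"
if [ "$by_key" = "$ripped" ]; then
  echo 'same order'
else
  echo 'orders differ'
fi
```

On this sample both orders come out as apple, fig, pear. The row without a fifth field has an empty key, so it sorts first and contributes nothing to the extracted list.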

> sort (GNU coreutils) 8.4
> P.S., perhaps add a --debug-fields mode which adds field boundary | pipe
> symbols into the output.
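For comparison, a rough approximation of that field-boundary view can be had from a plain sed filter; this is not a sort option, and the mark_fields name is only illustrative. Without -t, sort splits fields at the empty string between a non-blank and a blank, so leading blanks belong to the following field. The sketch inserts a | at each such boundary.

```shell
#!/bin/sh
# Rough approximation of the suggested field-boundary view: a plain sed
# filter, not a sort option. Without -t, sort treats the empty string
# between a non-blank and a blank as a field separator, so mark it with
# '|'; the blanks stay at the start of the following field.
mark_fields() {
  sed 's/\([^[:blank:]]\)\([[:blank:]]\)/\1|\2/g'
}

printf ' +1234 1,234e1\n' | mark_fields
printf '  <td height=26 class=xl24\n' | mark_fields
```

The first line comes out as ` +1234| 1,234e1`: field 1 is ` +1234`, leading blank included, and field 2 is ` 1,234e1`.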

Yes, I agree it's very difficult to know exactly what's going on
with the field processing in sort. I have actually proposed and
mostly implemented a --debug option. Here are some examples:

$ LC_CTYPE=C sort --debug -sb -k5,5 < taichung_county_atm.htm
</html>
** no match for key **
<meta name=Generator content="Microsoft Excel 11">
                                              ____
<table x:str border=0 cellpadding=0 cellspacing=0 width=803 style='border-collapse:
                                    _____________
  <td height=26 class=xl24 width=68 style='height:19.9pt;border-top:none;
                                    _____________________________________
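The underlines above start at the first non-blank character of field 5 because of the b in -sb. That matters for column-padded output like the w3m dump, since without b the padding is part of the key. A small sketch with two made-up rows in the C locale (the rows helper is hypothetical):

```shell
#!/bin/sh
export LC_ALL=C
# Made-up rows: field 5 is padded with a different number of blanks,
# as in a column-aligned table dump.
rows() { printf '%s\n' 'a b c d   zebra' 'a b c d apple'; }

# Without -b the leading blanks are part of key 5, so '   zebra'
# compares lower than ' apple' (a blank sorts before 'a').
rows | sort -s -k5,5
# With -b the blanks are skipped and the words themselves are compared.
rows | sort -sb -k5,5
```

Without -b the zebra row comes first; with -b the apple row does.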

$ echo " +1234 1,234e1" | LANG=C ./sort --debug -k2g,2 -k1.2bn,1
 +1234 1,234e1
       _
  ____
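In the key -k1.2bn,1 above, b skips the leading blank and .2 then starts the key one character later, past the '+'. That is needed because GNU sort's -n does not accept a leading plus sign. A minimal sketch with made-up values (vals is a hypothetical helper):

```shell
#!/bin/sh
export LC_ALL=C
# Made-up values: a leading blank and an explicit '+' sign.
vals() { printf '%s\n' ' +20' ' +3'; }

# b: skip the leading blank; .2: start at the 2nd character of field 1,
# i.e. just past the '+'; n: compare numerically, so 3 sorts before 20.
vals | sort -k1.2bn,1
```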


$ echo " 1234Mi 1,234e1" | LANG=de_DE ./sort --debug -k2g,2 -k1bh,1
 1234Mi 1,234e1
        _______
 ______
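The two examples differ in how g reads 1,234e1. In the C locale it stops at the comma, while in de_DE the comma is the decimal point, so g takes the whole of 1,234e1. Since de_DE may not be installed everywhere, the sketch below sticks to the C locale and just contrasts n, g and h on made-up values.

```shell
#!/bin/sh
export LC_ALL=C
# -n reads only a leading decimal number, so '1e3' compares as 1.
printf '%s\n' 1e3 200 | sort -n
# -g parses general floating-point syntax, so '1e3' compares as 1000.
printf '%s\n' 1e3 200 | sort -g
# -h compares unit suffixes (K < M < G) before the numbers themselves.
printf '%s\n' 2G 10K 3M | sort -h
```

So -n puts 1e3 before 200, -g puts 200 first, and -h orders the sizes as 10K, 3M, 2G.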

cheers,
Pádraig.