bug-coreutils

RE: Undocumented cut feature


From: Gambs, David (CONT)
Subject: RE: Undocumented cut feature
Date: Fri, 26 Oct 2007 15:21:16 -0400

Bob,

I re-ran the script with the files being output and may have changed the
names. File2 was being cut into file3. The ^M (\r) was not in file2 but
shows up in file3. I did the first cut at 22 so that I would have a
leading space just to avoid getting the whole line. If you look at file2
in the previous message, you will see that it is indented one space on
all lines. Therefore, the optvg line does have a delimiter - the leading
space and that is why I am pulling field 2 instead of field 1. I tried
it the way you are suggesting and still got the ^M.
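For reference, the effect of the leading space can be checked in isolation. The sketch below feeds the three file2 lines quoted later in this thread straight into the same cut command; the variable name is only for illustration. Because every line starts with a space, field 1 is empty and field 2 is the first word.

```shell
# Sample lines copied from the file2 listing quoted in this thread;
# each one starts with a single space, so field 1 is empty.
fields=$(printf ' optvg\n 4 MB\n 3171 / 12.39 GB\n' | cut -f2 -d' ')
printf '%s\n' "$fields"
```

With clean input like this, no carriage return should appear in the result.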

Unfortunately, we are not moving the actual file, we are taking the
output from the screen and copying that to Excel for further processing.

I never said I was a gawk scripter. But it was the only way I could
think of to quickly get groups of three lines catted together. Remember,
we usually have multiple groups of three lines in the input file. I
would have preferred Perl, but that is not on all *nix systems there.
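As an aside, one possible alternative for joining groups of three lines is paste reading standard input three times. This is only a sketch and not the script used in this thread; the second group (rootvg, 8, 100) is made-up sample data.

```shell
# Hypothetical input: two groups of three lines (the rootvg values are made up).
# Each "-" makes paste read one more line from stdin per output row.
joined=$(printf 'optvg\n4\n3171\nrootvg\n8\n100\n' | paste -d' ' - - -)
printf '%s\n' "$joined"
```

Each output row then holds one group, separated by single spaces.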

There are no extra ^M characters in the output from vgdisplay. I can
reproduce this on that particular system repeatedly (have not tried it
on sister systems yet).

My main concern was a potential gotcha in the coreutils that could creep
back in future releases. You may be right in that this might be a RedHat
problem that they corrected in a later release. But you never know.

I will be glad when the client forces everyone onto a current release.

dmg


-----Original Message-----
From: Bob Proulx [mailto:address@hidden]
Sent: Friday, October 26, 2007 2:55 PM
To: Gambs, David (CONT)
Cc: address@hidden
Subject: Re: Undocumented cut feature

Gambs, David (CONT) wrote:
> From vi w/set list I have the following -
> 
> file3:
> optvg^M$
> 4$
> 3171$

That shows that the carriage return was already in the file before 'cut'
processed it.  That is the source of the issue.

> file2 (the one that I cut on):

But your previous example showed that you were cutting file3 into file1.

>  optvg$
>  4 MB$
>  3171 / 12.39 GB$
> 
> The command:
> cut -f2 -d' ' ~/file2 > ~/file3

Okay.  No carriage returns going in.

> Your suggested command gives:
> $ cut -f2 -d' ' file2 | od -tx1 -c
> 0000000 6f 70 74 76 67 0d 0a 34 0a 33 31 37 31 0a
>           o   p   t   v   g  \r  \n   4  \n   3   1   7   1  \n
                               ^^ A carriage return.

I cannot recreate this behavior on a RHEL3 machine.  Can you double
check your input files?  I believe there may be a mixup in which
file is which file.  Your first example in the previous message showed
you using file3 and the above shows that file3 contains carriage returns
in the data.

Note that cut prints the entire line if no delimiter is present.

  `-f FIELD-LIST'
  `--fields=FIELD-LIST'
       Select for printing only the fields listed in FIELD-LIST.  Fields
       are separated by a TAB character by default.  Also print any line
       that contains no delimiter character, unless the
       `--only-delimited' (`-s') option is specified

I believe what is happening is that your original input data contains a
carriage return in the input.  The optvg line is the only line without
any delimiters and is therefore passed through by cut.  This is why you
are seeing the carriage return in the output.
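The documented pass-through rule can be seen with a small test. This is a sketch with made-up input in which the carriage return is deliberately placed on a line that has no space; it illustrates the rule only and does not reproduce the reported system.

```shell
# "optvg\r" contains no space, so cut passes the whole line through, CR included.
passed=$(printf 'optvg\r\n4 MB\n' | cut -d' ' -f2 | od -c)
# With -s (--only-delimited), lines that lack the delimiter are suppressed instead.
delimited=$(printf 'optvg\r\n4 MB\n' | cut -s -d' ' -f2)
printf '%s\n' "$passed" "$delimited"
```

The od dump of the first command shows the \r carried through, while the -s variant prints only the field from the delimited line.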

> And I have found differences within RedHat on the vgdisplay. This 
> vgdisplay is in /sbin and not linked to anything. On the system where 
> the problem does not happen (newer coreutils & OS release) the command
> is /usr/sbin/vgdisplay and is linked to lvm. Don't know where that 
> would make a difference though.

You should be able to use 'rpm -qf FILE' where FILE is /sbin/vgdisplay
and /usr/sbin/vgdisplay to determine what package contains that file.
I don't think vgdisplay should output carriage returns.

> cd ~
> /sbin/vgdisplay | egrep -e Name -e "PE Size" -e Free | cut -b 22- | cut -f2 -d' ' > file1
> rm file.out
> touch file.out
> gawk '
> { line0 = /[:alpha:]/ }
> { printf "%s ", $line0 >> "file.out" }
> { getline }
> { line1 = /[:print:]/ }
> { printf "%i ", $1 >> "file.out" }
> { getline }
> { line2 = /[:print:]/ }
> { print $1 >> "file.out" }
> ' file1
> rm file1

That is a very unconventional awk script!  Unfortunately I do not have
the time right now to look at what it is doing in detail.

> In the gawk script when you output line0, the ^M puts the cursor at 
> the beginning of the line. The next print lines then overwrite what 
> was there. In this case optvg is completely overwritten. A longer vg 
> name would have some of it left.

Overwriting would only happen to a terminal.  The character stream would
still contain all of the characters.
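A quick way to check this is to dump the bytes rather than view them on a terminal. The sketch below assumes the output for one group is "optvg", a carriage return, then " 4 3171"; that byte sequence is an assumption based on the script and listings quoted in this thread.

```shell
# Bytes assumed to match one output group when line 1 ends in a carriage return.
# od -c prints every byte, so nothing is hidden by terminal cursor movement.
bytes=$(printf 'optvg\r 4 3171\n' | od -c)
printf '%s\n' "$bytes"
```

On a terminal the same bytes would display as " 4 3171", since the \r moves the cursor back to column one before the rest is printed.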

> $ cat file.out
>  4 3171

I think if you can debug why CRs are in the vgdisplay output and ensure
that they are removed there that everything else will flow through
normally.
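Until the source of the CRs is found, one possible stopgap is to strip them right after the command that produces them. The sketch below uses a hypothetical fake_vgdisplay function as a stand-in for the real command, with sample values taken from the listings in this thread; paste is used here only to join the three lines and is not part of the original script.

```shell
# fake_vgdisplay is a hypothetical stand-in that emits a CR after the VG name.
fake_vgdisplay() { printf ' optvg\r\n 4 MB\n 3171 / 12.39 GB\n'; }
# tr -d '\r' deletes every carriage return before cut sees the data.
clean=$(fake_vgdisplay | tr -d '\r' | cut -f2 -d' ' | paste -d' ' - - -)
printf '%s\n' "$clean"
```

This only hides the symptom; it does not explain where the carriage return comes from.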

> And all this started on HP-UX. The script works just fine there. It 
> was when I brought it over to Linux that problems arose and 
> modifications had to be made.

About the time you have ported to three different systems is when most
scripts start to get portable.  :-)

Bob













