bug-coreutils

Re: cut -b gives bogus results on large files


From: Bob Proulx
Subject: Re: cut -b gives bogus results on large files
Date: Mon, 9 Feb 2009 07:04:01 -0600
User-agent: Mutt/1.5.18 (2008-05-17)

Daniel Janus wrote:
>    ~/coreutils/coreutils-6.12/src$ ./dd if=/dev/urandom of=test bs=1M count=200

A file of random data is not a text file.  The cut program is
designed to work on text files, not binary files.  You can run it
on binary files, but you probably won't get the results you want.

>    ~/coreutils/coreutils-6.12/src$ ./cut -b 1-200 <test >out

Because the input is random data, I won't be able to reproduce your
result exactly: the newline characters in the file will fall in
different places.  And besides, cut isn't the right tool for working
with binary / non-text files.

>    ~/coreutils/coreutils-6.12/src$ ls -l out
>    -rw-r--r-- 1 djanus djanus 114189372 Feb  9 09:13 out
> 
> I would expect the `out' file to be exactly 200 bytes in size
> and contain the first 200 bytes of `test'.
> 
> What gives?

What gives is that you wanted to use 'head --bytes=200' but mistakenly
used 'cut -b 1-200' instead.  :-)

The cut program selects columns from each newline-terminated line of
text.  See this example:

  $ printf "123456789\nabcdefghi\n" | cut -b1-3
  123
  abc

See that cut outputs columns identified by bytes 1-3 for each line of
input.  Lines of input are terminated by newline characters.
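
As a further sketch of the same per-line behavior (the sample input
here is made up for illustration), lines shorter than the requested
range pass through whole, and every input line, even an empty one,
still produces a newline on output:

```shell
# Four input lines: "ab", "abcdefghi", "" (empty), "xyz12".
# cut -b1-3 keeps at most three bytes from each line.
cut_out=$(printf 'ab\nabcdefghi\n\nxyz12\n' | cut -b1-3)
printf '%s\n' "$cut_out"

# Count the bytes written: 3 + 4 + 1 + 4, newlines included.
out_bytes=$(printf 'ab\nabcdefghi\n\nxyz12\n' | cut -b1-3 | wc -c | tr -d ' ')
echo "$out_bytes"
```

So the size of the output depends on how many lines the input has and
how long each one is, not just on the byte range given to -b.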

When you feed cut a binary file consisting of random bytes, some of
those bytes will happen to be newline characters.  Each newline
terminates the current line, so 'cut -b1-200' starts over at the
beginning of the next line and again outputs bytes 1-200.  It does
this for every line in the file, which is why the output is so much
larger than 200 bytes.
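
A rough back-of-the-envelope check, as a sketch only: assume every
byte from /dev/urandom is independently a newline with probability
1/256, and that bs=1M count=200 produced 200 * 1024 * 1024 bytes.
Then the expected number of bytes cut keeps from each "line" can be
summed up with awk:

```shell
# Assumed input size: 200 blocks of 1 MiB.
input_bytes=$((200 * 1024 * 1024))

estimate=$(awk -v n="$input_bytes" 'BEGIN {
    q = 255 / 256                     # chance that a byte is not a newline
    kept = 255 * (1 - q ^ 200) + 1    # expected bytes kept per line, plus its newline
    printf "%.0f\n", n * kept / 256   # about n / 256 lines in total
}')
echo "$estimate"
```

Under those assumptions the estimate comes out at roughly 114 million
bytes, which is in line with the 114189372 bytes reported for `out'.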

You can accomplish what you wanted to accomplish by using 'head -c200'
instead.
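
A minimal sketch of that, using a smaller scratch file than the
original report (the temporary directory and file names are arbitrary
choices for this example).  It also cross-checks the result against
'dd bs=200 count=1', which is another way to take a fixed-size prefix
of a regular file:

```shell
# Scratch area; 64 KiB of random data is enough to show the point.
tmpdir=$(mktemp -d)
dd if=/dev/urandom of="$tmpdir/test" bs=1024 count=64 2>/dev/null

# head -c copies the first 200 bytes no matter where newlines fall.
head -c 200 "$tmpdir/test" > "$tmpdir/out"
head_bytes=$(wc -c < "$tmpdir/out" | tr -d ' ')

# Compare against the first 200-byte block as read by dd.
if dd if="$tmpdir/test" bs=200 count=1 2>/dev/null | cmp -s - "$tmpdir/out"
then same=yes
else same=no
fi

echo "$head_bytes bytes, identical prefix: $same"
rm -rf "$tmpdir"
```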

Hope that helps,
Bob



