[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: sort complexity on nearly sorted input
From: |
Pádraig Brady |
Subject: |
Re: sort complexity on nearly sorted input |
Date: |
Sun, 12 Feb 2012 00:06:06 +0000 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:6.0) Gecko/20110816 Thunderbird/6.0 |
On 02/11/2012 04:20 PM, Peng Yu wrote:
> Hi,
>
> I assume the time complexity of 'sort' is log N, where N is the input size.
>
> But I'm not familiar with 'sort' enough to tell the complexity of
> sorting a nearly sorted input. Suppose that I have a listed of N
> numbers, there only k numbers (k << N, say k=N/100) that are not in
> the correct position and all the other N-k numbers are in the correct
> position. Does anybody knows the time complexity of this case, or it
> is still log N?
>
Well discounting all the memory, thread and data type management
that `sort` gives you, centrally there is this description
on sequential_sort():
Use a recursive divide-and-conquer algorithm, in the style
suggested by Knuth volume 3 (2nd edition), exercise 5.2.4-23. Use
the optimization suggested by exercise 5.2.4-10; this requires room
for only 1.5*N lines, rather than the usual 2*N lines. Knuth
writes that this memory optimization was originally published by
D. A. Bell, Comp J. 1 (1958), 75.
I searched for the above text out of interest, and noticed
this good overview of the operation of `sort`:
http://philip.greenspun.com/software/abstraction-filtration-comparison/sort/
It's trivial to empirically measure how sort behaves with different inputs:
$ for i in $(seq 4 7); do
> seq -w $((10**$i)) > sorted.$i
> shuf < sorted.$i > unsorted.$i
> done
$ export OMP_NUM_THREADS=1
$ export LANG=C
$ for i in $(seq 4 7); do
> time sort sorted.$i > /dev/null
> time sort unsorted.$i > /dev/null
> done
We can also compare python's in place "timsort",
given that the data set fits in RAM and timsort
is said to be especially good for mostly sorted data:
$ for type in sorted unsorted; do
> time python -c "
> import sys
> a=sys.stdin.readlines()
> a.sort()
> sys.stdout.writelines(a)
> " > /dev/null < $type.7
> done
Results:
N sorted unsorted py_sorted py_unsorted
10000 0.015 0.021
100000 0.058 0.088
1000000 0.522 1.105
10000000 7.020 15.644 2.882 26.285
cheers,
Pádraig.