Re: improve performance of a script
From: Pádraig Brady
Subject: Re: improve performance of a script
Date: Wed, 26 Mar 2014 12:54:12 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2
On 03/25/2014 02:12 PM, xeon Mailinglist wrote:
> For each file inside the directory $output, I do a cat to the file and
> generate a sha256 hash. This script takes 9 minutes to read 105 files, with
> the total data of 556MB and generate the digests. Is there a way to make this
> script faster? Maybe generate digests in parallel?
>
> for path in $output
> do
>   # sha256sum
>   digests[$count]=$( $HADOOP_HOME/bin/hdfs dfs -cat "$path" | sha256sum |
>     awk '{ print $1 }')
>   (( count ++ ))
> done
This is not a bash question, so please ask on a more appropriate user-oriented
rather than developer-oriented list in future.
Off the top of my head I'd do something like the following to get xargs to
parallelize:
# hash each file in parallel, up to $(nproc) at a time, keeping just the hash field
digests=( $(
  find "$output" -type f |
    xargs -I '{}' -n1 -P$(nproc) \
      sh -c "$HADOOP_HOME/bin/hdfs dfs -cat '{}' | sha256sum" |
    cut -f1 -d' '
) )
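One caveat: with -P the hashes can complete in any order, so the array above is not
guaranteed to line up with the order of the input paths. If the path-to-digest pairing
matters, an (untested) variant along the same lines can print each path next to its
digest instead:

find "$output" -type f |
  xargs -I '{}' -n1 -P$(nproc) \
    sh -c "printf '%s ' '{}'; $HADOOP_HOME/bin/hdfs dfs -cat '{}' | sha256sum | cut -f1 -d' '"

Each output line is then "<path> <digest>", which is easy to read back into an
associative array.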
You might want to distribute that load across systems too
with something like dxargs or perhaps something like hadoop :p
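If you do go distributed, a rough sketch with GNU parallel standing in for dxargs
(host1 and host2 are placeholders, and hdfs/$HADOOP_HOME are assumed to be set up
identically on each worker) would be:

find "$output" -type f |
  parallel -S host1,host2 \
    '$HADOOP_HOME/bin/hdfs dfs -cat {} | sha256sum | cut -f1 -d" "'

The single quotes leave $HADOOP_HOME to be expanded by the remote shell rather than
locally.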
thanks,
Pádraig.