Re: feature request: list statistical properties for the list of files passed
Fri, 24 Apr 2009 02:53:20 +0100
On Thu, Apr 23, 2009 at 1:06 PM, George Marselis <address@hidden> wrote:
> Hey guys, thanks for all the hard work. I'm working as a sysadmin for a shop
> that specializes in Debian GNU/Linux.
> I've got a directory with a couple of tens of millions of files.
It's not such a great idea to do that. Far better to chop that up
into at least 1000 subdirectories. But take care not to use more than
about 32,000 subdirectories, as some Linux filesystems either don't
allow it (ext3) or make some operations less efficient at that point
(ext4, where st_nlink changes meaning).
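
For what it's worth, here is a minimal sketch of one way to spread
files over a fixed number of subdirectories, by hashing the file name.
The function name shard_path and the default of 1024 buckets are my own
choices for illustration, not anything find or the filesystem requires;
any stable hash would do.

```python
import hashlib
import os

def shard_path(root, filename, buckets=1024):
    """Map filename to a path under one of `buckets` subdirectories of root.

    Hypothetical helper: the bucket is derived from an MD5 hash of the
    name, so the same name always lands in the same subdirectory.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    # Zero-padded bucket names keep directory listings tidy.
    return os.path.join(root, "%04d" % bucket, filename)
```

With 1024 buckets and tens of millions of files, each subdirectory
would hold a few tens of thousands of entries, and the bucket count
stays well under the subdirectory limit mentioned above.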
> I was trying to
> find the median access time of the files in that directory and sort by
> percentiles. I got a little Python script together, but I can't help
> thinking that this is a feature that will be needed in the future, with
> bigger filesystems.
I'm not sure determine_median_access_time has such a big potential
user base, to be honest.
I don't know what your performance requirements are, but if
performance is an issue, bear in mind that there are several
partitioning algorithms (quickselect, for example) that allow you to
find the median of a dataset without fully sorting it.
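
As a rough illustration of the partitioning idea, here is a sketch of
quickselect in Python. It is only one of several such algorithms, and
the names select and median_low are mine, chosen for this example. It
repeatedly partitions around a random pivot and keeps only the side
that contains the wanted rank, so the expected running time is linear
rather than O(n log n).

```python
import random

def select(values, k):
    """Return the k-th smallest item (0-based) without fully sorting."""
    items = list(values)  # work on a copy; the caller's data is untouched
    if not 0 <= k < len(items):
        raise IndexError("k out of range")
    lo, hi = 0, len(items) - 1
    while True:
        pivot = items[random.randint(lo, hi)]
        # Three-way partition of items[lo..hi] around the pivot.
        i, lt, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[i], items[lt] = items[lt], items[i]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # items[lo:lt] < pivot, items[lt:gt+1] == pivot, the rest > pivot
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot

def median_low(values):
    """Lower median: the item at rank (n - 1) // 2."""
    items = list(values)
    return select(items, (len(items) - 1) // 2)
```

The input could be a list of st_atime values gathered with os.stat or
os.scandir, and other percentiles can be had the same way by asking
select for a different rank.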