Re: [Bug-ddrescue] Slow reads for x time to exit and whitespace skipping

From: Scott Dwyer
Subject: Re: [Bug-ddrescue] Slow reads for x time to exit and whitespace skipping
Date: Fri, 27 Jan 2017 23:10:12 -0500
User-agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0

Even though I initially slammed the idea of skipping "whitespace", I have thought more about it and will offer a possible theory of operation, should it ever be implemented. I still maintain that it would be difficult to implement and would only be feasible in certain situations.

The definition of whitespace would be areas filled entirely with zeros, meaning the entire cluster being read must be scanned to see whether any byte is non-zero. If a non-zero byte is found, processing of that cluster stops and the cluster is considered used; if it is all zeros, it is considered whitespace. This would add some overhead to the program, although it is unclear how much it would affect performance.
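As a minimal sketch of that check (a hypothetical helper, not ddrescue's actual code), the scan can bail out at the first non-zero byte, so used clusters are usually rejected after reading only a few bytes, while a truly empty cluster costs one full pass:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: report whether every byte of a cluster buffer is
// zero. Stops at the first non-zero byte, so the worst case (all zeros)
// is one linear pass over the cluster.
bool is_zero_cluster(const uint8_t* buf, size_t size) {
  for (size_t i = 0; i < size; ++i)
    if (buf[i] != 0) return false;  // non-zero byte found: cluster is used
  return true;                      // all zeros: candidate whitespace
}
```

The overhead question above comes down to this loop: for used clusters it is usually cheap, but every genuinely empty cluster must be scanned in full.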

Once a number of zero-filled clusters have been read in a row, that could trigger a form of skipping. The skipping would end and be reset once a non-zero cluster was found. How much to skip is the question: you are skipping for a different reason than a bad spot, so you don't want to get crazy with the skipping, and it must be reasonably limited. Once data was found again, the skipped area could be read backwards, or maybe a reverse pass would be better.
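One way that trigger could look (a sketch only; the struct, field names, and thresholds are all made up for illustration) is a counter of consecutive zero clusters that starts a bounded skip once it passes a threshold, and resets the moment a non-zero cluster appears:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative state for whitespace skipping; not ddrescue code.
struct ZeroSkipState {
  int consecutive_zero_clusters = 0;   // zero clusters seen in a row
  int trigger_count = 16;              // clusters needed before skipping
  long long max_skip_bytes = 1 << 20;  // hard cap: keep skips limited
};

// Returns how many bytes to skip ahead after reading one cluster.
// A non-zero cluster resets the counter and ends any skipping.
long long next_skip(ZeroSkipState& s, bool cluster_was_zero,
                    long long cluster_size) {
  if (!cluster_was_zero) {
    s.consecutive_zero_clusters = 0;
    return 0;
  }
  ++s.consecutive_zero_clusters;
  if (s.consecutive_zero_clusters < s.trigger_count) return 0;
  // Grow the skip with the run length, but never past the cap.
  long long skip = cluster_size * s.consecutive_zero_clusters;
  return skip < s.max_skip_bytes ? skip : s.max_skip_bytes;
}
```

The cap is the "reasonably limited" part: unlike skipping around a bad spot, there is no read error to justify a large jump, so the skip growth has to stay modest.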

That all sounds great until you try to implement it alongside the normal skipping algorithm for bad blocks. It suddenly gets very complicated, as you have to figure out what to do when you have both bad blocks and whitespace. It also must be decided what size constitutes possible whitespace. If you based it on a number of empty clusters, what happens when the user changes the cluster size to 1? That could cause premature skipping, so a size value would need to be provided to base skipping on. And do you keep separate track of areas skipped because of bad/slow blocks and areas skipped due to suspected whitespace? If so, how is that best processed in further passes?
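If the two skip reasons were tracked separately, it might look something like the following (purely illustrative; ddrescue's real mapfile statuses are different and are not extended here). Later passes could then treat the two kinds differently, e.g. retrying bad-block skips in a reverse pass while re-reading suspected whitespace forwards:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-area status distinguishing why an area was skipped.
// This is NOT ddrescue's mapfile format, just a sketch of the bookkeeping
// that separate tracking would require.
enum class AreaStatus : uint8_t {
  finished,           // read successfully
  bad_sector,         // read failed
  skipped_bad,        // skipped around a bad/slow spot
  skipped_whitespace  // skipped because nearby clusters were all zeros
};

// Example policy for a further pass: only bad-block skips get the
// reverse-direction retry; whitespace skips are retried forwards.
bool retry_in_reverse(AreaStatus s) {
  return s == AreaStatus::skipped_bad;
}
```

The cost of this design is exactly the complication described above: every pass now has to decide, per area, which skip reason applies and which retry strategy follows from it.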

And all of this assumes that large chunks of zeros are actually unused space. While that is most likely true, it cannot always be assumed. This would most likely work best on large drives with only a small percentage of space used; with modern drive sizes growing, that condition is becoming more likely, so there could well be large areas of the drive that have never been written to since the drive went into use. Filesystems tend to clump things together, but there is no guarantee that you would not skip good data. Then again, you can also skip good data when skipping due to bad blocks.

So is this a good idea? I don't know. It is like a poor man's version of processing the filesystem. My initial instinct is that it is not the best idea, but I guess it could work in some cases if done right.


On 1/27/2017 2:47 PM, Antonio Diaz Diaz wrote:
> Thanks to all for the feedback.
>
> I tend to agree with Scott in that skipping unused space can't possibly work with any sort of consistency. Therefore I'll forget about it until someone shows with data that it can be useful. For example showing a correspondence between unused sectors and sectors containing the empty pattern, plus a bitmap showing that the used sectors are grouped. If the used sectors are scattered, then finding them is, as Scott said, like playing roulette.


Bug-ddrescue mailing list
