From: Greg Chicares
Subject: Re: [lmi] an xml schema for (single|multiple)_cell_document file XML format
Date: Mon, 12 Mar 2012 15:22:55 +0000
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:10.0.2) Gecko/20120216 Thunderbird/10.0.2

On 2012-02-27 15:32Z, Vadim Zeitlin wrote:
> On Mon, 27 Feb 2012 12:44:46 +0000 Greg Chicares <address@hidden> wrote:
> 
> GC> Done 20120220T0158Z, revision 5402:
> GC>   http://svn.savannah.nongnu.org/viewvc?view=rev&root=lmi&revision=5402
> GC> 
> GC> That exercise was unexpectedly interesting. I started with a simple
> GC> "use enclosing elements" change, essentially as described here:
> GC>   http://lists.nongnu.org/archive/html/lmi/2010-08/msg00015.html
> GC> That made loading a file too slow: I could feel it plainly even before
> GC> I measured it. The counter displayed on the statusbar paused noticeably
> GC> after loading about 32 cells, then about 64, then about 128--whereas it
> GC> incremented smoothly for the old file format. It turns out that knowing
> GC> the size in advance lets us call std::vector::reserve() so that the
> GC> initial capacity is sufficient and expensive reallocations are avoided.
> 
>  It's, of course, always better to preallocate memory, but I had no idea
> that reallocations could be so expensive that you would be visually able to
> notice this. It looks like there might be something else wrong here, e.g.
> maybe the copy ctor of the elements of this vector is particularly
> inefficient?

Class Input's copy ctor is probably time-intensive. At a quick glance,
it appears to convert numbers to strings and back again. As noted
inline, the solution that seemed obvious to me (a long time ago) does
not work. I hesitate to change it without also improving both of these:
  Input::operator=()
  Input::operator==()
(not to mention the ghastly Input::magically_rectify()).
But solving those issues looks like it would take a lot of effort, and
the code does work correctly AFAIK.

Of course, if you'd like to clean that Augean stable, you're more than
welcome to. (In which case I'd suggest that Input::magically_rectify()
is a Stymphalian bird, better deferred as a separate Labor.)
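
To illustrate why an expensive copy ctor makes reallocation visible, here
is a minimal sketch. It does not use lmi's Input class: 'costly_cell',
'g_copies', and 'copies_made' are hypothetical names made up for this
example, and the copy ctor merely counts calls instead of doing real work.
Because the type declares a copy ctor, it has no implicit move ctor, so
every reallocation copies all elements already stored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical counter of copy-constructor calls (illustration only).
std::size_t g_copies = 0;

// Hypothetical stand-in for an element type whose copy ctor is costly.
// Declaring the copy ctor suppresses the implicit move ctor, so vector
// growth falls back to copying existing elements.
struct costly_cell
{
    costly_cell() = default;
    costly_cell(costly_cell const&) {++g_copies;}
    costly_cell& operator=(costly_cell const&) = default;
};

// Append n default-constructed cells; return how many copies occurred.
// If 'reserve_first' is true, capacity is set once before appending.
std::size_t copies_made(std::size_t n, bool reserve_first)
{
    g_copies = 0;
    std::vector<costly_cell> v;
    if(reserve_first) {v.reserve(n);}
    for(std::size_t i = 0; i < n; ++i)
        {
        v.emplace_back(); // constructs in place: no copy of the new cell
        }
    assert(v.size() == n);
    return g_copies;
}
```

With reserve_first set, no copies are made at all. Without it, the count
depends on the library's growth policy; geometric growth would be
consistent with pauses near 32, 64, and 128 cells.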

> GC> The final code in the repository is as fast and smooth as the original
> GC> because it writes the enclosing elements with a size attribute, e.g.:
> GC>   <particular_cells size_hint="180">
> GC> and reserves the hinted number of elements before reading them. That
> GC> attribute is optional; omitting it affects speed, but not correctness.
> 
>  FWIW I don't like this approach very much. The information about the
> number of cells is already in the file, why do we need to keep a separate
> hint about it? Couldn't we just count the cells first, before processing
> them?

Yes: Vaclav showed how in his reply, and I committed that change at
20120312T1359Z as revision 5430.
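
The count-first idea can be sketched roughly as follows. This is not the
code committed in revision 5430, and it does not use lmi's xml library:
'node' and 'read_cells' are hypothetical names, and a std::list of simple
structs stands in for the parsed child elements. The point is only that
one cheap counting pass lets reserve() allocate storage exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a parsed xml child node (illustration only).
struct node
{
    std::string name;
    std::string text;
};

// Read every child named "cell" into a vector. The children are counted
// first so that the vector never reallocates while it is being filled.
std::vector<std::string> read_cells(std::list<node> const& children)
{
    auto const is_cell = [](node const& n) {return "cell" == n.name;};
    auto const count = std::count_if(children.begin(), children.end(), is_cell);

    std::vector<std::string> cells;
    cells.reserve(static_cast<std::size_t>(count));
    for(auto const& n : children)
        {
        if(is_cell(n)) {cells.push_back(n.text);}
        }
    assert(cells.size() == static_cast<std::size_t>(count));
    return cells;
}
```

Counting costs one extra traversal of the children, but that is likely to
be far cheaper than repeatedly copying elements with a slow copy ctor, and
it removes the need to keep a separate hint in the file.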


