Re: Changesets: Re: Parallel access to data created with Octave

octave-maintainers
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Changesets: Re: Parallel access to data created with Octave

From:	Jaroslav Hajek
Subject:	Re: Changesets: Re: Parallel access to data created with Octave
Date:	Thu, 27 May 2010 07:54:46 +0200
On Wed, May 26, 2010 at 6:20 PM, Jarno Rajahalme
<address@hidden> wrote:
>
> On May 25, 2010, at 11:55 PM, ext Jaroslav Hajek wrote:
>
>> On Tue, May 25, 2010 at 9:05 PM, Jarno Rajahalme
>> <address@hidden> wrote:
>>>
>>> On May 25, 2010, at 12:43 AM, ext Jaroslav Hajek wrote:
>>>
>>>> On Tue, May 25, 2010 at 6:16 AM, Jarno Rajahalme
>>>> <address@hidden> wrote:
>>>>>
>>>>> liboctave:
>>>>>
>>>>> - Followed the Array.h practice of returning const ref from dims() 
>>>>> (Sparse.h)
>>>>
>>>> I'm not sure this is a good idea. It is a bit redundant to store a
>>>> dim_vector in Sparse, because it can only have 2 dimensions, and we
>>>> may want to optimize the storage in the future.
>>>>
>>>
>>> Please note that I only changed the existing Sparse::dims () to return a 
>>> const reference instead of the full object. I did not add any storage 
>>> requirement anywhere! Also, the modified Sparse::dims () can still be used 
>>> for copying the object itself, but now the copy is done after the dims() 
>>> returns, not before, so the caller has a choice. I.e.:
>>>
>>> // Create a local copy (implies reference counting, copy-on-write, etc.)
>>>
>>>  dim_vector dv = dims();
>>>
>>> // Create a const reference (no reference counting, cannot be modified)
>>>
>>>  const dim_vector& dv = dims();
>>>
>>> Without this change the latter use will ALSO create a temporary copy, so it 
>>> has the same overhead and thread-unsafeness as the former one.
>>
>> The storage requirement is implied. Suppose we decide, instead of
>> storing a dim_vector in Sparse<T>, to store just the row dimension in
>> order to conserve space (column dim can be inquired from SparseRep).
>> What will dims() return a reference to? It will need to be changed
>> back.
>>
>>>
>>> Again, this exact change has been made in Array.h earlier, there is even a 
>>> comment about it :-)
>>>
>>
>> Yes, but still it was disputable. dim_vector was created for Array, so
>> it will probably always be stored directly. In any case, I made this
>> change relatively recently, and I may still change my mind about it.
>>
>
> I was not aware of the need for preserving space for Sparse or SparseRep. But 
> looking at SparseRep, it seems that it's count, nrows and ncols could be 
> replaced by an array resembling the dim_vector's rep, or a dim_vector object 
> itself. That way the extra dim_vector in the Sparse itself would not be 
> needed any more? Or is it more complicated than that?

True, technically Sparse<T> needs not the dimensions member at all,
both dimensions are in SparseRep. The number of rows didn't need to be
there, to allow adding empty rows without copying data, but I don't
think that is much useful.
So maybe we should just remove it, it really is just a waste of space.
I think you don't even need Sparse::dims to be thread-safe for your
purposes, just use Sparse<T>::rows and Sparse<T>::columns.


>
>>>
>>> Maybe changing more code to use dims () directly (e.g. dims ().length () ), 
>>> i.e. without using a local copy would be better in cases where the dv is 
>>> actually used only once or twice.
>>
>> Unless it's a performance critical place, people should use whatever
>> approach seems more convenient.
>>
>
> OK.
>
>>>>
>>>>> - Changed dim_vector local variable to const ref when available 
>>>>> (bitfcns.cc, data.cc, oct-map.cc, ov-base-diag.cc, ov-base-int.cc, 
>>>>> ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, ov-bool-sparse.cc, 
>>>>> ov-cell.cc, ov-cx-mat.cc, ov-cx-sparse.cc, ov-flt-cx-mat.cc, 
>>>>> ov-flt-re-mat.cc, ov-intx.h, ov-perm.cc, ov-re-mat.cc, ov-re-sparse.cc, 
>>>>> ov-str-mat.cc, ov-struct.cc, pr-output.cc, pt-mat.cc, strfns.cc, xpow.cc)
>>>>
>>>> See above. The return-by-value style is idiomatic and convenient
>>>> (because it doesn't matter whether the function returns a value or a
>>>> reference), even if we make this change now, there's no guarantee
>>>> people will keep using this style. I probably won't.
>>>>
>>>
>>> First of all, the reference idiom also works regardless whether the 
>>> function returns a value or a reference, and it has no penalty in either 
>>> case.
>>
>> No, it doesn't. It's OK if you return a const ref and assign it to a
>> variable (because you just assign the referenced object) and it's OK
>> if you return an object and make a const reference to it (because C++
>> holds the temporary for the duration of the const reference, but some
>> people don't know this). But, whenever a reference is returned from a
>> function (or, for that matter, passed to it), it is solely the
>> programmer's responsibility to ensure that the reference remains valid
>> as long as it is used. The compiler won't guarantee this because it
>> has no way to do so. Consider this snippet:
>>
>> const dim_vector& orig_dims = matrix.dims (); // matrix is Array<T> subclass
>> ...
>> matrix.resize (new_dims); // resize to new dimensions
>> ...
>> matrix.resize (orig_dims); // gotcha!
>>
>> the problem is that at the second line, the dim_vector object
>> referenced by orig_dims silently changes, so the final resize will do
>> something unexpected (do nothing).
>>
>> Another simple instance is (disregarding the fact that it makes little 
>> sense):
>>
>> octave_value val = ...
>> const dim_vector& dv = val.array_value ().dims ();
>>
>> here, val.array_value () creates a temporary NDArray object, whose
>> dims () method is the called. This method returns a reference, so no
>> temporary is withheld. C++ has no way of knowing that it needs to hold
>> the intermediate expressions, too. It can only hold the final one. So
>> after this statement, the temporary NDArray object is destroyed, but
>> the reference still points to the temporary object. Ouch.
>>
>> Yes, both examples seem somewhat artificial and probably can be
>> avoided easily, but there may be less obvious cases. And with the
>> other reference counted classes, it may be even easier to shoot
>> yourself in the foot.
>>
>>
>>> So, people don't need to, the code works either way. I did change all the 
>>> applicable cases, though, so as long as people cut & paste within the 
>>> codeset, the practice might as well stick, who knows. Anyway, it IS more 
>>> efficient and safe.
>>
>> More efficient, yes (at least usually). Safe, no. I have no problem
>> with using this, especially where performance seems critical, but I'm
>> against making the changes globally.
>>
>>
>> "usually" is exact because a reference is ultimately just a pointer
>> that is usually optimized away; usually, but not always. It can only
>> be optimized away when the compiler can determine that the
>> const-referenced object does not change (by means other than the const
>> reference). The compiler has to assume the worst here. Fortran is
>> better(?) in this regard because it simply declares all such programs
>> illegal and so the compiler needs not care :)
>>
>> If the caller tries to modify the dv, the const ref has to be changed
>> to a non-const local instance and everything is OK.
>
>
> Agree. I would conclude that it is safe to use "const dim_vector&" whenever 
> it is taken from an object that is passed in as a const & parameter (which 
> means that the object must not be changed, not even it's dim_vector).

Actually, const & only means the object may not be changed through the
reference, but it may still be changed by other means, for instance by
another non-const reference bound to the same object. You generally
need to be quite sick to arrive at such a situation, though :)

Also, one-off use (e.g. dims ().length ()) is also safe, but actually
storing the reference in other cases should be done only in
performance critical cases, when it can be maintained that the
underlying dim_vector does not change.

Yes.

>
> Currently, it seems that reference counting is not an performance issue, but 
> this might change, if/when the reference counting is made thread-safe (see 
> the other thread).
>
>>>>> - Changed to avoid calling dims() or numel() via the virtual members 
>>>>> (ov-base-diag.cc, ov-base-mat.cc, ov-base-sparse.cc, ov-bool-mat.cc, 
>>>>> ov-bool-mat.h, ov-bool-sparse.cc, ov-cell.cc, ov-cx-mat.cc, 
>>>>> ov-cx-sparse.cc, ov-flt-cx-mat.cc, ov-flt-re-mat.cc, ov-intx.h, 
>>>>> ov-perm.cc, ov-range.cc, ov-re-mat.cc, ov-re-sparse.cc, ov-str-mat.cc, 
>>>>> ov-struct.cc)
>>>>>
>>>>
>>>> Except in supposedly performance-critical places, I'd rather avoid
>>>> doing this. Pieces of code are copied often between subclass
>>>> implementations, and this change makes it harder.
>>>
>>> In my (very recent) experience with octave code, most of the subclasses use 
>>> the "matrix" member, so there should be a problem. Also, I only followed 
>>> the practice already used in parts of the code.
>>
>> I don't think so. I used to change the virtual calls to direct calls
>> in certain places where I thought it could help, but there's a number
>> of, say, octave_matrix:: methods that just call dims() or numel ().
>> Check for yourself.
>>
>
> I was not clear enough. I meant that most of the time, the calls are in 
> object that have a "matrix" member, so using it is not a big barrier for 
> cur&paste coding.
>
>>>
>>> Let me elaborate the reasoning here:
>>>
>>> - Classes derived from Array have access to a const reference to the 
>>> dim_vector via Array::dims (). The subclasses can use the dims () method 
>>> for accessing the dim_vector without any performance penalty, and in a 
>>> thread safe manner. All use of dims () can be inlined, etc.
>>
>> Yes, this was the reason why I changed Array<T>::dims. Originally it
>> returned a value.
>>
>>> - Classes derived from octave_base have dims () method that returns 
>>> dim_vector object, that in some cases is newly constructed, and always 
>>> copied, involving (in many cases unnecessary) reference counting. This dims 
>>> () being virtual, it can never be inlined, as the compiler will not know at 
>>> the compile time which implementation will be used.
>>>
>>> So the latter has performance penalties via:
>>> 1) dim_vector constructor/destructor
>>> 2) dim_vector::rep reference counting
>>> 3) no inlining
>>> 4) mandatory dispatch via vtable, many times canceling the benefit of 
>>> branch prediction, which can double the execution time
>>> 5) not thread safe
>>>
>>
>> Yes, but you can't avoid the virtual methods normally,
>
> But it gets even worse when there are virtual methods calling each other. 
> None of them can be inlined.
>
>>
>> I think you've gone much farther than suitable with your changes. I'm
>> against implementing octave_fcn_handle::rows et al just for the sake
>> of preventing octave_value::numel doing reference counting behind the
>> hood. As I said, I'm OK having the overrides for scalars and matrices,
>> but not everywhere.
>>
>
>
> I'll make an another proposal along the lines of this discussion in a few 
> days.
>
>>>>>
>>>>
>>>> Yes. The point is that dim_vector is just one class that is reference
>>>> counted, there are many more - idx_vector, Array, Sparse, SparseQR,
>>>> octave_shlib, octave_mutex, octave_value, graphics properties ...
>>>>
>>>
>>> There is a subtle, but important difference: none of the above are needed 
>>> for just *reading* the octave data, while in practice either numel(), 
>>> columns(), rows() or capacity() is needed to know how much data there is.
>>
>> No, it isn't. Just like you reference the octave_base_matrix::matrix
>> members directly through pointer tricks, you can reference
>> octave_base_matrix::matrix::dims (), and it's thread-safe (thanks to
>> the const-ref being returned). octave_value::numel() is *not* a safe
>> way to determine what polymorphic instance you have, and never will
>> be. Admittedly, if Array::dims didn't return a const ref, you would
>> have no thread-safe way to get there, so it's good this has changed,
>> but I think it's not needed to go much farther.
>>
>
> Could you please show how I get from a octave_base_value * to the length of 
> the data without using numel ?
>

Sorry, I see it can't be done, because octave_base_value::matrix_ref
has no const version. I'll add it, then you should be able to do
matrix_ref ().dims ().


>>>
>>>>> There are minor efficiency increases all over the place, due to not 
>>>>> creating a local instance of dim_vector, where it is not needed.
>>>>
>>>> The difference between const dim_vector and const dim_vector& is just
>>>> the counter increment/decrement, and I doubt that is visible in most
>>>> places.
>>>>
>>>
>>> Right. I totally agree that when printing, saving, etc. the overhead is 
>>> totally absorbed by orders of magnitude bigger overheads. But maybe keeping 
>>> the changes I proposed would entice others later using the same idioms, 
>>> when creating new code that might be more performance critical (cut & 
>>> paste)?
>>
>> Cf. above arguments against adopting this as idiomatic. It's OK if it
>> feels natural to you and you can avoid the subtle problems (I suppose
>> you can easily), but otherwise I'm against "fixing" this in others'
>> code. Also, note that JWE often makes "style fixes" to the code and
>> he's even much less concerned by performance than me, so if you use a
>> const reference on purpose for performance reasons, better add an
>> explanation comment (such as the one for Array<T>::dims) to prevent
>> someone else from changing it later.
>>
>
> OK.
>
>>>
>>> Btw. I also noticed that cellfun() itself has some overhead. My own 
>>> "celltest.cc", which copied only the relevant parts of cellfun() was faster.
>>
>> Hmm. Maybe you omitted something that cellfun needs to do? If you
>> don't think so, please show me the code. Or if you can figure out how
>> cellfun can be further sped up (preferably without invasive changes),
>> that would be even better :)
>>
>
> I just cut&pasted the cellfun itself, but only upto the point where the arg0 
> is a string recognized by cellfun (e.g., "numel") and without any of the 
> preceding C++ helper code.
>

That still sounds like it should be equally fast, so I'd like to see it.


-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz
[Prev in Thread]
Current Thread
[Next in Thread]
Re: Changesets: Re: Parallel access to data created with Octave, (continued)
Prev by Date: Jordan canonical form
Next by Date: RTD Logic Integration
Previous by thread: Re: Changesets: Re: Parallel access to data created with Octave
Next by thread: Re: Changesets: Re: Parallel access to data created with Octave
Index(es):
- Date
- Thread