Dataflow: memory consumption and copying around (PROPOSAL)

The general issue I want to discuss here is the hunting for unnecessary
copying and data-holding on the RTT dataflow. There are quite a few
places where data samples get copied and stored, and I'd like to start
removing the unnecessary ones (and discussing which are necessary and
which are not).

First, I'll make a list of each of the places where samples get stored.
I'll probably make some mistakes, so please correct me if I am wrong.
Then, I'll outline some of the solutions.

To give some figures, **writing** 2x30fps images at 640x480 RGB
currently eats ~10% CPU on a Core 2 Duo 1.7 GHz. That is without
transport, i.e. only writing the data on an in-process connection!

Cases
-----
 - each channel element keeps the last read sample, so that it can be
   returned in the OldData case
 - the underlying data-holding structures do *not* reinitialize their
   elements when these get returned, so as to avoid memory allocations
   when they are written to again. This means that a 100-element
   buffer actually keeps up to the last 100 elements written to it.
   This is an "up to" as the code reuses the most-recently-used pool
   elements first, so if you have a 100-element buffer where reads
   and writes are always perfectly interleaved, you will store only
   one copy (see the sketch after this list).
 - transports commonly store some copies. My latest discovery is
   between the dispatcher and the actual CORBA connection. I already
   did some hunting in the MQ transport a while ago, and I'm starting
   the same on the CORBA transport as well.
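
To make the second point concrete, here is a deliberately naive
illustration of such a pool. This is not RTT's actual buffer code,
just the idea: slots handed back by the reader are only marked free,
their payload is never cleared, so up to N old samples stay alive in
memory.

#include <cstddef>
#include <vector>

// Illustrative only -- not RTT's buffer implementation. It shows why a
// pool-backed buffer keeps up to N old payloads alive: slots released
// by the reader are marked free but never cleared.
template <typename T>
class NaivePool
{
    struct Slot
    {
        T value;
        bool in_use;
        Slot() : in_use(false) {}
    };
    std::vector<Slot> slots;

public:
    explicit NaivePool(std::size_t size) : slots(size) {}

    // Copy the sample into a free slot. Only that slot's old payload is
    // overwritten; the payloads of all other slots stay allocated.
    bool write(T const& sample)
    {
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (!slots[i].in_use)
            {
                slots[i].value = sample;
                slots[i].in_use = true;
                return true;
            }
        return false; // full
    }

    // Hand a sample out and mark the slot free *without* resetting
    // slots[i].value -- the "do not reinitialize" behaviour described
    // above. (FIFO ordering and MRU reuse are ignored for brevity.)
    bool read(T& sample)
    {
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (slots[i].in_use)
            {
                sample = slots[i].value;
                slots[i].in_use = false;
                return true;
            }
        return false; // empty
    }
};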

Solutions
---------
 - We already had a discussion, a while ago, about making read() not
   copy any sample in the OldData case. This would save one "copy
   holder", namely the need for each connection channel to keep a
   single copy of the last sample. To retain backward compatibility,
   we should probably introduce either an additional flag in the
   policy or allow the input ports to declare what they are expecting
   (I prefer the latter).
 - Change the data-holding structure to always reinitialize the pool
   elements (the pending data samples) with the data sample it was
   given. Components that do "the right thing", namely call
   setDataSample on their output ports to avoid memory allocations on
   their connections, would retain RT behaviour. The others would
   not, but they had no guarantee so far anyway (they would get
   "some" allocations "sometimes"). A sketch of the setDataSample
   pattern follows this list.
 - browse the transports to remove the unnecessary copying
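
As an illustration of that pattern (the component and data type are
made up for the example, this is not code from RTT itself): pre-size
the sample once in configureHook() and call setDataSample() so the
connection's internal elements are allocated up front; later write()
calls then stay allocation-free.

#include <rtt/TaskContext.hpp>
#include <rtt/Port.hpp>
#include <string>
#include <vector>

// Minimal sketch of the "do the right thing" pattern: allocate the
// sample once and hand it to setDataSample() so the connection pool is
// sized up front.
class ImageSource : public RTT::TaskContext
{
    RTT::OutputPort< std::vector<unsigned char> > image_out;
    std::vector<unsigned char> image;

public:
    ImageSource(std::string const& name)
        : RTT::TaskContext(name)
        , image_out("image_out")
    {
        ports()->addPort(image_out);
    }

    bool configureHook()
    {
        image.resize(640 * 480 * 3);    // full-size sample, allocated once
        image_out.setDataSample(image); // pre-allocates the connection pool
        return true;
    }

    void updateHook()
    {
        // fill 'image' in place, without resizing it ...
        image_out.write(image);         // no allocation on this path
    }
};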

Thoughts?

Dataflow: memory consumption and copying around (PROPOSAL)

Hi Sylvain,

On Fri, Apr 20, 2012 at 1:45 PM, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
> The general issue I want to discuss here is the hunting for unnecessary
> copying and data-holding on the RTT dataflow. There are quite a few
> places where data samples get copied and stored, and I'd like to start
> removing the unnecessary ones (and discussing which are necessary and
> which are not).
>
> First, I'll make a list of each of the places where samples get stored.
> I'll probably make some mistakes, so please correct me if I am wrong.
> Then, I'll outline some of the solutions.
>
> To give some figures, **writing** 2x30fps images at 640x480 RGB
> currently eats ~10% CPU on a Core 2 Duo 1.7 GHz. That is without
> transport, i.e. only writing the data on an in-process connection!

Didn't we add the read/write pointer for these cases, i.e. where even
a single copy is already quite costly? Aside from that, I'm all for
going through the data flow again to see where we can improve it.

>
> Cases
> -----
>  - each channel element keeps the last read sample to be able to
>    return it in the OldData case
>  - the underlying data-holding structures do *not* reinitialize the
>    elements when they get returned, so as to avoid memory allocations
>    when re-writing on them. This means that a 100 element
>    buffer actually keeps up to the last 100 elements written to it.
>    This is an "up to" as the code reuses the most-recently-used pool
>    elements first, so if you have a 100 element buffer where
>    read/writes are always perfectly interleaved, you will store only
>    one copy.
>  - transports commonly store some copies. My last discovery is
>    in-between the dispatcher and the actual CORBA connection. I've
>    already done some hunting on the MQ a while ago. I'm starting the
>    same on the CORBA transport as well.

Ack all.

>
> Solutions
> ---------
>  - We already had a discussion, a while ago, about making read() not
>    copy any sample in the OldData case. This would save one "copy
>    holder", namely the need for each connection channel to keep a
>    single copy of the last sample. To retain backward compatibility, we
>    should probably introduce either an additional flag in the policy or
>    allow the input ports to declare what they are expecting (I prefer
>    the latter).

In hindsight, I too think that modifying read() was not the way to
go, and that we should have modified the port as a whole, using some
option. Even more, the OldData behaviour can easily be emulated by
the user by changing

void foo() {
    Data data;  // at function level
    read(data); // returns OldData and writes into data
}

to

Data data;      // at class level
void foo() {
    read(data); // returns OldData without touching data, which still
                // holds the last received sample
}

So paying the extra copy-for-storage cost in the port does not add
anything that the user could not get himself without writing any more
code.

>  - Change the data-holding structure to always reinitialize the
>    pool elements (the pending data samples) with the data sample it
>    got. Components that do "the right thing", namely call setDataSample
>    on their output ports to avoid memory allocations on their
>    connections would retain RT behaviour. The others would not, but
>    they had no guarantee so far (as they would get "some" allocations
>    "sometimes")

I'm not sure this will work out; I'd probably need to see a patch
first to understand what you're aiming at...

>  - browse the transports to remove the unnecessary copying
>
> Thoughts?

I agree that we're not as copy-efficient as we should have been, but
adding a generic/fully supported solution for people who only want to
copy data references would be even more useful in the long term.

Peter

Dataflow: memory consumption and copying around (PROPOSAL)

On 04/21/2012 07:04 AM, Peter Soetens wrote:
> Hi Sylvain,
>
> On Fri, Apr 20, 2012 at 1:45 PM, Sylvain Joyeux<sylvain [dot] joyeux [..] ...> wrote:
>> The general issue I want to discuss here is the hunting for unnecessary
>> copying and data-holding on the RTT dataflow. There are quite a few
>> places where data samples get copied and stored, and I'd like to start
>> removing the unnecessary ones (and discussing which are necessary and
>> which are not).
>>
>> First, I'll make a list of each of the places where samples get stored.
>> I'll probably make some mistakes, so please correct me if I am wrong.
>> Then, I'll outline some of the solutions.
>>
>> To give some figures, **writing** 2x30fps images at 640x480 RGB
>> currently eats ~10% CPU on a Core 2 Duo 1.7 GHz. That is without
>> transport, i.e. only writing the data on an in-process connection!
>
> Didn't we add the read/write pointer for these cases, i.e. where even
> a single copy is already quite costly? Aside from that, I'm all for
> going through the data flow again to see where we can improve it.
Yes, I actually went for that solution. However, this trick only works
for non-realtime data as it requires memory allocation.
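
For readers who did not follow that earlier discussion, a sketch of
what this trick amounts to (port and type names are made up): the
port carries a shared pointer, so writing copies only the pointer,
not the image, but each frame needs a fresh heap allocation, which is
what breaks real-time.

#include <rtt/Port.hpp>
#include <boost/shared_ptr.hpp>
#include <vector>

typedef std::vector<unsigned char>     Image;
typedef boost::shared_ptr<Image const> ImagePtr;

RTT::OutputPort<ImagePtr> image_out("image_out");
RTT::InputPort<ImagePtr>  image_in("image_in");

void produceFrame()
{
    boost::shared_ptr<Image> frame(new Image(640 * 480 * 3)); // allocation
    // ... fill *frame ...
    image_out.write(frame);   // copies the pointer, not the pixels
}

void consumeFrame()
{
    ImagePtr frame;
    if (image_in.read(frame) == RTT::NewData)
    {
        // use *frame; it is shared with other readers, so keep it read-only
    }
}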

>> - Change the data-holding structure to always reinitialize the
>> pool elements (the pending data samples) with the data sample it
>> got. Components that do "the right thing", namely call setDataSample
>> on their output ports to avoid memory allocations on their
>> connections would retain RT behaviour. The others would not, but
>> they had no guarantee so far (as they would get "some" allocations
>> "sometimes")
>
> I'm not sure if this will work out, I'd probably need to see a patch
> first to see what you're at...
Well, I can send a prototype patch (i.e. not the real thing, but close to it).

Another trick I thought about would be to use std::swap as an RT-friendly
std::move(). We would need to segregate const-ref value transport (from
user to dataflow) from non-const-ref value transport (from dataflow to
dataflow) for this to work, but that would be pretty neat.
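
To make the idea concrete, a rough sketch (illustrative only, not an
actual RTT patch): the dataflow-internal write path takes a non-const
reference and swaps the sample with the pool slot's previous payload,
so there is no allocation and no deep copy; the user-facing const-ref
path keeps today's copy semantics.

#include <algorithm>
#include <vector>

typedef std::vector<unsigned char> Image;

struct Slot
{
    Image value;
    bool  full;
    Slot() : full(false) {}
};

// Dataflow-to-dataflow path: may take ownership of the payload.
void writeBySwap(Slot& slot, Image& sample)
{
    std::swap(slot.value, sample); // O(1): exchanges internal buffers,
                                   // no allocation, no element copies
    slot.full = true;
    // 'sample' now holds the slot's previous payload, ready for reuse.
}

// User-to-dataflow path: const reference, deep copy as today.
void writeByCopy(Slot& slot, Image const& sample)
{
    slot.value = sample;           // deep copy; RT only if capacities match
    slot.full = true;
}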

> I agree that we're not as copy-efficient as we should have been, but
> adding a generic/fully supported solution for people who only want to
> copy data references would be even more useful in the long term.
I'm actually not so sure about that. ReadOnlyPointer does the trick,
and has pretty neat semantics in a component-based context. That
said, copying too many things around is also a liability on low-end
systems.
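
For readers who don't know it, a minimal illustration of the
ReadOnlyPointer concept: one allocation, shared ownership, const-only
access, so ports exchange a small smart pointer instead of the full
sample. This is only the idea; the actual ReadOnlyPointer class
referred to here may differ in interface.

#include <boost/shared_ptr.hpp>
#include <cstddef>
#include <vector>

// Concept sketch: shared ownership with read-only access.
template <typename T>
class ReadOnlyPtr
{
    boost::shared_ptr<T const> ptr;

public:
    ReadOnlyPtr() {}
    explicit ReadOnlyPtr(T* object) : ptr(object) {} // takes ownership

    T const& operator*()  const { return *ptr; }
    T const* operator->() const { return ptr.get(); }
    bool valid() const { return ptr.get() != 0; }
};

typedef std::vector<unsigned char> Image;

void example()
{
    // The producer allocates once and wraps the sample; every consumer
    // that receives a copy of 'frame' shares the same immutable image.
    ReadOnlyPtr<Image> frame(new Image(640 * 480 * 3));
    if (frame.valid())
    {
        std::size_t bytes = frame->size(); // const access only
        (void)bytes;
    }
}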

Thanks to ReadOnlyPointer, I don't think that this is an urgent issue.
Just wanted to start a discussion.