EventCatcherImpl

Hi,

I've got an issue with the current implementation of an EventProcessor.
The deferred execution is put in a list. The list is run each time the
EventProcessor's task gets triggered, calling
EventCatcherImpl<...>::complete() of each item in the list.

Let's say one Event is fired every 10 ms and another every second,
and they are both run in the same EventProcessor.
The complete() of the '1 s Event' will be called 99 times too many...

This approach is in contrast with a CommandProcessor, which queues its
commands.

Wouldn't it be nice to queue the EventCatchers as well?

Sander.

EventCatcherImpl

On Tue, 6 Nov 2007, Vandenbroucke Sander wrote:

> I've got an issue with the current implementation of an EventProcessor.
> The deferred execution is put in a list. The list is run each time the
> EventProcessor's task gets triggered, calling
> EventCatcherImpl<...>::complete() of each item in the list.
>
> Let's say one Event is fired every 10 ms and another every second,
> and they are both run in the same EventProcessor.
> The complete() of the '1 s Event' will be called 99 times too many...

But after completion that event should not be in the list anymore...?

> This approach is in contrast with a CommandProcessor, which queues its
> commands.
This contrast is _necessary_ because it reflects the semantic difference
between events (= fired and handled by "someone" and then forgotten) and
commands (= launched by someone to a particular identified "server", and
that someone wants to be able to follow the status of the command
execution).

> Wouldn't it be nice to queue the EventCatchers as well?
Is this an _implementation_ question, or a _user_visible semantic_
question?

Herman

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

RE: EventCatcherImpl

> OK, the old AtomicQueue is probably not an option, as the code could
> break when multiple enqueues are happening simultaneously. What the
> EventProcessor needs is a Multi-Write/Single-Read Queue implementation.
> I wrote a MWSR queue based on a single mutex for guarding the writes.
> You can copy this file over to your RTT src/ dir and edit
> EventProcessor.hpp to use this class. See what the performance is on
> your embedded system.
>
> Also, please review the class to be sure.
Peter,

I haven't got time to test this queue. I don't like the use of a mutex
in the queue; it prevents it from being used in an ISR/DSR...

Maybe we could split up AtomicQueue.h into a .h and .impl file and
allow the user to define a custom queue implementation?
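One way to sketch that idea (names here are illustrative, not RTT's actual API): make the queue type a template parameter of the processor, so a user can substitute an ISR-safe implementation without editing RTT itself.

```cpp
#include <cstddef>

// Hypothetical sketch: if the EventProcessor took its queue type as a
// template parameter, users could plug in their own implementation.
// BasicEventProcessor and MyLockFreeQueue are illustrative names.
template <typename QueueImpl>
class BasicEventProcessor {
    QueueImpl queue_;
public:
    bool enqueue(typename QueueImpl::value_type item) {
        return queue_.enqueue(item);
    }
    bool dequeue(typename QueueImpl::value_type& item) {
        return queue_.dequeue(item);
    }
};

// A user-supplied queue only has to provide the same three names:
struct MyLockFreeQueue {
    typedef int value_type;     // would be the catcher pointer type in RTT
    int slot = 0;
    bool full = false;
    bool enqueue(int v) {       // trivial one-slot stand-in
        if (full) return false;
        slot = v; full = true; return true;
    }
    bool dequeue(int& v) {
        if (!full) return false;
        v = slot; full = false; return true;
    }
};
```

The point of the split would be exactly this: the .h declares the required interface, the .impl provides the default, and the user can swap in their own.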

Sander.

RE: EventCatcherImpl

>
>But after completion that event should not be in the list anymore...?
>
It should not be in the list anymore, but it still is!
There is a flag guarding execution of the event:

Code from EventProcessor.cxx:

void EventProcessor::step() {
    if ( catchers.empty() )
        return;
    catchers.apply( boost::bind(&EventCatcher::complete, _1 ) );
}

Code from EventProcessor.hpp:

struct EventCatcherImpl<0, SignalType, ContainerType>
    : public EventCatcher
{
    .....
    Result handler( void ) {
        work = this->enabled;
        if ( work && mep ) // if enabled and mep, there is a getActivity() present.
            signalWork();
        return detail::NA::na();
    }

    virtual void complete() {
        if (!work)
            return;
        f();
        work = false;
    }
    .....
};

step() is executed each tick (in case the ExecutionEngine's activity is
periodic) or each time an event, command, ... triggers the activity (in
case of a non-periodic activity).

Let's take a look at handler().
This function is called when the event is fired. It sets the 'work' flag
and triggers the ExecutionEngine's activity with signalWork().

Now for the complete() function.
This function is called each time signalWork() or another function that
triggers the activity is called. It uses the 'work' flag to determine
whether it has to execute the complete function or not. From my point of
view this is a pure waste of time.
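To make the waste concrete, here is a small stand-alone model of that behaviour (illustrative classes, not RTT's actual code): the slow catcher's complete() is entered on every step(), even though it only has work once.

```cpp
#include <vector>
#include <utility>

// Illustrative model of the flag-based design: step() visits every
// catcher on every tick; most visits find work == false and return.
struct Catcher {
    bool work = false;
    int completions = 0;
    int visits = 0;
    void complete() {
        ++visits;            // entered on every step()
        if (!work)
            return;          // nothing to do: a wasted call
        ++completions;
        work = false;
    }
};

// Simulate 100 ticks: catcher 0 fires every tick (the 10 ms event),
// catcher 1 fires once (the 1 s event). Returns {completions, visits}
// of the slow catcher.
std::pair<int, int> simulate() {
    std::vector<Catcher> catchers(2);
    for (int tick = 0; tick < 100; ++tick) {
        catchers[0].work = true;          // 10 ms event fired
        if (tick == 0)
            catchers[1].work = true;      // 1 s event fired once
        for (auto& c : catchers)
            c.complete();                 // what step() does via apply()
    }
    return { catchers[1].completions, catchers[1].visits };
}
```

After 100 ticks the slow catcher has completed once but was visited 100 times: 99 wasted calls, matching the complaint above.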

>> This approach is in contrast with a CommandProcessor, which queues
>> its commands.
> This contrast is _necessary_ because it reflects the semantic
> difference between events (= fired and handled by "someone" and then
> forgotten) and commands (= launched by someone to a particular
> identified "server", and that someone wants to be able to follow the
> status of the command execution).
This is a semantic contrast; I was referring to the implementation.

>Is this an _implementation_ question, or a _user_visible semantic_
>question?
>
It is an implementation question that can save a lot of overhead.
Imagine:

void EventProcessor::step() {
    EventCatcher* ec;
    while( queue.dequeue(ec) )
    {
        ec->complete();
    }
}

struct EventCatcherImpl<0, SignalType, ContainerType>
    : public EventCatcher
{
    .....
    Result handler( void ) {
        if (mep) // if mep, there is a getActivity() present.
        {
            mep->queue->enqueue(this);
            signalWork();
        }
        return detail::NA::na();
    }

    virtual void complete() {
        f();
    }
    .....
};

Now, this would kick ass wouldn't it?

Sander.

RE: EventCatcherImpl

>> It is an implementation question that can save a lot of overhead.
>> Imagine:
>>
>> void EventProcessor::step() {
>>     EventCatcher* ec;
>>     while( queue.dequeue(ec) )
>>     {
>>         ec->complete();
>>     }
>> }
>>
>> struct EventCatcherImpl<0, SignalType, ContainerType>
>>     : public EventCatcher
>> {
>>     .....
>>     Result handler( void ) {
>>         if (mep) // if mep, there is a getActivity() present.
>>         {
>>             mep->queue->enqueue(this);
>
> Or add an enqueue(EventCatcher*) method to EventProcessor....
>
>>             signalWork();
>>         }
>>         return detail::NA::na();
>>     }
>>
>>     virtual void complete() {
>>         f();
>>     }
>>     .....
>> };
>>
>> Now, this would kick ass wouldn't it?
>
> Yes. Did you test this ? patch ?
>
> I took a quick look at the code and it appears you are right...
Nope, I wrote this code while replying. That's where you guys come in.
I put in some ideas and the experts check all pitfalls :-)

Seriously, what to do with the catchers list? It is also used for
enabling/disabling the EventCatchers. Then there is the destructor,
which deletes all catchers upon destruction of the EventProcessor. We
could use a vector for this administration. In that case we need a
'cheap' way to determine in the handler() function whether an
EventCatcher is enabled. For that, the EventCatcher could ask the
EventProcessor if it is active. Maybe set a flag in
initialize()/finalize()? And what about the queue depth?

Sander.

RE: EventCatcherImpl

> 1. Just being curious: Could you measure the performance improvement
> with this patch ? I mean, when your next mail mentions 50% load, is
> this with or without this patch ?

This is with this patch. Without this patch performance is in our case
much worse, meaning 80% cpu load.

> 2. You have put the
>
> void EventCatcher::queueCatcher()
> {
>     if(mep)
>         mep->queueCatcher(this);
> }
>
> function in the header... doesn't this break your app during linking ?
> Either you put it in the class definition (inlining) or you put it in
> the .cpp file.

I don't get any linker errors. Maybe it is another story with dynamic
linking. This function could be moved to the .cpp file; it doesn't get
inlined anyway.

I suggest leaving

void queueCatcher(detail::EventCatcher* ec)
{
    if(enabled && ecQueue.enqueue(ec))
    {
        getActivity()->trigger();
    }
}

as an inline function. I really have to squeeze out the last drop!

> 3. In the setup() code: catchers.push_back( eci.get() ); will cause a
> segfault when the shared pointer is destroyed. You should remove the
> catchers vector altogether, also the delete(*it) code in the
> EventProcessor's destructor. The event code will clean up the catcher
> automatically; the EP does not need to do/track this.
>
OK, I didn't pay too much attention to the destruction part; we are not
loading/unloading components dynamically.

I will change my code according to your suggestions.

Sander.

EventCatcherImpl

On Monday 12 November 2007 13:33:51 Vandenbroucke Sander wrote:
> > 1. Just being curious: Could you measure the performance improvement
> > with this patch ? I mean, when your next mail mentions 50% load, is
> > this with or without this patch ?
>
> This is with this patch. Without this patch performance is in our case
> much worse, meaning 80% cpu load.

I don't understand this. In case the (tick) event carries no data, the
implementation is very efficient: it mainly stores a function pointer for
later calling in the EP. Once the event has data, performance drops to about
half the speed (I'm guessing) because of the data storing and retrieval
process. I think we need to profile (gprof) a small example (Event + EP) on
gnulinux and see where the bottleneck sits.

>
> OK, I didn't pay too much attention to the destruction part; we are not
> loading/unloading components dynamically.
>
> I will change my code according to your suggestions.

Ok.

Peter

RE: EventCatcherImpl

>Quoting Vandenbroucke Sander <Sander [dot] Vandenbroucke [..] ...>:
>>>> Now, this would kick ass wouldn't it?
>>>
>>> Yes. Did you test this ? patch ?
>>>

See attachment. I tested it on our old trunk version and merged it to
rtt-1.4. At least it compiles; I didn't have to change any code. So I'm
pretty sure it works on rtt-1.4 too.

Sander.


EventCatcherImpl

On Friday 09 November 2007 10:51:55 Vandenbroucke Sander wrote:
> >Quoting Vandenbroucke Sander <Sander [dot] Vandenbroucke [..] ...>:
> >>>> Now, this would kick ass wouldn't it?
> >>>
> >>> Yes. Did you test this ? patch ?
>
> See attachment. I tested it on our old trunk version and merged it to
> rtt-1.4. At least it compiles; I didn't have to change any code. So I'm
> pretty sure it works on rtt-1.4 too.

1. Just being curious: Could you measure the performance improvement with this
patch ? I mean, when your next mail mentions 50% load, is this with or
without this patch ?
2. You have put the

void EventCatcher::queueCatcher()
{
    if(mep)
        mep->queueCatcher(this);
}

function in the header... doesn't this break your app during linking ? Either
you put it in the class definition (inlining) or you put it in the .cpp file.
3. In the setup() code: catchers.push_back( eci.get() ); will cause a segfault
when the shared pointer is destroyed. You should remove the catchers vector
altogether, also the delete(*it) code in the EventProcessor's destructor.
The event code will clean up the catcher automatically; the EP does not need
to do/track this.

If you can comment on or address these issues, the patch is ready for
inclusion.

Peter

RE: EventCatcherImpl

Quoting Vandenbroucke Sander <Sander [dot] Vandenbroucke [..] ...>:
>>> Now, this would kick ass wouldn't it?
>>
>> Yes. Did you test this ? patch ?
>>
>> I took a quick look at the code and it appears you are right...
>>
> Nope, I wrote this code while replying. That's where you guys come in.
> I put in some ideas and the experts check all pitfalls :-)
>
> Seriously, what to do with the catchers list? It is also used for
> enabling/disabling the EventCatchers. Then there is the destructor,
> which deletes all catchers upon destruction of the EventProcessor. We
> could use a vector for this administration. In that case we need a
> 'cheap' way to determine in the handler() function whether an
> EventCatcher is enabled. For that, the EventCatcher could ask the
> EventProcessor if it is active. Maybe set a flag in
> initialize()/finalize()? And what about the queue depth?

This 'enabled' check was meant to ignore asyn events when the EP is not
running, such that when the EP is started, no 'old' events are
processed. In your method, this check is not necessary, because as long
as the EP is not running, its enqueue() function does not enqueue the
EventCatcher; hence, no old events can get in the queue.

But 'enabled' is also used in another way. The EventCatchers are smart
pointers and the 'asyn' Handle stores them. Thus it is possible that the
EP is deleted, but the EC is still around (meaning: the TaskContext
subscribed to an asyn event, and did not destroy that subscription in
its destructor). The question is whether this is a problem we want to
fix. SYN events have the same problem, but 'deletion of the event
receiver' is not checked at all; the same holds for methods, commands,
etc.: we don't check if the target object has been deleted, mostly
because it is just plain impossible in C++. So why check deletion of the
asyn event receiver ? Because we can (in this isolated case)?

I wouldn't mind dropping that code.

Peter

RE: EventCatcherImpl

>> This is with this patch. Without this patch performance is in our
>> case much worse, meaning 80% cpu load.
>
> I don't understand this. In case the (tick) event carries no data, the
> implementation is very efficient and is mainly storing a function
> pointer for later calling in the EP. Once the event has data,
> performance drops to about half the speed (I'm guessing) because of
> the data storing and retrieval process. I think we need to profile
> (gprof) a small example (Event + EP) on gnulinux and see where the
> bottleneck sits.
>
Don't underestimate the boost::bind! You may not see it with your
profiler on a P4 @ 3GHz. Maybe I can do some measurements as well.

Sander.

EventCatcherImpl

On Monday 12 November 2007 14:30:35 Vandenbroucke Sander wrote:
> Don't underestimate the boost::bind! You may not see it with your
> profiler on a P4 @ 3GHz. Maybe I can do some measurements as well.

I suspect AtomicQueue is the culprit. When a lot of events are in the queue,
the insertion/removal operations become quite expensive, as the contents of
the queue are copied over. What happens is this:

1. The AtomicQueue is extended 64 times, one element per tick event.
   Each extension = copy of the existing queue + 1 addition, for 2080
   pointer copies in total.
2. The AtomicQueue is reduced 64 times as the tick events are processed.
   Each reduction = copy of the existing queue - 1 element, for 2080
   pointer copies in total.

Conclusion: one burst of 64 events = 128 copies of the event queue, or 4160
memory copies. That must bite your embedded system...
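The totals above are just the arithmetic series 1 + 2 + ... + 64, counted once for growing the queue and once more for shrinking it. A quick stand-alone check (plain arithmetic, no RTT code):

```cpp
// Pointer copies needed to grow (or, symmetrically, shrink) a
// copy-on-write queue one element at a time between size 1 and size n:
// reaching size k copies the k-1 existing elements plus the new one,
// so the total is 1 + 2 + ... + n.
long queue_copy_cost(int n) {
    long copies = 0;
    for (int size = 1; size <= n; ++size)
        copies += size;
    return copies;
}
// queue_copy_cost(64) gives 2080 for growing; the same again for
// shrinking, i.e. 4160 pointer copies across 128 queue copies.
```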

You could try using an std::queue with a mutex, but I don't know how heavy the
locking of the mutex will be on your system, and I also don't know if
std::queue uses malloc() during push()/pop() and how it copies or does not
copy...

The old implementation of AtomicQueue uses a much cheaper implementation (no
copies at all, very fast) but is only thread safe if there is only one writer
and one reader. There used to be a bug report about this, which then led to
the new, less performant implementation. The interface remained the same...

Peter

EventCatcherImpl

On Monday 12 November 2007 17:24:14 Peter Soetens wrote:
>
> You could try using an std::queue with a mutex, but I don't know how heavy
> the locking of the mutex will be on your system, and I also don't know if
> std::queue uses malloc() during push()/pop() and how it copies or does not
> copy...
>
> The old implementation of AtomicQueue uses an much cheaper implementation
> (no copies at all, very fast) but is only thread safe if there is only one
> writer and one reader. There used to be a bug report about this, which then
> lead to the new, less performant implementation. The interface remained the
> same...

OK, the old AtomicQueue is probably not an option, as the code could break
when multiple enqueues are happening simultaneously. What the EventProcessor
needs is a Multi-Write/Single-Read Queue implementation. I wrote a MWSR queue
based on a single mutex for guarding the writes. You can copy this file over
to your RTT src/ dir and edit EventProcessor.hpp to use this class. See what
the performance is on your embedded system.

Also, please review the class to be sure.
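The attached file itself is not reproduced in this thread, but a mutex-guarded Multi-Write/Single-Read queue along the lines described could look roughly like this (my reconstruction, not the actual class; a production version would also need atomics or memory barriers on head_ and tail_ for cross-thread visibility):

```cpp
#include <cstddef>
#include <mutex>

// Sketch of a Multi-Write/Single-Read ring buffer: concurrent writers
// serialize on a mutex; the single reader never takes the lock.
template <typename T, std::size_t N>
class MWSRQueue {
    T buf_[N];
    std::size_t head_ = 0;             // advanced only by the one reader
    std::size_t tail_ = 0;             // advanced under write_mutex_
    std::mutex write_mutex_;
public:
    bool enqueue(const T& item) {      // safe from multiple threads
        std::lock_guard<std::mutex> lock(write_mutex_);
        std::size_t next = (tail_ + 1) % N;
        if (next == head_)
            return false;              // queue full, reject
        buf_[tail_] = item;
        tail_ = next;
        return true;
    }
    bool dequeue(T& item) {            // single reader only
        if (head_ == tail_)
            return false;              // queue empty
        item = buf_[head_];
        head_ = (head_ + 1) % N;
        return true;
    }
};
```

Note the trade-off Sander raised earlier still applies: the write-side mutex makes enqueue() unusable from an ISR.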

About your patch: I had to add 'inline' to the class definition of the
enqueueCatcher function of the EventCatcher, and I left the function
definition at the bottom. This solved the linking problem.

Peter

EventCatcherImpl

On Monday 12 November 2007 14:30:35 Vandenbroucke Sander wrote:
>
> Don't underestimate the boost::bind! You may not see it with your
> profiler on a P4 @ 3GHz. Maybe I can do some measurements as well.

These are the results of an event-test program firing and processing an
event 1000000 times, as in:

for(int i = 0; i != 1000000; ++i) {
    t_event();
    event_proc->step();
}

rtt and the application were compiled with -g -pg -O2; see the attachment. It
loses some detail as the inlined functions have disappeared... But without
-O2, I got too much detail... where non-inlined functions were causing too
much overhead...

Peter

RE: EventCatcherImpl

Quoting Vandenbroucke Sander <Sander [dot] Vandenbroucke [..] ...>:

>>
>> But after completion that event should not be in the list anymore...?
>>
> It should not be in the list anymore, but it still is!

The reason for not adding/removing a catcher from the queue was to
avoid memory allocations. I have a feeling that this is no longer
necessary in this implementation.

> There is a flag guarding execution of the event:
>
> Code from EventProcessor.cxx:
> void EventProcessor::step() {
>     if ( catchers.empty() )
>         return;
>     catchers.apply( boost::bind(&EventCatcher::complete, _1 ) );
> }
>
> Code from EventProcessor.hpp:
> struct EventCatcherImpl<0, SignalType, ContainerType>
>     : public EventCatcher
> {
>     .....
>     Result handler( void ) {
>         work = this->enabled;
>         if ( work && mep ) // if enabled and mep, there is a getActivity() present.
>             signalWork();
>         return detail::NA::na();
>     }
>
>     virtual void complete() {
>         if (!work)
>             return;
>         f();
>         work = false;
>     }
>     .....
> };

Yep, this looks like too much work.

>
> step() is executed each tick (in case the ExecutionEngine's activity is
> periodic) or each time an event, command, ... triggers the activity (in
> case of a non-periodic activity).
>
> Let's take a look at handler().
> This function is called when the event is fired. It sets the 'work' flag
> and triggers the ExecutionEngine's activity with signalWork().
>
> Now for the complete() function.
> This function is called each time signalWork() or another function that
> triggers the activity is called. It uses the 'work' flag to determine
> whether it has to execute the complete function or not. From my point of
> view this is a pure waste of time.

The only reason why it could have been written like this is for
avoiding the insert/remove of the catchers queue.

>> Is this an _implementation_ question, or a _user_visible semantic_
>> question?
>>
> It is an implementation question that can save a lot of overhead.
> Imagine:
>
> void EventProcessor::step() {
>     EventCatcher* ec;
>     while( queue.dequeue(ec) )
>     {
>         ec->complete();
>     }
> }
>
> struct EventCatcherImpl<0, SignalType, ContainerType>
>     : public EventCatcher
> {
>     .....
>     Result handler( void ) {
>         if (mep) // if mep, there is a getActivity() present.
>         {
>             mep->queue->enqueue(this);

Or add an enqueue(EventCatcher*) method to EventProcessor....

>             signalWork();
>         }
>         return detail::NA::na();
>     }
>
>     virtual void complete() {
>         f();
>     }
>     .....
> };
>
> Now, this would kick ass wouldn't it?

Yes. Did you test this ? patch ?

I took a quick look at the code and it appears you are right...

Peter

RE: EventCatcherImpl

On Wed, 7 Nov 2007, Peter Soetens wrote:

> Quoting Vandenbroucke Sander <Sander [dot] Vandenbroucke [..] ...>:
>
>>>
>>> But after completion that event should not be in the list anymore...?
>>>
>> It should not be in the list anymore, but it still is!
>
> The reason for not adding/removing a catcher from the queue was to avoid
> memory allocations.

This is a traditional realtime trade-off: memory should not be allocated at
runtime, at the cost of having a hard limit on the number of (in this case)
events that your system can handle... So Orocos should consider using memory
pools or other deterministic memory management approaches for the event data.

Herman
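[Editor's note] Herman's memory-pool suggestion could be sketched as a fixed-capacity free list that hands out event-data slots without ever calling malloc() at runtime. This is an illustrative sketch for trivially-constructible event data, not actual Orocos code:

```cpp
#include <cstddef>

// Fixed-capacity free-list pool: all storage is reserved up front, so
// acquire()/release() never allocate at runtime. The capacity N is the
// hard limit Herman mentions: acquire() fails once the pool is
// exhausted. Suitable only for trivially-constructible event data.
template <typename T, std::size_t N>
class EventDataPool {
    union Slot { T data; Slot* next; };
    Slot slots_[N];
    Slot* free_ = nullptr;
public:
    EventDataPool() {
        for (std::size_t i = 0; i < N; ++i) {  // thread the free list
            slots_[i].next = free_;
            free_ = &slots_[i];
        }
    }
    T* acquire() {                    // nullptr == hard limit reached
        if (!free_)
            return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return &s->data;
    }
    void release(T* p) {              // return a slot from acquire()
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }
};
```

The design choice mirrors the trade-off described above: determinism (no runtime allocation) in exchange for a compile-time bound on how many events can be in flight.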