Events & rt_list.

Hi,

I'm working on an rt_list implementation to replace the ListLockFree for
Events, in order to reduce the memory footprint.

Now I have the following situation:

void signal_base::conn_setup( connection_t conn )
{
    // allocate an empty slot in the list.
    mconnections.rt_grow(1);
}

void signal_base::conn_connect( connection_t conn )
{
    // derived class must make sure that the list contains enough items!
    //assert( itend != mconnections.end() );

    // connecting (push_back) during emit() does not invalidate iterators,
    // so this function is implemented straightforwardly.
    OS::MutexLock lock(m);
    mconnections.push_back( conn );
}

void signal_base::conn_destroy( connection_t conn )
{
    this->conn_disconnect(conn);
    // increase the number of connections destroyed.
    // free memory
    mconnections.rt_shrink(1);
}

void signal_base::conn_disconnect( connection_t conn )
{
    assert( conn && "virtually impossible! only connection base should call this function!" );

    iterator tgt;
    // avoid invalidating the iterator of emit() upon self or cross removal of conn.
    OS::MutexLock lock(m);
    if ( (tgt = std::find( mconnections.begin(),
                           mconnections.end(),
                           conn )) != mconnections.end() )
    {
        if ( !emitting )
        {
            mconnections.erase( tgt ); // safe to remove: we hold the mutex + no self removal.
            return;
        }
        // only done when in emit(); cleanup() is guaranteed to be called afterwards.
        ++disconcount; // used in cleanup() to detect self-disconnections.

        // cfr. the for loop in cleanup()
        connection_t d(0);
        *tgt = d; // clear out, no erase, keep all iterators valid!
    }
}

Next step: Remove the mutex...
Do you think this is possible when

    mconnections.rt_grow(x)
    mconnections.push_back( conn )
    mconnections.erase( tgt )

are atomic as far as tampering with the list is concerned? I mean that
adding/removing nodes can't be interrupted?
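
(For illustration only: a minimal sketch of what an atomic "clear a slot"
could look like, assuming the slots hold plain pointers and that GCC's
__sync_bool_compare_and_swap builtin is available. The names here are
hypothetical, not the actual rt_list API.)

    struct connection_base; // whatever the slots point to.

    // Atomically clear a slot: a concurrent emit() sees either the old
    // pointer or zero, never a torn value.
    bool clear_slot( connection_base** slot, connection_base* expected )
    {
        // Succeeds only if *slot still equals 'expected'; if another thread
        // changed the slot first, the caller can re-find the slot or give up.
        return __sync_bool_compare_and_swap( slot, expected,
                                             (connection_base*)0 );
    }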

Kind regards,
Sander.

Events & rt_list.

On Wednesday 05 March 2008 11:53:23 Vandenbroucke Sander wrote:
> Hi,
>
> I'm working on an rt_list implementation to replace the ListLockFree for
> Events, in order to reduce the memory footprint.

Thanks for joining the crusade.

>
> Now I have the following situation:
>
> void signal_base::conn_setup( connection_t conn )
> {
>     // allocate an empty slot in the list.
>     mconnections.rt_grow(1);
> }
>
> void signal_base::conn_connect( connection_t conn )
> {
>     // derived class must make sure that the list contains enough items!
>     //assert( itend != mconnections.end() );
>
>     // connecting (push_back) during emit() does not invalidate iterators,
>     // so this function is implemented straightforwardly.
>     OS::MutexLock lock(m);
>     mconnections.push_back( conn );
> }
>
> void signal_base::conn_destroy( connection_t conn )
> {
>     this->conn_disconnect(conn);
>     // increase the number of connections destroyed.
>     // free memory
>     mconnections.rt_shrink(1);
> }
>
> void signal_base::conn_disconnect( connection_t conn )
> {
>     assert( conn && "virtually impossible! only connection base should call this function!" );
>
>     iterator tgt;
>     // avoid invalidating the iterator of emit() upon self or cross removal of conn.
>     OS::MutexLock lock(m);
>     if ( (tgt = std::find( mconnections.begin(),
>                            mconnections.end(),
>                            conn )) != mconnections.end() )
>     {
>         if ( !emitting )
>         {
>             mconnections.erase( tgt ); // safe to remove: we hold the mutex + no self removal.
>             return;
>         }
>         // only done when in emit(); cleanup() is guaranteed to be called afterwards.
>         ++disconcount; // used in cleanup() to detect self-disconnections.
>
>         // cfr. the for loop in cleanup()
>         connection_t d(0);
>         *tgt = d; // clear out, no erase, keep all iterators valid!
>     }
> }
>
> Next step: Remove the mutex...
> Do you think this is possible when
>
>     mconnections.rt_grow(x)
>     mconnections.push_back( conn )
>     mconnections.erase( tgt )
>
> are atomic as far as tampering with the list is concerned? I mean that
> adding/removing nodes can't be interrupted?

I don't know the status of the RT_LIST code, even with the mutex. For example,
in cleanup(), newit is incremented and then invalidated with the erasure of
an element. That looks wrong for starters.

The major difficulty with this implementation is that it wants to protect
against self- or cross-removal (or addition) of a connection, possibly from
another thread, and/or while the event is emitted. The lock guards the
conn_connect() and conn_disconnect() functions and any iterator over the
connections (thus emit() as well), which may not be invalidated after calling
these functions. If you could in addition implement a thread-safe

    mconnections.setnull(conn) // sets the element holding conn to '0'

you can drop the mutex and get rid of the non-thread-safe std::find in
conn_disconnect():

    void signal_base::conn_disconnect( connection_t conn ) {
        ++disconcount;
        // clear out, no erase, keep all iterators valid!
        mconnections.setnull( conn );
    }

And let cleanup() do the real erasure after the next emit().
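
(For illustration, one possible shape of such a cleanup(). This is a sketch
under assumptions: disconcount counts the slots nulled during emit(), and
erase() returns the iterator past the removed slot, as std::list does. It is
not the actual signal_template.hpp code.)

    void signal_base::cleanup()
    {
        // Reclaim the slots that conn_disconnect() nulled out during emit().
        // Runs after emit() has finished, so no iterator is live anymore.
        iterator it = mconnections.begin();
        while ( it != mconnections.end() && disconcount > 0 ) {
            if ( *it == connection_t(0) ) {
                it = mconnections.erase( it ); // free the nulled slot.
                --disconcount;
            } else
                ++it;
        }
    }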

Note that signal_template.hpp emit() has a MutexLock as well and that
mconnections.push_back( conn ) should not invalidate any iterators.

The LIST_LOCKFREE has the advantage that it creates a copy of the list when
modified, so the changes to the list are only taken into account in the next
emit(). This makes cross- or self-removal a non-problem. It's a nice
principle, but it also means that you need as many allocated lists as
threads*2. The rt_list implementation sets list element pointers to zero to
delay element erasure as well. The whole problem is that you need to keep
your iterators valid in emit().
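
(For illustration: a toy sketch of that copy-on-update principle using
boost::shared_ptr snapshots. The real ListLockFree recycles preallocated
buffers and publishes the new list with a compare-and-swap instead of the
plain assignment below, in order to stay lock-free and allocation-free.)

    #include <vector>
    #include <boost/shared_ptr.hpp>

    // Toy copy-on-update list: a writer copies the current snapshot,
    // modifies the copy and publishes it; a reader (emit()) keeps iterating
    // over the snapshot it took, so concurrent removal never invalidates
    // its iterators.
    template<class T>
    class cow_list {
        typedef std::vector<T> buffer;
        boost::shared_ptr<buffer> current;
    public:
        cow_list() : current( new buffer() ) {}

        boost::shared_ptr<const buffer> snapshot() const { return current; }

        void push_back( const T& t ) {
            boost::shared_ptr<buffer> next( new buffer( *current ) );
            next->push_back( t );
            current = next; // readers still hold their old snapshot.
        }
    };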

Peter

Events & rt_list.

On Friday 07 March 2008 17:46:57 Peter Soetens wrote:
>
> void signal_base::conn_disconnect( connection_t conn ) {
>     ++disconcount;
>     // clear out, no erase, keep all iterators valid!
>     mconnections.setnull( conn );
> }
>
> And let cleanup() do the real erasure after the next emit().

There is a race in the code above... You need to increase disconcount after
setnull().
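
Concretely, the corrected order would be:

    void signal_base::conn_disconnect( connection_t conn ) {
        // clear out, no erase, keep all iterators valid!
        mconnections.setnull( conn );
        ++disconcount; // only incremented after the slot is nulled, so
                       // cleanup() never runs before the slot it must
                       // reclaim is actually cleared.
    }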

Also, the emit() code in signal_template.hpp is not 100% robust. It uses a
plain pointer to connection_impl instead of a shared_ptr, which means that
the 'ci' pointer may have been freed prior to
ci->emit(OROCOS_SIGNATURE_ARGS);

Peter

RE: Events & rt_list.

> Also, the emit() code in signal_template.hpp is not 100% robust. It uses a
> plain pointer to connection_impl instead of a shared_ptr, which means that
> the 'ci' pointer may have been freed prior to
> ci->emit(OROCOS_SIGNATURE_ARGS);
>

    const_iterator end = mconnections.end();
    for (; it != end; ++it ) {
        connection_impl* ci = static_cast<connection_impl*>( it->get() );
        if (ci)
            ci->emit(OROCOS_SIGNATURE_ARGS);
    }

ci holds the address of a connection_impl in memory.
If ci is not null, emit is called, right?

What you are saying is that when a thread preempts emit() after 'if
(ci)' and before 'ci->emit(OROCOS_SIGNATURE_ARGS)' to clear a
connection, ci will be pointing to null?

Sander.

Events & rt_list.

On Monday 10 March 2008 12:00:43 Vandenbroucke Sander wrote:
> > Also, the emit() code in signal_template.hpp is not 100% robust. It uses a
> > plain pointer to connection_impl instead of a shared_ptr, which means that
> > the 'ci' pointer may have been freed prior to
> > ci->emit(OROCOS_SIGNATURE_ARGS);
>
>     const_iterator end = mconnections.end();
>     for (; it != end; ++it ) {
>         connection_impl* ci = static_cast<connection_impl*>( it->get() );
>         if (ci)
>             ci->emit(OROCOS_SIGNATURE_ARGS);
>     }
>
> ci holds the address of a connection_impl in memory.
> If ci is not null, emit is called, right?
>
> What you are saying is that when a thread preempts emit() after 'if
> (ci)' and before 'ci->emit(OROCOS_SIGNATURE_ARGS)' to clear a
> connection, ci will be pointing to null?

No. ci might become a 'dangling' pointer, which means it points to memory
already freed by a delete statement. The if (ci) check is correct (ci will
never become null) because ci is a copy. I believe a fix would be:

    const_iterator end = mconnections.end();
    for (; it != end; ++it ) {
        connection_t shared_ci = *it; // protect connection from delete.
        connection_impl* ci = static_cast<connection_impl*>( shared_ci.get() );
        if (ci) // ci is null or valid.
            ci->emit(OROCOS_SIGNATURE_ARGS);
    }

Peter

RE: Events & rt_list.

> No. ci might become a 'dangling' pointer, which means it points to memory
> already freed by a delete statement. The if (ci) check is correct (ci will
> never become null) because ci is a copy. I believe a fix would be:
>
>     const_iterator end = mconnections.end();
>     for (; it != end; ++it ) {
>         connection_t shared_ci = *it; // protect connection from delete.
>         connection_impl* ci = static_cast<connection_impl*>( shared_ci.get() );
>         if (ci) // ci is null or valid.
>             ci->emit(OROCOS_SIGNATURE_ARGS);
>     }
>

Ok, I see your point. But won't this break real-time? I mean, you will
probably delete the connection when shared_ci goes out of scope?

Sander.

Events & rt_list.

On Monday 10 March 2008 14:36:22 Vandenbroucke Sander wrote:
> > No. ci might become a 'dangling' pointer, which means it points to memory
> > already freed by a delete statement. The if (ci) check is correct (ci will
> > never become null) because ci is a copy. I believe a fix would be:
> >
> >     const_iterator end = mconnections.end();
> >     for (; it != end; ++it ) {
> >         connection_t shared_ci = *it; // protect connection from delete.
> >         connection_impl* ci = static_cast<connection_impl*>( shared_ci.get() );
> >         if (ci) // ci is null or valid.
> >             ci->emit(OROCOS_SIGNATURE_ARGS);
> >     }
>
> Ok, I see your point. But won't this break real-time? I mean, you will
> probably delete the connection when shared_ci goes out of scope?

Which is worse in this very, very unlikely event? A crash or a hiccup? If
you want to avoid this behaviour (the delete) in your application, don't
destroy an RTT::Handle object while your application is running. A connection
is only deleted when the last handle to it goes out of scope (and the
callback is not connected to the event).
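
(For illustration: a self-contained sketch of that shared_ptr behaviour. The
type names are borrowed from this thread; the rest is hypothetical.)

    #include <boost/shared_ptr.hpp>

    struct connection_impl {
        ~connection_impl() { /* frees memory: not real-time! */ }
    };
    typedef boost::shared_ptr<connection_impl> connection_t;

    void demo() {
        connection_t owner( new connection_impl() ); // e.g. held by a Handle.
        {
            connection_t shared_ci = owner; // emit() takes a copy: use count 2.
            owner.reset();                  // the last Handle is destroyed...
        }                                   // ...and the delete happens HERE,
                                            // when shared_ci goes out of scope,
                                            // i.e. inside the emitting thread.
    }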

Peter

RE: Events & rt_list.

>
> I don't know the status of the RT_LIST code, even with the mutex. For
> example,

There isn't any RT_LIST code; I have implemented an rt_list.h in my fosi.
I guard the critical sections by disabling/enabling interrupts, so it's
kind of platform dependent. :-(
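
(For illustration: the kind of RAII guard this implies, with hypothetical
rtos_disable_irq()/rtos_enable_irq() calls standing in for whatever the fosi
layer actually provides.)

    extern "C" void rtos_disable_irq(); // hypothetical fosi calls.
    extern "C" void rtos_enable_irq();

    // Scope guard: interrupts stay off for the duration of a list
    // operation, so adding/removing a node cannot be preempted.
    class IrqLock {
    public:
        IrqLock()  { rtos_disable_irq(); }
        ~IrqLock() { rtos_enable_irq(); } // re-enabled on every exit path.
    };

    void rt_list_operation() {
        IrqLock lock;
        // ... link or unlink a node ...
    }   // interrupts back on here.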

> in cleanup(), newit is incremented and then invalidated with the erasure of
> an element. That looks wrong for starters.
>
> The major difficulty with this implementation is that it wants to protect
> against self- or cross-removal (or addition) of a connection, possibly from
> another thread, and/or while the event is emitted. The lock guards the
> conn_connect() and conn_disconnect() functions and any iterator over the
> connections (thus emit() as well), which may not be invalidated after calling
> these functions. If you could in addition implement a thread-safe
>
To simplify this problem I made the following assumption: only rt_grow()
and rt_shrink() can actually add/remove items; erase only sets the element
to zero (it does not move the element to the back!). push_back() looks
for the first empty element. This way iterators are always valid.
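
(For illustration: a toy version of the scheme just described, built on
std::list so that grow/shrink/clear never invalidate other iterators. This is
hypothetical, not the actual rt_list.h, which also avoids allocation in the
real-time path.)

    #include <cstddef>
    #include <list>
    #include <algorithm>

    // T needs a distinguishable "empty" value, e.g. a (smart) pointer type.
    template<class T>
    class slot_list {
        std::list<T> slots;
    public:
        // Capacity changes allocate/free, so call them outside the RT path.
        void rt_grow( std::size_t n ) { slots.insert( slots.end(), n, T() ); }
        void rt_shrink( std::size_t n ) {
            for ( std::size_t i = 0; i != n && !slots.empty(); ++i )
                slots.pop_back(); // assumes the back slots are empty.
        }

        // Reuse the first empty slot; fails if conn_setup() forgot rt_grow().
        bool push_back( const T& t ) {
            typename std::list<T>::iterator it =
                std::find( slots.begin(), slots.end(), T() );
            if ( it == slots.end() )
                return false;
            *it = t;
            return true;
        }

        // clear(conn): null the slot, never unlink the node, so every
        // live iterator in emit() stays valid.
        void clear( const T& t ) {
            typename std::list<T>::iterator it =
                std::find( slots.begin(), slots.end(), t );
            if ( it != slots.end() )
                *it = T();
        }
    };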

>     mconnections.setnull(conn) // sets the element holding conn to '0'
>
I made a clear(conn) for this.

> you can drop the mutex and get rid of the non-thread-safe std::find in
> conn_disconnect():
>
>     void signal_base::conn_disconnect( connection_t conn ) {
>         ++disconcount;
>         // clear out, no erase, keep all iterators valid!
>         mconnections.setnull( conn );
>     }
>
> And let cleanup() do the real erasure after the next emit().
>
So the cleanup can be simplified as well?

> Note that signal_template.hpp emit() has a MutexLock as well and that
> mconnections.push_back( conn ) should not invalidate any iterators.
>
I've #ifdef'ed all the mutexes. They are in effect when choosing the
std::list implementation.

> The LIST_LOCKFREE has the advantage that it creates a copy of the list when
> modified, so the changes to the list are only taken into account in the next
> emit(). This makes cross- or self-removal a non-problem. It's a nice
> principle, but it also means that you need as many allocated lists as
> threads*2. The rt_list implementation sets list element pointers to zero to
> delay element erasure as well. The whole problem is that you need to keep
> your iterators valid in emit().
>
I prefer the ListLockFree as well, it is just too much for an embedded
application. We spend 800+ bytes / Event and another 140 bytes / connection.
My current implementation only uses 160 bytes / Event and 70 bytes / connection.

Sander.

RE: Events & rt_list.

On Mon, 10 Mar 2008, Vandenbroucke Sander wrote:

[...]
> I prefer the ListLockFree as well, it is just too much for an embedded
> application. We spend 800+ bytes / Event and another 140 bytes / connection.
> My current implementation only uses 160 bytes / Event and 70 bytes / connection.

If you are so tight in resources, it's maybe just better to forget about
using a framework such as Orocos, and just make your application as
platform-dependent as needed to satisfy your resource constraints...?
Doing the opposite (trying to change the framework to make it do
non-generic things for one particular application) might be much more
work (also for maintenance later on)... And your code will never be
portable anyway, since you have your customized fosi.h...

Herman


RE: Events & rt_list.

>
> If you are so tight in resources, it's maybe just better to forget about
> using a framework such as Orocos, and just make your application as
> platform-dependent as needed to satisfy your resource constraints...?
> Doing the opposite (trying to change the framework to make it do
> non-generic things for one particular application) might be much more
> work (also for maintenance later on)... And your code will never be
> portable anyway, since you have your customized fosi.h...
>

Device drivers are not portable either. I see my rt_list as a 'hot
spot'. I didn't really change the framework.

There are already such platform-dependent tweaks in Orocos. The CAS
instruction is x86-specific. Atomic operations are CISC-specific...

Anyway: The concept "resource constraints" should be taken into account
since not every controller is a P4 with 2 GB internal memory.

Sander.

RE: Events & rt_list.

On Mon, 10 Mar 2008, Vandenbroucke Sander wrote:

>> If you are so tight in resources, it's maybe just better to forget about
>> using a framework such as Orocos, and just make your application as
>> platform-dependent as needed to satisfy your resource constraints...?
>> Doing the opposite (trying to change the framework to make it do
>> non-generic things for one particular application) might be much more
>> work (also for maintenance later on)... And your code will never be
>> portable anyway, since you have your customized fosi.h...
>>
>
> Device drivers are not portable either. I see my rt_list as a 'hot
> spot'. I didn't really change the framework.

Good! I didn't see a mention about this in your previous posts, but
maybe I overlooked it!

> There are already such platform-dependent tweaks in Orocos. The CAS
> instruction is x86-specific. Atomic operations are CISC-specific...
But they are indeed hidden behind architecture-independent abstractions,
aren't they?

> Anyway: The concept "resource constraints" should be taken into account
> since not every controller is a P4 with 2 GB internal memory.
I fully agree. But how and to which extent...? The ideal solution in my
opinion would be that we find a way that this flexibility is possible
via _configuration_ of the policy with which the Orocos features are
used, such that the _mechanism_ code can be shared for 100%...

Herman


Events & rt_list.

On Monday 10 March 2008 08:55:23 Herman Bruyninckx wrote:
> On Mon, 10 Mar 2008, Vandenbroucke Sander wrote:
>
> [...]
>
> > I prefer the ListLockFree as well, it is just too much for an embedded
> > application. We spend 800+ bytes / Event and another 140 bytes / connection.
> > My current implementation only uses 160 bytes / Event and 70 bytes / connection.
>
> If you are so tight in resources, it's maybe just better to forget about
> using a framework such as Orocos, and just make your application as
> platform-dependent as needed to satisfy your resource constraints...?
> Doing the opposite (trying to change the framework to make it do
> non-generic things for one particular application) might be much more
> work (also for maintenance later on)... And your code will never be
> portable anyway, since you have your customized fosi.h...

It's not the size of the embedded platform that mandates this choice, it's the
kinds/size of the features a developer wants. If there's a library that
matches your needs, you should use it. Optionally, later on, you develop a
patch that shrinks its size to acceptable proportions by removing features. This
is still less work than writing a library from the ground up, and upstream is
always interested in code-size reduction patches.

Now for the event footprint. Sander's efforts are totally justified for small
footprint systems. The current implementation assumes wide availability of
memory (and cache) and gives in return computational efficiency by not having
to do any system call, like mutex_lock() or malloc(). Also, the current code
can be called from interrupt handlers because no mutexes/mallocs are used.

Apart from Sander's efforts, we could look (again) for a (lock-free), O(1),
real-time and interrupt-safe memory allocator for use within our
communication algorithms. TLSF[*] is a close call, certainly for embedded
systems (heck, even eCos would benefit from it), but you have a global point
of contention as it uses locks (or disables interrupts on your tiny OS) to
protect concurrent access. It quite trashes your lock-free algorithm if your
underlying memory allocator (which *is* called in every queue/list/...
operation) uses locks anyway.

Back to square #1

Peter

[*] http://rtportal.upv.es/rtmalloc/ and http://www.orocos.org/node/462

Events & rt_list.

On Mon, 10 Mar 2008, Peter Soetens wrote:

> On Monday 10 March 2008 08:55:23 Herman Bruyninckx wrote:
>> On Mon, 10 Mar 2008, Vandenbroucke Sander wrote:
>> [...]
>>> I prefer the ListLockFree as well, it is just too much for an embedded
>>> application. We spend 800+ bytes / Event and another 140 bytes / connection.
>>> My current implementation only uses 160 bytes / Event and 70 bytes / connection.
>>
>> If you are so tight in resources, it's maybe just better to forget about
>> using a framework such as Orocos, and just make your application as
>> platform-dependent as needed to satisfy your resource constraints...?
>> Doing the opposite (trying to change the framework to make it do
>> non-generic things for one particular application) might be much more
>> work (also for maintenance later on)... And your code will never be
>> portable anyway, since you have your customized fosi.h...
>
> It's not the size of the embedded platform that mandates this choice, it's the
> kinds/size of the features a developer wants. If there's a library that
> matches your needs, you should use it. Optionally, later on, you develop a
> patch that shrinks its size to acceptable proportions by removing features. This
> is still less work than writing a library from the ground up, and upstream is
> always interested in code-size reduction patches.

If they leave all features and configuration options intact, which is
not the impression I have of what Sander is doing?

> Now for the event footprint. Sander's efforts are totally justified for small
> footprint systems.
Of course, I have no objections against that!

> The current implementation assumes wide availability of
> memory (and cache) and gives in return computational efficiency by not having
> to do any system call, like mutex_lock() or malloc(). Also, the current code
> can be called from interrupt handlers because no mutexes/mallocs are used.
>
> Apart from Sander's efforts, we could look (again) for a (lock-free), O(1),
> real-time and interrupt-safe memory allocator for use within our
> communication algorithms. TLSF[*] is a close call, certainly for embedded
> systems (heck, even eCos would benefit from it), but you have a global point
> of contention as it uses locks (or disables interrupts on your tiny OS) to
> protect concurrent access. It quite trashes your lock-free algorithm if your
> underlying memory allocator (which *is* called in every queue/list/...
> operation) uses locks anyway.
>
> Back to square #1
But with more insights! Thanks...

So, the design challenge in this context is to find a way out of the
dilemma: a faster, smaller-footprint lock-free feature versus the "global
point of contention" problem... Is this correct?

Herman

Events & rt_list.

On Monday 10 March 2008 10:59:40 Herman Bruyninckx wrote:
>
> So, the design challenge in this context is to find a way out of the
> dilemma: a faster, smaller-footprint lock-free feature versus the "global
> point of contention" problem... Is this correct?

Yes. And I believe it is not solvable. Memory is a global point of contention
(physically *and* in malloc() implementations). The only way to solve that is
by partitioning it per thread, leading to large amounts of (pre)allocated,
but seldom used memory.
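
(For illustration: a toy per-thread fixed pool showing that trade-off. This
is hypothetical; each thread owns one instance, so alloc/free never contend,
at the price of preallocating each thread's worst case.)

    #include <cstddef>

    // One instance per thread: single-owner, so no locks and no CAS needed.
    template<class T, std::size_t N>
    class thread_pool {
        T  storage[N];
        T* free_list[N];
        std::size_t top;
    public:
        thread_pool() : top(N) {
            for ( std::size_t i = 0; i < N; ++i )
                free_list[i] = &storage[i];
        }
        // O(1) and interrupt-safe on the owning thread; returns 0 when the
        // preallocated worst case is exceeded.
        T* alloc() { return top ? free_list[--top] : 0; }
        // Only the owning thread may return blocks, keeping this lock-free.
        void release( T* p ) { if ( p && top < N ) free_list[top++] = p; }
    };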

Peter