RTT v1 buffer race condition

We've recently been exciting this assert(), amongst others. Could this occur due to more than ORO_OS_MAX_THREADS components trying to access the same buffer? We're still trying to make this reproducible ...

deployer-corba-gnulinux: /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/BufferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy, WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T = OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy, WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race detected in Push()"' failed.

Backtrace attached (from gnulinux).
S

AttachmentSize
buffer-error.txt7.88 KB

RTT v1 buffer race condition

On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
> We've recently been exciting this assert(), amongst others. Could this
> occur due to more than ORO_OS_MAX_THREADS components trying to access the
> same buffer? We're still trying to make this reproducible ...
>
>

> deployer-corba-gnulinux:
> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/Bu
> fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T =
> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race detected
> in Push()"' failed. 

>
> Backtrace attached (from gnulinux).
> S

2.x has a Multi-writer/Single reader lock-free buffer implementation. It can
handle any number of threads, but can store no more than 65535 elements.

You could backport it to 1.x, but the front() function has been removed from
the 2.x API. Since its single reader, you could emulate it though or leave it
empty (the RTT itself does not use Buffers, so only user code would be harmed).

Peter

RTT v1 buffer race condition

On Oct 12, 2010, at 03:28 , Peter Soetens wrote:

> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>> We've recently been exciting this assert(), amongst others. Could this
>> occur due to more than ORO_OS_MAX_THREADS components trying to access the
>> same buffer? We're still trying to make this reproducible ...
>>
>>

>> deployer-corba-gnulinux:
>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/Bu
>> fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
>> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T =
>> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race detected
>> in Push()"' failed. 

>>
>> Backtrace attached (from gnulinux).
>> S
>
> 2.x has a Multi-writer/Single reader lock-free buffer implementation. It can
> handle any number of threads, but can store no more than 65535 elements.
>
> You could backport it to 1.x, but the front() function has been removed from
> the 2.x API. Since its single reader, you could emulate it though or leave it
> empty (the RTT itself does not use Buffers, so only user code would be harmed).

Understood, but for the moment, is it theoretically possible/likely that excessive threads could cause the above?

I doubt we'd exceed 64k elements, but we do send several thousand logging messages per second in some cases. This might be an option if this turns out to be a problem.
S

RTT v1 buffer race condition

On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
> > On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
> >> We've recently been exciting this assert(), amongst others. Could this
> >> occur due to more than ORO_OS_MAX_THREADS components trying to access
> >> the same buffer? We're still trying to make this reproducible ...
> >>
> >>

> >> deployer-corba-gnulinux:
> >> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/
> >> Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
> >> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T =
> >> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
> >> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
> >> detected in Push()"' failed. 

> >>
> >> Backtrace attached (from gnulinux).
> >> S
> >
> > 2.x has a Multi-writer/Single reader lock-free buffer implementation. It
> > can handle any number of threads, but can store no more than 65535
> > elements.
> >
> > You could backport it to 1.x, but the front() function has been removed
> > from the 2.x API. Since its single reader, you could emulate it though
> > or leave it empty (the RTT itself does not use Buffers, so only user
> > code would be harmed).
>
> Understood, but for the moment, is it theoretically possible/likely that
> excessive threads could cause the above?
>
> I doubt we'd exceed 64k elements, but we do send several thousand logging
> messages per second in some cases. This might be an option if this turns
> out to be a problem. S

In that case, you might try to use the BufferLocked implementation, because
thousands of messages incur a large copy-overhead in the lock-free
implementation. We solved this performance bug on the 2.x line, but for 1.x,
large buffer sizes are not recommended with the lock-free versions.

Peter

RTT v1 buffer race condition

On Oct 12, 2010, at 09:47 , Peter Soetens wrote:

> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>> We've recently been exciting this assert(), amongst others. Could this
>>>> occur due to more than ORO_OS_MAX_THREADS components trying to access
>>>> the same buffer? We're still trying to make this reproducible ...
>>>>
>>>>

>>>> deployer-corba-gnulinux:
>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/
>>>> Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
>>>> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T =
>>>> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
>>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
>>>> detected in Push()"' failed. 

>>>>
>>>> Backtrace attached (from gnulinux).
>>>> S
>>>
>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation. It
>>> can handle any number of threads, but can store no more than 65535
>>> elements.
>>>
>>> You could backport it to 1.x, but the front() function has been removed
>>> from the 2.x API. Since its single reader, you could emulate it though
>>> or leave it empty (the RTT itself does not use Buffers, so only user
>>> code would be harmed).
>>
>> Understood, but for the moment, is it theoretically possible/likely that
>> excessive threads could cause the above?
>>
>> I doubt we'd exceed 64k elements, but we do send several thousand logging
>> messages per second in some cases. This might be an option if this turns
>> out to be a problem. S
>
> In that case, you might try to use the BufferLocked implementation, because
> thousands of messages incur a large copy-overhead in the lock-free
> implementation. We solved this performance bug on the 2.x line, but for 1.x,
> large buffer sizes are not recommended with the lock-free versions.

We almost have a reproducible test case for this. Still using LockFree, pushing about 3000 logging events per second from 12 test components, mix of RT and non-RT, all Activity's, with a single 200-element buffer. Either get the above or

stressLogging: /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/MemoryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T = OCL::logging::LoggingEvent]: Assertion `false && "Deallocating more elements than allocated !"' failed.

Will get you test case as soon as is good. We increased ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.

How do you substitute use of BufferLocked in v1? It is not at all obvious from any of the doc's.
S

RTT v1 buffer race condition

On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
> > On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
> >> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
> >>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
> >>>> We've recently been exciting this assert(), amongst others. Could this
> >>>> occur due to more than ORO_OS_MAX_THREADS components trying to access
> >>>> the same buffer? We're still trying to make this reproducible ...
> >>>>
> >>>>

> >>>> deployer-corba-gnulinux:
> >>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rt
> >>>> t/ Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
> >>>> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T
> >>>> = OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
> >>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
> >>>> detected in Push()"' failed. 

> >>>>
> >>>> Backtrace attached (from gnulinux).
> >>>> S
> >>>
> >>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
> >>> It can handle any number of threads, but can store no more than 65535
> >>> elements.
> >>>
> >>> You could backport it to 1.x, but the front() function has been removed
> >>> from the 2.x API. Since its single reader, you could emulate it though
> >>> or leave it empty (the RTT itself does not use Buffers, so only user
> >>> code would be harmed).
> >>
> >> Understood, but for the moment, is it theoretically possible/likely that
> >> excessive threads could cause the above?
> >>
> >> I doubt we'd exceed 64k elements, but we do send several thousand
> >> logging messages per second in some cases. This might be an option if
> >> this turns out to be a problem. S
> >
> > In that case, you might try to use the BufferLocked implementation,
> > because thousands of messages incur a large copy-overhead in the
> > lock-free implementation. We solved this performance bug on the 2.x
> > line, but for 1.x, large buffer sizes are not recommended with the
> > lock-free versions.
>
> We almost have a reproducible test case for this. Still using LockFree,
> pushing about 3000 logging events per second from 12 test components, mix
> of RT and non-RT, all Activity's, with a single 200-element buffer.
> Either get the above or
>
>
> stressLogging:
> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/Me
> moryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T
> = OCL::logging::LoggingEvent]: Assertion `false && "Deallocating more
> elements than allocated !"' failed. 

>
> Will get you test case as soon as is good. We increased ORO_OS_CONC_ACCESS
> from 8 to 64, but no apparent effect.

You mean: ORONUM_OS_MAX_THREADS = 64 ?

>
> How do you substitute use of BufferLocked in v1? It is not at all obvious
> from any of the doc's. S

It's quite simple:

port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200

Can be done before or after the port is connected to other ports.

Peter

RTT v1 buffer race condition

On Oct 13, 2010, at 03:02 , Peter Soetens wrote:

> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>> We've recently been exciting this assert(), amongst others. Could this
>>>>>> occur due to more than ORO_OS_MAX_THREADS components trying to access
>>>>>> the same buffer? We're still trying to make this reproducible ...
>>>>>>
>>>>>>

>>>>>> deployer-corba-gnulinux:
>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rt
>>>>>> t/ Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T, ReadPolicy,
>>>>>> WritePolicy>::Push(typename RTT::WriteInterface<T>::param_t) [with T
>>>>>> = OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
>>>>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
>>>>>> detected in Push()"' failed. 

>>>>>>
>>>>>> Backtrace attached (from gnulinux).
>>>>>> S
>>>>>
>>>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
>>>>> It can handle any number of threads, but can store no more than 65535
>>>>> elements.
>>>>>
>>>>> You could backport it to 1.x, but the front() function has been removed
>>>>> from the 2.x API. Since its single reader, you could emulate it though
>>>>> or leave it empty (the RTT itself does not use Buffers, so only user
>>>>> code would be harmed).
>>>>
>>>> Understood, but for the moment, is it theoretically possible/likely that
>>>> excessive threads could cause the above?
>>>>
>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>> logging messages per second in some cases. This might be an option if
>>>> this turns out to be a problem. S
>>>
>>> In that case, you might try to use the BufferLocked implementation,
>>> because thousands of messages incur a large copy-overhead in the
>>> lock-free implementation. We solved this performance bug on the 2.x
>>> line, but for 1.x, large buffer sizes are not recommended with the
>>> lock-free versions.
>>
>> We almost have a reproducible test case for this. Still using LockFree,
>> pushing about 3000 logging events per second from 12 test components, mix
>> of RT and non-RT, all Activity's, with a single 200-element buffer.
>> Either get the above or
>>
>>
>> stressLogging:
>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/Me
>> moryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T
>> = OCL::logging::LoggingEvent]: Assertion `false && "Deallocating more
>> elements than allocated !"' failed. 

>>
>> Will get you test case as soon as is good. We increased ORO_OS_CONC_ACCESS
>> from 8 to 64, but no apparent effect.
>
> You mean: ORONUM_OS_MAX_THREADS = 64 ?
>
>>
>> How do you substitute use of BufferLocked in v1? It is not at all obvious
>> from any of the doc's. S
>
> It's quite simple:
>
> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>
> Can be done before or after the port is connected to other ports.

That doesn't work. RTT::BufferLocked is a buffer, not a port (like RTT::WriteBufferPort).

I may have found an issue though. Would calling port.ready() on a port (type = WriteBufferPort) shared by multiple threads cause an issue? Certainly adding this line causes a crash. I will get a branch up on gitorious with my stress test shortly, so you can test it out. This is reproducible on gnulinux.
S

RTT v1 buffer race condition

On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
> > On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
> >> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
> >>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
> >>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
> >>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
> >>>>>> We've recently been exciting this assert(), amongst others. Could
> >>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying to
> >>>>>> access the same buffer? We're still trying to make this
> >>>>>> reproducible ...
> >>>>>>
> >>>>>>

> >>>>>> deployer-corba-gnulinux:
> >>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/
> >>>>>> rt t/ Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T,
> >>>>>> ReadPolicy, WritePolicy>::Push(typename
> >>>>>> RTT::WriteInterface<T>::param_t) [with T =
> >>>>>> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
> >>>>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
> >>>>>> detected in Push()"' failed. 

> >>>>>>
> >>>>>> Backtrace attached (from gnulinux).
> >>>>>> S
> >>>>>
> >>>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
> >>>>> It can handle any number of threads, but can store no more than 65535
> >>>>> elements.
> >>>>>
> >>>>> You could backport it to 1.x, but the front() function has been
> >>>>> removed from the 2.x API. Since its single reader, you could emulate
> >>>>> it though or leave it empty (the RTT itself does not use Buffers, so
> >>>>> only user code would be harmed).
> >>>>
> >>>> Understood, but for the moment, is it theoretically possible/likely
> >>>> that excessive threads could cause the above?
> >>>>
> >>>> I doubt we'd exceed 64k elements, but we do send several thousand
> >>>> logging messages per second in some cases. This might be an option if
> >>>> this turns out to be a problem. S
> >>>
> >>> In that case, you might try to use the BufferLocked implementation,
> >>> because thousands of messages incur a large copy-overhead in the
> >>> lock-free implementation. We solved this performance bug on the 2.x
> >>> line, but for 1.x, large buffer sizes are not recommended with the
> >>> lock-free versions.
> >>
> >> We almost have a reproducible test case for this. Still using LockFree,
> >> pushing about 3000 logging events per second from 12 test components,
> >> mix of RT and non-RT, all Activity's, with a single 200-element buffer.
> >> Either get the above or
> >>
> >>
> >> stressLogging:
> >> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/
> >> Me moryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*)
> >> [with T = OCL::logging::LoggingEvent]: Assertion `false &&
> >> "Deallocating more elements than allocated !"' failed. 

> >>
> >> Will get you test case as soon as is good. We increased
> >> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
> >
> > You mean: ORONUM_OS_MAX_THREADS = 64 ?
> >
> >> How do you substitute use of BufferLocked in v1? It is not at all
> >> obvious from any of the doc's. S
> >
> > It's quite simple:
> >
> > port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
> >
> > Can be done before or after the port is connected to other ports.
>
> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
> RTT::WriteBufferPort).

Did you try it ? I'm using this operator in BufferPortBase<T>:

        /**
         * Provide a new implementation for the connection of this port.
         * If this port is not connected, a new connection is created.
         */
        BufferPortBase<T>& operator=(BufferInterface<T>* impl);

Oh... it's your compiler on os-x ? I saw that the unit test that tests this
function has a #ifndef OROPKG_OS_MACOSX around it (generictask_test_3.cpp).

In that case you can fall back to :

port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
).connect();

before the connection is created.

>
> I may have found an issue though. Would calling port.ready() on a port
> (type = WriteBufferPort) shared by multiple threads cause an issue?
> Certainly adding this line causes a crash. I will get a branch up on
> gitorious with my stress test shortly, so you can test it out. This is
> reproducible on gnulinux. S

Oh rats ! The ready() function evaluates the datasource of the connection,
which is a BufferDataSource, which will read the front() of the buffer for local
buffers, or test the network in case of corba transports.Now front() in lock-
free buffers is a complex algorithm and I suspect it was not covered by the
unit testing (compared to push/pop).

This evaluation was only done for validating the corba side of things. You
could implement 'bool evaluate() {return true;} in BufferDataSource to see if
it fixes things. I would do so anyway as it cuts a complex/unnecessary path
away.

Peter

RTT v1 buffer race condition

On Oct 13, 2010, at 08:40 , Peter Soetens wrote:

> On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
>> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
>>> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>>>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>>>> We've recently been exciting this assert(), amongst others. Could
>>>>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying to
>>>>>>>> access the same buffer? We're still trying to make this
>>>>>>>> reproducible ...
>>>>>>>>
>>>>>>>>

>>>>>>>> deployer-corba-gnulinux:
>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/
>>>>>>>> rt t/ Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T,
>>>>>>>> ReadPolicy, WritePolicy>::Push(typename
>>>>>>>> RTT::WriteInterface<T>::param_t) [with T =
>>>>>>>> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
>>>>>>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
>>>>>>>> detected in Push()"' failed. 

>>>>>>>>
>>>>>>>> Backtrace attached (from gnulinux).
>>>>>>>> S
>>>>>>>
>>>>>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
>>>>>>> It can handle any number of threads, but can store no more than 65535
>>>>>>> elements.
>>>>>>>
>>>>>>> You could backport it to 1.x, but the front() function has been
>>>>>>> removed from the 2.x API. Since its single reader, you could emulate
>>>>>>> it though or leave it empty (the RTT itself does not use Buffers, so
>>>>>>> only user code would be harmed).
>>>>>>
>>>>>> Understood, but for the moment, is it theoretically possible/likely
>>>>>> that excessive threads could cause the above?
>>>>>>
>>>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>>>> logging messages per second in some cases. This might be an option if
>>>>>> this turns out to be a problem. S
>>>>>
>>>>> In that case, you might try to use the BufferLocked implementation,
>>>>> because thousands of messages incur a large copy-overhead in the
>>>>> lock-free implementation. We solved this performance bug on the 2.x
>>>>> line, but for 1.x, large buffer sizes are not recommended with the
>>>>> lock-free versions.
>>>>
>>>> We almost have a reproducible test case for this. Still using LockFree,
>>>> pushing about 3000 logging events per second from 12 test components,
>>>> mix of RT and non-RT, all Activity's, with a single 200-element buffer.
>>>> Either get the above or
>>>>
>>>>
>>>> stressLogging:
>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/
>>>> Me moryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*)
>>>> [with T = OCL::logging::LoggingEvent]: Assertion `false &&
>>>> "Deallocating more elements than allocated !"' failed. 

>>>>
>>>> Will get you test case as soon as is good. We increased
>>>> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
>>>
>>> You mean: ORONUM_OS_MAX_THREADS = 64 ?
>>>
>>>> How do you substitute use of BufferLocked in v1? It is not at all
>>>> obvious from any of the doc's. S
>>>
>>> It's quite simple:
>>>
>>> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>>>
>>> Can be done before or after the port is connected to other ports.
>>
>> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
>> RTT::WriteBufferPort).
>
> Did you try it ? I'm using this operator in BufferPortBase<T>:
>
>
>        /**
>         * Provide a new implementation for the connection of this port.
>         * If this port is not connected, a new connection is created.
>         */
>        BufferPortBase<T>& operator=(BufferInterface<T>* impl);
> 

>
> Oh... it's your compiler on os-x ? I saw that the unit test that tests this
> function has a #ifndef OROPKG_OS_MACOSX around it (generictask_test_3.cpp).
>
> In that case you can fall back to :
>
> port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
> ).connect();
>
> before the connection is created.
>
>>
>> I may have found an issue though. Would calling port.ready() on a port
>> (type = WriteBufferPort) shared by multiple threads cause an issue?
>> Certainly adding this line causes a crash. I will get a branch up on
>> gitorious with my stress test shortly, so you can test it out. This is
>> reproducible on gnulinux. S
>
> Oh rats ! The ready() function evaluates the datasource of the connection,
> which is a BufferDataSource, which will read the front() of the buffer for local
> buffers, or test the network in case of corba transports.Now front() in lock-
> free buffers is a complex algorithm and I suspect it was not covered by the
> unit testing (compared to push/pop).
>
> This evaluation was only done for validating the corba side of things. You
> could implement 'bool evaluate() {return true;} in BufferDataSource to see if
> it fixes things. I would do so anyway as it cuts a complex/unnecessary path
> away.
>
> Peter

If you can get my v1-next-logging branch to run (the RTT_COMPONENT_PATH problem I posted earlier about), you can see for yourself. This is gnulinux BTW - TLSF not supported on Mac in multi-thread environment (one of my next jobs to fix).

cd /path/to/build/ocl/logging/tests
./stressBuffers -n 0
// Ctrl-C to quit, usually it asserts up within 3-20 seconds

Comment out line 90 in stressBuffers.cpp, build and run. Runs nicely ...

Will try above fix today. I need this fixed, causing issues for upcoming demo ... :-(

If get chance, will try BufferLocked also. We aren't anywhere near CPU bound, but after locking inside AtomicQueue (and having my head explode), I understand the overhead issue ...
S

RTT v1 buffer race condition

On Oct 13, 2010, at 08:49 , Stephen Roderick wrote:

> On Oct 13, 2010, at 08:40 , Peter Soetens wrote:
>
>> On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
>>> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
>>>> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>>>>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>>>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>>>>> We've recently been exciting this assert(), amongst others. Could
>>>>>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying to
>>>>>>>>> access the same buffer? We're still trying to make this
>>>>>>>>> reproducible ...
>>>>>>>>>
>>>>>>>>>

>>>>>>>>> deployer-corba-gnulinux:
>>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/
>>>>>>>>> rt t/ Bu fferLockFree.hpp:146: bool RTT::BufferLockFree<T,
>>>>>>>>> ReadPolicy, WritePolicy>::Push(typename
>>>>>>>>> RTT::WriteInterface<T>::param_t) [with T =
>>>>>>>>> OCL::logging::LoggingEvent, ReadPolicy = RTT::NonBlockingPolicy,
>>>>>>>>> WritePolicy = RTT::NonBlockingPolicy]: Assertion `false && "Race
>>>>>>>>> detected in Push()"' failed. 

>>>>>>>>>
>>>>>>>>> Backtrace attached (from gnulinux).
>>>>>>>>> S
>>>>>>>>
>>>>>>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
>>>>>>>> It can handle any number of threads, but can store no more than 65535
>>>>>>>> elements.
>>>>>>>>
>>>>>>>> You could backport it to 1.x, but the front() function has been
>>>>>>>> removed from the 2.x API. Since its single reader, you could emulate
>>>>>>>> it though or leave it empty (the RTT itself does not use Buffers, so
>>>>>>>> only user code would be harmed).
>>>>>>>
>>>>>>> Understood, but for the moment, is it theoretically possible/likely
>>>>>>> that excessive threads could cause the above?
>>>>>>>
>>>>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>>>>> logging messages per second in some cases. This might be an option if
>>>>>>> this turns out to be a problem. S
>>>>>>
>>>>>> In that case, you might try to use the BufferLocked implementation,
>>>>>> because thousands of messages incur a large copy-overhead in the
>>>>>> lock-free implementation. We solved this performance bug on the 2.x
>>>>>> line, but for 1.x, large buffer sizes are not recommended with the
>>>>>> lock-free versions.
>>>>>
>>>>> We almost have a reproducible test case for this. Still using LockFree,
>>>>> pushing about 3000 logging events per second from 12 test components,
>>>>> mix of RT and non-RT, all Activity's, with a single 200-element buffer.
>>>>> Either get the above or
>>>>>
>>>>>
>>>>> stressLogging:
>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/
>>>>> Me moryPool.hpp:389: bool RTT::FixedSizeMemoryPool<T>::deallocate(T*)
>>>>> [with T = OCL::logging::LoggingEvent]: Assertion `false &&
>>>>> "Deallocating more elements than allocated !"' failed. 

>>>>>
>>>>> Will get you test case as soon as is good. We increased
>>>>> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
>>>>
>>>> You mean: ORONUM_OS_MAX_THREADS = 64 ?
>>>>
>>>>> How do you substitute use of BufferLocked in v1? It is not at all
>>>>> obvious from any of the doc's. S
>>>>
>>>> It's quite simple:
>>>>
>>>> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>>>>
>>>> Can be done before or after the port is connected to other ports.
>>>
>>> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
>>> RTT::WriteBufferPort).
>>
>> Did you try it ? I'm using this operator in BufferPortBase<T>:
>>
>>
>>       /**
>>        * Provide a new implementation for the connection of this port.
>>        * If this port is not connected, a new connection is created.
>>        */
>>       BufferPortBase<T>& operator=(BufferInterface<T>* impl);
>> 

>>
>> Oh... it's your compiler on os-x ? I saw that the unit test that tests this
>> function has a #ifndef OROPKG_OS_MACOSX around it (generictask_test_3.cpp).
>>
>> In that case you can fall back to :
>>
>> port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
>> ).connect();
>>
>> before the connection is created.
>>
>>>
>>> I may have found an issue though. Would calling port.ready() on a port
>>> (type = WriteBufferPort) shared by multiple threads cause an issue?
>>> Certainly adding this line causes a crash. I will get a branch up on
>>> gitorious with my stress test shortly, so you can test it out. This is
>>> reproducible on gnulinux. S
>>
>> Oh rats ! The ready() function evaluates the datasource of the connection,
>> which is a BufferDataSource, which will read the front() of the buffer for local
>> buffers, or test the network in case of corba transports.Now front() in lock-
>> free buffers is a complex algorithm and I suspect it was not covered by the
>> unit testing (compared to push/pop).
>>
>> This evaluation was only done for validating the corba side of things. You
>> could implement 'bool evaluate() {return true;} in BufferDataSource to see if
>> it fixes things. I would do so anyway as it cuts a complex/unnecessary path
>> away.
>>
>> Peter
>
> If you can get my v1-next-logging branch to run (the RTT_COMPONENT_PATH problem I posted earlier about), you can see for yourself. This is gnulinux BTW - TLSF not supported on Mac in multi-thread environment (one of my next jobs to fix).
>
> cd /path/to/build/ocl/logging/tests
> ./stressBuffers -n 0
> // Ctrl-C to quit, usually it asserts up within 3-20 seconds
>
> Comment out line 90 in stressBuffers.cpp, build and run. Runs nicely ...
>
> Will try above fix today. I need this fixed, causing issues for upcoming demo ... :-(

That appears to fix it. I'll run many more tests today, but the evaluate() appears to be the issue. I can also modify the logging implementation to avoid the ready() call anyway.

So will this issue potentially affect any user that makes heavy multi-thread use of buffers, or this is a special case?

What affect will the "empty" evaluate have on CORBA connections? I have only one or two cases of buffers over COBRA IIRC, and they aren't in this immediate project so shouldn't be a big deal.

Thanks for the quick help!
S

RTT v1 buffer race condition

On Wednesday 13 October 2010 15:32:13 S Roderick wrote:
> On Oct 13, 2010, at 08:49 , Stephen Roderick wrote:
> > On Oct 13, 2010, at 08:40 , Peter Soetens wrote:
> >> On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
> >>> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
> >>>> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
> >>>>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
> >>>>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
> >>>>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
> >>>>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
> >>>>>>>>> We've recently been exciting this assert(), amongst others. Could
> >>>>>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying
> >>>>>>>>> to access the same buffer? We're still trying to make this
> >>>>>>>>> reproducible ...
> >>>>>>>>>
> >>>>>>>>>

> >>>>>>>>> deployer-corba-gnulinux:
> >>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/BufferLockFree.hpp:146:
> >>>>>>>>> bool RTT::BufferLockFree<T, ReadPolicy, WritePolicy>::Push(typename
> >>>>>>>>> RTT::WriteInterface<T>::param_t) [with T = OCL::logging::LoggingEvent,
> >>>>>>>>> ReadPolicy = RTT::NonBlockingPolicy, WritePolicy =
> >>>>>>>>> RTT::NonBlockingPolicy]: Assertion `false && "Race detected in
> >>>>>>>>> Push()"' failed.

> >>>>>>>>>
> >>>>>>>>> Backtrace attached (from gnulinux).
> >>>>>>>>> S
> >>>>>>>>
> >>>>>>>> 2.x has a Multi-writer/Single reader lock-free buffer
> >>>>>>>> implementation. It can handle any number of threads, but can
> >>>>>>>> store no more than 65535 elements.
> >>>>>>>>
> >>>>>>>> You could backport it to 1.x, but the front() function has been
> >>>>>>>> removed from the 2.x API. Since it's single-reader, you could
> >>>>>>>> emulate it though, or leave it empty (the RTT itself does not use
> >>>>>>>> Buffers, so only user code would be harmed).
> >>>>>>>
> >>>>>>> Understood, but for the moment, is it theoretically possible/likely
> >>>>>>> that excessive threads could cause the above?
> >>>>>>>
> >>>>>>> I doubt we'd exceed 64k elements, but we do send several thousand
> >>>>>>> logging messages per second in some cases. This might be an option
> >>>>>>> if this turns out to be a problem. S
> >>>>>>
> >>>>>> In that case, you might try to use the BufferLocked implementation,
> >>>>>> because thousands of messages incur a large copy-overhead in the
> >>>>>> lock-free implementation. We solved this performance bug on the 2.x
> >>>>>> line, but for 1.x, large buffer sizes are not recommended with the
> >>>>>> lock-free versions.
> >>>>>
> >>>>> We almost have a reproducible test case for this. Still using
> >>>>> LockFree, pushing about 3000 logging events per second from 12 test
> >>>>> components, mix of RT and non-RT, all Activity's, with a single
> >>>>> 200-element buffer. Either get the above or
> >>>>>
> >>>>>
> >>>>> stressLogging:
> >>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/MemoryPool.hpp:389:
> >>>>> bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T =
> >>>>> OCL::logging::LoggingEvent]: Assertion `false &&
> >>>>> "Deallocating more elements than allocated !"' failed.

> >>>>>
> >>>>> Will get you test case as soon as is good. We increased
> >>>>> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
> >>>>
> >>>> You mean: ORONUM_OS_MAX_THREADS = 64 ?
> >>>>
> >>>>> How do you substitute use of BufferLocked in v1? It is not at all
> >>>>> obvious from any of the doc's. S
> >>>>
> >>>> It's quite simple:
> >>>>
> >>>> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
> >>>>
> >>>> Can be done before or after the port is connected to other ports.
> >>>
> >>> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
> >>> RTT::WriteBufferPort).
> >>
> >> Did you try it ? I'm using this operator in BufferPortBase<T>:
> >>
> >>
> >> 
> >>       /**
> >>       
> >>        * Provide a new implementation for the connection of this port.
> >>        * If this port is not connected, a new connection is created.
> >>        */
> >>       
> >>       BufferPortBase<T>& operator=(BufferInterface<T>* impl);
> >> 
> >> 

> >>
> >> Oh... it's your compiler on os-x ? I saw that the unit test that tests
> >> this function has a #ifndef OROPKG_OS_MACOSX around it
> >> (generictask_test_3.cpp).
> >>
> >> In that case you can fall back to :
> >>
> >> port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
> >> ).connect();
> >>
> >> before the connection is created.
> >>
> >>> I may have found an issue though. Would calling port.ready() on a port
> >>> (type = WriteBufferPort) shared by multiple threads cause an issue?
> >>> Certainly adding this line causes a crash. I will get a branch up on
> >>> gitorious with my stress test shortly, so you can test it out. This is
> >>> reproducible on gnulinux. S
> >>
> >> Oh rats ! The ready() function evaluates the datasource of the
> >> connection, which is a BufferDataSource, which will read the front() of
> >> the buffer for local buffers, or test the network in case of corba
> >> transports. Now front() in lock-free buffers is a complex algorithm and
> >> I suspect it was not covered by the unit testing (compared to
> >> push/pop).
> >>
> >> This evaluation was only done for validating the corba side of things.
> >> You could implement 'bool evaluate() { return true; }' in
> >> BufferDataSource to see if it fixes things. I would do so anyway as it
> >> cuts a complex/unnecessary path away.
> >>
> >> Peter
> >
> > If you can get my v1-next-logging branch to run (the RTT_COMPONENT_PATH
> > problem I posted earlier about), you can see for yourself. This is
> > gnulinux BTW - TLSF not supported on Mac in multi-thread environment
> > (one of my next jobs to fix).
> >
> > cd /path/to/build/ocl/logging/tests
> > ./stressBuffers -n 0
> > // Ctrl-C to quit, usually it asserts up within 3-20 seconds
> >
> > Comment out line 90 in stressBuffers.cpp, build and run. Runs nicely ...
> >
> > Will try above fix today. I need this fixed, causing issues for upcoming
> > demo ... :-(
>
> That appears to fix it. I'll run many more tests today, but the evaluate()
> appears to be the issue. I can also modify the logging implementation to
> avoid the ready() call anyway.
>
> So will this issue potentially affect any user that makes heavy
> multi-thread use of buffers, or this is a special case?

Since push/pop are heavily unit-tested, I believe it is a bug/race in front().

>
> What effect will the "empty" evaluate have on CORBA connections? I have
> only one or two cases of buffers over CORBA IIRC, and they aren't in this
> immediate project so shouldn't be a big deal.

The point is that for corba connections, the evaluate-within-ready will do the
right thing, because it will be done on another object. BufferDataSource is
only for local buffers. So it was *really* a useless call to front().

Peter

RTT v1 buffer race condition

On Oct 13, 2010, at 11:21 , Peter Soetens wrote:

> On Wednesday 13 October 2010 15:32:13 S Roderick wrote:
>> On Oct 13, 2010, at 08:49 , Stephen Roderick wrote:
>>> On Oct 13, 2010, at 08:40 , Peter Soetens wrote:
>>>> On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
>>>>> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
>>>>>> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>>>>>>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>>>>>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>>>>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>>>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>>>>>>> We've recently been exciting this assert(), amongst others. Could
>>>>>>>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying
>>>>>>>>>>> to access the same buffer? We're still trying to make this
>>>>>>>>>>> reproducible ...
>>>>>>>>>>>
>>>>>>>>>>>

>>>>>>>>>>> deployer-corba-gnulinux:
>>>>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/BufferLockFree.hpp:146:
>>>>>>>>>>> bool RTT::BufferLockFree<T, ReadPolicy, WritePolicy>::Push(typename
>>>>>>>>>>> RTT::WriteInterface<T>::param_t) [with T = OCL::logging::LoggingEvent,
>>>>>>>>>>> ReadPolicy = RTT::NonBlockingPolicy, WritePolicy =
>>>>>>>>>>> RTT::NonBlockingPolicy]: Assertion `false && "Race detected in
>>>>>>>>>>> Push()"' failed.

>>>>>>>>>>>
>>>>>>>>>>> Backtrace attached (from gnulinux).
>>>>>>>>>>> S
>>>>>>>>>>
>>>>>>>>>> 2.x has a Multi-writer/Single reader lock-free buffer
>>>>>>>>>> implementation. It can handle any number of threads, but can
>>>>>>>>>> store no more than 65535 elements.
>>>>>>>>>>
>>>>>>>>>> You could backport it to 1.x, but the front() function has been
>>>>>>>>>> removed from the 2.x API. Since it's single-reader, you could
>>>>>>>>>> emulate it though, or leave it empty (the RTT itself does not use
>>>>>>>>>> Buffers, so only user code would be harmed).
>>>>>>>>>
>>>>>>>>> Understood, but for the moment, is it theoretically possible/likely
>>>>>>>>> that excessive threads could cause the above?
>>>>>>>>>
>>>>>>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>>>>>>> logging messages per second in some cases. This might be an option
>>>>>>>>> if this turns out to be a problem. S
>>>>>>>>
>>>>>>>> In that case, you might try to use the BufferLocked implementation,
>>>>>>>> because thousands of messages incur a large copy-overhead in the
>>>>>>>> lock-free implementation. We solved this performance bug on the 2.x
>>>>>>>> line, but for 1.x, large buffer sizes are not recommended with the
>>>>>>>> lock-free versions.
>>>>>>>
>>>>>>> We almost have a reproducible test case for this. Still using
>>>>>>> LockFree, pushing about 3000 logging events per second from 12 test
>>>>>>> components, mix of RT and non-RT, all Activity's, with a single
>>>>>>> 200-element buffer. Either get the above or
>>>>>>>
>>>>>>>
>>>>>>> stressLogging:
>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/MemoryPool.hpp:389:
>>>>>>> bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T =
>>>>>>> OCL::logging::LoggingEvent]: Assertion `false &&
>>>>>>> "Deallocating more elements than allocated !"' failed.

>>>>>>>
>>>>>>> Will get you test case as soon as is good. We increased
>>>>>>> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
>>>>>>
>>>>>> You mean: ORONUM_OS_MAX_THREADS = 64 ?
>>>>>>
>>>>>>> How do you substitute use of BufferLocked in v1? It is not at all
>>>>>>> obvious from any of the doc's. S
>>>>>>
>>>>>> It's quite simple:
>>>>>>
>>>>>> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>>>>>>
>>>>>> Can be done before or after the port is connected to other ports.
>>>>>
>>>>> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
>>>>> RTT::WriteBufferPort).
>>>>
>>>> Did you try it ? I'm using this operator in BufferPortBase<T>:
>>>>
>>>>
>>>> 
>>>>      /**
>>>> 
>>>>       * Provide a new implementation for the connection of this port.
>>>>       * If this port is not connected, a new connection is created.
>>>>       */
>>>> 
>>>>      BufferPortBase<T>& operator=(BufferInterface<T>* impl);
>>>> 
>>>> 

>>>>
>>>> Oh... it's your compiler on os-x ? I saw that the unit test that tests
>>>> this function has a #ifndef OROPKG_OS_MACOSX around it
>>>> (generictask_test_3.cpp).
>>>>
>>>> In that case you can fall back to :
>>>>
>>>> port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
>>>> ).connect();
>>>>
>>>> before the connection is created.
>>>>
>>>>> I may have found an issue though. Would calling port.ready() on a port
>>>>> (type = WriteBufferPort) shared by multiple threads cause an issue?
>>>>> Certainly adding this line causes a crash. I will get a branch up on
>>>>> gitorious with my stress test shortly, so you can test it out. This is
>>>>> reproducible on gnulinux. S
>>>>
>>>> Oh rats ! The ready() function evaluates the datasource of the
>>>> connection, which is a BufferDataSource, which will read the front() of
>>>> the buffer for local buffers, or test the network in case of corba
>>>> transports. Now front() in lock-free buffers is a complex algorithm and
>>>> I suspect it was not covered by the unit testing (compared to
>>>> push/pop).
>>>>
>>>> This evaluation was only done for validating the corba side of things.
>>>> You could implement 'bool evaluate() { return true; }' in
>>>> BufferDataSource to see if it fixes things. I would do so anyway as it
>>>> cuts a complex/unnecessary path away.
>>>>
>>>> Peter
>>>
>>> If you can get my v1-next-logging branch to run (the RTT_COMPONENT_PATH
>>> problem I posted earlier about), you can see for yourself. This is
>>> gnulinux BTW - TLSF not supported on Mac in multi-thread environment
>>> (one of my next jobs to fix).
>>>
>>> cd /path/to/build/ocl/logging/tests
>>> ./stressBuffers -n 0
>>> // Ctrl-C to quit, usually it asserts up within 3-20 seconds
>>>
>>> Comment out line 90 in stressBuffers.cpp, build and run. Runs nicely ...
>>>
>>> Will try above fix today. I need this fixed, causing issues for upcoming
>>> demo ... :-(
>>
>> That appears to fix it. I'll run many more tests today, but the evaluate()
>> appears to be the issue. I can also modify the logging implementation to
>> avoid the ready() call anyway.
>>
>> So will this issue potentially affect any user that makes heavy
>> multi-thread use of buffers, or this is a special case?
>
> Since push/pop are heavily unit-tested, I believe it is a bug/race in front().
>
>>
>> What effect will the "empty" evaluate have on CORBA connections? I have
>> only one or two cases of buffers over CORBA IIRC, and they aren't in this
>> immediate project so shouldn't be a big deal.
>
> The point is that for corba connections, the evaluate-within-ready will do the
> right thing, because it will be done on another object. BufferDataSource is
> only for local buffers. So it was *really* a useless call to front().

Patch attached. Do you want a bug report filed?
S

RTT v1 buffer race condition

On Thu, Oct 14, 2010 at 2:21 PM, Stephen Roderick <kiwi [dot] net [..] ...> wrote:
> On Oct 13, 2010, at 11:21 , Peter Soetens wrote:
>
>> On Wednesday 13 October 2010 15:32:13 S Roderick wrote:
>>> On Oct 13, 2010, at 08:49 , Stephen Roderick wrote:
>>>> On Oct 13, 2010, at 08:40 , Peter Soetens wrote:
>>>>> On Wednesday 13 October 2010 12:36:47 Stephen Roderick wrote:
>>>>>> On Oct 13, 2010, at 03:02 , Peter Soetens wrote:
>>>>>>> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>>>>>>>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>>>>>>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>>>>>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>>>>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>>>>>>>> We've recently been exciting this assert(), amongst others. Could
>>>>>>>>>>>> this occur due to more than ORO_OS_MAX_THREADS components trying
>>>>>>>>>>>> to access the same buffer? We're still trying to make this
>>>>>>>>>>>> reproducible ...
>>>>>>>>>>>>
>>>>>>>>>>>>

>>>>>>>>>>>> deployer-corba-gnulinux:
>>>>>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/BufferLockFree.hpp:146:
>>>>>>>>>>>> bool RTT::BufferLockFree<T, ReadPolicy, WritePolicy>::Push(typename
>>>>>>>>>>>> RTT::WriteInterface<T>::param_t) [with T = OCL::logging::LoggingEvent,
>>>>>>>>>>>> ReadPolicy = RTT::NonBlockingPolicy, WritePolicy =
>>>>>>>>>>>> RTT::NonBlockingPolicy]: Assertion `false && "Race detected in
>>>>>>>>>>>> Push()"' failed.

>>>>>>>>>>>>
>>>>>>>>>>>> Backtrace attached (from gnulinux).
>>>>>>>>>>>> S
>>>>>>>>>>>
>>>>>>>>>>> 2.x has a Multi-writer/Single reader lock-free buffer
>>>>>>>>>>> implementation. It can handle any number of threads, but can
>>>>>>>>>>> store no more than 65535 elements.
>>>>>>>>>>>
>>>>>>>>>>> You could backport it to 1.x, but the front() function has been
>>>>>>>>>>> removed from the 2.x API. Since it's single-reader, you could
>>>>>>>>>>> emulate it though, or leave it empty (the RTT itself does not use
>>>>>>>>>>> Buffers, so only user code would be harmed).
>>>>>>>>>>
>>>>>>>>>> Understood, but for the moment, is it theoretically possible/likely
>>>>>>>>>> that excessive threads could cause the above?
>>>>>>>>>>
>>>>>>>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>>>>>>>> logging messages per second in some cases. This might be an option
>>>>>>>>>> if this turns out to be a problem. S
>>>>>>>>>
>>>>>>>>> In that case, you might try to use  the BufferLocked implementation,
>>>>>>>>> because thousands of messages incur a large copy-overhead in the
>>>>>>>>> lock-free implementation. We solved this performance bug on the 2.x
>>>>>>>>> line, but for 1.x, large buffer sizes are not recommended with the
>>>>>>>>> lock-free versions.
>>>>>>>>
>>>>>>>> We almost have a reproducible test case for this. Still using
>>>>>>>> LockFree, pushing about 3000 logging events per second from 12 test
>>>>>>>> components, mix of RT and non-RT, all Activity's, with a single
>>>>>>>> 200-element buffer. Either get the above or
>>>>>>>>
>>>>>>>>
>>>>>>>> stressLogging:
>>>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/MemoryPool.hpp:389:
>>>>>>>> bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T =
>>>>>>>> OCL::logging::LoggingEvent]: Assertion `false &&
>>>>>>>> "Deallocating more elements than allocated !"' failed.

>>>>>>>>
>>>>>>>> Will get you test case as soon as is good. We increased
>>>>>>>> ORO_OS_CONC_ACCESS from 8 to 64, but no apparent effect.
>>>>>>>
>>>>>>> You mean: ORONUM_OS_MAX_THREADS = 64 ?
>>>>>>>
>>>>>>>> How do you substitute use of BufferLocked in v1? It is not at all
>>>>>>>> obvious from any of the doc's. S
>>>>>>>
>>>>>>> It's quite simple:
>>>>>>>
>>>>>>> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>>>>>>>
>>>>>>> Can be done before or after the port is connected to other ports.
>>>>>>
>>>>>> That doesn't work. RTT::BufferLocked is a buffer, not a port (like
>>>>>> RTT::WriteBufferPort).
>>>>>
>>>>> Did you try it ? I'm using this operator in BufferPortBase<T>:
>>>>>
>>>>>
>>>>>
>>>>>      /**
>>>>>
>>>>>       * Provide a new implementation for the connection of this port.
>>>>>       * If this port is not connected, a new connection is created.
>>>>>       */
>>>>>
>>>>>      BufferPortBase<T>& operator=(BufferInterface<T>* impl);
>>>>>
>>>>> 

>>>>>
>>>>> Oh... it's your compiler on os-x ? I saw that the unit test that tests
>>>>> this function has a #ifndef OROPKG_OS_MACOSX around it
>>>>> (generictask_test_3.cpp).
>>>>>
>>>>> In that case you can fall back to :
>>>>>
>>>>> port.createConnection( new BufferLocked<OCL::logging::LoggingEvent>(200)
>>>>> ).connect();
>>>>>
>>>>> before the connection is created.
>>>>>
>>>>>> I may have found an issue though. Would calling port.ready()  on a port
>>>>>> (type = WriteBufferPort) shared by multiple threads cause an issue?
>>>>>> Certainly adding this line causes a crash. I will get a branch up on
>>>>>> gitorious with my stress test shortly, so you can test it out. This is
>>>>>> reproducible on gnulinux. S
>>>>>
>>>>> Oh rats ! The ready() function evaluates the datasource of the
>>>>> connection, which is a BufferDataSource, which will read the front() of
>>>>> the buffer for local buffers, or test the network in case of corba
>>>>> transports. Now front() in lock-free buffers is a complex algorithm and
>>>>> I suspect it was not covered by the unit testing (compared to
>>>>> push/pop).
>>>>>
>>>>> This evaluation was only done for validating the corba side of things.
>>>>> You could implement 'bool evaluate() { return true; }' in
>>>>> BufferDataSource to see if it fixes things. I would do so anyway as it
>>>>> cuts a complex/unnecessary path away.
>>>>>
>>>>> Peter
>>>>
>>>> If you can get my v1-next-logging branch to run (the RTT_COMPONENT_PATH
>>>> problem I posted earlier about), you can see for yourself. This is
>>>> gnulinux BTW - TLSF not supported on Mac in multi-thread environment
>>>> (one of my next jobs to fix).
>>>>
>>>> cd /path/to/build/ocl/logging/tests
>>>> ./stressBuffers -n 0
>>>> // Ctrl-C to quit, usually it asserts up within 3-20 seconds
>>>>
>>>> Comment out line 90 in stressBuffers.cpp, build and run. Runs nicely ...
>>>>
>>>> Will try above fix today. I need this fixed, causing issues for upcoming
>>>> demo ... :-(
>>>
>>> That appears to fix it. I'll run many more tests today, but the evaluate()
>>> appears to be the issue. I can also modify the logging implementation to
>>> avoid the ready() call anyway.
>>>
>>> So will this issue potentially affect any user that makes heavy
>>> multi-thread use of buffers, or this is a special case?
>>
>> Since push/pop are heavily unit-tested, I believe it is a bug/race in front().
>>
>>>
>>> What effect will the "empty" evaluate have on CORBA connections? I have
>>> only one or two cases of buffers over CORBA IIRC, and they aren't in this
>>> immediate project so shouldn't be a big deal.
>>
>> The point is that for corba connections, the evaluate-within-ready will do the
>> right thing, because it will be done on another object. BufferDataSource is
>> only for local buffers. So it was *really* a useless call to front().
>
> Patch attached. Do you want a bug report filed?

Nope. I had a similar patch on my tree too. It will propagate to the
stable branches.

Peter
--
Orocos-Dev mailing list
Orocos-Dev [..] ...
http://lists.mech.kuleuven.be/mailman/listinfo/orocos-dev

RTT v1 buffer race condition

On Oct 13, 2010, at 03:02 , Peter Soetens wrote:

> On Tuesday 12 October 2010 22:27:32 Stephen Roderick wrote:
>> On Oct 12, 2010, at 09:47 , Peter Soetens wrote:
>>> On Tuesday 12 October 2010 13:01:07 Stephen Roderick wrote:
>>>> On Oct 12, 2010, at 03:28 , Peter Soetens wrote:
>>>>> On Tuesday 12 October 2010 02:48:35 S Roderick wrote:
>>>>>> We've recently been exciting this assert(), amongst others. Could this
>>>>>> occur due to more than ORO_OS_MAX_THREADS components trying to access
>>>>>> the same buffer? We're still trying to make this reproducible ...
>>>>>>
>>>>>>

>>>>>> deployer-corba-gnulinux:
>>>>>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/BufferLockFree.hpp:146:
>>>>>> bool RTT::BufferLockFree<T, ReadPolicy, WritePolicy>::Push(typename
>>>>>> RTT::WriteInterface<T>::param_t) [with T = OCL::logging::LoggingEvent,
>>>>>> ReadPolicy = RTT::NonBlockingPolicy, WritePolicy =
>>>>>> RTT::NonBlockingPolicy]: Assertion `false && "Race detected in
>>>>>> Push()"' failed.

>>>>>>
>>>>>> Backtrace attached (from gnulinux).
>>>>>> S
>>>>>
>>>>> 2.x has a Multi-writer/Single reader lock-free buffer implementation.
>>>>> It can handle any number of threads, but can store no more than 65535
>>>>> elements.
>>>>>
>>>>> You could backport it to 1.x, but the front() function has been removed
>>>>> from the 2.x API. Since it's single-reader, you could emulate it though,
>>>>> or leave it empty (the RTT itself does not use Buffers, so only user
>>>>> code would be harmed).
>>>>
>>>> Understood, but for the moment, is it theoretically possible/likely that
>>>> excessive threads could cause the above?
>>>>
>>>> I doubt we'd exceed 64k elements, but we do send several thousand
>>>> logging messages per second in some cases. This might be an option if
>>>> this turns out to be a problem. S
>>>
>>> In that case, you might try to use the BufferLocked implementation,
>>> because thousands of messages incur a large copy-overhead in the
>>> lock-free implementation. We solved this performance bug on the 2.x
>>> line, but for 1.x, large buffer sizes are not recommended with the
>>> lock-free versions.
>>
>> We almost have a reproducible test case for this. Still using LockFree,
>> pushing about 3000 logging events per second from 12 test components, mix
>> of RT and non-RT, all Activity's, with a single 200-element buffer.
>> Either get the above or
>>
>>
>> stressLogging:
>> /home/sroderick/nrl/robotics/build/orocos-rtt/../../install/include/rtt/MemoryPool.hpp:389:
>> bool RTT::FixedSizeMemoryPool<T>::deallocate(T*) [with T =
>> OCL::logging::LoggingEvent]: Assertion `false && "Deallocating more
>> elements than allocated !"' failed.

>>
>> Will get you test case as soon as is good. We increased ORO_OS_CONC_ACCESS
>> from 8 to 64, but no apparent effect.
>
> You mean: ORONUM_OS_MAX_THREADS = 64 ?

Which is defined by OS_MAX_CONC_ACCESS in src/CMakeLists.txt, yes.

>> How do you substitute use of BufferLocked in v1? It is not at all obvious
>> from any of the doc's. S
>
> It's quite simple:
>
> port = new BufferLocked<OCL::logging::LoggingEvent>(200); // size=200
>
> Can be done before or after the port is connected to other ports.

Thx, will see if that prevents crashes.
S