[Bug 724] New: Timer thread locking up on quit

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=724

Summary: Timer thread locking up on quit
Product: RTT
Version: rtt-trunk
Platform: i386 Compatible
OS/Version: GNU/Linux
Status: NEW
Severity: normal
Priority: P3
Component: Real-Time Toolkit (RTT)
AssignedTo: orocos-dev [..] ...
ReportedBy: kiwi [dot] net [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

I am not sure this is a bug - it may be due to our code.

Running on a VIA C7 with Ubuntu Hardy + PREEMPT_RT. This only happens when we
are talking to hardware, hence I think it might be caused by us. When using
simulated hardware on the same platform, it quits cleanly. Having said that, no
component complains of not being able to quit prior to this so ...?


(gdb) bt
#0 0xb7f0c410 in __kernel_vsyscall ()
#1 0xb72ec589 in __lll_lock_wait () from /lib/tls/i686/cmov/libpthread.so.0
#2 0xb72e7bb4 in _L_lock_236 () from /lib/tls/i686/cmov/libpthread.so.0
#3 0xb72e760b in pthread_mutex_lock () from /lib/tls/i686/cmov/libpthread.so.0
#4 0xb7baa41e in rtos_mutex_rec_lock (m=0x80cd40c) at /g/o/rtt/src/os/gnulinux/fosi.h:233
#5 0xb7bad765 in RTT::OS::MutexRecursive::lock (this=0x80cd408) at /g/o/rtt/src/os/Mutex.hpp:237
#6 0xb7ab9f08 in MutexLock (this=0xbfbd99c0, mutex=@0x80cd408) at /g/o/rtt/src/os/MutexLock.hpp:63
#7 0xb7ba9da7 in RTT::TimerThread::removeActivity (this=0x80cd378, t=0x80cd338) at /g/o/rtt/src/TimerThread.cpp:114
#8 0xb7ad3e0f in RTT::PeriodicActivity::stop (this=0x80cd338) at /g/o/rtt/src/PeriodicActivity.cpp:133
#9 0xb7aa117a in RTT::ExecutionEngine::stop (this=0x80bb040) at /g/o/rtt/src/ExecutionEngine.cpp:194
#10 0xb7ba66ed in RTT::TaskCore::stop (this=0x80c0bb0) at /g/o/rtt/src/TaskCore.cpp:173
#11 0xb6f942e4 in OCL::DeploymentComponent::stopComponents (this=0xbfbd9e00) at /g/o/ocl/deployment/DeploymentComponent.cpp:1117
#12 0xb6f9ac66 in OCL::DeploymentComponent::kickOutAll (this=0xbfbd9e00) at /g/o/ocl/deployment/DeploymentComponent.cpp:415
#13 0xb6fa2ceb in ~DeploymentComponent (this=0xbfbd9e00) at /g/o/ocl/deployment/DeploymentComponent.cpp:271
#14 0xb7eda232 in ~CorbaDeploymentComponent (this=0xbfbd9e00) at /g/o/ocl/deployment/CorbaDeploymentComponent.cpp:96
#15 0x08059aa6 in ORO_main_impl (argc=7, argv=0xbfbda194) at /g/o/ocl/bin/deployer-corba.cpp:105
#16 0x08059dda in main (argc=7, argv=0xbfbda194) at /g/o/ocl/bin/deployer-corba.cpp:44
(gdb)

[Bug 724] Timer thread locking up on quit

S Roderick <kiwi [dot] net [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution| |INVALID
Status|NEW |RESOLVED

--- Comment #2 from S Roderick <kiwi [dot] net [..] ...> 2009-11-04 18:32:38 ---
See comment. Not an Orocos problem.

[Bug 724] Timer thread locking up on quit

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |peter [..] ...

--- Comment #1 from Peter Soetens <peter [..] ...> 2009-11-04 13:14:05 ---
(In reply to comment #0)
> I am not sure this is a bug - it may be due to our code.
>
> Running on a VIA C7 with Ubuntu Hardy + PREEMPT_RT. This only happens when we
> are talking to hardware, hence I think it might be caused by us. When using
> simulated hardware on the same platform, it quits cleanly. Having said that, no
> component complains of not being able to quit prior to this so ...?

I'd need the backtrace of the other threads too. Note: TimerThread has nothing
to do with Timer/TimerComponent. When we stop() a PeriodicActivity, we wait
until step() returns. If step() is stuck, we block on this mutex forever. This
is similar to the stop() case we discussed earlier, where a timeout is desired
so that the caller is not blocked too (i.e. two threads blocked instead of one).

Peter
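
To illustrate the timeout idea mentioned above, here is a minimal sketch in
standard C++11. It is not the RTT implementation: SketchActivity, its step()
and stop() members, and stop_while_step_blocked() are hypothetical names, and
std::timed_mutex stands in for the recursive RTT mutex seen in the backtrace.
The point is only that a timed lock lets stop() report failure instead of
leaving a second thread blocked behind a stuck step().

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a periodic activity; this is NOT the RTT API.
class SketchActivity {
public:
    // Runs one step while holding the mutex, as a timer thread might.
    // `work` simulates user code and may block for a long time.
    template <class Work>
    void step(Work work) {
        std::lock_guard<std::timed_mutex> guard(mutex_);
        work();
    }

    // Waits at most `timeout` for a running step() to release the mutex.
    // Returns true if the activity was stopped, false if the wait timed out.
    bool stop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::timed_mutex> lock(mutex_, timeout);
        if (!lock.owns_lock())
            return false;  // step() is still stuck; the caller is not blocked
        running_ = false;
        return true;
    }

    bool running() const { return running_; }

private:
    std::timed_mutex mutex_;
    std::atomic<bool> running_{true};
};

// Demonstration helper (hypothetical): step() is stuck for `stuck_ms`
// while another thread calls stop() with a timeout of `timeout_ms`.
bool stop_while_step_blocked(int stuck_ms, int timeout_ms) {
    SketchActivity activity;
    std::atomic<bool> entered{false};
    std::thread timer([&] {
        activity.step([&] {
            entered = true;
            std::this_thread::sleep_for(std::chrono::milliseconds(stuck_ms));
        });
    });
    while (!entered)
        std::this_thread::yield();  // wait until step() holds the mutex
    bool stopped = activity.stop(std::chrono::milliseconds(timeout_ms));
    timer.join();
    return stopped;
}
```

With these assumptions, stop_while_step_blocked(300, 50) gives up after the
timeout and returns false, while stop_while_step_blocked(10, 2000) returns
true once the short step() has finished. Build with thread support enabled
(e.g. -pthread with GCC).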

[Bug 724] Timer thread locking up on quit

On Nov 4, 2009, at 07:14 AM, Peter Soetens wrote:

> https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=724
>
> Peter Soetens <peter [..] ...> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |peter [..] ...
>
> --- Comment #1 from Peter Soetens <peter [..] ...> 2009-11-04 13:14:05 ---
> (In reply to comment #0)
>> I am not sure this is a bug - it may be due to our code.
>>
>> Running on a VIA C7 with Ubuntu Hardy + PREEMPT_RT. This only happens when we
>> are talking to hardware, hence I think it might be caused by us. When using
>> simulated hardware on the same platform, it quits cleanly. Having said that, no
>> component complains of not being able to quit prior to this so ...?
>
> I'd need the backtrace of the other threads too. Note: TimerThread has nothing
> to do with Timer/TimerComponent. When we stop() a PeriodicActivity, we wait
> until step() returns. If step() is stuck, we block on this mutex forever. This
> is similar to the stop() case we discussed earlier where a timeout is desired
> to not block the caller too (ie two threads blocked instead of one).
>
> Peter

Like I said ... might be our code!

Some vendor code has a blocking read in it, which we call from a
method that is called from a state machine. In certain dormant states
of the vehicle, the block occurs and if you quit during this time, you
get the nice lockup. Not much Orocos could do about this!

Sorry for the noise :-(
Stephen
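
For readers who hit the same lockup: one common way to keep a blocking read
from stalling shutdown is to wait for data with a timeout before reading. The
sketch below uses POSIX poll() and read(). It assumes the device is reachable
through a plain file descriptor, which may not hold for the vendor code
described above; read_with_timeout(), kTimedOut and demo_read_from_pipe() are
hypothetical names used only for illustration.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cstddef>

// Hypothetical sentinel meaning "no data arrived within the timeout".
const ssize_t kTimedOut = -2;

// Waits up to timeout_ms for fd to become readable, then reads once.
// Returns the number of bytes read (0 means end of file), -1 on error,
// or kTimedOut if nothing arrived in time.
ssize_t read_with_timeout(int fd, void* buf, std::size_t len, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0)
        return kTimedOut;  // caller can check a stop flag and retry or quit
    if (ready < 0)
        return -1;         // poll() failed, errno is set
    return read(fd, buf, len);
}

// Demonstration helper (hypothetical): a pipe stands in for the hardware.
// If write_first is false the pipe stays empty, like a dormant device.
ssize_t demo_read_from_pipe(bool write_first) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    ssize_t result = -1;
    if (!write_first || write(fds[1], "ok", 2) == 2) {
        char buf[8];
        result = read_with_timeout(fds[0], buf, sizeof buf, 100);
    }
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Here demo_read_from_pipe(false) returns kTimedOut after about 100 ms instead
of blocking, and demo_read_from_pipe(true) returns 2. A method called from a
state machine could loop on such a call and return as soon as a stop has been
requested.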