Component hangs on shutdown

I have been trying to debug a strange problem I have with an Orocos
component I have been working on. When I exit the task browser and the
component shuts down, the program seems to hang after writing the following
to the log file:

...
11.113 [ Debug ][Logger] Stopping StartStopManager.
11.113 [ Debug ][Logger] Stopping MainThread.
11.113 [ Info ][Logger] Orocos Logging Deactivated.

I opened up startstop.cpp and it looks like it's hanging in __os_exit() at:

// Stop TimeService if present.
TimeService::Release();

// Stop Main Thread
OS::MainThread::Release();

#ifdef OS_HAVE_MANUAL_CRT
DO_GLOBAL_DTORS();
#endif

Interestingly, the hang only happens when my component uses another
library to do TCP/IP communication with a server process running on another
machine. If I run my component without using the communication library, the
component exits with no problems. Any thoughts on where/how to proceed to
debug this?

Component hangs on shutdown

On Mon, 3 Aug 2009, John Yamokoski wrote:
> I have been trying to debug a strange problem I have with an Orocos
> component I have been working on. When I exit the task browser and the
> component shuts down, the program seems to hang after writing the following
> to the log file:
>
> ...
> 11.113 [ Debug ][Logger] Stopping StartStopManager.
> 11.113 [ Debug ][Logger] Stopping MainThread.
> 11.113 [ Info ][Logger] Orocos Logging Deactivated.
>
> I opened up startstop.cpp and it looks like it's hanging in __os_exit() at,
>
> // Stop TimeService if present.
> TimeService::Release();
>
> // Stop Main Thread
> OS::MainThread::Release();
>
> #ifdef OS_HAVE_MANUAL_CRT
> DO_GLOBAL_DTORS();
> #endif
>
> Interestingly, this hanging only happens when my component uses another
> library to do TCP/IP communication with a server process running on another
> machine. If I run my component without using the communication library, the
> component exits with no problems. Any thoughts on where/how to proceed to
> debug this?

ctrl-c and bt your program in gdb?

HTH,

Klaas

Component hangs on shutdown

On Tue, Aug 4, 2009 at 2:33 AM, Klaas Gadeyne <klaas [dot] gadeyne [..] ...> wrote:

> ctrl-c and bt your program in gdb?
>

Sorry, I know gdb, but I don't know the abbreviation bt. If you mean stepping
through the final moments of my program using gdb, then that was going to be
my next step. I just didn't know if this behavior had been witnessed by
anyone else...

To Roderick,

I assume the external comm library is shutting down completely, but that
might be a bad assumption. I am communicating with a motion capture system
using the manufacturer's SDK. I have poked around in their SDK and
essentially it's a client library using some threads and standard Linux
socket calls. According to their documentation I am calling all the right
functions to uninitialize and shut down their library. If one of the
(standard Linux, non-realtime) threads started by their library was not
shut down properly, could that be preventing something like
"OS::MainThread::Release();" from returning?

Component hangs on shutdown

oh, and Google to the rescue.

'bt' is backtrace in gdb. Got it!

Component hangs on shutdown

FIXED! Not that this impacts anyone on this list, but in case anyone was
interested...

I ran gdb and paid attention to the threads being created and destroyed. I
noticed that when I was using the manufacturer's SDK, it was creating two
threads (which were not created when I did not use the SDK library).
Fortunately I have the source code for their client library. I took a peek
at their uninitialization code, and sure enough, they were not calling
"pthread_join" or anything similar to shut down the threads they created!

Why this was not being done, who knows. I made the change and now my Orocos
component does not hang at shutdown. Lesson: never trust anyone's code,
even commercial code!
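
For readers who hit the same problem, here is a minimal sketch of the kind of change described above: the uninitialization path asks the worker thread to stop and then calls pthread_join on it. The manufacturer's SDK source is not shown in this thread, so every name below (Client, worker_main, client_init, client_uninit) is hypothetical, and the loop body is only a placeholder for the real socket work.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <unistd.h>

// Hypothetical stand-in for an SDK client that owns one background thread.
struct Client {
    pthread_t worker{};
    std::atomic<bool> stop_requested{false};
    std::atomic<bool> worker_exited{false};
    bool started = false;
};

// Thread body: loops until the owner asks it to stop.
static void* worker_main(void* arg) {
    Client* c = static_cast<Client*>(arg);
    while (!c->stop_requested.load()) {
        usleep(1000);  // placeholder for real work, e.g. servicing a socket
    }
    c->worker_exited.store(true);
    return nullptr;
}

// Starts the worker thread. Returns true if the thread was created.
bool client_init(Client& c) {
    c.stop_requested.store(false);
    c.worker_exited.store(false);
    c.started = (pthread_create(&c.worker, nullptr, worker_main, &c) == 0);
    return c.started;
}

// Asks the worker to stop, then waits until it has actually finished.
// Without the join, the thread may still be running when the process
// starts tearing down global state.
void client_uninit(Client& c) {
    if (!c.started) return;
    c.stop_requested.store(true);
    int rc = pthread_join(c.worker, nullptr);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    c.started = false;
}
```

A caller would pair client_init(c) with client_uninit(c) before the component is destroyed. Note that the join only returns promptly if the thread body re-checks the stop flag regularly; a thread stuck in a blocking call needs some other wake-up.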

Component hangs on shutdown

Well, I spoke too soon. It turns out my issue is not fixed. When I run my
Orocos component in gdb, it now shuts down properly without hanging
(regardless of whether my component uses the messy motion capture SDK or
not). However, when I run my component outside of gdb, it still hangs
whenever it has used that SDK; otherwise it always shuts down properly.
My workaround for now is opening another terminal and issuing a kill -9
for the defunct MainThread.

Any ideas on why this works properly in gdb but not out in the wild? Would
it have anything to do with this gdb message:

"[Thread debugging using libthread_db enabled]"

Component hangs on shutdown

On Tue, 4 Aug 2009, John Yamokoski wrote:
> Well, I spoke too soon. Turns out my issue is not fixed. When I run my
> Orocos component in gdb, it now shuts down properly without hanging
> (regardless of whether my component utilizes the messy motion capture SDK or
> not). However, when I run my component outside of gdb it still hangs
> whenever I have to access that SDK, otherwise it always shuts down properly.
> My workaround for now is opening another terminal, and issuing a kill -9
> for the defunct MainThread..
>
> Any ideas on why this works properly in gdb but not out in the wild? Would
> it have anything to do with gdb,

It's probably a race condition in your program that does not occur when you
run it in gdb, because the debugger changes the timing (you might see a
similar effect if you change gcc's optimization level).
Unfortunately, that's very hard to debug :-(

Maybe you can try using gdb's "attach" command though.

HTH,

Klaas
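
One common way a shutdown hang becomes timing-dependent is a thread that sits in a blocking socket call and only notices a stop request if it happens to wake up at the right moment. Whether that is the cause in this thread is not established; the sketch below only illustrates the general pattern, under the assumption of a reader thread that waits with poll() and a short timeout so the stop flag is re-checked regularly and the join is bounded. All names (Reader, reader_main, run_reader_demo) are hypothetical, and a local socketpair stands in for the TCP connection.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <pthread.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical reader state; none of these names come from the SDK or RTT.
struct Reader {
    int fd = -1;                     // socket the thread reads from
    std::atomic<bool> stop{false};   // set by the shutdown path
    std::atomic<int> bytes_seen{0};  // total bytes received so far
};

static void* reader_main(void* arg) {
    Reader* r = static_cast<Reader*>(arg);
    char buf[256];
    while (!r->stop.load()) {
        pollfd p{};
        p.fd = r->fd;
        p.events = POLLIN;
        // Wait at most 50 ms so the stop flag is re-checked regularly.
        int ready = poll(&p, 1, 50);
        if (ready < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            break;                         // unexpected poll error
        }
        if (ready == 0) continue;          // timeout: loop and re-check stop
        if (!(p.revents & POLLIN)) break;  // hang-up or error on the socket
        ssize_t n = recv(r->fd, buf, sizeof buf, 0);
        if (n <= 0) break;                 // peer closed or receive error
        r->bytes_seen.fetch_add(static_cast<int>(n));
    }
    return nullptr;
}

// Sends 4 bytes to the reader, stops it, joins it, and returns the number
// of bytes the reader saw (or -1 if setup failed).
int run_reader_demo() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return -1;
    Reader r;
    r.fd = sv[0];
    pthread_t t;
    if (pthread_create(&t, nullptr, reader_main, &r) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    const char msg[] = "ping";
    if (send(sv[1], msg, 4, 0) != 4) r.stop.store(true);
    // Give the reader up to about one second to consume the message.
    for (int i = 0; i < 200 && !r.stop.load() && r.bytes_seen.load() < 4; ++i) {
        usleep(5000);
    }
    r.stop.store(true);
    int rc = pthread_join(t, nullptr);  // returns within roughly one timeout
    assert(rc == 0);
    (void)rc;
    close(sv[0]);
    close(sv[1]);
    return r.bytes_seen.load();
}
```

The design choice here is to trade a little latency (the 50 ms timeout) for a join that cannot wait forever on a thread stuck in recv().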

Component hangs on shutdown

On Aug 3, 2009, at 16:40 , John Yamokoski wrote:

> I have been trying to debug a strange problem I have with an Orocos
> component I have been working on. When I exit the task browser and
> the component shuts down, the program seems to hang after writing
> the following to the log file:
>
> ...
> 11.113 [ Debug ][Logger] Stopping StartStopManager.
> 11.113 [ Debug ][Logger] Stopping MainThread.
> 11.113 [ Info ][Logger] Orocos Logging Deactivated.
>
> I opened up startstop.cpp and it looks like it's hanging in
> __os_exit() at,
>
> // Stop TimeService if present.
> TimeService::Release();
>
> // Stop Main Thread
> OS::MainThread::Release();
>
> #ifdef OS_HAVE_MANUAL_CRT
> DO_GLOBAL_DTORS();
> #endif
>
> Interestingly, this hanging only happens when my component uses
> another library to do TCP/IP communication with a server process
> running on another machine. If I run my component without using the
> communication library, the component exits with no problems. Any
> thoughts on where/how to proceed to debug this?

I too have seen similar behaviour. Are you certain that your
communication component(s) are shutting down cleanly? Are these
components running in periodic or non-periodic activities? Are any
state machines involved? Also, what version of RTT and what operating
system are you running on?

Stephen