Computer freezes during Orocos component execution

I am looking for some help getting started debugging my most recent problem
with Orocos and my application. I have written a component that communicates
over the network with a motion capture software program. Some time after the
component starts running, the entire computer freezes, necessitating a hard
reboot.

There are a couple of issues here that I need to address. For instance, I
know that making non-realtime system calls (such as communicating over a
network socket) is a no-no in a hard real-time component, but that is
another email. However, is there any way I can go about debugging this?
Should I set up a debug terminal on another machine to dump kernel
messages to (similar to debugging Xenomai/RTAI issues)? Can Orocos log
messages be dumped to another machine via the reporter component? Or is it
even worth debugging? Should I just start redesigning the component to
remove all non-realtime calls, assuming that they must be causing the
computer to lock up?

If it matters, my code is running on an Ubuntu 9.04 installation with Xenomai
2.4.7, kernel 2.6.28.9, and rtt 1.8.4 all installed.

Computer freezes during Orocos component execution

On Tue, 4 Aug 2009, John Yamokoski wrote:

> I am looking for some help getting started debugging my most recent problem
> with Orocos and my application. I have written a component that communicates
> over the network with a motion capture software program. Some time after the
> component starts running, the entire computer freezes, necessitating a hard
> reboot.

What exactly is going on in your communication component? How is it
coordinated with your "core" Orocos component?

> There are a couple of issues here that I need to address. For instance, I
> know that making non-realtime system calls (such as communicating over a
> network socket) is a no-no in a hard real-time component, but that is
> another email. However, is there any way I can go about debugging this?
> Should I set up a debug terminal on another machine to dump kernel
> messages to (similar to debugging Xenomai/RTAI issues)? Can Orocos log
> messages be dumped to another machine via the reporter component? Or is it
> even worth debugging? Should I just start redesigning the component to
> remove all non-realtime calls, assuming that they must be causing the
> computer to lock up?

Debugging realtime problems is _tough_, mostly because they are so time
dependent and are often the result of "races", which are difficult to
reproduce deterministically...

Anyway, when you have to interface to "legacy" pieces of software, a good
practice in software system design is to provide an explicit communication
component on each side, and to let those components do the waiting.
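
To illustrate the structure only, here is a minimal sketch of such a
"waiting" component in Python. It is not real-time safe and does not use
any Orocos API. The class name CommunicationWorker, the receive callable
and the None end-of-stream sentinel are all hypothetical names chosen for
this example. The worker thread does the blocking receive; the control
loop only ever calls poll(), which never blocks.

```python
import queue
import threading


class CommunicationWorker:
    """Hypothetical helper: blocks on I/O so the control loop does not."""

    def __init__(self, receive, maxsize=16):
        # `receive` is an assumed blocking callable (for example a socket
        # read) that returns one sample, or None when the peer is done.
        self._receive = receive
        self._samples = queue.Queue(maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        while True:
            sample = self._receive()          # may block for a long time
            if sample is None:                # sentinel: stop the worker
                return
            try:
                self._samples.put_nowait(sample)
            except queue.Full:
                # Buffer full: drop the oldest sample, keep the newest.
                try:
                    self._samples.get_nowait()
                except queue.Empty:
                    pass
                self._samples.put_nowait(sample)

    def poll(self):
        """Return the next sample, or None if nothing has arrived yet."""
        try:
            return self._samples.get_nowait()
        except queue.Empty:
            return None
```

The control loop would call poll() once per cycle and carry on with the
last known data when it returns None. In a real RTT application the same
split would presumably be a separate non-realtime component feeding the
core component through a buffered connection.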

> If it matters, my code is running on an Ubuntu 9.04 installation with Xenomai
> 2.4.7, kernel 2.6.28.9, and rtt 1.8.4 all installed.
>

Herman

Computer freezes during Orocos component execution

We are having similar problems with a frozen computer and hard reboots
in our components, which use the CAN bus and TCP communication.
We are working hard to find the cause but, without debugging tools,
it is taking a lot of time. If we find out the cause of the
freezing and the corresponding bad/good practice, I will post it (I
hope it may help).

By the way, if you are using Xenomai, you can get some hints about CPU
usage and evil non-realtime system calls by running:

cat /proc/xenomai/stat
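
If the output is long, a small script can pick out the threads worth a
closer look. The sketch below assumes that the first line of the file is
a header naming the columns, and that it includes an MSW column (mode
switches between the real-time and Linux domains) and a NAME column.
Check this against the output on your own Xenomai version. The function
names are made up for this example.

```python
def parse_xenomai_stat(text):
    """Turn the stat table into a list of {column name: value} dicts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        # The last column (assumed to be the thread name) may contain
        # spaces, so split at most len(header) - 1 times.
        fields = ln.split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def mode_switchers(text, threshold=0):
    """Return (name, MSW) pairs with MSW above threshold, largest first."""
    result = []
    for row in parse_xenomai_stat(text):
        if "MSW" not in row or "NAME" not in row:
            raise ValueError("unexpected column layout: %r" % sorted(row))
        msw = int(row["MSW"])
        if msw > threshold:
            result.append((row["NAME"], msw))
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

For example, mode_switchers(open("/proc/xenomai/stat").read()) lists the
threads that have switched mode at least once. A count that keeps growing
between two runs is the hint that a thread is making non-realtime calls
in its loop.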

Regards

Davide

Computer freezes during Orocos component execution

On Wed, Aug 5, 2009 at 09:44, Davide Faconti <faconti [..] ...> wrote:
> We are having similar problems with a frozen computer and hard reboots
> in our components, which use the CAN bus and TCP communication.
> We are working hard to find the cause but, without debugging tools,
> it is taking a lot of time. If we find out the cause of the
> freezing and the corresponding bad/good practice, I will post it (I
> hope it may help).
>

We had similar problems some time ago, but the Xenomai watchdog thread
killed the thread that went to 100% CPU, avoiding lockups. Are you
guys using this watchdog?

Note that doing non-RT system calls in a real-time thread does not
kill your PC, but only your real-time guarantees. So removing the
non-RT calls will not fix your problem (unless doing so happens to
remove the bug as well).

Finally, there were some instability problems with Xenomai recently on
i386. Consult the Xenomai mailing list; there are *tons* of posts about
this particular bug freezing/hanging computers.

Peter

Computer freezes during Orocos component execution

On Wednesday 05 August 2009 09:44:03 Davide Faconti wrote:
> We are having similar problems with a frozen computer and hard reboots
> in our components, which use the CAN bus and TCP communication.
> We are working hard to find the cause but, without debugging tools,
> it is taking a lot of time. If we find out the cause of the
> freezing and the corresponding bad/good practice, I will post it (I
> hope it may help).

Did you try to run your component(s) under non-realtime scheduling? I found
that it helps to find out whether a component is simply taking 100% CPU
(I was debugging similar issues with my CAN component).

Moreover, if 100% CPU usage is not the cause, you can also look (under
Xenomai) at the CPU consumption of the gatekeeper kernel thread. That's the
one which switches threads between domains.
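
For the non-realtime run, a rough way to check whether one process is
eating the CPU is to sample its /proc/<pid>/stat entry twice. The sketch
below relies only on the documented Linux layout of that file (utime and
stime are fields 14 and 15, in clock ticks). The function names are made
up for this example, and time spent in the Xenomai real-time domain may
not show up in these Linux counters, so this is meant for the
non-realtime test only.

```python
import os
import time


def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces, so
    # only the part after the last ')' is split into fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11], rest[12].
    return int(rest[11]) + int(rest[12])


def read_ticks(pid):
    with open("/proc/%d/stat" % pid) as f:
        return cpu_ticks(f.read())


def cpu_percent(pid, interval=1.0):
    """Sample a process twice and return its CPU usage in percent."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    before = read_ticks(pid)
    time.sleep(interval)
    after = read_ticks(pid)
    return 100.0 * (after - before) / ticks_per_second / interval
```

Calling cpu_percent(pid) with the PID of the suspect process (or of a
kernel thread, as reported by ps) and getting a value close to 100 would
point to a loop that never blocks. Per-thread figures are available in
the same format under /proc/<pid>/task/<tid>/stat.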