Segfault on a working hardware library call in Orocos

Hi all,

I have a dynamic library that drives the watchdog on my Linux
PC board. This library works (nearly) perfectly when I use it in
simple C programs or in the configure/start hooks.

But as soon as I use it in operations (client or owner thread) or in the
updateHook, I get a segmentation fault. It may be a problem in my library,
but strangely it only causes trouble in Orocos, and the behavior changes
once Orocos is in the "working" state. It may be a thread-safety problem, but
from my point of view as a user I just have one periodic component running,
so I don't see what the problem is.

Has any of you by chance already seen this kind of behavior change in
Orocos? Is it linked to the Execution Engine?

Segfault on a working hardware library call in Orocos

On Sat, Nov 6, 2010 at 1:52 PM, Willy Lambert <lambert [dot] willy [..] ...> wrote:
> Hi all,
>
> I have a dynamic library that drives the watchdog on my Linux
> PC board. This library works (nearly) perfectly when I use it in
> simple C programs or in the configure/start hooks.
>
> But as soon as I use it in operations (client or owner thread) or in the
> updateHook, I get a segmentation fault. It may be a problem in my library,
> but strangely it only causes trouble in Orocos, and the behavior changes
> once Orocos is in the "working" state. It may be a thread-safety problem, but
> from my point of view as a user I just have one periodic component running,
> so I don't see what the problem is.
>
> Has any of you by chance already seen this kind of behavior change in
> Orocos? Is it linked to the Execution Engine?

Without a backtrace, we don't know. Run your application in gdb and type 'bt'
after the segfault occurs. That should give the clue.

Peter

Segfault on a working hardware library call in Orocos

Here it is, but I don't think you'll have enough information. I also looked
at the backtraces of the other threads; there is nothing more explicit:

root@alpha:~# cd /opt/ard
root@alpha:/opt/ard# gdb ./bin/arp-hml.ard
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/ard/bin/arp-hml.ard...done.
(gdb) r
Starting program: /opt/ard/bin/arp-hml.ard

[Thread debugging using libthread_db enabled]
Setting OCL factory for real-time logging
[New Thread 0xb66c6b70 (LWP 1281)]
[New Thread 0xb5ec5b70 (LWP 1282)]
[New Thread 0xb511cb70 (LWP 1283)]
[Thread 0xb66c6b70 (LWP 1281) exited]
0.741 [ Warning][DeploymentComponent::configure] plugin
'/opt/orocos2/install/lib/orocos/types/librtt-transport-corba-gnulinux.so'
already loaded. Not reloading it.
0.741 [ Warning][DeploymentComponent::configure] plugin
'/opt/orocos2/install/lib/orocos/types/librtt-typekit-gnulinux.so' already
loaded. Not reloading it.
0.741 [ Warning][DeploymentComponent::configure] plugin
'/opt/orocos2/install/lib/orocos/types/librtt-transport-mqueue-gnulinux.so'
already loaded. Not reloading it.
0.742 [ Warning][DeploymentComponent::configure] plugin
'/opt/orocos2/install/lib/orocos/plugins/librtt-marshalling-gnulinux.so'
already loaded. Not reloading it.
0.742 [ Warning][DeploymentComponent::configure] plugin
'/opt/orocos2/install/lib/orocos/plugins/librtt-scripting-gnulinux.so'
already loaded. Not reloading it.
0.742 [ Warning][DeploymentComponent::configure] Library
/opt/orocos2/install/lib/orocos/liborocos-reporting-gnulinux.so already
loaded... try to RELOAD
0.743 [ Warning][DeploymentComponent::configure] Library
/opt/orocos2/install/lib/orocos/liborocos-logging-gnulinux.so already
loaded... try to RELOAD
0.743 [ Warning][DeploymentComponent::configure] Library
/opt/orocos2/install/lib/orocos/liborocos-ocl-common-gnulinux.so already
loaded... try to RELOAD
0.743 [ Warning][DeploymentComponent::configure] Library
/opt/orocos2/install/lib/orocos/liborocos-timer-gnulinux.so already
loaded... try to RELOAD
HML configured OK
HML started OK
[New Thread 0xb66c6b70 (LWP 1284)]
Switched to : HML

This console reader allows you to browse and manipulate TaskContexts.
You can type in an operation, expression, create or change variables.
(type 'help' for instructions and 'ls' for context info)

TAB completion and HISTORY is available ('bash' like)

HML [R]>
HML [R]>
HML [R]>
HML [R]> [New Thread 0xb479fb70 (LWP 1285)]
[New Thread 0xb3f9eb70 (LWP 1286)]
[New Thread 0xb379db70 (LWP 1287)]
[Thread 0xb3f9eb70 (LWP 1286) exited]
[New Thread 0xb3f9eb70 (LWP 1288)]
[New Thread 0xb2f9cb70 (LWP 1289)]
[Thread 0xb3f9eb70 (LWP 1288) exited]
***SUSI Evaluation Release Only for [Advanced Robotics Design] on [PCM-3362]***

HML [R]>
HML [R]> PcBoard.oo
PcBoard.ooManageWatchdog PcBoard.ooTriggerWatchdog
HML [R]> PcBoard.oo
PcBoard.ooManageWatchdog PcBoard.ooTriggerWatchdog
HML [R]> PcBoard.ooManageWatchdog
Wrong number of arguments in call of function "this.ooManageWatchdog":
expected 1, received 0.
HML [R]> PcBoard.ooManageWatchdog(true)
[New Thread 0xb3f9eb70 (LWP 1290)]
= true

HML [R]>
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb3f9eb70 (LWP 1290)]
0xb73f6287 in StartWdtSMSC(unsigned char) () from /usr/lib/libSUSI-3.02.so
(gdb)
(gdb)
(gdb)
(gdb) bt
#0 0xb73f6287 in StartWdtSMSC(unsigned char) () from /usr/lib/libSUSI-3.02.so
#1 0xb73f56fc in WDT_DispatchIOCTL(unsigned long, unsigned long*, unsigned long, unsigned long*, unsigned long, long*) () from /usr/lib/libSUSI-3.02.so
#2 0xb73e1b22 in DeviceIoControl(void*, unsigned long, void*, unsigned long, void*, unsigned long, unsigned long*, void*) () from /usr/lib/libSUSI-3.02.so
#3 0xb73eb472 in ?? () from /usr/lib/libSUSI-3.02.so
#4 0xb73eb0cf in WDT0_DelayProc(void*) () from /usr/lib/libSUSI-3.02.so
#5 0xb73e2030 in ?? () from /usr/lib/libSUSI-3.02.so
#6 0xb7137955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#7 0xb7216e7e in clone () from /lib/i686/cmov/libc.so.6
(gdb)


Segfault on a working hardware library call in Orocos

On Sunday 07 November 2010 16:34:02 Willy Lambert wrote:
> Here it is, but I don't think you'll have enough information. I also looked
> at the backtraces of the other threads; there is nothing more explicit:
>
...
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0xb3f9eb70 (LWP 1290)]
> 0xb73f6287 in StartWdtSMSC(unsigned char) () from /usr/lib/libSUSI-3.02.so
> (gdb)
> (gdb)
> (gdb)
> (gdb) bt
> #0 0xb73f6287 in StartWdtSMSC(unsigned char) () from /usr/lib/libSUSI-3.02.so
> #1 0xb73f56fc in WDT_DispatchIOCTL(unsigned long, unsigned long*, unsigned long, unsigned long*, unsigned long, long*) () from /usr/lib/libSUSI-3.02.so
> #2 0xb73e1b22 in DeviceIoControl(void*, unsigned long, void*, unsigned long, void*, unsigned long, unsigned long*, void*) () from /usr/lib/libSUSI-3.02.so
> #3 0xb73eb472 in ?? () from /usr/lib/libSUSI-3.02.so
> #4 0xb73eb0cf in WDT0_DelayProc(void*) () from /usr/lib/libSUSI-3.02.so
> #5 0xb73e2030 in ?? () from /usr/lib/libSUSI-3.02.so
> #6 0xb7137955 in start_thread () from /lib/i686/cmov/libpthread.so.0
> #7 0xb7216e7e in clone () from /lib/i686/cmov/libc.so.6
> (gdb)

One thing we can immediately conclude here is that it's not an Orocos thread
that segfaults, but a thread created by the libSUSI library. This all points
to that library not being thread safe (somewhat unlikely), *or*, one of your
functions passing corrupted data to that library, which makes the lib crash
when its thread processes it.

You aren't playing with (const) references to stack data which might have
been removed from the stack by the time libSUSI processes it?
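
To make the question about stack data concrete, here is a minimal, self-contained C++ sketch. It does not use libSUSI, whose API is not shown in this thread: `FakeLib`, `WdtConfig`, `submit_by_pointer`, `submit_by_value` and `configure_safely` are all hypothetical names. The fake library only queues work and processes it later, roughly as a library with its own worker thread might. The point it illustrates is that a library which keeps only an address needs the caller's data to outlive the deferred processing, whereas a copy taken at submit time does not.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical settings struct; not part of any real library.
struct WdtConfig {
    unsigned char timeout_s;
};

// Hypothetical stand-in for a library that defers work,
// for example to a worker thread it created itself.
class FakeLib {
public:
    // Risky: only the address is stored. The caller must keep *cfg
    // alive until process_pending() has run.
    void submit_by_pointer(const WdtConfig* cfg) {
        pending_.push_back([cfg] { return cfg->timeout_s; });
    }

    // Safe: the job owns its own copy, taken at submit time.
    void submit_by_value(WdtConfig cfg) {
        pending_.push_back([cfg] { return cfg.timeout_s; });
    }

    // Runs every queued job and returns the values the jobs saw.
    std::vector<unsigned char> process_pending() {
        std::vector<unsigned char> seen;
        for (auto& job : pending_) {
            seen.push_back(job());
        }
        pending_.clear();
        return seen;
    }

private:
    std::vector<std::function<unsigned char()>> pending_;
};

// A caller whose local variable is destroyed when the function returns.
void configure_safely(FakeLib& lib) {
    WdtConfig local{30};
    lib.submit_by_value(local);  // fine: the library holds a copy
    // lib.submit_by_pointer(&local);
    // ^ would leave a dangling pointer once this function returns;
    //   reading it later is undefined behavior and may crash.
}
```

If the real library takes pointers or references to settings, one way to check for this class of bug is to make the passed objects long-lived (class members or statics) and see whether the crash disappears.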

Peter

Segfault on a working hardware library call in Orocos

2010/11/7 Peter Soetens <peter [..] ...>

> One thing we can immediately conclude here is that it's not an Orocos
> thread that segfaults, but a thread created by the libSUSI library.
> This all points to that library not being thread safe (somewhat unlikely),

There is a good chance it is not thread safe. But I have only one component
calling the SUSI library (apart from the thread SUSI itself creates). Can I do
something in my code to avoid this thread-safety problem?

> *or*, one of your
> functions passing corrupted data to that library, which makes the lib crash
> when its thread processes it.
>

In the backtrace I sent, I pass properties to the function (they are only
updated during configure). I will try it with hard-coded values. It also
happens with a function that takes no arguments :'(

>
> You aren't playing with (const) references to stacked data which might have
> been removed from the stack by the time the libSUSI processes it ?
>

Maybe :) I have never been confronted with such problems. It would be great if
you could explain this in more detail, because I am not sure what I can do to
check it. I have attached my component sources in case you need them to explain
something, but as I haven't reduced the problem to a tiny example I am not
expecting you to read them.

In parallel, I am waiting for support from the SUSI library vendor.

>
> Peter
>

Segfault on a working hardware library call in Orocos

On Monday 08 November 2010 00:25:37 Willy Lambert wrote:
> 2010/11/7 Peter Soetens <peter [..] ...>
>
> > One thing we can immediately conclude here is that it's not an Orocos
> > thread that segfaults, but a thread created by the libSUSI library.
> > This all points to that library not being thread safe (somewhat
> > unlikely),
>
> There is a good chance it is not thread safe. But I have only one component
> calling the SUSI library (apart from the thread SUSI itself creates). Can I
> do something in my code to avoid this thread-safety problem?

I don't think so. It's internal to susi (susi created its own thread), so susi
needs to handle it.

Is it possible that configureHook() was not called ? You should make your
component PreOperational in the constructor such that configure() is forced.
This would avoid calling the lib without a configure.
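
The idea of forcing configuration before anything else runs can be sketched without RTT. The small class below is a hypothetical stand-in that only mimics a PreOperational / Stopped / Running life cycle; `MiniComponent` and `lib_ready_` are illustrative names, not Orocos or SUSI API. In RTT 2.x itself the corresponding step is, to my knowledge, passing `PreOperational` as the initial state argument of the `TaskContext` constructor.

```cpp
#include <cassert>

// Hypothetical stand-in for a component life cycle; this is not the RTT API.
class MiniComponent {
public:
    enum State { PreOperational, Stopped, Running };

    // Starting in PreOperational means start() is refused until
    // configure() has succeeded.
    MiniComponent() : state_(PreOperational), lib_ready_(false) {}

    bool configure() {
        if (state_ == Running) {
            return false;  // cannot reconfigure while running
        }
        lib_ready_ = true;  // where the hardware library would be initialized
        state_ = Stopped;
        return true;
    }

    bool start() {
        if (state_ != Stopped) {
            return false;  // not configured yet, or already running
        }
        state_ = Running;
        return true;
    }

    // Periodic work; reports whether it was allowed to touch the library.
    bool update() const { return state_ == Running && lib_ready_; }

    State state() const { return state_; }

private:
    State state_;
    bool lib_ready_;
};
```

With this shape, a component that was never configured cannot reach the running state, so periodic code never touches an uninitialized library.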

Peter

Segfault on a working hardware library call in Orocos

2010/11/8 Peter Soetens <peter [..] ...>

> On Monday 08 November 2010 00:25:37 Willy Lambert wrote:
> > 2010/11/7 Peter Soetens <peter [..] ...>
> >
> > > One thing we can immediately conclude here is that it's not an Orocos
> > > thread that segfaults, but a thread created by the libSUSI library.
> > > This all points to that library not being thread safe (somewhat
> > > unlikely),
> >
> > There is a good chance it is not thread safe. But I have only one
> > component calling the SUSI library (apart from the thread SUSI itself
> > creates). Can I do something in my code to avoid this thread-safety
> > problem?
>
> I don't think so. It's internal to susi (susi created its own thread), so
> susi needs to handle it.

Argh, bad news. But I'm still wondering where the thread-safety problem is,
because it is introduced by the Orocos updateHook and operations. When
I use this WDconfigure function in the configure hook, it works (i.e. the PC
board is rebooted). It's only when switching to the Orocos "operational" world
that I have problems. So is there anything happening in the Orocos Execution
Engine that could lead to this?

>
> Is it possible that configureHook() was not called ? You should make your
> component PreOperational in the constructor such that configure() is
> forced.
> This would avoid calling the lib without a configure.
>

I'll try hard-coded values and initialization in the constructor.

>
> Peter
>

Segfault on a working hardware library call in Orocos

With SUSI initialized in the constructor it works, whereas before it was
done in the configure hook.

I am sure that when the initialization is done in the configure hook it is
executed, because I have logs and the component is in the running state...

I will go with this solution because I can't live without SUSI and I can't
count on getting SUSI modified. I am still very interested in understanding
what's going on, but I don't know what to look at.
