Starting & stopping slaves in master component

All,

Revisiting the master/slave scenario (with FBSched), I've tried to have the master start (and stop) its slaves, but am experiencing segfaults. I've tried overriding TaskContext::start/stop() (starting/stopping the slaves first, then calling super::start/stop()) and using the start/stopHook() methods. All result in a segfault in SlaveActivity::trigger() at line 182 (return mmaster->trigger()).

What would be the right place / time to start or stop a slave?

Starting & stopping slaves in master component

On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>
> All,
>
> Revisiting the master/slave scenario (with FBSched), I've tried to
> have the master start (and stop) its slaves, but am experiencing
> segfaults. I've tried overriding TaskContext::start/stop()
> (start/stop slaves first, then calling super::start/stop()) and
> using the start/stopHook() methods. All result in a segfaults in
> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>
> What would be the right place / time to start or stop a slave?

I would prefer to do it externally from a (scripted)
Coordination/Configuration component. But either way it shouldn't
crash. If you can provide a self-contained, minimal testcase that
crashes we can try to track it down.

Thanks.
Markus

Starting & stopping slaves in master component

Markus Klotzbuecher wrote:
> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>> All,
>>
>> Revisiting the master/slave scenario (with FBSched), I've tried to
>> have the master start (and stop) its slaves, but am experiencing
>> segfaults. I've tried overriding TaskContext::start/stop()
>> (start/stop slaves first, then calling super::start/stop()) and
>> using the start/stopHook() methods. All result in a segfaults in
>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>>
>> What would be the right place / time to start or stop a slave?
>
> I would prefer to do it externally from a (scripted)
> Coordination/Configuration component. But either way it shouldn't
> crash. If you can provide a self-contained, minimal testcase that
> crashes we can try to track it down.

The attachment contains a MWE of the described behaviour. What I'm trying to do is not so different from what W. Lambert does in [1]. I might be doing it wrong though.

I agree with you on the scripted coordinator/configurator, but isn't the FBSched component a (minimal) example of such a component?

thank you,

[1] http://lists.mech.kuleuven.be/pipermail/orocos-users/2012-January/004792...

Starting & stopping slaves in master component

On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
>
> Markus Klotzbuecher wrote:
> > On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
> >> All,
> >>
> >> Revisiting the master/slave scenario (with FBSched), I've tried to
> >> have the master start (and stop) its slaves, but am experiencing
> >> segfaults. I've tried overriding TaskContext::start/stop()
> >> (start/stop slaves first, then calling super::start/stop()) and
> >> using the start/stopHook() methods. All result in a segfaults in
> >> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
> >>
> >> What would be the right place / time to start or stop a slave?
> >
> > I would prefer to do it externally from a (scripted)
> > Coordination/Configuration component. But either way it shouldn't
> > crash. If you can provide a self-contained, minimal testcase that
> > crashes we can try to track it down.
>
> The attachment contains a MWE of the described behaviour. What I'm
> trying to do is not so different from what W. Lambert does in [1]. I
> might be doing it wrong though.

How do you run it? Running slave_test.ops and starting and stopping fb
does not crash for me.

> I agree with you on the scripted coordinator/configurator, but isn't
> the FBSched component a (minimal) example of such a component?

They are similar but constructed with very different goals.

FBSched offers low-latency scheduling of components, with the goal of
permitting (or rather, justifying overhead-wise) the (mis-)use of
components as if they were function blocks.

Coordination supervises and monitors, but (although real-time safe) it
is typically not optimized (performance-wise) for such "dumb"
sequencing. Coordination could, for instance, effect a reconfiguration
of the sched_order.

Markus

Starting & stopping slaves in master component

Markus Klotzbuecher wrote:
> On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
>> Markus Klotzbuecher wrote:
>>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>>>> All,
>>>>
>>>> Revisiting the master/slave scenario (with FBSched), I've tried to
>>>> have the master start (and stop) its slaves, but am experiencing
>>>> segfaults. I've tried overriding TaskContext::start/stop()
>>>> (start/stop slaves first, then calling super::start/stop()) and
>>>> using the start/stopHook() methods. All result in a segfaults in
>>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>>>>
>>>> What would be the right place / time to start or stop a slave?
>>> I would prefer to do it externally from a (scripted)
>>> Coordination/Configuration component. But either way it shouldn't
>>> crash. If you can provide a self-contained, minimal testcase that
>>> crashes we can try to track it down.
>> The attachment contains a MWE of the described behaviour. What I'm
>> trying to do is not so different from what W. Lambert does in [1]. I
>> might be doing it wrong though.
>
> How do you run it? Running slave_test.ops and starting and stopping fb
> does not crash for me.

Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).

More info on the system:

Ubuntu Lucid
ROS Electric
with orocos_toolchain and all ros-electric-rtt* packages installed
orocos_toolchain v0.5.1-s1336558277~lucid

exact command line to start the MWE:

roscd fbsched
rosrun ocl deployer-gnulinux -s slave_test.ops

>> I agree with you on the scripted coordinator/configurator, but isn't
>> the FBSched component a (minimal) example of such a component?
>
> They are similar but constructed with very different goals.
>
> FBsched offers a low-latency scheduling of components with the goal to
> permit to (or better justify overheadwise) the (mis-) use of
> components as if they were function blocks.
>
> Coordination supervises and monitors, but (although real-time safe) it
> is typically not optimized (performancewise) for such "dumb"
> sequencing. Coordination, for instance, could effect a reconfiguration
> of the sched_order.

Starting & stopping slaves in master component

On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>
> Markus Klotzbuecher wrote:
> > On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
> >> Markus Klotzbuecher wrote:
> >>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
> >>>> All,
> >>>>
> >>>> Revisiting the master/slave scenario (with FBSched), I've tried to
> >>>> have the master start (and stop) its slaves, but am experiencing
> >>>> segfaults. I've tried overriding TaskContext::start/stop()
> >>>> (start/stop slaves first, then calling super::start/stop()) and
> >>>> using the start/stopHook() methods. All result in a segfaults in
> >>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
> >>>>
> >>>> What would be the right place / time to start or stop a slave?
> >>> I would prefer to do it externally from a (scripted)
> >>> Coordination/Configuration component. But either way it shouldn't
> >>> crash. If you can provide a self-contained, minimal testcase that
> >>> crashes we can try to track it down.
> >> The attachment contains a MWE of the described behaviour. What I'm
> >> trying to do is not so different from what W. Lambert does in [1]. I
> >> might be doing it wrong though.
> >
> > How do you run it? Running slave_test.ops and starting and stopping fb
> > does not crash for me.
>
> Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).
>
> More info on the system:
>
> Ubuntu Lucid
> ROS Electric
> with orocos_toolchain and all ros-electric-rtt* packages installed
> orocos_toolchain v0.5.1-s1336558277~lucid
>
> exact command line to start the MWE:
>
> roscd fbsched
> rosrun ocl deployer-gnulinux -s slave_test.ops

This doesn't crash for me.
Is that a 32bit ubuntu?

Markus

Starting & stopping slaves in master component

Markus Klotzbuecher wrote:
> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>> Markus Klotzbuecher wrote:
>>> On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
>>>> Markus Klotzbuecher wrote:
>>>>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>>>>>> All,
>>>>>>
>>>>>> Revisiting the master/slave scenario (with FBSched), I've tried to
>>>>>> have the master start (and stop) its slaves, but am experiencing
>>>>>> segfaults. I've tried overriding TaskContext::start/stop()
>>>>>> (start/stop slaves first, then calling super::start/stop()) and
>>>>>> using the start/stopHook() methods. All result in a segfaults in
>>>>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>>>>>>
>>>>>> What would be the right place / time to start or stop a slave?
>>>>> I would prefer to do it externally from a (scripted)
>>>>> Coordination/Configuration component. But either way it shouldn't
>>>>> crash. If you can provide a self-contained, minimal testcase that
>>>>> crashes we can try to track it down.
>>>> The attachment contains a MWE of the described behaviour. What I'm
>>>> trying to do is not so different from what W. Lambert does in [1]. I
>>>> might be doing it wrong though.
>>> How do you run it? Running slave_test.ops and starting and stopping fb
>>> does not crash for me.
>> Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).
>>
>> More info on the system:
>>
>> Ubuntu Lucid
>> ROS Electric
>> with orocos_toolchain and all ros-electric-rtt* packages installed
>> orocos_toolchain v0.5.1-s1336558277~lucid
>>
>> exact command line to start the MWE:
>>
>> roscd fbsched
>> rosrun ocl deployer-gnulinux -s slave_test.ops
>
> This doesn't crash for me.
> Is that a 32bit ubuntu?

Yes.

I first thought my compiled-from-source orocos_toolchain was to blame, so I switched to the ROS-provided one, but as you can see, that didn't change anything.

Starting & stopping slaves in master component

g ah wrote:
> Markus Klotzbuecher wrote:
>> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>>> Markus Klotzbuecher wrote:
>>>> On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
>>>>> Markus Klotzbuecher wrote:
>>>>>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>>>>>>> All,
>>>>>>>
>>>>>>> Revisiting the master/slave scenario (with FBSched), I've tried to
>>>>>>> have the master start (and stop) its slaves, but am experiencing
>>>>>>> segfaults. I've tried overriding TaskContext::start/stop()
>>>>>>> (start/stop slaves first, then calling super::start/stop()) and
>>>>>>> using the start/stopHook() methods. All result in a segfaults in
>>>>>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>>>>>>>
>>>>>>> What would be the right place / time to start or stop a slave?
>>>>>> I would prefer to do it externally from a (scripted)
>>>>>> Coordination/Configuration component. But either way it shouldn't
>>>>>> crash. If you can provide a self-contained, minimal testcase that
>>>>>> crashes we can try to track it down.
>>>>> The attachment contains a MWE of the described behaviour. What I'm
>>>>> trying to do is not so different from what W. Lambert does in [1]. I
>>>>> might be doing it wrong though.
>>>> How do you run it? Running slave_test.ops and starting and stopping fb
>>>> does not crash for me.
>>> Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).
>>>
>>> More info on the system:
>>>
>>> Ubuntu Lucid
>>> ROS Electric
>>> with orocos_toolchain and all ros-electric-rtt* packages installed
>>> orocos_toolchain v0.5.1-s1336558277~lucid
>>>
>>> exact command line to start the MWE:
>>>
>>> roscd fbsched
>>> rosrun ocl deployer-gnulinux -s slave_test.ops
>> This doesn't crash for me.
>> Is that a 32bit ubuntu?
>
> Yes.
>
> I first thought my compiled-from-source orocos_toolchain was to blame, so I switched to the ROS provided one, but as you can see, that didn't change anything.

It also sometimes crashes on a <ctrl+d> from the deployer, without starting anything.

Starting & stopping slaves in master component

On Wed, Jun 06, 2012 at 02:32:06PM +0000, g ah wrote:
>
> g ah wrote:
> > Markus Klotzbuecher wrote:
> >> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
> >>> Markus Klotzbuecher wrote:
> >>>> On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
> >>>>> Markus Klotzbuecher wrote:
> >>>>>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
> >>>>>>> All,
> >>>>>>>
> >>>>>>> Revisiting the master/slave scenario (with FBSched), I've tried to
> >>>>>>> have the master start (and stop) its slaves, but am experiencing
> >>>>>>> segfaults. I've tried overriding TaskContext::start/stop()
> >>>>>>> (start/stop slaves first, then calling super::start/stop()) and
> >>>>>>> using the start/stopHook() methods. All result in a segfaults in
> >>>>>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
> >>>>>>>
> >>>>>>> What would be the right place / time to start or stop a slave?
> >>>>>> I would prefer to do it externally from a (scripted)
> >>>>>> Coordination/Configuration component. But either way it shouldn't
> >>>>>> crash. If you can provide a self-contained, minimal testcase that
> >>>>>> crashes we can try to track it down.
> >>>>> The attachment contains a MWE of the described behaviour. What I'm
> >>>>> trying to do is not so different from what W. Lambert does in [1]. I
> >>>>> might be doing it wrong though.
> >>>> How do you run it? Running slave_test.ops and starting and stopping fb
> >>>> does not crash for me.
> >>> Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).
> >>>
> >>> More info on the system:
> >>>
> >>> Ubuntu Lucid
> >>> ROS Electric
> >>> with orocos_toolchain and all ros-electric-rtt* packages installed
> >>> orocos_toolchain v0.5.1-s1336558277~lucid
> >>>
> >>> exact command line to start the MWE:
> >>>
> >>> roscd fbsched
> >>> rosrun ocl deployer-gnulinux -s slave_test.ops
> >> This doesn't crash for me.
> >> Is that a 32bit ubuntu?
> >
> > Yes.
> >
> > I first thought my compiled-from-source orocos_toolchain was to blame, so I switched to the ROS provided one, but as you can see, that didn't change anything.
>
> It also sometimes crashes on a <ctrl+d> from the deployer, without starting anything.

Hmm, are you sure everything is compiled correctly? Could there be old
components or typekits from the previous install lying around
somewhere?

Markus

Starting & stopping slaves in master component

Markus Klotzbuecher wrote:
> On Wed, Jun 06, 2012 at 02:32:06PM +0000, g ah wrote:
>> g ah wrote:
>>> Markus Klotzbuecher wrote:
>>>> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>>>>> Markus Klotzbuecher wrote:
>>>>>> On Tue, Jun 05, 2012 at 09:27:49AM +0000, g ah wrote:
>>>>>>> Markus Klotzbuecher wrote:
>>>>>>>> On Tue, Jun 05, 2012 at 07:07:12AM +0000, g ah wrote:
>>>>>>>>> All,
>>>>>>>>>
>>>>>>>>> Revisiting the master/slave scenario (with FBSched), I've tried to
>>>>>>>>> have the master start (and stop) its slaves, but am experiencing
>>>>>>>>> segfaults. I've tried overriding TaskContext::start/stop()
>>>>>>>>> (start/stop slaves first, then calling super::start/stop()) and
>>>>>>>>> using the start/stopHook() methods. All result in a segfaults in
>>>>>>>>> SlaveActivity::trigger() at line 182 (return mmaster->trigger()).
>>>>>>>>>
>>>>>>>>> What would be the right place / time to start or stop a slave?
>>>>>>>> I would prefer to do it externally from a (scripted)
>>>>>>>> Coordination/Configuration component. But either way it shouldn't
>>>>>>>> crash. If you can provide a self-contained, minimal testcase that
>>>>>>>> crashes we can try to track it down.
>>>>>>> The attachment contains a MWE of the described behaviour. What I'm
>>>>>>> trying to do is not so different from what W. Lambert does in [1]. I
>>>>>>> might be doing it wrong though.
>>>>>> How do you run it? Running slave_test.ops and starting and stopping fb
>>>>>> does not crash for me.
>>>>> Just starting the deployer with '-s slave_test.ops', waiting for it to load and finally a 'fb.start() <enter>' results in an immediate segfault here. I've put the console output on pastebin (http://pastebin.com/XJi1vSda).
>>>>>
>>>>> More info on the system:
>>>>>
>>>>> Ubuntu Lucid
>>>>> ROS Electric
>>>>> with orocos_toolchain and all ros-electric-rtt* packages installed
>>>>> orocos_toolchain v0.5.1-s1336558277~lucid
>>>>>
>>>>> exact command line to start the MWE:
>>>>>
>>>>> roscd fbsched
>>>>> rosrun ocl deployer-gnulinux -s slave_test.ops
>>>> This doesn't crash for me.
>>>> Is that a 32bit ubuntu?
>>> Yes.
>>>
>>> I first thought my compiled-from-source orocos_toolchain was to blame, so I switched to the ROS provided one, but as you can see, that didn't change anything.
>> It also sometimes crashes on a <ctrl+d> from the deployer, without starting anything.
>
> Hmm, are you sure everything is compiled correctly? Could there by old
> components or typekits from the previous install lying around
> somewhere?

AFAICT everything is as it should be: I've removed the from-source-compiled orocos_toolchain, installed the ROS packages, closed all active terminals, started a new one, checked the environment (all OROCOS related variables point to '/opt/ros/stacks/..'), untarred a fresh fbsched-mwe (the one I sent you) and did a 'rosmake' in it. 'ldd' shows:

====

gah@machine:~/ros/stacks/fbsched-mwe$ ldd lib/orocos/gnulinux/libfbsched-mwe-gnulinux.so
linux-gate.so.1 => (0xb7749000)
liborocos-rtt-gnulinux.so.2.5 => /opt/ros/electric/stacks/orocos_toolchain/install/lib/liborocos-rtt-gnulinux.so.2.5 (0xb7543000)
libboost_filesystem.so.1.40.0 => /usr/lib/libboost_filesystem.so.1.40.0 (0xb7511000)
libboost_system.so.1.40.0 => /usr/lib/libboost_system.so.1.40.0 (0xb750c000)
libboost_serialization.so.1.40.0 => /usr/lib/libboost_serialization.so.1.40.0 (0xb74a0000)
libpthread.so.0 => /lib/tls/i686/cmov/libpthread.so.0 (0xb7487000)
librt.so.1 => /lib/tls/i686/cmov/librt.so.1 (0xb747e000)
libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0xb7479000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0xb7384000)
libm.so.6 => /lib/tls/i686/cmov/libm.so.6 (0xb735e000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xb733f000)
libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0xb71e6000)
/lib/ld-linux.so.2 (0xb774a000)

====

Again, see pastebin for log of command line output: http://pastebin.com/vkS2gDmB

I'm not having any other problems with this machine (stability or other), or I'd start to suspect memory corruption. GDB also shows the 'mmaster' pointer to be valid, or at least, it is not some weird value. Trying to 'print *mmaster' doesn't work though, as GDB complains about its vtable pointer.

I'll try and see if I can test this on another machine.

PS: just got another crash, but this time in RTT::ExecutionEngine::stopTask, at ExecutionEngine.cpp:401. Again, nothing seems out of the ordinary, but see: http://pastebin.com/ghtC3qdB.

Starting & stopping slaves in master component

g ah wrote:
> Markus Klotzbuecher wrote:
>> On Wed, Jun 06, 2012 at 02:32:06PM +0000, g ah wrote:
>>> g ah wrote:
>>>> Markus Klotzbuecher wrote:
>>>>> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>>>>>> More info on the system:
>>>>>>
>>>>>> Ubuntu Lucid
>>>>>> ROS Electric
>>>>>> with orocos_toolchain and all ros-electric-rtt* packages installed
>>>>>> orocos_toolchain v0.5.1-s1336558277~lucid
>>>>>>
>>>>>> exact command line to start the MWE:
>>>>>>
>>>>>> roscd fbsched
>>>>>> rosrun ocl deployer-gnulinux -s slave_test.ops
>>>>> This doesn't crash for me.
[..]
>
> I'm not having any other problems with this machine (stability or other), or I'd start to suspect memory corruption. GDB also shows the 'mmaster' pointer to be valid, or at least, it is not some weird value. Trying to 'print *mmaster' doesn't work though, as GDB complains about its vtable pointer.
>
> I'll try and see if I can test this on another machine.

Fresh install of Oneiric 32bit on another machine (exact same model) with ROS Electric orocos_toolchain (0.5.1-s1336583372~oneiric) gives the same result.

Have you tried to run slave_test.ops multiple times? Because once in a while it does work for me, the other times it just crashes.

Configured period also seems to influence the chances of it crashing.

Just observed a crash at deployer exit with <ctrl+d>, no usable backtrace and GDB complains about:

=====

Backtrace stopped: Not enough registers or memory available to unwind further

=====

Any ideas?

workaround for crash (was: RE: Starting & stopping slaves in mas

g ah wrote:
> g ah wrote:
>> Markus Klotzbuecher wrote:
>>> On Wed, Jun 06, 2012 at 02:32:06PM +0000, g ah wrote:
>>>> g ah wrote:
>>>>> Markus Klotzbuecher wrote:
>>>>>> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
>>>>>>> More info on the system:
>>>>>>>
>>>>>>> Ubuntu Lucid
>>>>>>> ROS Electric
>>>>>>> with orocos_toolchain and all ros-electric-rtt* packages installed
>>>>>>> orocos_toolchain v0.5.1-s1336558277~lucid
>>>>>>>
>>>>>>> exact command line to start the MWE:
>>>>>>>
>>>>>>> roscd fbsched
>>>>>>> rosrun ocl deployer-gnulinux -s slave_test.ops
>>>>>> This doesn't crash for me.
> [..]
>>
>> I'm not having any other problems with this machine (stability or other), or I'd start to suspect memory corruption. GDB also shows the 'mmaster' pointer to be valid, or at least, it is not some weird value. Trying to 'print *mmaster' doesn't work though, as GDB complains about its vtable pointer.
>>
>> I'll try and see if I can test this on another machine.
>
> Fresh install of Oneiric 32bit on another machine (exact same model) with ROS Electric orocos_toolchain (0.5.1-s1336583372~oneiric) gives the same result.
>
> Have you tried to run slave_test.ops multiple times? Cause once in a while it does work for me, the other times it just crashes.
>
> Configured period also seems to influence the chances of it crashing.
>
> Just observed a crash at deployer exit with <ctrl+d>, no usable backtrace and GDB complains about:
>
> =====
>
> Backtrace stopped: Not enough registers or memory available to unwind further
>
> =====
>
> Any ideas?

ok, so it seems I've tracked down the cause of my segfaults: the master component needed to be configured with an Activity _before_ the setMasterSlaveActivity statement in the script. Same problem as in [1].
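
To illustrate why the order matters, here is a toy model of the situation. It is only a minimal sketch and not RTT code: ToyActivity, ToyMaster and ToySlave are hypothetical stand-ins, and it assumes the slave remembers the master's activity at the moment it is attached. std::weak_ptr is used purely so that the stale reference can be detected safely; with a raw pointer the first function would end up with a dangling pointer.

```cpp
// Toy model only: ToyActivity, ToyMaster and ToySlave are hypothetical
// stand-ins and not part of the RTT API.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stands in for the object that drives the master's execution.
struct ToyActivity {
    explicit ToyActivity(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Stands in for the master component. It starts with a default activity;
// setting a new one destroys the previous one.
class ToyMaster {
public:
    ToyMaster() : activity_(std::make_shared<ToyActivity>("default")) {}

    void setActivity(const std::string& name) {
        assert(!name.empty());
        activity_ = std::make_shared<ToyActivity>(name);
    }

    std::shared_ptr<ToyActivity> activity() const { return activity_; }

private:
    std::shared_ptr<ToyActivity> activity_;
};

// Stands in for a slave: it remembers the master's activity at bind time.
class ToySlave {
public:
    void bindTo(const ToyMaster& master) { master_activity_ = master.activity(); }

    // True only if the remembered activity still exists.
    bool canTrigger() const { return !master_activity_.expired(); }

private:
    std::weak_ptr<ToyActivity> master_activity_;
};

// Problematic order: attach the slave, then change the master's activity.
inline bool bindThenSetActivity() {
    ToyMaster master;
    ToySlave slave;
    slave.bindTo(master);
    master.setActivity("periodic");  // the activity the slave remembers is gone
    return slave.canTrigger();
}

// Workaround order: give the master its activity first, then attach the slave.
inline bool setActivityThenBind() {
    ToyMaster master;
    ToySlave slave;
    master.setActivity("periodic");
    slave.bindTo(master);
    return slave.canTrigger();
}
```

In this model bindThenSetActivity() reports that the slave can no longer trigger its master, while setActivityThenBind() reports that it can.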

Does the ROS orocos_toolchain not include the mentioned fix? I also built orocos_toolchain from source (beginning of April, commit 81b3c549), but it also crashes.

workaround for crash (was: RE: Starting & stopping slaves in mas

On Sat, Jun 9, 2012 at 10:22 AM, g ah <gaohml [..] ...> wrote:

>
> g ah wrote:
> > g ah wrote:
> >> Markus Klotzbuecher wrote:
> >>> On Wed, Jun 06, 2012 at 02:32:06PM +0000, g ah wrote:
> >>>> g ah wrote:
> >>>>> Markus Klotzbuecher wrote:
> >>>>>> On Wed, Jun 06, 2012 at 02:09:19PM +0000, g ah wrote:
> >>>>>>> More info on the system:
> >>>>>>>
> >>>>>>> Ubuntu Lucid
> >>>>>>> ROS Electric
> >>>>>>> with orocos_toolchain and all ros-electric-rtt* packages installed
> >>>>>>> orocos_toolchain v0.5.1-s1336558277~lucid
> >>>>>>>
> >>>>>>> exact command line to start the MWE:
> >>>>>>>
> >>>>>>> roscd fbsched
> >>>>>>> rosrun ocl deployer-gnulinux -s slave_test.ops
> >>>>>> This doesn't crash for me.
> > [..]
> >>
> >> I'm not having any other problems with this machine (stability or other), or I'd start to suspect memory corruption. GDB also shows the 'mmaster' pointer to be valid, or at least, it is not some weird value. Trying to 'print *mmaster' doesn't work though, as GDB complains about its vtable pointer.
> >>
> >> I'll try and see if I can test this on another machine.
> >
> > Fresh install of Oneiric 32bit on another machine (exact same model) with ROS Electric orocos_toolchain (0.5.1-s1336583372~oneiric) gives the same result.
> >
> > Have you tried to run slave_test.ops multiple times? Cause once in a while it does work for me, the other times it just crashes.
> >
> > Configured period also seems to influence the chances of it crashing.
> >
> > Just observed a crash at deployer exit with <ctrl+d>, no usable backtrace and GDB complains about:
> >
> > =====
> >
> > Backtrace stopped: Not enough registers or memory available to unwind further
> >
> > =====
> >
> > Any ideas?
>
> ok, so it seems I've tracked down the cause of my segfaults: the master
> component needed to be configured with an Activity _before_ the
> setMasterSlaveActivity statement in the script. Same problem as in [1].
>
> Does the ROS orocos_toolchain not include the mentioned fix? I also built
> orocos_toolchain from source (beginning of April, commit 81b3c549), but it
> also crashes.
>

The git merge strategy we're using guarantees that any patch on a 2.x
branch is also on a 2.x+1 branch. So maybe it's something else, which
wasn't fixed yet.

The problem you're having is indeed that if you use setMasterSlaveActivity
and then change the activity of the master, the slaves still use the old
master pointer, which has been deleted, with your crashes as a consequence.

A more logical thing to do in each slave would be to keep track of the
master TaskContext* or ExecutionEngine*, which does not change when you
change activities. You'd still need to take care that the slave components
are unloaded before the master component, or you'd have similar
behavior/crashes.

So it's indeed a genuine bug in this mechanism.
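
The idea of having each slave track the master itself, rather than the
master's activity, can be sketched with a toy model. This is a minimal
sketch under stated assumptions, not the RTT implementation: Activity,
StableMaster and TrackingSlave are hypothetical stand-ins for the real
classes.

```cpp
// Toy model only: Activity, StableMaster and TrackingSlave are hypothetical
// stand-ins and not part of the RTT API.
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stands in for the object that drives the master's execution.
struct Activity {
    explicit Activity(std::string n) : name(std::move(n)) {}
    std::string name;
};

// Stands in for the master component. Its address stays the same for its
// whole lifetime, even when its activity is replaced.
class StableMaster {
public:
    StableMaster() : activity_(new Activity("default")) {}

    // Replacing the activity destroys the previous Activity object.
    void setActivity(std::string name) {
        activity_.reset(new Activity(std::move(name)));
    }

    const std::string& activityName() const {
        assert(activity_ != nullptr);
        return activity_->name;
    }

private:
    std::unique_ptr<Activity> activity_;
};

// Stands in for a slave that remembers the master, not the master's activity.
class TrackingSlave {
public:
    explicit TrackingSlave(const StableMaster& master) : master_(&master) {}

    // Looks the activity up through the master on every call, so it always
    // sees the current one.
    std::string masterActivityName() const { return master_->activityName(); }

private:
    const StableMaster* master_;  // the slave must not outlive the master
};

// Changes the master's activity after the slave was attached and reports
// which activity the slave then sees.
inline std::string activitySeenBySlaveAfterChange() {
    StableMaster master;
    TrackingSlave slave(master);  // declared after master, so destroyed first
    master.setActivity("periodic");
    return slave.masterActivityName();
}
```

In this model the slave sees "periodic" after the change, because it never
stored the old activity. The declaration order inside
activitySeenBySlaveAfterChange() mirrors the caveat that slaves have to go
away before their master does.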

Peter

workaround for crash (was: RE: Starting & stopping slaves in mas

g ah wrote:
[..]
> ok, so it seems I've tracked down the cause of my segfaults: the master component needed to be configured with an Activity _before_ the setMasterSlaveActivity statement in the script. Same problem as in [1].

with [1] being: http://lists.mech.kuleuven.be/pipermail/orocos-users/2011-April/003653.html

sorry about that