Segfault when starting a state machine in [D] state

Submitted by willy on Wed, 2011-01-05 09:16

Orocos-users

Hi all,

I am using Orocos 1.10 and get a segfault when starting a state machine that
is in the [D] state (deactivating).
The state machine is in this state because I deactivated it (and I think I
stopped it). I think it is stuck in the [D] state because a command has
blocked in the exit state.

If I start it again, Orocos ends in a segfault at StateMachine.cpp line 126
(function "automatic") because of the "current" variable.
I am sorry I cannot give more details about this; I am at a customer's
office with no easy web access (I am *not* requesting an urgent answer).

I think the problem that leads to this is in my app, but I think Orocos
should not end in a segfault.
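
To illustrate the kind of guard meant by "should not end in a segfault", here
is a minimal sketch. It assumes the crash comes from dereferencing a "current"
state pointer that is no longer valid, which the report above does not
confirm. GuardedMachine and ToyState are hypothetical toy types, not the
Orocos StateMachine API.

```cpp
#include <string>

// Hypothetical toy types for illustration only: NOT the Orocos RTT API.
struct ToyState {
    std::string name;
};

class GuardedMachine {
public:
    // Activating enters the given initial state.
    void activate(ToyState* initial) { current_ = initial; }

    // Models a deactivation that leaves no valid current state behind.
    void deactivate() {
        current_ = nullptr;
        running_ = false;
    }

    // Switch to automatic mode. Refuses (returns false) when there is no
    // current state, instead of dereferencing a null pointer.
    bool automatic() {
        if (current_ == nullptr) {
            return false;
        }
        running_ = true;
        return true;
    }

    bool running() const { return running_; }

private:
    ToyState* current_ = nullptr;  // assumed to be the variable that went bad
    bool running_ = false;
};
```

With such a check, starting a machine that has no current state reports a
failure to the caller rather than crashing the whole application.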

Segfault when starting a state machine in [D] state

Submitted by peter on Tue, 2011-01-18 14:32.

On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> Hi all,
>
> I am using Orocos 1.10 and have a segfault when start a state machine in a
> [D] state (desactivating).
> The state machine is in this state because I desactivated it (and I think I
> stop it). I think it is in a [D] state because a command has blocked in the
> exit state.
>
> If I start it again Orocos end in segfault in StateMachine.cpp line 126
> (function "automatic") because of the "current' variable.
> I am sorry not being able to give more details about this, I am at a
> customer office with no easy web access (I am *not* requested an emergency
> answer).
>
> I think the problem which leads to this is in my app, but I think Orocos
> should not end in segfault.

I could reproduce this as well with an error state. To find out what the best
fix would be, I'm asking the users of these scripts what they do when
1. the SM gets in the error state
OR
2. a command of the SM hangs forever.

Both situations boil down to the same thing (the current line refuses to
execute or complete). Since it's not documented what the behaviour should be,
I prefer to keep existing code working.

Do you:
1. call stop() then deactivate() ?
2. call deactivate() twice ?
3. call stop() and try to start() again ?

Which ones of these three are working now and should keep working ?

What we could do is:
1. allow deactivate() (twice if necessary)
and
2. allow stop() (go to final state) followed by reset() or start() or
deactivate()

Please provide your input.

Peter

Segfault when starting a state machine in [D] state

Submitted by willy on Tue, 2011-01-18 15:24.

2011/1/18 Peter Soetens <peter [..] ...>

> On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> > Hi all,
> >
> > I am using Orocos 1.10 and have a segfault when start a state machine in
> a
> > [D] state (desactivating).
> > The state machine is in this state because I desactivated it (and I think
> I
> > stop it). I think it is in a [D] state because a command has blocked in
> the
> > exit state.
> >
> > If I start it again Orocos end in segfault in StateMachine.cpp line 126
> > (function "automatic") because of the "current' variable.
> > I am sorry not being able to give more details about this, I am at a
> > customer office with no easy web access (I am *not* requested an
> emergency
> > answer).
> >
> > I think the problem which leads to this is in my app, but I think Orocos
> > should not end in segfault.
>
> I could reproduce this as well with an error state.

Did you test this on several versions ? Is it appearing in 1.12 ? 2.x ?

> To know what the best fix
> would be, I'm asking the users of these scripts what they do when
> 1. the SM gets in the error state
> OR
> 2. a command of the SM hangs forever.
>
> Both situations boil down to the same (the current line refuses to execute
> or
> complete). Since it's not documented what the behaviour should be, I prefer
> to
> keep existing code working.
>
> Do you:
> 1. call stop() then deactivate() ?
> 2. call deactivate() twice ?
> 3. call stop() and try to start() again ?
>
> Which ones of these three are working now and should keep working ?

> What we could do is:
> 1. allow to deactivate (twice if necessary)
> and
> 2. allow to stop() (go to final state) and then reset() or start() or
> deactivate()
>

I have a preference for the second one, since it is a normal case to stop,
and then deactivate.
Deactivating twice is a kind of "trick that you have to know, newbie" :p
(newbie is not pejorative)

>
> Please provide your input.
>
> Peter
>

Segfault when starting a state machine in [D] state

Submitted by snrkiwi on Tue, 2011-01-18 14:36.

On Jan 18, 2011, at 09:29 , Peter Soetens wrote:

> On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
>> Hi all,
>>
>> I am using Orocos 1.10 and have a segfault when start a state machine in a
>> [D] state (desactivating).
>> The state machine is in this state because I desactivated it (and I think I
>> stop it). I think it is in a [D] state because a command has blocked in the
>> exit state.
>>
>> If I start it again Orocos end in segfault in StateMachine.cpp line 126
>> (function "automatic") because of the "current' variable.
>> I am sorry not being able to give more details about this, I am at a
>> customer office with no easy web access (I am *not* requested an emergency
>> answer).
>>
>> I think the problem which leads to this is in my app, but I think Orocos
>> should not end in segfault.
>
> I could reproduce this as well with an error state. To know what the best fix
> would be, I'm asking the users of these scripts what they do when
> 1. the SM gets in the error state
> OR
> 2. a command of the SM hangs forever.
>
> Both situations boil down to the same (the current line refuses to execute or
> complete). Since it's not documented what the behaviour should be, I prefer to
> keep existing code working.

> Do you:
> 1. call stop() then deactivate() ?
> 2. call deactivate() twice ?
> 3. call stop() and try to start() again ?
>
> Which ones of these three are working now and should keep working ?

This happens so seldom to us, as it is typically a development bug. Our working systems don't even deal with errors in state machines - we engineer our state machines to not error as much as we reasonably can. So overall, we don't care (in v1.10).

> What we could do is:
> 1. allow to deactivate (twice if necessary)
> and
> 2. allow to stop() (go to final state) and then reset() or start() or
> deactivate()

Having said that, whenever we get SM problems it's invariably during deactivation while shutting down. And that is more due to the deployer's inability to shut down a system cleanly when the user asks to quit.

I presume in 1) above that the second deactivate is a forceful one (i.e. ignores the error)? In 2), would this stop() also be forceful?
S

Segfault when starting a state machine in [D] state

Submitted by willy on Tue, 2011-01-18 15:32.

2011/1/18 S Roderick <kiwi [dot] net [..] ...>

> On Jan 18, 2011, at 09:29 , Peter Soetens wrote:
>
> > On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> >> Hi all,
> >>
> >> I am using Orocos 1.10 and have a segfault when start a state machine in
> a
> >> [D] state (desactivating).
> >> The state machine is in this state because I desactivated it (and I
> think I
> >> stop it). I think it is in a [D] state because a command has blocked in
> the
> >> exit state.
> >>
> >> If I start it again Orocos end in segfault in StateMachine.cpp line 126
> >> (function "automatic") because of the "current' variable.
> >> I am sorry not being able to give more details about this, I am at a
> >> customer office with no easy web access (I am *not* requested an
> emergency
> >> answer).
> >>
> >> I think the problem which leads to this is in my app, but I think Orocos
> >> should not end in segfault.
> >
> > I could reproduce this as well with an error state. To know what the best
> fix
> > would be, I'm asking the users of these scripts what they do when
> > 1. the SM gets in the error state
> > OR
> > 2. a command of the SM hangs forever.
> >
> > Both situations boil down to the same (the current line refuses to
> execute or
> > complete). Since it's not documented what the behaviour should be, I
> prefer to
> > keep existing code working.
>
> +1
>
> > Do you:
> > 1. call stop() then deactivate() ?
> > 2. call deactivate() twice ?
> > 3. call stop() and try to start() again ?
> >
> > Which ones of these three are working now and should keep working ?
>
> This happens so seldom to us, as it is typically a development bug. Our
> working systems don't even deal with errors in state machines - we engineer
> our state machines to not error as much as we reasonably can. So overall, we
> don't care (in v1.10).
>

In my company (BA) we try to do the same, as error cases are hard to manage.

What is more questionable is how you reset a submachine inside a main state
machine. Until now I haven't found the correct way to do it, since you may
have blocking commands in ExitStates.

>
> > What we could do is:
> > 1. allow to deactivate (twice if necessary)
> > and
> > 2. allow to stop() (go to final state) and then reset() or start() or
> > deactivate()
>
> Having said that, whenever we get SM problems it's invariably during
> deactivation while shutting down. And that is more due to the deployer's
> inability to shutdown a system cleanly when the user asks to quit.
>
> I presume in 1) above that the second deactivate is a forceful one (ie
> ignores the error)? In 2), would this stop() also be forceful?
> S
>

Segfault when starting a state machine in [D] state

Submitted by Ruben Smits on Tue, 2011-01-18 16:12.

On Tuesday 18 January 2011 16:25:10 Willy Lambert wrote:
> 2011/1/18 S Roderick <kiwi.net >
> On Jan 18, 2011, at 09:29 , Peter Soetens wrote:
> > On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> >> Hi all,
> >>
> >> I am using Orocos 1.10 and have a segfault when start a state machine
> >> in a [D] state (desactivating).
> >> The state machine is in this state because I desactivated it (and I
> >> think I stop it). I think it is in a [D] state because a command has
> >> blocked in the exit state.
> >>
> >> If I start it again Orocos end in segfault in StateMachine.cpp line
> >> 126
> >> (function "automatic") because of the "current' variable.
> >> I am sorry not being able to give more details about this, I am at a
> >> customer office with no easy web access (I am *not* requested an
> >> emergency answer).
> >>
> >> I think the problem which leads to this is in my app, but I think
> >> Orocos
> >> should not end in segfault.
> >
> > I could reproduce this as well with an error state. To know what the
> > best fix would be, I'm asking the users of these scripts what they do
> > when 1. the SM gets in the error state
> > OR
> > 2. a command of the SM hangs forever.
> >
> > Both situations boil down to the same (the current line refuses to
> > execute or complete). Since it's not documented what the behaviour
> > should be, I prefer to keep existing code working.
>
> +1
>
> > Do you:
> > 1. call stop() then deactivate() ?
> > 2. call deactivate() twice ?
> > 3. call stop() and try to start() again ?
> >
> > Which ones of these three are working now and should keep working ?

I usually just quit the app, debug and startup again ;)

> This happens so seldom to us, as it is typically a development bug. Our
> working systems don't even deal with errors in state machines - we
> engineer our state machines to not error as much as we reasonably can. So
> overall, we don't care (in v1.10).
>
> In my compagny (BA) we try to do the same as error cases a hard to manage.

Same here: we try to get our SMs error-free during development, and we
consider the SMs in running systems error-free. If they do error, the least
we do is restart the entire app.

> What is more questionnable is how you reset a submachine into a main state
> machine. Until now I didn't find the correct way to do it since you may
> have blocking commands in ExitStates.
>
> > What we could do is:
> > 1. allow to deactivate (twice if necessary)
> > and
> > 2. allow to stop() (go to final state) and then reset() or start() or
> > deactivate()
>
> Having said that, whenever we get SM problems it's invariably during
> deactivation while shutting down. And that is more due to the deployer's
> inability to shutdown a system cleanly when the user asks to quit.

We see the same behavior: the deployer is usually not able to quit an
application in which a running state machine is blocked somewhere.

> I presume in 1) above that the second deactivate is a forceful one (ie
> ignores the error)? In 2), would this stop() also be forceful? S
>

Ruben

Segfault when starting a state machine in [D] state

Submitted by willy on Tue, 2011-01-18 17:24.

2011/1/18 Ruben Smits <ruben [dot] smits [..] ...>

> On Tuesday 18 January 2011 16:25:10 Willy Lambert wrote:
> > 2011/1/18 S Roderick <kiwi.net > >
> > On Jan 18, 2011, at 09:29 , Peter Soetens wrote:
> > > On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> > >> Hi all,
> > >>
> > >> I am using Orocos 1.10 and have a segfault when start a state machine
> > >> in a [D] state (desactivating).
> > >> The state machine is in this state because I desactivated it (and I
> > >> think I stop it). I think it is in a [D] state because a command has
> > >> blocked in the exit state.
> > >>
> > >> If I start it again Orocos end in segfault in StateMachine.cpp line
> > >> 126
> > >> (function "automatic") because of the "current' variable.
> > >> I am sorry not being able to give more details about this, I am at a
> > >> customer office with no easy web access (I am *not* requested an
> > >> emergency answer).
> > >>
> > >> I think the problem which leads to this is in my app, but I think
> > >> Orocos
> > >> should not end in segfault.
> > >
> > > I could reproduce this as well with an error state. To know what the
> > > best fix would be, I'm asking the users of these scripts what they do
> > > when 1. the SM gets in the error state
> > > OR
> > > 2. a command of the SM hangs forever.
> > >
> > > Both situations boil down to the same (the current line refuses to
> > > execute or complete). Since it's not documented what the behaviour
> > > should be, I prefer to keep existing code working.
> >
> > +1
> >
> > > Do you:
> > > 1. call stop() then deactivate() ?
> > > 2. call deactivate() twice ?
> > > 3. call stop() and try to start() again ?
> > >
> > > Which ones of these three are working now and should keep working ?
>
> I usually just quit the app, debug and startup again ;)
>
> > This happens so seldom to us, as it is typically a development bug. Our
> > working systems don't even deal with errors in state machines - we
> > engineer our state machines to not error as much as we reasonably can. So
> > overall, we don't care (in v1.10).
> >
> > In my compagny (BA) we try to do the same as error cases a hard to
> manage.
>
> Same here, we try to get our SM error free during development, we consider
> the
> SM in running systems error free. If they do error the least we do is
> restart
> the entire app.
>
> > What is more questionnable is how you reset a submachine into a main
> state
> > machine. Until now I didn't find the correct way to do it since you may
> > have blocking commands in ExitStates.
> >
> > > What we could do is:
> > > 1. allow to deactivate (twice if necessary)
> > > and
> > > 2. allow to stop() (go to final state) and then reset() or start() or
> > > deactivate()
> >
> > Having said that, whenever we get SM problems it's invariably during
> > deactivation while shutting down. And that is more due to the deployer's
> > inability to shutdown a system cleanly when the user asks to quit.
>
> We see the same behavior, the deployer is usually not able to quit an
> application in which a running statemachine is blocked somewhere.
>

And this can be very annoying when the shutdown has to be clean. In our app we
often receive "external" resets, and bad app shutdowns forced us to develop
some ugly tricks.

>
> > I presume in 1) above that the second deactivate is a forceful one (ie
> > ignores the error)? In 2), would this stop() also be forceful? S
> >
>
> Ruben
>

Segfault when starting a state machine in [D] state

Submitted by peter on Tue, 2011-01-18 15:04.

On Tuesday 18 January 2011 15:35:05 S Roderick wrote:
> On Jan 18, 2011, at 09:29 , Peter Soetens wrote:
> > On Wednesday 05 January 2011 10:15:58 Willy Lambert wrote:
> >> Hi all,
> >>
> >> I am using Orocos 1.10 and have a segfault when start a state machine in
> >> a [D] state (desactivating).
> >> The state machine is in this state because I desactivated it (and I
> >> think I stop it). I think it is in a [D] state because a command has
> >> blocked in the exit state.
> >>
> >> If I start it again Orocos end in segfault in StateMachine.cpp line 126
> >> (function "automatic") because of the "current' variable.
> >> I am sorry not being able to give more details about this, I am at a
> >> customer office with no easy web access (I am *not* requested an
> >> emergency answer).
> >>
> >> I think the problem which leads to this is in my app, but I think Orocos
> >> should not end in segfault.
> >
> > I could reproduce this as well with an error state. To know what the best
> > fix would be, I'm asking the users of these scripts what they do when 1.
> > the SM gets in the error state
> > OR
> > 2. a command of the SM hangs forever.
> >
> > Both situations boil down to the same (the current line refuses to
> > execute or complete). Since it's not documented what the behaviour
> > should be, I prefer to keep existing code working.
>
> +1
>
> > Do you:
> > 1. call stop() then deactivate() ?
> > 2. call deactivate() twice ?
> > 3. call stop() and try to start() again ?
> >
> > Which ones of these three are working now and should keep working ?
>
> This happens so seldom to us, as it is typically a development bug. Our
> working systems don't even deal with errors in state machines - we
> engineer our state machines to not error as much as we reasonably can. So
> overall, we don't care (in v1.10).
>
> > What we could do is:
> > 1. allow to deactivate (twice if necessary)
> > and
> > 2. allow to stop() (go to final state) and then reset() or start() or
> > deactivate()
>
> Having said that, whenever we get SM problems it's invariably during
> deactivation while shutting down. And that is more due to the deployer's
> inability to shutdown a system cleanly when the user asks to quit.
>
> I presume in 1) above that the second deactivate is a forceful one (ie
> ignores the error)?

Yes.

> In 2), would this stop() also be forceful?

Yes. It would skip the current program body it's executing and try to call the
exit of current and entry of final state. If one of these fail as well, we're
back to square #1 and you could try to stop() again or to deactivate().
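
As a rough sketch of the proposed semantics (not the actual Orocos
implementation), the forceful stop() and the "deactivate twice" behaviour
could be modelled as below. ForcefulMachine, Mode and the two Step callbacks
are hypothetical names. As a simplification, a failed stop() also lets the
next deactivate() force its way through.

```cpp
#include <functional>
#include <utility>

// Hypothetical model of the proposal: NOT the Orocos RTT API.
enum class Mode { Running, Stuck, Final, Inactive };

class ForcefulMachine {
public:
    // A Step returns false when the program it stands for fails or blocks.
    using Step = std::function<bool()>;

    ForcefulMachine(Step exitCurrent, Step enterFinal)
        : exitCurrent_(std::move(exitCurrent)),
          enterFinal_(std::move(enterFinal)) {}

    // Forceful stop: the body being executed is abandoned, then the exit
    // program of the current state and the entry program of the final state
    // are tried. On failure the machine is stuck again and the caller may
    // retry stop() or call deactivate().
    bool stop() {
        if (mode_ == Mode::Inactive) return false;
        if (mode_ == Mode::Final) return true;
        if (!exitCurrent_() || !enterFinal_()) {
            mode_ = Mode::Stuck;
            return false;
        }
        mode_ = Mode::Final;
        return true;
    }

    // The first deactivate() tries the exit program; a deactivate() on a
    // stuck machine is forceful and ignores the error.
    bool deactivate() {
        if (mode_ == Mode::Inactive) return true;
        if (mode_ == Mode::Stuck || exitCurrent_()) {
            mode_ = Mode::Inactive;
            return true;
        }
        mode_ = Mode::Stuck;
        return false;
    }

    Mode mode() const { return mode_; }

private:
    Step exitCurrent_;
    Step enterFinal_;
    Mode mode_ = Mode::Running;
};
```

In this model a machine whose exit program always fails returns false from the
first deactivate() and becomes inactive on the second, while a machine whose
exit program succeeds on retry reaches the final state with a second stop().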

Peter

Segfault when starting a state machine in [D] state

Submitted by peter on Wed, 2011-01-05 11:40.

I agree. Thanks for the description of the fault. I think it's quite easy for
us to replicate it in a unit test and then see where it went wrong.

I won't be coding anything this week, so I'll take a look at it next week.

Peter