Default exception handling in RTT 2.0

Submitted by Sylvain Joyeux on Tue, 2010-04-20 13:20

RTT-dev

As far as I saw on the RTT2 ExecutionEngine code, exception handling is
as follows (Peter: tell me if I'm wrong):

* an uncaught exception in updateHook() transitions to RUNTIME_ERROR
* an uncaught exception in errorHook() transitions to FATAL_ERROR
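
As a minimal sketch of the rule listed above (this is not the actual ExecutionEngine code; `State`, `Component` and `step()` are hypothetical stand-ins and no RTT headers are used), the default policy could look like this:

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical stand-ins; these are NOT the RTT types.
enum class State { Running, RuntimeError, FatalError };

struct Component {
    State state = State::Running;
    std::function<void()> updateHook = [] {};
    std::function<void()> errorHook = [] {};
};

// One execution step applying the default exception policy.
inline void step(Component& c) {
    try {
        if (c.state == State::Running) {
            c.updateHook();
        } else if (c.state == State::RuntimeError) {
            c.errorHook();
        }
        // In FatalError no hook is executed any more.
    } catch (...) {
        // Uncaught exception in updateHook() -> RUNTIME_ERROR,
        // uncaught exception in errorHook()  -> FATAL_ERROR.
        c.state = (c.state == State::Running) ? State::RuntimeError
                                              : State::FatalError;
    }
}
```

With this sketch, a component whose updateHook() throws ends up running errorHook() on every later step, and only a second exception (from errorHook()) takes it to the fatal state.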

This assumes that errorHook() is able to handle unspecified errors,
which seems too broad from my POV. The way we design components, the
transition to fatal() is basically a stop() plus some sort of tentative
cleanup. runtime_error is used as a runtime state categorization (i.e. a
way to "regroup" internal states), but its interpretation as an actual
error is situation dependent.

For instance, our motor controllers go into runtime_error when the
motors can't be driven (because of hardware protection mechanisms for
instance), but the electronics still *reads* the encoder + motor data.
I.e. they are read-only when in runtime error and can be used if only
reading is needed. fatal_error would be entered if we are not able to
talk to the electronics anymore.

In a way, runtime_error is not very useful in this case ...

To get back to the point: I think that runtime_error should be used when
the component is still able to provide a limited functionality,
fatalError being used when the component does not provide any
functionality anymore. Thus, the default exception handling of
updateHook() should IMO transition to FATAL_ERROR: I don't see how a
component can *know* what it is doing when an uncaught exception has
been raised by updateHook().

Thoughts ?

Default exception handling in RTT 2.0

Submitted by sspr on Tue, 2010-04-20 15:32.

Hi Sylvain,

On Tue, Apr 20, 2010 at 3:18 PM, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
> As far as I saw on the RTT2 ExecutionEngine code, exception handling is
> as follows (Peter: tell me if I'm wrong):
>
> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
> * an uncaught exception in errorHook() transitions to FATAL_ERROR

Correct. Fatal error (should) lead to stopHook() + cleanupHook() and
then wait for component cleanup/removal.

>
> This assumes that errorHook() is able to handle unspecified errors. This
> seems to broad for my POV. How we design components is that the
> transition to fatal() is basically a stop() + some sort of tentative
> cleanup. runtime_error is used as a runtime state categorization (i.e. a
> way to "regroup" internal states), but its interpretation as an actual
> error is situation dependent.

It is. errorHook() is reserved for an RTT/Component specific error
state, not for application error states. So you would need to write
your application specific error states in updateHook().

>
> For instance, our motor controllers go into runtime_error when the
> motors can't be driven (because of hardware protection mechanisms for
> instance), but the electronics still *reads* the encoder + motor data.
> I.e. they are read-only when in runtime error and can be used if only
> reading is needed. fatal_error would be entered if we are not able to
> talk to the electronics anymore.

These are all application states and should not be implemented in the
component lifecycle states.

>
> In a way, runtime_error is not very useful in this case ...
>
> To get back to the point: I think that runtime_error should be used when
> the component is still able to provide a limited functionality,
> fatalError being used when the component does not provide any
> functionality anymore. Thus, the default exception handling of
> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
> component can *know* what it is doing when an uncaught exception has
> been raised by updateHook().
>
> Thoughts ?

I think you identified a pain point when trying to apply the component
error states to application error states. It will never work. The idea
for run time error is that any code in updateHook might throw, even if
the user is unaware of this (during development for example). In
critical components, you can put safe state code in errorHook(), for
example, writing data to ports (which will never throw). If you did a
bad job there, you go to fatal error. So all cases are covered. We
don't want to go to fatal error immediately because this is an
unrecoverable state, meaning, RTT judged that it can no longer execute
that *instance* of a component.

All the other stuff goes into updateHook, and you need to define your
own application states in there, using your own operations or
attributes.
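
To illustrate the split, here is a minimal sketch that takes the motor controller example and keeps its modes as an application state next to the lifecycle state. All names (`Lifecycle`, `MotorMode`, `MotorController`, the two input flags) are hypothetical and no RTT API is used; in a real component the mode would be exposed through an attribute or operation.

```cpp
#include <cassert>

// Lifecycle state, owned by the framework (hypothetical stand-in).
enum class Lifecycle { Stopped, Running, RuntimeError, FatalError };

// Application state, owned by the component author.
enum class MotorMode { Driving, ReadOnly, Offline };

struct MotorController {
    Lifecycle lifecycle = Lifecycle::Running;
    MotorMode mode = MotorMode::Driving;  // would be exposed as an attribute

    // Hypothetical inputs describing the hardware situation.
    bool link_ok = true;              // can we still talk to the electronics?
    bool protection_tripped = false;  // hardware protection prevents driving

    // The application state is computed inside updateHook(); the
    // lifecycle state is left untouched.
    void updateHook() {
        if (!link_ok) {
            mode = MotorMode::Offline;
        } else if (protection_tripped) {
            mode = MotorMode::ReadOnly;  // encoders can still be read
        } else {
            mode = MotorMode::Driving;
        }
    }
};
```

In this sketch a tripped protection mechanism changes only `mode`; the component stays in the Running lifecycle state.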

Makes sense ?

Peter

Default exception handling in RTT 2.0

Submitted by Sylvain Joyeux on Tue, 2010-04-20 16:28.

Peter Soetens wrote:
> Hi Sylvain,
>
> On Tue, Apr 20, 2010 at 3:18 PM, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
>> As far as I saw on the RTT2 ExecutionEngine code, exception handling is
>> as follows (Peter: tell me if I'm wrong):
>>
>> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
>> * an uncaught exception in errorHook() transitions to FATAL_ERROR
>>
>
> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
> then wait for component cleanup/removal.
>
>
>> This assumes that errorHook() is able to handle unspecified errors. This
>> seems to broad for my POV. How we design components is that the
>> transition to fatal() is basically a stop() + some sort of tentative
>> cleanup. runtime_error is used as a runtime state categorization (i.e. a
>> way to "regroup" internal states), but its interpretation as an actual
>> error is situation dependent.
>>
>
> It is. errorHook() is reserved for an RTT/Component specific error
> state, not for application error states. So you would need to write
> your application specific error states in updateHook().
>
This completely negates the usefulness of the taskcontext state machine.
>> For instance, our motor controllers go into runtime_error when the
>> motors can't be driven (because of hardware protection mechanisms for
>> instance), but the electronics still *reads* the encoder + motor data.
>> I.e. they are read-only when in runtime error and can be used if only
>> reading is needed. fatal_error would be entered if we are not able to
>> talk to the electronics anymore.
>>
>
> These are all application states and should not be implemented in the
> component lifecycle states.
>
I don't follow you in this split between application and lifecycle
states. The component goes into various states because of the application.
>> In a way, runtime_error is not very useful in this case ...
>>
>> To get back to the point: I think that runtime_error should be used when
>> the component is still able to provide a limited functionality,
>> fatalError being used when the component does not provide any
>> functionality anymore. Thus, the default exception handling of
>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
>> component can *know* what it is doing when an uncaught exception has
>> been raised by updateHook().
>>
>> Thoughts ?
>>
>
> I think you identified a painpoint when trying to apply the component
> error states to application error states. It will never work. The idea
> for run time error is that any code in updateHook might throw, even if
> the user is unaware of this (during development for example). In
> critical components, you can put safe state code in errorHook(), for
> example, writing data to ports (which will never throw). If you did a
>
> bad job there, you go to fatal error. So all cases are covered. We
> don't want to go to fatal error immediately because this is an
> unrecoverable state, meaning, RTT judged that it can no longer execute
> that *instance* of a component.
>
> All the other stuff goes into updateHook and you need to define your
> own application states in there, using own operations or attributes or
> so.
>
> Makes sense ?
>
I don't think it does ...

First of all, most components will have nothing in errorHook(). Thus,
you will have a still-running component that has failed in a way that
was not predicted by the designer.

From a more conceptual point of view, I don't think that a component
should be allowed to run even if it had an unexpected exception. An
exception means "the internal state of the component is unspecified as
of now". The only thing that could make sense is to try to "emergency
stop" it, which -- I thought -- is what fatal error is there for.

Second, the way you define fatal error states makes no sense to me. The
whole point of having a component model a-la RTT is that it should be
able to go back to a defined state (in my POV, through the fatalError()
cycle).

I.e. a "completely unrecoverable" error should only be a diagnostics
estimation, for instance triggered because a component that went to an
unspecified fatalError() (fatal-error-that-we-don't-know) refused to
reconfigure and/or restart, thus showing that the component does not
know how to recover.

Default exception handling in RTT 2.0

Submitted by peter on Wed, 2010-04-21 10:04.

On Tuesday 20 April 2010 18:23:56 Sylvain Joyeux wrote:
> Peter Soetens wrote:
> > Hi Sylvain,
> >
> > On Tue, Apr 20, 2010 at 3:18 PM, Sylvain Joyeux <sylvain [dot] joyeux [..] ...>
> > wrote:
> >> As far as I saw on the RTT2 ExecutionEngine code, exception handling is
> >> as follows (Peter: tell me if I'm wrong):
> >>
> >> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
> >> * an uncaught exception in errorHook() transitions to FATAL_ERROR
> >
> > Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
> > then wait for component cleanup/removal.
> >
> >> This assumes that errorHook() is able to handle unspecified errors. This
> >> seems to broad for my POV. How we design components is that the
> >> transition to fatal() is basically a stop() + some sort of tentative
> >> cleanup. runtime_error is used as a runtime state categorization (i.e. a
> >> way to "regroup" internal states), but its interpretation as an actual
> >> error is situation dependent.
> >
> > It is. errorHook() is reserved for an RTT/Component specific error
> > state, not for application error states. So you would need to write
> > your application specific error states in updateHook().
>
> This completely negates the usefulness of the taskcontext state machine.

... for application specific state machines. From a component life cycle view,
these states are still necessary. I think we did it the wrong way in 1.x,
coupling component lifecycle states with application states. They may overlap,
and we can't/won't prevent that, but they don't have to overlap.

>
> >> For instance, our motor controllers go into runtime_error when the
> >> motors can't be driven (because of hardware protection mechanisms for
> >> instance), but the electronics still *reads* the encoder + motor data.
> >> I.e. they are read-only when in runtime error and can be used if only
> >> reading is needed. fatal_error would be entered if we are not able to
> >> talk to the electronics anymore.
> >
> > These are all application states and should not be implemented in the
> > component lifecycle states.
>
> I don't follow you in this split between application and lifecycle
> states. The component goes into various states because of the application.

Not from the viewpoint of the RTT or deployer. For example, configureHook is to
check if input ports are connected or if required services are available. This
is independent of a component requiring additional configuration of parameters.

>
> >> In a way, runtime_error is not very useful in this case ...
> >>
> >> To get back to the point: I think that runtime_error should be used when
> >> the component is still able to provide a limited functionality,
> >> fatalError being used when the component does not provide any
> >> functionality anymore. Thus, the default exception handling of
> >> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
> >> component can *know* what it is doing when an uncaught exception has
> >> been raised by updateHook().
> >>
> >> Thoughts ?
> >
> > I think you identified a painpoint when trying to apply the component
> > error states to application error states. It will never work. The idea
> > for run time error is that any code in updateHook might throw, even if
> > the user is unaware of this (during development for example). In
> > critical components, you can put safe state code in errorHook(), for
> > example, writing data to ports (which will never throw). If you did a
> >
> > bad job there, you go to fatal error. So all cases are covered. We
> > don't want to go to fatal error immediately because this is an
> > unrecoverable state, meaning, RTT judged that it can no longer execute
> > that *instance* of a component.
> >
> > All the other stuff goes into updateHook and you need to define your
> > own application states in there, using own operations or attributes or
> > so.
> >
> > Makes sense ?
>
> I don't think it does ...
>
> First of all, most components will have nothing in errorHook(). Thus,
> you will have a still-running component that has failed in a way that
> was not predicted by the designer.

We could install a default action in errorHook() ourselves.

>
> From a more conceptual point of view, I don't think that a component
> should be allowed to run even if it had an unexpected exception. An
> exception means "the internal state of the component is unspecified as
> of now". The only thing that could make sense is to try to "emergency
> stop" it, which -- I though -- is what fatal error is there for.

Agreed, but C++ exceptions do not cause an unspecified state. They unwind the
stack and clean up resources by calling destructors. It's not the same as a
segfault. On the other hand, if your updateHook() has the scenario
port1.write(), (exception), port2.write(), only the first write will succeed,
leading to inconsistent output. So yes, a leaked exception is maybe more
'grave' than it is considered now.
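
The two-write scenario can be sketched as follows. `FakePort`, `TwoPortComponent` and the helper functions are hypothetical stand-ins, not RTT ports; the sketch only shows that stack unwinding leaves the process healthy while the outputs no longer match.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an output port; it records what was written.
struct FakePort {
    std::vector<int> samples;
    void write(int value) { samples.push_back(value); }
};

struct TwoPortComponent {
    FakePort port1;
    FakePort port2;
    bool fail_between_writes = false;  // simulates the unexpected exception

    void updateHook(int value) {
        port1.write(value);
        if (fail_between_writes) {
            throw std::runtime_error("unexpected failure between the writes");
        }
        port2.write(value);
    }
};

// Runs one cycle the way an engine might: the exception is caught and only
// reported. Returns false if updateHook() leaked an exception.
inline bool runCycle(TwoPortComponent& c, int value) {
    try {
        c.updateHook(value);
        return true;
    } catch (const std::exception&) {
        return false;
    }
}

// The outputs are consistent when both ports received the same number of
// samples.
inline bool outputsConsistent(const TwoPortComponent& c) {
    return c.port1.samples.size() == c.port2.samples.size();
}
```

After a failing cycle, port1 holds one more sample than port2, which is exactly the inconsistency described above.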

Realize that the scripts can also leak exceptions, because they call user
functions too. We also need to define a state when this happens, I don't want
to go into 'unrecoverable error' when this happens. For a script, this would
just cause the 'E'rror status of that script, while other scripts/updateHook
keep running. That's also why runtime error only relates to an exception in
updateHook(), while in runtime error, the scripts etc keep on executing
(unless they reach the Error status too).
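
A minimal sketch of this per-script isolation, assuming a hypothetical `Script` type and `runScripts()` loop (these are not the RTT scripting classes):

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class ScriptStatus { Running, Error };

// Hypothetical stand-in for a loaded script.
struct Script {
    std::string name;
    std::function<void()> body;
    ScriptStatus status = ScriptStatus::Running;
    int completed_runs = 0;

    Script(std::string n, std::function<void()> b)
        : name(std::move(n)), body(std::move(b)) {}
};

// Executes every script that is not in the Error status. A leaked exception
// only marks the offending script; the others keep running.
inline void runScripts(std::vector<Script>& scripts) {
    for (Script& s : scripts) {
        if (s.status == ScriptStatus::Error) {
            continue;
        }
        try {
            s.body();
            ++s.completed_runs;
        } catch (...) {
            s.status = ScriptStatus::Error;
        }
    }
}
```

Here the exception is contained at script granularity, so nothing else in the component needs to change state when a single script fails.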

>
> Second, the way you define fatal error states makes no sense to me. The
> whole point of having a component model a-la RTT is that it should be
> able to go back to a defined state (in my POV, through the fatalError()
> cycle).

>
> I.e. a "completely unrecoverable" error should only be a diagnostics
> estimation, for instance triggered because a component that went to an
> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
> reconfigure and/or restart), thus showing that the component does not
> know how to recover.

There is another reason: what if the RTT figures out that it can no longer
execute the component ? Fatal by definition means unrecoverable, so let's keep
it with these semantics. There is no way to recover from the fatal error state
in the current implementation. So fatal means: unload/kill me please.

What you describe sounds to me like a run-time error (recoverable), you can
still recover from it. Maybe we should change it then to these semantics:

Fatal error: can be entered from any state, triggered by RTT code or error
recovery code. Causes stopHook()->cleanupHook() in transition (if necessary).
The only step left is to delete the component.

Runtime error: triggered by exceptions in updateHook() or by user in
updateHook()/script.
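
These proposed semantics could be sketched as below. The types are hypothetical, and the reading of "if necessary" (stopHook() only when the component was running, cleanupHook() only when it was configured) is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class State { PreOperational, Stopped, Running, RuntimeError, FatalError };

// Hypothetical stand-in for a component; hooks are only logged.
struct Component {
    State state = State::PreOperational;
    std::vector<std::string> hook_log;  // which hooks ran, in order

    bool configure() {
        if (state != State::PreOperational) return false;
        state = State::Stopped;
        return true;
    }

    bool start() {
        if (state != State::Stopped) return false;
        state = State::Running;
        return true;
    }

    // Runtime error: requested by the user, or by the engine after an
    // exception in updateHook().
    bool error() {
        if (state != State::Running) return false;
        state = State::RuntimeError;
        return true;
    }

    // Fatal error: allowed from any state. Assumption: stopHook() runs only
    // if the component was running, cleanupHook() only if it was configured.
    void fatal() {
        if (state == State::FatalError) return;
        if (state == State::Running || state == State::RuntimeError) {
            hook_log.push_back("stopHook");
        }
        if (state != State::PreOperational) {
            hook_log.push_back("cleanupHook");
        }
        state = State::FatalError;
    }
};
```

Once in FatalError, configure() and start() both refuse, so deleting the component is the only step left.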

Looking at it, I'm not really changing my position here...

Peter

Default exception handling in RTT 2.0

Submitted by Sylvain Joyeux on Wed, 2010-04-21 10:40.

Peter Soetens wrote:
>>>> As far as I saw on the RTT2 ExecutionEngine code, exception handling is
>>>> as follows (Peter: tell me if I'm wrong):
>>>>
>>>> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
>>>> * an uncaught exception in errorHook() transitions to FATAL_ERROR
>>>>
>>> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
>>> then wait for component cleanup/removal.
>>>
>>>
>>>> This assumes that errorHook() is able to handle unspecified errors. This
>>>> seems to broad for my POV. How we design components is that the
>>>> transition to fatal() is basically a stop() + some sort of tentative
>>>> cleanup. runtime_error is used as a runtime state categorization (i.e. a
>>>> way to "regroup" internal states), but its interpretation as an actual
>>>> error is situation dependent.
>>>>
>>> It is. errorHook() is reserved for an RTT/Component specific error
>>> state, not for application error states. So you would need to write
>>> your application specific error states in updateHook().
>>>
>> This completely negates the usefulness of the taskcontext state machine.
>>
>
> ... for application specific state machines. From a component life cycle view,
> these states are still necessary. I think we did it the wrong way in 1.x,
> coupling component lifecycle states with application states. They may overlap,
> and we can't/won't prevent that, but they don't have to overlap.
>
OK, then define a default application state machine (and see the number
of states explode). Not having a default application state machine
defined negates completely the use of

>
>>>> For instance, our motor controllers go into runtime_error when the
>>>> motors can't be driven (because of hardware protection mechanisms for
>>>> instance), but the electronics still *reads* the encoder + motor data.
>>>> I.e. they are read-only when in runtime error and can be used if only
>>>> reading is needed. fatal_error would be entered if we are not able to
>>>> talk to the electronics anymore.
>>>>
>>> These are all application states and should not be implemented in the
>>> component lifecycle states.
>>>
>> I don't follow you in this split between application and lifecycle
>> states. The component goes into various states because of the application.
>>
>
> Not from the viewpoint of the RTT or deployer. For example, configureHook is to
> check if input ports are connected or if required services are available. This
> is independent of a component requiring additional configuration of parameters.
>
There is definitely a mismatch between our uses of states. We use
configureHook() to verify that the component can run. This means:
checking if devices are there, if properties are set to sane values and
so on. In effect, for device drivers, configureHook() is the place
where the device gets accessed and configured.

>>>> In a way, runtime_error is not very useful in this case ...
>>>>
>>>> To get back to the point: I think that runtime_error should be used when
>>>> the component is still able to provide a limited functionality,
>>>> fatalError being used when the component does not provide any
>>>> functionality anymore. Thus, the default exception handling of
>>>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
>>>> component can *know* what it is doing when an uncaught exception has
>>>> been raised by updateHook().
>>>>
>>>> Thoughts ?
>>>>
>>> I think you identified a painpoint when trying to apply the component
>>> error states to application error states. It will never work. The idea
>>> for run time error is that any code in updateHook might throw, even if
>>> the user is unaware of this (during development for example). In
>>> critical components, you can put safe state code in errorHook(), for
>>> example, writing data to ports (which will never throw). If you did a
>>>
>>> bad job there, you go to fatal error. So all cases are covered. We
>>> don't want to go to fatal error immediately because this is an
>>> unrecoverable state, meaning, RTT judged that it can no longer execute
>>> that *instance* of a component.
>>>
>>> All the other stuff goes into updateHook and you need to define your
>>> own application states in there, using own operations or attributes or
>>> so.
>>>
>>> Makes sense ?
>>>
>> I don't think it does ...
>>
>> First of all, most components will have nothing in errorHook(). Thus,
>> you will have a still-running component that has failed in a way that
>> was not predicted by the designer.
>>
>
> We could install a default action in errorHook() ourselves.
>
What could you meaningfully do in errorHook() that is completely generic
and will handle the underlying problem (updateHook() threw an exception,
the component is not functional anymore)?
>
>> From a more conceptual point of view, I don't think that a component
>> should be allowed to run even if it had an unexpected exception. An
>> exception means "the internal state of the component is unspecified as
>> of now". The only thing that could make sense is to try to "emergency
>> stop" it, which -- I though -- is what fatal error is there for.
>>
>
> Agreed, but C++ exceptions do not cause an unspecified state. They unwind the
> stack and cleanup resources by calling destructors. It's not the same like a
> segfault. On the other hand, if your updateHook() has the scenario port1.write
> (exception) port2.write, only the first write will succeed, leading to a non
> consistent output. So yes, a leaked exception is maybe more 'grave' than it is
> considered now.
>
I don't agree there. An *uncaught* exception means that the component
designer was not *expecting* this particular error. If the code is
well-written (which is very unlikely), resources will be freed and so
on, but from a global logic point of view, the application will *not*
know where it was (hey, otherwise it would have caught this exception).

> Realize that the scripts can also leak exceptions, because they call user
> functions too. We also need to define a state when this happens, I don't want
> to go into 'unrecoverable error' when this happens. For a script, this would
> just cause the 'E'rror status of that script, while other scripts/updateHook
> keep running. That's also why runtime error only relates to an exception in
> updateHook(), while in runtime error, the scripts etc keep on executing
> (unless they reach the Error status too).
>
>
>> Second, the way you define fatal error states makes no sense to me. The
>> whole point of having a component model a-la RTT is that it should be
>> able to go back to a defined state (in my POV, through the fatalError()
>> cycle).
>>
>
>
>> I.e. a "completely unrecoverable" error should only be a diagnostics
>> estimation, for instance triggered because a component that went to an
>> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
>> reconfigure and/or restart), thus showing that the component does not
>> know how to recover.
>>
>
> There is another reason: what if the RTT figures out that it can no longer
> execute the component ? Fatal by definition means unrecoverable, so let's keep
> it with these semantics. There is no way to recover from the fatal error state
> in the current implementation. So fatal means: unload/kill me please.
>
> What you describe sounds to me like a run-time error (recoverable), you can
> still recover from it.
Again, we have a different understanding of the state machine. Yes, it
can recover from it, but that will require a stop()/configure()/start()
cycle. How the state machine is interpreted by our supervision is:

* configure: the component verifies that everything it needs to be
  functional is there. This means: checking property values, port
  connections, accessing external processes/hardware when applicable. The
  goal of that step is to make start() as simple as possible, and have it
  most likely return true (i.e. have the longest and most likely to fail
  steps that can be done in advance in configure()).
* start: start the component functionality. I.e. turn on data
  acquisition for a driver (for instance).
* runtime_error: the component still provides a somewhat limited
  functionality. The actual semantic of this is very application
  dependent. In practice, we use orogen to specialize it into sub-states.
* stop: the component stops functioning, either because it reached its
  stated goal (case for a planner), or because it has been requested.
* fatal: the component cannot provide its stated functionality anymore,
  and therefore stopped. It should try to clean up as much as possible so
  that a configure()/start() cycle has a chance to recover from the
  problem. In the same way as for runtime error, orogen specializes it
  into substates.

This state machine allowed us to keep the updateHook() simple (since it
does not have to deal with initialization/recovery/...), and has most of
the information needed for supervision.
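
A minimal sketch of how a supervisor could use this interpretation: after a fatal error it attempts a configure()/start() cycle and only diagnoses the component as unrecoverable when that cycle fails. `Task`, `Diagnosis`, `tryRecover()` and the `device_present` flag are hypothetical names for illustration, not orogen or RTT API.

```cpp
#include <cassert>

enum class State { PreOperational, Stopped, Running, RuntimeError, Fatal };

// Hypothetical stand-in for a supervised component.
struct Task {
    State state = State::PreOperational;
    bool device_present = true;  // hypothetical precondition for configure()

    // Verifies that everything needed to run is there.
    bool configure() {
        if (state == State::Running || state == State::RuntimeError) return false;
        if (!device_present) return false;
        state = State::Stopped;
        return true;
    }

    bool start() {
        if (state != State::Stopped) return false;
        state = State::Running;
        return true;
    }

    // Stop plus tentative cleanup would happen here.
    void fatal() { state = State::Fatal; }
};

enum class Diagnosis { Recovered, Unrecoverable };

// Supervision-side recovery: "unrecoverable" is a diagnosis made only after
// the component refused to reconfigure or restart.
inline Diagnosis tryRecover(Task& t) {
    if (t.configure() && t.start()) {
        return Diagnosis::Recovered;
    }
    return Diagnosis::Unrecoverable;
}
```
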
> Maybe we should change it then to these semantics:
>
> Fatal error: can be entered from any state, triggered by in RTT code or error
> recovery code. Causes stopHook()->cleanupHook() in transition (if necessary).
> Only step left is delete component.
>
> Runtime error: triggered by exceptions in updateHook() or by user in updateHook()/script.
>
There is a funny thing: on the one hand you say "raised exceptions
should leave the application in a well-defined state", and on the other
"if an exception is raised in errorHook(), we can never recover and
actually need to destroy everything". This seems contradictory to me.

As to the interpretation of "fatal": it depends on the point of view.
From the point of view of the supervision, the "fatal" I described
above *is* fatal as the component does not provide the service it should
provide, and that happened because of something non-nominal.
> Looking at it, I'm not really chaning my position here...
>
Me neither ...

Default exception handling in RTT 2.0

Submitted by peter on Wed, 2010-04-21 12:12.

On Wednesday 21 April 2010 12:37:09 Sylvain Joyeux wrote:
> Peter Soetens wrote:
> >>>> As far as I saw on the RTT2 ExecutionEngine code, exception handling
> >>>> is as follows (Peter: tell me if I'm wrong):
> >>>>
> >>>> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
> >>>> * an uncaught exception in errorHook() transitions to FATAL_ERROR
> >>>
> >>> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
> >>> then wait for component cleanup/removal.
> >>>
> >>>> This assumes that errorHook() is able to handle unspecified errors.
> >>>> This seems to broad for my POV. How we design components is that the
> >>>> transition to fatal() is basically a stop() + some sort of tentative
> >>>> cleanup. runtime_error is used as a runtime state categorization (i.e.
> >>>> a way to "regroup" internal states), but its interpretation as an
> >>>> actual error is situation dependent.
> >>>
> >>> It is. errorHook() is reserved for an RTT/Component specific error
> >>> state, not for application error states. So you would need to write
> >>> your application specific error states in updateHook().
> >>
> >> This completely negates the usefulness of the taskcontext state machine.
> >
> > ... for application specific state machines. From a component life cycle
> > view, these states are still necessary. I think we did it the wrong way
> > in 1.x, coupling component lifecycle states with application states. They
> > may overlap, and we can't/won't prevent that, but they don't have to
> > overlap.
>
> OK, then define a default application state machine (and see the number
> of state explode). Not having a default application state machine
> defined negates completely the use of

You have to see this in the light of the state machines in the scripting.
These are by definition application specific state machines. So this made us
realize that there is a difference between the lifecycle of a component (hooks)
and of an application (states in scripts).

>
> >>>> For instance, our motor controllers go into runtime_error when the
> >>>> motors can't be driven (because of hardware protection mechanisms for
> >>>> instance), but the electronics still *reads* the encoder + motor data.
> >>>> I.e. they are read-only when in runtime error and can be used if only
> >>>> reading is needed. fatal_error would be entered if we are not able to
> >>>> talk to the electronics anymore.
> >>>
> >>> These are all application states and should not be implemented in the
> >>> component lifecycle states.
> >>
> >> I don't follow you in this split between application and lifecycle
> >> states. The component goes into various states because of the
> >> application.
> >
> > Not from the viewpoint of the RTT or deployer. For example, configureHook
> > is to check if input ports are connected or if required services are
> > available. This is independent of a component requiring additional
> > configuration of parameters.
>
> There is definitely a mismatch between our uses of states. We use
> configureHook() to verify that the component can run. This means:
> checking if devices are there, if properties are set to sane values and
> so on. In effect, for device drivers, configureHook() is the place
> where the device gets accessed and configured.

I agree here. RTT 2.x actually supports this better, since you can also send
'commands' (in 2.x: send method calls) to a component before it is started.
This means that if configuration does some blocking/asynchronous work or
depends on a script to complete, this can all be done before start().

>
> >>>> In a way, runtime_error is not very useful in this case ...
> >>>>
> >>>> To get back to the point: I think that runtime_error should be used
> >>>> when the component is still able to provide a limited functionality,
> >>>> fatalError being used when the component does not provide any
> >>>> functionality anymore. Thus, the default exception handling of
> >>>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
> >>>> component can *know* what it is doing when an uncaught exception has
> >>>> been raised by updateHook().
> >>>>
> >>>> Thoughts ?
> >>>
> >>> I think you identified a painpoint when trying to apply the component
> >>> error states to application error states. It will never work. The idea
> >>> for run time error is that any code in updateHook might throw, even if
> >>> the user is unaware of this (during development for example). In
> >>> critical components, you can put safe state code in errorHook(), for
> >>> example, writing data to ports (which will never throw). If you did a
> >>>
> >>> bad job there, you go to fatal error. So all cases are covered. We
> >>> don't want to go to fatal error immediately because this is an
> >>> unrecoverable state, meaning, RTT judged that it can no longer execute
> >>> that *instance* of a component.
> >>>
> >>> All the other stuff goes into updateHook and you need to define your
> >>> own application states in there, using own operations or attributes or
> >>> so.
> >>>
> >>> Makes sense ?
> >>
> >> I don't think it does ...
> >>
> >> First of all, most components will have nothing in errorHook(). Thus,
> >> you will have a still-running component that has failed in a way that
> >> was not predicted by the designer.
> >
> > We could install a default action in errorHook() ourselves.
>
> What could you meanginfully do in errorHook() that is completely generic
> and will handle the underlying problem (updateHook() threw an exception,
> the component is not functional anymore).

Yeah, I wasn't making sense here.

>
> >> From a more conceptual point of view, I don't think that a component
> >> should be allowed to run even if it had an unexpected exception. An
> >> exception means "the internal state of the component is unspecified as
> >> of now". The only thing that could make sense is to try to "emergency
> >> stop" it, which -- I though -- is what fatal error is there for.
> >
> > Agreed, but C++ exceptions do not cause an unspecified state. They unwind
> > the stack and clean up resources by calling destructors. It's not the same
> > as a segfault. On the other hand, if your updateHook() has the scenario
> > port1.write (exception) port2.write, only the first write will succeed,
> > leading to an inconsistent output. So yes, a leaked exception is maybe
> > more 'grave' than it is considered now.
>
> I don't agree there. An *uncaught* exception means that the component
> designer was not *expecting* this particular error. If the code is
> well-written (which is very unlikely), resources will be freed and so
> on, but from a global logic point of view, the application will *not*
> know where it was (hey, otherwise it would have caught this exception).

So actually we agree, in the end, the application does not know. So this is a
problematic state it ends up in.

>
> > Realize that the scripts can also leak exceptions, because they call user
> > functions too. We also need to define a state when this happens, I don't
> > want to go into 'unrecoverable error' when this happens. For a script,
> > this would just cause the 'E'rror status of that script, while other
> > scripts/updateHook keep running. That's also why runtime error only
> > relates to an exception in updateHook(), while in runtime error, the
> > scripts etc keep on executing (unless they reach the Error status too).
> >
> >> Second, the way you define fatal error states makes no sense to me. The
> >> whole point of having a component model a-la RTT is that it should be
> >> able to go back to a defined state (in my POV, through the fatalError()
> >> cycle).
> >>
> >>
> >>
> >> I.e. a "completely unrecoverable" error should only be a diagnostics
> >> estimation, for instance triggered because a component that went to an
> >> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
> >> reconfigure and/or restart, thus showing that the component does not
> >> know how to recover.
> >
> > There is another reason: what if the RTT figures out that it can no
> > longer execute the component ? Fatal by definition means unrecoverable,
> > so let's keep it with these semantics. There is no way to recover from
> > the fatal error state in the current implementation. So fatal means:
> > unload/kill me please.
> >
> > What you describe sounds to me like a run-time error (recoverable), you
> > can still recover from it.
>
> Again, we have a different understanding of the state machine. Yes, it
> can recover from it, but that will require a stop()/configure()/start()
> cycle. How the state machine is interpreted by our supervision is:
>
> configure: the component verifies that everything it needs to be
> functional is there. This means: checking property values, port
> connections, accessing external processes/hardware when applicable. The
> goal of that step is to make start() as simple as possible, and have it
> most likely return true (i.e. have the longest and most likely to fail
> steps that can be done in advance in configure()).

OK.

> start: start the component functionality. I.e. turn on data
> acquisition for a driver (for instance).

OK.

> runtime_error: the component still provides a somewhat limited
> functionality. The actual semantic of this is very application
> dependent. In practice, we use orogen to specialize it into sub-states.

Please define which triggers cause a transition into and out of this
state, and which hooks are called.

> stop: the component stops functioning, either because it reached its
> stated goal (case for a planner), or because it has been requested

OK.

> fatal: the component cannot provide its stated functionality anymore,
> and therefore stopped.

so stopHook() is called automatically ? As above, define the triggers +
possible next states ?

> It should try to clean up as much as possible so
> that a configure()/start() cycle has a chance to recover from the
> problem. In the same way as for runtime error, orogen specializes it
> into substates.
>
> This state machine allowed us to keep the updateHook() simple (since it
> does not have to deal with initialization/recovery/...), and has most of
> the information needed for supervision.
>
> > Maybe we should change it then to these semantics:
> >
> > Fatal error: can be entered from any state, triggered by RTT code or
> > error recovery code. Causes stopHook()->cleanupHook() in transition (if
> > necessary). Only step left is delete component.
> >
> > Runtime error: triggered by exceptions in updateHook() or by user in
> > updateHook()/script.
>
> There is a funny thing: on the one hand you say "raised exceptions
> should leave the application in a well-defined state" and on the other
> "if an exception is raised in errorHook(), we can't recover ever, we
> actually need to destroy everything". This seems contradictory to me.

But makes sense to me: if your error recovery throws, it went really bad, it
means your last resort to pull things right did not succeed. There *is* no way
out, this *is* fatal, literally as in 'terminal'. No transition succeeds.

>
> As to the interpretation of "fatal": it depends on the point of view.
> From the point of view of the supervision, the "fatal" I described
> above *is* fatal as the component does not provide the service it should
> provide, and that happened because of something non-nominal.

There must be a possibility to resolve these constraints we both have:

1. Define a transition/state when an exception is thrown in updateHook(). The
last thing we want to do is call it again, the component is possibly in a
'messy' state and it may throw 'ad infinitum'.

2. Define a transition/state when error recovery from point 1 failed as well.

3. Define a transition/state when the RTT can no longer execute a component.
This might be the same as #2.

From the RTT point of view, these are the things I *need* to define, without
even caring for application-level supervision. Supervision is a fundamental
part of every application (i.e. handling faults the component cannot solve by
itself), so I am not against adding support for that in the TaskContext. On
the other hand, you and I are biased, and I wonder if it's not better to stick
to the minimal.

A possible clean solution I see here is to define a supervision interface that
defines these extra states that your supervision software requires. So your
component inherits TaskContext + SuperviseInterface, where the latter sets up
a 'supervise' provided interface with the methods/states you require.

The supervisor component/user can then query each component if it has this
interface and proceed from there if it has.

This 'extendability' is actually one of the major issues I wanted to solve in
2.x. The component itself has only a minimal life cycle interface and the rest
is set into 'plugins'/'interfaces'.
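
A minimal sketch of the supervision-interface idea described above, under loud assumptions: `TaskContext` here is a stripped-down stand-in and not the real RTT class, and `SuperviseInterface`, `Health`, `MotorDriver`, `driveDisabled` and `supervisionOf()` are hypothetical names invented for illustration. It only shows the shape of "minimal life cycle plus optional interface that a supervisor can query".

```cpp
#include <cassert>

// Hypothetical stand-in for RTT::TaskContext: only the minimal life cycle.
class TaskContext {
public:
    virtual ~TaskContext() = default;
    virtual bool configureHook() { return true; }
    virtual bool startHook() { return true; }
    virtual void updateHook() {}
    virtual void stopHook() {}
    virtual void cleanupHook() {}
};

// Hypothetical optional interface carrying the extra states a supervisor wants.
class SuperviseInterface {
public:
    enum class Health { Nominal, Degraded, Failed };
    virtual ~SuperviseInterface() = default;
    virtual Health health() const = 0;
};

// A component opts in by inheriting from both.
class MotorDriver : public TaskContext, public SuperviseInterface {
public:
    bool driveDisabled = false;  // illustrative flag, set by the hardware layer

    Health health() const override { return health_; }

    void updateHook() override {
        // Application logic decides when functionality is limited.
        if (driveDisabled) health_ = Health::Degraded;
    }

private:
    Health health_ = Health::Nominal;
};

// The supervisor asks each component whether it offers the interface.
// Returns nullptr for components that only have the minimal life cycle.
inline const SuperviseInterface* supervisionOf(const TaskContext& component) {
    return dynamic_cast<const SuperviseInterface*>(&component);
}
```

A supervisor would call `supervisionOf()` on every component and skip those for which it returns `nullptr`. Whether such an interface should be optional or mandatory is exactly the point under discussion in this thread.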

Peter

Default exception handling in RTT 2.0

Submitted by Sylvain Joyeux on Wed, 2010-04-21 12:40.

Peter Soetens wrote:
> On Wednesday 21 April 2010 12:37:09 Sylvain Joyeux wrote:
>
>> Peter Soetens wrote:
>>
>>>>>> As far as I saw on the RTT2 ExecutionEngine code, exception handling
>>>>>> is as follows (Peter: tell me if I'm wrong):
>>>>>>
>>>>>> * an uncaught exception in updateHook() transitions to RUNTIME_ERROR
>>>>>> * an uncaught exception in errorHook() transitions to FATAL_ERROR
>>>>>>
>>>>> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
>>>>> then wait for component cleanup/removal.
>>>>>
>>>>>
>>>>>> This assumes that errorHook() is able to handle unspecified errors.
>>>>>> This seems too broad for my POV. How we design components is that the
>>>>>> transition to fatal() is basically a stop() + some sort of tentative
>>>>>> cleanup. runtime_error is used as a runtime state categorization (i.e.
>>>>>> a way to "regroup" internal states), but its interpretation as an
>>>>>> actual error is situation dependent.
>>>>>>
>>>>> It is. errorHook() is reserved for an RTT/Component specific error
>>>>> state, not for application error states. So you would need to write
>>>>> your application specific error states in updateHook().
>>>>>
>>>> This completely negates the usefulness of the taskcontext state machine.
>>>>
>>> ... for application specific state machines. From a component life cycle
>>> view, these states are still necessary. I think we did it the wrong way
>>> in 1.x, coupling component lifecycle states with application states. They
>>> may overlap, and we can't/won't prevent that, but they don't have to
>>> overlap.
>>>
>> OK, then define a default application state machine (and see the number
>> of state explode). Not having a default application state machine
>> defined negates completely the use of
>>
>
> You have to see this in the light of the state machines in the scripting.
> These are by definition application specific state machines. So this made us
> realize that there is a difference between the lifecycle of a component (hooks)
> and of an application (states in scripts).
>
>
>>>>>> For instance, our motor controllers go into runtime_error when the
>>>>>> motors can't be driven (because of hardware protection mechanisms for
>>>>>> instance), but the electronics still *reads* the encoder + motor data.
>>>>>> I.e. they are read-only when in runtime error and can be used if only
>>>>>> reading is needed. fatal_error would be entered if we are not able to
>>>>>> talk to the electronics anymore.
>>>>>>
>>>>> These are all application states and should not be implemented in the
>>>>> component lifecycle states.
>>>>>
>>>> I don't follow you in this split between application and lifecycle
>>>> states. The component goes into various states because of the
>>>> application.
>>>>
>>> Not from the viewpoint of the RTT or deployer. For example, configureHook
>>> is to check if input ports are connected or if required services are
>>> available. This is independent of a component requiring additional
>>> configuration of parameters.
>>>
>> There is definitely a mismatch between our uses of states. We use
>> configureHook() to verify that the component can run. This means:
>> checking if devices are there, if properties are set to sane values and
>> so on. In effect, for device drivers, configureHook() is the place
>> where the device gets accessed and configured.
>>
>
> I agree here. RTT 2.x actually supports this better, since you can also send
> 'commands' (in 2.x: send method calls) to a component before it is started.
> This means that if configuration does some blocking/asynchronous work or
> depends on a script to complete, this can all be done before start().
>
>
>>>>>> In a way, runtime_error is not very useful in this case ...
>>>>>>
>>>>>> To get back to the point: I think that runtime_error should be used
>>>>>> when the component is still able to provide a limited functionality,
>>>>>> fatalError being used when the component does not provide any
>>>>>> functionality anymore. Thus, the default exception handling of
>>>>>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
>>>>>> component can *know* what it is doing when an uncaught exception has
>>>>>> been raised by updateHook().
>>>>>>
>>>>>> Thoughts ?
>>>>>>
>>>>> I think you identified a pain point when trying to apply the component
>>>>> error states to application error states. It will never work. The idea
>>>>> for run time error is that any code in updateHook might throw, even if
>>>>> the user is unaware of this (during development for example). In
>>>>> critical components, you can put safe state code in errorHook(), for
>>>>> example, writing data to ports (which will never throw). If you did a
>>>>> bad job there, you go to fatal error. So all cases are covered. We
>>>>> don't want to go to fatal error immediately because this is an
>>>>> unrecoverable state, meaning, RTT judged that it can no longer execute
>>>>> that *instance* of a component.
>>>>>
>>>>> All the other stuff goes into updateHook and you need to define your
>>>>> own application states in there, using own operations or attributes or
>>>>> so.
>>>>>
>>>>> Makes sense ?
>>>>>
>>>> I don't think it does ...
>>>>
>>>> First of all, most components will have nothing in errorHook(). Thus,
>>>> you will have a still-running component that has failed in a way that
>>>> was not predicted by the designer.
>>>>
>>> We could install a default action in errorHook() ourselves.
>>>
>> What could you meaningfully do in errorHook() that is completely generic
>> and will handle the underlying problem (updateHook() threw an exception,
>> the component is not functional anymore)?
>>
>
> Yeah, I wasn't making sense here.
>

>>>> From a more conceptual point of view, I don't think that a component
>>>> should be allowed to run even if it had an unexpected exception. An
>>>> exception means "the internal state of the component is unspecified as
>>>> of now". The only thing that could make sense is to try to "emergency
>>>> stop" it, which -- I thought -- is what fatal error is there for.
>>>>
>>> Agreed, but C++ exceptions do not cause an unspecified state. They unwind
>>> the stack and clean up resources by calling destructors. It's not the same
>>> as a segfault. On the other hand, if your updateHook() has the scenario
>>> port1.write (exception) port2.write, only the first write will succeed,
>>> leading to an inconsistent output. So yes, a leaked exception is maybe
>>> more 'grave' than it is considered now.
>>>
>> I don't agree there. An *uncaught* exception means that the component
>> designer was not *expecting* this particular error. If the code is
>> well-written (which is very unlikely), resources will be freed and so
>> on, but from a global logic point of view, the application will *not*
>> know where it was (hey, otherwise it would have caught this exception).
>>
>
> So actually we agree, in the end, the application does not know. So this is a
> problematic state it ends up in.
>
Yes, so it makes no sense to remain in a running state (which
RUNTIME_ERROR is).

>>> Realize that the scripts can also leak exceptions, because they call user
>>> functions too. We also need to define a state when this happens, I don't
>>> want to go into 'unrecoverable error' when this happens. For a script,
>>> this would just cause the 'E'rror status of that script, while other
>>> scripts/updateHook keep running. That's also why runtime error only
>>> relates to an exception in updateHook(), while in runtime error, the
>>> scripts etc keep on executing (unless they reach the Error status too).
>>>
>>>
>>>> Second, the way you define fatal error states makes no sense to me. The
>>>> whole point of having a component model a-la RTT is that it should be
>>>> able to go back to a defined state (in my POV, through the fatalError()
>>>> cycle).
>>>>
>>>>
>>>>
>>>> I.e. a "completely unrecoverable" error should only be a diagnostics
>>>> estimation, for instance triggered because a component that went to an
>>>> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
>>>> reconfigure and/or restart, thus showing that the component does not
>>>> know how to recover.
>>>>
>>> There is another reason: what if the RTT figures out that it can no
>>> longer execute the component ? Fatal by definition means unrecoverable,
>>> so let's keep it with these semantics. There is no way to recover from
>>> the fatal error state in the current implementation. So fatal means:
>>> unload/kill me please.
>>>
>>> What you describe sounds to me like a run-time error (recoverable), you
>>> can still recover from it.
>>>
>> Again, we have a different understanding of the state machine. Yes, it
>> can recover from it, but that will require a stop()/configure()/start()
>> cycle. How the state machine is interpreted by our supervision is:
>>
>> configure: the component verifies that everything it needs to be
>> functional is there. This means: checking property values, port
>> connections, accessing external processes/hardware when applicable. The
>> goal of that step is to make start() as simple as possible, and have it
>> most likely return true (i.e. have the longest and most likely to fail
>> steps that can be done in advance in configure()).
>>
>
> OK.
>
>
>> start: start the component functionality. I.e. turn on data
>> acquisition for a driver (for instance).
>>
>
> OK.
>
>
>> runtime_error: the component still provides a somewhat limited
>> functionality. The actual semantic of this is very application
>> dependent. In practice, we use orogen to specialize it into sub-states.
>>
>
> Please define which triggers cause a transition to this state and from this
> state away + which hooks are called.
>
Here's the thing: I'm not using scripts, and I do not intend to use them
at all. I do see their possible usefulness, it is just that I did
not (yet) encounter a situation where they were needed.

So:
* triggers: any application-defined situation which means that the component
provides a limited functionality.
* hooks: errorHook(), in the same situations as updateHook() (i.e.
activity triggers)
* getting out of there: component specific and component-decided
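
The three points above can be sketched with the motor controller example from earlier in the thread. This is only an illustration under assumptions: `MotorController`, `driveInhibited`, `trigger()` and the counters are hypothetical names, and the class does not use the RTT API. It shows a component that decides by itself to enter RUNTIME_ERROR, keeps reading in errorHook() on every activity trigger, and decides by itself when to leave.

```cpp
#include <cassert>

enum class State { Running, RunTimeError };

class MotorController {
public:
    bool driveInhibited = false;  // illustrative: set when hardware protection trips
    int encoderReads = 0;         // counts read-only accesses to the electronics
    int commandsSent = 0;         // counts drive commands actually sent

    State state() const { return state_; }

    // One activity trigger: which hook runs depends on the current state.
    void trigger() {
        if (state_ == State::Running) updateHook();
        else errorHook();
    }

private:
    void updateHook() {
        ++encoderReads;
        if (driveInhibited) {
            // Application-defined trigger: limited functionality from now on.
            state_ = State::RunTimeError;
            return;
        }
        ++commandsSent;
    }

    void errorHook() {
        ++encoderReads;  // still read-only functional
        // Component-decided exit from the degraded state.
        if (!driveInhibited) state_ = State::Running;
    }

    State state_ = State::Running;
};
```
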
>> stop: the component stops functioning, either because it reached its
>> stated goal (case for a planner), or because it has been requested
>>
>
> OK.
>
>
>> fatal: the component cannot provide its stated functionality anymore,
>> and therefore stopped.
>>
>
> so stopHook() is called automatically ? As above, define the triggers +
> possible next states ?
>
Triggers: internal component diagnostics which detected a situation
representing a loss of functionality.
Possible next states: STOPPED or PRE_OPERATIONAL (depending on whether
the component needs a configure step).
>> It should try to clean up as much as possible so
>> that a configure()/start() cycle has a chance to recover from the
>> problem. In the same way as for runtime error, orogen specializes it
>> into substates.
>>
>> This state machine allowed us to keep the updateHook() simple (since it
>> does not have to deal with initialization/recovery/...), and has most of
>> the information needed for supervision.
>>
>>
>>> Maybe we should change it then to these semantics:
>>>
>>> Fatal error: can be entered from any state, triggered by RTT code or
>>> error recovery code. Causes stopHook()->cleanupHook() in transition (if
>>> necessary). Only step left is delete component.
>>>
>>> Runtime error: triggered by exceptions in updateHook() or by user in
>>> updateHook()/script.
>>>
>> There is a funny thing: on the one hand you say "raised exceptions
>> should leave the application in a well-defined state" and on the other
>> "if an exception is raised in errorHook(), we can't recover ever, we
>> actually need to destroy everything". This seems contradictory to me.
>>
>
> But makes sense to me: if your error recovery throws, it went really bad, it
> means your last resort to pull things right did not succeed. There *is* no way
> out, this *is* fatal, literally as in 'terminal'. No transition succeeds.
>

Here is my proposal:
* RUNTIME_ERROR remains an application state. The component announces
that it has limited functionality due to something non-nominal happening.
* unexpected exceptions in running states (RUNNING and RUNTIME_ERROR)
transition to fatal. This calls a fatalHook() which -- by default --
calls stopHook() and cleanupHook(). The component can also transition to
fatal to announce that something non-nominal happened that makes the
component's service not available.
* if fatalHook() and/or stopHook() raise, then we go into the
"unrecoverable fault" (we can't even go into FATAL ...)
>> As to the interpretation of "fatal": it depends on the point of view.
>> From the point of view of the supervision, the "fatal" I described
>> above *is* fatal as the component does not provide the service it should
>> provide, and that happened because of something non-nominal.
>>
>
> There must be a possibility to resolve these constraints we both have:
>
> 1. Define a transition/state when an exception is thrown in updateHook(). The
> last thing we want to do is call it again, the component is possibly in a
> 'messy' state and it may throw 'ad infinitum'.
>
> 2. Define a transition/state when error recovery from point 1 failed as well.
>
> 3. Define a transition/state when the RTT can no longer execute a component.
> This might be the same as #2.
>
> From the RTT point of view, these are the things I *need* to define, without
> even caring for application-level supervision. Supervision is a fundamental
> part of every application (i.e. handling faults the component cannot solve by
> itself), so I am not against adding support for that in the TaskContext. On
> the other hand, you and I are biased, and I wonder if it's not better to stick
> to the minimal.
>
Yes, but in my opinion a basic application state machine *is* part of
the minimal.
> A possible clean solution I see here is to define a supervision interface that
> defines these extra states that your supervision software requires. So your
> component inherits TaskContext + SuperviseInterface, where the latter sets up
> a 'supervise' provided interface with the methods/states you require.
>
> The supervisor component/user can then query each component if it has this
> interface and proceed from there if it has.
>
> This 'extendability' is actually one of the major issues I wanted to solve in
> 2.x. The component itself has only a minimal life cycle interface and the rest
> is set into 'plugins'/'interfaces'.
>
While I see why you want that (you are the RTT-as-a-universal-framework
guy), I do see a lot of practical issues. The biggest issue being that
you will start to completely fragment what components can run on what
tools and make the whole "RTT ecosystem" (for lack of a better name) a
huge mess.

We're having that discussion *because* I want to avoid this. I could
live on with Roby and oroGen: they already provide all the tools I need
to "work around" the state machine you define to get what I want. We're
having that discussion because I think doing so would be a very bad idea.

So, yes, being able to extend is important. Now, I feel that the RTT
*must* provide a basic standard, supervise-able, interface to ALL RTT
components. And -- more importantly -- should make the component
developer aware that this interface is important.

Default Component states (Was Default exception handling in RTT

Submitted by peter on Mon, 2010-05-03 10:32.

To all lurking on this thread, could we have a vote on this ?

In case of doubt, I follow the users' opinion, but since we only have Sylvain
and me arguing, I wonder how much user opinion is actually represented here :-)

Summary:

* The RTT 1.x component states are mixing component lifecycle and application
states. For example, configureHook() sets up method calls using 'getPeer()' or
checks if input (read) ports are connected. It could *also* configure a device
or so, but with limited flexibility, since the thread of the component was not
running yet. So some configuration could be necessary in updateHook(), for
example, if you were talking to a device bus. On the other hand, the component
has some clear 'application' error states, like RunTimeError, which a
component will only enter if user code instructs it to do so.

I wanted to change this in 2.0 to a reduced life cycle, where the only states
that a component has are independent of application states. For example,
RunTimeError would mean that an exception was leaked in updateHook(). If
errorHook() leaked an exception, the component would enter the FatalError
state, which is unrecoverable (hence 'fatal'). My main motivation for this was
to define what happens when user code throws exceptions. I wanted to have a
similar scheme as was happening with program scripts: it's not because one
program is in error, that the whole component should stop. One change
contributing to this philosophy is that the thread of a component now always
runs.
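
For readers who want to see the 2.0 scheme described above in code, here is a minimal sketch. It is not the actual ExecutionEngine implementation: `Component`, `step()` and the `std::function` hooks are hypothetical stand-ins chosen to keep the example self-contained. It only encodes the two rules stated in this thread: an exception leaked from updateHook() leads to RunTimeError, and an exception leaked from errorHook() leads to FatalError.

```cpp
#include <cassert>
#include <functional>

enum class State { Running, RunTimeError, FatalError };

// Hypothetical component: the hooks are plain callables so they can be swapped.
struct Component {
    State state = State::Running;
    std::function<void()> updateHook = [] {};
    std::function<void()> errorHook = [] {};
};

// One execution step of a hypothetical engine.
inline void step(Component& c) {
    switch (c.state) {
    case State::Running:
        try {
            c.updateHook();
        } catch (...) {
            // Leaked exception: updateHook() is not called again after this.
            c.state = State::RunTimeError;
        }
        break;
    case State::RunTimeError:
        try {
            c.errorHook();
        } catch (...) {
            // Error handling itself failed: nothing left to try.
            c.state = State::FatalError;
        }
        break;
    case State::FatalError:
        break;  // unrecoverable: no hook is executed anymore
    }
}
```

In this sketch a component with an empty errorHook() simply stays in RunTimeError, which is the "still-running component that has failed" situation Sylvain objects to.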

I should have known by looking at historical evidence, but with this change, I
stepped on Sylvain's turf.

He proposes (correct me if I'm wrong) to keep close to the current application
states, at least the RunTimeError state for user errors and use FatalError
(=stop+cleanup) if user code throws (in any place). If transition to fatal
fails, an unrecoverable-worse-than-fatal state is entered. His reasoning is
that supervision needs info of application health for every component, and
that this belongs in the interface of every component.
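
The proposal summarized above can be sketched in the same spirit. Again this is an illustration under assumptions and not RTT code: `Component`, `FaultyDriver`, `step()`, `error()`, `recover()` and the `Unrecoverable` state name are hypothetical, while fatalHook() with a default of stopHook() followed by cleanupHook() is taken from Sylvain's message in this thread.

```cpp
#include <cassert>

enum class State { Running, RunTimeError, Fatal, Unrecoverable };

class Component {
public:
    virtual ~Component() = default;
    State state() const { return state_; }

    // Called on every activity trigger by a hypothetical engine.
    void step() {
        if (state_ != State::Running && state_ != State::RunTimeError) return;
        try {
            if (state_ == State::Running) updateHook();
            else errorHook();
        } catch (...) {
            enterFatal();  // unexpected exception in a running state
        }
    }

    // Application-level announcements made by the component itself.
    void error()   { if (state_ == State::Running) state_ = State::RunTimeError; }
    void recover() { if (state_ == State::RunTimeError) state_ = State::Running; }

protected:
    virtual void updateHook() {}
    virtual void errorHook() {}
    virtual void stopHook() {}
    virtual void cleanupHook() {}
    // Default fatal handling: stop, then clean up.
    virtual void fatalHook() { stopHook(); cleanupHook(); }

private:
    void enterFatal() {
        try {
            fatalHook();
            state_ = State::Fatal;
        } catch (...) {
            // Even the transition to fatal failed.
            state_ = State::Unrecoverable;
        }
    }
    State state_ = State::Running;
};

// Illustrative component whose hooks can be made to throw.
class FaultyDriver : public Component {
public:
    bool throwInUpdate = false;
    bool throwInStop = false;
    bool stopped = false;
    bool cleaned = false;

protected:
    void updateHook() override { if (throwInUpdate) throw 1; }
    void stopHook() override {
        if (throwInStop) throw 2;
        stopped = true;
    }
    void cleanupHook() override { cleaned = true; }
};
```

The difference from the 2.0 scheme is that RunTimeError is only ever entered through `error()`, i.e. by a decision of the component, never as a side effect of a leaked exception.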

Please read full details below or in the thread.

I don't think the water is that deep between us, most developers use
configureHook/errorHook already for application states so I see the point, and
no one is complaining... We should mainly focus on users' ease of
programming/minimal coding effort, but what do the others think ?

Peter

On Wednesday 21 April 2010 14:31:17 Sylvain Joyeux wrote:
> Peter Soetens wrote:
> > On Wednesday 21 April 2010 12:37:09 Sylvain Joyeux wrote:
> >> Peter Soetens wrote:
> >>>>>> As far as I saw on the RTT2 ExecutionEngine code, exception handling
> >>>>>> is as follows (Peter: tell me if I'm wrong):
> >>>>>>
> >>>>>> * an uncaught exception in updateHook() transitions to
> >>>>>> RUNTIME_ERROR
> >>>>>> * an uncaught exception in errorHook() transitions to
> >>>>>> FATAL_ERROR
> >>>>>
> >>>>> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
> >>>>> then wait for component cleanup/removal.
> >>>>>
> >>>>>> This assumes that errorHook() is able to handle unspecified errors.
> >>>>>> This seems too broad for my POV. How we design components is that the
> >>>>>> transition to fatal() is basically a stop() + some sort of tentative
> >>>>>> cleanup. runtime_error is used as a runtime state categorization
> >>>>>> (i.e. a way to "regroup" internal states), but its interpretation as
> >>>>>> an actual error is situation dependent.
> >>>>>
> >>>>> It is. errorHook() is reserved for an RTT/Component specific error
> >>>>> state, not for application error states. So you would need to write
> >>>>> your application specific error states in updateHook().
> >>>>
> >>>> This completely negates the usefulness of the taskcontext state
> >>>> machine.
> >>>
> >>> ... for application specific state machines. From a component life
> >>> cycle view, these states are still necessary. I think we did it the
> >>> wrong way in 1.x, coupling component lifecycle states with application
> >>> states. They may overlap, and we can't/won't prevent that, but they
> >>> don't have to overlap.
> >>
> >> OK, then define a default application state machine (and see the number
> >> of state explode). Not having a default application state machine
> >> defined negates completely the use of
> >
> > You have to see this in the light of the state machines in the scripting.
> > These are by definition application specific state machines. So this made
> > us realize that there is a difference between the lifecycle of a
> > component (hooks) and of an application (states in scripts).
> >
> >>>>>> For instance, our motor controllers go into runtime_error when the
> >>>>>> motors can't be driven (because of hardware protection mechanisms
> >>>>>> for instance), but the electronics still *reads* the encoder + motor
> >>>>>> data. I.e. they are read-only when in runtime error and can be used
> >>>>>> if only reading is needed. fatal_error would be entered if we are
> >>>>>> not able to talk to the electronics anymore.
> >>>>>
> >>>>> These are all application states and should not be implemented in the
> >>>>> component lifecycle states.
> >>>>
> >>>> I don't follow you in this split between application and lifecycle
> >>>> states. The component goes into various states because of the
> >>>> application.
> >>>
> >>> Not from the viewpoint of the RTT or deployer. For example,
> >>> configureHook is to check if input ports are connected or if required
> >>> services are available. This is independent of a component requiring
> >>> additional configuration of parameters.
> >>
> >> There is definitely a mismatch between our uses of states. We use
> >> configureHook() to verify that the component can run. This means:
> >> checking if devices are there, if properties are set to sane values and
> >> so on. In effect, for device drivers, configureHook() is the place
> >> where the device gets accessed and configured.
> >
> > I agree here. RTT 2.x actually supports this better, since you can also
> > send 'commands' (in 2.x: send method calls) to a component before it is
> > started. This means that if configuration does some blocking/asynchronous
> > work or depends on a script to complete, this can all be done before
> > start().
> >
> >>>>>> In a way, runtime_error is not very useful in this case ...
> >>>>>>
> >>>>>> To get back to the point: I think that runtime_error should be used
> >>>>>> when the component is still able to provide a limited functionality,
> >>>>>> fatalError being used when the component does not provide any
> >>>>>> functionality anymore. Thus, the default exception handling of
> >>>>>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
> >>>>>> component can *know* what it is doing when an uncaught exception has
> >>>>>> been raised by updateHook().
> >>>>>>
> >>>>>> Thoughts ?
> >>>>>
> >>>>> I think you identified a pain point when trying to apply the component
> >>>>> error states to application error states. It will never work. The
> >>>>> idea for run time error is that any code in updateHook might throw,
> >>>>> even if the user is unaware of this (during development for example).
> >>>>> In critical components, you can put safe state code in errorHook(),
> >>>>> for example, writing data to ports (which will never throw). If you
> >>>>> did a bad job there, you go to fatal error. So all cases are covered. We
> >>>>> don't want to go to fatal error immediately because this is an
> >>>>> unrecoverable state, meaning, RTT judged that it can no longer
> >>>>> execute that *instance* of a component.
> >>>>>
> >>>>> All the other stuff goes into updateHook and you need to define your
> >>>>> own application states in there, using own operations or attributes
> >>>>> or so.
> >>>>>
> >>>>> Makes sense ?
> >>>>
> >>>> I don't think it does ...
> >>>>
> >>>> First of all, most components will have nothing in errorHook(). Thus,
> >>>> you will have a still-running component that has failed in a way that
> >>>> was not predicted by the designer.
> >>>
> >>> We could install a default action in errorHook() ourselves.
> >>
> >> What could you meaningfully do in errorHook() that is completely generic
> >> and will handle the underlying problem (updateHook() threw an exception,
> >> the component is not functional anymore)?
> >
> > Yeah, I wasn't making sense here.
> >
> >>>> From a more conceptual point of view, I don't think that a component
> >>>> should be allowed to run even if it had an unexpected exception. An
> >>>> exception means "the internal state of the component is unspecified as
> >>>> of now". The only thing that could make sense is to try to "emergency
> >>>> stop" it, which -- I thought -- is what fatal error is there for.
> >>>
> >>> Agreed, but C++ exceptions do not cause an unspecified state. They
> >>> unwind the stack and clean up resources by calling destructors. It's not
> >>> the same as a segfault. On the other hand, if your updateHook() has
> >>> the scenario port1.write (exception) port2.write, only the first write
> >>> will succeed, leading to an inconsistent output. So yes, a leaked
> >>> exception is maybe more 'grave' than it is considered now.
> >>
> >> I don't agree there. An *uncaught* exception means that the component
> >> designer was not *expecting* this particular error. If the code is
> >> well-written (which is very unlikely), resources will be freed and so
> >> on, but from a global logic point of view, the application will *not*
> >> know where it was (hey, otherwise it would have caught this exception).
> >
> > So actually we agree, in the end, the application does not know. So this
> > is a problematic state it ends up in.
>
> Yes, so it makes no sense to remain in a running state (which
> RUNTIME_ERROR is).
>
> >>> Realize that the scripts can also leak exceptions, because they call
> >>> user functions too. We also need to define a state when this happens, I
> >>> don't want to go into 'unrecoverable error' when this happens. For a
> >>> script, this would just cause the 'E'rror status of that script, while
> >>> other scripts/updateHook keep running. That's also why runtime error
> >>> only relates to an exception in updateHook(), while in runtime error,
> >>> the scripts etc keep on executing (unless they reach the Error status
> >>> too).
> >>>
> >>>> Second, the way you define fatal error states makes no sense to me.
> >>>> The whole point of having a component model a-la RTT is that it should
> >>>> be able to go back to a defined state (in my POV, through the
> >>>> fatalError() cycle).
> >>>>
> >>>>
> >>>>
> >>>> I.e. a "completely unrecoverable" error should only be a diagnostics
> >>>> estimation, for instance triggered because a component that went to an
> >>>> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
> >>>> reconfigure and/or restart), thus showing that the component does not
> >>>> know how to recover.
> >>>
> >>> There is another reason: what if the RTT figures out that it can no
> >>> longer execute the component ? Fatal by definition means unrecoverable,
> >>> so let's keep it with these semantics. There is no way to recover from
> >>> the fatal error state in the current implementation. So fatal means:
> >>> unload/kill me please.
> >>>
> >>> What you describe sounds to me like a run-time error (recoverable), you
> >>> can still recover from it.
> >>
> >> Again, we have a different understanding of the state machine. Yes, it
> >> can recover from it, but that will require a stop()/configure()/start()
> >> cycle. How the state machine is interpreted by our supervision is:
> >>
> >> configure: the component verifies that everything it needs to be
> >> functional is there. This means: checking property values, port
> >> connections, accessing external processes/hardware when applicable. The
> >> goal of that step is to make start() as simple as possible, and have it
> >> most likely return true (i.e. have the longest and most likely to fail
> >> steps that can be done in advance in configure()).
> >
> > OK.
> >
> >> start: start the component functionality. I.e. turn on data
> >> acquisition for a driver (for instance).
> >
> > OK.
> >
> >> runtime_error: the component still provides a somewhat limited
> >> functionality. The actual semantic of this is very application
> >> dependent. In practice, we use orogen to specialize it into sub-states.
> >
> > Please define which triggers cause a transition to this state and from
> > this state away + which hooks are called.
>
> Here's the thing: I'm not using scripts, and I completely do not intend
> to use them. I do see their possible usefulness, it is just that I did
> not (yet) encounter a situation where they were needed.
>
> So: triggers
> * any application-defined situation which means that the component
> provides a limited functionality.
> * hooks: errorHook(), in the same situations as updateHook() (i.e.
> activity triggers)
> * getting out of there: component specific and component-decided
>
> >> stop: the component stops functioning, either because it reached its
> >> stated goal (case for a planner), or because it has been requested
> >
> > OK.
> >
> >> fatal: the component cannot provide its stated functionality anymore,
> >> and therefore stopped.
> >
> > so stopHook() is called automatically ? As above, define the triggers +
> > possible next states ?
>
> Triggers: internal component diagnostics which detected a situation
> representing a loss of functionality.
> Possible next states: STOPPED or PRE_OPERATIONAL (depending on whether
> the component needs a configure step).
>
> >> It should try to clean up as much as possible so
> >> that a configure()/start() cycle has a chance to recover from the
> >> problem. In the same way as for runtime error, orogen specializes it
> >> into substates.
> >>
> >> This state machine allowed us to keep the updateHook() simple (since it
> >> does not have to deal with initialization/recovery/...), and has most of
> >> the information needed for supervision.
> >>
> >>> Maybe we should change it then to these semantics:
> >>>
> >>> Fatal error: can be entered from any state, triggered by RTT code or
> >>> error recovery code. Causes stopHook()->cleanupHook() in transition (if
> >>> necessary). Only step left is delete component.
> >>>
> >>> Runtime error: triggered by exceptions in updateHook() or by user in
> >>> updateHook()/script.
> >>
> >> There is a funny thing: on the one hand you say "raised exceptions
> >> should leave the application in a well-defined state" and on the other
> >> "if an exception is raised in errorHook() we can't recover ever, we
> >> actually need to destroy everything". This seems contradictory to me.
> >
> > But makes sense to me: if your error recovery throws, it went really bad,
> > it means your last resort to pull things right did not succeed. There
> > *is* no way out, this *is* fatal, literally as in 'terminal'. No
> > transition succeeds.
>
> Here is my proposal:
> * RUNTIME_ERROR remains an application state. The component announces
> that it has limited functionality due to something non-nominal happening.
> * unexpected exceptions in running states (RUNNING and RUNTIME_ERROR)
> transition to fatal. This calls a fatalHook() which -- by default --
> calls stopHook() and cleanupHook(). The component can also transition to
> fatal to announce that something non-nominal happened that makes the
> component's service not available.
> * if fatalHook() and/or stopHook() raise, then we go into the
> "unrecoverable fault" (we can't even go into FATAL ...)
>
> >> As to the interpretation of "fatal": it depends on the point of view.
> >> From the point of view of the supervision, the "fatal" I described
> >> above *is* fatal as the component does not provide the service it should
> >> provide, and that happened because of something non-nominal.
> >
> > There must be a possibility to resolve these constraints we both have:
> >
> > 1. Define a transition/state when an exception is thrown in updateHook().
> > The last thing we want to do is call it again, the component is possibly
> > in a 'messy' state and it may throw 'ad infinitum'.
> >
> > 2. Define a transition/state when error recovery from point 1 failed as
> > well.
> >
> > 3. Define a transition/state when the RTT can no longer execute a
> > component. This might be the same as #2.
> >
> > From the RTT point of view, these are the things I *need* to define,
> > without even caring for application-level supervision. Supervision is a
> > fundamental part of every application (ie handle faults the component can
> > not solve by itself) so I am not against adding support for that in
> > the TaskContext, on the other hand, you/me are biased and I wonder if
> > it's not better to stick to the minimal.
>
> Yes, but in my opinion a basic application state machine *is* part of
> the minimal.
>
> > A possible clean solution I see here is to define a supervision interface
> > that defines these extra states that your supervision software requires.
> > So your component inherits TaskContext + SuperviseInterface. where the
> > latter sets up a 'supervise' provided interface with the methods/states
> > you require.
> >
> > The supervisor component/user can then query each component if it has
> > this interface and proceed from there if it has.
> >
> > This 'extendability' is actually one of the major issues I wanted to
> > solve in 2.x. The component itself has only a minimal life cycle
> > interface and the rest is set into 'plugins'/'interfaces'.
>
> While I see why you want that (you are the RTT-as-a-universal-framework
> guy), I do see a lot of practical issues. The biggest issue being that
> you will start to completely fragment what components can run on what
> tools and make the whole "RTT ecosystem" (for lack of a better name) a
> huge mess.
>
> We're having that discussion *because* I want to avoid this. I could
> live on with Roby and oroGen: they already provide all the tools I need
> to "work around" the state machine you define to get what I want. We're
> having that discussion because I think it would be a very bad idea.
>
> So, yes, being able to extend is important. Now, I feel that the RTT
> *must* provide a basic standard, supervise-able, interface to ALL RTT
> components. And -- more importantly -- should make the component
> developer aware that this interface is important.
>

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Tue, 2010-05-04 10:24.

I've created a little table with both Peter and my proposal (attached to
this mail). It represents the possible transitions, the hooks that get
called and when each transition/hook gets called.

For the record, this idea of "separating the application state machine
from the component lifetime state machine" still does not make any sense
to me. Both state machines are obviously tied together (a motor
controller cannot be in CONTROLLING_MOTORS state while the component is
in FatalError).

The main differences between the two are:
- I add an UnrecoverableError state which is Peter's FatalError state.
FatalError, in my case, is recoverable, it only requires a restart of
the component.
- unhandled exceptions always end up terminating the component's
execution. I have no clue how a component could continue running (i.e.
providing functionality) after it raised an unexpected/unhandled
exception. The only sane thing to do *by default* is to try and stop
everything you can, and see if you can restart properly.

I also added, in my proposal, what to do with unhandled exceptions in
configureHook() and startHook(). I'm really less sure about those. It
just happens that, in Roby [supervision layer], I tried different things
and terminating tasks when the "start" command fails ended up being the
safest.

Sylvain

Default Component states (Was Default exception handling in RTT

Submitted by markus.klotzbuecher on Wed, 2010-05-05 20:24.

Hi Sylvain,

On Tue, May 04, 2010 at 12:18:49PM +0200, Sylvain Joyeux wrote:

> For the record, this idea of "separating the application state machine
> from the component lifetime state machine" still does not make any sense
> to me. Both state machines are obviously tied together (a motor
> controller cannot be in CONTROLLING_MOTORS state while the component is
> in FatalError).

Yes they are tied together! The desired relationship is that the
application FSM is a hierarchical substate of the "Running" state.

> The main differences between the two are:
> - I add an UnrecoverableError state which is Peter's FatalError state.
> FatalError, in my case, is recoverable, it only requires a restart of
> the component.
> - unhandled exceptions always end up terminating the component's
> execution. I have no clue how a component could continue running (i.e.
> providing functionality) after it raised an unexpected/unhandled

It can't continue running.

> exception. The only sane thing to do *by default* is to try and stop
> everything you can, and see if you can restart properly.

And that's still optimistic for a C++ application. An uncaught
exception is after all a programming error. Who knows which pointers
are dangling or which memory has leaked. If there were safety issues
you would have to restart all components running in the process in a
new one.

Anyway, I do very much like the 2.0 life-cycle FSM. I doubt adding
more application specific states e.g. for degraded functionality will
really work. For instance if a piece of HW can work with two different
states of "degradedness" you are already in trouble. However if a user
can define these behaviours in a clean and transparent
(=introspectable) fashion in a sub-FSM of the Running state there is
no such problem.

Markus

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Thu, 2010-05-06 08:12.

Markus Klotzbuecher wrote:
> Hi Sylvain,
>
> On Tue, May 04, 2010 at 12:18:49PM +0200, Sylvain Joyeux wrote:
>
>
>> For the record, this idea of "separating the application state machine
>> from the component lifetime state machine" still does not make any sense
>> to me. Both state machines are obviously tied together (a motor
>> controller cannot be in CONTROLLING_MOTORS state while the component is
>> in FatalError).
>>
>
> Yes they are tied together! The desired relationship is that the
> application FSM is a hierarchical substate of the "Running" state.
>
>
>> The main differences between the two are:
>> - I add an UnrecoverableError state which is Peter's FatalError state.
>> FatalError, in my case, is recoverable, it only requires a restart of
>> the component.
>> - unhandled exceptions always end up terminating the component's
>> execution. I have no clue how a component could continue running (i.e.
>> providing functionality) after it raised an unexpected/unhandled
>>
>
> It can't continue running.
>
>
>> exception. The only sane thing to do *by default* is to try and stop
>> everything you can, and see if you can restart properly.
>>
>
> And that's still optimistic for a C++ application. An uncaught
> exception is after all a programming error. Who knows which pointers
> are dangling or which memory has leaked. If there were safety issues
> you would have to restart all components running in the process in a
> new one.
>
Two cases in my opinion: the cleanup worked, and you are fine. It did
not work because of dangling pointers, and the process will crash. In
which case you will know for sure that you have to restart it. In
practice, the first case is the most common case that I observed. But YMMV.

In any case, the "Recovering" state being able to be recovered from does
not mean that you *have* to recover. You can have your policy (restart
the process), I can have mine (restart the component).
> Anyway, I do very much like the 2.0 life-cycle FSM.
It seems that you don't since you want uncaught exceptions to stop the
component's execution.
> I doubt adding
> more application specific states e.g. for degraded functionality will
> really work. For instance if a piece of HW can work with two different
> states of "degradedness" you are already in trouble. However if a user
> can define these behaviours in a clean and transparent
> (=introspectable) fashion in a sub-FSM of the Running state there is
> no such problem.
>
True. What I am saying is that by having a "Nominal" (Running) and
"Degraded" (RuntimeError) main states, you offer a basic understanding
of the component without the overload of having yet another
introspection mechanism.

It does not mean that you cannot have an introspection mechanism that
allows you to define substates of those. I'm just advocating that we
should have a basic state machine that covers ... the basics.

Default Component states (Was Default exception handling in RTT

Submitted by markus.klotzbuecher on Thu, 2010-05-06 09:28.

On Thu, May 06, 2010 at 10:09:20AM +0200, Sylvain Joyeux wrote:
> Markus Klotzbuecher wrote:
>
> Hi Sylvain,
>
> On Tue, May 04, 2010 at 12:18:49PM +0200, Sylvain Joyeux wrote:
>
>
>
> For the record, this idea of "separating the application state machine
> from the component lifetime state machine" still does not make any sense
> to me. Both state machines are obviously tied together (a motor
> controller cannot be in CONTROLLING_MOTORS state while the component is
> in FatalError).
>
>
> Yes they are tied together! The desired relationship is that the
> application FSM is a hierarchical substate of the "Running" state.
>
>
>
> The main differences between the two are:
> - I add an UnrecoverableError state which is Peter's FatalError state.
> FatalError, in my case, is recoverable, it only requires a restart of
> the component.
> - unhandled exceptions always end up terminating the component's
> execution. I have no clue how a component could continue running (i.e.
> providing functionality) after it raised an unexpected/unhandled
>
>
> It can't continue running.
>
>
>
> exception. The only sane thing to do *by default* is to try and stop
> everything you can, and see if you can restart properly.
>
>
> And that's still optimistic for a C++ application. An uncaught
> exception is after all a programming error. Who knows which pointers
> are dangling or which memory has leaked. If there were safety issues
> you would have to restart all components running in the process in a
> new one.
>
>
> Two cases in my opinion: the cleanup worked, and you are fine. It did not work

Well, you're not really fine IMO. Something unanticipated went wrong and
might have corrupted memory of other components. Possibly this
approach will work most of the time and therefore be acceptable for
certain applications.

> because of dangling pointers, and the process will crash. In which case you
> will know for sure that you have to restart it. In practice, the first case is
> the most common case that I observed. But YMMV.
>
> In any case, the "Recovering" state being able to be recovered from does not
> mean that you *have* to recover. You can have your policy (restart the
> process), I can have mine (restart the component).
>
> Anyway, I do very much like the 2.0 life-cycle FSM.
>
> It seems that you don't since you want uncaught exceptions to stop the
> component's execution.

Because compared to the set of all possible C++ errors in a component
the "uncaught exception" seems a rather obvious and easy to avoid
problem. Adding a recovery mechanism for such a special case will
encourage writing buggy components which can recover (which they can't
really, see above) instead of doing things right in the first place.

> I doubt adding
> more application specific states e.g. for degraded functionality will
> really work. For instance if a piece of HW can work with two different
> states of "degradedness" you are already in trouble. However if a user
> can define these behaviours in a clean and transparent
> (=introspectable) fashion in a sub-FSM of the Running state there is
> no such problem.
>
>
> True. What I am saying is that by having a "Nominal" (Running) and "Degraded"
> (RuntimeError) main states, you offer a basic understanding of the component
> without the overload of having yet another introspection mechanism.
>
> It does not mean that you cannot have an introspection mechanism that allows
> you to define substates of those. I'm just advocating that we should have a
> basic state machine that covers ... the basics.

I understand you are advocating a more practical approach. I hope that
my FSM implementation will be able to provide an alternative. The
reason I would prefer not to add such application specific states is
that it will break the compositionality chain: The application FSM
model of the highest level of abstraction is contained in the
"Running" state of the lower level life-cycle FSM. This way a FSM
compiler needs to know nothing about the life-cycle FSM, while
otherwise it would have to have some knowledge about the life-cycle
FSM in order to generate code for that.

Markus

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Thu, 2010-05-06 12:36.

Markus Klotzbuecher wrote:
>> And that's still optimistic for a C++ application. An uncaught
>> exception is after all a programming error. Who knows which pointers
>> are dangling or which memory has leaked. If there were safety issues
>> you would have to restart all components running in the process in a
>> new one.
>>
>>
>> Two cases in my opinion: the cleanup worked, and you are fine. It did not work
>>
>
> Well, you're not really fine IMO. Something unanticipated went wrong and
> might have corrupted memory of other components. Possibly this
> approach will work most of the time and therefore be acceptable for
> certain applications.
>

>
>> because of dangling pointers, and the process will crash. In which case you
>> will know for sure that you have to restart it. In practice, the first case is
>> the most common case that I observed. But YMMV.
>>
>> In any case, the "Recovering" state being able to be recovered from does not
>> mean that you *have* to recover. You can have your policy (restart the
>> process), I can have mine (restart the component).
>>
>> Anyway, I do very much like the 2.0 life-cycle FSM.
>>
>> It seems that you don't since you want uncaught exceptions to stop the
>> component's execution.
>>
>
> Because compared to the set of all possible C++ errors in a component
> the "uncaught exception" seems a rather obvious and easy to avoid
> problem. Adding a recovery mechanism for such a special case will
> encourage writing buggy components which can recover (which they can't
> really, see above) instead of doing things right in the first place.
>
First of all, uncaught exceptions *should not* lead to an undefined
state of the software itself (i.e. there should be no dangling pointers
and corrupted memory). What they *do* announce is that the global
component behaviour is not guaranteed anymore. Hence reacting by
stopping the component.

If you believe that having exceptions leads to dangling pointers, you
should not use exceptions *at all*. Because you would end up with
dangling pointers *even though you are catching the exception*.

I actually do let my hooks throw exceptions. They are not "unexpected",
they are "uncaught". And I do that because then our components go to a
state that is relevant for supervision ("Recovering").
I'd rather have that than requiring all the component writers to have
try { } catch(...) { } blocks in all hooks.

I will again point out what I said in my previous mail: the fact that
Recovering can be recovered does *not* mean that it should be. We can
both have our desired behaviour.
>> I doubt adding
>> more application specific states e.g. for degraded functionality will
>> really work. For instance if a piece of HW can work with two different
>> states of "degradedness" you are already in trouble. However if a user
>> can define these behaviours in a clean and transparent
>> (=introspectable) fashion in a sub-FSM of the Running state there is
>> no such problem.
>>
>>
>> True. What I am saying is that by having a "Nominal" (Running) and "Degraded"
>> (RuntimeError) main states, you offer a basic understanding of the component
>> without the overload of having yet another introspection mechanism.
>>
>> It does not mean that you cannot have an introspection mechanism that allows
>> you to define substates of those. I'm just advocating that we should have a
>> basic state machine that covers ... the basics.
>>
>
> I understand you are advocating a more practical approach. I hope that
> my FSM implementation will be able to provide an alternative.
I think you don't get my point there. Even if I end up being conquered
at first sight by your FSM compiler, I am not planning to convert all
the modules we have to it (it would not be practical). There is
therefore a hole in Peter's proposition that the Recovering state would
fill.

Moreover, I do not want that a RTT 2.0 component *needs* to use your FSM
implementation to be supervisable. Because, let's face it, the C++
interface is already too complex for some people, so if they have to
learn I-don-t-know-how-many tools to write a good orocos component,
they're going to go to some other framework.

> The
> reason I would prefer not to add such application specific states is
> that it will break the compositionality chain: The application FSM
> model of the highest level of abstraction is contained in the
> "Running" state of the lower level life-cycle FSM. This way a FSM
> compiler needs to know nothing about the life-cycle FSM, while
> otherwise it would have to have some knowledge about the life-cycle
> FSM in order to generate code for that.
>
Then you are blocking other people's solutions for the sake of your own.

In theory, the FSMs are composable. The problem is that your compiler
can't deal with an underlying FSM that it did not generate. Tell me: how
do you handle the configure, start and stop transitions (which are both
application and RTT transitions) if that is the case ?

For the record, I don't see Recovering as application specific. I could
live with Degraded not being there, but I would feel that it is sad indeed.

Default Component states (Was Default exception handling in RTT

Submitted by markus.klotzbuecher on Thu, 2010-05-06 19:56.

On Thu, May 06, 2010 at 02:35:34PM +0200, Sylvain Joyeux wrote:
> Markus Klotzbuecher wrote:
> >> And that's still optimistic for a C++ application. An uncaught
> >> exception is after all a programming error. Who knows which pointers
> >> are dangling or which memory has leaked. If there were safety issues
> >> you would have to restart all components running in the process in a
> >> new one.
> >>
> >>
> >> Two cases in my opinion: the cleanup worked, and you are fine. It did not work
> >>
> >
> > Well, you're not really fine IMO. Something unanticipated went wrong and
> > might have corrupted memory of other components. Possibly this
> > approach will work most of the time and therefore be acceptable for
> > certain applications.
> >
>
> >
> >> because of dangling pointers, and the process will crash. In which case you
> >> will know for sure that you have to restart it. In practice, the first case is
> >> the most common case that I observed. But YMMV.
> >>
> >> In any case, the "Recovering" state being able to be recovered from does not
> >> mean that you *have* to recover. You can have your policy (restart the
> >> process), I can have mine (restart the component).
> >>
> >> Anyway, I do very much like the 2.0 life-cycle FSM.
> >>
> >> It seems that you don't since you want uncaught exceptions to stop the
> >> component's execution.
> >>
> >
> > Because compared to the set of all possible C++ errors in a component
> > the "uncaught exception" seems a rather obvious and easy to avoid
> > problem. Adding a recovery mechanism for such a special case will
> > encourage writing buggy components which can recover (which they can't
> > really, see above) instead of doing things right in the first place.
> >
> First of all, uncaught exceptions *should not* lead to an undefined
> state of the software itself (i.e. there should be no dangling pointers
> and corrupted memory). What they *do* announce is that the global
> component behaviour is not guaranteed anymore. Hence reacting by
> stopping the component.

Unfortunately "should not" does not imply "will not".

> If you believe that having exceptions leads to dangling pointers, you
> should not use exceptions *at all*. Because you would end up with
> dangling pointers *even though you are catching the exception*.

Yes, I personally do believe that exceptions are better avoided.

> I actually do let my hooks throw exceptions. They are not "unexpected",
> they are "uncaught". And I do that because then our components go to a
> state that is relevant for supervision ("Recovering").

Interesting idea, you let exceptions slip through in order to
transition the FSM. But how can you know this happens only for your
intended-and-uncaught exceptions and not for others you forgot about?

> I'd rather have that than requiring all the component writers to have
> try { } catch(...) { } blocks in all hooks.

I agree with you that this is kind of ugly, but isn't that the price
you pay for using exceptions? Secondly can't this be trivially
generated?

> I will again point out what I said in my previous mail: the fact that
> Recovering can be recovered does *not* mean that it should be. We can
> both have our desired behaviour.

That's an often heard argument for adding a feature which nobody else
needs.

> >> I doubt adding
> >> more application specific states e.g. for degraded functionality will
> >> really work. For instance if a piece of HW can work with two different
> >> states of "degradedness" you are already in trouble. However if a user
> >> can define these behaviours in a clean and transparent
> >> (=introspectable) fashion in a sub-FSM of the Running state there is
> >> no such problem.
> >>
> >>
> >> True. What I am saying is that by having a "Nominal" (Running) and "Degraded"
> >> (RuntimeError) main states, you offer a basic understanding of the component
> >> without the overload of having yet another introspection mechanism.
> >>
> >> It does not mean that you cannot have an introspection mechanism that allows
> >> you to define substates of those. I'm just advocating that we should have a
> >> basic state machine that covers ... the basics.
> >>
> >
> > I understand you are advocating a more practical approach. I hope that
> > my FSM implementation will be able to provide an alternative.
> I think you don't get my point there. Even if I end up being conquered
> at first sight by your FSM compiler, I am not planning to convert all
> the modules we have to it (it would not be practical). There is
> therefore a hole in Peter's proposition that the Recovering state would
> fill.

For the record: my FSM will function as a plugin. The FSM compiler was
just an example to illustrate that these are really two different
levels of abstraction.

> Moreover, I do not want that a RTT 2.0 component *needs* to use your FSM
> implementation to be supervisable. Because, let's face it, the C++
> interface is already too complex for some people, so if they have to
> learn I-don-t-know-how-many tools to write a good orocos component,
> they're going to go to some other framework.

Nobody will have to use my solution!

> > The
> > reason I would prefer not to add such application specific states is
> > that it will break the compositionality chain: The application FSM
> > model of the highest level of abstraction is contained in the
> > "Running" state of the lower level life-cycle FSM. This way a FSM
> > compiler needs to know nothing about the life-cycle FSM, while
> > otherwise it would have to have some knowledge about the life-cycle
> > FSM in order to generate code for that.
> >
> Then you are blocking other people's solutions for the sake of your own.

No need to get upset. I'm not blocking anything. I just think that
adding states to this FSM needs to be considered very carefully. Once a
state is there and people use it to build their components there is no
way back.

> In theory, the FSMs are composable. The problem is that your compiler
> can't deal with an underlying FSM that it did not generate. Tell me: how
> do you handle the configure, start and stop transitions (which are both
> application and RTT transitions) if that is the case ?

I have no compiler (see above). It's a plugin which glues together both FSMs.

> For the record, I don't see Recovering as application specific. I could
> live with Degraded not being there, but I would feel that it is sad
> indeed.

I agree that Recovering is not application specific, I'm just somewhat
in doubt it can really work.

Markus

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Fri, 2010-05-07 09:08.

Markus Klotzbuecher wrote:
>> First of all, uncaught exceptions *should not* lead to an undefined
>> state of the software itself (i.e. there should be no dangling pointers
>> and corrupted memory). What they *do* announce is that the global
>> component behaviour is not guaranteed anymore. Hence reacting by
>> stopping the component.
>>
>
> Unfortunately "should not" does not imply "will not".
>
>
>> If you believe that having exceptions leads to dangling pointers, you
>> should not use exceptions *at all*. Because you would end up with
>> dangling pointers *even though you are catching the exception*.
>>
>
> Yes, I personally do believe that exceptions should be better avoided.
>

>> I actually do let my hooks throw exceptions. They are not "unexpected",
>> they are "uncaught". And I do that because then our components go to a
>> state that is relevant for supervision ("Recovering").
>>
>
> Interesting idea, you let exceptions slip through in order to
> transition the FSM. But how can you know this happens only for your
> intended-and-uncaught exceptions and not for others you forgot about?
>
I don't. Since the component, in any case, goes into FatalError it does
not matter: the component stops running.

The feeling that I have is that you don't like exceptions, and you don't
want exceptions to be anywhere near the components you are using. I'm a
bit left to wonder how much your position on default exception handling
is about "exceptions suck" instead of being about "what does the
exception *if used right* mean, and how can I represent that in my
component FSM". Which is what this discussion should be about.

In other words, I think that the current RTT 2.0 state machine is
impairing "exception use", and I have the feeling that your position on
it comes from your dislike of exceptions. Here's the message: I do like
exceptions, and I do think they are useful.

>> I'd rather have that than requiring all the component writers to have
>> try { } catch(...) { } blocks in all hooks.
>>
>
> I agree with you that this is kind of ugly, but isn't that the price
> you pay for using exceptions? Secondly can't this be trivially
> generated?
>
Why, in god's name, should there be such a price for using exceptions ?
Exceptions mostly reduce the amount of code needed

Moreover, we're not talking about code generation, but about plain C++
components. Finally, I think that it is a useful feature for anybody
using exceptions, so I don't see the point of limiting it.
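
To make the alternative to per-hook try/catch blocks concrete, here is a minimal sketch of a single engine-side catch. The names `Component`, `Engine`, `State` and `Faulty` are hypothetical stand-ins made up for this illustration and are not the RTT API; the only behaviour taken from the discussion is that an uncaught exception in updateHook() ends in FatalError and the component stops running.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for illustration only; this is not the RTT API.
enum class State { Running, FatalError };

class Component {
public:
    virtual ~Component() = default;
    virtual void updateHook() = 0;  // user code, allowed to throw
    State state() const { return state_; }
private:
    friend class Engine;            // only the engine changes the state
    State state_ = State::Running;
};

class Engine {
public:
    // Runs one cycle. Any exception escaping the hook stops the component,
    // so component writers do not need a catch(...) in every hook.
    static void step(Component& c) {
        if (c.state_ != State::Running) return;  // stopped components are skipped
        try {
            c.updateHook();
        } catch (...) {
            c.state_ = State::FatalError;
        }
    }
};

// Example component whose hook throws on its second cycle.
class Faulty : public Component {
public:
    int cycles = 0;
    void updateHook() override {
        if (++cycles == 2) throw std::runtime_error("lost contact with hardware");
    }
};
```

With this shape the catch block is written once, in the engine, and every component gets the same default behaviour whether or not its author thought about exceptions.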

>> I will again point out what I said in my previous mail: the fact that
>> Recovering can be recovered does *not* mean that it should be. We can
>> both have our desired behaviour.
>>
>
> That's an often heard argument for adding a feature which nobody else
> needs.
>
Well, I'm admittedly leading the effort, but there are right now ~10
people using it at DFKI.

>>> The
>>> reason I would prefer not to add such application specific states is
>>> that it will break the compositionality chain: The application FSM
>>> model of the highest level of abstraction is contained in the
>>> "Running" state of the lower level life-cycle FSM. This way a FSM
>>> compiler needs to know nothing about the life-cycle FSM, while
>>> otherwise it would have to have some knowledge about the life-cycle
>>> FSM in order to generate code for that.
>>>
>>>
>> Then you are blocking other people's solutions for the sake of your own.
>>
>
> No need to get upset. I'm not blocking anything. I just think that
> adding states to this FSM needs to be considered very carefully. Once a
> state is there and people use it to build their components there is no
> way back.
>
Sorry if I sounded upset, I did not mean that.
>> In theory, the FSMs are composable. The problem is that your compiler
>> can't deal with an underlying FSM that he did not generate. Tell me: how
>> do you handle the configure, start and stop transitions (which are both
>> application and RTT transitions) if it is the case ?
>>
>
> I have no compiler (s.o). It's a plugin which glues together both FSM.
>
Could you elaborate on the fact that your plugin already has to handle
the start/stop/configure transitions that are both RTT and application
transitions ?
>> For the record, I don't see Recovering as application specific. I could
>> live with Degraded not being there, but I would feel that it is sad
>> indeed.
>>
>
> I agree that Recovering is not application specific, I'm just somewhat
> in doubt it can really work.
>
Well. It *does* work ! We are using it successfully already.

Default Component states (Was Default exception handling in RTT

Submitted by markus.klotzbuecher on Fri, 2010-05-07 20:44.

On Fri, May 07, 2010 at 11:03:58AM +0200, Sylvain Joyeux wrote:
> Markus Klotzbuecher wrote:
> >> First of all, uncaught exceptions *should not* lead to an undefined
> >> state of the software itself (i.e. there should be no dangling pointers
> >> and corrupted memory). What they *do* announce is that the global
> >> component behaviour is not guaranteed anymore. Hence reacting by
> >> stopping the component.
> >>
> >
> > Unfortunately "should not" does not imply "will not".
> >
> >
> >> If you believe that having exceptions leads to dangling pointers, you
> >> should not use exceptions *at all*. Because you would end up with
> >> dangling pointers *even though you are catching the exception*.
> >>
> >
> > Yes, I personally do believe that exceptions should be better avoided.
> >
>
> >> I actually do let my hooks throw exceptions. They are not "unexpected",
> >> they are "uncaught". And I do that because then our components go to a
> >> state that is relevant for supervision ("Recovering").
> >>
> >
> > Interesting idea, you let exceptions slip through in order to
> > transition the FSM. But how can you know this happens only for your
> > intended-and-uncaught exceptions and not for others you forgot about?
> >
> I don't. Since the component, in any case, goes into FatalError it does
> not matter: the component stops running.
>
> The feeling that I have is that you don't like exceptions, and you don't
> want exceptions to be anywhere near the components you are using. I'm a
> bit left to wonder how much your position on default exception handling
> is about "exceptions suck" instead of being about "what does the
> exception *if used right* mean, and how can I represent that in my
> component FSM". Which is what this discussion should be about.

I don't dislike exceptions in general. I only find it a questionable
practice to use them as a sort of event service between user code and
core RTT. You are essentially using exceptions to alter regular
component execution flow.

Secondly, writing truly exception-safe code without leaking resources
can be hard. Wouldn't it be better to put that effort into writing
robust code in the first place? I would be worried that it will encourage
users to write poor components, because hey, no problem - they can
recover!

> In other words, I think that the current RTT 2.0 state machine is
> impairing "exception use", and I have the feeling that your position on
> it comes from your dislike of exceptions. Here's the message: I do like
> exceptions, and I do think they are useful.

I can live with that :-)

> >> I'd rather have that than requiring all the component writers to have
> >> try { } catch(...) { } blocks in all hooks.
> >>
> >
> > I agree with you that this is kind of ugly, but isn't that the price
> > you pay for using exceptions? Secondly can't this be trivially
> > generated?
> >
> Why, in god's name, should there be such a price for using exceptions ?
> Exceptions mostly reduce the amount of code needed
" .. to handle errors without missing some (which is a very common
problem with return values)."

Well, of course you reduce the amount of code if you omit the proper
handling and let core code catch exceptions at a point where it is
definitely too late to do anything useful.

> Moreover, we're not talking about code generation, but about plain C++
> components. Finally, I think that it is a useful feature for anybody
> using exceptions, so I don't see the point of limiting it.
>
> >> I will again point out what I said in my previous mail: the fact that
> >> Recovering can be recovered does *not* mean that it should be. We can
> >> both have our desired behaviour.
> >>
> >
> > That's an often heard argument for adding a feature which nobody else
> > needs.
> >
> Well, I'm admittedly leading the effort, but there is right now ~10
> people using it at DFKI.
>
> >>> The
> >>> reason I would prefer not to add such application specific states is
> >>> that it will break the compositionality chain: The application FSM
> >>> model of the highest level of abstraction is contained in the
> >>> "Running" state of the lower level life-cycle FSM. This way a FSM
> >>> compiler needs to know nothing about the life-cycle FSM, while
> >>> otherwise it would have to have some knowledge about the life-cycle
> >>> FSM in order to generate code for that.
> >>>
> >>>
> >> Then you are blocking other people's solutions for the sake of your own.
> >>
> >
> > No need to get upset. I'm not blocking anything. I just think that
> > adding states to this FSM needs to be considered very carefully. Once a
> > state is there and people use it to build their components there is no
> > way back.
> >
> Sorry if I sounded upset, I did not mean that.
> >> In theory, the FSMs are composable. The problem is that your compiler
> >> can't deal with an underlying FSM that he did not generate. Tell me: how
> >> do you handle the configure, start and stop transitions (which are both
> >> application and RTT transitions) if it is the case ?
> >>
> >
> > I have no compiler (s.o). It's a plugin which glues together both FSM.
> >
> Could you elaborate on the fact that your plugin already has to handle
> the start/stop/configure transitions that are both RTT and application
> transitions ?

In the first place there is no relationship at all between the two
FSMs. As Peter describes later, they could run completely
independently of each other. As this coupling is probably too loose
for most cases, a user can for example use start- and stopHooks to send
respective start and stop events to the FSM in order to cause the
required transitions.
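
The coupling described in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions: `AppFsm`, the event names `e_start` / `e_stop`, and this `Component` skeleton are hypothetical and are not taken from RTT or from the plugin being discussed.

```cpp
#include <string>

// Hypothetical application-level FSM with two states; not part of RTT.
class AppFsm {
public:
    // Events that do not apply in the current state are ignored.
    void send(const std::string& event) {
        if (event == "e_start" && state_ == "idle") {
            state_ = "active";
        } else if (event == "e_stop" && state_ == "active") {
            state_ = "idle";
        }
    }
    const std::string& state() const { return state_; }
private:
    std::string state_ = "idle";
};

// Hypothetical component skeleton. The life-cycle hooks know nothing about
// the FSM's internals; they only forward events to it.
class Component {
public:
    explicit Component(AppFsm& fsm) : fsm_(fsm) {}
    bool startHook() { fsm_.send("e_start"); return true; }
    void stopHook()  { fsm_.send("e_stop"); }
private:
    AppFsm& fsm_;
};
```

The information flows in one direction only, from the life-cycle hooks to the application FSM, which is what keeps the two state machines separable.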

> >> For the record, I don't see Recovering as application specific. I could
> >> live with Degraded not being there, but I would feel that it is sad
> >> indeed.
> >>
> >
> > I agree that Recovering is not application specific, I'm just somewhat
> > in doubt it can really work.
> >
> Well. It *does* work ! We are using it successfully already.

Well, running code is an argument I must recognize. So let the
community decide!

Markus

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Fri, 2010-05-07 09:08.

> Why, in god's name, should there be such a price for using exceptions ?
> Exceptions mostly reduce the amount of code needed
>
Finishing my own sentence:

needed to handle errors without missing some (which is a very common
problem with return values).

Default Component states (Was Default exception handling in RTT

Submitted by peter on Thu, 2010-05-06 20:28.

On Thu, May 6, 2010 at 21:54, Markus Klotzbuecher <
markus [dot] klotzbuecher [..] ...> wrote:

> On Thu, May 06, 2010 at 02:35:34PM +0200, Sylvain Joyeux wrote:
>
> > If you believe that having exceptions leads to dangling pointers, you
> > should not use exceptions *at all*. Because you would end up with
> > dangling pointers *even though you are catching the exception*.
>
> Yes, I personally do believe that exceptions should be better avoided.
>

Neither RTT nor CORBA can avoid exceptions in this case:

Component A created
Component B created
Component B looks up A and prepares to call method M on A.
Component A destroyed
Component B calls A::M in its updateHook()

RTT Methods and CORBA method calls have no other choice than to throw.
There's no other way to notify the user. Actually there is, but that is an
opt-in mechanism, where you can be notified of appearing/disappearing
'services' and act on it in time, but even then you can have races. In/Out
ports will never throw.

This is the reason why I initially thought of going to RunTimeError when an
exception is thrown in updateHook(). The solution proposed on this list is
to use try {} catch blocks in updateHook() instead; and anything else is a
(recoverable) error. I think you guys are right: calling methods is weaker
than sending data through locally attached ports, and the try/catch block
makes this explicit: "it may fail right here, in this call"...
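
A minimal sketch of that explicit try/catch, assuming a hypothetical `PeerMethod` handle that throws once the peer has gone away. Neither `PeerMethod` nor `ComponentB` is an RTT or CORBA type; they only mimic the "Component A destroyed, Component B calls A::M" scenario listed above.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical handle to a method on a peer; it throws when the peer is gone.
class PeerMethod {
public:
    explicit PeerMethod(std::function<int()> impl) : impl_(std::move(impl)) {}
    void disconnect() { impl_ = nullptr; }  // simulates "Component A destroyed"
    int operator()() const {
        if (!impl_) throw std::runtime_error("peer no longer exists");
        return impl_();
    }
private:
    std::function<int()> impl_;
};

class ComponentB {
public:
    explicit ComponentB(PeerMethod& m) : m_(m) {}

    void updateHook() {
        // The try block marks the one place where failure is expected.
        try {
            last_value = m_();
            peer_alive = true;
        } catch (const std::runtime_error&) {
            peer_alive = false;  // handled locally; the component keeps running
        }
    }

    int last_value = 0;
    bool peer_alive = false;
private:
    PeerMethod& m_;
};
```

Anything that escapes this handler is, by the reasoning in the thread, not an anticipated failure and can be treated as an error by the engine.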

OK for me.

Peter

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Fri, 2010-05-07 05:44.

On Thu, 6 May 2010, Peter Soetens wrote:

> On Thu, May 6, 2010 at 21:54, Markus Klotzbuecher <markus [dot] klotzbuecher [..] ...> wrote:
> On Thu, May 06, 2010 at 02:35:34PM +0200, Sylvain Joyeux wrote:
>
>> If you believe that having exceptions leads to dangling pointers, you
>> should not use exceptions *at all*. Because you would end up with
>> dangling pointers *even though you are catching the exception*.
>
> Yes, I personally do believe that exceptions should be better avoided.
>
> We (RTT nor CORBA) can't avoid exceptions in this case:
>
> Component A created
> Component B created
> Component B looks up A and prepares to call method M on A.
> Component A destroyed
> Component B calls A::M in its updateHook()
>
> RTT Methods and CORBA method calls have no other choice than to throw. There's no other way to notify the user. Actually there is, but that is an opt-in mechanism, where you can be notified of appearing/disappearing 'services' and act to it in time, but even then you can have races. In/Out ports will never throw.
>
> This is the reason why I initially thought of going to RunTimeError when an exception is thrown in updateHook(). The solution proposed on this list is to use try {} catch blocks in updateHook() instead; and anything else is a (recoverable) error. I think you guys are right, calling methods is weaker than sending data through locally attached ports, the try/catch block makes this explicit: "it may fail right here, in this call"...
>
> OK for me.

I really believe more and more that components should not _call_ each
other, but only use asynchronous messages! Using method calls is something
for _classes_ , not for components, sigh... There _is_ a difference ("price
to pay") when going from object-based systems to component-based systems...

Allowing components to be created/destroyed at will indeed leads to
inconsistencies in your system. On the other hand, the system can guarantee
that messages can always be sent/read by not destroying the communication
_components_ before all of the communicating components are cleanly
destroyed.

A tool chain should be able to create the glue code to turn method calls
into messages if functionality is distributed over components, or, to turn
messages into method calls if functionality is deployed in the same thread/address
space.

Herman

Default Component states (Was Default exception handling in RTT

Submitted by peter on Fri, 2010-05-07 09:36.

On Friday 07 May 2010 07:41:14 Herman Bruyninckx wrote:
> On Thu, 6 May 2010, Peter Soetens wrote:
> > On Thu, May 6, 2010 at 21:54, Markus Klotzbuecher
> > <markus [dot] klotzbuecher [..] ...> wrote:
> >
> > On Thu, May 06, 2010 at 02:35:34PM +0200, Sylvain Joyeux wrote:
> >> If you believe that having exceptions leads to dangling pointers, you
> >> should not use exceptions *at all*. Because you would end up with
> >> dangling pointers *even though you are catching the exception*.
> >
> > Yes, I personally do believe that exceptions should be better avoided.
> >
> > We (RTT nor CORBA) can't avoid exceptions in this case:
> >
> > Component A created
> > Component B created
> > Component B looks up A and prepares to call method M on A.
> > Component A destroyed
> > Component B calls A::M in its updateHook()
> >
> > RTT Methods and CORBA method calls have no other choice than to throw.
> > There's no other way to notify the user. Actually there is, but that is
> > an opt-in mechanism, where you can be notified of appearing/disappearing
> > 'services' and act to it in time, but even then you can have races.
> > In/Out ports will never throw.
> >
> > This is the reason why I initially thought of going to RunTimeError when
> > an exception is thrown in updateHook(). The solution proposed on this
> > list is to use try {} catch blocks in updateHook() instead; and anything
> > else is a (recoverable) error. I think you guys are right, calling
> > methods is weaker than sending data through locally attached ports, the
> > try/catch block makes this explicit: "it may fail right here, in this
> > call"...
> >
> > OK for me.
>
> I really believe more and more that components should not _call_ each
> other, but only use asynchronous messages! Using method calls is something
> for _classes_ , not for components, sigh... There _is_ a difference ("price
> to pay") when going from object-based systems to component-based systems...

You are right, but I don't agree with you. First of all, inter-process 'calls'
are de facto implemented with asynchronous messages, there's no other way to
do it. Second, we do need the query/reply pattern in inter-component
communication, for example for all coordination tasks, and a method call is
just a user friendly / well known interface to that pattern.

The disadvantage of encapsulating the query in a method (ie the abstraction)
is that there's no way to see if the query itself succeeded or not, since
you're only waiting for the reply (ie the method returns). The solution to
this *in C++* is to throw an exception. If we were using only *C*, we couldn't
even use the method abstraction and always had to write explicitly the
request/reply pair, or, as CORBA did it, provide an extra function argument to
the method that indicates success or failure of the request/reply pattern
itself.
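
The two ways of reporting a failed request can be put side by side. This is a sketch with made-up names (`Peer`, `queryThrowing`, `queryWithStatus`, `CallStatus`); it does not reproduce the actual CORBA mapping, only the idea of an extra argument that reports on the request/reply exchange itself.

```cpp
#include <stdexcept>

// Hypothetical peer; 'reachable' models whether the request can be delivered.
struct Peer {
    bool reachable = true;
    int answer = 42;
};

// Style 1: the method abstraction. The return value carries only the reply,
// so a failed request has to be signalled by throwing.
int queryThrowing(const Peer& p) {
    if (!p.reachable) throw std::runtime_error("request failed");
    return p.answer;
}

// Style 2: an extra argument reports success or failure of the exchange.
// No exception is needed, but every call site has to check the status.
enum class CallStatus { Ok, RequestFailed };

int queryWithStatus(const Peer& p, CallStatus& status) {
    if (!p.reachable) {
        status = CallStatus::RequestFailed;
        return 0;  // the return value is meaningless in this case
    }
    status = CallStatus::Ok;
    return p.answer;
}
```

The second style is easy to get wrong by forgetting the check, which is the "missing some errors" problem with return values raised earlier in the thread.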

>
> Allowing components to be created/destroyed at will indeed leads to
> inconsistencies in your system. On the other hand, the system can guarantee
> that messages can always be sent/read by not destroying the communication
> _components_ before all of the communicating components are cleanly
> destroyed.

We don't 'allow' it ! It just *happens* in *distributed* computing ! It's such
a devastating omnipresent guarantee that sooner or later your 'peer' will go
away, that it's a shame that the CORBA standard (and the Orocos project :( )
did not start from this a priori knowledge.

That is why I'm talking all the time about 'services': they come and they go,
depending on factors we *can't control*:
* a camera being plugged in or out
* the reception of new knowledge or algorithms
* a new decision maker that wants to control task execution
etc...

That's why OSGi is such a big deal and inspiration for robotics: it started
from the assumption that you can't rely on the fact that by the time you *use*
the service (ie after you already discovered it), it might be gone already...

AFAICT, the ROS middleware is on the right track here too: if your standard
methodology of running applications is killing nodes at any time and starting
them back at any time, the middleware must have a way to handle this situation
and reconnect automatically. ROS middleware started from the assumption that
everything is distributed, I can tell by looking at it.

Apologies for the shouting and *bold* statements! These are fundamental
insights I only 'recently' acknowledged. This is really the big thing I did
wrong during my PhD, and I'm trying to recover the wreckage in 2.0.

>
> A tool chain should be able to create the glue code to turn method calls
> into messages if functionality is distributed over components, or, to turn
> messages into method calls if functionality is deployed in the same
> thread/address space.

Completely agree. This happens already in the RTT. It happens by design since
we don't have a toolchain yet. Because of the 'by design' choice, our code paths
are longer than if the code were generated, because our code must be ready for
both cases. CORBA/TAO suffers from this same problem btw.

Peter

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Fri, 2010-05-07 11:00.

On Fri, 7 May 2010, Peter Soetens wrote:

> On Friday 07 May 2010 07:41:14 Herman Bruyninckx wrote:
>> On Thu, 6 May 2010, Peter Soetens wrote:
>>> On Thu, May 6, 2010 at 21:54, Markus Klotzbuecher
>>> <markus [dot] klotzbuecher [..] ...> wrote:
>>>
>>> On Thu, May 06, 2010 at 02:35:34PM +0200, Sylvain Joyeux wrote:
>>>> If you believe that having exceptions leads to dangling pointers, you
>>>> should not use exceptions *at all*. Because you would end up with
>>>> dangling pointers *even though you are catching the exception*.
>>>
>>> Yes, I personally do believe that exceptions should be better avoided.
>>>
>>> We (RTT nor CORBA) can't avoid exceptions in this case:
>>>
>>> Component A created
>>> Component B created
>>> Component B looks up A and prepares to call method M on A.
>>> Component A destroyed
>>> Component B calls A::M in its updateHook()
>>>
>>> RTT Methods and CORBA method calls have no other choice than to throw.
>>> There's no other way to notify the user. Actually there is, but that is
>>> an opt-in mechanism, where you can be notified of appearing/disappearing
>>> 'services' and act to it in time, but even then you can have races.
>>> In/Out ports will never throw.
>>>
>>> This is the reason why I initially thought of going to RunTimeError when
>>> an exception is thrown in updateHook(). The solution proposed on this
>>> list is to use try {} catch blocks in updateHook() instead; and anything
>>> else is a (recoverable) error. I think you guys are right, calling
>>> methods is weaker than sending data through locally attached ports, the
>>> try/catch block makes this explicit: "it may fail right here, in this
>>> call"...
>>>
>>> OK for me.
>>
>> I really believe more and more that components should not _call_ each
>> other, but only use asynchronous messages! Using method calls is something
>> for _classes_ , not for components, sigh... There _is_ a difference ("price
>> to pay") when going from object-based systems to component-based systems...
>
> You are right, but I don't agree with you. First of all, inter-process 'calls'
> are de facto implemented with asynchronous messages, there's no other way to
> do it.

So, the "error" situation you created some paragraphs above will not show
up, will it?

> Second, we do need the query/reply pattern in inter-component
> communication, for example for all coordination tasks, and a method call is
> just a user friendly / well known interface to that pattern.

Yes, but (i) it is the _wrong_ one in a (real!) component-based system,
(ii) the system can (should) guarantee that the coordinator lives longer
than all its coordinated components, and (iii) _components_ do not have to
communicate with query/reply, but they should _only_ communicate with the
communication infrastructure components, and with their Coordinator...

In other words: a _real_ component-based infrastructure can do _without_
exceptions and their difficulties. Please, _don't_ try to bring "method
call" abstractions into the inter-component interactions!

> The disadvantage of encapsulating the query in a method (ie the abstraction)
> is that there's no way to see if the query itself succeeded or not, since
> you're only waiting for the reply (ie the method returns). The solution to
> this *in C++* is to throw an exception. If we were using only *C*, we couldn't
> even use the method abstraction and always had to write explicitly the
> request/reply pair, or, as CORBA did it, provide an extra function argument to
> the method that indicates success or failure of the request/reply pattern
> itself.
>
>>
>> Allowing components to be created/destroyed at will indeed leads to
>> inconsistencies in your system. On the other hand, the system can guarantee
>> that messages can always be sent/read by not destroying the communication
>> _components_ before all of the communicating components are cleanly
>> destroyed.
>
> We don't 'allow' it ! It just *happens* in *distributed* computing ! It's such
> a devastating omnipresent guarantee that sooner or later your 'peer' will go
> away, that it's a shame that the CORBA standard (nor the Orocos project :( )
> did not start from this a priori knowledge.

CORBA and RTT did not do it right, that's true. So, let's do it better: let
the communication components (best hidden inside the infrastructure and not
explicitly visible to the application components!) and the Coordinator live
longer than any inter-component interaction. Problem solved!

> That is why I'm talking all the time about 'services': they come and they go,
> depending on factors we *can't control*:
> * a camera being plugged in or out
> * the reception of new knowledge or algorithms
> * a new decision maker that wants to control task execution
> etc...

What you _can_ control is the scope over which these (inevitable, indeed)
events have influence on the overall system! And I claim that we _can_
control that scope, more in particular, we can make sure that no
inter-component interaction is within that scope.

I think I see a recurring pattern in the threads that are currently active:
- some people (Markus and myself) look at any application as having at
least two levels of components: those that execute the application's
functionality (to be programmed by the application programmer), and those
that the infrastructure provides as "run time" (invisible to the application
programmer)
- the other people look at any application as just a set of components.

The second approach is easier, maybe more efficient, but less reusable and
robust. I prefer the first approach, because that's where Orocos will have
the largest added value, for a long time to come. (The 'easiness' has to be
provided by toolchain support...)

> That's why OSGi is such a big deal and inspiration for robotics: it started
> from the assumption that you can't rely on the fact that by the time you *use*
> the service (ie after you already discovered it), it might be gone already...

I fully support this! What I am advocating is to make RTT into an
infrastructure that is more _robust_ against these inevitable service
problems.

> AFAICT, the ROS middleware is on the right track here too: if your standard
> methodology of running applications is killing nodes at any time and starting
> them back at any time, the middleware must have a way to handle this situation
> and reconnect automatically. ROS middleware started from the assumption that
> everything is distributed, I can tell by looking at it.

Yes, indeed. But their "killing" is too drastic: no way to save the "state"
of the application if it can be saved and if it is worth saving...

> Apologies for the shouting! and *bold* statements, these are fundamental
> insights I only 'recently' acknowledged and this is really the big thing I did
> wrong during my PhD, and I'm trying to recover the wreckage in 2.0.

Good! But I am already thinking one PhD ahead! :-) Markus's...

>> A tool chain should be able to create the glue code to turn method calls
>> into messages if functionality is distributed over components, or, to turn
>> messages into method calls if functionality is deployed in the same
>> thread/address space.
>
> Completely agree. This happens already in the RTT. It happens by design since
> we don't have toolchain yet. Because of the 'by design' choice, our code paths
> are longer than if it would be generated, because our code must be ready for
> both cases. CORBA/TAO suffers from this same problem btw.

Ack. Just one thing to add: my suggestions are to include (a very small
amount of) _architectural patterns_ into the toolchain, and not just (glue) code
generation.

Herman

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Fri, 2010-05-07 12:04.

Herman Bruyninckx wrote:
>> That is why I'm talking all the time about 'services': they come and they go,
>> depending on factors we *can't control*:
>> * a camera being plugged in or out
>> * the reception of new knowledge or algorithms
>> * a new decision maker that wants to control task execution
>> etc...
>>
>
> What you _can_ control is the scope over which these (inevitable, indeed)
> events have influence on the overall system! And I claim that we _can_
> control that scope, more in particular, we can make sure that no
> inter-component interaction is within that scope.
>
> I think I see a recurring pattern in the threads that are currently active:
> - some people (Markus and myself) look at any application as having at
> least two levels of components: those that execute the application's
> functionality (to be programmed by the application programmer), and those
> that the infrastructure provides as "run time" (invisible to the application
> programmer)
> - the other people look at any application as just a set of components.
>
> The second approach is easier, maybe more efficient, but less reusable and
> robust. I prefer the first approach, because that's where Orocos will have
> the largest added value, for a long time to come. (The 'easiness' has to be
> provided by toolchain support...)
>
+1 on the first approach.

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Fri, 2010-05-07 13:16.

In all the hustle and bustle that this thread has generated, have we come to a final conclusion on the original problem of component states? Can we move on yet ... ?

Stephen

Default Component states (Was Default exception handling in RTT

Submitted by peter on Fri, 2010-05-07 21:20.

On Friday 07 May 2010 14:05:54 S Roderick wrote:
> In all the hussle and bussle that this thread has generated, have we come
> to a final conclusion on the original problem of component states? Can we
> move on yet ... ?

Enough bike-shedding !

I made a third drawing taking into account Stephen's remarks about the naming.

http://picasaweb.google.be/lh/photo/5-dVzTBInTEB0R6AQ912yQ?feat=directlink

I think the fact that the state machine goes to the Exception state will cause
enough discomfort to users to avoid the situation, so I don't think that
recover() is encouraging bad practice.

Also, I'm proposing to make fatal() and exception() protected members of
TaskContext. I see no reason why a peer would have access to these.
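
As a rough sketch of what "protected" buys here: the class below is a hypothetical `TaskSketch`, not the real TaskContext. The exact transitions are in the linked drawing, which is not reproduced in this thread, so the assumption that recover() leads from Exception back to Stopped is made up for the example.

```cpp
// Hypothetical life-cycle sketch; state names follow the discussion, the
// transition targets are assumptions made for this example.
enum class State { Stopped, Running, Exception, FatalError };

class TaskSketch {
public:
    virtual ~TaskSketch() = default;
    State state() const { return state_; }

    bool start() {
        if (state_ != State::Stopped) return false;
        state_ = State::Running;
        return true;
    }

    // Public: a supervisor or peer may ask for recovery.
    // Assumption: recovery is only possible from Exception, never from FatalError.
    bool recover() {
        if (state_ != State::Exception) return false;
        state_ = State::Stopped;
        return true;
    }

protected:
    // Protected: only the component's own code can declare these conditions.
    void exception() { state_ = State::Exception; }
    void fatal()     { state_ = State::FatalError; }

private:
    State state_ = State::Stopped;
};

// Example component that reports its own faults.
class Motor : public TaskSketch {
public:
    void onOvercurrent() { exception(); }  // possibly recoverable
    void onBusLost()     { fatal(); }      // no way back
};
```

A peer holding a `TaskSketch&` can call start() and recover() but cannot force the component into Exception or FatalError, which matches the reasoning that a peer has no business triggering these.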

Sylvain ?

Peter

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Fri, 2010-05-07 21:28.

On May 7, 2010, at 17:18 , Peter Soetens wrote:

> On Friday 07 May 2010 14:05:54 S Roderick wrote:
>> In all the hussle and bussle that this thread has generated, have we come
>> to a final conclusion on the original problem of component states? Can we
>> move on yet ... ?
>
> Enough bike-shedding !
>
> I made a third drawing taking into account Stephen's remarks about the naming.
>
> http://picasaweb.google.be/lh/photo/5-dVzTBInTEB0R6AQ912yQ?feat=directlink
>
> I think the fact that the state machine goes to the Exception state will cause
> enough discomfort to users to avoid the situation, so I don't think that
> recover() is encouraging bad practice.
>
> Also, I'm proposing to make fatal() and exception() protected members of
> TaskContext. I see no reason why a peer would have access to these.
>
> Sylvain ?
>
> Peter

Good enough for me.
S

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Fri, 2010-05-14 07:52.

Stephen Roderick wrote:
> On May 7, 2010, at 17:18 , Peter Soetens wrote:
>
>
>> On Friday 07 May 2010 14:05:54 S Roderick wrote:
>>
>>> In all the hussle and bussle that this thread has generated, have we come
>>> to a final conclusion on the original problem of component states? Can we
>>> move on yet ... ?
>>>
>> Enough bike-shedding !
>>
>> I made a third drawing taking into account Stephen's remarks about the naming.
>>
>> http://picasaweb.google.be/lh/photo/5-dVzTBInTEB0R6AQ912yQ?feat=directlink
>>
>> I think the fact that the state machine goes to the Exception state will cause
>> enough discomfort to users to avoid the situation, so I don't think that
>> recover() is encouraging bad practice.
>>
>> Also, I'm proposing to make fatal() and exception() protected members of
>> TaskContext. I see no reason why a peer would have access to these.
>>
>> Sylvain ?
>>
>> Peter
>>
>
> Good enough for me.
>
Good for me too

Thanks for wrapping that up Peter ;)

Sylvain

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Tue, 2010-05-04 10:32.

On Tue, 4 May 2010, Sylvain Joyeux wrote:

> I've created a little table with both Peter's proposal and mine (attached to
> this mail). It represents the possible transitions, the hooks that get
> called and when each transition/hook gets called.
>
> For the record, this idea of "separating the application state machine
> from the component lifetime state machine" still does not make any sense
> to me. Both state machines are obviously tied together (a motor
> controller cannot be in CONTROLLING_MOTORS state while the component is
> in FatalError).

Your logical error is the following: the "tying" between both state
machines is _uni-directional_, from the "container" FSM to the
"application" FSM, and never in the opposite direction! _Hence_,
each "application" FSM should have a state/transition that semantically
represents the unavailability of one or more of the "container" resources
(probably without having to know which one...?)

I repeat what I have said many times before already: it is semantic suicide
to give these names to states! In other words, the name of a state should
_not_ reflect what your "application"/"container" _can/cannot_ do in that
state, but what your "application"/"container" state _is doing_ in that
state.

The "semantic suicide" comes from the fact that the suggested names make
_compositionality_ extremely difficult, and at least semantically
ambiguous: as Sylvain mentions, some states are not that fatal _if_ the
system gets some more functionalities later on.

> - unhandled exceptions always end up terminating the component's
> execution. I have no clue how a component could continue running (i.e.
> providing functionality) after it raised an unexpected/unhandled
> exception. The only sane thing to do *by default* is to try and stop
> everything you can, and see if you can restart properly.
>
> I also added, in my proposal, what to do with unhandled exceptions in
> configureHook() and startHook(). I'm really less sure about those. It
> just happens that, in Roby [supervision layer], I tried different things
> and terminating tasks when the "start" command fails ended up being the
> safest.
>
> Sylvain

Herman

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Tue, 2010-05-04 16:20.

Herman Bruyninckx wrote:
> On Tue, 4 May 2010, Sylvain Joyeux wrote:
>
>
>> I've created a little table with both Peter's proposal and mine (attached to
>> this mail). It represents the possible transitions, the hooks that get
>> called and when each transition/hook gets called.
>>
>> For the record, this idea of "separating the application state machine
>> from the component lifetime state machine" still does not make any sense
>> to me. Both state machines are obviously tied together (a motor
>> controller cannot be in CONTROLLING_MOTORS state while the component is
>> in FatalError).
>>
>
> Your logical error is the following: the "tying" between both state
> machines is _uni-directional_, from the "container" FSM to the
> "application" FSM, and never in the opposite direction! _Hence_,
> each "application" FSM should have a state/transition that semantically
> represents the unavailability of one or more of the "container" resources
> (probably without having to know which one...?)
>
I only partially agree. The states defined in the container FSM are
superstates (i.e. aggregates) of the ones in the application FSM. That
is a two-way link to me: if the application decides to transition, it
has an impact on the component FSM. If the component FSM transitions, it
must have an impact on the application FSM (as the application FSM's
state is not a substate of the component's).
>> The main differences between the two are:
>> - I add an UnrecoverableError state which is Peter's FatalError state.
>> FatalError, in my case, is recoverable, it only requires a restart of
>> the component.
>>
>
> I repeat what I have said many times before already: it is semantic suicide
> to give these names to states! In other words, the name of a state should
> _not_ reflect what your "application"/"container" _can/cannot_ do in that
> state, but what your "application"/"container" state _is doing_ in that
> state.
>
> The "semantic suicide" comes from the fact that the suggested names make
> _compositionality_ extremely difficult, and at least semantically
> ambiguous: as Sylvain mentions, some states are not that fatal _if_ the
> system gets some more functionalities later on.
>
I don't understand what you mean. Please give a concrete example (for
instance: how would you name the "fatal error" state I propose, i.e.
"the application/component stopped functioning because of a non-nominal
situation")?

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Tue, 2010-05-04 16:24.

On Tue, 4 May 2010, Sylvain Joyeux wrote:

> Herman Bruyninckx wrote:
>> On Tue, 4 May 2010, Sylvain Joyeux wrote:
>>
>>
>>> I've created a little table with both Peter's proposal and mine (attached to
>>> this mail). It represents the possible transitions, the hooks that get
>>> called and when each transition/hook gets called.
>>>
>>> For the record, this idea of "separating the application state machine
>>> from the component lifetime state machine" still does not make any sense
>>> to me. Both state machines are obviously tied together (a motor
>>> controller cannot be in CONTROLLING_MOTORS state while the component is
>>> in FatalError).
>>>
>>
>> Your logical error is the following: the "tying" between both state
>> machines is _uni-directional_, from the "container" FSM to the
>> "application" FSM, and never in the opposite direction! _Hence_,
>> each "application" FSM should have a state/transition that semantically
>> represents the unavailability of one or more of the "container" resources
>> (probably without having to know which one...?)
>>
> I only partially agree. The states defined in the container FSM are
> superstates (i.e. aggregates) of the ones in the application FSM.

I don't think so... That would be too tight a coupling!

> That
> is a two-way link to me: if the application decides to transition, it
> has an impact on the component FSM. If the component FSM transitions, it
> must have an impact on the application FSM (as the application FSM's
> state is not a substate of the component's).
>>> The main differences between the two are:
>>> - I add an UnrecoverableError state which is Peter's FatalError state.
>>> FatalError, in my case, is recoverable, it only requires a restart of
>>> the component.
>>
>> I repeat what I have said many times before already: it is semantic suicide
>> to give these names to states! In other words, the name of a state should
>> _not_ reflect what your "application"/"container" _can/cannot_ do in that
>> state, but what your "application"/"container" state _is doing_ in that
>> state.
>>
>> The "semantic suicide" comes from the fact that the suggested names make
>> _compositionality_ extremely difficult, and at least semantically
>> ambiguous: as Sylvain mentions, some states are not that fatal _if_ the
>> system gets some more functionalities later on.
>>
> I don't understand what you mean. Please give a concrete example (for
> instance: how would you name the "fatal error" state I propose, i.e.
> "the application/component stopped functioning because of a non-nominal
> situation")?

The concrete name will come from the concrete non-nominality of one of the
constraints/tasks/goals/... that your application component is (was...)
working on. "Fatal" has _no_ compositional semantic meaning, so please,
don't use it.

Herman

Default Component states (Was Default exception handling in RTT

Submitted by Klaas Gadeyne on Mon, 2010-05-03 12:04.

On Mon, May 3, 2010 at 12:31 PM, Peter Soetens <peter [..] ...> wrote:
> To all lurking on this thread, could we have a 'voting' about this ?
>
> In case of doubt, I follow the user's opinion, but since we only have Sylvain
> and me arguing, I wonder how much user there is going on here actually :-)

I've probably given up since I noticed the thread had more than 100
SLOT (T for text) and not a single FSM diagram ;-)

> Summary:
>
> * The RTT 1.x component states are mixing component lifecycle and application
> states. For example, configureHook() sets up method calls using 'getPeer()' or
> checks if input (read) ports are connected. It could *also* configure a device
> or so, but with limited flexibility, since the thread of the component was not
> running yet. So some configuration could be necessary in updateHook(), for
> example, if you were talking to a device bus. On the other hand, the component
> has some clear 'application' error states, like RunTimeError, which a
> component will only enter if user code instructs it to do so.
>
> I wanted to change this in 2.0 to a reduced life cycle, where the only states
> that a component has are independent of application states. For example,
> RunTimeError would mean that an exception was leaked in updateHook(). If
> errorHook() leaked an exception, the component would enter the FatalError
> state, which is unrecoverable (hence 'fatal'). My main motivation for this was
> to define what happens when user code throws exceptions. I wanted to have a
> similar scheme as was happening with program scripts: it's not because one
> program is in error, that the whole component should stop. One change
> contributing to this philosophy is that the thread of a component now always
> runs.
>
> I should have known by looking at historical evidence, but with this change, I
> stepped on Sylvain's turf.
>
> He proposes (correct me if I'm wrong) to keep close to the current application
> states, at least the RunTimeError state for user errors and use FatalError
> (=stop+cleanup) if user code throws (in any place). If transition to fatal
> fails, an unrecoverable-worse-than-fatal state is entered. His reasoning is
> that supervision needs info of application health for every component, and
> that this belongs in the interface of every component.
>
> Please read full details below or in the thread.

No, thx :-)

> I don't think the water is that deep between us, most developers use
> configureHook/errorHook already for application states so I see the point, and
> no one is complaining... We should mainly focus on user's ease of
> programming/minimal coding effort, but what do the others think ?

Regarding the mix of application/framework states: we recently had an
issue with an application which needs to be examined in detail (I
expect Steven to post on this during the course of this week), but
_seems_ at first sight (I know this is dangerous, I haven't even
looked at the code :-) to be caused by the following.

For those in need of a diagram: See
<http://www.orocos.org/stable/documentation/rtt/v1.10.x/doc-xml/orocos-components-manual.html#id3110846>
You can create a component and instruct it to start in preoperational.
However, it looked like you can call the destructor on that same
component if it's in stopped state, and it won't "force" it back to
preop first (hence executing cleanupHook). Of course, you can force
this transition in your destructor yourself, but I wonder if this was
intentionally "left blank" (usecase?).

Best regards,

Klaas [..] ...

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Mon, 2010-05-03 12:36.

Klaas Gadeyne wrote:
> On Mon, May 3, 2010 at 12:31 PM, Peter Soetens <peter [..] ...> wrote:
>
>> To all lurking on this thread, could we have a 'voting' about this ?
>>
>> In case of doubt, I follow the user's opinion, but since we only have Sylvain
>> and me arguing, I wonder how much user there is going on here actually :-)
>>
>
> I've probably given up since I noticed the thread had more than 100
> SLOT (T for text) and not a single FSM diagram ;-)
>
>
>> Summary:
>>
>> * The RTT 1.x component states are mixing component lifecycle and application
>> states. For example, configureHook() sets up method calls using 'getPeer()' or
>> checks if input (read) ports are connected. It could *also* configure a device
>> or so, but with limited flexibility, since the thread of the component was not
>> running yet. So some configuration could be necessary in updateHook(), for
>> example, if you were talking to a device bus. On the other hand, the component
>> has some clear 'application' error states, like RunTimeError, which a
>> component will only enter if user code instructs it to do so.
>>
>> I wanted to change this in 2.0 to a reduced life cycle, where the only states
>> that a component has are independent of application states. For example,
>> RunTimeError would mean that an exception was leaked in updateHook(). If
>> errorHook() leaked an exception, the component would enter the FatalError
>> state, which is unrecoverable (hence 'fatal'). My main motivation for this was
>> to define what happens when user code throws exceptions. I wanted to have a
>> similar scheme as was happening with program scripts: it's not because one
>> program is in error, that the whole component should stop. One change
>> contributing to this philosophy is that the thread of a component now always
>> runs.
>>
>> I should have known by looking at historical evidence, but with this change, I
>> stepped on Sylvain's turf.
>>
>> He proposes (correct me if I'm wrong) to keep close to the current application
>> states, at least the RunTimeError state for user errors and use FatalError
>> (=stop+cleanup) if user code throws (in any place). If transition to fatal
>> fails, an unrecoverable-worse-than-fatal state is entered. His reasoning is
>> that supervision needs info of application health for every component, and
>> that this belongs in the interface of every component.
>>
>> Please read full details below or in the thread.
>>
>
> No, thx :-)
>
>
>> I don't think the water is that deep between us, most developers use
>> configureHook/errorHook already for application states so I see the point, and
>> no one is complaining... We should mainly focus on user's ease of
>> programming/minimal coding effort, but what do the others think ?
>>
>
> Regarding the mix of application/framework states: we recently had an
> issue with an application which needs to be examined in detail (I
> expect Steven to post on this during the course of this week), but
> _seems_ at first sight (I know this is dangerous, I haven't even
> looked at the code :-) to be caused by the following.
>
> For those in need of a diagram: See
> <http://www.orocos.org/stable/documentation/rtt/v1.10.x/doc-xml/orocos-components-manual.html#id3110846>
> You can create a component and instruct it to start in preoperational.
> However, it looked like you can call the destructor on that same
> component if it's in stopped state, and it won't "force" it back to
> preop first (hence executing cleanupHook). Of course, you can force
> this transition in your destructor yourself, but I wonder if this was
> intentionally "left blank" (usecase?).
>
You cannot make sure that the hooks are called from the destructors, as
the destructors will *not* call overridden virtual methods.

I.e. you have to make sure that whatever application development tool
you use (the deployer, orogen) stops and cleans up the tasks before the
application is shut down.
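
The C++ rule behind this can be shown with a short sketch. `Base`, `Driver` and the `log` vector are hypothetical names used only for illustration, not RTT types. Inside a base-class destructor the derived part of the object has already been destroyed, so a virtual call resolves to the base-class version and the component's own `cleanupHook()` override is never reached.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical base class for illustration; not an RTT type.
struct Base {
    std::vector<std::string>& log;
    explicit Base(std::vector<std::string>& l) : log(l) {}
    virtual void cleanupHook() { log.push_back("Base::cleanupHook"); }
    virtual ~Base() {
        // The Driver part is already gone at this point, so this call
        // dispatches to Base::cleanupHook, never to the override.
        cleanupHook();
    }
};

// Hypothetical user component overriding the hook.
struct Driver : Base {
    using Base::Base;
    void cleanupHook() override { log.push_back("Driver::cleanupHook"); }
};

// Destroy a Driver without any explicit cleanup.
std::vector<std::string> destroyOnly() {
    std::vector<std::string> log;
    { Driver d(log); }  // destructor runs at end of scope
    return log;
}

// What a deployment tool has to do instead: call the hook explicitly
// while the object is still a complete Driver.
std::vector<std::string> cleanupThenDestroy() {
    std::vector<std::string> log;
    {
        Driver d(log);
        d.cleanupHook();
    }
    return log;
}

void demo() {
    assert(destroyOnly() == std::vector<std::string>{"Base::cleanupHook"});
    assert(cleanupThenDestroy().front() == "Driver::cleanupHook");
}
```

In the second function the user's hook runs first and the base-class version still runs from the destructor afterwards, which is why the explicit stop/cleanup has to come from whoever owns the component.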

Default Component states (Was Default exception handling in RTT

Submitted by Klaas Gadeyne on Mon, 2010-05-03 12:56.

On Mon, May 3, 2010 at 2:31 PM, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
[...]
>> Regarding the mix of application/framework states: we recently had an
>> issue with an application which needs to be examined in detail (I
>> expect Steven to post on this during the course of this week), but
>> _seems_ at first sight (I know this is dangerous, I haven't even
>> looked at the code :-) to be caused by the following.
>>
>> For those in need of a diagram: See
>>
>> <http://www.orocos.org/stable/documentation/rtt/v1.10.x/doc-xml/orocos-components-manual.html#id3110846>
>> You can create a component and instruct it to start in preoperational.
>> However, it looked like you can call the destructor on that same
>> component if it's in stopped state, and it won't "force" it back to
>> preop first (hence executing cleanupHook). Of course, you can force
>> this transition in your destructor yourself, but I wonder if this was
>> intentionally "left blank" (usecase?).
>>
>
> You cannot make sure that the hooks are called from the destructors, as the
> destructors will *not* call overridden virtual methods.

As a modeling advocate I tend to forget the gory c++ details from time
to time ;-)
Now, playing the devil's advocate [*], wouldn't that nifty c++
"feature" be an argument in favour of _separating_ application and
framework FSMs?

Klaas

[*] At this point in time, I still have a hard time judging whether to
be in favour of your propositions or Peter's...

> I.e. you have to make sure that whatever application development tool you
> use (the deployer, orogen) stop and cleanup the tasks before the application
> is shut down.

AFAIR the issue only occurred using cdeployer indeed, but I'll leave
the last word about that for Steven.

Klaas

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Mon, 2010-05-03 11:40.

On May 3, 2010, at 06:31 , Peter Soetens wrote:

> To all lurking on this thread, could we have a 'voting' about this ?

1) Could someone draw up a state chart of what you guys propose? The long discussion below is hell of a fragmented ...

> In case of doubt, I follow the user's opinion, but since we only have Sylvain
> and me arguing, I wonder how much user there is going on here actually :-)
>
> Summary:
>
> * The RTT 1.x component states are mixing component lifecycle and application
> states. For example, configureHook() sets up method calls using 'getPeer()' or
> checks if input (read) ports are connected. It could *also* configure a device
> or so, but with limited flexibility, since the thread of the component was not
> running yet. So some configuration could be necessary in updateHook(), for
> example, if you were talking to a device bus. On the other hand, the component
> has some clear 'application' error states, like RunTimeError, which a
> component will only enter if user code instructs it to do so.
>
> I wanted to change this in 2.0 to a reduced life cycle, where the only states
> that a component has are independent of application states. For example,
> RunTimeError would mean that an exception was leaked in updateHook(). If
> errorHook() leaked an exception, the component would enter the FatalError
> state, which is unrecoverable (hence 'fatal'). My main motivation for this was
> to define what happens when user code throws exceptions. I wanted to have a
> similar scheme as was happening with program scripts: it's not because one
> program is in error, that the whole component should stop. One change
> contributing to this philosophy is that the thread of a component now always
> runs.

In my 2.5 years using Orocos, I've never used either errorHook() or the RunTimeError-related stuff. I can see a couple of cases where I should have ...

> I should have known by looking at historical evidence, but with this change, I
> stepped on Sylvain's turf.
>
> He proposes (correct me if I'm wrong) to keep close to the current application
> states, at least the RunTimeError state for user errors and use FatalError
> (=stop+cleanup) if user code throws (in any place). If transition to fatal
> fails, an unrecoverable-worse-than-fatal state is entered. His reasoning is
> that supervision needs info of application health for every component, and
> that this belongs in the interface of every component.

It _looks_ like one or both of you are proposing

exception/error in updateHook() = transition to RunTimeError state = call errorHook()
exception/error in errorHook() = transition to FatalError = call stopHook() then cleanupHook().
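
The two transitions listed above can be sketched as a toy model. Everything here is an assumption made for illustration: `ToyComponent`, `step()` and `trace` are hypothetical names, this is not the RTT ExecutionEngine, and it assumes that errorHook() is triggered the same way updateHook() would have been. It also swallows exceptions thrown by stopHook() or cleanupHook() during the fatal transition, which is one possible policy, not a settled one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

enum class State { Running, RunTimeError, FatalError };

// Toy model of the discussed transitions; NOT the RTT ExecutionEngine.
class ToyComponent {
public:
    virtual ~ToyComponent() = default;
    State state = State::Running;
    std::vector<std::string> trace;  // records which shutdown hooks ran

    // One activity trigger (e.g. one period of a periodic activity).
    void step() {
        try {
            if (state == State::Running) updateHook();
            else if (state == State::RunTimeError) errorHook();
        } catch (...) {
            if (state == State::Running) {
                state = State::RunTimeError;  // updateHook() leaked an exception
            } else {
                enterFatal();                 // errorHook() leaked an exception
            }
        }
    }
protected:
    virtual void updateHook() {}
    virtual void errorHook() {}
    virtual void stopHook() { trace.push_back("stopHook"); }
    virtual void cleanupHook() { trace.push_back("cleanupHook"); }
private:
    void enterFatal() {
        state = State::FatalError;
        // Assumed policy: ignore failures during shutdown, stay in FatalError.
        try { stopHook(); } catch (...) {}
        try { cleanupHook(); } catch (...) {}
    }
};

// Hypothetical component whose hooks always throw.
struct Faulty : ToyComponent {
protected:
    void updateHook() override { throw std::runtime_error("update failed"); }
    void errorHook() override { throw std::runtime_error("error handler failed"); }
};

void demo() {
    Faulty c;
    c.step();
    assert(c.state == State::RunTimeError);
    c.step();
    assert(c.state == State::FatalError);
    assert(c.trace.size() == 2);
}
```

Under these assumptions a component with an empty errorHook() simply stays in RunTimeError on every trigger.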

And RunTimeError is a running state, so scripts and state machines are still running (if they aren't the cause of the throw), right? Does errorHook() get called as updateHook() would have (ie periodically if in a PeriodicActivity)?

How do you get out of errorHook()?

IMHO the state after fatalError() should be STOPPED, not PRE_OPERATIONAL. The component is done.

What happens if you get an exception/error in a stopHook() or cleanupHook() that was called as part of a FatalError? Personally, I would catch it in FatalError and not continue doing anything. I don't see the point of yet another worse-than-fatal-error state.

Sylvain, we do use scripts some, particularly where the corresponding C++ code is too long, but they do have their issues that limit their usefulness.

We do some potentially blocking driver setup in configure() also.

Peter, what is the difference with v2 w.r.t. the thread being active during configureHook(), or being able to send commands prior to start()?

I don't like the idea of a SupervisionInterface. Yet another thing to learn, and I see little benefit.
S

>
> Please read full details below or in the thread.
>
> I don't think the water is that deep between us, most developers use
> configureHook/errorHook already for application states so I see the point, and
> no one is complaining... We should mainly focus on user's ease of
> programming/minimal coding effort, but what do the others think ?
>
> Peter
>
> On Wednesday 21 April 2010 14:31:17 Sylvain Joyeux wrote:
>> Peter Soetens wrote:
>>> On Wednesday 21 April 2010 12:37:09 Sylvain Joyeux wrote:
>>>> Peter Soetens wrote:
>>>>>>>> As far as I saw on the RTT2 ExecutionEngine code, exception handling
>>>>>>>> is as follows (Peter: tell me if I'm wrong):
>>>>>>>>
>>>>>>>> * an uncaught exception in updateHook() transitions to
>>>>>>>>   RUNTIME_ERROR
>>>>>>>> * an uncaught exception in errorHook() transitions to
>>>>>>>>   FATAL_ERROR
>>>>>>>
>>>>>>> Correct. Fatal error (should) leads to stopHook() + cleanupHook() and
>>>>>>> then wait for component cleanup/removal.
>>>>>>>
>>>>>>>> This assumes that errorHook() is able to handle unspecified errors.
>>>>>>>> This seems too broad for my POV. How we design components is that the
>>>>>>>> transition to fatal() is basically a stop() + some sort of tentative
>>>>>>>> cleanup. runtime_error is used as a runtime state categorization
>>>>>>>> (i.e. a way to "regroup" internal states), but its interpretation as
>>>>>>>> an actual error is situation dependent.
>>>>>>>
>>>>>>> It is. errorHook() is reserved for an RTT/Component specific error
>>>>>>> state, not for application error states. So you would need to write
>>>>>>> your application specific error states in updateHook().
>>>>>>
>>>>>> This completely negates the usefulness of the taskcontext state
>>>>>> machine.
>>>>>
>>>>> ... for application specific state machines. From a component life
>>>>> cycle view, these states are still necessary. I think we did it the
>>>>> wrong way in 1.x, coupling component lifecycle states with application
>>>>> states. They may overlap, and we can't/won't prevent that, but they
>>>>> don't have to overlap.
>>>>
>>>> OK, then define a default application state machine (and see the number
>>>> of state explode). Not having a default application state machine
>>>> defined negates completely the use of
>>>
>>> You have to see this in the light of the state machines in the scripting.
>>> These are by definition application specific state machines. So this made
>>> us realize that there is a difference between the lifecycle of a
>>> component (hooks) and of an application (states in scripts).
>>>
>>>>>>>> For instance, our motor controllers go into runtime_error when the
>>>>>>>> motors can't be driven (because of hardware protection mechanisms
>>>>>>>> for instance), but the electronics still *reads* the encoder + motor
>>>>>>>> data. I.e. they are read-only when in runtime error and can be used
>>>>>>>> if only reading is needed. fatal_error would be entered if we are
>>>>>>>> not able to talk to the electronics anymore.
>>>>>>>
>>>>>>> These are all application states and should not be implemented in the
>>>>>>> component lifecycle states.
>>>>>>
>>>>>> I don't follow you in this split between application and lifecycle
>>>>>> states. The component goes into various states because of the
>>>>>> application.
>>>>>
>>>>> Not from the viewpoint of the RTT or deployer. For example,
>>>>> configureHook is to check if input ports are connected or if required
>>>>> services are available. This is independent of a component requiring
>>>>> additional configuration of parameters.
>>>>
>>>> There is definitely a mismatch between our uses of states. We use
>>>> configureHook() to verify that the component can run. This means:
>>>> checking if devices are there, if properties are set to sane values and
>>>> so on. In effect, for device drivers, configureHook() is the place
>>>> where the device gets accessed and configured.
>>>
>>> I agree here. RTT 2.x actually supports this better, since you can also
>>> send 'commands' (in 2.x: send method calls) to a component before it is
>>> started. This means that if configuration does some blocking/asynchronous
>>> work or depends on a script to complete, this can all be done before
>>> start().
>>>
>>>>>>>> In a way, runtime_error is not very useful in this case ...
>>>>>>>>
>>>>>>>> To get back to the point: I think that runtime_error should be used
>>>>>>>> when the component is still able to provide a limited functionality,
>>>>>>>> fatalError being used when the component does not provide any
>>>>>>>> functionality anymore. Thus, the default exception handling of
>>>>>>>> updateHook() should IMO transition to FATAL_ERROR: I don't see how a
>>>>>>>> component can *know* what it is doing when an uncaught exception has
>>>>>>>> been raised by updateHook().
>>>>>>>>
>>>>>>>> Thoughts ?
>>>>>>>
>>>>>>> I think you identified a painpoint when trying to apply the component
>>>>>>> error states to application error states. It will never work. The
>>>>>>> idea for run time error is that any code in updateHook might throw,
>>>>>>> even if the user is unaware of this (during development for example).
>>>>>>> In critical components, you can put safe state code in errorHook(),
>>>>>>> for example, writing data to ports (which will never throw). If you
>>>>>>> did a bad job there, you go to fatal error. So all cases are covered. We
>>>>>>> don't want to go to fatal error immediately because this is an
>>>>>>> unrecoverable state, meaning, RTT judged that it can no longer
>>>>>>> execute that *instance* of a component.
>>>>>>>
>>>>>>> All the other stuff goes into updateHook and you need to define your
>>>>>>> own application states in there, using own operations or attributes
>>>>>>> or so.
>>>>>>>
>>>>>>> Makes sense ?
>>>>>>
>>>>>> I don't think it does ...
>>>>>>
>>>>>> First of all, most components will have nothing in errorHook(). Thus,
>>>>>> you will have a still-running component that has failed in a way that
>>>>>> was not predicted by the designer.
>>>>>
>>>>> We could install a default action in errorHook() ourselves.
>>>>
>>>> What could you meaningfully do in errorHook() that is completely generic
>>>> and will handle the underlying problem (updateHook() threw an exception,
>>>> the component is not functional anymore)?
>>>
>>> Yeah, I wasn't making sense here.
>>>
>>>>>> From a more conceptual point of view, I don't think that a component
>>>>>> should be allowed to run even if it had an unexpected exception. An
>>>>>> exception means "the internal state of the component is unspecified as
>>>>>> of now". The only thing that could make sense is to try to "emergency
>>>>>> stop" it, which -- I thought -- is what fatal error is there for.
>>>>>
>>>>> Agreed, but C++ exceptions do not cause an unspecified state. They
>>>>> unwind the stack and cleanup resources by calling destructors. It's not
>>>>> the same as a segfault. On the other hand, if your updateHook() has
>>>>> the scenario port1.write (exception) port2.write, only the first write
>>>>> will succeed, leading to an inconsistent output. So yes, a leaked
>>>>> exception is maybe more 'grave' than it is considered now.
>>>>
>>>> I don't agree there. An *uncaught* exception means that the component
>>>> designer was not *expecting* this particular error. If the code is
>>>> well-written (which is very unlikely), resources will be freed and so
>>>> on, but from a global logic point of view, the application will *not*
>>>> know where it was (hey, otherwise it would have caught this exception).
>>>
>>> So actually we agree, in the end, the application does not know. So this
>>> is a problematic state it ends up in.
>>
>> Yes, so it makes no sense to remain in a running state (which
>> RUNTIME_ERROR is).
>>
>>>>> Realize that the scripts can also leak exceptions, because they call
>>>>> user functions too. We also need to define a state when this happens, I
>>>>> don't want to go into 'unrecoverable error' when this happens. For a
>>>>> script, this would just cause the 'E'rror status of that script, while
>>>>> other scripts/updateHook keep running. That's also why runtime error
>>>>> only relates to an exception in updateHook(), while in runtime error,
>>>>> the scripts etc keep on executing (unless they reach the Error status
>>>>> too).
>>>>>
>>>>>> Second, the way you define fatal error states makes no sense to me.
>>>>>> The whole point of having a component model a-la RTT is that it should
>>>>>> be able to go back to a defined state (in my POV, through the
>>>>>> fatalError() cycle).
>>>>>>
>>>>>>
>>>>>>
>>>>>> I.e. a "completely unrecoverable" error should only be a diagnostics
>>>>>> estimation, for instance triggered because a component that went to an
>>>>>> unspecified fatalError() (fatal-error-that-we-don't-know) refused to
>>>>>> reconfigure and/or restart, thus showing that the component does not
>>>>>> know how to recover.
>>>>>
>>>>> There is another reason: what if the RTT figures out that it can no
>>>>> longer execute the component ? Fatal by definition means unrecoverable,
>>>>> so let's keep it with these semantics. There is no way to recover from
>>>>> the fatal error state in the current implementation. So fatal means:
>>>>> unload/kill me please.
>>>>>
>>>>> What you describe sounds to me like a run-time error (recoverable), you
>>>>> can still recover from it.
>>>>
>>>> Again, we have a different understanding of the state machine. Yes, it
>>>> can recover from it, but that will require a stop()/configure()/start()
>>>> cycle. How the state machine is interpreted by our supervision is:
>>>>
>>>> configure: the component verifies that everything it needs to be
>>>> functional is there. This means: checking property values, port
>>>> connections, accessing external processes/hardware when applicable. The
>>>> goal of that step is to make start() as simple as possible, and have it
>>>> most likely return true (i.e. have the longest and most likely to fail
>>>> steps that can be done in advance in configure()).
>>>
>>> OK.
>>>
>>>> start: start the component functionality. I.e. turn on data
>>>> acquisition for a driver (for instance).
>>>
>>> OK.
>>>
>>>> runtime_error: the component still provides a somewhat limited
>>>> functionality. The actual semantic of this is very application
>>>> dependent. In practice, we use orogen to specialize it into sub-states.
>>>
>>> Please define which triggers cause a transition to this state and from
>>> this state away + which hooks are called.
>>
>> Here's the thing: I'm not using scripts, and I completely do not intend
>> to use them. I do see their possible usefulness, it is just that I did
>> not (yet) encounter a situation where they were needed.
>>
>> So: triggers
>> * any application-defined situation which means that the component
>> provides a limited functionality.
>> * hooks: errorHook(), in the same situations as updateHook() (i.e.
>> activity triggers)
>> * getting out of there: component specific and component-decided
>>
>>>> stop: the component stops functioning, either because it reached its
>>>> stated goal (case for a planner), or because it has been requested
>>>
>>> OK.
>>>
>>>> fatal: the component cannot provide its stated functionality anymore,
>>>> and therefore stopped.
>>>
>>> so stopHook() is called automatically ? As above, define the triggers +
>>> possible next states ?
>>
>> Triggers: internal component diagnostics which detected a situation
>> representing a loss of functionality.
>> Possible next states: STOPPED or PRE_OPERATIONAL (depending on whether
>> the component needs a configure step).
>>
>>>> It should try to clean up as much as possible so
>>>> that a configure()/start() cycle has a chance to recover from the
>>>> problem. In the same way as for runtime error, orogen specializes it
>>>> into substates.
>>>>
>>>> This state machine allowed us to keep the updateHook() simple (since it
>>>> does not have to deal with initialization/recovery/...), and has most of
>>>> the information needed for supervision.
>>>>
>>>>> Maybe we should change it then to these semantics:
>>>>>
>>>>> Fatal error: can be entered from any state, triggered by RTT code or
>>>>> error recovery code. Causes stopHook()->cleanupHook() in transition (if
>>>>> necessary). Only step left is delete component.
>>>>>
>>>>> Runtime error: triggered by exceptions in updateHook() or by user in
>>>>> updateHook()/script.
>>>>
>>>> There is a funny thing: on the one hand you say "raised exceptions
>>>> should leave the application in a well-defined state" and on the other
>>>> "if an exception is raised in errorHook() we can't recover ever, we
>>>> actually need to destroy everything". This seems contradictory to me.
>>>
>>> But makes sense to me: if your error recovery throws, it went really bad,
>>> it means your last resort to pull things right did not succeed. There
>>> *is* no way out, this *is* fatal, literally as in 'terminal'. No
>>> transition succeeds.
>>
>> Here is my proposal:
>> * RUNTIME_ERROR remains an application state. The component announces
>> that it has limited functionality due to something non-nominal happening.
>> * unexpected exceptions in running states (RUNNING and RUNTIME_ERROR)
>> transition to fatal. This calls a fatalHook() which -- by default --
>> calls stopHook() and cleanupHook(). The component can also transition to
>> fatal to announce that something non-nominal happened that makes the
>> component's service not available.
>> * if fatalHook() and/or stopHook() raise, then we go into the
>> "unrecoverable fault" (we can't even go into FATAL ...)
>>
>>>> As to the interpretation of "fatal": it depends on the point of view.
>>>> From the point of view of the supervision, the "fatal" I described
>>>> above *is* fatal as the component does not provide the service it should
>>>> provide, and that happened because of something non-nominal.
>>>
>>> There must be a possibility to resolve these constraints we both have:
>>>
>>> 1. Define a transition/state when an exception is thrown in updateHook().
>>> The last thing we want to do is call it again, the component is possibly
>>> in a 'messy' state and it may throw 'ad infinitum'.
>>>
>>> 2. Define a transition/state when error recovery from point 1 failed as
>>> well.
>>>
>>> 3. Define a transition/state when the RTT can no longer execute a
>>> component. This might be the same as #2.
>>>
>>> From the RTT point of view, these are the things I *need* to define,
>>> without even caring for application-level supervision. Supervision is a
>>> fundamental part of every application (ie handle faults the component can
>>> not solve by itself) so I am not against adding support for that in
>>> the TaskContext. On the other hand, you and I are biased, and I wonder if
>>> it's not better to stick to the minimal.
>>
>> Yes, but in my opinion a basic application state machine *is* part of
>> the minimal.
>>
>>> A possible clean solution I see here is to define a supervision interface
>>> that defines these extra states that your supervision software requires.
>>> So your component inherits TaskContext + SuperviseInterface. where the
>>> latter sets up a 'supervise' provided interface with the methods/states
>>> you require.
>>>
>>> The supervisor component/user can then query each component if it has
>>> this interface and proceed from there if it has.
>>>
>>> This 'extendability' is actually one of the major issues I wanted to
>>> solve in 2.x. The component itself has only a minimal life cycle
>>> interface and the rest is set into 'plugins'/'interfaces'.
>>
>> While I see why you want that (you are the RTT-as-a-universal-framework
>> guy), I do see a lot of practical issues. The biggest issue being that
>> you will start to completely fragment what components can run on what
>> tools and make the whole "RTT ecosystem" (for lack of a better name) a
>> huge mess.
>>
>> We're having that discussion *because* I want to avoid this. I could
>> live on with Roby and oroGen: they already provide all the tools I need
>> to "work around" the state machine you define to get what I want. We're
>> having that discussion because I think it would be a very bad idea.
>>
>> So, yes, being able to extend is important. Now, I feel that the RTT
>> *must* provide a basic standard, supervise-able, interface to ALL RTT
>> components. And -- more importantly -- should make the component
>> developer aware that this interface is important.
>>

Default Component states (Was Default exception handling in RTT

Submitted by peter on Mon, 2010-05-03 14:44.

On Monday 03 May 2010 13:35:25 S Roderick wrote:
> On May 3, 2010, at 06:31 , Peter Soetens wrote:
> > To all lurking on this thread, could we have a 'voting' about this ?
>
> 1) Could someone draw up a state chart of what you guys propose? The long
> discussion below is hell of a fragmented ...

I created an Image here :
http://picasaweb.google.be/lh/photo/zQEXp8DeQiCLwMit_7TtnQ?feat=directlink

>
> It _looks_ like one or both of you are proposing
>
> exception/error in updateHook() = transition to RunTimeError state = call
> errorHook() exception/error in errorHook() = transition to FatalError =
> call stopHook() then cleanupHook().

Yes.

>
> And RunTimeError is a running state, so scripts and state machines are
> still running (if they aren't the cause of the throw), right? Does
> errorHook() get called as updateHook() would have (ie periodically if in a
> PeriodicActivity)?

Yes.

>
> How do you get out of errorHook()?

Calling 'recovered()'.

>
> IMHO the state after fatalError() should be STOPPED, not PRE_OPERATIONAL.
> The component is done.

The idea here is that after a fatal error, re-configuration is necessary.

>
> What happens if you exception/error in stopHook() or cleanupHook() that was
> called as part of a FatalError? Personally, I would think to catch it in
> FatalError and don't continue doing anything. I don't see the point of yet
> another worse-than-fatal-error state.

The difference we have is that I consider fatal as really fatal, ie no recovery
possible, whereas Sylvain allows the component to be 'reset' (unless you're in
the 'extra fatal' state).

>
> Sylvain, we do use scripts some, particularly where the corresponding C++
> code is too long, but they do have their issues that limit their
> usefulness.
>
> We do some potentially blocking driver setup in configure() also.
>
> Peter, what is the difference with v2 w.r.t. the thread being active during
> configureHook(), or being able to send commands prior to start()?

In 2.x, the thread is *always* running, even in PRE_OPERATIONAL. Replacing a
thread is possible in everything < running without any management code (ie
just call comp->setActivity( new Activity() ) in such a state). Every *Hook
function can be executed by the component's thread (ie the ExecutionEngine) or
by the caller, depending on a setting *within* the component.

>
> I don't like the idea of a SupervisionInterface. Yet another thing to
> learn, and I see little benefit. S

Ok.

Peter

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Tue, 2010-05-04 11:56.

On May 3, 2010, at 10:41 , Peter Soetens wrote:

> On Monday 03 May 2010 13:35:25 S Roderick wrote:
>> On May 3, 2010, at 06:31 , Peter Soetens wrote:
>>> To all lurking on this thread, could we have a 'voting' about this ?
>>
>> 1) Could someone draw up a state chart of what you guys propose? The long
>> discussion below is hell of a fragmented ...
>
> I created an Image here :
> http://picasaweb.google.be/lh/photo/zQEXp8DeQiCLwMit_7TtnQ?feat=directlink
>
>
>>
>> It _looks_ like one or both of you are proposing
>>
>> exception/error in updateHook() = transition to RunTimeError state = call
>> errorHook() exception/error in errorHook() = transition to FatalError =
>> call stopHook() then cleanupHook().
>
> Yes.
>
>>
>> And RunTimeError is a running state, so scripts and state machines are
>> still running (if they aren't the cause of the throw), right? Does
>> errorHook() get called as updateHook() would have (ie periodically if in a
>> PeriodicActivity)?
>
> Yes.
>
>>
>> How do you get out of errorHook()?
>
> Calling 'recovered()'.
>
>>
>> IMHO the state after fatalError() should be STOPPED, not PRE_OPERATIONAL.
>> The component is done.
>
> The idea is here that after a fatal error, re-configuration is necessary.
>
>>
>> What happens if you exception/error in stopHook() or cleanupHook() that was
>> called as part of a FatalError? Personally, I would think to catch it in
>> FatalError and don't continue doing anything. I don't see the point of yet
>> another worse-than-fatal-error state.
>
> The difference we have is that I consider fatal as really fatal, ie no recovery
> possible, Sylvain allows to 'reset' the component (unless you're in the 'extra
> fatal' state).

Thanks for drawing the diagrams up. My 2c worth ...

- Peter's version is far more understandable than Sylvain's
- The diagrams make it very clear the mix of lifecycle and application states. We personally encode our application states in FSMs explicitly.
- I see no reason for the extra "Really Fatal Error" state (and I agree in large part with Herman's comments regarding naming)
- A fatal error is a fatal error. No way out. End of story. Done. Finished.
- if we can recover from RunTimeError then we should be able to programmatically enter it (Sylvain's error() )
- it looks like adding Sylvain's resetError()/resetHook() would not be a huge change in the API. That would get him the ability to recover from FatalError.

I would advocate one of two approaches
a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
b) Sylvain's diagram without RunTimeError and without ExtraFatalError.

For a), after an exception you've no idea of the state of things, how on earth do you think you might recover. Just call it a day. RunTimeError is an application state, all the others are lifecycle states.

For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).

YMMV
S

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Wed, 2010-05-05 08:44.

Stephen Roderick wrote:
> I would advocate one of two approaches
> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
> b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>
> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover. Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>
> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>
I could live with (b) -- i.e. with the removal of RuntimeError -- but
*with* a recoverable FatalError (i.e. might be called Failed since Fatal
seems controversial)

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Wed, 2010-05-05 13:12.

[combining several concluding emails ...]

On May 5, 2010, at 04:41 , Sylvain Joyeux wrote:

> Stephen Roderick wrote:
>> I would advocate one of two approaches
>> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError. b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>>
>> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover. Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>>
>> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>>
> I could live with (b) -- i.e. with the removal of RuntimeError -- but *with* a recoverable FatalError (i.e. might be called Failed since Fatal seems controversial)

On May 5, 2010, at 05:17 , Herman Bruyninckx wrote:

> On Wed, 5 May 2010, Sylvain Joyeux wrote:
>
>> Herman Bruyninckx wrote:
>

>> The rationale is that, by doing that, you actually simplify the component
>> implementation greatly, leaving the complexity of handling problems to a
>> layer that is designed for that (the supervision layer).
>
> You are, conceptually, just moving the problem to another component. And
> sometimes that's the right thing to do, sometimes it is not. This trade-off
> depends completely on the application, but does not change the discussion
> we are having here: what should be part of the _default_ state machine in
> RTT...? And the only point I make in this discussion is to advocate _not_
> to use non-reusable/non-composable names such as "Fatal", "Unrecoverable",
> etc. More constructively, I would suggest to name such a state
> "Recoverable" or something (I don't like that name too much, frankly...),
> indicating that (i) it is halted because it encountered something that it
> could not deal with, _and_ (ii) it is still working as a piece of software
> and hence can communicate with others (its "supervisor" for example) to
> help discover the cause of the problem, _and_ (iii) it has still the
> capability to transition to one of its more "useful" states.
>
> In my suggestion, the "and"s are logical "and"s, so all three conditions
> have to be fulfilled before the component gets in that state. This, of
> course, leaves room for other states in which only one or two of these
> three conditions are fulfilled. For example, "Debuggable" would be (i) and
> (ii). Probably (ii) is not an extra 'state' since without (ii) the
> component cannot communicate with others anymore, de facto being useless to
> the system...

The above looks a lot like Peter's diagram's RuntimeError, utilising Sylvain's error() call. That maps i) to RunTimeError, and iii) to FatalError (with a rename here).

Proposal

1) use Sylvain's proposed state chart, with the following modifications
- remove RunTimeError state
- remove ExtraFatalError state
- add a programmatic method to get from any hook() to FatalError state [eg error() ]
** it's not clear on the diagram, but I believe Sylvain or Peter said that an exception in any hook() transitions to FatalError **

2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?

Anyone else have any input ... Peter ... ?
Stephen

Default Component states (Was Default exception handling in RTT

Submitted by gbiggs on Wed, 2010-05-05 14:40.

On 05/05/10 21:03, Stephen Roderick wrote:
> 2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?

"Recovering"?

It says what it's doing (or, at least, what it's attempting to do).

Geoff

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Wed, 2010-05-05 15:28.

On Wed, 5 May 2010, Geoffrey Biggs wrote:

> On 05/05/10 21:03, Stephen Roderick wrote:
>> 2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?
>
> "Recovering"?
>
> It says what it's doing (or, at least, what it's attempting to do).
>
This is the best trade-off between brevity and semantic clarity that I have
read until now. Thanks!
Good to have native English speakers on the mailinglist!!!! :-)

Herman

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Wed, 2010-05-05 15:36.

Herman Bruyninckx wrote:
> On Wed, 5 May 2010, Geoffrey Biggs wrote:
>
>
>> On 05/05/10 21:03, Stephen Roderick wrote:
>>
>>> 2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?
>>>
>> "Recovering"?
>>
>> It says what it's doing (or, at least, what it's attempting to do).
>>
>>
> This is the best trade-off between brevity and semantic clarity that I have
> read until now. Thanks!
> Good to have native English speakers on the mailinglist!!!! :-)
>
I'm sorry to not share that feeling.

"Recovering" sounds to me that the component is actively doing something
to recover, and will *by itself* come back to running. Not really what
(I think) the state is supposed to convey.

Default Component states (Was Default exception handling in RTT

Submitted by peter on Thu, 2010-05-06 13:24.

On Wednesday 05 May 2010 17:33:18 Sylvain Joyeux wrote:
> Herman Bruyninckx wrote:
> > On Wed, 5 May 2010, Geoffrey Biggs wrote:
> >> On 05/05/10 21:03, Stephen Roderick wrote:
> >>> 2) rename FatalError according to some of Herman's comments. Taking i),
> >>> ii) and iii) above, the name might be
> >>> HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError,
> >>> ...?
> >>
> >> "Recovering"?
> >>
> >> It says what it's doing (or, at least, what it's attempting to do).
> >
> > This is the best trade-off between brevity and semantic clarity that I
> > have read until now. Thanks!
> > Good to have native English speakers on the mailinglist!!!! :-)
>
> I'm sorry to not share that feeling.
>
> "Recovering" sounds to me that the component is actively doing something
> to recover, and will *by itself* come back to running. Not really what
> (I think) the state is supposed to convey.

I follow Sylvain here (watch me switch camps :-) ). Just Error does not mean
it's unrecoverable... When thinking of names, I try to imagine a discussion
between two engineers. If one says: "Hey, this component is 'Recovering'", and
the other says: "Try a recover() !" that sounds too weird to me. I had
imagined: "Hey, this component is in 'Error'", and the other: "Try a
recover()!"

Then there's the point of Markus that if you add application states to the
life cycle state machine, any added application state machine needs to know if
the lifecycle is in run-time error or just running. I don't think this is true
in all/most cases. I see the state machines as being additive: The life cycle
state machine might be in recoverable Error, while an application state
machine (running in a plugin/script) is still running in the 'Happy' state.
That's the component writer's choice, so there's nothing I can do about that
anyway, I won't try to control that.

In my opinion, RunTimeError relates to the fact that updateHook() can't do its
nominal work *and* that this is communicated to everyone interested
(otherwise, updateHook() could silently fail). A supervisor will need
detailed knowledge/inspection of this component to know why exactly it failed.
That's certainly application/component specific. However, it does not mean that
another application-specific state machine can no longer do its work;
they run in parallel in the EE anyway. To make this very clear: even in the
'PreOperational' state, a state machine can be running in the EE (doing all
the stuff to make it 'Ready'). So it's no different for the RunTimeError state.

Even with respect to fatal errors/leaked exceptions, this can be true. The EE
has no way of knowing which state machines to stop or keep running. Either
these machines will be stopped by the user from fatalHook(), or they will
eventually throw themselves, or they will be restarted. All implementation
specific.

I fear the state transitions that Sylvain showed in the ODS document are not
very clear. I also wonder what useful code you could put into emergencyHook(),
it already notes that user's recovery code failed severely... how could it do
better ? Also, it does not show how to get out of FatalError and RunTimeError
(recover() ?)

I'll try to summarize what I think is (acceptable) on the table right now. Don't
shoot me for naming issues :-)

0. PreOperational/Stopped/Running stays as is in 1.x

1. Any leaked exception from any Hook() function is seen as a
language/programmatic error and leads to the 'Error' (or 'Exception') state.
This state is still recoverable by supervision, other 'stuff' in the EE just
keeps on running, unless told to do otherwise by user code.

Entry: C++ exception
Entry Hook: exceptionHook() : calls stopHook()/cleanupHook()
Exit: recover() leads to 'PreOperational'
Exit Hook: none. Use configure() from PreOperational to proceed.

2. If any 'Error' transition throws (ie to/from 'Error'), we go to 'Fatal' (or
call it 'Emergency') and just stop everything, also the EE.

Entry: C++ exception in exceptionHook() or during recover()
Entry Hook: none
Exit: impossible (user/supervision calls component's destructor).

3. The 'RunTimeError' (or 'Degraded') state is an application specific state
that announces potential application specific problems to supervision. Maybe it
requires further introspection, maybe not and it is just restarted. This state
is only entered/left by user code or supervision calls.

Entry: error()
Entry Hook: none - errorHook() is called instead of updateHook()
Exit: recover()
Exit Hook: none - updateHook() is called instead of errorHook().
Exit: stop()
Exit Hook: stopHook() - just like leaving 'Running'.

I see point 3 as the only 'optional' state; the others are mandatory. I
would add point 3 nevertheless, because it's very easy to extend states in the
scripting languages, it's not so easy in C++, so I would provide a 'basic'
state, which RunTimeError is.
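
To make points 0 to 3 above easier to compare with the diagrams, here is a minimal, self-contained C++ sketch of the transitions as summarized. It is only an illustration under assumptions: the class LifeCycle, the State enumerators, step() and the use of std::function for the hooks are hypothetical names chosen for this example, not the RTT TaskContext API, and the exact conditions under which stopHook()/cleanupHook() get called from the exception handler are a guess.

```cpp
#include <functional>

// All names here are hypothetical; this is not the RTT TaskContext API.
enum class State { PreOperational, Stopped, Running, RunTimeError, Exception, Fatal };

class LifeCycle {
public:
    // User-supplied hooks; any of them may throw. Defaults do nothing.
    std::function<void()> updateHook = [] {};
    std::function<void()> errorHook = [] {};
    std::function<void()> stopHook = [] {};
    std::function<void()> cleanupHook = [] {};

    State state() const { return state_; }

    // Point 0: nominal transitions.
    bool configure() { return change(State::PreOperational, State::Stopped); }
    bool start() { return change(State::Stopped, State::Running); }
    bool stop() {
        if (!running()) return false;
        if (!guarded(stopHook)) return false;
        state_ = State::Stopped;
        return true;
    }

    // Point 3: application-level state, entered and left by user code only.
    bool error() { return change(State::Running, State::RunTimeError); }

    // recover() leaves RunTimeError (back to Running) or Exception
    // (to PreOperational, from where configure() must be called again).
    bool recover() {
        if (state_ == State::RunTimeError) { state_ = State::Running; return true; }
        if (state_ == State::Exception) { state_ = State::PreOperational; return true; }
        return false;
    }

    // One execution cycle: errorHook() replaces updateHook() in RunTimeError.
    void step() {
        if (state_ == State::Running) guarded(updateHook);
        else if (state_ == State::RunTimeError) guarded(errorHook);
    }

private:
    bool running() const {
        return state_ == State::Running || state_ == State::RunTimeError;
    }

    bool change(State from, State to) {
        if (state_ != from) return false;
        state_ = to;
        return true;
    }

    // Point 1: an exception leaking from a hook leads to the Exception state.
    bool guarded(const std::function<void()>& hook) {
        try {
            hook();
            return true;
        } catch (...) {
            exceptionHook();
            return false;
        }
    }

    // Point 2: if the cleanup itself throws, the component ends up in Fatal.
    void exceptionHook() {
        try {
            if (running()) stopHook();
            if (state_ != State::PreOperational) cleanupHook();
            state_ = State::Exception;
        } catch (...) {
            state_ = State::Fatal;
        }
    }

    State state_ = State::PreOperational;
};
```

In this sketch a throwing updateHook() takes a running component to Exception, recover() then brings it back to PreOperational, and a stopHook() that throws while the exception is being handled ends in Fatal, from which no call returns true.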

Getting there ?

Peter

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Thu, 2010-05-06 13:36.

On May 6, 2010, at 09:21 , Peter Soetens wrote:

> On Wednesday 05 May 2010 17:33:18 Sylvain Joyeux wrote:
>> Herman Bruyninckx wrote:
>>> On Wed, 5 May 2010, Geoffrey Biggs wrote:
>>>> On 05/05/10 21:03, Stephen Roderick wrote:
>>>>> 2) rename FatalError according to some of Herman's comments. Taking i),
>>>>> ii) and iii) above, the name might be
>>>>> HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError,
>>>>> ...?
>>>>
>>>> "Recovering"?
>>>>
>>>> It says what it's doing (or, at least, what it's attempting to do).
>>>
>>> This is the best trade-off between brevity and semantic clarity that I
>>> have read until now. Thanks!
>>> Good to have native English speakers on the mailinglist!!!! :-)
>>
>> I'm sorry to not share that feeling.
>>
>> "Recovering" sounds to me that the component is actively doing something
>> to recover, and will *by itself* come back to running. Not really what
>> (I think) the state is supposed to convey.
>
> I follow Sylvain here (watch me switch camps :-) ). Just Error does not mean
> it's unrecoverable... When thinking of names, I try to imagine a discussion
> between two engineers. If one says: "Hey, this component is 'Recovering'", and
> the other says: "Try a recover() !" that sounds too weird to me. I had
> imagined: "Hey, this component is in 'Error'", and the other: "Try a
> recover()!"
>
> Then there's the point of Markus that if you add application states to the
> life cycle state machine, any added application state machine needs to know if
> the lifecycle is in run-time error or just running. I don't think this is true
> in all/most cases. I see the state machines as being additive: The life cycle
> state machine might be in recoverable Error, while an application state
> machine (running in a plugin/script) is still running in the 'Happy' state.
> That's the component writer's choice, so there's nothing I can do about that
> anyway, I won't try to control that.
>
> In my opinion, RunTimeError relates to the fact that updateHook() can't do its
> nominal work *and* that this is communicated to everyone interested
> (otherwise, updateHook() could silently fail). A supervisor will need
> detailed knowledge/inspection of this component to know why exactly it failed.
> That's certainly application/component specific. However, it does not mean that
> another application-specific state machine can no longer do its work;
> they run in parallel in the EE anyway. To make this very clear: even in the
> 'PreOperational' state, a state machine can be running in the EE (doing all
> the stuff to make it 'Ready'). So it's no different for the RunTimeError state.
>
> Even with respect to fatal errors/leaked exceptions, this can be true. The EE
> has no way of knowing which state machines to stop or keep running. Either
> these machines will be stopped by the user from fatalHook(), or they will
> eventually throw themselves, or they will be restarted. All implementation
> specific.
>
> I fear the state transitions that Sylvain showed in the ODS document are not
> very clear. I also wonder what useful code you could put into emergencyHook(),
> it already notes that user's recovery code failed severely... how could it do
> better ? Also, it does not show how to get out of FatalError and RunTimeError
> (recover() ?)
>
> I'll try to summarize what I think is (acceptable) on the table right now. Don't
> shoot me for naming issues :-)
>
> 0. PreOperational/Stopped/Running stays as is in 1.x
>
> 1. Any leaked exception from any Hook() function is seen as a
> language/programmatic error and leads to the 'Error' (or 'Exception') state.
> This state is still recoverable by supervision, other 'stuff' in the EE just
> keeps on running, unless told to do otherwise by user code.
>
> Entry: C++ exception
> Entry Hook: exceptionHook() : calls stopHook()/cleanupHook()
> Exit: recover() leads to 'PreOperational'
> Exit Hook: none. Use configure() from PreOperational to proceed.
>
> 2. If any 'Error' transition throws (ie to/from 'Error'), we go to 'Fatal' (or
> call it 'Emergency') and just stop everything, also the EE.
>
> Entry: C++ exception in exceptionHook() or during recover()
> Entry Hook: none
> Exit: impossible (user/supervision calls component's destructor).
>
> 3. The 'RunTimeError' (or 'Degraded') state is an application specific state
> that announces potential application specific problems to supervision. Maybe it
> requires further introspection, maybe not and it is just restarted. This state
> is only entered/left by user code or supervision calls.
>
> Entry: error()
> Entry Hook: none - errorHook() is called instead of updateHook()
> Exit: recover()
> Exit Hook: none - updateHook() is called instead of errorHook().
> Exit: stop()
> Exit Hook: stopHook() - just like leaving 'Running'.
>
> I see point 3. as the only 'optional' states, the others are mandatory. I
> would add point 3 nevertheless, because it's very easy to extend states in the
> scripting languages, it's not so easy in C++, so I would provide a 'basic'
> state, which RunTimeError is.
>
> Getting there ?

Hate to say it mate, but the above just muddied the waters for me. Can you make (yet another) diagram for this? These ones really helped IMHO

> I created an Image here :
> http://picasaweb.google.be/lh/photo/zQEXp8DeQiCLwMit_7TtnQ?feat=directlink

Stephen

Default Component states (Was Default exception handling in RTT

Submitted by peter on Thu, 2010-05-06 14:32.

On Thursday 06 May 2010 15:34:45 S Roderick wrote:
> >
> > Getting there ?
>
> Hate to say it mate, but the above just muddied the waters for me. Can you
> make (yet another) diagram for this? These ones really helped IMHO
>

http://picasaweb.google.be/lh/photo/KwEzui4b4Mg0BKgtcESVag?feat=directlink

Peter

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Thu, 2010-05-06 14:52.

On May 6, 2010, at 10:30 , Peter Soetens wrote:

> On Thursday 06 May 2010 15:34:45 S Roderick wrote:
>>>
>>> Getting there ?
>>
>> Hate to say it mate, but the above just muddied the waters for me. Can you
>> make (yet another) diagram for this? These ones really helped IMHO
>>
>
> http://picasaweb.google.be/lh/photo/KwEzui4b4Mg0BKgtcESVag?feat=directlink
>
> Peter

Many thanks, much easier.

QUESTIONS
Does an exception from Stopped cause exceptionHook() to call stopHook() on the way? Similarly for an exception from PreOperational causing calls to stopHook() and cleanupHook(). In both cases this is redundant, which may be a reasonable compromise, but we need to ensure this is well documented otherwise badly written stopHook/cleanupHook may assume they are only called after start/configure has occurred, respectively.

COMMENTS
I see asymmetry in that recover() can get you out of RunTimeError and Error, but error() can only get you to RunTimeError. Also, you can programmatically get to FatalError by fatal(). I would argue for something to programmatically get you to Error state.

To beat Herman to the punch, you have an error() event causing transition to a RunTimeError state, but there is also an Error state. Also, errorHook() is called by RunTimeError, while exceptionHook() is called by Error. Bad Naming Choices. I have no real problem with the structure, but this aspect is confusing. Really confusing for newbies ...

SUGGESTIONS
RunTimeError -> Warning? Degraded?
Then use warningHook() instead of errorHook()
Change error() to warning()

Make error() transition to Error state (or whatever we decide to call it), and then that uses errorHook() instead of exceptionHook().

I'm just getting at consistency here. The overall structure seems like a reasonable compromise of what everyone desires.
Stephen

Default Component states (Was Default exception handling in RTT

Submitted by peter on Thu, 2010-05-06 15:08.

On Thursday 06 May 2010 16:48:01 Stephen Roderick wrote:
> On May 6, 2010, at 10:30 , Peter Soetens wrote:
> > On Thursday 06 May 2010 15:34:45 S Roderick wrote:
> >>> Getting there ?
> >>
> >> Hate to say it mate, but the above just muddied the waters for me. Can
> >> you make (yet another) diagram for this? These ones really helped IMHO
> >
> > http://picasaweb.google.be/lh/photo/KwEzui4b4Mg0BKgtcESVag?feat=directlink
> >
> > Peter
>
> Many thanks, much easier.
>
> QUESTIONS
> Does an exception from Stopped cause exceptionHook() to call stopHook() on
> the way? Similarly for an exception from PreOperational causing calls to
> stopHook() and cleanupHook(). In both cases this is redundant, which may
> be a reasonable compromise, but we need to ensure this is well documented
> otherwise badly written stopHook/cleanupHook may assume they are only
> called after start/configure has occurred, respectively.

The calls are only made if they would have been made in 'normal' transitions
to PreOperational. So exceptionHook's default would look like:

// when exceptionHook is entered, the state is set to the last state reached.
void TaskContext::exceptionHook() {
  if (mTaskState >= Running)
    this->stopHook();
  if (mTaskState >= Stopped)
    this->cleanupHook();
}
// after exceptionHook, mTaskState is changed to Error
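
To make the answer to the question concrete, here is a minimal self-contained sketch of that default. MockTask and handleException() are made-up names for illustration only, not the RTT API, and the enum ordering (PreOperational < Stopped < Running) is an assumption that the comparisons rely on.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the task states. The ordering matters because
// the default exceptionHook() compares against it.
enum TaskState { PreOperational, Stopped, Running, Error };

// Illustration-only mock of a component, NOT the real RTT::TaskContext.
class MockTask {
public:
    explicit MockTask(TaskState last) : mTaskState(last) {}

    void stopHook()    { calls.push_back("stopHook"); }
    void cleanupHook() { calls.push_back("cleanupHook"); }

    // On entry, mTaskState still holds the last state reached.
    void exceptionHook() {
        if (mTaskState >= Running)
            stopHook();
        if (mTaskState >= Stopped)
            cleanupHook();
    }

    // Assumed engine behaviour when a user hook leaks an exception.
    void handleException() {
        assert(mTaskState != Error);  // Error is not a "last state reached"
        exceptionHook();
        mTaskState = Error;           // state changes only after the hook ran
    }

    TaskState mTaskState;
    std::vector<std::string> calls;  // records which hooks were invoked
};
```

With this sketch, an exception from Running records stopHook then cleanupHook, an exception from Stopped records only cleanupHook, and an exception from PreOperational records nothing.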

>
> COMMENTS
> I see asymmetry in that recover() can get you out of RunTimeError and
> Error, but error() can only get you to RunTimeError. Also, you can
> programmatically get to FatalError by fatal(). I would argue for something
> to programmatically get you to Error state.

  throw int();

? :-)

The reason I didn't put it there is that I didn't know yet what to call
it... use:
* error() -> RunTimeError ;
* exception() -> ExceptionError ;
* fatal() -> FatalError
?
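
As a rough sketch of that naming proposal, with one programmatic entry point per error state: the class name ErrorStates and the rule that FatalError is terminal are assumptions made for illustration, and the state names are only the candidates listed above, not a settled API.

```cpp
// Candidate state names from the proposal above, not the final RTT API.
enum class State { Running, RunTimeError, ExceptionError, FatalError };

// Illustration-only component: each method enters exactly one error state.
class ErrorStates {
public:
    State state() const { return mState; }

    void error()     { enter(State::RunTimeError); }
    void exception() { enter(State::ExceptionError); }
    void fatal()     { enter(State::FatalError); }

private:
    // Assumption: FatalError is terminal, so later requests are ignored.
    void enter(State next) {
        if (mState != State::FatalError)
            mState = next;
    }

    State mState = State::Running;
};
```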

>
> To beat Herman to the punch, you have an error() event causing transition
> to a RunTimeError state, but there is also an Error state. Also,
> errorHook() is called by RunTimeError, while exceptionHook() is called by
> Error. Bad Naming Choices. I have no real problem with the structure, but
> this aspect is confusing. Really confusing for newbies ...

Yeah, tell me about it, I'm scratching my head all the time !

>
> SUGGESTIONS
> RunTimeError -> Warning? Degraded?
> Then use warningHook() instead of errorHook()
> Change error() to warning()
>
> Make error() transition to Error state (or whatever we decide to call it),
> and then that uses errorHook() instead of exceptionHook().

So that's the opposite direction from what I had in mind above. I'm still
keeping the 1.x hook naming in mind, in order to make the transition a little
smoother... Remember that 1.x had a warning() method, which we have now removed.

>
> I'm just getting at consistency here. The overall structure seems like a
> reasonable compromise of what everyone desires. Stephen
>

Me too.

Peter.

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Wed, 2010-05-05 16:28.

On Wed, 5 May 2010, Sylvain Joyeux wrote:

> Herman Bruyninckx wrote:
>> On Wed, 5 May 2010, Geoffrey Biggs wrote:
>>
>>> On 05/05/10 21:03, Stephen Roderick wrote:
>>>
>>>> 2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?
>>>>
>>> "Recovering"?
>>>
>>> It says what it's doing (or, at least, what it's attempting to do).
>>>
>>>
>> This is the best trade-off between brevity and semantic clarity that I have
>> read until now. Thanks!
>> Good to have native English speakers on the mailing list!!!! :-)
>>
> I'm sorry to not share that feeling.
>
> "Recovering" sounds to me that the component is actively doing something
> to recover, and will *by itself* come back to running. Not really what
> (I think) the state is supposed to convey.

I agree with this. But the name is the best _trade-off_ thus far :-)

Herman

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Wed, 2010-05-05 15:44.

On May 5, 2010, at 11:33 , Sylvain Joyeux wrote:

> Herman Bruyninckx wrote:
>> On Wed, 5 May 2010, Geoffrey Biggs wrote:
>>
>>
>>> On 05/05/10 21:03, Stephen Roderick wrote:
>>>
>>>> 2) rename FatalError according to some of Herman's comments. Taking i), ii) and iii) above, the name might be HaltedButStillWorkingAndTransitionable ... :-) But seriously, InError, ...?
>>>>
>>> "Recovering"?
>>>
>>> It says what it's doing (or, at least, what it's attempting to do).
>>>
>>>
>> This is the best trade-off between brevity and semantic clarity that I have
>> read until now. Thanks!
>> Good to have native English speakers on the mailing list!!!! :-)
>>
> I'm sorry to not share that feeling.
>
> "Recovering" sounds to me that the component is actively doing something
> to recover, and will *by itself* come back to running. Not really what
> (I think) the state is supposed to convey.

For the most part, +1 ... it's the right kind of word, but this state is more an "ability to recover", or is "in an error". But I'd compromise on the above too ...
Stephen

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Tue, 2010-05-04 16:28.

Stephen Roderick wrote:
> Thanks for drawing the diagrams up. My 2c worth ...
>
> - Peter's version is far more understandable than Sylvain's
>
> - The diagrams make it very clear the mix of lifecycle and application states. We personally encode our application states in FSMs explicitly.
> - I see no reason for the extra "Really Fatal Error" state (and I agree in large part with Herman's comments regarding naming)
> - A fatal error is a fatal error. No way out. End of story. Done. Finished.
>
Fine, call it 'Failure' or simply 'Error' (as opposed to "RuntimeError")
> - if we can recover from RunTimeError then we should be able to programmatically enter it (Sylvain's error() )
> - it looks like adding Sylvain's resetError()/resetHook() would not be a huge change in the API. That would get him the ability to recover from FatalError.
>
> I would advocate one of two approaches
> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
> b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>
> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover? Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>
> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>
If they are living their own life, applications don't need to export any
states, as they are the only ones interested in them. The need I
personally see for RuntimeError is to provide a default "degraded"
state for applications that don't want to use events and FSM (none of
ours do), and a default means of supervision. What I believe is that, if
no default runtime error is provided, then nobody will ever export
"degraded" states by default.

Moreover, I believe that in most cases, a cleanup procedure can actually
recover from an ill-known state. In my POV (and in our components),
cleanup() does a *lot* of cleanup (not far away from delete/new), i.e.
it *is* a perfectly possible sequence to try and recover from unhandled
exceptions. It might fail (i.e. throw), but in most cases it will work.

Default Component states (Was Default exception handling in RTT

Submitted by snrkiwi on Tue, 2010-05-04 17:40.

On May 4, 2010, at 12:25 , Sylvain Joyeux wrote:

> Stephen Roderick wrote:
>> Thanks for drawing the diagrams up. My 2c worth ...
>>
>> - Peter's version is far more understandable than Sylvain's
>> - The diagrams make it very clear the mix of lifecycle and application states. We personally encode our application states in FSMs explicitly.
>> - I see no reason for the extra "Really Fatal Error" state (and I agree in large part with Herman's comments regarding naming)
>> - A fatal error is a fatal error. No way out. End of story. Done. Finished.
>>
> Fine, call it 'Failure' or simply 'Error' (as opposed to "RuntimeError")

Works for me. Semantically, fatal == terminal == end.

>> - if we can recover from RunTimeError then we should be able to programmatically enter it (Sylvain's error() )
>> - it looks like adding Sylvain's resetError()/resetHook() would not be a huge change in the API. That would get him the ability to recover from FatalError.
>>
>> I would advocate one of two approaches
>> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
>> b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>>
>> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover? Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>>
>> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>>
> If they are living their own life, applications don't need to export any states, as they are the only ones interested in them. The need I personally see for RuntimeError is to provide a default "degraded" state for applications that don't want to use events and FSM (none of ours do), and a default means of supervision. What I believe is that, if no default runtime error is provided, then nobody will ever export "degraded" states by default.

What do you mean precisely by "export any states"? Is that announcing state changes to a supervisor?

It really does sound like you want RTT to provide a default try/catch block around all user hook code, instead of making the user do it themselves. Is that not true?

Is there a compromise here where you get what you want, but the rest of us don't have to live with additional complexity in a component's states? Alternatively, can you get a compromise where you aren't forced to use FSMs to do what you want?

What would it take in RTT v2 to allow the lifecycle/component-states to be extendable? Is it possible to support both Peter's model and Sylvain's model?

> Moreover, I believe that in most cases, a cleanup procedure can actually recover from an ill-known state. In my POV (and in our components), cleanup() does a *lot* of cleanup (not far away from delete/new), i.e. it *is* a perfectly possible sequence to try and recover from unhandled exceptions. It might fail (i.e. throw), but in most cases it will work.

Fair enough.
Stephen

Default Component states (Was Default exception handling in RTT

Submitted by Sylvain Joyeux on Wed, 2010-05-05 08:52.

Stephen Roderick wrote:
> On May 4, 2010, at 12:25 , Sylvain Joyeux wrote:
>
>
>> Stephen Roderick wrote:
>>
>>> Thanks for drawing the diagrams up. My 2c worth ...
>>>
>>> - Peter's version is far more understandable than Sylvain's
>>> - The diagrams make it very clear the mix of lifecycle and application states. We personally encode our application states in FSMs explicitly.
>>> - I see no reason for the extra "Really Fatal Error" state (and I agree in large part with Herman's comments regarding naming)
>>> - A fatal error is a fatal error. No way out. End of story. Done. Finished.
>>>
>>>
>> Fine, call it 'Failure' or simply 'Error' (as opposed to "RuntimeError")
>>
>
> Works for me. Semantically, fatal == terminal == end.
>
For me too. End of the component's execution. I.e. fatal to its
functionality. Does not mean that you can't restart it.
>>> - if we can recover from RunTimeError then we should be able to programmatically enter it (Sylvain's error() )
>>> - it looks like adding Sylvain's resetError()/resetHook() would not be a huge change in the API. That would get him the ability to recover from FatalError.
>>>
>>> I would advocate one of two approaches
>>> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
>>> b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>>>
>>> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover? Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>>>
>>> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>>>
>>>
>> If they are living their own life, applications don't need to export any states, as they are the only ones interested in them. The need I personally see for RuntimeError is to provide a default "degraded" state for applications that don't want to use events and FSM (none of ours do), and a default means of supervision. What I believe is that, if no default runtime error is provided, then nobody will ever export "degraded" states by default.
>>
>
> What do you mean precisely by "export any states"? Is that announcing state changes to a supervisor?
>
Yes.
> It really does sound like you want RTT to provide a default try/catch block around all user hook code, instead of making the user do it themselves. Is that not true?
>
Yes, it is true. If there is none, then a component's hook leaking an
exception will crash the whole process, including the tasks that are
deployed in the same process and have nothing to do
with it.
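
A minimal sketch of what such a default guard could look like, assuming a hypothetical engine loop: GuardedTask and stepAll() are invented for this illustration and are not RTT names. The point is only that one task's leaked exception is contained while the other tasks in the process keep running.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Illustration-only task with a single user hook.
struct GuardedTask {
    std::function<void()> updateHook;
    bool failed = false;  // set once the hook has leaked an exception
    int cycles = 0;       // number of updateHook() calls that returned normally
};

// Hypothetical engine step: every hook call is wrapped in try/catch so that
// an exception marks only the offending task as failed.
void stepAll(std::vector<GuardedTask>& tasks) {
    for (GuardedTask& task : tasks) {
        if (task.failed)
            continue;  // a failed task is no longer executed
        try {
            task.updateHook();
            ++task.cycles;
        } catch (...) {
            task.failed = true;  // here the real engine would change state
        }
    }
}
```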
> Is there a compromise here where you get what you want, but the rest of us don't have to live with additional complexity in a component's states? Alternatively, can you get a compromise where you aren't forced to use FSMs to do what you want?
>
Well. My original proposal (the one I am using right now on "my" branch
of RTT 1.x) is the original 1.x state machine. The modifications were
made by Peter.
> What would it take in RTT v2 to allow the lifecycle/component-states to be extendable? Is it possible to support both Peter's model and Sylvain's model?
>
I don't think that it is desirable to add yet another extension layer
just to have a default state machine that can represent/supervise
simple components. I do believe that for more complex cases it would be
nice to have one (and we should probably discuss such a
service/extension during the workshop), but I want my users at DFKI to
use a default, already-there, simple API to represent the state of their
components.
>> Moreover, I believe that in most cases, a cleanup procedure can actually recover from an ill-known state. In my POV (and in our components), cleanup() does a *lot* of cleanup (not far away from delete/new), i.e. it *is* a perfectly possible sequence to try and recover from unhandled exceptions. It might fail (i.e. throw), but in most cases it will work.
>>
>
> Fair enough.
> Stephen
>

Default Component states (Was Default exception handling in RTT

Submitted by bruyninc on Wed, 2010-05-05 05:48.

On Tue, 4 May 2010, Stephen Roderick wrote:

> On May 4, 2010, at 12:25 , Sylvain Joyeux wrote:
>
>> Stephen Roderick wrote:
>>> Thanks for drawing the diagrams up. My 2c worth ...
>>>
>>> - Peter's version is far more understandable than Sylvain's
>>> - The diagrams make it very clear the mix of lifecycle and application states. We personally encode our application states in FSMs explicitly.
>>> - I see no reason for the extra "Really Fatal Error" state (and I agree in large part with Herman's comments regarding naming)
>>> - A fatal error is a fatal error. No way out. End of story. Done. Finished.
>>>
>> Fine, call it 'Failure' or simply 'Error' (as opposed to "RuntimeError")
>
> Works for me. Semantically, fatal == terminal == end.

Doesn't work for me! _If_ your component is still active enough that it
can perform transitions to different states, it can still do other things
too! Most often, that means that its _useful_ activity is _temporarily_
hindered by the non-availability of some externally controlled resource
(communication, motors, whatever), and that this component could/should be
actively waiting for the temporary problem to be solved. Hence, it should
get a name that reflects this situation. If something is really fatal for a
component, the component will not know about that, because it will have
lost all its "consciousness" :-)

Herman

>>> - if we can recover from RunTimeError then we should be able to programmatically enter it (Sylvain's error() )
>>> - it looks like adding Sylvain's resetError()/resetHook() would not be a huge change in the API. That would get him the ability to recover from FatalError.
>>>
>>> I would advocate one of two approaches
>>> a) Peter's diagram minus RunTimeError. An exception during Running goes to FatalError.
>>> b) Sylvain's diagram without RunTimeError and without ExtraFatalError.
>>>
>>> For a), after an exception you've no idea of the state of things, how on earth do you think you might recover? Just call it a day. RunTimeError is an application state, all the others are lifecycle states.
>>>
>>> For b), RunTimeError seriously strikes me as application specific. Use an FSM that has error() and recovered() events instead. Again, I see no need for ExtraFatalError. And I still argue that exceptions pretty much anywhere mean you can't really recover, as with a).
>>>
>> If they are living their own life, applications don't need to export any states, as they are the only ones interested in them. The need I personally see for RuntimeError is to provide a default "degraded" state for applications that don't want to use events and FSM (none of ours do), and a default means of supervision. What I believe is that, if no default runtime error is provided, then nobody will ever export "degraded" states by default.
>
> What do you mean precisely by "export any states"? Is that announcing state changes to a supervisor?
>
> It really does sound like you want RTT to provide a default try/catch block around all user hook code, instead of making the user do it themselves. Is that not true?
>
> Is there a compromise here where you get what you want, but the rest of us don't have to live with additional complexity in a component's states? Alternatively, can you get a compromise where you aren't forced to use FSMs to do what you want?
>
> What would it take in RTT v2 to allow the lifecycle/component-states to be extendable? Is it possible to support both Peter's model and Sylvain's model?
>
>> Moreover, I believe that in most cases, a cleanup procedure can actually recover from an ill-known state. In my POV (and in our components), cleanup() does a *lot* of cleanup (not far away from delete/new), i.e. it *is* a perfectly possible sequence to try and recover from unhandled exceptions. It might fail (i.e. throw), but in most cases it will work.
>
> Fair enough.
> Stephen
>
>