What are your error events?

Dear List,

One important step towards writing reusable basic _and_ coordination
components is a unified, robotics-specific set of errors. This is
because:

- it allows reacting to groups or classes of events: "if any sensor
reports a hardware error, disable it". Less code.

- As the same error (semantically) will be named identically for
different components, generic error handling behaviours can evolve:
dealing with broken hardware, switching controllers, etc.

- Last but not least, it eases understanding if the same "thing" is
not named differently each time.
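To make the first point concrete, here is a minimal Python sketch, assuming error events are modelled as a class hierarchy and delivered through a simple in-process dispatcher. All class names (`ErrorEvent`, `SensorHardwareError`, `Dispatcher`, ...) are hypothetical and not part of OCL or RTT; the point is that a single rule subscribed to a base class covers every component that raises a subclass of it.

```python
# Hypothetical error hierarchy: subclassing encodes "is a kind of".
class ErrorEvent:
    """Root of a hypothetical robotics error-event hierarchy."""
    def __init__(self, source: str):
        self.source = source  # name of the component that raised the event

class HardwareError(ErrorEvent):
    pass

class SensorHardwareError(HardwareError):
    pass

class OutOfRange(ErrorEvent):
    pass

class Dispatcher:
    """Delivers an event to every handler subscribed to its class or any base class."""
    def __init__(self):
        self._handlers = []  # list of (event class, callback) pairs

    def subscribe(self, event_class, callback):
        self._handlers.append((event_class, callback))

    def publish(self, event):
        for event_class, callback in self._handlers:
            if isinstance(event, event_class):
                callback(event)

disabled = []
bus = Dispatcher()
# One rule for all sensors: "if any sensor reports a hardware error, disable it".
bus.subscribe(SensorHardwareError, lambda e: disabled.append(e.source))
bus.publish(SensorHardwareError("laser_scanner"))
bus.publish(SensorHardwareError("encoder_1"))
bus.publish(OutOfRange("joint_3"))  # not a hardware error, so the rule does not fire
print(disabled)  # ['laser_scanner', 'encoder_1']
```

Adding a new sensor component requires no new handling code, as long as it raises the shared `SensorHardwareError` class.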

I'd like to hear your opinion on this:

- Which errors do you commonly use?

- Which classes of errors do you commonly use?

- How should such errors be organized? For instance, Unix errors are a
flat list (cf. errno(3)), which is probably not a good
choice. Alternatives could be a tree-like hierarchical structure or
even a "multi-view" scheme in which errors can be grouped according
to different properties (severity, location, etc.).
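The "multi-view" alternative can be sketched by attaching several independent properties to each error type and deriving each grouping on demand, instead of committing to one tree. This is a hypothetical Python illustration; the error names and property values are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class ErrorType:
    name: str
    severity: str  # illustrative values: "warning", "fatal"
    location: str  # illustrative values: "sensor", "actuator", "communication"

ERRORS = [
    ErrorType("encoder_defunct", "fatal", "sensor"),
    ErrorType("laser_timeout", "warning", "sensor"),
    ErrorType("motor_overcurrent", "fatal", "actuator"),
    ErrorType("socket_closed", "warning", "communication"),
]

def view(errors, prop):
    """Group error names by one property; each property yields one 'view'."""
    groups = defaultdict(list)
    for e in errors:
        groups[getattr(e, prop)].append(e.name)
    return dict(groups)

by_location = view(ERRORS, "location")
by_severity = view(ERRORS, "severity")
print(by_location["sensor"])  # ['encoder_defunct', 'laser_timeout']
print(by_severity["fatal"])   # ['encoder_defunct', 'motor_overcurrent']
```

Unlike a tree, no single property is privileged: the same error appears in one group per view.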

A quick grep through OCL mainly shows out-of-range errors
generated by robotic driver components and a "maximumLoadEvent" in
hardware/wrench/WrenchSensor.

What else have you got?

Best regards
Markus

What are your error events?

On Fri, Sep 11, 2009 at 11:01 AM, Markus Klotzbuecher
<markus [dot] klotzbuecher [..] ...> wrote:
[...]
> One important step towards writing reusable basic _and_ coordination
> components is a unified, robotics-specific set of errors. This is
> because:
>
>  - it allows reacting to groups or classes of events: "if any sensor
>   reports a hardware error, disable it". Less code.
>
>  - As the same error (semantically) will be named identically for
>   different components, generic error handling behaviours can evolve:
>   dealing with broken hardware, switching controllers, etc.
>
>  - Last but not least, it eases understanding if the same "thing" is
>   not named differently each time.
>
> I'd like to hear your opinion on this:
>
>  - Which errors do you commonly use?
>
>  - Which classes of errors do you commonly use?
>
>  - How should such errors be organized? For instance, Unix errors are a
>   flat list (cf. errno(3)), which is probably not a good
>   choice. Alternatives could be a tree-like hierarchical structure or
>   even a "multi-view" scheme in which errors can be grouped according
>   to different properties (severity, location, etc.).

Others on this thread already noticed that you can/could apply the
term 'error' both to events and states.
- A sensor component can raise a 'component-error' event, e.g. when
its hardware sensor stops functioning
- At the system level, you can have certain 'error states' (e.g.
emergency stop), and arrive in those states after one or more
'component-error' events.
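The distinction between error events and error states might look like this in a minimal sketch, assuming a system-level supervisor that moves to DEGRADED on the first component-error event and to EMERGENCY_STOP once a configurable number of distinct components have failed. The class, the states and the threshold are illustrative assumptions, not an existing Orocos API.

```python
from enum import Enum

class SystemState(Enum):
    RUNNING = "running"
    DEGRADED = "degraded"
    EMERGENCY_STOP = "emergency_stop"

class SystemSupervisor:
    """Hypothetical system-level state machine: component-error *events*
    drive transitions between system-level *states*."""

    def __init__(self, max_failed: int = 2):
        self.state = SystemState.RUNNING
        self.max_failed = max_failed  # application-specific threshold (assumption)
        self.failed = set()           # distinct components that reported an error

    def on_component_error(self, component: str) -> SystemState:
        if self.state is SystemState.EMERGENCY_STOP:
            return self.state  # stays stopped until an explicit reset (not shown)
        self.failed.add(component)
        if len(self.failed) >= self.max_failed:
            self.state = SystemState.EMERGENCY_STOP
        else:
            self.state = SystemState.DEGRADED
        return self.state

sup = SystemSupervisor(max_failed=2)
print(sup.on_component_error("laser_scanner").name)  # DEGRADED
print(sup.on_component_error("encoder_1").name)      # EMERGENCY_STOP
```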

While I see a clear benefit in trying to come to a unified set of
error events (e.g. both a laser scanner component and an encoder
component can fire a "hardware defunctional" event), I wonder whether
it is useful to define "generic error handling for robotics" as you
describe it above. I'm thinking more along the same lines as
Stephen here: put this system-level behaviour in a system-level and
_application specific_ state machine. In one application, a
non-functional sensor will lead to an emergency stop; in another, you
might just replace it with a different sensor...
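A small hypothetical sketch of this split: the event name is shared by all components, while each application binds its own reaction to it. The policy functions and application objects below are invented for illustration.

```python
from typing import Callable, Dict

# Shared, semantic event name (hypothetical): identical for every sensor component.
HARDWARE_DEFUNCT = "hardware_defunctional"

def emergency_stop_policy(component: str) -> str:
    return f"emergency stop (caused by {component})"

def replace_sensor_policy(component: str) -> str:
    return f"switch from {component} to backup sensor"

class Application:
    """Binds shared event names to application-specific reactions."""

    def __init__(self, policies: Dict[str, Callable[[str], str]]):
        self.policies = policies  # event name -> reaction

    def handle(self, event: str, component: str) -> str:
        policy = self.policies.get(event)
        return policy(component) if policy else "ignored"

arm_cell = Application({HARDWARE_DEFUNCT: emergency_stop_policy})
mobile_base = Application({HARDWARE_DEFUNCT: replace_sensor_policy})
print(arm_cell.handle(HARDWARE_DEFUNCT, "laser_scanner"))
print(mobile_base.handle(HARDWARE_DEFUNCT, "laser_scanner"))
```

The unified vocabulary lives in the components; the decision stays in the application.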

This all reminds me of Peter's post in the "library vs. middleware" discussion.

But I might be wrong...

k

What are your error events?

On Thu, Sep 17, 2009 at 04:45:14PM +0200, Klaas Gadeyne wrote:
> On Fri, Sep 11, 2009 at 11:01 AM, Markus Klotzbuecher
> <markus [dot] klotzbuecher [..] ...> wrote:
> [...]
> > [...]
>
> Others on this thread already noticed that you can/could apply the
> term 'error' both to events and states.

Yes, but you implicitly assume the hidden word "occurrence" after
the first, and "handling" after the second.

> - A sensor component can raise a 'component-error' event, e.g. when
> its hardware sensor stops functioning
> - At the system level, you can have certain 'error states' (e.g.
> emergency stop), and arrive in those states after one or more
> 'component-error' events.
>
> While I see a clear benefit in trying to come to a unified set of
> error events (e.g. both a laser scanner component and an encoder
> component can fire a "hardware defunctional" event), I wonder whether
> it is useful to define "generic error handling for robotics" as you
> describe it above. I'm thinking more along the same lines as
> Stephen here: put this system-level behaviour in a system-level and
> _application specific_ state machine. In one application, a
> non-functional sensor will lead to an emergency stop; in another, you
> might just replace it with a different sensor...

Yes, I agree with you that these are system-level concerns and
can (almost) never be dealt with in a generic way. But if *all*
sensors emit the same "hardware defunctional" event, maybe we can have
a generic "sensor replacement coordination" mechanism as a first step?
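A generic coordinator of this kind could be as simple as the following sketch, assuming every sensor emits the same `"hardware_defunctional"` event and the application supplies an ordered list of interchangeable sensors. The class and sensor names are hypothetical; what to do when no sensor is left remains an application-level decision.

```python
from typing import List, Optional

class SensorReplacementCoordinator:
    """Hypothetical generic coordinator: relies only on every sensor emitting
    the same 'hardware_defunctional' event, not on sensor-specific error names."""

    def __init__(self, candidates: List[str]):
        self.candidates = list(candidates)  # ordered by preference (assumption)
        self.failed = set()
        self.active: Optional[str] = self.candidates[0] if self.candidates else None

    def on_event(self, event: str, source: str) -> Optional[str]:
        if event != "hardware_defunctional":
            return self.active  # other events are handled elsewhere
        self.failed.add(source)
        if source == self.active:
            # Pick the first candidate that has not failed yet.
            self.active = next((c for c in self.candidates if c not in self.failed), None)
        return self.active  # None means no working sensor is left

coord = SensorReplacementCoordinator(["laser_front", "laser_rear", "sonar_ring"])
print(coord.on_event("hardware_defunctional", "laser_front"))  # laser_rear
print(coord.on_event("hardware_defunctional", "laser_rear"))   # sonar_ring
print(coord.on_event("hardware_defunctional", "sonar_ring"))   # None
```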

> This all reminds me of Peter's post in the "library vs. middleware" discussion.
>
> But I might be wrong...

It is not about creating a new layer, it's about streamlining the
existing one...

Markus

What are your error events?

First of all, in my case errors are represented in Orocos as states, not as
events.

Secondly, as you suggest, errors should be categorized. I will be self-advertising
and point to my thesis for an example of how to represent this ;-)

> - Which errors do you commonly use?
> - Which classes of errors do you commonly use?
- hardware failure
- bad input: one of the inputs is out of the accepted range
- internal error: an error occurred internally in the module (for instance, a
control algorithm produces out-of-range output)

> - How should such errors be organized? For instance, Unix errors are a
> flat list (cf. errno(3)), which is probably not a good
> choice. Alternatives could be a tree-like hierarchical structure or
> even a "multi-view" scheme in which errors can be grouped according
> to different properties (severity, location, etc.).
Well. First of all, the severity is actually a system-wide property and should
not be represented at the module specification level. What the module
specification should take care of is categorization. For instance, one could
have out_of_range_command as a suberror of bad_input and so on.
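One possible encoding of this split, as a hypothetical sketch: the module declares a tree of causes (with `out_of_range_command` below `bad_input`, as in the example above), and the system assigns severities to categories separately. `output_saturation` and the severity table are invented for illustration.

```python
# Module level: a tree of *causes*, expressed as child -> parent links.
CAUSE_PARENT = {
    "hardware_failure": None,
    "bad_input": None,
    "internal_error": None,
    "out_of_range_command": "bad_input",
    "output_saturation": "internal_error",  # e.g. controller output out of range
}

def ancestors(cause):
    """Return the cause followed by its parent categories up to the root."""
    chain = []
    while cause is not None:
        chain.append(cause)
        cause = CAUSE_PARENT[cause]
    return chain

def is_a(cause, category):
    return category in ancestors(cause)

# System level, application-specific: *severity* keyed on categories.
SYSTEM_SEVERITY = {"hardware_failure": "fatal", "bad_input": "warning", "internal_error": "fatal"}

def severity(cause):
    """Use the most specific category that the system has assigned a severity to."""
    for c in ancestors(cause):
        if c in SYSTEM_SEVERITY:
            return SYSTEM_SEVERITY[c]
    return "unknown"

print(is_a("out_of_range_command", "bad_input"))  # True
print(severity("out_of_range_command"))           # warning
```

Another deployment can reuse the same module-level tree with a different `SYSTEM_SEVERITY` table.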

Sylvain

What are your error events?

On Fri, Sep 11, 2009 at 02:56:01PM +0200, Sylvain Joyeux wrote:
> First of all, in my case errors are represented in orocos as states, not as
> events.

You mean states as in state machine states or how are they encoded?

> Secondly, as you suggest, errors should be categorized. I will be self-advertising
> and point to my thesis for an example of how to represent this ;-)

Ok, will take a look... :-)

> > - Which errors do you commonly use?
> > - Which classes of errors do you commonly use?
> - hardware failure
> - bad input: one of the inputs is out of the accepted range
> - internal error: an error occurred internally in the module (for instance, a
> control algorithm produces out-of-range output)

Ok.

> > - How should such errors be organized? For instance, Unix errors are a
> > flat list (cf. errno(3)), which is probably not a good
> > choice. Alternatives could be a tree-like hierarchical structure or
> > even a "multi-view" scheme in which errors can be grouped according
> > to different properties (severity, location, etc.).
> Well. First of all, the severity is actually a system-wide property and should

I agree...

> not be represented at the module specification level. What the module
> specification should take care of is categorization. For instance, one could
> have out_of_range_command as a suberror of bad_input and so on.

But do you really think this categorization can take place at the module
level? Isn't it a system-level issue itself?

Markus

What are your error events?

On Sep 11, 2009, at 08:56 , Sylvain Joyeux wrote:

> First of all, in my case errors are represented in Orocos as states,
> not as events.

That applies to most of our errors too, though we obviously use
"failure/error" events to generate the state transitions.

> Secondly, as you suggest, errors should be categorized. I will be
> self-advertising and point to my thesis for an example of how to
> represent this ;-)
>
>> - Which errors do you commonly use?
>> - Which classes of errors do you commonly use?
> - hardware failure
> - bad input: one of the inputs is out of the accepted range
> - internal error: an error occurred internally in the module (for
> instance, a control algorithm produces out-of-range output)

Mostly limit-related ones (position out of range, excessive velocity,
input out of range), hardware failures (these are almost always modelled
with state machines), and communication failures (e.g. socket problems,
again modelled in state machines).
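Limit-related errors like these are typically produced by a per-cycle check. A minimal sketch, assuming a single joint with symmetric velocity limits and hypothetical event names:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JointLimits:
    pos_min: float
    pos_max: float
    vel_max: float  # limit on the absolute velocity

def check_limits(position: float, velocity: float, limits: JointLimits) -> List[str]:
    """Return the (hypothetical) names of all limit errors detected in one cycle."""
    errors = []
    if not (limits.pos_min <= position <= limits.pos_max):
        errors.append("position_out_of_range")
    if abs(velocity) > limits.vel_max:
        errors.append("excessive_velocity")
    return errors

limits = JointLimits(pos_min=-1.5, pos_max=1.5, vel_max=2.0)
print(check_limits(0.2, 0.5, limits))   # []
print(check_limits(1.7, -2.5, limits))  # ['position_out_of_range', 'excessive_velocity']
```

Each returned name would then be fed as an event into the component's state machine.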
>
>> - How should such errors be organized? For instance, Unix errors are a
>> flat list (cf. errno(3)), which is probably not a good
>> choice. Alternatives could be a tree-like hierarchical structure or
>> even a "multi-view" scheme in which errors can be grouped according
>> to different properties (severity, location, etc.).
> Well. First of all, the severity is actually a system-wide property
> and should not be represented at the module specification level. What
> the module specification should take care of is categorization. For
> instance, one could have out_of_range_command as a suberror of
> bad_input and so on.

Agreed, severity (i.e. consequence) is different from cause.

Where are you going with all this? What is it for? For the majority of
cases, state machines do what we want here.

Stephen

What are your error events?

On Fri, Sep 11, 2009 at 03:37:18PM +0200, S Roderick wrote:
> On Sep 11, 2009, at 08:56 , Sylvain Joyeux wrote:
>
> > First of all, in my case errors are represented in Orocos as states,
> > not as events.
>
> That applies to most of our errors too, though we obviously use
> "failure/error" events to generate the state transitions.

Yes, I'm interested in these :-)

> > Secondly, as you suggest, errors should be categorized. I will be
> > self-advertising and point to my thesis for an example of how to
> > represent this ;-)
> >
> >> - Which errors do you commonly use?
> >> - Which classes of errors do you commonly use?
> > - hardware failure
> > - bad input: one of the inputs is out of the accepted range
> > - internal error: an error occurred internally in the module (for
> > instance, a control algorithm produces out-of-range output)
>
> Mostly limit-related ones (position out of range, excessive velocity,
> input out of range), hardware failures (these are almost always modelled
> with state machines), and communication failures (e.g. socket problems,
> again modelled in state machines).

Ok.

> >> - How should such errors be organized? For instance, Unix errors are a
> >> flat list (cf. errno(3)), which is probably not a good
> >> choice. Alternatives could be a tree-like hierarchical structure or
> >> even a "multi-view" scheme in which errors can be grouped according
> >> to different properties (severity, location, etc.).
> > Well. First of all, the severity is actually a system-wide property
> > and should not be represented at the module specification level. What
> > the module specification should take care of is categorization. For
> > instance, one could have out_of_range_command as a suberror of
> > bad_input and so on.
>
> Agreed, severity (i.e. consequence) is different from cause.
>
> Where are you going with all this? What is it for? For the majority of
> cases, state machines do what we want here.

State machines allow you to nicely specify _what_ to do when events
occur, but the events themselves are currently defined only at a
syntactic level and are not reused across different components. I
think it would be desirable to define a set of robotics-specific error
events/conditions at a semantic level, categorized in some way and thus
meaningful across component boundaries.

Markus

What are your error events?

> Where are you going with all this? What is it for? For the majority of
> cases, state machines do what we want here.
I think he is heading towards an interface that exposes what a module can
diagnose about itself, so that external supervision tools can use it, which
is *good*.
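Such a self-description could be as small as a set of semantic error names per component, which a supervision tool can compare against the rules it actually implements. A hypothetical sketch (component names, error names and the coverage check are all assumptions):

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List

@dataclass
class ComponentInterface:
    """Hypothetical self-description: which semantic errors a component can diagnose."""
    name: str
    diagnosable: FrozenSet[str] = field(default_factory=frozenset)

def unhandled_errors(components: List[ComponentInterface],
                     supervisor_handles: FrozenSet[str]) -> Dict[str, List[str]]:
    """For each component, list the errors it can report that no supervisor rule covers."""
    report = {}
    for comp in components:
        missing = sorted(comp.diagnosable - supervisor_handles)
        if missing:
            report[comp.name] = missing
    return report

laser = ComponentInterface("laser_scanner", frozenset({"hardware_defunctional", "communication_failure"}))
arm = ComponentInterface("arm_driver", frozenset({"hardware_defunctional", "position_out_of_range"}))
rules = frozenset({"hardware_defunctional", "position_out_of_range"})
print(unhandled_errors([laser, arm], rules))  # {'laser_scanner': ['communication_failure']}
```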

Sylvain

We consider faults/errors as

We consider faults/errors as events and states that are detected and handled inside the highest-priority component/controller, called Safe-Guarded.

A. Fault sources:
1. Faults of the manipulator:
- Mechanical structure
- Power supply
- Motors and power amplifiers
- Computing system
- Interface cards
- Interconnecting wiring
2. Faults of the environment:
- Intrusion into the working space
- Emergency-stop
3. Faults of the operating process:
- Obstacle collision
- Endstop/joint collision
- Self-collision
- Actuator saturation
- Servo overload
- Overweight load

B. How to check:
We plan to combine conditions on:
- sensors
- tracking errors
- model checking
- watchdog
- input values (measured signals)
- output values (control signals)
to detect the different kinds of errors.
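As an illustration of combining several of these conditions, here is a minimal sketch covering three of them (watchdog, tracking error, output saturation) for one joint in one control cycle. Field names and threshold values are hypothetical placeholders, not taken from the Safe-Guarded controller; sensor and model checks are omitted.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    """One control-cycle snapshot (hypothetical field names)."""
    time: float            # current time, seconds
    last_heartbeat: float  # time of the last watchdog heartbeat, seconds
    setpoint: float        # commanded joint position
    measured: float        # measured joint position
    control_output: float  # e.g. normalised amplifier command in [-1, 1]

@dataclass
class Thresholds:
    watchdog_timeout: float = 0.01   # seconds (placeholder value)
    max_tracking_error: float = 0.05 # position units (placeholder value)
    output_limit: float = 1.0        # saturation level of the normalised output

def detect_faults(s: Sample, th: Thresholds) -> List[str]:
    """Evaluate each condition independently and report every fault found."""
    faults = []
    if s.time - s.last_heartbeat > th.watchdog_timeout:
        faults.append("watchdog_timeout")
    if abs(s.setpoint - s.measured) > th.max_tracking_error:
        faults.append("tracking_error")
    if abs(s.control_output) >= th.output_limit:
        faults.append("actuator_saturation")
    return faults

ok = Sample(time=1.000, last_heartbeat=0.995, setpoint=0.30, measured=0.29, control_output=0.4)
bad = Sample(time=1.000, last_heartbeat=0.950, setpoint=0.30, measured=0.20, control_output=1.0)
print(detect_faults(ok, Thresholds()))   # []
print(detect_faults(bad, Thresholds()))  # ['watchdog_timeout', 'tracking_error', 'actuator_saturation']
```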

Regards,
Phong