Cannot call stop() from within the updateHook

Submitted by Sylvain Joyeux on Wed, 2010-12-22 12:32

RTT-dev

It used to work in RTT 1.x but fails in 2.x

I looked around and saw two things. Peter, can you confirm/infirm that ?

* breakLoop() returns false by default on non-periodic activities. In
practice, it means that stop() will return false "every once in a
while" in data-triggered and/or self-triggered tasks. The condition
for this to happen is that updateHook is called at the same time
stop() is called. Which can happen for tasks that are triggered by
their input ports.

* Thread::stop() synchronizes on the end of loop(). Which will fail
miserably if stop() is called within the same thread than loop().

Based on these findings (which might be wrong ...), I thought about two
possible fixes:

* have TaskCore::breakUpdateHook() return true by default. In the
general case, updateHook() *should* should eventually return. The
case where updateHook() is an infinite loop should be reduced to a
minimum *and* in these cases people should accept the additional
burden of making sure that breakLoop plays nice with their tasks.

* have TaskCore::stop check if it is being called from within its own
thread, and act accordingly. One could "just" send the stop()
operation to the task, but that would be quite a bit of burden
(create a thread each time a task wants to stop itself), and have the
additional problem of having updateHook() called after stop() has
been called -- since the processing of stop() and the next trigger
race with each other.

Thoughts ?

Cannot call stop() from within the updateHook

Submitted by peter on Wed, 2010-12-22 12:56.

On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
> It used to work in RTT 1.x but fails in 2.x
>
> I looked around and saw two things. Peter, can you confirm/infirm that ?
>
> * breakLoop() returns false by default on non-periodic activities. In
> practice, it means that stop() will return false "every once in a
> while" in data-triggered and/or self-triggered tasks. The condition
> for this to happen is that updateHook is called at the same time
> stop() is called. Which can happen for tasks that are triggered by
> their input ports.

That seems a correct analysis. When you stop the activity of a component,
you're on your own.

>
> * Thread::stop() synchronizes on the end of loop(). Which will fail
> miserably if stop() is called within the same thread than loop().

It will fail and print a warning that it didn't succeed. We could use a
recursive mutex in stop() that allows a double lock...

>
> Based on these findings (which might be wrong ...), I thought about two
> possible fixes:
>
> * have TaskCore::breakUpdateHook() return true by default. In the
> general case, updateHook() *should* should eventually return. The
> case where updateHook() is an infinite loop should be reduced to a
> minimum *and* in these cases people should accept the additional
> burden of making sure that breakLoop plays nice with their tasks.

Hmm...maybe. On the other hand, returning false informs the users that he
needs to implement/do something for his case.

>
> * have TaskCore::stop check if it is being called from within its own
> thread, and act accordingly. One could "just" send the stop()
> operation to the task, but that would be quite a bit of burden
> (create a thread each time a task wants to stop itself), and have the
> additional problem of having updateHook() called after stop() has
> been called -- since the processing of stop() and the next trigger
> race with each other.

Seems too complex.

>
> Thoughts ?

We didn't really intend users to call Thread::stop() at all because activities
keep on running in the component's lifecycle. From that point of view, ony
breakUpdateHook() matters, because it allows a transition from 'Running' to
'Stopped' in the TaskContext's state.

In case this only relates to a file-descriptor activity, why don't you keep it
always running and only start monitoring fd's when it is instructed to do so
(by a start()) ?

Peter

Cannot call stop() from within the updateHook

Submitted by Sylvain Joyeux on Wed, 2010-12-22 14:56.

On 12/22/2010 01:50 PM, Peter Soetens wrote:
> On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
>> It used to work in RTT 1.x but fails in 2.x
>>
>> I looked around and saw two things. Peter, can you confirm/infirm that ?
>>
>> * breakLoop() returns false by default on non-periodic activities. In
>> practice, it means that stop() will return false "every once in a
>> while" in data-triggered and/or self-triggered tasks. The condition
>> for this to happen is that updateHook is called at the same time
>> stop() is called. Which can happen for tasks that are triggered by
>> their input ports.
>
> That seems a correct analysis. When you stop the activity of a component,
> you're on your own.

I don't touch the activity and/or the thread explicitely. I just have
updateHook() call stop() on a task context that has an activity with
period() == 0. And it fails.

I do get the warning about breakLoop returning false, which I think
means that Thread::stop() is called behind the scenes.

Moreover, even if breakLoop returned true, I *guess* that we would have
a deadlock (but I can't stress the "I guess" enough)

> Hmm...maybe. On the other hand, returning false informs the users that he
> needs to implement/do something for his case.
My point is that the general case *is* that updateHook() returns and
that therefore breakUpdateHook() should return true by default.

Blocking updateHook() should be the exception. If breakUpdateHook()
returns false by default, we actually need *all* tasks for which
updateHook() is meant to return eventually (a.k.a. most tasks) to
override that default. Or they will behave strangely as soon as they are
deployed on a non-periodic activity. Makes no sense.

>> Thoughts ?
>
> We didn't really intend users to call Thread::stop() at all because activities
> keep on running in the component's lifecycle. From that point of view, ony
> breakUpdateHook() matters, because it allows a transition from 'Running' to
> 'Stopped' in the TaskContext's state.
Thread::stop() *is* called behind the scenes (i.e. not explicitely by
me). I'll try to have a look as to where ...

> In case this only relates to a file-descriptor activity, why don't you keep it
> always running and only start monitoring fd's when it is instructed to do so
> (by a start()) ?
This is absolutely not related to FDActivity

Cannot call stop() from within the updateHook

Submitted by snrkiwi on Wed, 2010-12-22 19:32.

On 23/12/2010, at 03:53 , Sylvain Joyeux wrote:

> On 12/22/2010 01:50 PM, Peter Soetens wrote:
>> On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
>>> It used to work in RTT 1.x but fails in 2.x
>>>
>>> I looked around and saw two things. Peter, can you confirm/infirm that ?
>>>
>>> * breakLoop() returns false by default on non-periodic activities. In
>>> practice, it means that stop() will return false "every once in a
>>> while" in data-triggered and/or self-triggered tasks. The condition
>>> for this to happen is that updateHook is called at the same time
>>> stop() is called. Which can happen for tasks that are triggered by
>>> their input ports.
>>
>> That seems a correct analysis. When you stop the activity of a component,
>> you're on your own.
>
> I don't touch the activity and/or the thread explicitely. I just have
> updateHook() call stop() on a task context that has an activity with
> period() == 0. And it fails.
>
> I do get the warning about breakLoop returning false, which I think
> means that Thread::stop() is called behind the scenes.
>
> Moreover, even if breakLoop returned true, I *guess* that we would have
> a deadlock (but I can't stress the "I guess" enough)
>
>> Hmm...maybe. On the other hand, returning false informs the users that he
>> needs to implement/do something for his case.
> My point is that the general case *is* that updateHook() returns and
> that therefore breakUpdateHook() should return true by default.
>
> Blocking updateHook() should be the exception. If breakUpdateHook()
> returns false by default, we actually need *all* tasks for which
> updateHook() is meant to return eventually (a.k.a. most tasks) to
> override that default. Or they will behave strangely as soon as they are
> deployed on a non-periodic activity. Makes no sense.

For my customers, non-periodic tasks are _all_ blocking. It's the entire reason that we use non-periodic - because the underlying implementation is blocking (eg serial port). IMHO Sylvain's "general case" above is backwards, because it applies to _only_ NP tasks.

We just set a quit flag (or similar) within breakUpdateHook(), which updateHook() monitors. I don't think that the proposed change will affect anything we do.

I do like that the default is to provide a warning to the user that they probably have to implement something else.
S

Cannot call stop() from within the updateHook

Submitted by Sylvain Joyeux on Thu, 2010-12-23 11:12.

On 12/22/2010 08:27 PM, S Roderick wrote:
>> Blocking updateHook() should be the exception. If breakUpdateHook()
>> returns false by default, we actually need *all* tasks for which
>> updateHook() is meant to return eventually (a.k.a. most tasks) to
>> override that default. Or they will behave strangely as soon as they are
>> deployed on a non-periodic activity. Makes no sense.
>
> For my customers, non-periodic tasks are _all_ blocking. It's the entire reason that we use non-periodic - because the underlying implementation is blocking (eg serial port). IMHO Sylvain's "general case" above is backwards, because it applies to _only_ NP tasks.
I don't understand the "general case is backwards because it applies
only on NP tasks" part of that statement

Periodic tasks must have an updateHook() that returns. I.e. for them,
breakUpdateHook() really should return true.

For the others, let's see:
* if you *do* have an updateHook() that is *really* blocking --
because read() blocks on the serial port on the OS you are running.
What good is to have breakUpdateHook() ? You're blocked, you're
blocked. If you can set a flag to make updateHook() quit, it means
that updateHook() actually runs (i.e. you chose to make updateHook()
completely blocking) and you can make updateHook() return regularly.
If you use select() to listen on IOs, use FileDescriptorActivity
which will do the interfacing with RTT for you.
* the breakUpdateHook() returning false breaks data-driven tasks.
Which I believe should be a very common use-case.

In other words, two situation for blocking updateHook()
* either the OS makes you have updateHook() block, in which case you
can't break for it *at all* and you're in a pretty bad situation
* making updateHook() block is a design decision, and IMO a not very
good one as it makes your task completely tied to have a nonperiodic
activity

> We just set a quit flag (or similar) within breakUpdateHook(), which updateHook() monitors. I don't think that the proposed change will affect anything we do.
I believe that the proposed change will not affect anybody that is
actually having updateHook() block, as they *have* to implement
breakUpdateHook().

I saw on Thread::stop() that it monitors if the loop quits within one
second. I guess we could do the same for updateHook() and
breakUpdateHook() and warn if an updateHook() does not quit even though
breakUpdateHook() returned true.

> I do like that the default is to provide a warning to the user that they probably have to implement something else.
First of all, the warning does *not* tell the user that he should have
breakUpdateHook() return true. It talks about Activity::breakLoop(),
which users should not have to know anything about.

Moreover, in "XXX-triggered" tasks, i.e. in tasks for which updateHook()
does return, it is a spurious error. If you call stop() in between two
triggers, nothing will appear. If you call stop() during a trigger, you
get it. Not nice.

Cannot call stop() from within the updateHook

Submitted by peter on Wed, 2010-12-22 15:44.

On Wednesday 22 December 2010 15:53:14 Sylvain Joyeux wrote:
> On 12/22/2010 01:50 PM, Peter Soetens wrote:
> > On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
> >> It used to work in RTT 1.x but fails in 2.x
> >>
> >> I looked around and saw two things. Peter, can you confirm/infirm that ?
> >>
> >> * breakLoop() returns false by default on non-periodic activities. In
> >>
> >> practice, it means that stop() will return false "every once in a
> >> while" in data-triggered and/or self-triggered tasks. The condition
> >> for this to happen is that updateHook is called at the same time
> >> stop() is called. Which can happen for tasks that are triggered by
> >> their input ports.
> >
> > That seems a correct analysis. When you stop the activity of a component,
> > you're on your own.
>
> I don't touch the activity and/or the thread explicitely. I just have
> updateHook() call stop() on a task context that has an activity with
> period() == 0. And it fails.

That's another story. I didn't understand the original post like that.

The offending code is this part in the EE:

    bool ExecutionEngine::stopTask(TaskCore* task) {
        // stop and start where former will call breakLoop() in case of non-
periodic.                                                                                                                                                        
        // this is a forced synchronization point, since stop() will only 
return when                                                                                                                                                        
        // step() returned.                                                                                                                                                                                                                  
        if ( getActivity() && this->getActivity()->stop() ) {
            this->getActivity()->start();
            return true;
        }
        return false;
    }

The stopt/start cycle is done as a synchronisation point and prevents that
updateHook() is called after stopHook() (due to scheduling and a race
condition). We *need* something like this, so I would follow your suggestion
to return true by default in breakUpdateHook(), such that this stop() succeeds
with non-periodic activities.

This also influences custom written activities, so we better document this in
the stop() function of TaskCore.

Does this stop/start behavior break FDActivity ?

Peter

Cannot call stop() from within the updateHook

Submitted by Sylvain Joyeux on Thu, 2010-12-23 11:12.

On 12/22/2010 04:41 PM, Peter Soetens wrote:
> On Wednesday 22 December 2010 15:53:14 Sylvain Joyeux wrote:
>> On 12/22/2010 01:50 PM, Peter Soetens wrote:
>>> On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
>>>> It used to work in RTT 1.x but fails in 2.x
>>>>
>>>> I looked around and saw two things. Peter, can you confirm/infirm that ?
>>>>
>>>> * breakLoop() returns false by default on non-periodic activities. In
>>>>
>>>> practice, it means that stop() will return false "every once in a
>>>> while" in data-triggered and/or self-triggered tasks. The condition
>>>> for this to happen is that updateHook is called at the same time
>>>> stop() is called. Which can happen for tasks that are triggered by
>>>> their input ports.
>>>
>>> That seems a correct analysis. When you stop the activity of a component,
>>> you're on your own.
>>
>> I don't touch the activity and/or the thread explicitely. I just have
>> updateHook() call stop() on a task context that has an activity with
>> period() == 0. And it fails.
>
> That's another story. I didn't understand the original post like that.
>
> The offending code is this part in the EE:
>

>      bool ExecutionEngine::stopTask(TaskCore* task) {
>          // stop and start where former will call breakLoop() in case of non-
> periodic.
>          // this is a forced synchronization point, since stop() will only
> return when
>          // step() returned.
>          if ( getActivity()&&  this->getActivity()->stop() ) {
>              this->getActivity()->start();
>              return true;
>          }
>          return false;
>      }
>

>
> The stopt/start cycle is done as a synchronisation point and prevents that
> updateHook() is called after stopHook() (due to scheduling and a race
> condition). We *need* something like this, so I would follow your suggestion
> to return true by default in breakUpdateHook(), such that this stop() succeeds
> with non-periodic activities.
>
> This also influences custom written activities, so we better document this in
> the stop() function of TaskCore.
>
> Does this stop/start behavior break FDActivity ?
No. It just looked weird when I debugged the problems I had with
FDActivity. FDActivity is designed so that you can start/stop it at
will, so that's no problem there.

Cannot call stop() from within the updateHook

Submitted by peter on Mon, 2011-01-31 11:20.

FYI, fixed this on master branch.

Peter

On Thursday 23 December 2010 12:10:59 Sylvain Joyeux wrote:
> On 12/22/2010 04:41 PM, Peter Soetens wrote:
> > On Wednesday 22 December 2010 15:53:14 Sylvain Joyeux wrote:
> >> On 12/22/2010 01:50 PM, Peter Soetens wrote:
> >>> On Wednesday 22 December 2010 13:29:28 Sylvain Joyeux wrote:
> >>>> It used to work in RTT 1.x but fails in 2.x
> >>>>
> >>>> I looked around and saw two things. Peter, can you confirm/infirm that
> >>>> ?
> >>>>
> >>>> * breakLoop() returns false by default on non-periodic activities. In
> >>>>
> >>>> practice, it means that stop() will return false "every once in
> >>>> a while" in data-triggered and/or self-triggered tasks. The
> >>>> condition for this to happen is that updateHook is called at the
> >>>> same time stop() is called. Which can happen for tasks that are
> >>>> triggered by their input ports.
> >>>
> >>> That seems a correct analysis. When you stop the activity of a
> >>> component, you're on your own.
> >>
> >> I don't touch the activity and/or the thread explicitely. I just have
> >> updateHook() call stop() on a task context that has an activity with
> >> period() == 0. And it fails.
> >
> > That's another story. I didn't understand the original post like that.
> >
> > The offending code is this part in the EE:
> >

> > 
> >      bool ExecutionEngine::stopTask(TaskCore* task) {
> >      
> >          // stop and start where former will call breakLoop() in case of
> >          non-
> > 
> > periodic.
> > 
> >          // this is a forced synchronization point, since stop() will
> >          only
> > 
> > return when
> > 
> >          // step() returned.
> >          if ( getActivity()&&  this->getActivity()->stop() ) {
> >          
> >              this->getActivity()->start();
> >              return true;
> >          
> >          }
> >          return false;
> >      
> >      }
> > 
> >

> >
> > The stopt/start cycle is done as a synchronisation point and prevents
> > that updateHook() is called after stopHook() (due to scheduling and a
> > race condition). We *need* something like this, so I would follow your
> > suggestion to return true by default in breakUpdateHook(), such that
> > this stop() succeeds with non-periodic activities.
> >
> > This also influences custom written activities, so we better document
> > this in the stop() function of TaskCore.
> >
> > Does this stop/start behavior break FDActivity ?
>
> No. It just looked weird when I debugged the problems I had with
> FDActivity. FDActivity is designed so that you can start/stop it at
> will, so that's no problem there.