Synchronization issue in stopHook

I have a hard time trying to implement a managed stop of my orocos task.

To explain the issue, let me first try give you an overview of what I
am trying to do.

I have a non periodic task that processes commands when trigger() is
called on it.
There are a regular C++ method in this task to submit commands. I want
to let the user be blocked when they call command() until it has been
processes by the updateHook.

I am using orocos semaphores. In command() there is a wait() and in
the updateHook there is a signal().

It works like a charm except when I stop the task. Of course, in the
stopHook there is also a signal() to unlock pending call. But at that
point if stop() is not completed, when a command() arrives, it stays
there in the wait() dead locked.

The solution in this case would be not to go in the command wait() if
the stop is started.

Solution:

But there is a race condition because if I read the code corectly, the
active state in orocos is changed only at the end for the stop
sequence, after the stopHook is called.

I believe this is wrong and it should be set before the user code is
called or it's impossible for the user to synchronize the stop
sequence of his task.

does the set at the end is there on purpose, could it be moved at the start ?

Bruno.

Synchronization issue in stopHook

> I have a hard time trying to implement a managed stop of my orocos task.
>
> To explain the issue, let me first try give you an overview of what I
> am trying to do.
>
> I have a non periodic task that processes commands when trigger() is
> called on it.
> There are a regular C++ method in this task to submit commands. I want
> to let the user be blocked when they call command() until it has been
> processes by the updateHook.
>
> I am using orocos semaphores. In command() there is a wait() and in
> the updateHook there is a signal().
>
> It works like a charm except when I stop the task. Of course, in the
> stopHook there is also a signal() to unlock pending call. But at that
> point if stop() is not completed, when a command() arrives, it stays
> there in the wait() dead locked.

Aren't the Orocos semaphores counting the signal() calls ? Which OS are
you using ?

>
> The solution in this case would be not to go in the command wait() if
> the stop is started.
>
> Solution:
>
> But there is a race condition because if I read the code corectly, the
> active state in orocos is changed only at the end for the stop
> sequence, after the stopHook is called.
>
> I believe this is wrong and it should be set before the user code is
> called or it's impossible for the user to synchronize the stop
> sequence of his task.

If I understand correctly, you would still have the same race you're
having now. If you have if (!stopped) { sem.wait(); }, the stopped flag
can be set to true just before calling wait... But then again, this
wouldn't be an issue with counting semaphores...

>
> does the set at the end is there on purpose, could it be moved at the
> start ?

Since a stop() should never fail (unless not active), changing the state
early on has only effect on code which would read the state in your
stopHook() (and any concurrent code at that time). So changing it is an
option, but I wonder if it fixes your problem. I believe the unit test
checks for this.

Just to understand better: do you want the caller to block until the
completion condition becomes true, until the command function has executed
or until update hook has executed once after the command ?

Peter

>
> Bruno.
> --
> Orocos-Dev mailing list
> Orocos-Dev [..] ...
> http://lists.mech.kuleuven.be/mailman/listinfo/orocos-dev
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>

Synchronization issue in stopHook

Le lundi 22 décembre 2008 à 09:52 +0100, Peter Soetens a écrit :
> > I have a hard time trying to implement a managed stop of my orocos task.
> >
> > To explain the issue, let me first try give you an overview of what I
> > am trying to do.
> >
> > I have a non periodic task that processes commands when trigger() is
> > called on it.
> > There are a regular C++ method in this task to submit commands. I want
> > to let the user be blocked when they call command() until it has been
> > processes by the updateHook.
> >
> > I am using orocos semaphores. In command() there is a wait() and in
> > the updateHook there is a signal().
> >
> > It works like a charm except when I stop the task. Of course, in the
> > stopHook there is also a signal() to unlock pending call. But at that
> > point if stop() is not completed, when a command() arrives, it stays
> > there in the wait() dead locked.
>
> Aren't the Orocos semaphores counting the signal() calls ? Which OS are
> you using ?

GNU/Linux. Yes the semaphores works fine.

> >
> > The solution in this case would be not to go in the command wait() if
> > the stop is started.
> >
> > Solution:
> >
> > But there is a race condition because if I read the code corectly, the
> > active state in orocos is changed only at the end for the stop
> > sequence, after the stopHook is called.
> >
> > I believe this is wrong and it should be set before the user code is
> > called or it's impossible for the user to synchronize the stop
> > sequence of his task.
>
> If I understand correctly, you would still have the same race you're
> having now. If you have if (!stopped) { sem.wait(); }, the stopped flag
> can be set to true just before calling wait... But then again, this
> wouldn't be an issue with counting semaphores...

Yes, it can happens and it already works fine in this case, the stop
will signal() and the wait() will be unlocked.

> >
> > does the set at the end is there on purpose, could it be moved at the
> > start ?
>
> Since a stop() should never fail (unless not active), changing the state
> early on has only effect on code which would read the state in your
> stopHook() (and any concurrent code at that time). So changing it is an
> option, but I wonder if it fixes your problem. I believe the unit test
> checks for this.

Well, sorry, my explanations has not been good enough to let you
understand the issue let's go pseudo code (python style):

updateHook:
mutex lock
process command
signal

stopHook:
mutex lock
process command
signal
(orocos task stopped)

command: (from other thread)
return if not active
mutex lock
add command
trigger
if active
wait

The issue is that the stopHook as been completed but not yet marked as
such so I still can get in the wait and nobody is there anymore to
signal me.

If the task was marked stopped before calling the stopHook I could still
'add command' and wait if a stop is on its way but will wait only in
case the stopHook has or will been processed which is fine.

> Just to understand better: do you want the caller to block until the
> completion condition becomes true, until the command function has executed
> or until update hook has executed once after the command ?
>

I expect the command() call to return true when the command as been
processed (by the updateHook or the stopHook) or to return false if the
command was not processed because the task is stopped. I don't use the
orocos command interface so don't have a completion condition.

Bruno.

Synchronization issue in stopHook

On Monday 22 December 2008 13:57:50 Bruno Coudoin wrote:
>
> Well, sorry, my explanations has not been good enough to let you
> understand the issue let's go pseudo code (python style):
>
> updateHook:
> mutex lock
> process command
> signal
>
> stopHook:
> mutex lock
> process command
> signal
> (orocos task stopped)
>
> command: (from other thread)
> return if not active
> mutex lock
> add command
> trigger
> if active
> wait
>
> The issue is that the stopHook as been completed but not yet marked as
> such so I still can get in the wait and nobody is there anymore to
> signal me.

In an ideal world, Orocos commands would have supported such blocking
operations... but it isn't so you started plumbing it yourself :-)

The easiest fix for you is to set your own 'state' flag yourself in stopHook.

Setting the TaskContext state to stopped early solves this as well, but we'd
need to do the same change for cleanup() (when going from stopped to
unconfigured).

So the new semantics would then be:
* When going to a 'higher state' (indicated by a -Hook function that returns
a bool): the state is changed after the -Hook function returns
* When going to a 'lower state' (indicated by a -Hook function that returns
void): the state is changed before the -Hook function is called

I'd still have to see how the unit tests respond to this though...

>
> I expect the command() call to return true when the command as been
> processed (by the updateHook or the stopHook) or to return false if the
> command was not processed because the task is stopped. I don't use the
> orocos command interface so don't have a completion condition.

The mechanism you're setting up is quite familiar to what commands (and the
CommandProcessor) do. The reason they don't work that way is that we would
have needed at least a semaphore for each command allocated. This seemed
somewhat resource heavy so it never came to an implementation.
A similar concept has been proposed for the Boost C++ library under the
name 'futures'. You send a thread out to do something and then wait and
collect the result.

Peter

Trigger return value

Le lundi 22 décembre 2008 à 16:55 +0100, Peter Soetens a écrit :
> On Monday 22 December 2008 13:57:50 Bruno Coudoin wrote:
> >
> > command: (from other thread)
> > return if not active
> > mutex lock
> > add command
> > trigger
> > if active
> > wait
> >

This is somewhat less important but I don't like the trigger() api. If
you look at my pseudo code, I would have prefered it to be:

if trigger
wait

The problem is that trigger() can return false in 2 very different
cases:
- the task is stopped and won't process your command
- the task as already been triggered but did not started it yet

Won't it be more logical to have triger() return true in the later case.
Does the caller really care if the trig was already on its way.

Trigger return value

> Le lundi 22 décembre 2008 à 16:55 +0100, Peter Soetens a écrit :
>> On Monday 22 December 2008 13:57:50 Bruno Coudoin wrote:
>> >
>> > command: (from other thread)
>> > return if not active
>> > mutex lock
>> > add command
>> > trigger
>> > if active
>> > wait
>> >
>
> This is somewhat less important but I don't like the trigger() api. If
> you look at my pseudo code, I would have prefered it to be:
>
> if trigger
> wait
>
> The problem is that trigger() can return false in 2 very different
> cases:
> - the task is stopped and won't process your command
> - the task as already been triggered but did not started it yet
>
> Won't it be more logical to have triger() return true in the later case.
> Does the caller really care if the trig was already on its way.

You're right. As earlier discussed on this list, trigger() shall guarantee
that the task is executed at least once after it returns, and shall return
true in that case. It shall only return false if the task won't run
(because it isn't started for example).

Is this in NonPeriodicActivity ?

Peter

>
>
> --
> Orocos-Dev mailing list
> Orocos-Dev [..] ...
> http://lists.mech.kuleuven.be/mailman/listinfo/orocos-dev
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>

Trigger return value

Le mardi 23 décembre 2008 à 13:40 +0100, Peter Soetens a écrit :
> > Le lundi 22 décembre 2008 à 16:55 +0100, Peter Soetens a écrit :
> >> On Monday 22 December 2008 13:57:50 Bruno Coudoin wrote:
> >> >
> >> > command: (from other thread)
> >> > return if not active
> >> > mutex lock
> >> > add command
> >> > trigger
> >> > if active
> >> > wait
> >> >
> >
> > This is somewhat less important but I don't like the trigger() api. If
> > you look at my pseudo code, I would have prefered it to be:
> >
> > if trigger
> > wait
> >
> > The problem is that trigger() can return false in 2 very different
> > cases:
> > - the task is stopped and won't process your command
> > - the task as already been triggered but did not started it yet
> >
> > Won't it be more logical to have triger() return true in the later case.
> > Does the caller really care if the trig was already on its way.
>
> You're right. As earlier discussed on this list, trigger() shall guarantee
> that the task is executed at least once after it returns, and shall return
> true in that case. It shall only return false if the task won't run
> (because it isn't started for example).
>
> Is this in NonPeriodicActivity ?

Sorry, I missed this talk on the list. Yes actually it's a
NonPeriodicActivity. Then you have decided to change it. Do you already
know in which release it will end up ?

Bruno.

Synchronization issue in stopHook

Le lundi 22 décembre 2008 à 16:55 +0100, Peter Soetens a écrit :
> On Monday 22 December 2008 13:57:50 Bruno Coudoin wrote:
> In an ideal world, Orocos commands would have supported such blocking
> operations... but it isn't so you started plumbing it yourself :-)

Sorry for being so counter nature to orocos ;) I probably could have
used commands as well.

> The easiest fix for you is to set your own 'state' flag yourself in stopHook.

Sure, this is not cute but that will work.

> Setting the TaskContext state to stopped early solves this as well, but we'd
> need to do the same change for cleanup() (when going from stopped to
> unconfigured).

Yes.

> So the new semantics would then be:
> * When going to a 'higher state' (indicated by a -Hook function that returns
> a bool): the state is changed after the -Hook function returns
> * When going to a 'lower state' (indicated by a -Hook function that returns
> void): the state is changed before the -Hook function is called

Yes, this is also what sounds logical to me.

> I'd still have to see how the unit tests respond to this though...

It should be better than before but sure it needs testing.

> >
> > I expect the command() call to return true when the command as been
> > processed (by the updateHook or the stopHook) or to return false if the
> > command was not processed because the task is stopped. I don't use the
> > orocos command interface so don't have a completion condition.
>
> The mechanism you're setting up is quite familiar to what commands (and the
> CommandProcessor) do. The reason they don't work that way is that we would
> have needed at least a semaphore for each command allocated. This seemed
> somewhat resource heavy so it never came to an implementation.

Yes, I create a semaphore per command. I need to know on a per command
basis that mine as been processed and an updateHook can process several
command at once.

Bruno.