[Bug 766] New: Segfault on quit when stopping running function in program processor

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

Summary: Segfault on quit when stopping running function in
program processor
Product: RTT
Version: rtt-trunk
Platform: Intel 64bit
OS/Version: Mac OS X
Status: NEW
Severity: critical
Priority: P3
Component: Scripting
AssignedTo: orocos-dev [..] ...
ReportedBy: kiwi [dot] net [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

This looks a lot like a bug we fixed a while back... We start our system with
deployer-corba, immediately quit, and voilà! Segfault! :-(

Mac OS X 10.6.3, RTT v1

Backtrace attached.
Stephen

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
Resolution| |FIXED
Status|ASSIGNED |RESOLVED

--- Comment #9 from Peter Soetens <peter [..] ...> 2010-10-12 15:33:20 ---
Bug was fixed in 1.12.0

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #8 from Peter Soetens <peter [..] ...> 2010-06-17 16:39:17 ---
Created an attachment (id=605)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=605)
Attempts to fix this bug

If PP::step() has not run yet, the function is still in the f_queue and must
be removed from there as well.
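To make the corner case concrete, here is a minimal C++ sketch of the idea behind this patch. It is a simplified, hypothetical model and not RTT's actual ProgramProcessor: runFunction() only appends to f_queue, step() later moves queued functions into the active list, so removeFunction() has to search both containers. The Function struct, the funcs member and the queued()/active() helpers are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a script function (not RTT's FunctionGraph).
struct Function {
    bool running = false;
};

// Simplified, hypothetical model of a program processor. runFunction() only
// queues the function; the processor's own thread adopts it on the next step().
class ProgramProcessor {
public:
    bool runFunction(Function* f) {
        assert(f != nullptr);
        f_queue.push_back(f);
        return true;
    }

    // Runs in the owning activity: move every queued function to the active list.
    void step() {
        while (!f_queue.empty()) {
            Function* f = f_queue.front();
            f_queue.pop_front();
            f->running = true;
            funcs.push_back(f);
        }
        // ... execute one step of each function in funcs ...
    }

    // Removes f from the active list AND from f_queue. The second part covers
    // the corner case where step() has not run since runFunction().
    bool removeFunction(Function* f) {
        const bool wasActive = eraseAll(funcs, f);
        const bool wasQueued = eraseAll(f_queue, f);
        if (wasActive || wasQueued) f->running = false;
        return wasActive || wasQueued;
    }

    std::size_t queued() const { return f_queue.size(); }
    std::size_t active() const { return funcs.size(); }

private:
    // Erases every occurrence of f; returns true if at least one was found.
    template <typename Container>
    static bool eraseAll(Container& c, Function* f) {
        auto it = std::remove(c.begin(), c.end(), f);
        const bool found = it != c.end();
        c.erase(it, c.end());
        return found;
    }

    std::deque<Function*> f_queue; // requested, not yet seen by step()
    std::vector<Function*> funcs;  // adopted by step()
};
```

In this model, calling removeFunction() between runFunction() and the next step() succeeds and empties f_queue. A version that only searched funcs would return false and leave a stale pointer in f_queue, which step() would later dereference after the function had been deleted.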

[Bug 766] Segfault on quit when stopping running function in pro

On Jun 17, 2010, at 10:39 , Peter Soetens wrote:

> If the PP::step() function did not run yet, the function is left in the f_queue
> and needs to be removed from there.

No apparent change in behaviour. :-(

Backtrace same as before, and I'm still getting:

5.742 [ Warning][~ExecutionEngine] Stopping Function running in ProgramProcessor !
5.742 [ Warning][~ExecutionEngine] Stopping Function running in ProgramProcessor !

I'm certain I rebuilt both RTT and our app.
Stephen

[Bug 766] Segfault on quit when stopping running function in pro

On Friday 18 June 2010 13:48:42 S Roderick wrote:
> No apparent change in behaviour. :-(
>
> Backtrace same as before, and I'm still getting
>
> 5.742 [ Warning][~ExecutionEngine] Stopping Function running in ProgramProcessor !
> 5.742 [ Warning][~ExecutionEngine] Stopping Function running in ProgramProcessor !
>
> I'm certain I rebuilt both RTT and our app.

It's a .cpp-only change, so rebuilding RTT alone was enough. Since your
report indicates that the bug appears when removing a function from the
f_queue, this looked like the correct fix. Moreover, this fix is needed to
handle that corner case anyway. I wonder what is still missing. Does it
segfault in the same place?

The strange part is that the case I fixed is the one where runFunction() was
called but step() had not yet executed. This can only happen with very slow
periodic activities, since a non-periodic one is triggered immediately. Which
kind of activity is used in your case? How can it be that step() did not remove
the function pointer from the f_queue?
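The timing difference described above can be illustrated with a small hypothetical model (not RTT code). A non-periodic activity is modeled as calling step() synchronously from runFunction(), which simplifies "triggered immediately"; a periodic activity only calls step() on its next tick(). The Processor class, tick() and isQueued() are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model (not RTT code) of when step() runs relative to
// runFunction() for the two kinds of activity discussed above.
struct Function {};

class Processor {
public:
    explicit Processor(bool periodic) : periodic_(periodic) {}

    // Non-periodic: modeled as triggering step() synchronously (a
    // simplification of "triggered immediately"). Periodic: the function
    // waits in f_queue until the next tick().
    void runFunction(Function* f) {
        assert(f != nullptr);
        f_queue.push_back(f);
        if (!periodic_) step();
    }

    // Called by a periodic activity once per period.
    void tick() { step(); }

    bool isQueued(const Function* f) const {
        return std::find(f_queue.begin(), f_queue.end(), f) != f_queue.end();
    }

private:
    void step() {
        funcs.insert(funcs.end(), f_queue.begin(), f_queue.end());
        f_queue.clear();
    }

    bool periodic_;
    std::deque<Function*> f_queue; // waiting for step()
    std::vector<Function*> funcs;  // adopted by step()
};
```

Only the periodic case leaves a window in which a destructor can run while the function is still sitting in f_queue.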

I'll have to come up with a test case to reproduce your case.

Peter

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #7 from Peter Soetens <peter [..] ...> 2010-06-17 16:30:36 ---
(In reply to comment #6)

The problem is that PP::removeFunction() does not remove functions that are
still in the PP's queue (because step() has not executed yet!). removeFunction()
must also remove functions from its f_queue...

Peter

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |1.10.4
Status|NEW |ASSIGNED

--- Comment #6 from Peter Soetens <peter [..] ...> 2010-06-17 16:24:03 ---
(In reply to comment #5)
> Created an attachment (id=600)
> --> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=600)
> Valgrind log from Ubuntu Lynx
>
> Pretty sparse log ...

I want to fix this ASAP. Everything looks right at first sight: removeFunction()
is called before the function is deleted, so it should no longer be in the
PP's function list. We don't check the return value of removeFunction() in
~CommandExecFunction(); we could log an error if it fails, like this:

CommandExecFunction::~CommandExecFunction() {
    if ( _foo->isRunning() ) {
        log(Warning) << "Stopping Function running in ProgramProcessor !" << endlog();
    }
    if ( _foo->getProgramProcessor() != 0 ) { // ie if _foo->isLoaded().
        if ( _proc->removeFunction( _foo.get() ) == false ) {
            log(Error) << "Failed to remove running function from ProgramProcessor !" << endlog();
        }
    }
}

Then see what that reveals. I'll need to set up a test case to fix it...
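A compilable sketch of the destructor proposed above, built on hypothetical stand-in types: std::cerr replaces RTT's log() and endlog(), and the failRemoval flag and g_removalFailures counter exist only so the failing path can be exercised. It illustrates the return-value check only; it is not RTT's implementation.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <utility>

// Hypothetical stand-ins; RTT's real classes have different interfaces.
struct ProgramProcessor;

struct FunctionGraph {
    ProgramProcessor* processor = nullptr; // non-null while loaded
    bool running = false;
    bool isRunning() const { return running; }
    ProgramProcessor* getProgramProcessor() const { return processor; }
};

struct ProgramProcessor {
    bool failRemoval = false; // sketch-only switch to simulate a failed removal
    bool removeFunction(FunctionGraph* f) {
        if (failRemoval) return false;
        f->processor = nullptr;
        f->running = false;
        return true;
    }
};

int g_removalFailures = 0; // sketch-only counter of logged errors

// Follows the proposed destructor: warn if the function is still running,
// then check removeFunction()'s result instead of ignoring it.
class CommandExecFunction {
public:
    CommandExecFunction(std::shared_ptr<FunctionGraph> foo, ProgramProcessor* proc)
        : _foo(std::move(foo)), _proc(proc) {
        assert(_foo && _proc);
    }
    CommandExecFunction(const CommandExecFunction&) = delete;
    CommandExecFunction& operator=(const CommandExecFunction&) = delete;

    ~CommandExecFunction() {
        if (_foo->isRunning())
            std::cerr << "[Warning] Stopping Function running in ProgramProcessor !\n";
        if (_foo->getProgramProcessor() != nullptr) { // i.e. _foo is loaded
            if (!_proc->removeFunction(_foo.get())) {
                std::cerr << "[Error] Failed to remove running function from ProgramProcessor !\n";
                ++g_removalFailures;
            }
        }
    }

private:
    std::shared_ptr<FunctionGraph> _foo;
    ProgramProcessor* _proc;
};
```

With the check in place, a removal that silently failed before now leaves an error line in the log right before the crash, which narrows down where the stale pointer comes from.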

Peter

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #5 from S Roderick <kiwi [dot] net [..] ...> 2010-05-21 15:26:57 ---
Created an attachment (id=600)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=600)
Valgrind log from Ubuntu Lynx

Pretty sparse log ...

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |peter [..] ...

--- Comment #4 from Peter Soetens <peter [..] ...> 2010-05-19 10:22:45 ---
(In reply to comment #3)
> Created an attachment (id=598)
> --> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=598)
> Brief debugging attempt
>
> Looks like 'foo' was valid, but I would guess that it has been destroyed
> somewhere else already?
>
> This component is running a state machine that uses functions from a program
> script. I have actually emptied the state machine's idle state - it is no
> longer calling a function. It still segfaults though.
>
> Beats me ...

Any chance of reproducing this in a Valgrind session?

It's an ownership problem between the FunctionGraph object and the program
script. The program was probably deleted first, which also deleted the function
graph without removing it from the PP?
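If the cause were the ownership issue suspected here, one conventional remedy is for the owner to unregister its function graphs from the processor in its own destructor, before the graphs are freed. The sketch below shows that RAII pattern with hypothetical ProgramScript and ProgramProcessor types; it is not RTT's API, and it assumes the processor outlives the program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified types (not RTT's API).
struct FunctionGraph {};

class ProgramProcessor {
public:
    void loadFunction(FunctionGraph* f) {
        assert(f != nullptr);
        funcs.push_back(f);
    }
    bool removeFunction(FunctionGraph* f) {
        auto it = std::find(funcs.begin(), funcs.end(), f);
        if (it == funcs.end()) return false;
        funcs.erase(it);
        return true;
    }
    std::size_t size() const { return funcs.size(); }

private:
    std::vector<FunctionGraph*> funcs; // non-owning
};

// Owns its function graphs. The destructor unregisters each graph from the
// processor first, so the processor never keeps a pointer to a freed graph.
class ProgramScript {
public:
    explicit ProgramScript(ProgramProcessor& pp) : pp_(pp) {}
    ProgramScript(const ProgramScript&) = delete;
    ProgramScript& operator=(const ProgramScript&) = delete;

    ~ProgramScript() {
        for (const auto& f : functions_) pp_.removeFunction(f.get());
        // functions_ (and the graphs) are destroyed after this body returns.
    }

    FunctionGraph* addFunction() {
        functions_.push_back(std::make_unique<FunctionGraph>());
        FunctionGraph* f = functions_.back().get();
        pp_.loadFunction(f);
        return f;
    }

private:
    ProgramProcessor& pp_;
    std::vector<std::unique_ptr<FunctionGraph>> functions_;
};
```

Because the destructor body runs before the unique_ptr members are destroyed, every graph is unregistered while it is still alive.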

Peter

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #3 from S Roderick <kiwi [dot] net [..] ...> 2010-05-18 15:10:33 ---
Created an attachment (id=598)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=598)
Brief debugging attempt

Looks like 'foo' was valid, but I would guess that it has been destroyed
somewhere else already?

This component is running a state machine that uses functions from a program
script. I have actually emptied the state machine's idle state - it is no
longer calling a function. It still segfaults though.

Beats me ...

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #2 from S Roderick <kiwi [dot] net [..] ...> 2010-05-18 14:48:06 ---
Created an attachment (id=597)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=597)
Log entries just prior to segfault

[Bug 766] Segfault on quit when stopping running function in pro

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=766

--- Comment #1 from S Roderick <kiwi [dot] net [..] ...> 2010-05-18 14:36:35 ---
Created an attachment (id=596)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=596)
Backtrace