[Bug 721] New: Destroying running function seg-faults ProgramProcessor later

Submitted by snrkiwi on Mon, 2009-11-02 10:28

RTT-dev

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

Summary: Destroying running function seg-faults ProgramProcessor later
Product: RTT
Version: rtt-trunk
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P3
Component: Scripting
AssignedTo: orocos-dev [..] ...
ReportedBy: kiwi [dot] net [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

We might be doing something bad here ... :-)

We are loading state machines (SM) into a component, and the SMs use the
OCL::Timer component to wait periods of time (several seconds or longer)
between actions. We wrap the call to Timer.wait() inside a program script
function that compensates for the periods consumed in making the wait
actually happen (i.e. if you say "wait(10.0)" and the period of the SM is 0.1,
then you actually wait 10.1 by the time the calling program script resumes).
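
The compensation arithmetic described above can be sketched as follows. This is only an illustration in Python, not the reporter's script function: the helper names and the `lost_periods` parameter are hypothetical, and the default of one lost period is an assumption read off the 10.0 -> 10.1 example.

```python
def compensated_wait(requested_s: float, period_s: float, lost_periods: int = 1) -> float:
    """Return the duration to hand to the timer so the caller resumes on time.

    Hypothetical helper: `lost_periods` is an assumption taken from the
    10.0 s -> 10.1 s example (one 0.1 s period of overhead).
    """
    if requested_s < 0 or period_s < 0 or lost_periods < 0:
        raise ValueError("durations and period counts must be non-negative")
    # Subtract the overhead, but never ask the timer for a negative wait.
    return max(0.0, requested_s - lost_periods * period_s)


def resume_delay(timer_wait_s: float, period_s: float, lost_periods: int = 1) -> float:
    """Delay the calling script observes for a given timer wait."""
    return timer_wait_s + lost_periods * period_s
```

Under these assumptions, passing the raw 10.0 to the timer resumes the caller after about 10.1, while passing `compensated_wait(10.0, 0.1)` resumes it after about 10.0.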

The problem comes when we stop an SM during a wait, and then unload it. We get:

[CRITICAL][ParserScriptingAccess::loadStateMachine] Destroying Function running in ProgramProcessor !

Everything keeps running until the deployer seg-faults when we try to run a
newly loaded SM. We are forcibly killing the wait timer when the state exits,
but that doesn't seem to help.

While I think we are doing something bad here, I don't think that the program
processor should seg-fault later on. We started out with the wait wrapper as a
C++ function, but it is too much code just to wrap a command call to another
peer (and it wasn't working right). We need the ability to stop a running
script that may be waiting, and then unload and reload a new one.

Demonstrated on Mac OS X SL and Ubuntu Hardy.

Backtrace from SL attached.

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by peter on Thu, 2009-11-05 13:48.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #7 from Peter Soetens <peter [..] ...> 2009-11-05 14:47:20 ---
I fixed this bug on the git/svn trunks, but could not back-port it to the
rtt-1.10 line. If you want this fix, use trunk/rtt (svn) or
rtt-1.0-svn-patches on git.

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by snrkiwi on Thu, 2009-11-05 13:52.

On Nov 5, 2009, at 08:47, Peter Soetens wrote:

> https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721
>
> Peter Soetens <peter [..] ...> changed:
>
> What       |Removed   |Added
> ----------------------------------------------------------------------------
> Resolution |          |FIXED
> Status     |ASSIGNED  |RESOLVED
>
> --- Comment #7 from Peter Soetens <peter [..] ...> 2009-11-05 14:47:20 ---
> I fixed this bug on git/svn trunks, but could not back-port it to the
> rtt-1.10 line. If you want this fix, use trunk/rtt (svn) or
> rtt-1.0-svn-patches on git

We have been using only git-trunk v1.x for some time now. How else could
we keep up with all your new goodies ... !? :-)

Thanks!
S

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by peter on Mon, 2009-11-02 13:32.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #6 from Peter Soetens <peter [..] ...> 2009-11-02 13:57:37 ---
Created an attachment (id=542)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=542)
Drops useless triggering and clears the ProgramProcessor (PP) from the function

The previous patch was wrong. The trigger is useless, since our function is
already cleared. What needs to be done is to remove the ProgramProcessor from
the function, such that the function's owner can see it's no longer loaded.
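
The idea in this comment, clearing the processor back-reference when a function is removed so that its owner can tell it is no longer loaded, can be sketched as below. This is a toy Python model, not the RTT code or the attached patch; every class, method and attribute name here is hypothetical.

```python
class ScriptFunction:
    """Toy function object holding a back-reference to its processor."""

    def __init__(self, name):
        self.name = name
        self.processor = None  # set only while the function is loaded

    def is_loaded(self):
        return self.processor is not None


class BackRefProcessor:
    """Toy processor; stands in for a program processor, not the RTT class."""

    def __init__(self):
        self.functions = []

    def run_function(self, func):
        func.processor = self
        self.functions.append(func)

    def remove_function(self, func):
        if func in self.functions:
            self.functions.remove(func)
        # Clear the back-reference so the owner sees the function is unloaded.
        func.processor = None


def safe_to_destroy(func):
    """Owner-side check: destruction is only safe once nothing runs the function."""
    return not func.is_loaded()
```

In this model the owner calls `remove_function()` first and destroys the function only when `safe_to_destroy()` returns True, which is the condition the "Destroying Function running in ProgramProcessor" message guards against.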

Replaces previous patch.

Peter

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by peter on Mon, 2009-11-02 11:27.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #5 from Peter Soetens <peter [..] ...> 2009-11-02 12:20:28 ---
Created an attachment (id=541)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=541)
In addition to previous patch, fixes segfault during cleanup

During cleanup, the activity is already removed, so removeFunction should only
trigger if an activity is present.
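
The guard described here, triggering only when an activity is still present, might look like the following in a toy Python model. The names are hypothetical and this is not the attached RTT 1.10 patch, only an illustration of the null check.

```python
class CountingActivity:
    """Hypothetical activity that records how often it was triggered."""

    def __init__(self):
        self.triggers = 0

    def trigger(self):
        self.triggers += 1


class GuardedProcessor:
    """Toy processor whose activity may already be gone during cleanup."""

    def __init__(self, activity=None):
        self.activity = activity
        self.functions = []

    def remove_function(self, func):
        if func not in self.functions:
            return False
        self.functions.remove(func)
        # During cleanup the activity may already be removed: trigger only if present.
        if self.activity is not None:
            self.activity.trigger()
        return True
```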

Patch for RTT 1.10 in addition to previous patch.

Peter

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by snrkiwi on Mon, 2009-11-02 10:28.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #4 from S Roderick <kiwi [dot] net [..] ...> 2009-10-30 19:06:29 ---
Created an attachment (id=539)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=539)
Backtrace of deployer exit crash after patch applied.

Our original segfault is gone, but we now get this invalid memory access on
deployer exit in certain situations.

We start the system, load a state machine, run it and then while a Timer.wait()
command is active we forcibly stop the state machine. If we quit then, we are
fine. If we load a new SM and then quit, we are fine. If we load a new SM, run
it, let it complete and then quit, we are fine. But if we load a new SM, run
it, forcibly stop it and then quit, we get this new fault.

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by peter on Mon, 2009-11-02 10:28.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #3 from Peter Soetens <peter [..] ...> 2009-10-29 21:59:02 ---
Created an attachment (id=535)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=535)
Fixes this bug

It took way longer than I expected to fix this (and my primary hard drive
crashed *again* on me this morning). Since it's a synchronisation problem, I
had to add a mutex during function execution to resolve it. It should now work
in any case with any number of threads.
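
As a rough illustration of why a mutex around function execution helps, the Python sketch below holds one lock both while functions execute and while one is removed, so removal cannot happen in the middle of a step. It is an assumption-laden toy model with hypothetical names, not the patch attached to this bug.

```python
import threading


class LockingProcessor:
    """Toy stand-in for a program processor; not the RTT class."""

    def __init__(self):
        self._lock = threading.Lock()  # guards _functions and each step
        self._functions = []

    def add_function(self, func):
        with self._lock:
            self._functions.append(func)

    def remove_function(self, func):
        # Blocks until any in-progress step() has finished, so the caller
        # can safely destroy `func` once this returns.
        with self._lock:
            if func in self._functions:
                self._functions.remove(func)
                return True
            return False

    def step(self):
        # Runs one execution step of every loaded function under the lock
        # and returns how many functions are still loaded.
        with self._lock:
            for func in list(self._functions):
                func()
            return len(self._functions)
```

Because a plain `threading.Lock` is not reentrant, a function in this sketch must not call `remove_function()` on its own processor from inside `step()`; that would deadlock.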

We may need a more generic way of resolving this case in 2.0.

Peter

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by peter on Mon, 2009-11-02 10:28.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #2 from Peter Soetens <peter [..] ...> 2009-10-28 14:37:26 ---
(In reply to comment #0)
> We might be doing something bad here ... :-)
>
> We are loading state machines (SM) into a component, and the SMs use the
> OCL::Timer component to wait periods of time (several seconds or longer) between
> actions. We wrap the call to Timer.wait() inside a program script function that
> compensates for the periods consumed in making the wait actually happen (i.e. if
> you say "wait(10.0)" and the period of the SM is 0.1, then you actually wait
> 10.1 by the time the calling program script resumes).
>
> The problem comes when we stop an SM during a wait, and then unload it. We get
>
> [CRITICAL][ParserScriptingAccess::loadStateMachine] Destroying Function running
> in ProgramProcessor !

Coincidentally we ran into this today in Leuven as well, with a function that
did not return in time. The error is reported by the destructor of the command
being sent in the function, which detects that trouble will follow.

I'll fix this cleanup code. I'm not sure yet if it will be officially part of
1.10 though.

Peter

[Bug 721] Destroying running function seg-faults ProgramProcessor later

Submitted by snrkiwi on Mon, 2009-11-02 10:28.

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=721

--- Comment #1 from S Roderick <kiwi [dot] net [..] ...> 2009-10-28 13:25:24 ---
Created an attachment (id=532)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=532)
Backtrace from SL

Ignore the warning about the source file being newer. That is just me checking
out different git branches ...