[Bug 704] New: Error unknown scheduler type produced intermittently

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=704

Summary: Error unknown scheduler type produced intermittently
Product: RTT
Version: rtt-trunk
Platform: i386 Compatible
OS/Version: Mac OS X
Status: NEW
Severity: minor
Priority: P3
Component: Real-Time Toolkit (RTT)
AssignedTo: orocos-dev [..] ...
ReportedBy: kiwi [dot] net [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

Created an attachment (id=499)
--> (https://www.fmtc.be/bugzilla/orocos/attachment.cgi?id=499)
gdb output of RTT log and backtrace

We've noticed this for many weeks now, and finally got to looking at it.
Intermittently, we get a logged error message
{{{
2.203 [ ERROR ][PeriodicThread::start] Unknown scheduler type: -1
}}}
Note that printing the scheduler value is our own modification to the code. The
error only occurs on Mac OS X; we have never seen it on any of our other systems.
It does not stop the system running and we have seen no untoward effects from it,
but we would like to understand what is causing it.

Although it occurs frequently, it is not limited to any one deployment file and
is not reliably repeatable in subsequent runs. Because of this, I think it might
be a race condition ...

With some additional log statements in RTT, we get the attached dump from gdb.
It appears that ''msched_type'' in PeriodicThread is getting corrupted, but we
can't figure out how. NB we added an assert() into rtos_task_check_scheduler()
to trigger the gdb backtrace.
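
For anyone wanting to set a similar trap, here is a minimal sketch of the assert() idea. It is not the RTT source: check_scheduler_sketch() is a hypothetical stand-in for the real check, TRAP_BAD_SCHEDULER is a made-up build flag, and the set of accepted policies (the POSIX SCHED_OTHER, SCHED_FIFO and SCHED_RR) is an assumption.

```cpp
#include <cassert>
#include <sched.h>   // SCHED_OTHER, SCHED_FIFO, SCHED_RR (POSIX)

// Hypothetical stand-in for a scheduler check; NOT the RTT implementation.
// Returns 0 when *sched_type is a known POSIX policy, -1 otherwise.
inline int check_scheduler_sketch(const int* sched_type)
{
    const bool known = (*sched_type == SCHED_OTHER ||
                        *sched_type == SCHED_FIFO  ||
                        *sched_type == SCHED_RR);
#ifdef TRAP_BAD_SCHEDULER   // made-up flag, enable only while debugging
    // Abort on the first bad value so that gdb stops right here and
    // `bt` shows which caller passed it in.
    assert(known && "unknown scheduler type");
#endif
    return known ? 0 : -1;
}
```

Built with the flag defined and run under gdb, the abort stops the process at the first bad value, and the backtrace then shows the caller that supplied it.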

Stephen

[Bug 704] Error unknown scheduler type produced intermittently

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=704

Peter Soetens <peter [..] ...> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |peter [..] ...

--- Comment #1 from Peter Soetens <peter [..] ...> 2009-10-02 16:22:14 ---
(In reply to comment #0)

The -1 is most probably caused by the thread not having been created *yet* at
the moment the get-scheduler function asks which scheduler it was created with.
Normally, the constructor of PeriodicThread waits until the thread function
signals that it is running, so I can't directly match this to what you see.
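
To make the suspected race concrete, here is a rough sketch of the start-up handshake described above, written with standard C++11 threads. It is an illustration under assumptions, not RTT's PeriodicThread: the class ThreadSketch, the enum values and the wait_for_start switch are all hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration only; not RTT's PeriodicThread.
class ThreadSketch {
public:
    enum { kUnknown = -1, kSchedOther = 0 };  // illustrative values

    // wait_for_start == true mimics a constructor that blocks until the
    // thread function has signalled that it is running.
    explicit ThreadSketch(bool wait_for_start)
        : sched_type_(kUnknown), worker_(&ThreadSketch::run, this)
    {
        if (wait_for_start) {
            std::unique_lock<std::mutex> lock(mutex_);
            started_cv_.wait(lock, [this] { return sched_type_ != kUnknown; });
        }
    }

    ~ThreadSketch() { worker_.join(); }

    // Returns kUnknown (-1) if the thread function has not run yet.
    int scheduler()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return sched_type_;
    }

private:
    // Thread function: records its scheduler, then signals the constructor.
    void run()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            sched_type_ = kSchedOther;
        }
        started_cv_.notify_one();
    }

    std::mutex mutex_;
    std::condition_variable started_cv_;
    int sched_type_;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Usage: with the handshake enabled the scheduler is always known afterwards.
inline void usage_example()
{
    ThreadSketch t(true);
    assert(t.scheduler() == ThreadSketch::kSchedOther);
}
```

With wait_for_start set to false, a call to scheduler() right after construction may return either value depending on timing, which is the kind of intermittent -1 being discussed.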

TimerThread is using the scheduler value to see in which thread to put a
PeriodicActivity.

In gnulinux I saw a similar problem in case the constructor did not wait for
the thread to be created. I couldn't immediately pin down the problem in your
case.

Peter

[Bug 704] Error unknown scheduler type produced intermittently

On Oct 2, 2009, at 10:22 , Peter Soetens wrote:

> The -1 is most probably caused by the thread not being created *yet*, while the
> get scheduler function already asks for which scheduler it is created.
> Normally, the constructor of PeriodicThread waits until the thread function
> signals it is running, so I can't match this directly.
>
> TimerThread is using the scheduler value to see in which thread to put a
> PeriodicActivity.
>
> In gnulinux I saw a similar problem in case the constructor did not wait for
> the thread to be created. I couldn't immediately pin down the problem in your
> case.

I sometimes see it on deployer exit too. Would that fit with your
concept of what is happening?
Stephen