Nonperiodicactivities and lengthy calculations

Hi,

I have implemented some calculations in different components,
which communicate with each other through bufferports and which are executed
through nonperiodicactivities.

The calculations are implemented in the updateHook of the components (and
reactivated by triggering)

For some reason, my application behaves unexpectedly/crashes.
If I activate the reporting, I can see that the (reported) sampling time is
sometimes as high as 0.8 s (instead of 0.002 s)

Is it necessary that the calculations in updateHook of the nonperiodic
components take less time than the sampling time? (If so, why?...)
(I am quite sure that the updateHooks of the periodic components do not exceed
the sampling time.)

How come that the calculations in nonperiodic components (with lower priority)
can affect the performance of the periodic components (with highest
priority)?

The only other reason for the unexpected behavior I can think of, is that the
size of my bufferports is too large and that the sampling time cannot be met
due to the overhead of copying the large bufferports.

Any ideas? Also, how do I "debug" this properly?

Thanks!
Best regards,

Diederik

Nonperiodicactivities and lengthy calculations

On Tue, Aug 5, 2008 at 3:19 PM, Diederik Verscheure
<diederik [dot] verscheure [..] ...> wrote:
> Hi,
>
> I have implemented some calculations in different components,
> which communicate with each other through bufferports and which are executed
> through nonperiodicactivities.
>
> The calculations are implemented in the updateHook of the components (and
> reactivated by triggering)
>
> For some reason, my application behaves unexpectedly/crashes.
> If I activate the reporting, I can see that the (reported) sampling time is
> sometimes as high as 0.8 s (instead of 0.002 s)

Do you mean that you have a reporter component connected to a
periodicactivity which is supposed to run every 2ms, but sometimes
runs only every 0.8s?

> Is it necessary that the calculations in updateHook of the nonperiodic
> components take less time than the sampling time? (If so, why?...)
> (I am quite sure that the updatehook of the periodic components do not exceed
> the sampling time)

I think the answer depends on the priorities given to those
activities. Let's say that you have a nonperiodic activity that takes
0.8s to complete and a periodic activity which should run every 2ms.
If your nonperiodicactivity has a higher priority than your periodic
activity, obviously you will run into problems. If not, you shouldn't
have any problems (unless there's a bug somewhere)

> How come that the calculations in nonperiodic components (with lower priority)
> can affect the performance of the periodic components (with highest
> priority)?

hmm, I should be reading further before I start answering :-)

> The only other reason for the unexpected behavior I can think of, is that the
> size of my bufferports is too large and that the sampling time cannot be met
> due to the overhead of copying the large bufferports.
>
> Any ideas? Also, how do I "debug" this properly.

Peter would say: Use valgrind on your executable, I would say: what's
the backtrace you get at the moment of the crash?

If that doesn't help, try creating an application which is *as small
as possible* (2 components, 1 Periodic High Prio, 1 NonPeriodic Low
Prio doing lengthy calculations) and see if you can reproduce the
scenario (e.g. first without ports, then adding bufferports
afterwards)
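
A minimal sketch of the "periodic" half of such a test, assuming plain
standard C++ rather than the RTT API: the hypothetical helper
measure_max_period() wakes up on absolute deadlines and records the largest
gap between two wake-ups, which is roughly what the reporter shows as the
sampling time. The function name and its parameters are illustrative only.

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Hypothetical helper (not part of Orocos): run a loop that should wake up
// every `period` and return the largest observed gap between two wake-ups,
// in seconds. Returns 0.0 if `iterations` is not positive.
double measure_max_period(std::chrono::microseconds period, int iterations)
{
    typedef std::chrono::steady_clock clock;
    clock::time_point last = clock::now();
    clock::time_point next = last;
    double worst = 0.0;
    for (int i = 0; i < iterations; ++i) {
        next += period;                          // absolute deadline, no drift
        std::this_thread::sleep_until(next);
        const clock::time_point now = clock::now();
        const double gap = std::chrono::duration<double>(now - last).count();
        worst = std::max(worst, gap);
        last = now;
    }
    return worst;
}
```

Running this in one thread while a second, lower-priority thread does the
lengthy calculation should show whether the gap grows with the length of the
calculation.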

HTH,

Klaas

Nonperiodicactivities and lengthy calculations

Hi,

On Tuesday 05 August 2008 17:18:45 Klaas Gadeyne wrote:
> On Tue, Aug 5, 2008 at 3:19 PM, Diederik Verscheure
>
> <diederik [dot] verscheure [..] ...> wrote:
> > Hi,
> >
> > I have implemented some calculations in different components,
> > which communicate with each other through bufferports and which are
> > executed through nonperiodicactivities.
> >
> > The calculations are implemented in the updateHook of the components (and
> > reactivated by triggering)
> >
> > For some reason, my application behaves unexpectedly/crashes.
> > If I activate the reporting, I can see that the (reported) sampling time
> > is sometimes as high as 0.8 s (instead of 0.002 s)
>
> Do you mean that you have a reporter component connected to a
> periodicactivity which is supposed to run every 2ms, but sometimes
> runs only every 0.8s.

Yes :).

>
> > Is it necessary that the calculations in updateHook of the nonperiodic
> > components take less time than the sampling time? (If so, why?...)
> > (I am quite sure that the updatehook of the periodic components do not
> > exceed the sampling time)
>
> I think the answer depends on the priorities given to those
> activities. Let's say that you have a nonperiodic activity that takes
> 0.8s to complete and a periodic activity which should run every 2ms.
> If your nonperiodicactivity has a higher priority than your periodic
> activity, obviously you will run into problems. If not, you shouldn't
> have any problems (unless there's a bug somewhere)
>

The priority of the periodic activities is higher than that of the nonperiodic
activities (and I tried lowering them even more... to no avail).

> > How come that the calculations in nonperiodic components (with lower
> > priority) can affect the performance of the periodic components (with
> > highest priority)?
>
> hmm, I should be reading further before I start answering :-)
>
> > The only other reason for the unexpected behavior I can think of, is that
> > the size of my bufferports is too large and that the sampling time cannot
> > be met due to the overhead of copying the large bufferports.
> >
> > Any ideas? Also, how do I "debug" this properly.
>
> Peter would say: Use valgrind on your executable, I would say: what's
> the backtrace you get at the moment of the crash?
>

Well, it's usually the OS that crashes and not the application itself.

> If that doesn't help, try creating an application which is *as small
> as possible* (2 components, 1 Periodic High Prio, 1 NonPeriodic Low
> Prio doing lengthy calculations) and see if you can reproduce the
> scenario (e.g. first without ports, then adding bufferports
> afterwards)
>
> HTH,

Thanks! I will try that.
Best regards,
Diederik

>
> Klaas

Nonperiodicactivities and lengthy calculations

On Wed, Aug 6, 2008 at 9:45 AM, Diederik Verscheure
<diederik [dot] verscheure [..] ...> wrote:
[...]
>> > Any ideas? Also, how do I "debug" this properly.
>>
>> Peter would say: Use valgrind on your executable, I would say: what's
>> the backtrace you get at the moment of the crash?
>>
>
> Well, it's usually OS that crashes and not the application itself

[some further questions which might aid debugging]

Which (RT)OS are you using? Can you reproduce the crashes on the
gnulinux port (do you run them as root or as a normal user)? Could it
be a case of "starvation" (i.e. even your lowest priority orocos
thread has a higher priority than your "standard" linux applications)?

Klaas

Nonperiodicactivities and lengthy calculations

Hi,

On Wednesday 06 August 2008 09:55:56 Klaas Gadeyne wrote:
> On Wed, Aug 6, 2008 at 9:45 AM, Diederik Verscheure
> <diederik [dot] verscheure [..] ...> wrote:
> [...]
>
> >> > Any ideas? Also, how do I "debug" this properly.
> >>
> >> Peter would say: Use valgrind on your executable, I would say: what's
> >> the backtrace you get at the moment of the crash?
> >
> > Well, it's usually OS that crashes and not the application itself
>
> [some further questions which might aid debugging]
>
> Which (RT)OS are you using? Can you reproduce the crashes on the
> gnulinux port (do you run them as root or as a normal user)? Could it

RTAI. I can also crash gnulinux when running the app as root.

> be a case of "starvation" (i.e. even your lowest priority orocos
> thread has a higher priority than your "standard" linux applications)?
>

I don't know (also I don't understand how this can happen).
The priority of my periodic tasks is 0, that of the reporting is 2 and of the
nonperiodic tasks 5. I tried changing them, but to no avail.

I tried your suggestion with very simple nonperiodic components.
The first one reads something from a file and pushes the data on a bufferport.
The updatehook method from this component reads all data and then returns.
The second one reads the data from the bufferport and pushes the data onto
another bufferport. The third one reads the data from the other bufferport.
The bufferports are all larger than or equal to the size of the file and the
maximum reported delay (by the reporting) is directly related to the size of
the bufferport/file.

So either my bufferports are too large, and the sampling times are missed
because of the overhead of copying them, or there is some other, non-obvious
cause?
Anyway, I am patiently waiting for Peter to come back :).

Best regards,
Diederik

> Klaas

Nonperiodicactivities and lengthy calculations

On Thu, Aug 7, 2008 at 11:42 AM, Diederik Verscheure
<diederik [dot] verscheure [..] ...> wrote:
[...]
>> be a case of "starvation" (i.e. even your lowest priority orocos
>> thread has a higher priority than your "standard" linux applications)?
>>
>
> I don't know (also I don't understand how this can happen).
> The priority of my periodic tasks is 0, that of the reporting is 2 and of the
> nonperiodic tasks 5. I tried changing them, but to no avail.

Do you use adapted priorities, or the provided constants?

[kgad@ampere ~/SVN/orocos/rtt-macosx/src/os/gnulinux]$
cat gnuthreads.cpp | grep Priority
const int LowestPriority = 1;
const int HighestPriority = 99;
const int IncreasePriority = 1;
[kgad@ampere ~/SVN/orocos/rtt-macosx/src/os/gnulinux]$
cat ../lxrt/lxrtthreads.cpp | grep Priority
const int LowestPriority = 255;
const int HighestPriority = 0;
const int IncreasePriority = -1;
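
The two listings above use opposite directions: on gnulinux a larger number
is more urgent, on lxrt a smaller one is. A minimal sketch of what that means
for raw numbers such as 0, 2 and 5, using only the constants shown above; the
struct and the helper functions are hypothetical and not part of RTT:

```cpp
// Values copied from the two listings above.
struct PriorityScale {
    int lowest;
    int highest;
    int increase;   // +1 if larger numbers are more urgent, -1 otherwise
};

const PriorityScale gnulinux_scale = {1, 99, 1};
const PriorityScale lxrt_scale     = {255, 0, -1};

// True if priority `a` is more urgent than priority `b` on the given scale.
bool more_urgent(const PriorityScale& s, int a, int b)
{
    return (a - b) * s.increase > 0;
}

// Priority that sits `steps` levels above the lowest one (steps >= 0),
// clamped to the highest priority of the scale.
int above_lowest(const PriorityScale& s, int steps)
{
    int p = s.lowest + steps * s.increase;
    if (more_urgent(s, p, s.highest))
        p = s.highest;
    return p;
}
```

With these definitions, more_urgent(lxrt_scale, 0, 5) is true while
more_urgent(gnulinux_scale, 0, 5) is false, so the same hard-coded numbers
order the threads in opposite ways on the two targets. Expressing a priority
relative to the lowest or highest constant avoids that.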

> So either my problem is that my bufferports are too large and the reason for
> not making the sampling times is overhead in copying the bufferports?

As long as your priorities are right, it shouldn't be!

Klaas

Nonperiodicactivities and lengthy calculations

On Friday 08 August 2008 09:14:14 Klaas Gadeyne wrote:
> On Thu, Aug 7, 2008 at 11:42 AM, Diederik Verscheure
> <diederik [dot] verscheure [..] ...> wrote:
> > I don't know (also I don't understand how this can happen).
> > The priority of my periodic tasks is 0, that of the reporting is 2 and of
> > the nonperiodic tasks 5. I tried changing them, but to no avail.
>
> Do you use adapted priorities, or the provided constants?
>

I use the provided constants. Thanks for pointing out the evil in this :)

> [kgad@ampere ~/SVN/orocos/rtt-macosx/src/os/gnulinux]$
> cat gnuthreads.cpp | grep Priority
> const int LowestPriority = 1;
> const int HighestPriority = 99;
> const int IncreasePriority = 1;
> [kgad@ampere ~/SVN/orocos/rtt-macosx/src/os/gnulinux]$
> cat ../lxrt/lxrtthreads.cpp | grep Priority
> const int LowestPriority = 255;
> const int HighestPriority = 0;
> const int IncreasePriority = -1;
>
> > So either my problem is that my bufferports are too large and the reason
> > for not making the sampling times is overhead in copying the bufferports?
>
> As long as your priorities are right, it shouldn't be!
>
> Klaas

Best regards,
Diederik


Nonperiodicactivities and lengthy calculations

On Thu, 7 Aug 2008, Diederik Verscheure wrote:

[..]
> I tried your suggestion with very simple nonperiodic components.
> The first one reads something from a file and pushes the data on a bufferport.
> The updatehook method from this component reads all data and then returns.
> The second one reads the data from the bufferport and pushes the data onto
> another bufferport. The third one reads the data from the other bufferport.
> The bufferports are all larger or equal to the size of the file and the
> maximum reported delay (by the reporting) is directly related to the size of
> the bufferport/file.
>
> So either my problem is that my bufferports are too large and the reason for
> not making the sampling times is overhead in copying the bufferports?
> Or there is some other unobvious cause?

Obviously not... Maybe you use too small a stack space for the threads you
are using...? Or you are using "new" and "delete" behind the scenes
somewhere, without realising it? ... None of these possible problems is
"obvious"!

Herman

Nonperiodicactivities and lengthy calculations

On Thu, Aug 7, 2008 at 3:05 PM, Herman Bruyninckx
<Herman [dot] Bruyninckx [..] ...> wrote:
> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>
> [..]
>>
>> I tried your suggestion with very simple nonperiodic components.
>> The first one reads something from a file and pushes the data on a
>> bufferport.
>> The updatehook method from this component reads all data and then returns.
>> The second one reads the data from the bufferport and pushes the data onto
>> another bufferport. The third one reads the data from the other
>> bufferport.
>> The bufferports are all larger or equal to the size of the file and the
>> maximum reported delay (by the reporting) is directly related to the size
>> of
>> the bufferport/file.
>>
>> So either my problem is that my bufferports are too large and the reason
>> for
>> not making the sampling times is overhead in copying the bufferports?
>> Or there is some other unobvious cause?
>
> Obviously not... Maybe you use too small a stack space for the threads you
> are using...?

That would not (at least not in the gnulinux case) lead to a system
crash, only a program crash.

> Or you are using "new" and "delete" behind the scenes
> somewhere, without realising it?

Ditto here. Using new and delete would break real-time performance,
but that's about it. What's more, using new and delete in RT threads
would make him switch to secondary mode in RTAI/LXRT, and the
starvation symptoms he describes indicate exactly the opposite.

Klaas

Nonperiodicactivities and lengthy calculations

On Thursday 07 August 2008 15:05:51 Herman Bruyninckx wrote:
> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>
> [..]
>
> > I tried your suggestion with very simple nonperiodic components.
> > The first one reads something from a file and pushes the data on a
> > bufferport. The updatehook method from this component reads all data and
> > then returns. The second one reads the data from the bufferport and
> > pushes the data onto another bufferport. The third one reads the data
> > from the other bufferport. The bufferports are all larger or equal to the
> > size of the file and the maximum reported delay (by the reporting) is
> > directly related to the size of the bufferport/file.
> >
> > So either my problem is that my bufferports are too large and the reason
> > for not making the sampling times is overhead in copying the bufferports?
> > Or there is some other unobvious cause?
>
> Obviously not... Maybe you use too small a stack space for the threads you
> are using...? Or you are using "new" and "delete" behind the scenes
> somewhere, without realising it? ... None of these possible problems is
> "obvious"!

Okay, it's not obvious. Is it a problem to use new and delete in nonperiodic
components? I thought it was allowed, but maybe I am wrong (I am not using
new and delete anyway).

Anyway, I followed an idea of Klaas and now use usleep(100) (or alternatively
nanosleep) in the while loops of my calculations (I have 3 nonperiodic
components which perform lengthy calculations in while loops and which
communicate with each other through bufferports).

Now everything works fine, though it is a huge mystery to me why.
(The tolerances on the sampling times are still high at times, e.g. 5 ms
instead of 2 ms, but no longer e.g. 1 s. Usually, they are relatively close
to 2 ms.).
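
A minimal sketch of this workaround, assuming a POSIX system: the
hypothetical function sum_with_pauses() stands in for one of the lengthy
calculations and calls usleep() after every `chunk` elements, so the thread
briefly gives up the CPU inside its loop. The function name, the chunk size
and the summation itself are illustrative only.

```cpp
#include <cstddef>
#include <unistd.h>   // usleep (POSIX)
#include <vector>

// Hypothetical stand-in for a lengthy calculation: sums the input, pausing
// for `pause_us` microseconds after every `chunk` elements so that other
// threads get a chance to run. A chunk of 0 disables the pauses.
double sum_with_pauses(const std::vector<double>& data,
                       std::size_t chunk, unsigned pause_us)
{
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
        if (chunk != 0 && (i + 1) % chunk == 0)
            usleep(pause_us);   // give up the CPU briefly
    }
    return total;
}
```

The pauses do not change the result; they only spread the work out in time.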

>
> Herman

Best regards,
Diederik

Nonperiodicactivities and lengthy calculations

On Thu, Aug 7, 2008 at 5:26 PM, Diederik Verscheure
<diederik [dot] verscheure [..] ...> wrote:
> On Thursday 07 August 2008 15:05:51 Herman Bruyninckx wrote:
>> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>>
>> [..]
>>
>> > I tried your suggestion with very simple nonperiodic components.
>> > The first one reads something from a file and pushes the data on a
>> > bufferport. The updatehook method from this component reads all data and
>> > then returns. The second one reads the data from the bufferport and
>> > pushes the data onto another bufferport. The third one reads the data
>> > from the other bufferport. The bufferports are all larger or equal to the
>> > size of the file and the maximum reported delay (by the reporting) is
>> > directly related to the size of the bufferport/file.
>> >
>> > So either my problem is that my bufferports are too large and the reason
>> > for not making the sampling times is overhead in copying the bufferports?
>> > Or there is some other unobvious cause?
>>
>> Obviously not... Maybe you use too small a stack space for the threads you
>> are using...? Or you are using "new" and "delete" behind the scenes
>> somewhere, without realising it? ... None of these possible problems is
>> "obvious"!
>
> Okay, it's not obvious. Is it a problem to use new and delete in nonperiodic
> components? I thought it was allowed, but maybe I am wrong (I am not using
> new and delete anyway).
>
> Anyway, I followed an idea of Klaas and now use usleep(100) (or alternatively
> nanosleep) in the while loops of my calculations (I have 3 nonperiodic
> components which perform lengthy calculations in while loops and which
> communicate with each other through bufferports).
>
> Now everything works fine, though it is a huge mystery to me why.
> (The tolerances on the sampling times are still high at times, e.g. 5 ms
> instead of 2 ms, but no longer e.g. 1 s. Usually, they are relatively close
> to 2 ms.).

The usleep/nanosleep will make you go to secondary mode and hence
prevent starvation. I would suggest rechecking your priorities and
running your app with RTT::HighestPriority for the periodic
components, and switching the nonperiodic components to ORO_SCHED_OTHER,
using something like (check the API reference, since I'm writing this
off the top of my head, so it's surely plain wrong :-)

NonPeriodicActivity MyAct(...);
MyAct.thread()->setScheduler(ORO_SCHED_OTHER);

That will make your threads run in "non-realtime mode", and should
prevent the starvation problems...
Let us know the outcome :-)
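
For comparison, a minimal sketch of the same idea in plain POSIX threads
rather than the RTT API (which should be checked in the API reference, as
noted above). The helpers make_non_realtime() and current_policy() are
hypothetical names; pthread_setschedparam() and pthread_getschedparam() are
the standard POSIX calls.

```cpp
#include <pthread.h>
#include <sched.h>

// Plain POSIX sketch (not the RTT API): move the given thread to the default
// time-sharing policy. Returns 0 on success, an error number otherwise.
int make_non_realtime(pthread_t thread)
{
    struct sched_param param;
    param.sched_priority = 0;   // SCHED_OTHER only accepts priority 0 on Linux
    return pthread_setschedparam(thread, SCHED_OTHER, &param);
}

// Returns the current scheduling policy of the thread, or -1 on error.
int current_policy(pthread_t thread)
{
    int policy = 0;
    struct sched_param param;
    if (pthread_getschedparam(thread, &policy, &param) != 0)
        return -1;
    return policy;
}
```

A thread scheduled this way competes with ordinary processes instead of
preempting them, which is the behaviour wanted for the lengthy calculations.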

ps. In case of lxrt, a

cat /proc/rtai/lxrt/scheduler

(or something similar) gives a nice (?) overview of your threads, and
their "usage percentage".

Klaas

Nonperiodicactivities and lengthy calculations

On Friday 08 August 2008 09:25:50 you wrote:
> On Thu, Aug 7, 2008 at 5:26 PM, Diederik Verscheure
> > Anyway, I followed an idea of Klaas and now use usleep(100) (or
> > alternatively nanosleep) in the while loops of my calculations (I have 3
> > nonperiodic components which perform lengthy calculations in while loops
> > and which communicate with each other through bufferports).
> >
> > Now everything works fine, though it is a huge mystery to me why.
> > (The tolerances on the sampling times are still high at times, e.g. 5 ms
> > instead of 2 ms, but no longer e.g. 1 s. Usually, they are relatively
> > close to 2 ms.).
>
> The usleep/nanosleep will make you go to secondary mode and hence
> prevent starvation. I would suggest to recheck your priorities and
> try running your app using RTT::HighestPriority for the periodic
> components and switch the NonPeriodicComponents to ORO_SCHED_OTHER,
> using something like (check the API reference, since I'm writing this
> from the top of my head, so it's surely plain wrong :-)
>
> NonPeriodicActivity MyAct(...);
> MyAct.thread()->setScheduler(ORO_SCHED_OTHER);
>
> That will make your threads run in "non-realtime mode", and should
> prevent the starvation problems...

Ok, that sounds like an elegant solution, which will probably do the job.
Thanks!

> Let us know the outcome :-)
>

I will.

> ps. In case of lxrt, a
>
> cat /proc/rtai/lxrt/scheduler
>
> (or something similar) gives a nice (?) overview of your threads, and
> their "usage percentage".
>
> Klaas

Best regards,
Diederik

Nonperiodicactivities and lengthy calculations

On Thu, 7 Aug 2008, Diederik Verscheure wrote:

> On Thursday 07 August 2008 15:05:51 Herman Bruyninckx wrote:
>> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>>
>> [..]
>>
>>> I tried your suggestion with very simple nonperiodic components.
>>> The first one reads something from a file and pushes the data on a
>>> bufferport. The updatehook method from this component reads all data and
>>> then returns. The second one reads the data from the bufferport and
>>> pushes the data onto another bufferport. The third one reads the data
>>> from the other bufferport. The bufferports are all larger or equal to the
>>> size of the file and the maximum reported delay (by the reporting) is
>>> directly related to the size of the bufferport/file.
>>>
>>> So either my problem is that my bufferports are too large and the reason
>>> for not making the sampling times is overhead in copying the bufferports?
>>> Or there is some other unobvious cause?
>>
>> Obviously not... Maybe you use too small a stack space for the threads you
>> are using...? Or you are using "new" and "delete" behind the scenes
>> somewhere, without realising it? ... None of these possible problems is
>> "obvious"!
>
> Okay, it's not obvious. Is it a problem to use new and delete in nonperiodic
> components? I thought it was allowed, but maybe I am wrong (I am not using
> new and delete anyway).

new and delete can give problems in realtime, since memory allocation is
not always deterministic. And one often uses libraries that use new/delete
without really knowing it.
There is no relationship with (non)periodicity of activities, as far as I
know.

>
> Anyway, I followed an idea of Klaas and now use usleep(100) (or alternatively
> nanosleep) in the while loops of my calculations (I have 3 nonperiodic
> components which perform lengthy calculations in while loops and which
> communicate with each other through bufferports).
>
> Now everything works fine, though it is a huge mystery to me why.
> (The tolerances on the sampling times are still high at times, e.g. 5 ms
> instead of 2 ms, but no longer e.g. 1 s. Usually, they are relatively close
> to 2 ms.).

Maybe you are doing heavier calculations than you imagine yourself...
Again, few libraries are being developed for realtime performance...
Or do you use only your own code and nothing else?

Herman

Nonperiodicactivities and lengthy calculations

On Thursday 07 August 2008 18:03:17 Herman Bruyninckx wrote:
> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
> > On Thursday 07 August 2008 15:05:51 Herman Bruyninckx wrote:
> >> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
> >>
> > [..]
> > Okay, it's not obvious. Is it a problem to use new and delete in
> > nonperiodic components? I thought it was allowed, but maybe I am wrong (I
> > am not using new and delete anyway).
>
> new and delete can give problems in realtime, since memory allocation is
> not always deterministic. And one often uses libraries that use new/delete
> without you really knowing it.
> There is no relationship with (non)periodicity of activities, as far as I
> know.
>

I only use lapack as an external library. Most lapack function calls require
the user to provide a pointer to allocated memory (which I allocate
statically), so I'm pretty sure that I am not using "new" and "delete" (or
the C alternatives) behind the scenes.
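
A minimal sketch of that allocation pattern, assuming a hypothetical Solver
class that is neither lapack nor RTT code: the workspace is obtained once at
construction, and the repeatedly called update() only reuses it, so no
allocation happens inside the calculation itself.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical component-like class: all memory is obtained once in the
// constructor, and update() only reuses it.
class Solver {
public:
    explicit Solver(std::size_t n) : work_(n, 0.0) {}

    // Fills the preallocated workspace with the squares of the input and
    // returns their sum. `in` must point to at least size() values.
    // No allocation happens here.
    double update(const double* in)
    {
        double acc = 0.0;
        for (std::size_t i = 0; i < work_.size(); ++i) {
            work_[i] = in[i] * in[i];
            acc += work_[i];
        }
        return acc;
    }

    std::size_t size() const { return work_.size(); }
    const double* workspace() const { return work_.data(); }

private:
    std::vector<double> work_;   // allocated once, never resized
};
```

The workspace pointer stays the same across calls to update(), which is a
quick way to check that nothing is reallocated behind the scenes.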

> > Anyway, I followed an idea of Klaas and now use usleep(100) (or
> > alternatively nanosleep) in the while loops of my calculations (I have 3
> > nonperiodic components which perform lengthy calculations in while loops
> > and which communicate with each other through bufferports).
> >
> > Now everything works fine, though it is a huge mystery to me why.
> > (The tolerances on the sampling times are still high at times, e.g. 5 ms
> > instead of 2 ms, but no longer e.g. 1 s. Usually, they are relatively
> > close to 2 ms.).
>
> Maybe you are doing heavier calculations than you imagine yourself...

How can a nonperiodically scheduled calculation be "too heavy"?
As far as I understand it, it is the responsibility of the scheduler to first
handle the periodic tasks and then spend the remaining time on nonperiodic
tasks.

> Again, few libraries are being developed for realtime performance...
> Or do you use only your own code and nothing else?

Only lapack + own code.

>
> Herman

Best regards,
Diederik

Nonperiodicactivities and lengthy calculations

On Fri, 8 Aug 2008, Diederik Verscheure wrote:

> On Thursday 07 August 2008 18:03:17 Herman Bruyninckx wrote:
>> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>>> On Thursday 07 August 2008 15:05:51 Herman Bruyninckx wrote:
>>>> On Thu, 7 Aug 2008, Diederik Verscheure wrote:
>>>>
>>> [..]
>>> Okay, it's not obvious. Is it a problem to use new and delete in
>>> nonperiodic components? I thought it was allowed, but maybe I am wrong (I
>>> am not using new and delete anyway).
>>
>> new and delete can give problems in realtime, since memory allocation is
>> not always deterministic. And one often uses libraries that use new/delete
>> without you really knowing it.
>> There is no relationship with (non)periodicity of activities, as far as I
>> know.
>
> I only use lapack as an external library. Most lapack function calls require
> the user to provide a pointer to allocated memory (which I allocate
> statically), so I'm pretty sure that I am not using "new" and "delete" (or
> the C alternatives) behind the scenes.
>
Good!! Excellent even :-)

Herman