Getting reporters to do 'pop' on buffered connections

Currently, AFAIK, the reporters do not pop buffered connections. I can
perfectly see how that is useful on a running system -- peeking at data
connections without disturbing them. However, it prevents testing, for
instance, a sensor driver, since the reporter would only ever read the first
sensor reading in the buffer.

Would it be simple to add an optional way to have the reporters actually
pop buffered connections instead of only peeking at them?

Sylvain

Getting reporters to do 'pop' on buffered connections

On Tuesday 08 July 2008 16:28:30 Sylvain Joyeux wrote:
> Currently, AFAIK, the reporters do not pop buffered connections. I
> perfectly see how it is useful on a running system -- peeking data
> connections without disturbing them. However, this avoid testing for
> instance a sensor driver, since the reporter would only read the first
> sensor reading on the buffer.
>
> Would it be simple to get an optional way to have the reporters actually
> pop buffered connections instead of only peeking on them ?

If this is for debugging purposes, wouldn't it be easier to write a small
helper component (periodic) which connects to your sensor using a buffered
port and, when running, calls 'getActivity()->trigger()' on the reporter and
then pops a value? You'd need to make the reporting component a non-periodic
activity then. This way you won't miss a sample.
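A minimal sketch of such a helper, assuming the RTT 1.x API used in this
thread (TaskContext, BufferPort, getActivity()->trigger()); the class name,
port name and buffer size below are made up:

#include <rtt/TaskContext.hpp>
#include <rtt/Ports.hpp>

class ReportTrigger : public RTT::TaskContext
{
    RTT::BufferPort<double> samples;   // buffered connection to the sensor output
    RTT::TaskContext*       reporter;  // the reporting component (non-periodic)
public:
    ReportTrigger( RTT::TaskContext* rep )
        : RTT::TaskContext("ReportTrigger"),
          samples("SensorSamples", 10),
          reporter(rep)
    {
        this->ports()->addPort( &samples );
    }

    // Run this component periodically: each cycle, wake the reporter so it
    // snapshots the current front() of the buffer, then pop one value so the
    // next cycle exposes the next sample.
    void updateHook()
    {
        reporter->getActivity()->trigger();
        double sample;
        if ( samples.Pop( sample ) )
        {
            // one sample consumed; it was just reported above
        }
    }
};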

Peter

Getting reporters to do 'pop' on buffered connections

On Tue, Jul 08, 2008 at 05:04:07PM +0200, Peter Soetens wrote:
> On Tuesday 08 July 2008 16:28:30 Sylvain Joyeux wrote:
> > Currently, AFAIK, the reporters do not pop buffered connections. I
> > perfectly see how it is useful on a running system -- peeking data
> > connections without disturbing them. However, this avoid testing for
> > instance a sensor driver, since the reporter would only read the first
> > sensor reading on the buffer.
> >
> > Would it be simple to get an optional way to have the reporters actually
> > pop buffered connections instead of only peeking on them ?
>
> If this is for debugging purposes, wouldn't it be easier writing a small
> helper component (periodic) which connects to your sensor using a buffered
> port, and when running, calls 'getActivity()->trigger()' on the reporter and
> then pops a value ? You'd need to make the reportingcomponent a nonperiodic
> activity then. This way you won't miss a sample

Btw, are there any plans to allow for data-driven activities? I.e.
activities for which step() is called whenever a source port is updated.
That would be a nice addition in general.

I may implement something like that at some point, but I wonder if
something has already been done in this direction.

Getting reporters to do 'pop' on buffered connections

On Tuesday 08 July 2008 17:35:40 Sylvain Joyeux wrote:
> On Tue, Jul 08, 2008 at 05:04:07PM +0200, Peter Soetens wrote:
> > On Tuesday 08 July 2008 16:28:30 Sylvain Joyeux wrote:
> > > Currently, AFAIK, the reporters do not pop buffered connections. I
> > > perfectly see how it is useful on a running system -- peeking data
> > > connections without disturbing them. However, this avoid testing for
> > > instance a sensor driver, since the reporter would only read the first
> > > sensor reading on the buffer.
> > >
> > > Would it be simple to get an optional way to have the reporters
> > > actually pop buffered connections instead of only peeking on them ?
> >
> > If this is for debugging purposes, wouldn't it be easier writing a small
> > helper component (periodic) which connects to your sensor using a
> > buffered port, and when running, calls 'getActivity()->trigger()' on the
> > reporter and then pops a value ? You'd need to make the
> > reportingcomponent a nonperiodic activity then. This way you won't miss a
> > sample
>
> Btw. Is there any plans to allow for data-driven activities ? I.e.
> activities for which step() is called whenever a source port is updated.
> That would be a nice addition in general.

Some rough mindflashes below...

This is/was bug report #423 "Improve Component execution model". Part of this
bug was resolved, but we're still not there. It would use the 'trigger()'
function of the ActivityInterface to notify that data has arrived / work has
to be done. The idea of the EventPort was that when it is added to a
TaskContext ( using ports()->addPort() ), it gets a pointer to the TC, where
it can register event handlers or call trigger() on the TC's activity when data
arrives. Which mechanism to use is still undecided.

Because the data is stored in a connection object, the connection object would
be responsible for notifying any subscribed parties, and not the port. This
somewhat contradicts the necessity of an 'EventPort' because we want this
mechanism to work with the current ports as well (hence only the connection
object would need to change). Using an Orocos Event inside such a connection
seems like a way to manage the callbacks.

As a last obstacle, the ports read/write directly to the dataobject or buffer.
So the connection actually doesn't know when it is read or written to, except
that its data() or buffer() function is called to get the pointer to the
dataobject or buffer. We would need to call get/set/pop/poll on the
connection instead of on the contained dataobject/buffer (which increases code
size again).
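To make that notification idea a bit more concrete, a rough sketch (not
existing RTT code): a connection wrapper that fires an Orocos Event whenever
it is written, so that subscribed parties, e.g. an activity's trigger(), can
be notified. The class and member names are made up, and the Event
constructor/signature are assumptions about the 1.x API:

#include <rtt/Event.hpp>

// 'NotifyingDataConnection' and its members are made up; only RTT::Event is
// real. A real implementation would wrap the existing (lock-free) data object
// rather than a bare member, and would do the same for buffers around Push().
template<class T>
class NotifyingDataConnection
{
    T data;                           // stands in for the connection's data object
public:
    RTT::Event<void(void)> updated;   // fired on every write

    NotifyingDataConnection() : updated("updated") {}

    void Set( const T& value )
    {
        data = value;                 // write through the connection...
        updated();                    // ...then notify all subscribed parties
    }

    T Get() const { return data; }
};

// Subscribers could then, for instance, connect a handler to 'updated' that
// calls trigger() on the reading component's activity.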

>
> I may implement something like that at some point, but I wonder if
> something has already been done in this direction.

You could open a new bug report to collect the ideas on this topic.

Peter

Getting reporters to do 'pop' on buffered connections

> Because the data is stored in a connection object, the connection object would
> be responsible for notifying any subscribed parties, and not the port. This
> somewhat contradicts the necessity of an 'EventPort' because we want this
> mechanism to work with the current ports as well (hence only the connection
> object would need to change). Using an Orocos Event inside such a connection
> seems a way for managing the callbacks.

> As a last obstacle, the ports read/write directly to the dataobject or buffer.
> So the connection actually doesn't know when it is read or written to, except
> that it's data() or buffer() function is called to get the pointer to the
> dataobject or buffer. We would need to call get/set/pop/poll on the
> connection instead of on the contained dataobject/buffer (increases code size
> again).

Let's see if I understood what you mean: could we not simply associate a
single Event object with all connections, having the event fired every time
the connection is written? That way, all you would have to do is link the
event to trigger() and you're done.

Is the code size really a problem for you? IMO, that functionality would
really be worth it: having a framework offering data-driven, periodic and
event-driven activities would *really* rock.

Sylvain

Getting reporters to do 'pop' on buffered connections

On Wednesday 09 July 2008 11:10:50 Sylvain Joyeux wrote:
> > Because the data is stored in a connection object, the connection object
> > would be responsible for notifying any subscribed parties, and not the
> > port. This somewhat contradicts the necessity of an 'EventPort' because
> > we want this mechanism to work with the current ports as well (hence only
> > the connection object would need to change). Using an Orocos Event inside
> > such a connection seems a way for managing the callbacks.
> >
> > As a last obstacle, the ports read/write directly to the dataobject or
> > buffer. So the connection actually doesn't know when it is read or
> > written to, except that it's data() or buffer() function is called to get
> > the pointer to the dataobject or buffer. We would need to call
> > get/set/pop/poll on the connection instead of on the contained
> > dataobject/buffer (increases code size again).
>
> Let's see if I understood what you mean: we could not simply associate a
> single Event object with all connections, having the event fired every time
> the connection is written. That way, all you would have to do is link the
> event to trigger and you're done.

With "all connections", you mean probably all TaskContexts? We can do what you
propose, we only need to have a way to detect a write in the C++ '[Data|
Buffer]Connection' class. If so, we can proceed with your approach.

>
> Is the code size really a problem for you ? IMO, that functionality would
> really work: having at least a framework offering both data-driven,
> periodic and event-driven activities would *really* rock.

Yes, we're working on it :-)

I'll be in China for the next 10 days. Maybe I'll have enough time on the
airplane to think your suggestions over. Feel free to make a proposal
yourself though :-)

Peter

Getting reporters to do 'pop' on buffered connections

> > Let's see if I understood what you mean: we could not simply associate a
> > single Event object with all connections, having the event fired every time
> > the connection is written. That way, all you would have to do is link the
> > event to trigger and you're done.
>
> With "all connections", you mean probably all TaskContexts? We can do what you
> propose, we only need to have a way to detect a write in the C++ '[Data|
> Buffer]Connection' class. If so, we can proceed with your approach.
Mmmm ... Nope. I thought of one per connection object. Then you can even
combine events to specify when the activity should be triggered.
Like "wake it up when one of the connections has been updated" or
"wake it up when all the connections have been updated", or even "wake
it up whenever connection A is updated if connection B has been updated
at least once". Things like that.

Getting reporters to do 'pop' on buffered connections

On Thursday 10 July 2008 10:21:46 Sylvain Joyeux wrote:
> > > Let's see if I understood what you mean: we could not simply associate
> > > a single Event object with all connections, having the event fired
> > > every time the connection is written. That way, all you would have to
> > > do is link the event to trigger and you're done.
> >
> > With "all connections", you mean probably all TaskContexts? We can do
> > what you propose, we only need to have a way to detect a write in the C++
> > '[Data| Buffer]Connection' class. If so, we can proceed with your
> > approach.
>
> Mmmm ... Nope. I thought one per connection object. Then you can do
> even combine events to specify when the activity should be triggered.
> Like "wake it up when one of the connection has been updated" or
> "wake it up when all the connections have been updated", or even "wake
> it up whenever connection A is updated if connection B has been updated
> at least once". Things like that.

Never mind, that is what I understood as well.

Quite off-topic, but I just remembered that I had ideas about making Method<...>
and Event<...> classes compatible, such that connecting a Method to an Event
would be as simple as:

Event<void(void)> m_event;
Method<void(void)> m_method;

m_event.connect( & m_method );

This would also allow you to attach the 'trigger' method to an event easily
by using

m_event.connect( methods()->getMethod("trigger") );

Peter

Getting reporters to do 'pop' on buffered connections

On Tue, Jul 08, 2008 at 05:04:07PM +0200, Peter Soetens wrote:
> On Tuesday 08 July 2008 16:28:30 Sylvain Joyeux wrote:
> > Currently, AFAIK, the reporters do not pop buffered connections. I
> > perfectly see how it is useful on a running system -- peeking data
> > connections without disturbing them. However, this avoid testing for
> > instance a sensor driver, since the reporter would only read the first
> > sensor reading on the buffer.
> >
> > Would it be simple to get an optional way to have the reporters actually
> > pop buffered connections instead of only peeking on them ?

> If this is for debugging purposes, wouldn't it be easier writing a small
> helper component (periodic) which connects to your sensor using a buffered
> port, and when running, calls 'getActivity()->trigger()' on the reporter and
> then pops a value ? You'd need to make the reportingcomponent a nonperiodic
> activity then. This way you won't miss a sample

Well, it is not only for debugging purposes. It could also be used to build
sensor datasets and do offline processing afterwards, that kind of thing. From
my experience with other frameworks, the ability to run only the sensor
acquisition and dump the output values is a useful one *and* it needs to be
done generically.

I was thinking about adding an isPeeking/setPeeking method pair to
DataSourceBase. If isPeeking() is true (the default), then the only
thing the datasource does when reading is read the current value of the
source. If it is false and the source is a buffer, then it does Pop().
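In code, the proposal would look roughly like this (a sketch of the additions
only; the real DataSourceBase obviously has many more members):

// Sketch of the proposed additions, not existing RTT code.
class DataSourceBase
{
public:
    DataSourceBase() : peeking( true ) {}

    bool isPeeking() const     { return peeking; }
    void setPeeking( bool p )  { peeking = p; }

    // A buffer-backed data source would then read as follows:
    //   if ( isPeeking() )  value = buffer->front();   // current behaviour
    //   else                buffer->Pop( value );      // consume the element
private:
    bool peeking;
};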

What do you think ?

Sylvain

Getting reporters to do 'pop' on buffered connections

On Tuesday 08 July 2008 17:30:06 Sylvain Joyeux wrote:
> On Tue, Jul 08, 2008 at 05:04:07PM +0200, Peter Soetens wrote:
> > On Tuesday 08 July 2008 16:28:30 Sylvain Joyeux wrote:
> > > Currently, AFAIK, the reporters do not pop buffered connections. I
> > > perfectly see how it is useful on a running system -- peeking data
> > > connections without disturbing them. However, this avoid testing for
> > > instance a sensor driver, since the reporter would only read the first
> > > sensor reading on the buffer.
> > >
> > > Would it be simple to get an optional way to have the reporters
> > > actually pop buffered connections instead of only peeking on them ?
> >
> > If this is for debugging purposes, wouldn't it be easier writing a small
> > helper component (periodic) which connects to your sensor using a
> > buffered port, and when running, calls 'getActivity()->trigger()' on the
> > reporter and then pops a value ? You'd need to make the
> > reportingcomponent a nonperiodic activity then. This way you won't miss a
> > sample
>
> Well. It is not only for debugging purposes. It could also be to build
> sensor datasets and do offline processing afterwards, this kind of
> things. From my experience with other frameworks, the ability to run
> only sensor acquisition and dump the output values is a useful one *and*
> it needs to be done generically.

So that's a gap for the Reporter to fill.

>
> I was thinking about adding a isPeeking/setPeeking method pair in
> DataSourceBase. If isPeeking() is true (the default), then the only
> thing the datasource does when reading is read the current value of the
> source. If it is false and if the source is a buffer then it does Pop().

That wouldn't work. The DataSource associated with a buffer is a 'watcher' of
the buffer's 'front()' element, so the datasource itself is never a buffer.
We've been separating buffers and datasources because they hardly have
anything in common. A buffer has a capacity, can be empty and returns a
different item (or nothing) each time. A datasource always returns the same
value / a value to all readers. Reading a buffer may also cause you to block.
I would extend BufferBase instead of DataSourceBase.

What about a bool pop(void) in BufferBase in combination with the reading
datasource for the value ?
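Sketched in code, the reporter would then do something like this
(BufferBase::pop() is the proposed addition and does not exist today; headers
are omitted and the pointer types are from memory):

void report_and_consume( RTT::DataSourceBase::shared_ptr dsb,
                         RTT::BufferBase::shared_ptr     buf )
{
    dsb->evaluate();   // read the buffer's front() element through the data source
    // ... marshal the value exactly as the reporter does today ...
    buf->pop();        // proposed: drop the element that was just reported
}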

>
> What do you think ?

We're looking at the Reporter as well for improving the logging of 'events' or
buffered ports. Today, the idea behind the reporter is that it takes
(periodically) a snapshot of the current state of all its inputs and stores
that. Now that doesn't match buffers (which may contain multiple values or
none). That's why we picked front() as the current snapshot of a buffer,
but that will let you miss and/or duplicate values. And how about reporting
two buffers, one receiving 10,000 elements/sec and the other 10/sec?

Is the current file format (one time column) suitable for that ?

Peter

Getting reporters to do 'pop' on buffered connections

> > I was thinking about adding a isPeeking/setPeeking method pair in
> > DataSourceBase. If isPeeking() is true (the default), then the only
> > thing the datasource does when reading is read the current value of the
> > source. If it is false and if the source is a buffer then it does Pop().
>
> That wouldn't work.
Well ... actually it *does* work. I guess what you mean is that it would
not fit your view of the model ;-)

> The DataSource associated with a buffer is a 'watcher' of
> the buffer's 'front()' element. So the datasource itself is never a buffer.
You could see all datasources as one-element buffers whose default overflow
policy is to remove the last element. You could therefore extend the semantics
of "peek" so that a non-peeking datasource would reset() its source in the
non-buffered case.

> We've been separating buffers and datasources because they hardly have
> something in common.
You're reading buffers through datasources, so that is not true. The views
"datasource-peeks-buffers" and "datasource-pops-buffers" are IMO equally
possible. Moreover, it is already possible that the buffered connection behind
the datasource is empty, in which case the datasource returns the default value
of the buffer.

That is also a question I have. What happens with a buffered connection in the
one-writer/multiple-readers case? Do they all see the same stream of data, or
does a pop() by one reader make the popped value unavailable to the others?

> Reading a buffer may cause you to block as well.
That would be a problem. In which case? The implementations of Pop() I saw are
not blocking. I personally would see a blocking Pop() as problematic in the
general case (e.g. a TaskContext acting as a watchdog, reading a port, could
block -- which would nullify the watchdog's usefulness).

> I would extend BufferBase instead of DataSourceBase.
The problem with extending BufferBase is that pop() is an operation that only
belongs on the reader side, not on the writer side (i.e. pop() has no business
being available on WriterInterface).

> Now that doesn't match with buffers (which may contain multiple or no
> values). That's why we picked front() as the current snapshot of a buffer,
> but that will let you miss and/or duplicate values.
And it does not solve the empty-buffer problem. Except for the blocking case,
I don't see why using front() is really better than using pop().

> And how about reporting
> two buffers with one having 10.000 elements/sec and the other 10/sec ?
You create two reporters. In the general case, it would be way better to be
data-driven for this kind of thing.

> Is the current file format (one time column) suitable for that ?
Well, at some point in the near future I'll be writing a marshaller for my own
logging format, which is binary, includes multiple streams and two timestamps
per sample (sample timestamp and writing timestamp), and already has a few
tools for post-processing in Ruby -- which makes the whole post-processing very
easy. I'll let you know how it progresses.

Sylvain

Getting reporters to do 'pop' on buffered connections

On Wednesday 09 July 2008 11:06:00 Sylvain Joyeux wrote:
> > > I was thinking about adding a isPeeking/setPeeking method pair in
> > > DataSourceBase. If isPeeking() is true (the default), then the only
> > > thing the datasource does when reading is read the current value of the
> > > source. If it is false and if the source is a buffer then it does
> > > Pop().
> >
> > That wouldn't work.
>
> Well ... actually it *does* work. I guess that what you mean is that it
> would not fit in your view of the model ;-)

You got me :-)

>
> > The DataSource associated with a buffer is a 'watcher' of
> > the buffer's 'front()' element. So the datasource itself is never a
> > buffer.
>
> You could see all datasources as one-element buffers where the default
> overflow policy is to remove the last element. You could therefore extend
> the semantic of "peek" so that the non-peeking datasource reset() its
> source in the non-buffered case.

There is no problem at all expressing this in software. The question I'm
asking is whether it makes life easier for the user. In your case it even
would, but I'm not yet convinced of the advantages for the bigger picture. My
feeling was that algorithms (ie code in updateHook() ) which operate on
buffered data ports are different from those reading non-buffered ports,
because they need to take into account that the buffer can be empty, or that
they can only read (pop) each element once ( or use front() ), while an
algorithm using data ports can run at any time and will always produce a
correct result. Now you're proposing that each such algorithm needs to
be 'prepared' to handle both buffered and unbuffered data ports? My fear is
that in practice this will lead to situations like: this algorithm has only
been tested on port type X... I'd rather make the 'contract' clear: this
algorithm expects a 'data flow' with 'flow specification' X, hence it has
only ports which enforce it. But please read on...
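To make the two 'contracts' concrete, a minimal side-by-side sketch, assuming
the RTT 1.x port API (DataPort<T>::Get(), BufferPort<T>::Pop()); the component
names, port names, buffer size and double payload are made up:

#include <rtt/TaskContext.hpp>
#include <rtt/Ports.hpp>

class DataDriven : public RTT::TaskContext
{
    RTT::DataPort<double> position;        // plain data port: always has a value
public:
    DataDriven() : RTT::TaskContext("DataDriven"), position("Position")
    { ports()->addPort( &position ); }

    void updateHook()
    {
        double p = position.Get();         // valid whenever we happen to run
        (void)p;                           // ... compute with p ...
    }
};

class StreamDriven : public RTT::TaskContext
{
    RTT::BufferPort<double> frames;        // buffered port: a stream of items
public:
    StreamDriven() : RTT::TaskContext("StreamDriven"), frames("Frames", 10)
    { ports()->addPort( &frames ); }

    void updateHook()
    {
        double f;
        while ( frames.Pop( f ) )          // must handle 'empty'; sees each item once
        { /* ... process f ... */ }
    }
};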

>
> > We've been separating buffers and datasources because they hardly have
> > something in common.
>
> You're reading buffers through datasources, so that is not true. The view
> "datasource-peeks-buffers" and "datasource-pops-buffers" are IMO equally
> possible. Moreover, it is already possible that the buffered connection
> behind the datasource is empty, in which case the datasource returns the
> default value of the buffer.

You're right, except that the datasource peeking is purely observing, while
pop is influencing the flow... see the next reply...

>
> That was also a question I have. What happens with a buffered connection in
> the one-writer/multiple-reader case ? Do all of them see the same stream of
> data or is a pop() of one reader makes the pop-ed value unavailable for the
> others ?

The buffers are many-to-many from a connection point of view, but
point-to-point from a data flow view (an item is produced at one point and
consumed at one point). The only broadcast possible in the data flow is by
using a DataSource. The 'buffer per receiver' connection type is still lacking
in the current implementation (it could be implemented without changing the
current architecture, i.e. by only writing an extra class).

>
> > Reading a buffer may cause you to block as well.
>
> That would be a problem. In which case ? The implementations of Pop() I saw
> are not blocking. I personally would see a blocking Pop() as problematic in
> the general case (i.e. a taskcontext acting as watchdog, reading a port,
> could block -- which would lead to nullifying the watchdog usefulness)

You can switch implementations in the connection object. The BufferLockFree
allows blocking on Pop+empty and/or Push+full. The default is non-blocking,
returning false.

>
> > I would extend BufferBase instead of DataSourceBase.
>
> The problem with extending BufferBase is that pop() is only an implemented
> operation on the reader side, not on the writer side (i.e. pop() has no
> business being available on WriterInterface).

That's right. You're getting me into a corner :-)

>
> > Now that doesn't match with buffers (which may contain multiple or no
> > values). That's why we picked front() as the current snapshot of a
> > buffer, but that will let you miss and/or duplicate values.
>
> And does not solve the empty problem. Except for the blocking case, I don't
> see why using front() is really better than using pop().

In the reporter case, it isn't. I'm with you here: using a datasource which
pops would be an improvement in the component->reporter case, but I'd rather
*know* that the buffer is empty than just get '0.0' as an answer from
a datasource, which could mean anything. I'm starting to believe that part of
the solution is having dedicated buffers for each reader.

>
> > And how about reporting
> > two buffers with one having 10.000 elements/sec and the other 10/sec ?
>
> You create two reporters. In the general case, it would be way better to be
> data driven for this kind of things.

So blocking on empty...

>
> > Is the current file format (one time column) suitable for that ?
>
> Well. I'll be writing at some point in the near future a marshaller for my
> own logging format, which is binary, includes multiple streams, two
> timestamps per sample (sample timestamp and writing timestamp), and has
> already a few tools for post-processing in Ruby -- which makes the whole
> post-processing very easy. I'll let you know how it progresses.

Note our NetCDF implementation in Bugzilla, which might inspire you as well.

My conclusion is that although a DataSourcePop would solve this for the
Reporter, it still has these issues:
1) it cannot distinguish an empty buffer from a buffer containing the default
value;
2) it only allows one component to read the buffer, because it removes a value
from the flow;
3) I don't like the 'micromanagement' of setting 'peeking' flags.

I see these solutions to get a better deal:
* Provide a 'broadcast' buffered data flow (buffer per reader)
=> solves 2): just consume your own buffer.
* Extend the reporter such that it knows whether it is handling a buffer or a
datasource ( getting a pointer to BufferBase )
=> solves 1): only read if not empty.
* Create a specific data source which pops an element from a buffer, similar
to the front() case
=> solves 3).

Now if we can agree on this, you're making my day :-)

Peter

Getting reporters to do 'pop' on buffered connections

> > > The DataSource associated with a buffer is a 'watcher' of
> > > the buffer's 'front()' element. So the datasource itself is never a
> > > buffer.
> >
> > You could see all datasources as one-element buffers where the default
> > overflow policy is to remove the last element. You could therefore extend
> > the semantic of "peek" so that the non-peeking datasource reset() its
> > source in the non-buffered case.
>
> There is no problem at all to express this in software. The question I'm
> having if it's making life easier for the user ? In your case it even would,
> but I'm not yet convinced of the advantages for the bigger picture. My
> feeling was that algorithms (ie code in updateHook() ) which operate on
> buffered data ports are different than those reading non-buffered ports,
> because they need to take into account that the buffer can be empty, or that
> it can only read(pop) each element once ( or use front() ). While an
> algorithm using data ports can run at any time and will always produce a
> correct result. Now you're proposing that each such algorithm needs to
> be 'prepared' to handle both buffered and unbuffered data ports ?

Yes, it is true. But algorithms in updateHook() use Port objects,
which present different interfaces. IMO, you already let the devil into
the framework by allowing one to get a DataSource interface to a buffered
connection. The emptiness situation is *already* a problem since, even
though the datasource does not pop(), someone else can! So the
datasource -- especially for the reporter, but not limited to it -- can
*already* return a default value without saying anything to anybody.

And I totally agree that a particular algorithm should be able to expect
one flow type or the other.

> My conclusion is that although a DataSourcePop would solve this for the
> Reporter, it still has these issues:
> 1) can not distinguish empty buffer from buffer containing default value.
> 2) only allows one component to read the buffer, because it removes a value
> from the flow.
> 3) I don't like the 'micromanagement' of setting 'peeking' flags
>
> I see these solutions to get a better deal:
> * provide a 'broadcast' buffered data flow (buffer per reader)
> => solves 2): just consume your buffer.
> * Extend the reporter such that it knows it is handling a buffer or
> datasource ( getting a pointer to BufferBase ),
> => solves 1): only read if not empty.
> * Create a specific data source which pops an element from a buffer, similar
> to the front() case.
> => solves 3)

I do agree with the general idea. But I think that with the broadcast
connection, we could have a sane solution by adding the following to
DataSources:
* make the datasource interface have a hasData() method -- even data
connections can be uninitialized (i.e. never written);
* make the buffered datasource pop() -- not a problem anymore, since
buffered connections are truly one-to-many.

I would also personally like to have a way to know whether or not the
connection has been updated since the last time I read it. Maybe by
managing a timestamp in the connection (time of last write), or a
sequence number. Either would then need to be read atomically with the
data.

like having

bool Pop(value_t& data, uint64_t* seq);
value_t get(uint64_t* seq);

and update seq only if it is non-null. Or (more C++-ish)

std::pair<bool, uint64_t> Pop(value_t& data);
std::pair<value_t, uint64_t> get();
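To illustrate (nothing like this exists in RTT today; 'SeqDataPort', the
double payload and 'checkInput' are made up):

#include <stdint.h>
#include <utility>

// Hypothetical reader-side interface with the proposed get() returning a
// (value, sequence number) pair.
struct SeqDataPort
{
    std::pair<double, uint64_t> get() const;   // value + write counter
};

// A periodic task can then detect "updated since my last read" without
// caching and comparing the previous value:
void checkInput( SeqDataPort& port, uint64_t& last_seq )
{
    std::pair<double, uint64_t> r = port.get();
    if ( r.second != last_seq )
    {
        last_seq = r.second;
        // ... new data: the port was written since we last looked ...
    }
}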

> Now if we can agree on this, you're making my day :-)
Let's say I'm making it. Minor remarks though ;-)

Getting reporters to do 'pop' on buffered connections

On Thursday 10 July 2008 11:04:01 Sylvain Joyeux wrote:
> > > > The DataSource associated with a buffer is a 'watcher' of
> > > > the buffer's 'front()' element. So the datasource itself is never a
> > > > buffer.
> > >
> > > You could see all datasources as one-element buffers where the default
> > > overflow policy is to remove the last element. You could therefore
> > > extend the semantic of "peek" so that the non-peeking datasource
> > > reset() its source in the non-buffered case.
> >
> > There is no problem at all to express this in software. The question I'm
> > having if it's making life easier for the user ? In your case it even
> > would, but I'm not yet convinced of the advantages for the bigger
> > picture. My feeling was that algorithms (ie code in updateHook() ) which
> > operate on buffered data ports are different than those reading
> > non-buffered ports, because they need to take into account that the
> > buffer can be empty, or that it can only read(pop) each element once (
> > or use front() ). While an algorithm using data ports can run at any time
> > and will always produce a correct result. Now you're proposing that each
> > such algorithm needs to be 'prepared' to handle both buffered and
> > unbuffered data ports ?
>
> Yes, it is true. But algorithms in updateHook() are using Ports objects,
> which present different interfaces. IMO, you already got the devil in
> the framework by allowing to get a DataSource interface to a buffered
> connection. The emptiness situation is *already* a problem since, even
> though the datasource does not pop(), someone else can ! So, the
> datasource -- especially for the reporter, but not limited to it -- can
> *already* return a default value without saying anything to anybody.

That's why the reporter must know it is a buffer, and query the BufferBase
interface for the state (using PortInterface::getConnectionModel() and
PortInterface::connection()->getBuffer() ). You could only get such a
DataSource (i.e. a BufferDataSource) by getting it from a buffered connection,
so it's not a global 'problem' for data sources.
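Sketched in code, using the calls named above (whether they look exactly like
this in a given RTT release is an assumption, and the pointer types are from
memory):

void reportPort( RTT::PortInterface* port )
{
    if ( RTT::ConnectionInterface::shared_ptr conn = port->connection() )
    {
        RTT::BufferBase::shared_ptr buf = conn->getBuffer();
        if ( buf && buf->empty() )
            return;        // buffered connection with no data: log nothing this cycle
    }
    // ... otherwise read the data source and log the sample as before ...
}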

>
> And I totally agree on the fact that a particular algorithm should be
> able to expect one or the other flow type.

ok.

>
> > My conclusion is that although a DataSourcePop would solve this for the
> > Reporter, it still has these issues:
> > 1) can not distinguish empty buffer from buffer containing default
> > value. 2) only allows one component to read the buffer, because it
> > removes a value from the flow.
> > 3) I don't like the 'micromanagement' of setting 'peeking' flags
> >
> > I see these solutions to get a better deal:
> > * provide a 'broadcast' buffered data flow (buffer per reader)
> > => solves 2): just consume your buffer.
> > * Extend the reporter such that it knows it is handling a buffer or
> > datasource ( getting a pointer to BufferBase ),
> > => solves 1): only read if not empty.
> > * Create a specific data source which pops an element from a buffer,
> > similar to the front() case.
> > => solves 3)
>
> I do agree on the general idea. But I think that with the broadcast
> connection, we could have a sane solution by adding the following to
> DataSources :
> * make the datasource interface have a hasData() method. Even data
> connections can be uninitialized (i.e. never written)

No, they aren't. The writer port always sets the initial value of the dataport
when the connection is created. The rule is that a datasource always
hasData(). There is only one exception, and that's when you definitely know
what you're doing: ask the associated BufferBase if there's data.

> * make the buffereddatasource pop(). Not a problem anymore since
> buffered connections are truly one-to-many.

Would you propose making buffered connections one-to-many by default? We
need to have a broader user discussion about this.

I won't change BufferDataSource from front() to pop(). Although this would
help the reporter, there's plenty of other code that assumes it won't pop and
that it can be read as many times as wanted. We'll have to add a third
alternative to getBuffer()/getDataSource() in ConnectionInterface, one which
returns a datasource with pop() behaviour in case the connection is
buffered.

>
> I would also personally like to have a way to know whether or not the
> connection has been updated since the last time I read it. Maybe
> managing a timestamp in the connection (time of last write). Or a
> sequence number. Either would then need to be read atomically with the
> data.

For this you're on your own. Orocos won't support this, because it's part of
the data representation and thus part of the user's responsibility. If you want
timestamps, create them yourself using a

typedef std::pair<MyData, RTT::TimeService::ticks> timestamped_data_t;
BufferPort< timestamped_data_t > my_timestamped_port;

Where you push and pop an (element, time stamp) pair.
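For example (continuing the snippet above; 'MyData', the port name and buffer
size are placeholders, and the BufferPort/TimeService calls are as I recall
them in RTT 1.x):

#include <utility>
#include <rtt/Ports.hpp>
#include <rtt/TimeService.hpp>

struct MyData { double value; };
typedef std::pair<MyData, RTT::TimeService::ticks> timestamped_data_t;

RTT::BufferPort<timestamped_data_t> my_timestamped_port("Samples", 20);

void writer_side( const MyData& sample )
{
    // stamp at write time, then push the pair into the buffer
    my_timestamped_port.Push(
        std::make_pair( sample, RTT::TimeService::Instance()->getTicks() ) );
}

void reader_side()
{
    timestamped_data_t item;
    while ( my_timestamped_port.Pop( item ) )
    {
        // item.first is the sample, item.second the time at which it was written
    }
}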

It would be very easy to register this type with the type system and have the
reporter print it out just the way you want.

Peter

Getting reporters to do 'pop' on buffered connections

> > I do agree on the general idea. But I think that with the broadcast
> > connection, we could have a sane solution by adding the following to
> > DataSources :
> > * make the datasource interface have a hasData() method. Even data
> > connections can be uninitialized (i.e. never written)
>
> No they aren't. The writer port always sets the initial value of the dataport
> when the connection is created. The rule is that a datasource always
> hasData().
And who sets the initial value of the writer port ? And who says that
*all* writer data ports *have* a meaningful initial value ? You would
have to rely on a "special" value which means "there is no data". That
sucks.

> I won't change BufferDataSource from front() to pop(). Although this would
> help the reporter, there's plenty of other code that assumes it won't pop and
> that it can be read as many times as wanted.
That is plain wrong. They can't properly handle buffer connections,
because they can't handle the case where there is no data. "Serious"
applications (as in: not simply logging) cannot use
buffers-as-datasources, or they have to assume there is one special value
which means "no data" (and, again, that sucks).

> We'll have to add a third alternative to getBuffer/getDataSource in
> ConnectionInterface, which returns a datasource which has the pop()
> behaviour, in case the connection is buffered.
I really don't understand why you want that ...

> > I would also personally like to have a way to know whether or not the
> > connection has been updated since the last time I read it. Maybe
> > managing a timestamp in the connection (time of last write). Or a
> > sequence number. Either would then need to be read atomically with the
> > data.
>
> For this you're on your own. Orocos won't support this because it's part of
> the data representation, thus part of the user's responsability. If you want
> timestamps, create them yourself using a
Not really. Being able to say "this port has been updated since the last
time you read it" is part of the data flow interface. The
timestamp/sequence number thing was merely a way to implement this.

Getting reporters to do 'pop' on buffered connections

On Thursday 10 July 2008 15:36:29 Sylvain Joyeux wrote:
> > > I do agree on the general idea. But I think that with the broadcast
> > > connection, we could have a sane solution by adding the following to
> > > DataSources :
> > > * make the datasource interface have a hasData() method. Even data
> > > connections can be uninitialized (i.e. never written)
> >
> > No they aren't. The writer port always sets the initial value of the
> > dataport when the connection is created. The rule is that a datasource
> > always hasData().
>
> And who sets the initial value of the writer port ? And who says that
> *all* writer data ports *have* a meaningful initial value ? You would
> have to rely on a "special" value which means "there is no data". That
> sucks.

I don't want that either. But we need to distinguish between two use cases:
plain 'data ports' and 'buffered data ports'.

In our 'data port' robot control applications we see no need for such values
in data ports. Compare it with your average Simulink model. You causally
calculate the flow from model input to model output. A signal in such a model
always has a value, unless the model is not yet running, in which case most
signals have the value 0.0 or are left 'undefined'. In this case, hasData() is
always true (once the model is running; but even when it is not, we can mostly
supply a safe or initial value), and that's what I meant by the above statement.

In a 'buffered data port' control application, for example a trajectory
interpolator or a video frame processor, the data passed between components is
really a stream of 'items'. In that context, hasData() is indeed meaningful
and you want to distinguish between data and no-data in your algorithm. The
hasData() method however assumes that you have exclusive access to the
buffer; otherwise hasData() could return true, another thread pops the value,
and you get an 'empty' value. In that case, hasData() is part of a race
condition. But if you use a BufferPort, you already have this check using
Pop().

>
> > I won't change BufferDataSource from front() to pop(). Although this
> > would help the reporter, there's plenty of other code that assumes it
> > won't pop and that it can be read as many times as wanted.
>
> That is plain wrong. They can't properly handle the buffer connections
> because then they can't handle when there is no data. "Serious"
> applications (as in: not simply logging) cannot use
> buffers-as-datasource or they have to assume there is one special value
> which means "no data" (and, again, that sucks).

I would advise no serious application to use BufferDataSource/front() for
reading a buffer. It is a race, except in special cases, just as it is for
hasData().

>
> > We'll have to add a third alternative to getBuffer/getDataSource in
> > ConnectionInterface, which returns a datasource which has the pop()
> > behaviour, in case the connection is buffered.
>
> I really don't understand why you want that ...

Because then I have a means of letting serious applications read/pop a buffer
through a data source, while maintaining backwards compatibility. To me this
looked like a nice compromise... We're having this discussion because we both
think the buffered port implementation is broken (and an event-based
alternative is lacking). I don't want to give up the feature of reading the
front() of a buffer through a datasource ('peeking'), and I don't want to
extend the DataSource* implementations even more. For my motivation, look at
the DataSource API reference.

A compromise is allowing new applications to query the connection object to
see if the connection is buffered, and if so, to get a datasource which pops.

If you don't think this is better, could you provide me an example where this
makes things worse ?

>
> > > I would also personally like to have a way to know whether or not the
> > > connection has been updated since the last time I read it. Maybe
> > > managing a timestamp in the connection (time of last write). Or a
> > > sequence number. Either would then need to be read atomically with the
> > > data.
> >
> > For this you're on your own. Orocos won't support this because it's part
> > of the data representation, thus part of the user's responsability. If
> > you want timestamps, create them yourself using a
>
> Not really. Being able to say "this port has been updated since the last
> time you read it" is part of the data flow interface. The
> timestamp/sequence number thing was merely a way to implement this.

OK. Then we need events. We had a similar problem in an application we were
designing today. We wanted to know if a DataPort had been updated. In the
end, we had to cache the previous value and compare it with the current value.
An event would have solved this.
In the case of a BufferPort, this is obviously easier to find out, because you
can just call Pop() and check the return value. Or did you have another case in
mind where this does not suffice?

Do you agree (from your experience) that we need a DataPort and a BufferPort
or are you looking for a way to have only one port type ?

Peter

Getting reporters to do 'pop' on buffered connections

> I dont' want that either. But we need to distinguish between two use cases:
> plain 'data ports' and 'buffered data ports'.
>
> In our 'data port' robot control applications we see no need for such values
> in data ports. Compare it with your average Simulink model. You causally
> calculate the flow from model input to model output. A signal in such a model
> always has a value, unless it was not yet running, then most signals have
> value 0.0, or are left 'undefined'.
Yes. But in Simulink (and in flow-based synchronous languages), you have
a scheduler ensuring that a data reader is called *only* when a value is
available. There is no such thing in Orocos, and it is impossible to
*detect* that (i.e. it is impossible, on the reader side, to recognize
that the value is "undefined" and that therefore there has been some
scheduling fault)

> In this case, hasData() is always true
> (we mostly can supply a safe or initial value), that's what I meant
> with the above statement.
Having a safe initial value is not possible for all applications. It looks like
a workaround rather than a proper solution to me.

> In that case, hasData() is part of a race condition. But if you use a
> BufferPort, you already have this check using Pop().
Not true if you have a proper one-to-many model. Since each reader has
its own buffer, hasData() does not have a race condition anymore (and
neither does front(), for that matter).

> > > I won't change BufferDataSource from front() to pop(). Although this
> > > would help the reporter, there's plenty of other code that assumes it
> > > won't pop and that it can be read as many times as wanted.
> >
> > That is plain wrong. They can't properly handle the buffer connections
> > because then they can't handle when there is no data. "Serious"
> > applications (as in: not simply logging) cannot use
> > buffers-as-datasource or they have to assume there is one special value
> > which means "no data" (and, again, that sucks).
>
> I would no serious application advise to use BufferDataSource/front() for
> reading a buffer. It is a race, except in special cases, just like it is for
> hasData().

So ... You're actually defending a use-case (the current implementation
of datasources on buffers) which is broken from your very own point of
view ;-). But anyway, go on reading.

> > > We'll have to add a third alternative to getBuffer/getDataSource in
> > > ConnectionInterface, which returns a datasource which has the pop()
> > > behaviour, in case the connection is buffered.
> >
> > I really don't understand why you want that ...
>
> Because then I have a means to offer serious applications to read/pop a buffer
> using a data source, while maintaining backwards compatibility. To me this
> looked like a nice compromise...
For me, it is making Orocos uglier by cluttering the API (which is
already quite complex).

> I don't want to give up the feature to read the front() of a buffer
> through a datasource ('peeking')
Again, with a one-to-many connection implementation, Pop() *is* peeking!

> A compromise is allowing new applications to query the connection object to
> see if the connection is buffered, and if so, to get a datasource which pops.
>
> If you don't think this is better, could you provide me an example where this
> makes things worse ?
Well ... To somehow quote you: it will work, but is ugly from a model
point of view.

> > Not really. Being able to say "this port has been updated since the last
> > time you read it" is part of the data flow interface. The
> > timestamp/sequence number thing was merely a way to implement this.
>
> OK. Then we need events. We had a similar problem in an application we were
> designing today. We wanted to know if a DataPort had been updated. In the
> end, we had to cache the previous value and compare with the current value.
> An event would have solved this.
Mmmm ... I think that having this directly on the dataflow is an
interesting feature (regardless of the event-based thing), because it
allows periodic tasks to check their input ports without adding
complexity.

> In case of a BufferPort, this is obviously easier to find out because you can
> just call Pop() and check the return value. Or did you have another case in
> mind where this does not suffice ?
For BufferPort, it is enough. For DataPort, there is no way to know it.

> Do you agree (from your experience) that we need a DataPort and a BufferPort
> or are you looking for a way to have only one port type ?
Not at all. I do think that there is a need for both port types (i.e. a
port where there is always data once it has been written, and one which
is a stream of items).

To summarize: I personally think that the backward compatibility
argument does not hold here, because by having a one-to-many connection
*and* having buffer-as-datasource pop, you actually *fix* everything:
* you remove the race condition on front()
* you remove the race condition of the many-readers case
* to nail things down, I also think that by changing "result_t read()"
into "bool read(result_t& value)" you fix the problem that a dataport
*may* have never been written and therefore *may* be undefined.
You can always keep result_t read() for backward compatibility
anyway.

Other than that, you are not changing anything! If more than one reader
Pops in current applications, the application is broken, and if there is
one popper and multiple peekers, then you have a race condition anyway!
OK, some applications that are already broken will *really* break.
Big deal. It seems strange to me to allow broken use cases to go on just
for the sake of backward compatibility. If they are broken, then they
will know it.

Sylvain

Getting reporters to do 'pop' on buffered connections

Hi,

I have been away with the RoboCup team in China, and this event inspired me a
lot regarding distributed communication, 'quality of data' and the whole data
flow model. In a way, much of what I saw in practice confirmed Sylvain's
points. I also learnt that 'we' do not have enough experience with truly
distributed systems ('swarms') with unreliable communication channels (we
were using WiFi), mixed with reliable 'localhost' communication. In fact, it
led me to believe that the current implementation of the Orocos RTT is not
ready to offer support for such forms of data flow. Fortunately, we're
talking software here; there ain't a thing that can't be fixed.

Anyway...

On Thursday 10 July 2008 18:42:42 Sylvain Joyeux wrote:
>
> Yes. But in Simulink (and in flow-based synchronous languages), you have
> a scheduler ensuring that a data reader is called *only* when a value is
> available. There is no such thing in Orocos, and it is impossible to
> *detect* that (i.e. it is impossible, on the reader side, to recognize
> that the value is "undefined" and that therefore there has been some
> scheduling fault)

Indeed. This is especially true for distributed systems, because
scheduling is, by architecture, unreliable.

>
> > In this case, hasData() is always true
> > (we mostly can supply a safe or initial value), that's what I meant
> > with the above statement.
>
> Having a safe initial value is not true for all applications. Looks like
> a workaround rather than a proper solution for me.

I agree. But it could be even worse. What if the value is much too 'old'? What
if we should only use it if it is younger than 20ms? hasData() is not even
sufficient in this scenario. It's an incomplete quality check.

>
> > In that case, hasData() is part of a race condition. But if you use a
> > BufferPort, you already have this check using Pop().
>
> Not true if you have a proper one-to-many model. Since each reader has
> its own buffer, hasData() does not have a race condition anymore (and
> neither front() for that matter).

With the one-to-many model, there is still the problem of a process that is
stopped, whose buffer fills up, and which then starts again. How can it know
that it is first processing old data? And what if buffers are emptied at
different paces? What is the size() of the buffer, when is it full(), what
if different buffer sizes are used for each reader, what's the capacity?
When will Push() return false? Our API is biased towards the fact that
there's only one buffer with a known state. I'd like to see how you implement
a proper one-to-many model.

[...]

> > I don't want to give up the feature to read the front() of a buffer
> > through a datasource ('peeking')
>
> Again, when having a one-to-many connection implementation, Pop() *is*
> peeking !

No, it's popping. peek/front() does not influence the state of the buffer,
Pop() does. But maybe front() does not belong in a one-to-many system.
It's clear that the current API is not fit for describing it...

> > > Not really. Being able to say "this port has been updated since the
> > > last time you read it" is part of the data flow interface. The
> > > timestamp/sequence number thing was merely a way to implement this.
> >
> > OK. Then we need events. We had a similar problem in an application we
> > were designing today. We wanted to know if a DataPort had been updated.
> > In the end, we had to cache the previous value and compare with the
> > current value. An event would have solved this.
>
> Mmmm ... I think that having this directly on the dataflow is an
> interesting feature (regardless of the event-based thing), because it
> allows periodic tasks to check their input ports without adding
> complexity.

You mean adding a method to the port for checking 'age' ?

[...]
>
> As a summary: I personally think that the backward compatibility
> argument does not hold here, because by having a one-to-many connection
> *and* having buffer-as-datasource pop you actually *fix* everything:

( Assuming you drop support for other buffer architectures ! Imagine filling a
buffer with data from a network and two threads emptying that buffer on a
dual-core system. )

> * you remove the race condition on front()
> * you remove the race condition of the many reader case
> * to nail things down, I also think that by changing "result_t read()"
> into "bool read(result_t& value)" you fix the problem that a dataport
> *may* have never been written and therefore *may* be undefined.
> You can always keep result_t read() for backward compatibility
> anyway.

Yes. But recently I came to the conclusion that for distributed systems the
bool read(value) is not even sufficient.

>
> Other than that, you are not changing anything ! If more than one reader
> Pops in current applications, the application is broken,

For the record: it's not broken. It's just a different kind of application
from the one you have in mind.

> and if there is
> one pop-er and multiple peek-ers, then you have a race condition anyway
> !

That is true, but you'd have to return something. Popping from the buffer is
not acceptable here, because it influences the data flow in this architecture.
If you want to pop in the datasource way of reading, you need a
buffer-per-reader connection architecture.

> OK, some applications that are already broken will *really* break.
> Big deal. It seems strange to me to allow broken use-cases to go on just
> for the sake of backward compatibility. If they are broken, then they
> will know it.

There's breaking and there's fixing. I'm not willing to introduce a new data
flow model (one buffer per reader) and drop the first (one buffer). That
would be breaking. On the other hand, the data flow model clearly needs
fixing, and we won't get there without code and consensus.

These were our basic requirements:

* Support for data and buffer flows between components
* Support for transparent local and distributed communication
* Support for peeking at the current state of a connection

We solved these requirements by:
* Having a central data object and a central buffer object per connection
* Using pointers for in-process communication and CORBA for all other communication
* Providing a data source which returns the next value to be read.

It looks like these are the additional requirements:

* Support for one-buffer-per-reader
* Knowing the quality of the data in the buffer or data port
* Having a notification mechanism for new data.

We could satisfy these by:

* Allowing a way to specify the default connection type for buffers. The
buffer's read port looks like a candidate for specifying this ("I want my own
data" / "share it with others"). In addition, the semantics of the API must be
worked out for this case.
* Having an additional Get/Pop function returning quality information as
well.
* Associating Events with ports (see the other sub-thread for discussion).

So much for changing the architecture. I'm not happy at all with the current
data flow implementation either. For inter-process or network communication,
CORBA is inefficient. For local communication, shared memory or Unix sockets
would be more efficient. For network distribution, multicast or broadcast
messages offer significant advantages. However, the latter rely on UDP,
which is unreliable for a distributed buffer implementation. So I don't have
the full solution yet, and my gut feeling tells me that we're not finished on
the architecture part either.

I'll be on holiday for the next three weeks, so you'll have a lot of time
to think these arguments over (and to propose a patch).

Peter

Getting reporters to do 'pop' on buffered connections

> >
> > > In this case, hasData() is always true
> > > (we mostly can supply a safe or initial value), that's what I meant
> > > with the above statement.
> >
> > Having a safe initial value is not true for all applications. Looks like
> > a workaround rather than a proper solution for me.
>
> I agree. But it could be even worse. What if the value is much to 'old' ? What
> if we should only use it if it is younger than 20ms ? hasData() is in this
> scenario even not sufficient. It's an incomplete quality check.
That is two different things. In one case, we are talking about having
and not having data. In the other, about having improper data. The first
one is very generic indeed. The second one is application specific. See
the bottom of the mail for further explanations.

> > > In that case, hasData() is part of a race condition. But if you use a
> > > BufferPort, you already have this check using Pop().
> >
> > Not true if you have a proper one-to-many model. Since each reader has
> > its own buffer, hasData() does not have a race condition anymore (and
> > neither front() for that matter).
>
> With the one-to-many model, there is still the problem of a process that is
> stopped, the buffer fills and then the process starts again. How can it know
> that it is first processing old data ? And what if buffers are emptied at
> different paces ? what is the size() of the buffer, when is it full(), what
> if different buffer sizes are used for each reader, what's the capacity ?
> When will Push() return false ? Our API is biased towards the fact that
> there's only one buffer with a known state. I'd like to see how you implement
> a proper one-to-many model.

By having the writer completely ignore the state of the buffer -- as it
should be since we are talking about a component model. The writer only
pushes data around. The job of getting the data at the right place is
the burden of the framework, and the one of knowing if the data is valid
or not the burden of the readers. So, to answer your question, from the
point of view of the writer:

There is no capacity, there is no full(), and Push() always succeeds. It
is not the writer's job to handle the connection "right". If the buffer
is too small and you are losing data when you should not, then it is a
runtime error *at the system level* and has to be treated as such by
higher-level supervision tools.
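
To make the writer's point of view concrete, here is a rough sketch of the
kind of fan-out I have in mind (the names ReaderBuffer and FanOutWritePort
are invented, and locking is left out for clarity -- this is not the current
RTT implementation):

#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical per-reader buffer: on overflow the oldest sample is silently
// dropped, and detecting that loss is a system-level (supervision) concern.
template <typename T>
struct ReaderBuffer {
    std::deque<T> queue;
    std::size_t   capacity = 16;   // each reader picks its own size

    void push(const T& sample) {
        if (queue.size() >= capacity)
            queue.pop_front();
        queue.push_back(sample);
    }
};

// The writer never sees capacity, full() or a failing Push(): the framework
// simply fans each sample out to every reader's private buffer.
template <typename T>
class FanOutWritePort {
public:
    void addReader(std::shared_ptr< ReaderBuffer<T> > reader) {
        readers_.push_back(reader);
    }

    void Push(const T& sample) {
        for (std::size_t i = 0; i < readers_.size(); ++i)
            readers_[i]->push(sample);
    }

private:
    std::vector< std::shared_ptr< ReaderBuffer<T> > > readers_;
};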

> > > I don't want to give up the feature to read the > front() of a buffer
> > > through a datasource ('peeking')
> >
> > Again, when having a one-to-many connection implementation, Pop() *is*
> > peeking !
>
> No, it's popping.
It is popping the private buffer of the reader, so the other readers do
not see any changes.

> > > > Not really. Being able to say "this port has been updated since the
> > > > last time you read it" is part of the data flow interface. The
> > > > timestamp/sequence number thing was merely a way to implement this.
> > >
> > > OK. Then we need events. We had a similar problem in an application we
> > > were designing today. We wanted to know if a DataPort had been updated.
> > > In the end, we had to cache the previous value and compare with the
> > > current value. An event would have solved this.
> >
> > Mmmm ... I think that having this directly on the dataflow is an
> > interesting feature (regardless of the event-based thing), because it
> > allows periodic tasks to check their input ports without adding
> > complexity.
>
> You mean adding a method to the port for checking 'age' ?
No. Adding a way to know if the data has changed since last time. See
the bottom of the mail about the whole timestamping thing.
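
Something as simple as a write counter on the port would be enough for a
periodic reader. A rough sketch (CountedDataPort is an invented name, and a
real implementation would need a lock-free copy of the value):

#include <atomic>

// Hypothetical: a data port that counts writes, so a periodic reader can
// tell whether the value has changed since its previous read.
template <typename T>
class CountedDataPort {
public:
    void Set(const T& value) {
        value_ = value;                                  // needs a lock-free copy in real code
        writes_.fetch_add(1, std::memory_order_release);
    }

    // Returns true only if at least one Set() happened since this reader's
    // last successful call; 'last_seen' is the reader's own bookkeeping.
    bool readIfNew(T& out, unsigned& last_seen) const {
        const unsigned current = writes_.load(std::memory_order_acquire);
        if (current == last_seen)
            return false;
        out = value_;                                    // same caveat as above
        last_seen = current;
        return true;
    }

private:
    T                     value_{};
    std::atomic<unsigned> writes_{0};
};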

> > As a summary: I personally think that the backward compatibility
> > argument does not hold here, because by having a one-to-many connection
> > *and* having buffer-as-datasource pop you actually *fix* everything:
>
> ( Assuming you drop support for other buffer architectures ! Imagine filling a
> buffer with data from a network and two threads emptying that buffer on a
> dual-core system. )
Then it is the problem of the dual-core thing to properly synchronize if
need be. Maybe our disagreement comes from the fact that I'm really not
seeing that whole thing at the buffer level (connection implementation),
but at the port level (endpoint implementation). I am talking about
buffered connections, not buffer objects. The buffer objects should be
opaque implementation details that are not used by applications.

> > Other than that, you are not changing anything ! If more than one reader
> > Pops in current applications, the application is broken,
> For the record: it's not broken. It's just a different kind of application
> than the one you're having in your mind.
No. When you are reading data through the current peeking interface, you
don't know if you are reading data or whatever meaningless default value
which is initialized by the default constructor of the data type. I call
that broken by all standards.

> > and if there is
> > one pop-er and multiple peek-ers, then you have a race condition anyway
> > !
>
> That is true, but you'd have to return something. Popping from the buffer is
> not acceptable here because it influences the data flow in this architecture.
> If you want to pop in the datasource-way of reading, you need a buffer-per
> reader connection architecture.
Yes ! Exactly ! Thank you. That is the proper one-to-many
implementation: having one buffer per reader.

> > OK, some applications that are already broken will *really* break.
> > Big deal. It seems strange to me to allow broken use-cases to go on just
> > for the sake of backward compatibility. If they are broken, then they
> > will know it.
>
> There's breaking and there's fixing. I'm not willing to introduce a new data
> flow model (one buffer per reader) and dropping the first (one buffer).
Given that the current model is only valid when there is one (buffered)
reader per connection, the following holds:
in all valid uses, the one-buffer and the one-buffer-per-reader are
exactly the same.

> * Support for peeking the current state of a connection
I'd like you to give me *one* valid example for the usefulness of this,
which is not provided by the one-buffer-per-reader case.

> We could satisfy these by :
>
> * Allowing a way to specify the default connection type for buffers. The
> buffer's read port looks like a candidate to specify this ("I want my own
> data / - share it with others"). In addition, the semantics of the API must be
> found for this case.
You are talking about making the whole API even more complicated. The
current problem of Orocos -- and that is why I have a hard time
convincing people about using it at all -- is that the whole API is very
complicated. I'm convinced that going for one-buffer-per-reader is
extending the model and not changing it (see above), so there is no need
to make it even more complicated.

> * Having an additional Get/Pop function returning quality information as
> well.
No. Quality and timestamping are *very* application dependent, and since
they are tied to the data sample anyway, the current API is valid for them.
* the timestamp of a data sample depends on the data fusion process(es). For
instance, if you have an image processing pipeline, the timestamp of
the processed image is actually the same as that of the source
image (the information in the processed image is valid at the time
the source image was acquired).
* a quality factor depends entirely on the kind of data you are
passing around.

Moreover, "good" data fusion algorithms handle old data seamlessly (see
distributed data filters for an example).
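
In other words, the stamp belongs inside the sample. A small sketch of what I
mean (StampedImage and undistort are just illustrations, not anything that
exists in RTT):

#include <chrono>
#include <vector>

// Application-level timestamping: the stamp travels inside the sample itself,
// so a processing pipeline can propagate the acquisition time unchanged.
struct StampedImage {
    std::chrono::steady_clock::time_point acquired_at;
    std::vector<unsigned char>            pixels;
};

// The processed image keeps the *source* timestamp: its information is valid
// at the moment the source image was acquired.
StampedImage undistort(const StampedImage& src) {
    StampedImage out;
    out.acquired_at = src.acquired_at;   // propagate, do not re-stamp
    out.pixels      = src.pixels;        // stand-in for the real processing
    return out;
}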

> * Associate Events to ports (see the other sub-thread for discussion).
I have a patch for this. I'll send it when it has been more tested.

Getting reporters to do 'pop' on buffered connections

On Wednesday 23 July 2008 11:46:36 Sylvain Joyeux wrote:
> > > > In this case, hasData() is always true
> > > > (we mostly can supply a safe or initial value), that's what I meant
> > > > with the above statement.
> > >
> > > Having a safe initial value is not true for all applications. Looks
> > > like a workaround rather than a proper solution for me.
> >
> > I agree. But it could be even worse. What if the value is much too 'old' ?
> > What if we should only use it if it is younger than 20ms ? hasData() is
> > in this scenario not even sufficient. It's an incomplete quality check.
>
> That is two different things. In one case, we are talking about having
> and not having data. In the other, about having improper data. The first
> one is very generic indeed. The second one is application specific. See
> the bottom of the mail for further explanations.

The two are very closely related. What if the application starts, the port is
written and then the writer crashes or hangs ? hasData() will remain true
forever, unless you let it become false after it has been called once, until
the next write sets it to true again. If you're providing 'quality'
information, why not cover the complete spectrum, i.e. when the last write
happened ?
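
As a sketch of what I mean (TimedDataPort is an invented name, not an RTT
class): a port that remembers the time of the last write lets the reader
reject values that are too old, or that were never written at all because
the writer died before its first Set().

#include <chrono>
#include <optional>

// Hypothetical data port that records when it was last written, so a reader
// can decide for itself whether the value is still usable.
template <typename T>
class TimedDataPort {
public:
    void Set(const T& value) {
        value_ = value;
        last_write_ = std::chrono::steady_clock::now();
    }

    // Returns the value only if it was written within 'max_age'.
    std::optional<T> GetIfFresh(std::chrono::steady_clock::duration max_age) const {
        if (!last_write_ ||
            std::chrono::steady_clock::now() - *last_write_ > max_age)
            return std::nullopt;          // never written, or too old
        return value_;
    }

private:
    T                                                     value_{};
    std::optional<std::chrono::steady_clock::time_point>  last_write_;
};

A reader that only accepts values younger than 20ms would then call
GetIfFresh(std::chrono::milliseconds(20)) and fall back to its safe value
otherwise.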

>
> > > > In that case, hasData() is part of a race condition. But if you use a
> > > > BufferPort, you already have this check using Pop().
> > >
> > > Not true if you have a proper one-to-many model. Since each reader has
> > > its own buffer, hasData() does not have a race condition anymore (and
> > > neither front() for that matter).
> >
> > With the one-to-many model, there is still the problem of a process that
> > is stopped, the buffer fills and then the process starts again. How can
> > it know that it is first processing old data ? And what if buffers are
> > emptied at different paces ? what is the size() of the buffer, when is it
> > full(), what if different buffer sizes are used for each reader, what's
> > the capacity ? When will Push() return false ? Our API is biased towards
> > the fact that there's only one buffer with a known state. I'd like to see
> > how you implement a proper one-to-many model.
>
> By having the writer completely ignore the state of the buffer -- as it
> should be since we are talking about a component model. The writer only
> pushes data around. The job of getting the data at the right place is
> the burden of the framework, and the one of knowing if the data is valid
> or not the burden of the readers. So, to answer your question, from the
> point of view of the writer:
>
> There is no capacity, there is no full(), and Push() always succeeds. It
> is not the writer's job to handle the connection "right". If the buffer
> is too small and you are losing data when you should not, then it is a
> runtime error *at the system level* and has to be treated as such by
> higher-level supervision tools.

This is the first time I completely understand what you are looking for.
You're looking for an equivalent of the CORBA event service, a 'send and
forget' buffered data stream. This requires Orocos event-based reception in
order to process the data at the correct pace.

I was thinking more about the kind of buffers you find in traditional
producer-consumer setups, where the producer fills the buffer until it's full
(non-periodic) and the consumer empties it at its own (periodic) pace.
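
That is the classic single shared bounded buffer; a minimal sketch of the
idea (plain mutex for clarity, so not a real-time or lock-free
implementation, and not the actual RTT BufferPort code):

#include <cstddef>
#include <deque>
#include <mutex>

// One shared bounded buffer in producer-consumer style: Push() tells the
// (non-periodic) producer when the buffer is full, Pop() lets the (periodic)
// consumer drain it at its own pace.
template <typename T>
class SharedBoundedBuffer {
public:
    explicit SharedBoundedBuffer(std::size_t capacity) : capacity_(capacity) {}

    bool Push(const T& item) {              // false when the buffer is full
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_)
            return false;
        queue_.push_back(item);
        return true;
    }

    bool Pop(T& item) {                     // false when the buffer is empty
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return false;
        item = queue_.front();
        queue_.pop_front();
        return true;
    }

private:
    const std::size_t capacity_;
    std::deque<T>     queue_;
    std::mutex        mutex_;
};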

>
> > > > I don't want to give up the feature to read the > front() of a buffer
> > > > through a datasource ('peeking')
> > >
> > > Again, when having a one-to-many connection implementation, Pop() *is*
> > > peeking !
> >
> > No, it's popping.
>
> It is popping the private buffer of the reader, so the other readers do
> not see any changes.

In your application setup this is indeed correct.

[...]
> > You mean adding a method to the port for checking 'age' ?
>
> No. Adding a way to know if the data has changed since last time. See
> the bottom of the mail about the whole timestamping thing

Since the last time... being read ?

>
> > > As a summary: I personally think that the backward compatibility
> > > argument does not hold here, because by having a one-to-many connection
> > > *and* having buffer-as-datasource pop you actually *fix* everything:
> >
> > ( Assuming you drop support for other buffer architectures ! Imagine
> > filling a buffer with data from a network and two threads emptying that
> > buffer on a dual-core system. )
>
> Then it is the problem of the dual-core thing to properly synchronize if
> need be. Maybe our disagreement comes from the fact that I'm really not
> seeing that whole thing at the buffer level (connection implementation),
> but at the port level (endpoint implementation). I am talking about
> buffered connections, not buffer objects. The buffer objects should be
> opaque implementation details that are not used by applications.

This is indeed the root of our disagreement. You're looking for
send-and-forget, while I was looking for a stateful buffer that provides
information about the connection.

>
> > > Other than that, you are not changing anything ! If more than one
> > > reader Pops in current applications, the application is broken,
> >
> > For the record: it's not broken. It's just a different kind of
> > application than the one you're having in your mind.
>
> No. When you are reading data through the current peeking interface, you
> don't know if you are reading data or whatever meaningless default value
> which is initialized by the default constructor of the data type. I call
> that broken by all standards.

We already agreed that the peeking needs fixing. Multiple pops from the same
buffer are not broken by definition; they may just be a way of sharing the
workload between threads/components.

>
> > > and if there is
> > > one pop-er and multiple peek-ers, then you have a race condition anyway
> > > !
> >
> > That is true, but you'd have to return something. Popping from the buffer
> > is not acceptable here because it influences the data flow in this
> > architecture. If you want to pop in the datasource-way of reading, you
> > need a buffer-per reader connection architecture.
>
> Yes ! Exactly ! Thank you. That is the proper one-to-many
> implementation: having one buffer per reader.

Don't underestimate my understanding of this complex matter :-)

>
> > > OK, some applications that are already broken will *really* break.
> > > Big deal. It seems strange to me to allow broken use-cases to go on
> > > just for the sake of backward compatibility. If they are broken, then
> > > they will know it.
> >
> > There's breaking and there's fixing. I'm not willing to introduce a new
> > data flow model (one buffer per reader) and dropping the first (one
> > buffer).
>
> Given that the current model is only valid when there is one (buffered)
> reader per connection, the following holds:
> in all valid uses, the one-buffer and the one-buffer-per-reader are
> exactly the same.

I agree that this holds for all distributed component architectures, and that
there is no other way to do it for such architectures. For some tightly
coupled N-producer-M-consumer architectures, sharing one buffer might just be
what the application needs. For a 1-producer-M-consumer setup, though, we
could solve it by letting the producer do a round-robin over
one-reader-per-consumer connections, one to each consumer.
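
A sketch of that round-robin idea (RoundRobinWriter is an invented name; the
WritePortT parameter stands for whatever per-consumer write port the
framework provides):

#include <cstddef>
#include <vector>

// Hypothetical round-robin writer for a 1-producer-M-consumer setup: every
// sample goes to exactly one of the per-consumer connections, so the M
// consumers share the workload without sharing a buffer.
template <typename T, typename WritePortT>
class RoundRobinWriter {
public:
    void addConsumer(WritePortT* port) { ports_.push_back(port); }

    void Push(const T& sample) {
        if (ports_.empty())
            return;
        ports_[next_]->Push(sample);         // exactly one consumer gets this sample
        next_ = (next_ + 1) % ports_.size();
    }

private:
    std::vector<WritePortT*> ports_;
    std::size_t              next_ = 0;
};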

>
> > * Support for peeking the current state of a connection
>
> I'd like you to give me *one* valid example for the usefulness of this,
> which is not provided by the one-buffer-per-reader case.

I can't ! But I can provide valid scenarios for single-shared buffers. This
does not mean I don't accept your one-buffer-per-reader solution. You've
convinced me that it is necessary for distributed applications, and many
other frameworks have taken this form of communication as a standard. It
works, it's proven. But if we accept it as the only valid scheme, it means
that single-shared-buffer applications break and need to be fixed by setting
up threads within producer and consumer which emulate the current
implementation over one writer-reader connection. I'll be frank: I don't have
any knowledge of a current Orocos application that works like this. I don't
know of any application that will break because of your change. Buffers have
always been the least used type of data flow, probably because of their
current implementation. But that doesn't make it easier to drop it
completely.

>
> > We could satisfy these by :
> >
> > * Allowing a way to specify the default connection type for buffers. The
> > buffer's read port looks like a candidate to specify this ("I want my own
> > data / - share it with others"). In addition, the semantics of the API
> > must be found for this case.
>
> You are talking about making the whole API even more complicated. The
> current problem of Orocos -- and that is why I have a hard time
> convincing people about using it at all -- is that the whole API is very
> complicated. I'm convinced that going for one-buffer-per-reader is
> extending the model and not changing it (see above), so there is no need
> to make it even more complicated.

Well, you're indeed simplifying the buffer connection model to make it a
perfect fit for distributed data processing.

>
> > * Having an additional Get/Pop function returning quality information as
> > well.
>
> No. Quality and timestamping are *very* application dependent, and since
> they are tied to the data sample anyway, the current API is valid for them.
> * the timestamp of a data sample depends on the data fusion process(es).
> For instance, if you have an image processing pipeline, the timestamp of
> the processed image is actually the same as that of the source image
> (the information in the processed image is valid at the time the source
> image was acquired).
> * a quality factor depends entirely on the kind of data you are
> passing around.

I'm talking about the quality of the distributed network, when data (as in
RTT::DataPort) is shared across nodes (like a distributed world model). Given
unreliable networks, information may age, and currently there is no way to
know how reliable the information is. What I learnt is that this is an
intrinsic aspect of distributed computing: there are no guarantees
whatsoever, so you can't rely on the absolute correctness of the data. The
only thing you can do is provide a quality measure and take your decision
based on that. In case you want to timestamp captured images, you are indeed
on your own; that's not the information I'm talking about. I'm talking about
the "quality of service" of the 'connection' itself. For local communication,
the quality is almost absolute (if the sender sends (writes into memory), the
receiver can receive it), but not so in distributed systems.
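
A sketch of the kind of receiver-side bookkeeping I mean (ReceivedData is an
invented name; in a real system the arrival stamp would be set by the
transport layer):

#include <algorithm>
#include <chrono>

// Hypothetical quality-of-service measure for a distributed DataPort: the
// receiver stamps every arrival, and quality() decays from 1.0 (just
// received) to 0.0 (older than the acceptable horizon, or never received).
template <typename T>
class ReceivedData {
public:
    explicit ReceivedData(std::chrono::milliseconds horizon) : horizon_(horizon) {}

    void onArrival(const T& value) {
        value_ = value;
        received_at_ = std::chrono::steady_clock::now();
        ever_received_ = true;
    }

    double quality() const {                 // 1.0 = fresh, 0.0 = unusable
        if (!ever_received_)
            return 0.0;
        const auto age = std::chrono::steady_clock::now() - received_at_;
        const double ratio = std::chrono::duration<double>(age) /
                             std::chrono::duration<double>(horizon_);
        return std::max(0.0, 1.0 - ratio);
    }

    const T& value() const { return value_; }

private:
    std::chrono::milliseconds             horizon_;
    std::chrono::steady_clock::time_point received_at_{};
    bool                                  ever_received_ = false;
    T                                     value_{};
};

The consumer then decides, based on quality(), whether to use the value or to
fall back on something else -- which is exactly the 'no guarantees' situation
described above.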

>
> Moreover, "good" data fusion algorithms handle old data seamlessly (see
> distributed data filters for an example).
>
> > * Associate Events to ports (see the other sub-thread for discussion).
>
> I have a patch for this. I'll send it when it has been more tested.

Good. I'll be on holidays, this is really my last mail :-)

Peter

Getting reporters to do 'pop' on buffered connections

On Tue, 22 Jul 2008, Peter Soetens wrote:

> I have been away with the Robocup team in China and this event inspired me a
> lot regarding distributed communication, 'quality of data' and the whole data
> flow model. In a way, much of what I saw in practice confirmed Sylvain's
> points. I also learnt that 'we' do not have enough experience with truly
> distributed systems ('swarms') with unreliable communication channels (we
> were using Wifi), mixed with reliable 'localhost' communication. In fact, it
> led me to believe that the current implementation of the Orocos RTT is not
> ready to offer support for such forms of data flow. Fortunately, we're
> talking software here; there ain't a thing that can't be fixed.

What exactly is missing in RTT...? As far as I can remember, such
"middleware issues" have always been _deliberately_ outside of the
_development_ scope of Orocos (not outside of the application scope,
obviously). In other words: RTT assumes you have reliable data ports, and
the robustness should be implemented outside of RTT, in _both_ the
communication middleware and the application programs...

I think it is high time to get the middleware projects (Orca2, Miro, ...)
in the loop (you surely remember that this was what we already had in mind when
starting Orocos in 2001 :-)).

> Anyway...

Yes, anyway... Anyway, the integration between reliable
communication middleware, Orocos and application libraries will be the
focus of a new European FP7 project that I will participate in, that will
have half a dozen or so programmers full time for about four years,
starting somewhere next winter :-) Remember this name: "BRICS" (Best
practices in robotics) :-)

Herman

PS Peter, hopefully you can influence the Eindhoven Robocup team enough in
order for them to become the prototype and proving ground for this robust
realtime distributed data ports project! :-)

> On Thursday 10 July 2008 18:42:42 Sylvain Joyeux wrote:
>>
>> Yes. But in Simulink (and in flow-based synchronous languages), you have
>> a scheduler ensuring that a data reader is called *only* when a value is
>> available. There is no such thing in Orocos, and it is impossible to
>> *detect* that (i.e. it is impossible, on the reader side, to recognize
>> that the value is "undefined" and that therefore there has been some
>> scheduling fault)
>
> Indeed. This is especially true for distributed systems, because
> scheduling is unreliable by architecture.
>
>>
>>> In this case, hasData() is always true
>>> (we mostly can supply a safe or initial value), that's what I meant
>>> with the above statement.
>>
>> Having a safe initial value is not true for all applications. Looks like
>> a workaround rather than a proper solution for me.
>
> I agree. But it could be even worse. What if the value is much too 'old' ? What
> if we should only use it if it is younger than 20ms ? hasData() is in this
> scenario not even sufficient. It's an incomplete quality check.
>
>>
>>> In that case, hasData() is part of a race condition. But if you use a
>>> BufferPort, you already have this check using Pop().
>>
>> Not true if you have a proper one-to-many model. Since each reader has
>> its own buffer, hasData() does not have a race condition anymore (and
>> neither front() for that matter).
>
> With the one-to-many model, there is still the problem of a process that is
> stopped, the buffer fills and then the process starts again. How can it know
> that it is first processing old data ? And what if buffers are emptied at
> different paces ? what is the size() of the buffer, when is it full(), what
> if different buffer sizes are used for each reader, what's the capacity ?
> When will Push() return false ? Our API is biased towards the fact that
> there's only one buffer with a known state. I'd like to see how you implement
> a proper one-to-many model.
>
> [...]
>
>>> I don't want to give up the feature to read the > front() of a buffer
>>> through a datasource ('peeking')
>>
>> Again, when having a one-to-many connection implementation, Pop() *is*
>> peeking !
>
> No, it's popping. peek/front() does not influence the state of the buffer,
> Pop() does. But maybe front() does not belong in a one-to-many system.
> It's clear that the current API is not fit for describing it...
>
>>>> Not really. Being able to say "this port has been updated since the
>>>> last time you read it" is part of the data flow interface. The
>>>> timestamp/sequence number thing was merely a way to implement this.
>>>
>>> OK. Then we need events. We had a similar problem in an application we
>>> were designing today. We wanted to know if a DataPort had been updated.
>>> In the end, we had to cache the previous value and compare with the
>>> current value. An event would have solved this.
>>
>> Mmmm ... I think that having this directly on the dataflow is an
>> interesting feature (regardless of the event-based thing), because it
>> allows periodic tasks to check their input ports without adding
>> complexity.
>
> You mean adding a method to the port for checking 'age' ?
>
> [...]
>>
>> As a summary: I personally think that the backward compatibility
>> argument does not hold here, because by having a one-to-many connection
>> *and* having buffer-as-datasource pop you actually *fix* everything:
>
> ( Assuming you drop support for other buffer architectures ! Imagine filling a
> buffer with data from a network and two threads emptying that buffer on a
> dual-core system. )
>
>> * you remove the race condition on front()
>> * you remove the race condition of the many reader case
>> * to nail things down, I also think that by changing "result_t read()"
>> into "bool read(result_t& value)" you fix the problem that a dataport
>> *may* have never been written and therefore *may* be undefined.
>> You can always keep result_t read() for backward compatibility
>> anyway.
>
> Yes. But recently I came to the conclusion that for distributed systems the
> bool read(value) is not even sufficient.
>
>>
>> Other than that, you are not changing anything ! If more than one reader
>> Pops in current applications, the application is broken,
>
> For the record: it's not broken. It's just a different kind of application
> than the one you're having in your mind.
>
>> and if there is
>> one pop-er and multiple peek-ers, then you have a race condition anyway
>> !
>
> That is true, but you'd have to return something. Popping from the buffer is
> not acceptable here because it influences the data flow in this architecture.
> If you want to pop in the datasource-way of reading, you need a buffer-per
> reader connection architecture.
>
>> OK, some applications that are already broken will *really* break.
>> Big deal. It seems strange to me to allow broken use-cases to go on just
>> for the sake of backward compatibility. If they are broken, then they
>> will know it.
>
> There's breaking and there's fixing. I'm not willing to introduce a new data
> flow model (one buffer per reader) and dropping the first (one buffer). That
> would be breaking. On the other hand, the data flow model clearly needs
> fixing, and we won't get there without code and consensus.
>
> These were our basis requirements:
>
> * Support for data and buffer flows between components
> * Support for transparent local and distributed communication
> * Support for peeking the current state of a connection
>
> We solved these requirements by
> * Having a central data object and a central buffer object per connection
> * Using pointer for in-process and CORBA for all other communication
> * Providing a data source which returns the next to be read value.
>
> It looks like these are the additional requirements:
>
> * Support for one-buffer-per-reader
> * Knowing the quality of the data in the buffer or data port
> * Having a notification mechanism for new data.
>
> We could satisfy these by :
>
> * Allowing a way to specify the default connection type for buffers. The
> buffer's read port looks like a candidate to specify this ("I want my own
> data / - share it with others"). In addition, the semantics of the API must be
> found for this case.
> * Having an additional Get/Pop function returning quality information as
> well.
> * Associate Events to ports (see the other sub-thread for discussion).
>
> So much for changing the architecture. I'm also not happy with the current
> data flow implementation. For inter-process or network communication, CORBA
> is inefficient. For local communication, shared memory or Unix sockets would
> be more efficient. For network distribution, multicast or broadcast messages
> offer significant advantages. However, the latter rely on UDP, which is
> unreliable for a distributed buffer implementation. So I don't have the full
> solution yet, and my gut feeling tells me that we're not finished on the
> architecture part either.
>
> I'll be on holidays for the next three weeks, so you'll have a lot of time
> to think these arguments over (and propose a patch).
>
> Peter

Getting reporters to do 'pop' on buffered connections

> Yes, anyway... Anyway, the integration between reliable
> communication middleware, Orocos and application libraries will be the
> focus of a new European FP7 project that I will participate in, that will
> have half a dozen or so programmers full time for about four years,
> starting somewhere next winter :-) Remember this name: "BRICS" (Best
> practices in robotics) :-)
One thing that is very nice in the current Orocos implementation is that
the C++ framework is decoupled from the transport. We are for instance
considering the implementation of an Ice transport to replace Corba.

I hope that one of your design goals is to keep on with that philosophy
;-)

Getting reporters to do 'pop' on buffered connections

On Wednesday 23 July 2008 12:23:30 Sylvain Joyeux wrote:
> > Yes, anyway... Anyway, the integration between reliable
> > communication middleware, Orocos and application libraries will be the
> > focus of a new European FP7 project that I will participate in, that will
> > have half a dozen or so programmers full time for about four years,
> > starting somewhere next winter :-) Remember this name: "BRICS" (Best
> > practices in robotics) :-)
>
> One thing that is very nice in the current Orocos implementation is that
> the C++ framework is decoupled from the transport. We are for instance
> considering the implementation of an Ice transport to replace Corba.

I thought it was 'impossible' to use Ice, because it does not support Anys.
From my investigation, you'd have to declare an empty Ice 'class' (replacing
the CORBA Any) from which each transportable data object must inherit. But I
might be wrong.

>
> I hope that one of your design goals is to keep on with that philosophy
> ;-)

It is.

Peter

Getting reporters to do 'pop' on buffered connections

On Wed, 23 Jul 2008, Sylvain Joyeux wrote:

>> Yes, anyway... Anyway, the integration between reliable
>> communication middleware, Orocos and application libraries will be the
>> focus of a new European FP7 project that I will participate in, that will
>> have half a dozen or so programmers full time for about four years,
>> starting somewhere next winter :-) Remember this name: "BRICS" (Best
>> practices in robotics) :-)
> One thing that is very nice in the current Orocos implementation is that
> the C++ framework is decoupled from the transport. We are for instance
> considering the implementation of an Ice transport to replace Corba.

That's exactly the kind of flexibility we have always had in mind :-)

Can you say two words on why you want to use Ice? Ice was started by a
bunch of people because they were fed up with the extensiveness of CORBA,
but now they are slowly rebuilding the same kind of framework, although a
bit less flexible and with a GPL license that is not very
industry-friendly...

BRICS will organize "research camps" around various software topics in
advanced robot control systems, and "communication middleware" will most
certainly be the subject of more than one of these. You should come and
participate :-) However, BRICS will only start next year, so maybe we
shouldn't wait that long, and organize an ad-hoc workshop around this theme
earlier...

> I hope that one of your design goals is to keep on with that philosophy
> ;-)
Mine certainly is! :-))))

Herman