PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 24, 2009, at 02:43 , Herman Bruyninckx wrote:

> On Sun, 23 Aug 2009, Sylvain Joyeux wrote:
>
>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>> an input port
>>>> to have multiple incoming channels, and having InputPort::read
>>>> round-robin on
>>>> those channels. As an added nicety, one can listen to the "new
>>>> data" event and
>>>> access only the port for which we have an indication that new
>>>> data can be
>>>> available. Implementing that would require very little added
>>>> code, since the
>>>> management of multiple channels is already present in OutputPort.
>>>
>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>>> that we try that one always first, and only start polling if that
>>> channel turned out 'empty'. This empty detection is problematic for
>>> shared data connections (opposed to buffered), because once they
>>> are
>>> written, they always show a valid value. We might need to add that
>>> once an input port is read, the sample is consumed, and a next read
>>> will return false (no new data).
>>
>> I don't like the idea of read() returning false on an already
>> initialized data
>> connection. If you want a connection telling you if it has been
>> written since
>> last read(), use a buffer. Maybe having read() return a tri-state:
>> NO_SAMPLE,
>> UPDATED_SAMPLE, OLD_SAMPLE with NO_SAMPLE being false ?
>
> I think RTT should only provide the simplest and easiest-to-implement
> policy: each reader gets the last value that was written, and new
> writes
> overwrite that value.
>
> More complex policies belong to dedicated port components, each
> providing
> one (or more) of those policies.

+1

Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 24, 2009, at 07:48 , Herman Bruyninckx wrote:

> On Mon, 24 Aug 2009, S Roderick wrote:
>
>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>
>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>
>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>> an input port
>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>> round-robin on
>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>> data" event and
>>>>>> access only the port for which we have an indication that new
>>>>>> data can be
>>>>>> available. Implementing that would require very little added
>>>>>> code, since the
>>>>>> management of multiple channels is already present in OutputPort.
>>>>>
>>>>> Well, a 'polling' + keeping a pointer to the last read channel,
>>>>> such
>>>>> that we try that one always first, and only start polling if that
>>>>> channel turned out 'empty'. This empty detection is problematic
>>>>> for
>>>>> shared data connections (opposed to buffered), because once they
>>>>> are
>>>>> written, they always show a valid value. We might need to add that
>>>>> once an input port is read, the sample is consumed, and a next
>>>>> read
>>>>> will return false (no new data).
>>>>
>>>> Is there really a usecase for multiple incoming but unbuffered
>>>> connections? It seems to me that the result would be quite
>>>> arbitrary.
>>>
>>> Of course there is. If you think at a more broader scope there could
>>> be a coordination component controlling the individual components
>>> such
>>> that the results are not arbitrary at all.
>>>
>>> In fact this is a good example of explicit vs. implicit
>>> coordination.
>>
>> This is _exactly_ the situation we have in our projects. Multiple
>> components with unbuffered output connections, to a single input
>> connection on another component. A coordination component ensures
>> that
>> only one of the input components is running at a time, but they are
>> all connected.
>>
>> Here, we want the latest data value available. No more, no less.
>>
>> Otherwise, Markus is correct. Having more than one input component
>> running simultaneously would be arbitrary and give nonsense output
>> data.
>
> Indeed... So, the conclusion I draw from this (sub)discussion is the
> following: the _coordinated_ multi-writer use case is so special
> that it
> does not deserve its own feature in the Data Ports part of RTT. (The
> Coordinator will (have to) know about all its "data providers", and
> make/delete the connections to them explicitly. So, there is no need
> to
> "help him out" by this specific data port policy implementation.)

I'm not sure it's that clear cut? Sylvain has been talking about
adding a basic feature to the new data flow implementation that allows
(coordinated) multiple output port, single input port systems, without
any fancy policies. That I believe we still need, as the data flow
implementation as it stands does not provide for this scenario at all.
We need this because Peter has indicated that (re-)connections at runtime
are not realtime, and so the coordinator cannot change port
connections (though it can start/stop components).
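
Sylvain's round-robin idea quoted at the top of the thread would, very
roughly, look like the sketch below; the types here are plain C++
stand-ins, not the actual RTT data flow implementation:

{{{
#include <cstddef>
#include <iostream>
#include <vector>

// Stand-in for one incoming channel of an input port.
struct Channel {
    bool has_data;
    double value;
};

// Sketch of InputPort::read() round-robining over multiple incoming
// channels, remembering the last channel that delivered data so it is
// tried first on the next call.
struct MultiChannelInput {
    std::vector<Channel*> channels;
    std::size_t last;

    MultiChannelInput() : last(0) {}

    bool read(double& sample) {
        for (std::size_t i = 0; i < channels.size(); ++i) {
            std::size_t idx = (last + i) % channels.size();
            if (channels[idx]->has_data) {
                sample = channels[idx]->value;
                channels[idx]->has_data = false;   // consume the sample
                last = idx;                        // try this one first next time
                return true;
            }
        }
        return false;                              // no channel had new data
    }
};

int main() {
    Channel a = { false, 0.0 };
    Channel b = { true, 42.0 };
    MultiChannelInput in;
    in.channels.push_back(&a);
    in.channels.push_back(&b);
    double v;
    if (in.read(v))
        std::cout << "got " << v << "\n";          // prints "got 42"
}
}}}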

Besides that, yes, I agree that no extra features are needed within RTT.

Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, 24 Aug 2009, Markus Klotzbuecher wrote:

> On Mon, Aug 24, 2009 at 01:48:42PM +0200, Herman Bruyninckx wrote:
>> On Mon, 24 Aug 2009, S Roderick wrote:
>>
>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>>
>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>>
>>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>>> an input port
>>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>>> round-robin on
>>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>>> data" event and
>>>>>>> access only the port for which we have an indication that new
>>>>>>> data can be
>>>>>>> available. Implementing that would require very little added
>>>>>>> code, since the
>>>>>>> management of multiple channels is already present in OutputPort.
>>>>>>
>>>>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>>>>>> that we try that one always first, and only start polling if that
>>>>>> channel turned out 'empty'. This empty detection is problematic for
>>>>>> shared data connections (opposed to buffered), because once they
>>>>>> are
>>>>>> written, they always show a valid value. We might need to add that
>>>>>> once an input port is read, the sample is consumed, and a next read
>>>>>> will return false (no new data).
>>>>>
>>>>> Is there really a usecase for multiple incoming but unbuffered
>>>>> connections? It seems to me that the result would be quite arbitrary.
>>>>
>>>> Of course there is. If you think at a more broader scope there could
>>>> be a coordination component controlling the individual components such
>>>> that the results are not arbitrary at all.
>>>>
>>>> In fact this is a good example of explicit vs. implicit coordination.
>>>
>>> This is _exactly_ the situation we have in our projects. Multiple
>>> components with unbuffered output connections, to a single input
>>> connection on another component. A coordination component ensures that
>>> only one of the input components is running at a time, but they are
>>> all connected.
>>>
>>> Here, we want the latest data value available. No more, no less.
>>>
>>> Otherwise, Markus is correct. Having more than one input component
>>> running simultaneously would be arbitrary and give nonsense output data.
>>
>> Indeed... So, the conclusion I draw from this (sub)discussion is the
>> following: the _coordinated_ multi-writer use case is so special that it
>> does not deserve its own feature in the Data Ports part of RTT. (The
>> Coordinator will (have to) know about all its "data providers", and
>> make/delete the connections to them explicitly. So, there is no need to
>> "help him out" by this specific data port policy implementation.)
>
> I agree. However as I understand this approach is complicated by the
> fact that creating/deleting connections is not real-time safe.
>
If the Coordinator component requires realtime communication, it has to
make all connections before entering the realtime loop. That's the
traditional 'resource allocation' best practice in realtime contexts.

The _switching_ between data providers is as realtime as the Coordination
is, and that does (typically) not depend on the communication.
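
A minimal sketch of that pattern, in plain C++ without any RTT types,
just to illustrate the idea of wiring everything up front and switching
only by start/stop afterwards:

{{{
#include <cstddef>
#include <iostream>
#include <vector>

// All "connections" are made before the realtime loop starts; afterwards
// the Coordinator only starts/stops the already-connected providers.
struct Provider { bool running; };

struct Coordinator {
    std::vector<Provider*> providers;     // wired up front, never changed

    void switchTo(std::size_t active) {
        for (std::size_t i = 0; i < providers.size(); ++i)
            providers[i]->running = (i == active);   // start/stop only
    }
};

int main() {
    Provider a = { false };
    Provider b = { false };
    Coordinator coord;
    coord.providers.push_back(&a);        // 'resource allocation' up front
    coord.providers.push_back(&b);
    coord.switchTo(1);                    // realtime-safe switch to b
    std::cout << a.running << " " << b.running << "\n";  // prints "0 1"
}
}}}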

Herman

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>
> > On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> >> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> >>
> >>>> Now, it is actually possible to have a MO/SI model, by allowing
> >>>> an input port
> >>>> to have multiple incoming channels, and having InputPort::read
> >>>> round-robin on
> >>>> those channels. As an added nicety, one can listen to the "new
> >>>> data" event and
> >>>> access only the port for which we have an indication that new
> >>>> data can be
> >>>> available. Implementing that would require very little added
> >>>> code, since the
> >>>> management of multiple channels is already present in OutputPort.
> >>>
> >>> Well, a 'polling' + keeping a pointer to the last read channel, such
> >>> that we try that one always first, and only start polling if that
> >>> channel turned out 'empty'. This empty detection is problematic for
> >>> shared data connections (opposed to buffered), because once they
> >>> are
> >>> written, they always show a valid value. We might need to add that
> >>> once an input port is read, the sample is consumed, and a next read
> >>> will return false (no new data).
> >>
> >> Is there really a usecase for multiple incoming but unbuffered
> >> connections? It seems to me that the result would be quite arbitrary.
> >
> > Of course there is. If you think at a more broader scope there could
> > be a coordination component controlling the individual components such
> > that the results are not arbitrary at all.
> >
> > In fact this is a good example of explicit vs. implicit coordination.
>
> This is _exactly_ the situation we have in our projects. Multiple
> components with unbuffered output connections, to a single input
> connection on another component. A coordination component ensures that
> only one of the input components is running at a time, but they are
> all connected.
>
> Here, we want the latest data value available. No more, no less.

And if you used bufferports of capacity 1 instead of dataports? This
would have the additional benefit of replacing polling by event driven
behavior. I must admit I still find this sampling of dataport values a
bit odd.

Markus

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, Aug 24, 2009 at 14:22, Markus
Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>
>> > On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>> >> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>> >>
>> >>>> Now, it is actually possible to have a MO/SI model, by allowing
>> >>>> an input port
>> >>>> to have multiple incoming channels, and having InputPort::read
>> >>>> round-robin on
>> >>>> those channels. As an added nicety, one can listen to the "new
>> >>>> data" event and
>> >>>> access only the port for which we have an indication that new
>> >>>> data can be
>> >>>> available. Implementing that would require very little added
>> >>>> code, since the
>> >>>> management of multiple channels is already present in OutputPort.
>> >>>
>> >>> Well, a 'polling' + keeping a pointer to the last read channel, such
>> >>> that we try that one always first, and only start polling if that
>> >>> channel turned out 'empty'. This empty detection is problematic for
>> >>> shared data connections (opposed to buffered),  because once they
>> >>> are
>> >>> written, they always show a valid value. We might need to add that
>> >>> once an input port is read, the sample is consumed, and a next read
>> >>> will return false (no new data).
>> >>
>> >> Is there really a usecase for multiple incoming but unbuffered
>> >> connections? It seems to me that the result would be quite arbitrary.
>> >
>> > Of course there is. If you think at a more broader scope there could
>> > be a coordination component controlling the individual components such
>> > that the results are not arbitrary at all.
>> >
>> > In fact this is a good example of explicit vs. implicit coordination.
>>
>> This is _exactly_ the situation we have in our projects. Multiple
>> components with unbuffered output connections, to a single input
>> connection on another component. A coordination component ensures that
>> only one of the input components is running at a time, but they are
>> all connected.
>>
>> Here, we want the latest data value available. No more, no less.
>
> And if you used bufferports of capacity 1 instead of dataports? This
> would have the additional benefit of replacing polling by event driven
> behavior. I must admit I still find this sampling of dataport values a
> bit odd.

First of all, there is no such thing as data ports or buffer ports (in
RTT 2.0). There are only connection policies, and there are currently
two major ones: 'shared data' and 'buffered'. They allow the
application builder to choose which type of data exchange is the most
efficient between components A and B. For example, in a completely
synchronous data flow system, there is no use in installing buffers
between components, just 'sharing data' is sufficient. There are other
factors in play as well. As we discussed earlier, we should write
this down in the wiki such that application builders can make an
informed choice.
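
As a rough sketch (the ConnPolicy-style names below are an assumption
about the not-yet-settled 2.0 interface, not a final API), the
builder-side choice could look like this:

{{{
#include <rtt/OutputPort.hpp>
#include <rtt/InputPort.hpp>
#include <rtt/ConnPolicy.hpp>

// Assumed 2.0-style interface: the ports themselves stay policy-agnostic,
// and the application builder picks the exchange mechanism per connection.
RTT::OutputPort<double> out("out");
RTT::InputPort<double>  in("in");

void makeConnections()
{
    // Completely synchronous pipeline: sharing the last value is enough.
    out.connectTo(&in, RTT::ConnPolicy::data());

    // Alternatively, if every sample must be seen by the reader:
    // out.connectTo(&in, RTT::ConnPolicy::buffer(1));
}
}}}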

In principle, as Herman pointed out, the algorithm in the component
should be completely independent of a connection policy. That's also
why he opposes the default policy settings of the input port. It
'taints' a clean separation of the 4 C's. Imho, convenience sometimes
overrules clean design. This is especially so for small applications
where 'specifying something in one place' overrides 'separation of
concerns'. In practice this means that a user prefers 1 file of 40
lines over 4 files of 10 lines, while on the other hand prefers 10
files of 4,000 lines over 1 file of 40,000 lines of code. We have to
accommodate both cases.

The latter aside, the algorithm vs connection policy independence is
the fundamental guideline for designing the input-port and output-port
interface in C++. That's why we need to think thoroughly about the
semantics of read() and write(). We decided that write() is
send-and-forget. For read(), it currently returns a bool with the
following 'truth' table:

{{{
Table of return values of read( Sample ) (True/False):
status \ policy   |  data  |  buffered  |
not connected     |        |            |
or never written  |   F    |     F      |
connected,        |        |            |
written in past   |   T    |     F      |
connected, and    |        |            |
new data          |   T    |     T      |
}}}

There are in fact three states and only two return values. I believe
that for uniformity and algorithm independence, we might benefit from
a three-state return value as well. As such, the algorithm or
component logic can decide: what to do if the ports are not connected
(flag an error, which is caught by the coordination layer); what to do
if some port did not receive new data (but others might have); and what
to do if a new sample has arrived. These decisions are independent of
the connection policy. If, on the other hand, only two return values are
provided (like in the table), the algorithm might have to make
assumptions about the type of connection policy, because the middle case
has different return values for data and buffered.
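
A minimal, self-contained sketch of how a component algorithm could use
such a three-state return value; the FlowStatus name and the fake port
are illustrative only, nothing here is the actual RTT interface:

{{{
#include <iostream>

// Hypothetical three-state result of read().
enum FlowStatus { NoData, OldData, NewData };

// Stand-in for an input port, just enough to show the component-side logic.
struct FakeInputPort {
    bool connected;
    bool fresh;          // written since the last read()
    bool ever_written;
    double last_value;

    FlowStatus read(double& sample) {
        if (!connected || !ever_written)
            return NoData;            // not connected or never written
        sample = last_value;
        if (fresh) {
            fresh = false;
            return NewData;           // a new sample arrived
        }
        return OldData;               // only the previously seen sample
    }
};

// The algorithm reacts to all three cases without having to know whether
// the connection uses a 'data' or a 'buffered' policy.
void updateHook(FakeInputPort& port) {
    double sample = 0.0;
    switch (port.read(sample)) {
    case NewData: std::cout << "new sample: "   << sample << "\n"; break;
    case OldData: std::cout << "stale sample: " << sample << "\n"; break;
    case NoData:  std::cout << "no data, flag an error for the coordinator\n"; break;
    }
}

int main() {
    FakeInputPort port = { true, true, true, 1.5 };
    updateHook(port);       // new sample: 1.5
    updateHook(port);       // stale sample: 1.5
    port.connected = false;
    updateHook(port);       // no data, flag an error for the coordinator
}
}}}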

So I'm basically agreeing with Markus here: to the algorithm, the
ports only deliver samples (as if everything is buffered), and the choice
of data vs buffered is 'just' an optimization strategy when setting up
the intercomponent connections. But there still remain three cases
independent of the policy: 'not connected', 'connected but no new
data' and 'connected and new data'. Today, a data connection will
return the last sample in the middle case, while a buffered one will not
(in the first and last cases they behave identically). Maybe that's the
first thing we need to fix and straighten out.

Peter

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 24, 2009, at 08:22 , Markus Klotzbuecher wrote:

> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>
>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>
>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>> an input port
>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>> round-robin on
>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>> data" event and
>>>>>> access only the port for which we have an indication that new
>>>>>> data can be
>>>>>> available. Implementing that would require very little added
>>>>>> code, since the
>>>>>> management of multiple channels is already present in OutputPort.
>>>>>
>>>>> Well, a 'polling' + keeping a pointer to the last read channel,
>>>>> such
>>>>> that we try that one always first, and only start polling if that
>>>>> channel turned out 'empty'. This empty detection is problematic
>>>>> for
>>>>> shared data connections (opposed to buffered), because once they
>>>>> are
>>>>> written, they always show a valid value. We might need to add that
>>>>> once an input port is read, the sample is consumed, and a next
>>>>> read
>>>>> will return false (no new data).
>>>>
>>>> Is there really a usecase for multiple incoming but unbuffered
>>>> connections? It seems to me that the result would be quite
>>>> arbitrary.
>>>
>>> Of course there is. If you think at a more broader scope there could
>>> be a coordination component controlling the individual components
>>> such
>>> that the results are not arbitrary at all.
>>>
>>> In fact this is a good example of explicit vs. implicit
>>> coordination.
>>
>> This is _exactly_ the situation we have in our projects. Multiple
>> components with unbuffered output connections, to a single input
>> connection on another component. A coordination component ensures
>> that
>> only one of the input components is running at a time, but they are
>> all connected.
>>
>> Here, we want the latest data value available. No more, no less.
>
> And if you used bufferports of capacity 1 instead of dataports? This
> would have the additional benefit of replacing polling by event driven
> behavior. I must admit I still find this sampling of dataport values a
> bit odd.

You find the overall structure odd? Is that what you mean? If so, we
are open to alternatives if you have something concrete in mind. If
this structure is very unusual (although based on Peter's comments, I
don't believe that is the case), we would love to know what we're
doing wrong and maybe in the process decrease Sylvain's blood pressure
at having to deal with this "exception". :-)

Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, Aug 24, 2009 at 02:34:40PM +0200, Stephen Roderick wrote:
> On Aug 24, 2009, at 08:22 , Markus Klotzbuecher wrote:
>
> > On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
> >> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> >>
> >>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> >>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> >>>>
> >>>>>> Now, it is actually possible to have a MO/SI model, by allowing
> >>>>>> an input port
> >>>>>> to have multiple incoming channels, and having InputPort::read
> >>>>>> round-robin on
> >>>>>> those channels. As an added nicety, one can listen to the "new
> >>>>>> data" event and
> >>>>>> access only the port for which we have an indication that new
> >>>>>> data can be
> >>>>>> available. Implementing that would require very little added
> >>>>>> code, since the
> >>>>>> management of multiple channels is already present in OutputPort.
> >>>>>
> >>>>> Well, a 'polling' + keeping a pointer to the last read channel,
> >>>>> such
> >>>>> that we try that one always first, and only start polling if that
> >>>>> channel turned out 'empty'. This empty detection is problematic
> >>>>> for
> >>>>> shared data connections (opposed to buffered), because once they
> >>>>> are
> >>>>> written, they always show a valid value. We might need to add that
> >>>>> once an input port is read, the sample is consumed, and a next
> >>>>> read
> >>>>> will return false (no new data).
> >>>>
> >>>> Is there really a usecase for multiple incoming but unbuffered
> >>>> connections? It seems to me that the result would be quite
> >>>> arbitrary.
> >>>
> >>> Of course there is. If you think at a more broader scope there could
> >>> be a coordination component controlling the individual components
> >>> such
> >>> that the results are not arbitrary at all.
> >>>
> >>> In fact this is a good example of explicit vs. implicit
> >>> coordination.
> >>
> >> This is _exactly_ the situation we have in our projects. Multiple
> >> components with unbuffered output connections, to a single input
> >> connection on another component. A coordination component ensures
> >> that
> >> only one of the input components is running at a time, but they are
> >> all connected.
> >>
> >> Here, we want the latest data value available. No more, no less.
> >
> > And if you used bufferports of capacity 1 instead of dataports? This
> > would have the additional benefit of replacing polling by event driven
> > behavior. I must admit I still find this sampling of dataport values a
> > bit odd.
>
> You find the overall structure odd? Is that what you mean? If so, we
> are open to alternatives if you have something concrete in mind. If
> this structure is very unusual (although based on Peter's comments, I
> don't believe that is the case), we would love to know what we're
> doing wrong and maybe in the process decrease Sylvain's blood pressure
> at having to deal with this "exception". :-)

Not the overall structure :-) I agree this is a common and valid
pattern. What I find odd is why unbuffered dataports are used for
exchanging data, which means that the receiver effectively has to
sample the values, and there is this trouble with initializing and
dealing with uninitialized ports, whereas it seems so much more natural
(to me) to use a buffered port, even if it's only of size 1.

Actually, rereading what Peter writes above ("once an input port is
read, the sample is consumed and a next read will return false"), that
is exactly this behaviour, no?

I'm wondering if unbuffered dataports are a corner case that is
getting too much attention.

Regards
Markus

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 24, 2009, at 09:17 , Markus Klotzbuecher wrote:

> On Mon, Aug 24, 2009 at 02:34:40PM +0200, Stephen Roderick wrote:
>> On Aug 24, 2009, at 08:22 , Markus Klotzbuecher wrote:
>>
>>> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>>>
>>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher
>>>>> wrote:
>>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>>>
>>>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>>>> an input port
>>>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>>>> round-robin on
>>>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>>>> data" event and
>>>>>>>> access only the port for which we have an indication that new
>>>>>>>> data can be
>>>>>>>> available. Implementing that would require very little added
>>>>>>>> code, since the
>>>>>>>> management of multiple channels is already present in
>>>>>>>> OutputPort.
>>>>>>>
>>>>>>> Well, a 'polling' + keeping a pointer to the last read channel,
>>>>>>> such
>>>>>>> that we try that one always first, and only start polling if
>>>>>>> that
>>>>>>> channel turned out 'empty'. This empty detection is problematic
>>>>>>> for
>>>>>>> shared data connections (opposed to buffered), because once
>>>>>>> they
>>>>>>> are
>>>>>>> written, they always show a valid value. We might need to add
>>>>>>> that
>>>>>>> once an input port is read, the sample is consumed, and a next
>>>>>>> read
>>>>>>> will return false (no new data).
>>>>>>
>>>>>> Is there really a usecase for multiple incoming but unbuffered
>>>>>> connections? It seems to me that the result would be quite
>>>>>> arbitrary.
>>>>>
>>>>> Of course there is. If you think at a more broader scope there
>>>>> could
>>>>> be a coordination component controlling the individual components
>>>>> such
>>>>> that the results are not arbitrary at all.
>>>>>
>>>>> In fact this is a good example of explicit vs. implicit
>>>>> coordination.
>>>>
>>>> This is _exactly_ the situation we have in our projects. Multiple
>>>> components with unbuffered output connections, to a single input
>>>> connection on another component. A coordination component ensures
>>>> that
>>>> only one of the input components is running at a time, but they are
>>>> all connected.
>>>>
>>>> Here, we want the latest data value available. No more, no less.
>>>
>>> And if you used bufferports of capacity 1 instead of dataports? This
>>> would have the additional benefit of replacing polling by event
>>> driven
>>> behavior. I must admit I still find this sampling of dataport
>>> values a
>>> bit odd.
>>
>> You find the overall structure odd? Is that what you mean? If so, we
>> are open to alternatives if you have something concrete in mind. If
>> this structure is very unusual (although based on Peter's comments, I
>> don't believe that is the case), we would love to know what we're
>> doing wrong and maybe in the process decrease Sylvain's blood
>> pressure
>> at having to deal with this "exception". :-)
>
> Not the overall structure :-) I agree this is a common and valid
> pattern. What I find odd is why unbuffered dataports are used for
> exchanging data, which means that the receiver effectively has to
> sample the values, there is this trouble with initializing and dealing
> with uninitialized ports wheras it seems so much more natural (to me)
> to use a buffered port, even if its only of size 1.
>
> Actually rereading what Peter writes above ("once an input port is
> read, the sample is consumed and a next read will return false") is
> exactly that, no?

Re-reading some of Sylvain and Peter's earlier emails, can someone
please summarize the changes that will occur with 1) unbuffered data
ports, and 2) buffered data ports, between the current and proposed
implementations? I am particularly curious how this will affect
components that use polled strategies on ports. I am also particularly
concerned with Peter's comment above, that Markus quoted.

I just want to make sure I understand the full consequences of the
upcoming change.

Thanks
Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, 24 Aug 2009, Peter Soetens wrote:

> On Mon, Aug 24, 2009 at 14:22, Markus
> Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
>> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>>
>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>>
>>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>>> an input port
>>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>>> round-robin on
>>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>>> data" event and
>>>>>>> access only the port for which we have an indication that new
>>>>>>> data can be
>>>>>>> available. Implementing that would require very little added
>>>>>>> code, since the
>>>>>>> management of multiple channels is already present in OutputPort.
>>>>>>
>>>>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>>>>>> that we try that one always first, and only start polling if that
>>>>>> channel turned out 'empty'. This empty detection is problematic for
>>>>>> shared data connections (opposed to buffered),  because once they
>>>>>> are
>>>>>> written, they always show a valid value. We might need to add that
>>>>>> once an input port is read, the sample is consumed, and a next read
>>>>>> will return false (no new data).
>>>>>
>>>>> Is there really a usecase for multiple incoming but unbuffered
>>>>> connections? It seems to me that the result would be quite arbitrary.
>>>>
>>>> Of course there is. If you think at a more broader scope there could
>>>> be a coordination component controlling the individual components such
>>>> that the results are not arbitrary at all.
>>>>
>>>> In fact this is a good example of explicit vs. implicit coordination.
>>>
>>> This is _exactly_ the situation we have in our projects. Multiple
>>> components with unbuffered output connections, to a single input
>>> connection on another component. A coordination component ensures that
>>> only one of the input components is running at a time, but they are
>>> all connected.
>>>
>>> Here, we want the latest data value available. No more, no less.
>>
>> And if you used bufferports of capacity 1 instead of dataports? This
>> would have the additional benefit of replacing polling by event driven
>> behavior. I must admit I still find this sampling of dataport values a
>> bit odd.
>
> First of all, there is no such thing as data ports or buffer ports (in
> RTT 2.0). There are only connection policies, and there are currently
> two major ones: 'shared data' and 'buffered'. They allow the
> application builder to choose which type of data exchange is the most
> efficient between components A and B. For example, in a completely
> synchronous data flow system, there is no use in installing buffers
> between components, just 'sharing data' is sufficient. There are other
> factors in play as well. As we discussed this earlier, we should write
> this down in the wiki such that application builders can make an
> informed choice.

Thanks for this summary. I badly needed it :-)

> In principle, as Herman pointed out, the algorithm in the component
> should be completely independent of a connection policy. That's also
> why he opposes the default policy settings of the input port. It
> 'taints' a clean separation of the 4 C's. Imho, convenience sometimes
> overrules clean design. This is especially so for small applications
> where 'specifying something in one place' overrides 'separation of
> concerns'. In practice this means that a user prefers 1 file of 40
> lines than 4 files of 10 lines, while on the other hand perfers 10
> files of 4.000 lines over 1 file of 40.000 lines of code. We have to
> accommodate both cases.
>
> The latter aside, the algorithm vs connection policy independence is
> the fundamental guideline for designing the input-port and output-port
> interface in C++. That's why we need to think thourougly over the
> semantics of read() and write(). We decided that write() was sent and
> forget. For read, it now returns a bool with the following 'truth'
> table:
>
> {{{
> Table of return values or read( Sample ) (True/False):
> status\policy | data | buffered |
> not connected | | |
> or never written | F | F |
> connected, | | |
> written in past | T | F |
> connected, and | | |
> new data | T | T |
> }}}
>
> There are in fact three states and only two return values. I believe
> that for uniformity and algorithm independence, we might benefit from
> a three state return value as well. As such, the algorithm or
> component logic can decide: what to do if the ports are not connected
> (flag error, which is caught by the coordination layer); what to do if
> some port did not receive new data (but others might have); what to do
> if a new sample has arrived. These decisions are independent of the
> connection policy. If on the other hand, only two return values are
> provided (like in the table), the algorithm might have to make
> assumptions on the type of connection policy because the middle case
> has different return values for data or buffered.
>
> So I'm basically agreeing with Markus here: to the algorithm, the
> ports only see samples ( as if everything is buffered) and the choice
> of data vs buffered is 'just' an optimization strategy when setting up
> the intercomponent connections. But there still remain three cases
> independent of the policy: 'not connected', 'connected but no new
> data' and 'connected and new data'.

For the reader the first two cases do not make a difference at all! The
result is that the component wanted data that is not there, and it cannot
do anything about it (i.e., it can (should!) do nothing about the
connection problem, or about the data-not-being-there problem). Hence, I
see no need to reflect this status difference in the read() return value.

And why is the write() case so different? It also should have a return
value that indicates whether the write was successful or not. (But it
should not have to know the _cause_ of the failure, similarly to the read()
call.)

> Today, a data connection will
> return in the middle case the last sample, while a buffered will not
> (in first and last case they behave identical). Maybe that's the first
> thing we need to fix and straight out.
>
> Peter

Herman

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, Aug 24, 2009 at 01:48:42PM +0200, Herman Bruyninckx wrote:
> On Mon, 24 Aug 2009, S Roderick wrote:
>
> > On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> >
> >> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> >>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> >>>
> >>>>> Now, it is actually possible to have a MO/SI model, by allowing
> >>>>> an input port
> >>>>> to have multiple incoming channels, and having InputPort::read
> >>>>> round-robin on
> >>>>> those channels. As an added nicety, one can listen to the "new
> >>>>> data" event and
> >>>>> access only the port for which we have an indication that new
> >>>>> data can be
> >>>>> available. Implementing that would require very little added
> >>>>> code, since the
> >>>>> management of multiple channels is already present in OutputPort.
> >>>>
> >>>> Well, a 'polling' + keeping a pointer to the last read channel, such
> >>>> that we try that one always first, and only start polling if that
> >>>> channel turned out 'empty'. This empty detection is problematic for
> >>>> shared data connections (opposed to buffered), because once they
> >>>> are
> >>>> written, they always show a valid value. We might need to add that
> >>>> once an input port is read, the sample is consumed, and a next read
> >>>> will return false (no new data).
> >>>
> >>> Is there really a usecase for multiple incoming but unbuffered
> >>> connections? It seems to me that the result would be quite arbitrary.
> >>
> >> Of course there is. If you think at a more broader scope there could
> >> be a coordination component controlling the individual components such
> >> that the results are not arbitrary at all.
> >>
> >> In fact this is a good example of explicit vs. implicit coordination.
> >
> > This is _exactly_ the situation we have in our projects. Multiple
> > components with unbuffered output connections, to a single input
> > connection on another component. A coordination component ensures that
> > only one of the input components is running at a time, but they are
> > all connected.
> >
> > Here, we want the latest data value available. No more, no less.
> >
> > Otherwise, Markus is correct. Having more than one input component
> > running simultaneously would be arbitrary and give nonsense output data.
>
> Indeed... So, the conclusion I draw from this (sub)discussion is the
> following: the _coordinated_ multi-writer use case is so special that it
> does not deserve its own feature in the Data Ports part of RTT. (The
> Coordinator will (have to) know about all its "data providers", and
> make/delete the connections to them explicitly. So, there is no need to
> "help him out" by this specific data port policy implementation.)

I agree. However, as I understand it, this approach is complicated by
the fact that creating/deleting connections is not real-time safe.

Markus

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 25, 2009, at 03:38 , Herman Bruyninckx wrote:

> On Mon, 24 Aug 2009, Peter Soetens wrote:
>
>> On Mon, Aug 24, 2009 at 14:22, Markus
>> Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
>>> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>>>
>>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher
>>>>> wrote:
>>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>>>
>>>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>>>>>>>> an input port
>>>>>>>> to have multiple incoming channels, and having InputPort::read
>>>>>>>> round-robin on
>>>>>>>> those channels. As an added nicety, one can listen to the "new
>>>>>>>> data" event and
>>>>>>>> access only the port for which we have an indication that new
>>>>>>>> data can be
>>>>>>>> available. Implementing that would require very little added
>>>>>>>> code, since the
>>>>>>>> management of multiple channels is already present in
>>>>>>>> OutputPort.
>>>>>>>
>>>>>>> Well, a 'polling' + keeping a pointer to the last read
>>>>>>> channel, such
>>>>>>> that we try that one always first, and only start polling if
>>>>>>> that
>>>>>>> channel turned out 'empty'. This empty detection is
>>>>>>> problematic for
>>>>>>> shared data connections (opposed to buffered), because once
>>>>>>> they
>>>>>>> are
>>>>>>> written, they always show a valid value. We might need to add
>>>>>>> that
>>>>>>> once an input port is read, the sample is consumed, and a next
>>>>>>> read
>>>>>>> will return false (no new data).
>>>>>>
>>>>>> Is there really a usecase for multiple incoming but unbuffered
>>>>>> connections? It seems to me that the result would be quite
>>>>>> arbitrary.
>>>>>
>>>>> Of course there is. If you think at a more broader scope there
>>>>> could
>>>>> be a coordination component controlling the individual
>>>>> components such
>>>>> that the results are not arbitrary at all.
>>>>>
>>>>> In fact this is a good example of explicit vs. implicit
>>>>> coordination.
>>>>
>>>> This is _exactly_ the situation we have in our projects. Multiple
>>>> components with unbuffered output connections, to a single input
>>>> connection on another component. A coordination component ensures
>>>> that
>>>> only one of the input components is running at a time, but they are
>>>> all connected.
>>>>
>>>> Here, we want the latest data value available. No more, no less.
>>>
>>> And if you used bufferports of capacity 1 instead of dataports? This
>>> would have the additional benefit of replacing polling by event
>>> driven
>>> behavior. I must admit I still find this sampling of dataport
>>> values a
>>> bit odd.
>>
>> First of all, there is no such thing as data ports or buffer ports
>> (in
>> RTT 2.0). There are only connection policies, and there are currently
>> two major ones: 'shared data' and 'buffered'. They allow the
>> application builder to choose which type of data exchange is the most
>> efficient between components A and B. For example, in a completely
>> synchronous data flow system, there is no use in installing buffers
>> between components, just 'sharing data' is sufficient. There are
>> other
>> factors in play as well. As we discussed this earlier, we should
>> write
>> this down in the wiki such that application builders can make an
>> informed choice.
>
> Thanks for this summary. I badly needed it :-)
>
>> In principle, as Herman pointed out, the algorithm in the component
>> should be completely independent of a connection policy. That's also
>> why he opposes the default policy settings of the input port. It
>> 'taints' a clean separation of the 4 C's. Imho, convenience sometimes
>> overrules clean design. This is especially so for small applications
>> where 'specifying something in one place' overrides 'separation of
>> concerns'. In practice this means that a user prefers 1 file of 40
>> lines than 4 files of 10 lines, while on the other hand perfers 10
>> files of 4.000 lines over 1 file of 40.000 lines of code. We have to
>> accommodate both cases.
>>
>> The latter aside, the algorithm vs connection policy independence is
>> the fundamental guideline for designing the input-port and output-
>> port
>> interface in C++. That's why we need to think thourougly over the
>> semantics of read() and write(). We decided that write() was sent and
>> forget. For read, it now returns a bool with the following 'truth'
>> table:
>>
>> {{{
>> Table of return values or read( Sample ) (True/False):
>> status\policy | data | buffered |
>> not connected | | |
>> or never written | F | F |
>> connected, | | |
>> written in past | T | F |
>> connected, and | | |
>> new data | T | T |
>> }}}
>>
>> There are in fact three states and only two return values. I believe
>> that for uniformity and algorithm independence, we might benefit from
>> a three state return value as well. As such, the algorithm or
>> component logic can decide: what to do if the ports are not connected
>> (flag error, which is caught by the coordination layer); what to do
>> if
>> some port did not receive new data (but others might have); what to
>> do
>> if a new sample has arrived. These decisions are independent of the
>> connection policy. If on the other hand, only two return values are
>> provided (like in the table), the algorithm might have to make
>> assumptions on the type of connection policy because the middle case
>> has different return values for data or buffered.
>>
>> So I'm basically agreeing with Markus here: to the algorithm, the
>> ports only see samples ( as if everything is buffered) and the choice
>> of data vs buffered is 'just' an optimization strategy when setting
>> up
>> the intercomponent connections. But there still remain three cases
>> independent of the policy: 'not connected', 'connected but no new
>> data' and 'connected and new data'.
>
> For the reader the first two cases do not make a difference at all!
> The
> result is that the component wanted data that is not there, and it
> cannot
> do anything about it (i.e,, it can (should!) do nothing about the
> connection problem, or about the data-not-being-there problem).
> Hence, I
> see no need to reflect this status difference in the read() return
> value.

I disagree. The first two cases are different if you have a component
that is only interested in the latest data (even if it is a cycle or
two old). We get this when we have devices (e.g. GPS) that don't always
return data at a fixed rate. Whatever is using that data is simply
happy to have the latest available, not the exact value from that
exact cycle.

> And why is the write() case so different? It also should have a return
> value that indicates whether the write was successful or not. (But it
> should not have to know the _cause_ of the failure, similarly to the
> read()
> call.)

As with the other questioners, what exact semantics are you
attributing to "successful or not" write here?
S

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tue, Aug 25, 2009 at 09:38:49AM +0200, Herman Bruyninckx wrote:
> On Mon, 24 Aug 2009, Peter Soetens wrote:
>
> > On Mon, Aug 24, 2009 at 14:22, Markus
> > Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
> >> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
> >>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> >>>
> >>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> >>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> >>>>>
> >>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
> >>>>>>> an input port
> >>>>>>> to have multiple incoming channels, and having InputPort::read
> >>>>>>> round-robin on
> >>>>>>> those channels. As an added nicety, one can listen to the "new
> >>>>>>> data" event and
> >>>>>>> access only the port for which we have an indication that new
> >>>>>>> data can be
> >>>>>>> available. Implementing that would require very little added
> >>>>>>> code, since the
> >>>>>>> management of multiple channels is already present in OutputPort.
> >>>>>>
> >>>>>> Well, a 'polling' + keeping a pointer to the last read channel, such
> >>>>>> that we try that one always first, and only start polling if that
> >>>>>> channel turned out 'empty'. This empty detection is problematic for
> >>>>>> shared data connections (opposed to buffered),  because once they
> >>>>>> are
> >>>>>> written, they always show a valid value. We might need to add that
> >>>>>> once an input port is read, the sample is consumed, and a next read
> >>>>>> will return false (no new data).
> >>>>>
> >>>>> Is there really a usecase for multiple incoming but unbuffered
> >>>>> connections? It seems to me that the result would be quite arbitrary.
> >>>>
> >>>> Of course there is. If you think at a more broader scope there could
> >>>> be a coordination component controlling the individual components such
> >>>> that the results are not arbitrary at all.
> >>>>
> >>>> In fact this is a good example of explicit vs. implicit coordination.
> >>>
> >>> This is _exactly_ the situation we have in our projects. Multiple
> >>> components with unbuffered output connections, to a single input
> >>> connection on another component. A coordination component ensures that
> >>> only one of the input components is running at a time, but they are
> >>> all connected.
> >>>
> >>> Here, we want the latest data value available. No more, no less.
> >>
> >> And if you used bufferports of capacity 1 instead of dataports? This
> >> would have the additional benefit of replacing polling by event driven
> >> behavior. I must admit I still find this sampling of dataport values a
> >> bit odd.
> >
> > First of all, there is no such thing as data ports or buffer ports (in
> > RTT 2.0). There are only connection policies, and there are currently
> > two major ones: 'shared data' and 'buffered'. They allow the
> > application builder to choose which type of data exchange is the most
> > efficient between components A and B. For example, in a completely
> > synchronous data flow system, there is no use in installing buffers
> > between components, just 'sharing data' is sufficient. There are other
> > factors in play as well. As we discussed this earlier, we should write
> > this down in the wiki such that application builders can make an
> > informed choice.
>
> Thanks for this summary. I badly needed it :-)
>
> > In principle, as Herman pointed out, the algorithm in the component
> > should be completely independent of a connection policy. That's also
> > why he opposes the default policy settings of the input port. It
> > 'taints' a clean separation of the 4 C's. Imho, convenience sometimes
> > overrules clean design. This is especially so for small applications
> > where 'specifying something in one place' overrides 'separation of
> > concerns'. In practice this means that a user prefers 1 file of 40
> > lines than 4 files of 10 lines, while on the other hand perfers 10
> > files of 4.000 lines over 1 file of 40.000 lines of code. We have to
> > accommodate both cases.
> >
> > The latter aside, the algorithm vs connection policy independence is
> > the fundamental guideline for designing the input-port and output-port
> > interface in C++. That's why we need to think thourougly over the
> > semantics of read() and write(). We decided that write() was sent and
> > forget. For read, it now returns a bool with the following 'truth'
> > table:
> >
> > {{{
> > Table of return values or read( Sample ) (True/False):
> > status\policy | data | buffered |
> > not connected | | |
> > or never written | F | F |
> > connected, | | |
> > written in past | T | F |
> > connected, and | | |
> > new data | T | T |
> > }}}
> >
> > There are in fact three states and only two return values. I believe
> > that for uniformity and algorithm independence, we might benefit from
> > a three state return value as well. As such, the algorithm or
> > component logic can decide: what to do if the ports are not connected
> > (flag error, which is caught by the coordination layer); what to do if
> > some port did not receive new data (but others might have); what to do
> > if a new sample has arrived. These decisions are independent of the
> > connection policy. If on the other hand, only two return values are
> > provided (like in the table), the algorithm might have to make
> > assumptions on the type of connection policy because the middle case
> > has different return values for data or buffered.
> >
> > So I'm basically agreeing with Markus here: to the algorithm, the
> > ports only see samples ( as if everything is buffered) and the choice
> > of data vs buffered is 'just' an optimization strategy when setting up
> > the intercomponent connections. But there still remain three cases
> > independent of the policy: 'not connected', 'connected but no new
> > data' and 'connected and new data'.
>
> For the reader the first two cases do not make a difference at all! The
> result is that the component wanted data that is not there, and it cannot
> do anything about it (i.e,, it can (should!) do nothing about the
> connection problem, or about the data-not-being-there problem). Hence, I
> see no need to reflect this status difference in the read() return value.

I don't agree generally that data-not-being-there should be treated
identically to a connection error. This might be true for certain
components which rely on data always being available, but might not be
true for others which check for data but continue doing something else
if none is available. An example is a state machine where the UML event
queue is modeled as a port with a buffered policy: if an event is in the
queue, then transition, else continue executing the "do" behavior.
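
A minimal stand-alone sketch of that use case; the queue-backed port
here is just a stand-in, not the RTT buffered connection:

{{{
#include <deque>
#include <iostream>
#include <string>

// Stand-in for an input port with a buffered policy: read() returns true
// only if an event was actually queued.
struct BufferedEventPort {
    std::deque<std::string> queue;

    bool read(std::string& ev) {
        if (queue.empty())
            return false;
        ev = queue.front();
        queue.pop_front();
        return true;
    }
};

// One cycle of the state machine: transition if an event is pending,
// otherwise keep executing the current state's "do" behaviour.
void cycle(BufferedEventPort& events) {
    std::string ev;
    if (events.read(ev))
        std::cout << "transition on event: " << ev << "\n";
    else
        std::cout << "no event, continue 'do' behaviour\n";
}

int main() {
    BufferedEventPort events;
    events.queue.push_back("e_stop");
    cycle(events);   // transition on event: e_stop
    cycle(events);   // no event, continue 'do' behaviour
}
}}}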

> And why is the write() case so different? It also should have a return
> value that indicates whether the write was successful or not. (But it
> should not have to know the _cause_ of the failure, similarly to the read()
> call.)

One problem I see is defining what "successful" means. Does it mean the
data was locally stored in the output queue, or that it was put in the
recipient queue, or read by the recipient? I suppose these could be QoS
policies of the connection. But when does a writer need to know if the
write was successful or not? One situation could be the case where it
is absolutely necessary that a particular datum is delivered. The
sequence of coordination (e.g. for a disconnected cable) could be as
follows:

1) component writes value to port, but gets an error. The same error
is propagated to the coordinator

2) coordinator stops writer and reader, removes connection and creates
backup connection through some other physical connection

3) coordinator restarts writer and reader, writer retries.

However, this coordination would require knowledge on the writer side
(namely to retry), so computation and coordination are slightly
coupled. The only way I can see to avoid this would be to have a local
"reliable transmission" helper component which takes care of the above
sequence, without complicating the actual computation. Or is this
example too contrived?
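
A rough sketch of that helper idea; everything here is a stand-in
(write() returning success is exactly the open question above, and the
retry policy is just one possibility), the point is only that the
computation calls send() and stays unaware of the retry coordination:

{{{
#include <iostream>
#include <string>

// Stand-in for a port whose write() can fail, e.g. a disconnected cable.
struct UnreliablePort {
    int fail_times;                       // simulate a number of failures

    bool write(const std::string& msg) {
        if (fail_times > 0) { --fail_times; return false; }
        std::cout << "delivered: " << msg << "\n";
        return true;
    }
};

// The "reliable transmission" helper component.
struct ReliableWriter {
    UnreliablePort* port;
    int max_retries;

    bool send(const std::string& msg) {
        for (int attempt = 0; attempt <= max_retries; ++attempt)
            if (port->write(msg))
                return true;              // success, computation moves on
        return false;                     // give up: escalate to the coordinator
    }
};

int main() {
    UnreliablePort port = { 2 };          // first two writes fail
    ReliableWriter writer = { &port, 5 };
    writer.send("setpoint");              // retried until delivered
}
}}}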

Markus

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tue, Aug 25, 2009 at 12:10, Markus
Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
> On Tue, Aug 25, 2009 at 09:38:49AM +0200, Herman Bruyninckx wrote:
>> On Mon, 24 Aug 2009, Peter Soetens wrote:
>>
>> > On Mon, Aug 24, 2009 at 14:22, Markus
>> > Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
>> >> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>> >>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>> >>>
>> >>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
>> >>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>> >>>>>
>> >>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
>> >>>>>>> an input port
>> >>>>>>> to have multiple incoming channels, and having InputPort::read
>> >>>>>>> round-robin on
>> >>>>>>> those channels. As an added nicety, one can listen to the "new
>> >>>>>>> data" event and
>> >>>>>>> access only the port for which we have an indication that new
>> >>>>>>> data can be
>> >>>>>>> available. Implementing that would require very little added
>> >>>>>>> code, since the
>> >>>>>>> management of multiple channels is already present in OutputPort.
>> >>>>>>
>> >>>>>> Well, a 'polling' + keeping a pointer to the last read channel, such
>> >>>>>> that we try that one always first, and only start polling if that
>> >>>>>> channel turned out 'empty'. This empty detection is problematic for
>> >>>>>> shared data connections (opposed to buffered),  because once they
>> >>>>>> are
>> >>>>>> written, they always show a valid value. We might need to add that
>> >>>>>> once an input port is read, the sample is consumed, and a next read
>> >>>>>> will return false (no new data).
>> >>>>>
>> >>>>> Is there really a usecase for multiple incoming but unbuffered
>> >>>>> connections? It seems to me that the result would be quite arbitrary.
>> >>>>
>> >>>> Of course there is. If you think at a more broader scope there could
>> >>>> be a coordination component controlling the individual components such
>> >>>> that the results are not arbitrary at all.
>> >>>>
>> >>>> In fact this is a good example of explicit vs. implicit coordination.
>> >>>
>> >>> This is _exactly_ the situation we have in our projects. Multiple
>> >>> components with unbuffered output connections, to a single input
>> >>> connection on another component. A coordination component ensures that
>> >>> only one of the input components is running at a time, but they are
>> >>> all connected.
>> >>>
>> >>> Here, we want the latest data value available. No more, no less.
>> >>
>> >> And if you used bufferports of capacity 1 instead of dataports? This
>> >> would have the additional benefit of replacing polling by event driven
>> >> behavior. I must admit I still find this sampling of dataport values a
>> >> bit odd.
>> >
>> > First of all, there is no such thing as data ports or buffer ports (in
>> > RTT 2.0). There are only connection policies, and there are currently
>> > two major ones: 'shared data' and 'buffered'. They allow the
>> > application builder to choose which type of data exchange is the most
>> > efficient between components A and B. For example, in a completely
>> > synchronous data flow system, there is no use in installing buffers
>> > between components, just 'sharing data' is sufficient. There are other
>> > factors in play as well. As we discussed this earlier, we should write
>> > this down in the wiki such that application builders can make an
>> > informed choice.
>>
>> Thanks for this summary. I badly needed it :-)
>>
>> > In principle, as Herman pointed out, the algorithm in the component
>> > should be completely independent of a connection policy. That's also
>> > why he opposes the default policy settings of the input port. It
>> > 'taints' a clean separation of the 4 C's. Imho, convenience sometimes
>> > overrules clean design. This is especially so for small applications
>> > where 'specifying something in one place' overrides 'separation of
>> > concerns'. In practice this means that a user prefers 1 file of 40
>> > lines than 4 files of 10 lines, while on the other hand perfers 10
>> > files of 4.000 lines over 1 file of 40.000 lines of code. We have to
>> > accommodate both cases.
>> >
>> > The latter aside, the algorithm vs connection policy independence is
>> > the fundamental guideline for designing the input-port and output-port
>> > interface in C++. That's why we need to think thourougly over the
>> > semantics of read() and write(). We decided that write() was sent and
>> > forget. For read, it now returns a bool with the following 'truth'
>> > table:
>> >
>> > {{{
>> > Table of return values or read( Sample ) (True/False):
>> > status\policy    | data | buffered |
>> > not connected    |      |          |
>> > or never written |  F   |    F     |
>> > connected,       |      |          |
>> > written in past  |  T   |    F     |
>> > connected, and   |      |          |
>> > new data         |  T   |    T     |
>> > }}}
>> >
>> > There are in fact three states and only two return values. I believe
>> > that for uniformity and algorithm independence, we might benefit from
>> > a three state return value as well. As such, the algorithm or
>> > component logic can decide: what to do if the ports are not connected
>> > (flag error, which is caught by the coordination layer); what to do if
>> > some port did not receive new data (but others might have); what to do
>> > if a new sample has arrived. These decisions are independent of the
>> > connection policy. If on the other hand, only two return values are
>> > provided (like in the table), the algorithm might have to make
>> > assumptions on the type of connection policy because the middle case
>> > has different return values for data or buffered.
>> >
>> > So I'm basically agreeing with Markus here: to the algorithm, the
>> > ports only see samples ( as if everything is buffered) and the choice
>> > of data vs buffered is 'just' an optimization strategy when setting up
>> > the intercomponent connections. But there still remain three cases
>> > independent of the policy: 'not connected', 'connected but no new
>> > data' and 'connected and new data'.
>>
>> For the reader the first two cases do not make a difference at all! The
>> result is that the component wanted data that is not there, and it cannot
>> do anything about it (i.e,, it can (should!) do nothing about the
>> connection problem, or about the data-not-being-there problem). Hence, I
>> see no need to reflect this status difference in the read() return value.
>
> I don't agree generally that data-not-being-there should be treated
> identically as a connection error. This might be true for certain
> components which rely on data always being available, but might not be
> true for others which check for data but continue doing something else
> if none is available. Example state machine, the UML event queue is
> modeled as a port with a buffered policy. If event in queue then
> transition, else continue executing "do" behavior.

I agree with Markus again: one state is an error, the two others are
data/no data.

>
>> And why is the write() case so different? It also should have a return
>> value that indicates whether the write was successful or not. (But it
>> should not have to know the _cause_ of the failure, similarly to the read()
>> call.)
>
> One problem I see is defining what successful means. Does is mean the
> data was locally stored in the output queue or that is was put in the
> recipient queue or read by the recipient? I suppose these could be QoS
> policies of the connection. But when does a writer need to know if the
> write was successful or not? One situation could be the case where it
> is absolutely necessary that a particular datum is delivered. The
> sequence of coordination (e.g. for a disconnected cable) could be as
> follows:
>
> 1) component writes value to port, but gets an error. The same error
>   is propagated to the coordinator
>
> 2) coordinator stops writer and reader, removes connection and creates
>   backup connection through some other physical connection
>
> 3) coordinator restarts writer and reader, writer retries.
>
> However this coordination would require knowledge on the writer side
> (namely to retry), so computation and coordination is slightly
> coupled. The only way to avoid this I can see would be to have a local
> "reliable transmission" helper component which takes care of the above
> sequence, without complicating the actual computation. Or is this
> example to contrived?

It is. The component can't know the sample 'must' be delivered. That
is a system property. So logically, the component can only push stuff
into its output port and assume that the system will take action if
there are no receivers but there should have been.

Peter

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tuesday 25 August 2009 12:10:39 Markus Klotzbuecher wrote:
> On Tue, Aug 25, 2009 at 09:38:49AM +0200, Herman Bruyninckx wrote:
> > On Mon, 24 Aug 2009, Peter Soetens wrote:
> > > On Mon, Aug 24, 2009 at 14:22, Markus
> > >
> > > Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
> > >> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
> > >>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> > >>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher wrote:
> > >>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> > >>>>>>> Now, it is actually possible to have a MO/SI model, by allowing
> > >>>>>>> an input port
> > >>>>>>> to have multiple incoming channels, and having InputPort::read
> > >>>>>>> round-robin on
> > >>>>>>> those channels. As an added nicety, one can listen to the "new
> > >>>>>>> data" event and
> > >>>>>>> access only the port for which we have an indication that new
> > >>>>>>> data can be
> > >>>>>>> available. Implementing that would require very little added
> > >>>>>>> code, since the
> > >>>>>>> management of multiple channels is already present in OutputPort.
> > >>>>>>
> > >>>>>> Well, a 'polling' + keeping a pointer to the last read channel,
> > >>>>>> such that we try that one always first, and only start polling if
> > >>>>>> that channel turned out 'empty'. This empty detection is
> > >>>>>> problematic for shared data connections (opposed to buffered),
> > >>>>>> because once they are
> > >>>>>> written, they always show a valid value. We might need to add that
> > >>>>>> once an input port is read, the sample is consumed, and a next
> > >>>>>> read will return false (no new data).
> > >>>>>
> > >>>>> Is there really a usecase for multiple incoming but unbuffered
> > >>>>> connections? It seems to me that the result would be quite
> > >>>>> arbitrary.
> > >>>>
> > >>>> Of course there is. If you think at a more broader scope there could
> > >>>> be a coordination component controlling the individual components
> > >>>> such that the results are not arbitrary at all.
> > >>>>
> > >>>> In fact this is a good example of explicit vs. implicit
> > >>>> coordination.
> > >>>
> > >>> This is _exactly_ the situation we have in our projects. Multiple
> > >>> components with unbuffered output connections, to a single input
> > >>> connection on another component. A coordination component ensures
> > >>> that only one of the input components is running at a time, but they
> > >>> are all connected.
> > >>>
> > >>> Here, we want the latest data value available. No more, no less.
> > >>
> > >> And if you used bufferports of capacity 1 instead of dataports? This
> > >> would have the additional benefit of replacing polling by event driven
> > >> behavior. I must admit I still find this sampling of dataport values a
> > >> bit odd.
> > >
> > > First of all, there is no such thing as data ports or buffer ports (in
> > > RTT 2.0). There are only connection policies, and there are currently
> > > two major ones: 'shared data' and 'buffered'. They allow the
> > > application builder to choose which type of data exchange is the most
> > > efficient between components A and B. For example, in a completely
> > > synchronous data flow system, there is no use in installing buffers
> > > between components, just 'sharing data' is sufficient. There are other
> > > factors in play as well. As we discussed this earlier, we should write
> > > this down in the wiki such that application builders can make an
> > > informed choice.
> >
> > Thanks for this summary. I badly needed it :-)
> >
> > > In principle, as Herman pointed out, the algorithm in the component
> > > should be completely independent of a connection policy. That's also
> > > why he opposes the default policy settings of the input port. It
> > > 'taints' a clean separation of the 4 C's. Imho, convenience sometimes
> > > overrules clean design. This is especially so for small applications
> > > where 'specifying something in one place' overrides 'separation of
> > > concerns'. In practice this means that a user prefers 1 file of 40
> > > lines than 4 files of 10 lines, while on the other hand perfers 10
> > > files of 4.000 lines over 1 file of 40.000 lines of code. We have to
> > > accommodate both cases.
> > >
> > > The latter aside, the algorithm vs connection policy independence is
> > > the fundamental guideline for designing the input-port and output-port
> > > interface in C++. That's why we need to think thourougly over the
> > > semantics of read() and write(). We decided that write() was sent and
> > > forget. For read, it now returns a bool with the following 'truth'
> > > table:
> > >
> > > {{{
> > > Table of return values or read( Sample ) (True/False):
> > > status\policy    | data | buffered |
> > > not connected    |      |          |
> > > or never written |  F   |    F     |
> > > connected,       |      |          |
> > > written in past  |  T   |    F     |
> > > connected, and   |      |          |
> > > new data         |  T   |    T     |
> > > }}}
> > >
> > > There are in fact three states and only two return values. I believe
> > > that for uniformity and algorithm independence, we might benefit from
> > > a three state return value as well. As such, the algorithm or
> > > component logic can decide: what to do if the ports are not connected
> > > (flag error, which is caught by the coordination layer); what to do if
> > > some port did not receive new data (but others might have); what to do
> > > if a new sample has arrived. These decisions are independent of the
> > > connection policy. If on the other hand, only two return values are
> > > provided (like in the table), the algorithm might have to make
> > > assumptions on the type of connection policy because the middle case
> > > has different return values for data or buffered.
> > >
> > > So I'm basically agreeing with Markus here: to the algorithm, the
> > > ports only see samples ( as if everything is buffered) and the choice
> > > of data vs buffered is 'just' an optimization strategy when setting up
> > > the intercomponent connections. But there still remain three cases
> > > independent of the policy: 'not connected', 'connected but no new
> > > data' and 'connected and new data'.
> >
> > For the reader the first two cases do not make a difference at all! The
> > result is that the component wanted data that is not there, and it cannot
> > do anything about it (i.e,, it can (should!) do nothing about the
> > connection problem, or about the data-not-being-there problem). Hence, I
> > see no need to reflect this status difference in the read() return value.
>
> I don't agree generally that data-not-being-there should be treated
> identically as a connection error. This might be true for certain
> components which rely on data always being available, but might not be
> true for others which check for data but continue doing something else
> if none is available. Example state machine, the UML event queue is
> modeled as a port with a buffered policy. If event in queue then
> transition, else continue executing "do" behavior.
>
> > And why is the write() case so different? It also should have a return
> > value that indicates whether the write was successful or not. (But it
> > should not have to know the _cause_ of the failure, similarly to the
> > read() call.)
>
> One problem I see is defining what successful means. Does is mean the
> data was locally stored in the output queue or that is was put in the
> recipient queue or read by the recipient? I suppose these could be QoS
> policies of the connection. But when does a writer need to know if the
> write was successful or not? One situation could be the case where it
> is absolutely necessary that a particular datum is delivered. The
> sequence of coordination (e.g. for a disconnected cable) could be as
> follows:
>
> 1) component writes value to port, but gets an error. The same error
> is propagated to the coordinator
>
> 2) coordinator stops writer and reader, removes connection and creates
> backup connection through some other physical connection
>
> 3) coordinator restarts writer and reader, writer retries.
>
> However this coordination would require knowledge on the writer side
> (namely to retry), so computation and coordination is slightly
> coupled. The only way to avoid this I can see would be to have a local
> "reliable transmission" helper component which takes care of the above
> sequence, without complicating the actual computation. Or is this
> example to contrived?

As you wrote yourself, the overall fault-tolerance is achieved by a
coordinator component, not by the writer itself. Hence, the writer does not
have to know whether the write was successful or not, i.e. it is achieved by
a component that does not have access to the return value of write() ...

In my opinion, what one needs is an introspection interface for the transport.
The most important feature of that interface is that the individual components
should not have to know about it, only the supervision components should.

Now, on the return value of read(). I think there are two visions of
data connections:
- the "distributed state" vision. When you write to a data connection, you
just update part of the state with the latest estimate. The point being that,
by reading the connection, you always get the latest state estimate.
- the "data item" vision. In that case, data connections are just like buffer
connections of size 1. I actually don't see the "optimization" claim of Peter
-- I don't see how a data connection is an optimization versus a buffer of size
1.

In my opinion, there is room for both buffer and data connections as they are
now. The only issue with data connections is that they are completely
unsuitable for polling (as there is always data on them). I actually feel like
the tri-state return value is the best, as it unifies data and buffers (and,
later on, ring buffers if they are implemented). If a component only cares
about data availability, then it just tests the return value as true/false.
Otherwise, it can have the information available to it *regardless of the
type of connection it is reading from*.
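
To make that policy-independence concrete, here is a minimal C++ sketch. All
names (FlowStatus, updateHook, the Reader callable) are placeholders rather
than the actual RTT 2.0 API; the point is only that the component logic
switches on the tri-state status and never mentions 'data' or 'buffered':

{{{
#include <functional>
#include <iostream>

enum FlowStatus { NO_SAMPLE = 0, OLD_SAMPLE, NEW_SAMPLE };

// The algorithm only sees samples plus a status; the reader is passed in
// as an opaque callable, so the same code works for any connection policy.
using Reader = std::function<FlowStatus(double&)>;

void updateHook(const Reader& read)
{
    double value = 0.0;
    switch (read(value)) {
    case NEW_SAMPLE:
        std::cout << "fresh sample: " << value << "\n";
        break;
    case OLD_SAMPLE:
        std::cout << "no new data, keep using " << value << "\n";
        break;
    case NO_SAMPLE:
        std::cout << "never written / not connected\n";
        break;
    }
}

int main()
{
    // A data-like reader that repeats the last written value ...
    Reader dataLike = [](double& out) { out = 1.5; return OLD_SAMPLE; };
    // ... and a buffer-like reader that is currently empty.
    Reader bufferLike = [](double&) { return NO_SAMPLE; };

    updateHook(dataLike);
    updateHook(bufferLike);

    // A component that only cares about availability can still test the
    // status as a plain boolean, since NO_SAMPLE is the only zero value.
    double v = 0.0;
    if (!bufferLike(v))
        std::cout << "true/false test also works\n";
    return 0;
}
}}}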

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tue, Aug 25, 2009 at 12:32, Sylvain Joyeux<sylvain [dot] joyeux [..] ...> wrote:
> On Tuesday 25 August 2009 12:10:39 Markus Klotzbuecher wrote:
...
>>
>> 1) component writes value to port, but gets an error. The same error
>>    is propagated to the coordinator
>>
>> 2) coordinator stops writer and reader, removes connection and creates
>>    backup connection through some other physical connection
>>
>> 3) coordinator restarts writer and reader, writer retries.
>>
>> However this coordination would require knowledge on the writer side
>> (namely to retry), so computation and coordination is slightly
>> coupled. The only way to avoid this I can see would be to have a local
>> "reliable transmission" helper component which takes care of the above
>> sequence, without complicating the actual computation. Or is this
>> example to contrived?
>
> As you wrote it yourself, the overall fault-tolerance is achieved by a
> coordinator component, not by the writer himself. Hence, the writer does not
> have to know if the write was successful or not. I.e. it is achieved by a
> component that does not have access to the return value of write() ...
>
> In my opinion, what one needs is an introspection interface for the transport.
> The most important feature of that interface is that the individual components
> should not have to know about it, only the supervision components should.

This functionality is not essential for RTT 2.0.

>
> Now, on the return value of read(). I think there is two visions about the
> data connections:
>  - the "distributed state" vision. When you write to a data connection, you
> just update part of the state with the latest estimate. The point being that,
> by reading the connection, you always get the latest state estimate.
>  - the "data item" vision.

But does the component's algorithm know this difference?

> In that case, data connections are just like buffer
> connections of size 1. I actually don't see the "optimization" claim of Peter
> -- I don't see how a data connection is an optimization versus a buffer of size
> 1.

A buffer of size 1 can be more efficiently implemented than a generic
buffer of size N. Hence, a data connection which stores one element is
an optimized case of a buffer.
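
As a rough illustration of that claim (mutex-based for brevity; the real RTT
connections are lock-free, so this is emphatically not the actual
implementation), a single-slot data object needs no index or occupancy
bookkeeping, while a generic bounded buffer maintains both on every access:

{{{
#include <cstddef>
#include <mutex>
#include <vector>

// Single-slot, last-value-wins storage: a write overwrites in place and a
// read copies the slot, nothing more.
template <typename T>
class DataObject {
    T slot_{};
    mutable std::mutex m_;
public:
    void write(const T& v) { std::lock_guard<std::mutex> l(m_); slot_ = v; }
    T read() const { std::lock_guard<std::mutex> l(m_); return slot_; }
};

// Generic bounded FIFO of capacity N: every access also has to maintain the
// head index and the element count.
template <typename T>
class Buffer {
    std::vector<T> ring_;
    std::size_t head_ = 0, size_ = 0;
    mutable std::mutex m_;
public:
    explicit Buffer(std::size_t n) : ring_(n) {}
    bool write(const T& v) {
        std::lock_guard<std::mutex> l(m_);
        if (size_ == ring_.size()) return false;   // full: sample dropped
        ring_[(head_ + size_) % ring_.size()] = v;
        ++size_;
        return true;
    }
    bool read(T& v) {
        std::lock_guard<std::mutex> l(m_);
        if (size_ == 0) return false;              // empty: nothing returned
        v = ring_[head_];
        head_ = (head_ + 1) % ring_.size();
        --size_;
        return true;
    }
};
}}}

A capacity-1 specialization can of course drop most of that bookkeeping,
which is the sense in which the data connection is the optimized case.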

>
> In my opinion, there is room for both buffer and data connections as they are
> now. The only issue with data connections is that they are completely
> unsuitable for polling (as there is always data on them). I actually feel like
> the tri-state return value is the best, as it unifies data and buffers (and,
> later on, ring buffers if they are implemented). If a component only cares
> about data availability, then it just tests the return value as true/false.

This would mean that a buffer which was written once and is now empty
writes the last read sample into 'sample' when returning NO_NEW_DATA
from read( sample )? Otherwise, it wouldn't be analogous to the
shared data case.

> Otherwise, it can have the information available to him *regardless of the
> type of connection it is reading from*.

For clarity, can you write down the truth table of return values, plus
whether the ref-argument of read() is filled in or not? (the wiki can
display tables)

Peter

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 25, 2009, at 06:32 , Sylvain Joyeux wrote:

> On Tuesday 25 August 2009 12:10:39 Markus Klotzbuecher wrote:
>> On Tue, Aug 25, 2009 at 09:38:49AM +0200, Herman Bruyninckx wrote:
>>> On Mon, 24 Aug 2009, Peter Soetens wrote:
>>>> On Mon, Aug 24, 2009 at 14:22, Markus
>>>>
>>>> Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
>>>>> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
>>>>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
>>>>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher
>>>>>>> wrote:
>>>>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
>>>>>>>>>> Now, it is actually possible to have a MO/SI model, by
>>>>>>>>>> allowing
>>>>>>>>>> an input port
>>>>>>>>>> to have multiple incoming channels, and having
>>>>>>>>>> InputPort::read
>>>>>>>>>> round-robin on
>>>>>>>>>> those channels. As an added nicety, one can listen to the
>>>>>>>>>> "new
>>>>>>>>>> data" event and
>>>>>>>>>> access only the port for which we have an indication that new
>>>>>>>>>> data can be
>>>>>>>>>> available. Implementing that would require very little added
>>>>>>>>>> code, since the
>>>>>>>>>> management of multiple channels is already present in
>>>>>>>>>> OutputPort.
>>>>>>>>>
>>>>>>>>> Well, a 'polling' + keeping a pointer to the last read
>>>>>>>>> channel,
>>>>>>>>> such that we try that one always first, and only start
>>>>>>>>> polling if
>>>>>>>>> that channel turned out 'empty'. This empty detection is
>>>>>>>>> problematic for shared data connections (opposed to buffered),
>>>>>>>>> because once they are
>>>>>>>>> written, they always show a valid value. We might need to
>>>>>>>>> add that
>>>>>>>>> once an input port is read, the sample is consumed, and a next
>>>>>>>>> read will return false (no new data).
>>>>>>>>
>>>>>>>> Is there really a usecase for multiple incoming but unbuffered
>>>>>>>> connections? It seems to me that the result would be quite
>>>>>>>> arbitrary.
>>>>>>>
>>>>>>> Of course there is. If you think at a more broader scope there
>>>>>>> could
>>>>>>> be a coordination component controlling the individual
>>>>>>> components
>>>>>>> such that the results are not arbitrary at all.
>>>>>>>
>>>>>>> In fact this is a good example of explicit vs. implicit
>>>>>>> coordination.
>>>>>>
>>>>>> This is _exactly_ the situation we have in our projects. Multiple
>>>>>> components with unbuffered output connections, to a single input
>>>>>> connection on another component. A coordination component ensures
>>>>>> that only one of the input components is running at a time, but
>>>>>> they
>>>>>> are all connected.
>>>>>>
>>>>>> Here, we want the latest data value available. No more, no less.
>>>>>
>>>>> And if you used bufferports of capacity 1 instead of dataports?
>>>>> This
>>>>> would have the additional benefit of replacing polling by event
>>>>> driven
>>>>> behavior. I must admit I still find this sampling of dataport
>>>>> values a
>>>>> bit odd.
>>>>
>>>> First of all, there is no such thing as data ports or buffer
>>>> ports (in
>>>> RTT 2.0). There are only connection policies, and there are
>>>> currently
>>>> two major ones: 'shared data' and 'buffered'. They allow the
>>>> application builder to choose which type of data exchange is the
>>>> most
>>>> efficient between components A and B. For example, in a completely
>>>> synchronous data flow system, there is no use in installing buffers
>>>> between components, just 'sharing data' is sufficient. There are
>>>> other
>>>> factors in play as well. As we discussed this earlier, we should
>>>> write
>>>> this down in the wiki such that application builders can make an
>>>> informed choice.
>>>
>>> Thanks for this summary. I badly needed it :-)
>>>
>>>> In principle, as Herman pointed out, the algorithm in the component
>>>> should be completely independent of a connection policy. That's
>>>> also
>>>> why he opposes the default policy settings of the input port. It
>>>> 'taints' a clean separation of the 4 C's. Imho, convenience
>>>> sometimes
>>>> overrules clean design. This is especially so for small
>>>> applications
>>>> where 'specifying something in one place' overrides 'separation of
>>>> concerns'. In practice this means that a user prefers 1 file of 40
>>>> lines than 4 files of 10 lines, while on the other hand perfers 10
>>>> files of 4.000 lines over 1 file of 40.000 lines of code. We have
>>>> to
>>>> accommodate both cases.
>>>>
>>>> The latter aside, the algorithm vs connection policy independence
>>>> is
>>>> the fundamental guideline for designing the input-port and output-
>>>> port
>>>> interface in C++. That's why we need to think thourougly over the
>>>> semantics of read() and write(). We decided that write() was sent
>>>> and
>>>> forget. For read, it now returns a bool with the following 'truth'
>>>> table:
>>>>
>>>> {{{
>>>> Table of return values or read( Sample ) (True/False):
>>>> status\policy    | data | buffered |
>>>> not connected    |      |          |
>>>> or never written |  F   |    F     |
>>>> connected,       |      |          |
>>>> written in past  |  T   |    F     |
>>>> connected, and   |      |          |
>>>> new data         |  T   |    T     |
>>>> }}}
>>>>
>>>> There are in fact three states and only two return values. I
>>>> believe
>>>> that for uniformity and algorithm independence, we might benefit
>>>> from
>>>> a three state return value as well. As such, the algorithm or
>>>> component logic can decide: what to do if the ports are not
>>>> connected
>>>> (flag error, which is caught by the coordination layer); what to
>>>> do if
>>>> some port did not receive new data (but others might have); what
>>>> to do
>>>> if a new sample has arrived. These decisions are independent of the
>>>> connection policy. If on the other hand, only two return values are
>>>> provided (like in the table), the algorithm might have to make
>>>> assumptions on the type of connection policy because the middle
>>>> case
>>>> has different return values for data or buffered.
>>>>
>>>> So I'm basically agreeing with Markus here: to the algorithm, the
>>>> ports only see samples ( as if everything is buffered) and the
>>>> choice
>>>> of data vs buffered is 'just' an optimization strategy when
>>>> setting up
>>>> the intercomponent connections. But there still remain three cases
>>>> independent of the policy: 'not connected', 'connected but no new
>>>> data' and 'connected and new data'.
>>>
>>> For the reader the first two cases do not make a difference at
>>> all! The
>>> result is that the component wanted data that is not there, and it
>>> cannot
>>> do anything about it (i.e,, it can (should!) do nothing about the
>>> connection problem, or about the data-not-being-there problem).
>>> Hence, I
>>> see no need to reflect this status difference in the read() return
>>> value.
>>
>> I don't agree generally that data-not-being-there should be treated
>> identically as a connection error. This might be true for certain
>> components which rely on data always being available, but might not
>> be
>> true for others which check for data but continue doing something
>> else
>> if none is available. Example state machine, the UML event queue is
>> modeled as a port with a buffered policy. If event in queue then
>> transition, else continue executing "do" behavior.
>>
>>> And why is the write() case so different? It also should have a
>>> return
>>> value that indicates whether the write was successful or not. (But
>>> it
>>> should not have to know the _cause_ of the failure, similarly to the
>>> read() call.)
>>
>> One problem I see is defining what successful means. Does is mean the
>> data was locally stored in the output queue or that is was put in the
>> recipient queue or read by the recipient? I suppose these could be
>> QoS
>> policies of the connection. But when does a writer need to know if
>> the
>> write was successful or not? One situation could be the case where it
>> is absolutely necessary that a particular datum is delivered. The
>> sequence of coordination (e.g. for a disconnected cable) could be as
>> follows:
>>
>> 1) component writes value to port, but gets an error. The same error
>> is propagated to the coordinator
>>
>> 2) coordinator stops writer and reader, removes connection and
>> creates
>> backup connection through some other physical connection
>>
>> 3) coordinator restarts writer and reader, writer retries.
>>
>> However this coordination would require knowledge on the writer side
>> (namely to retry), so computation and coordination is slightly
>> coupled. The only way to avoid this I can see would be to have a
>> local
>> "reliable transmission" helper component which takes care of the
>> above
>> sequence, without complicating the actual computation. Or is this
>> example to contrived?
>
> As you wrote it yourself, the overall fault-tolerance is achieved by a
> coordinator component, not by the writer himself. Hence, the writer
> does not
> have to know if the write was successful or not. I.e. it is achieved
> by a
> component that does not have access to the return value of write() ...
>
> In my opinion, what one needs is an introspection interface for the
> transport.
> The most important feature of that interface is that the individual
> components
> should not have to know about it, only the supervision components
> should.
>
> Now, on the return value of read(). I think there is two visions
> about the
> data connections:
> - the "distributed state" vision. When you write to a data
> connection, you
> just update part of the state with the latest estimate. The point
> being that,
> by reading the connection, you always get the latest state estimate.
> - the "data item" vision. In that case, data connections are just
> like buffer
> connections of size 1. I actually don't see the "optimization" claim
> of Peter
> -- I don't see how a data connection is an optimization versus a
> buffer of size
> 1.

This is a nice summary of the differences between the two conceptual
points of view. The vast majority of our existing Orocos code is
written with the "distributed state" vision. This may be a byproduct
of how the existing ports work - some of these connections may change
to the "data item" vision in the future.

> In my opinion, there is room for both buffer and data connections as
> they are
> now. The only issue with data connections is that they are completely
> unsuitable for polling (as there is always data on them). I actually
> feel like
> the tri-state return value is the best, as it unifies data and
> buffers (and,
> later on, ring buffers if they are implemented). If a component only
> cares
> about data availability, then it just tests the return value as true/
> false.
> Otherwise, it can have the information available to him *regardless
> of the
> type of connection it is reading from*.

I agree, there is room for both. I would like the tri-state return
value for the non-buffered port case, to be able to explicitly decide
on the three cases that Peter's table summarized.

With these changes, do we or do we _not_ get the latest value
available from a non-buffered port when no new value is available? If
not, then it sounds like this is a big change from the existing to the
new port framework. This will require rework of *all* code that uses
the "distributed state" vision (as the code will now have to locally
hold the last value sampled, and use that when the tri-state return
indicates no new data is available).
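
To illustrate the rework in question, a sketch (hypothetical names, not RTT
code) of the caching that every "distributed state" component would have to
carry itself if a no-new-data read stopped handing back the last value:

{{{
#include <iostream>

// Hypothetical two-state status where a no-new-data read does NOT repeat
// the previous value.
enum ReadStatus { NO_NEW_DATA = 0, NEW_SAMPLE };

struct Controller {
    double setpoint = 0.0;   // locally cached copy of the last sample
    bool   valid    = false;

    // Every such component would need this caching pattern in its own code.
    void update(ReadStatus status, double fresh) {
        if (status == NEW_SAMPLE) {
            setpoint = fresh;
            valid    = true;
        }
        if (valid)
            std::cout << "tracking setpoint " << setpoint << "\n";
        else
            std::cout << "no setpoint received yet\n";
    }
};

int main() {
    Controller c;
    c.update(NO_NEW_DATA, 0.0);  // nothing yet
    c.update(NEW_SAMPLE, 2.0);   // fresh value arrives
    c.update(NO_NEW_DATA, 0.0);  // falls back on the cached 2.0
    return 0;
}
}}}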

Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tuesday 25 August 2009 13:28:38 S Roderick wrote:
> On Aug 25, 2009, at 06:32 , Sylvain Joyeux wrote:
> > On Tuesday 25 August 2009 12:10:39 Markus Klotzbuecher wrote:
> >> On Tue, Aug 25, 2009 at 09:38:49AM +0200, Herman Bruyninckx wrote:
> >>> On Mon, 24 Aug 2009, Peter Soetens wrote:
> >>>> On Mon, Aug 24, 2009 at 14:22, Markus
> >>>>
> >>>> Klotzbuecher<markus [dot] klotzbuecher [..] ...> wrote:
> >>>>> On Mon, Aug 24, 2009 at 01:21:13PM +0200, S Roderick wrote:
> >>>>>> On Aug 24, 2009, at 05:33 , Markus Klotzbuecher wrote:
> >>>>>>> On Mon, Aug 24, 2009 at 11:19:58AM +0200, Markus Klotzbuecher
> >>>>>>>
> >>>>>>> wrote:
> >>>>>>>> On Sun, Aug 23, 2009 at 03:13:19PM +0200, Peter Soetens wrote:
> >>>>>>>>>> Now, it is actually possible to have a MO/SI model, by
> >>>>>>>>>> allowing
> >>>>>>>>>> an input port
> >>>>>>>>>> to have multiple incoming channels, and having
> >>>>>>>>>> InputPort::read
> >>>>>>>>>> round-robin on
> >>>>>>>>>> those channels. As an added nicety, one can listen to the
> >>>>>>>>>> "new
> >>>>>>>>>> data" event and
> >>>>>>>>>> access only the port for which we have an indication that new
> >>>>>>>>>> data can be
> >>>>>>>>>> available. Implementing that would require very little added
> >>>>>>>>>> code, since the
> >>>>>>>>>> management of multiple channels is already present in
> >>>>>>>>>> OutputPort.
> >>>>>>>>>
> >>>>>>>>> Well, a 'polling' + keeping a pointer to the last read
> >>>>>>>>> channel,
> >>>>>>>>> such that we try that one always first, and only start
> >>>>>>>>> polling if
> >>>>>>>>> that channel turned out 'empty'. This empty detection is
> >>>>>>>>> problematic for shared data connections (opposed to buffered),
> >>>>>>>>> because once they are
> >>>>>>>>> written, they always show a valid value. We might need to
> >>>>>>>>> add that
> >>>>>>>>> once an input port is read, the sample is consumed, and a next
> >>>>>>>>> read will return false (no new data).
> >>>>>>>>
> >>>>>>>> Is there really a usecase for multiple incoming but unbuffered
> >>>>>>>> connections? It seems to me that the result would be quite
> >>>>>>>> arbitrary.
> >>>>>>>
> >>>>>>> Of course there is. If you think at a more broader scope there
> >>>>>>> could
> >>>>>>> be a coordination component controlling the individual
> >>>>>>> components
> >>>>>>> such that the results are not arbitrary at all.
> >>>>>>>
> >>>>>>> In fact this is a good example of explicit vs. implicit
> >>>>>>> coordination.
> >>>>>>
> >>>>>> This is _exactly_ the situation we have in our projects. Multiple
> >>>>>> components with unbuffered output connections, to a single input
> >>>>>> connection on another component. A coordination component ensures
> >>>>>> that only one of the input components is running at a time, but
> >>>>>> they
> >>>>>> are all connected.
> >>>>>>
> >>>>>> Here, we want the latest data value available. No more, no less.
> >>>>>
> >>>>> And if you used bufferports of capacity 1 instead of dataports?
> >>>>> This
> >>>>> would have the additional benefit of replacing polling by event
> >>>>> driven
> >>>>> behavior. I must admit I still find this sampling of dataport
> >>>>> values a
> >>>>> bit odd.
> >>>>
> >>>> First of all, there is no such thing as data ports or buffer
> >>>> ports (in
> >>>> RTT 2.0). There are only connection policies, and there are
> >>>> currently
> >>>> two major ones: 'shared data' and 'buffered'. They allow the
> >>>> application builder to choose which type of data exchange is the
> >>>> most
> >>>> efficient between components A and B. For example, in a completely
> >>>> synchronous data flow system, there is no use in installing buffers
> >>>> between components, just 'sharing data' is sufficient. There are
> >>>> other
> >>>> factors in play as well. As we discussed this earlier, we should
> >>>> write
> >>>> this down in the wiki such that application builders can make an
> >>>> informed choice.
> >>>
> >>> Thanks for this summary. I badly needed it :-)
> >>>
> >>>> In principle, as Herman pointed out, the algorithm in the component
> >>>> should be completely independent of a connection policy. That's
> >>>> also
> >>>> why he opposes the default policy settings of the input port. It
> >>>> 'taints' a clean separation of the 4 C's. Imho, convenience
> >>>> sometimes
> >>>> overrules clean design. This is especially so for small
> >>>> applications
> >>>> where 'specifying something in one place' overrides 'separation of
> >>>> concerns'. In practice this means that a user prefers 1 file of 40
> >>>> lines than 4 files of 10 lines, while on the other hand perfers 10
> >>>> files of 4.000 lines over 1 file of 40.000 lines of code. We have
> >>>> to
> >>>> accommodate both cases.
> >>>>
> >>>> The latter aside, the algorithm vs connection policy independence
> >>>> is
> >>>> the fundamental guideline for designing the input-port and output-
> >>>> port
> >>>> interface in C++. That's why we need to think thourougly over the
> >>>> semantics of read() and write(). We decided that write() was sent
> >>>> and
> >>>> forget. For read, it now returns a bool with the following 'truth'
> >>>> table:
> >>>>
> >>>> {{{
> >>>> Table of return values or read( Sample ) (True/False):
> >>>> status\policy    | data | buffered |
> >>>> not connected    |      |          |
> >>>> or never written |  F   |    F     |
> >>>> connected,       |      |          |
> >>>> written in past  |  T   |    F     |
> >>>> connected, and   |      |          |
> >>>> new data         |  T   |    T     |
> >>>> }}}
> >>>>
> >>>> There are in fact three states and only two return values. I
> >>>> believe
> >>>> that for uniformity and algorithm independence, we might benefit
> >>>> from
> >>>> a three state return value as well. As such, the algorithm or
> >>>> component logic can decide: what to do if the ports are not
> >>>> connected
> >>>> (flag error, which is caught by the coordination layer); what to
> >>>> do if
> >>>> some port did not receive new data (but others might have); what
> >>>> to do
> >>>> if a new sample has arrived. These decisions are independent of the
> >>>> connection policy. If on the other hand, only two return values are
> >>>> provided (like in the table), the algorithm might have to make
> >>>> assumptions on the type of connection policy because the middle
> >>>> case
> >>>> has different return values for data or buffered.
> >>>>
> >>>> So I'm basically agreeing with Markus here: to the algorithm, the
> >>>> ports only see samples ( as if everything is buffered) and the
> >>>> choice
> >>>> of data vs buffered is 'just' an optimization strategy when
> >>>> setting up
> >>>> the intercomponent connections. But there still remain three cases
> >>>> independent of the policy: 'not connected', 'connected but no new
> >>>> data' and 'connected and new data'.
> >>>
> >>> For the reader the first two cases do not make a difference at
> >>> all! The
> >>> result is that the component wanted data that is not there, and it
> >>> cannot
> >>> do anything about it (i.e,, it can (should!) do nothing about the
> >>> connection problem, or about the data-not-being-there problem).
> >>> Hence, I
> >>> see no need to reflect this status difference in the read() return
> >>> value.
> >>
> >> I don't agree generally that data-not-being-there should be treated
> >> identically as a connection error. This might be true for certain
> >> components which rely on data always being available, but might not
> >> be
> >> true for others which check for data but continue doing something
> >> else
> >> if none is available. Example state machine, the UML event queue is
> >> modeled as a port with a buffered policy. If event in queue then
> >> transition, else continue executing "do" behavior.
> >>
> >>> And why is the write() case so different? It also should have a
> >>> return
> >>> value that indicates whether the write was successful or not. (But
> >>> it
> >>> should not have to know the _cause_ of the failure, similarly to the
> >>> read() call.)
> >>
> >> One problem I see is defining what successful means. Does is mean the
> >> data was locally stored in the output queue or that is was put in the
> >> recipient queue or read by the recipient? I suppose these could be
> >> QoS
> >> policies of the connection. But when does a writer need to know if
> >> the
> >> write was successful or not? One situation could be the case where it
> >> is absolutely necessary that a particular datum is delivered. The
> >> sequence of coordination (e.g. for a disconnected cable) could be as
> >> follows:
> >>
> >> 1) component writes value to port, but gets an error. The same error
> >> is propagated to the coordinator
> >>
> >> 2) coordinator stops writer and reader, removes connection and
> >> creates
> >> backup connection through some other physical connection
> >>
> >> 3) coordinator restarts writer and reader, writer retries.
> >>
> >> However this coordination would require knowledge on the writer side
> >> (namely to retry), so computation and coordination is slightly
> >> coupled. The only way to avoid this I can see would be to have a
> >> local
> >> "reliable transmission" helper component which takes care of the
> >> above
> >> sequence, without complicating the actual computation. Or is this
> >> example to contrived?
> >
> > As you wrote it yourself, the overall fault-tolerance is achieved by a
> > coordinator component, not by the writer himself. Hence, the writer
> > does not
> > have to know if the write was successful or not. I.e. it is achieved
> > by a
> > component that does not have access to the return value of write() ...
> >
> > In my opinion, what one needs is an introspection interface for the
> > transport.
> > The most important feature of that interface is that the individual
> > components
> > should not have to know about it, only the supervision components
> > should.
> >
> > Now, on the return value of read(). I think there is two visions
> > about the
> > data connections:
> > - the "distributed state" vision. When you write to a data
> > connection, you
> > just update part of the state with the latest estimate. The point
> > being that,
> > by reading the connection, you always get the latest state estimate.
> > - the "data item" vision. In that case, data connections are just
> > like buffer
> > connections of size 1. I actually don't see the "optimization" claim
> > of Peter
> > -- I don't see how a data connection is an optimization versus a
> > buffer of size
> > 1.
>
> This is a nice summary of the differences between the two conceptual
> points of view. The vast majority of our existing Orocos code is
> written with the "distributed state" vision. This may be a byproduct
> of how the existing ports work - some of these connections may change
> to the "data item" vision in the future.
>
> > In my opinion, there is room for both buffer and data connections as
> > they are
> > now. The only issue with data connections is that they are completely
> > unsuitable for polling (as there is always data on them). I actually
> > feel like
> > the tri-state return value is the best, as it unifies data and
> > buffers (and,
> > later on, ring buffers if they are implemented). If a component only
> > cares
> > about data availability, then it just tests the return value as true/
> > false.
> > Otherwise, it can have the information available to him *regardless
> > of the
> > type of connection it is reading from*.
>
> I agree, there is room for both. I would like the tri-state return
> value for the non-buffered port case, to be able to explicitly decide
> on the three cases that Peter's table summarized.
>
> With these changes, do we or do we _not_ get the latest value
> available from a non-buffered port when no new value is avaliable? If
> not, then it sounds like this is a big chance from the existing to the
> new port frameworks. This will require rework of *all* code that uses
> the "distributed state" vision (as the code will now have to locally
> hold the last value sampled, and use that when the tri-state return
> indicates no new data is available).

We of course *do* get the value. In my opinion, what we would have is (names
are definitely not final :P)

  data ports:
     NO_DATA:     connection never written, no sample returned (== false)
     OLD_SAMPLE:  sample already read, sample returned (!= false)
     NEW_SAMPLE:  sample never read, sample returned

  buffer ports:
     NO_DATA:     no sample in buffer, no sample returned (== false)
     NEW_SAMPLE:  sample never read, sample returned

  OLD_SAMPLE is obviously never returned.
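
A minimal sketch of that proposal (placeholder names and types, a capacity-1
buffer for brevity; this is illustrative, not the RTT implementation): the
set of statuses a reader can observe depends on the kind of connection, and
only NO_DATA converts to false.

{{{
enum FlowStatus { NO_DATA = 0, OLD_SAMPLE, NEW_SAMPLE };

// "data port" behaviour: once written, a read always returns a sample,
// marked NEW_SAMPLE the first time and OLD_SAMPLE afterwards.
template <typename T>
struct DataChannel {
    T    last{};
    bool written = false, fresh = false;
    void write(const T& v) { last = v; written = true; fresh = true; }
    FlowStatus read(T& sample) {
        if (!written) return NO_DATA;              // never written: == false
        sample = last;
        if (fresh) { fresh = false; return NEW_SAMPLE; }
        return OLD_SAMPLE;                         // already read: != false
    }
};

// "buffer port" behaviour: an empty buffer returns NO_DATA and never
// OLD_SAMPLE.
template <typename T>
struct BufferChannel {
    T    item{};
    bool full = false;
    bool write(const T& v) {
        if (full) return false;                    // capacity 1: drop on full
        item = v; full = true;
        return true;
    }
    FlowStatus read(T& sample) {
        if (!full) return NO_DATA;                 // empty: nothing returned
        sample = item;
        full = false;
        return NEW_SAMPLE;
    }
};
}}}

Under these rules, a reader that wants the last value out of an empty buffer
connection has to cache it itself, which is the point debated in the replies
below.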

Sylvain

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Tue, Aug 25, 2009 at 14:30, Sylvain Joyeux<sylvain [dot] joyeux [..] ...> wrote:
> On Tuesday 25 August 2009 13:28:38 S Roderick wrote:
>> With these changes, do we or do we _not_ get the latest value
>> available from a non-buffered port when no new value is avaliable? If
>> not, then it sounds like this is a big chance from the existing to the
>> new port frameworks. This will require rework of *all* code that uses
>> the "distributed state" vision (as the code will now have to locally
>> hold the last value sampled, and use that when the tri-state return
>> indicates no new data is available).
>
> We of course *do* get the value. In my opinion, what we would have is (names
> are definitely not final :P)
>
>  data ports:
>     NO_DATA:        connection never written, no sample returned (== false)
>     OLD_SAMPLE:  sample already read, sample returned (!= false)
>     NEW_SAMPLE: sample never read, sample returned
>
>  buffer ports:
>     NO_DATA:        no sample in buffer, no sample returned (== false)
>     NEW_SAMPLE: sample never read, sample returned
>
>  OLD_SAMPLE is obviously never returned.

I can't agree here. This lets connection policies implicitly leak
through into user/algorithm code. NO_DATA, OLD_SAMPLE and NEW_SAMPLE
must be interpretable without talking about connection policies. To
make this concrete, if we were to document the return values of
read(), the words 'buffer' or 'shared state/data' would not show up.
That's why I would suggest that the three values are documented as
such and that all connection policies can return them:

NO_DATA: connection never written, no sample returned (== false)
OLD_SAMPLE: last sample already read, old sample returned (!= false)
NEW_SAMPLE: new sample arrived, new sample returned

So a buffer policy will have to cache the last read sample in case the
buffer is empty. OLD_SAMPLE is a convenience offered by the RTT, in
order to not have the duplication (and management) of caching in
user/algorithm code "just in case it's a not-buffered connection".
Algorithms can't predict at what rate samples will arrive, and
certainly can't predict the connection policy, so caching is the only
option, and I prefer to do it in the RTT rather than in each and every
component written.
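
A sketch of the caching described above (illustrative only; the names and the
use of std::deque are assumptions, not the RTT buffer implementation): the
buffered connection keeps a copy of the last sample it handed out, so an
empty buffer can still answer OLD_SAMPLE with that value instead of forcing
each component to cache it.

{{{
#include <deque>

enum FlowStatus { NO_DATA = 0, OLD_SAMPLE, NEW_SAMPLE };

template <typename T>
class CachingBufferChannel {
    std::deque<T> queue_;   // stand-in for the real (lock-free) buffer
    T    lastRead_{};       // cache of the most recently consumed sample
    bool haveLast_ = false;
public:
    void write(const T& v) { queue_.push_back(v); }

    FlowStatus read(T& sample) {
        if (!queue_.empty()) {
            lastRead_ = queue_.front();            // consume and cache
            queue_.pop_front();
            haveLast_ = true;
            sample = lastRead_;
            return NEW_SAMPLE;
        }
        if (haveLast_) {
            sample = lastRead_;                    // repeat the cached sample
            return OLD_SAMPLE;
        }
        return NO_DATA;                            // connection never written
    }
};
}}}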

Peter

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 31, 2009, at 07:30 , Peter Soetens wrote:

> On Tue, Aug 25, 2009 at 14:30, Sylvain
> Joyeux<sylvain [dot] joyeux [..] ...> wrote:
>> On Tuesday 25 August 2009 13:28:38 S Roderick wrote:
>>> With these changes, do we or do we _not_ get the latest value
>>> available from a non-buffered port when no new value is avaliable?
>>> If
>>> not, then it sounds like this is a big chance from the existing to
>>> the
>>> new port frameworks. This will require rework of *all* code that
>>> uses
>>> the "distributed state" vision (as the code will now have to locally
>>> hold the last value sampled, and use that when the tri-state return
>>> indicates no new data is available).
>>
>> We of course *do* get the value. In my opinion, what we would have
>> is (names
>> are definitely not final :P)
>>
>> data ports:
>> NO_DATA: connection never written, no sample returned
>> (== false)
>> OLD_SAMPLE: sample already read, sample returned (!= false)
>> NEW_SAMPLE: sample never read, sample returned
>>
>> buffer ports:
>> NO_DATA: no sample in buffer, no sample returned (==
>> false)
>> NEW_SAMPLE: sample never read, sample returned
>>
>> OLD_SAMPLE is obviously never returned.
>
> I can't agree here. This lets connection policies implicitly leak
> through in user/algorithm code. NO_DATA, OLD_SAMPLE and NEW_SAMPLE
> must be interpretable without talking about connection policies. To
> make this concrete, if we were to document the return values of
> read(), the words 'buffer' or 'shared state/data' would not show up.
> That's why I would suggest that the three values are documented as
> such and that all connection policies can return them:
>
> NO_DATA: connection never written, no sample returned
> (== false)
> OLD_SAMPLE: last sample already read, old sample returned (!=
> false)
> NEW_SAMPLE: new sample arrived, new sample returned
>
> So a buffer policy will have to cache the last read sample in case the
> buffer is empty.OLD_SAMPLE is a convenience function offered by the
> RTT, in order to not have the duplication (and management) of caching
> in user/algorithm code for "just in case it's a not-buffered
> connection". Algorithms can't predict at what rate samples will
> arrive, and sure can't predict the connection policy, so caching is
> the only option, and I prefer to do it in RTT than in each and every
> component written.

+1

Not having to special-case user code based on (potentially changing)
connection policies is good.

Stephen

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Mon, 31 Aug 2009, Peter Soetens wrote:

> On Tue, Aug 25, 2009 at 14:30, Sylvain Joyeux<sylvain [dot] joyeux [..] ...> wrote:
>> On Tuesday 25 August 2009 13:28:38 S Roderick wrote:
>>> With these changes, do we or do we _not_ get the latest value
>>> available from a non-buffered port when no new value is avaliable? If
>>> not, then it sounds like this is a big chance from the existing to the
>>> new port frameworks. This will require rework of *all* code that uses
>>> the "distributed state" vision (as the code will now have to locally
>>> hold the last value sampled, and use that when the tri-state return
>>> indicates no new data is available).
>>
>> We of course *do* get the value. In my opinion, what we would have is (names
>> are definitely not final :P)
>>
>>  data ports:
>>     NO_DATA:        connection never written, no sample returned (== false)
>>     OLD_SAMPLE:  sample already read, sample returned (!= false)
>>     NEW_SAMPLE: sample never read, sample returned
>>
>>  buffer ports:
>>     NO_DATA:        no sample in buffer, no sample returned (== false)
>>     NEW_SAMPLE: sample never read, sample returned
>>
>>  OLD_SAMPLE is obviously never returned.
>
> I can't agree here. This lets connection policies implicitly leak
> through in user/algorithm code. NO_DATA, OLD_SAMPLE and NEW_SAMPLE
> must be interpretable without talking about connection policies. To
> make this concrete, if we were to document the return values of
> read(), the words 'buffer' or 'shared state/data' would not show up.
> That's why I would suggest that the three values are documented as
> such and that all connection policies can return them:
>
> NO_DATA: connection never written, no sample returned (== false)
> OLD_SAMPLE: last sample already read, old sample returned (!= false)
> NEW_SAMPLE: new sample arrived, new sample returned

"OLD" and "NEW" are more ambiguous than "ALREADY_READ_BEFORE" or
"NOT_YET_READ", so I prefer something like the latter.

> So a buffer policy will have to cache the last read sample in case the
> buffer is empty.OLD_SAMPLE is a convenience function offered by the
> RTT, in order to not have the duplication (and management) of caching
> in user/algorithm code for "just in case it's a not-buffered
> connection". Algorithms can't predict at what rate samples will
> arrive, and sure can't predict the connection policy, so caching is
> the only option, and I prefer to do it in RTT than in each and every
> component written.

Agreed.

Herman

PROPOSAL: multi-output/single-input behaviour [was: Data Flow 2.0 Example]

On Aug 31, 2009, at 07:57 , Herman Bruyninckx wrote:

> On Mon, 31 Aug 2009, Peter Soetens wrote:
>
>> On Tue, Aug 25, 2009 at 14:30, Sylvain
>> Joyeux<sylvain [dot] joyeux [..] ...> wrote:
>>> On Tuesday 25 August 2009 13:28:38 S Roderick wrote:
>>>> With these changes, do we or do we _not_ get the latest value
>>>> available from a non-buffered port when no new value is
>>>> avaliable? If
>>>> not, then it sounds like this is a big chance from the existing
>>>> to the
>>>> new port frameworks. This will require rework of *all* code that
>>>> uses
>>>> the "distributed state" vision (as the code will now have to
>>>> locally
>>>> hold the last value sampled, and use that when the tri-state return
>>>> indicates no new data is available).
>>>
>>> We of course *do* get the value. In my opinion, what we would have
>>> is (names
>>> are definitely not final :P)
>>>
>>> data ports:
>>> NO_DATA: connection never written, no sample returned
>>> (== false)
>>> OLD_SAMPLE: sample already read, sample returned (!= false)
>>> NEW_SAMPLE: sample never read, sample returned
>>>
>>> buffer ports:
>>> NO_DATA: no sample in buffer, no sample returned (==
>>> false)
>>> NEW_SAMPLE: sample never read, sample returned
>>>
>>> OLD_SAMPLE is obviously never returned.
>>
>> I can't agree here. This lets connection policies implicitly leak
>> through in user/algorithm code. NO_DATA, OLD_SAMPLE and NEW_SAMPLE
>> must be interpretable without talking about connection policies. To
>> make this concrete, if we were to document the return values of
>> read(), the words 'buffer' or 'shared state/data' would not show up.
>> That's why I would suggest that the three values are documented as
>> such and that all connection policies can return them:
>>
>> NO_DATA: connection never written, no sample returned
>> (== false)
>> OLD_SAMPLE: last sample already read, old sample returned (!=
>> false)
>> NEW_SAMPLE: new sample arrived, new sample returned
>
> "OLD" and "NEW" are more ambiguous than "ALREADY_READ_BEFORE" or
> "NOT_YET_READ", so I prefer something like the latter.

+1, though I do believe users would figure the first one out if they
had to.

>> So a buffer policy will have to cache the last read sample in case
>> the
>> buffer is empty.OLD_SAMPLE is a convenience function offered by the
>> RTT, in order to not have the duplication (and management) of caching
>> in user/algorithm code for "just in case it's a not-buffered
>> connection". Algorithms can't predict at what rate samples will
>> arrive, and sure can't predict the connection policy, so caching is
>> the only option, and I prefer to do it in RTT than in each and every
>> component written.
>
> Agreed.

+1

Stephen