DataFlow 2.0 status and push/pull policy

Let's start with a status update. As you all know, there is no
real-time middleware available for inter-process communication, except
the low-level messaging libraries (and even most of those don't target
hard real-time). So we decided to write another transport (next to
CORBA) using message queues for doing real-time data flow IPC.
Conveniently, POSIX message queues are supported by both plain
GNU/Linux and Xenomai, using the same API.

One of the tricky parts is that, since there's no middleware (unlike
with CORBA), someone needs to do the serialisation to/from the queue.
I came across the boost::serialization library, and it fits the
purpose very well. It defines serialization algorithms independently
of 'archiving' algorithms: the former describe how data is
composed/decomposed into primitive types, the latter take that
information and write it into some format (text, binary, XML). The
boost::serialization library allows easy extension, but because it
does a lot of bookkeeping (read: mallocs) to serialize pointers,
subclasses etc., it is not real-time out of the box. The 'hard way' of
extending this library is by implementing the Archive concept
independently of the available helper classes. For our simple purpose,
this was fairly doable, if it weren't for the wrong documentation or
implementation of the serialization part. But that's another thread on
another list. The end result is that it is possible to use
boost::serialization for real-time marshalling/demarshalling of
primitive types and std::vector or more complex (variable-size) types.

Once serialization worked, it was added with little effort as a new
transport template. One can set up data flow streams using CORBA
('out-of-band') or by using the createStream functions on input and
output ports. Also, the return values of read() are now NoData,
NewData and OldData. The deployer has not yet been adapted.

One thing I was struggling with is how large the buffer size of the
message queue should be. The current implementation creates the
requested pull (output side) or push (input side) buffer/data object
(a connection policy), and *in addition*, by definition, the message
queue is a buffer too. Practically this means that MQ-based data flow
always has two buffers: the MQ itself and the policy buffer. At least,
that's what you would think. In practice, there is always an
input-side buffer, optionally an output-side buffer, and then the MQ
buffer. That's because from the moment a message arrives on the MQ, we
pull it (so that the MQ won't fill up), store it at the input side
and inform the input port by raising the new-data event.

In case you lost me, this is how it works:
PUSH:
output -> (buffer + input)
PULL:
(output + buffer) -> input

When using corba, this translates to:
PUSH:
output -> CORBA -> (buffer + input) : output.write() goes over corba
to store data in buffer
PULL:
(output + buffer) -> CORBA -> input : input.read() goes over corba to
read data from buffer

When using MQ, this translates to:
PUSH:
output -> MQ -> (buffer + input) : all is real-time
PULL:
(output + buffer) -> MQ -> (buffer + input) : all is real-time, last
buffer added de-facto by implementation (see below)

I was wondering two things:
1. Is it really necessary that the user can specify push/pull? Won't
this derive itself from the application architecture?
2. Couldn't the MQ be the buffer/data element (regardless of
push/pull) in the data flow channel?

If 1 is true, then 2 is answered as well. To know whether the
application architecture itself is enough to derive where
buffering/data storage must take place, we can test all cases:

1. In-process:
There is no difference between push and pull. You can specify it, but
it will result in the same topology.
Conclusion: one buffer in the middle (neither push nor pull).
2. Through CORBA:
PUSH: the output is punished for a remote client. This is fairly
unacceptable, unless the remote client is a real-time process itself
(and the output is not). Also, every sample the output produces is
sent over the wire (a possible bottleneck). It is still OK if the
input does more reads than the output does writes.
PULL: the input is punished for listening to remote data, so the input
can't be a real-time process. It is advantageous if the output does
more writes than the input does reads.
Conclusion: in case both sender and receiver are real-time processes,
neither push nor pull can satisfy the necessary architecture. It would
therefore be better to install buffers at both ends at all times and
let the ORB threads do the delivery of the data. So: a buffer on each
side (push and pull).
3. Through MQ:
PUSH: the buffer on the input side is there for CORBA legacy reasons.
CORBA had a buffer there, so MQ got one too. It could possibly be
replaced by the MQ itself.
PULL: same comment as PUSH, but the added buffer on the input side is
for the message dispatcher, which listens to the message queues for
new data and then needs some place to store that data. That's why we
always need a buffer at the input side.
Conclusion: the MQ could play the role of the buffer in the middle
(neither push nor pull).

I'm mixing the current implementation with a new design proposal here,
which might be confusing. The *real* point I needed to make is: should
the user specify push/pull, or can the application always derive the
correct places to put buffers? I would say yes, but I might be
overlooking why Sylvain installed this policy.

Peter

DataFlow 2.0 status and push/pull policy

Hi Peter,

On Tue, Sep 22, 2009 at 03:07:26PM +0200, Peter Soetens wrote:
> Lets start with a status update. As ye'all know, there is no
> rt-middleware available for inter-process communication, except the
> low-level messaging libraries (and even most don't target hard
> real-time). So we decided to write another transport (next to CORBA)
> using message queues for doing real-time data flow ipc. Conveniently,
> mqueues are supported by plain gnulinux and Xenomai, using the same
> API.
>
> One of the tricky parts is that since there's no middleware (unlike
> with CORBA), there's someone needed to do the serialisation to/from
> the queue. I came across the boost::serialization library and it fits
> the purpose very well. It defines serialization algorithms independent
> of 'archiving' algorithms. The former describes how data is
> composed/decomposed in primitive types, the latter takes that
> information and writes it into some format (text, binary, xml). The
> boost::serialization library allows easy extension, but because it
> does a lot bookkeeping (read mallocs) because it also serializes
> pointers, subclasses etc. this was not real-time. The 'hard way' of
> extending this library is by implementing the Archive concept
> independent of the available helper classes. For our simple purpose,
> this was fairly doable, if it wasn't for the wrong documentation or
> implementation of the serialization part. But that's another thread on
> another list. The end result is that it is possible to use
> boost::serialization for real-time marshalling/demarshalling of
> primitive types and std::vector or complexer (variable size) types.

How will custom types be treated?

> Once serialization worked, it was with little effort added as a new
> transport template. One can setup data flow streams using CORBA
> ('out-of-band') or by using the createStream functions on input and
> output port. Also the return values of read are now NoData, NewData
> and OldData. The deployer has not yet been adapted.
>
> One thing I was struggling with is how large the buffer size of the
> message queue should be. The current implementation creates the
> requested pull (output side) or push (input side) buffer/data object
> (a connection policy) and *in addition* and by definition, the message
> queue is a buffer too. Practically this means that MQ based dataflow
> has always two buffers: the MQ itself and the policy buffer. At least,
> that's what you would think. In real practice, there is always an
> input side buffer, optionally an output side buffer and then the MQ
> buffer. That's because from the moment a message arrives on the MQ, we
> pull it (such that the MQ won't fill up), store it at the input side
> and inform the input port by raising the new data event.
>
> In case you lost me, this is how it works:
> PUSH:
> output -> (buffer + input)
> PULL:
> (output + buffer) -> input
>
> When using corba, this translates to:
> PUSH:
> output -> CORBA -> (buffer + input) : output.write() goes over corba
> to store data in buffer
> PULL:
> (output + buffer) -> CORBA -> input : input.read() goes over corba to
> read data from buffer
>
> When using MQ, this translates to:
> PUSH:
> output -> MQ -> (buffer + input) : all is real-time
> PULL:
> (output + buffer) -> MQ -> (buffer + input) : all is real-time, last
> buffer added de-facto by implementation (see below)
>
> I was wondering two things:
> 1. is it really necessary that the user can specify push/pull ? Won't
> this derive itself from the application architecture ?
> 2. couldnt' the MQ be the buffer/data element (regardless of
> push/pull) in the data flow channel ?

Yes!

> If 1 is true, then 2 is answered as well. To know whether the
> application architecture itself is enough to derive where
> buffering/data storage must take place, we can test all cases:
>
> 1. in-process:
> There is no difference between push and pull. You can specify it, but
> it will result in the same topology
> Conclusion: one buffer in the middle (push nor pull)
> 2. through corba:
> PUSH: the output is punished for a remote client. This is fairly
> unacceptable, unless the remote client is a real-time process itself,

Maybe, maybe not.

> (and output is not). Also, every sample output produced is sent over
> the wire (possible bottleneck). It is still ok if input would do more
> reads than output writes.

Which is impossible to predict.

> PULL: input is punished for listening to remote data, so input can't
> be a real-time process. It is advantageous if output does more writes
> than input does reads.
> Conclusion: in case both sender and receiver are real-time processes,
> neither push nor pull can satisfy the necessary architecture. It would
> be therefore better to install buffers at both ends at all times and
> let the ORB threads do the delivery of the data. So a buffer on each
> side (push and pull)

Agreed.

> 3. through MQ:
> PUSH: the buffer on input side is there for 'corba' legacy issues.
> CORBA had a buffer there, so MQ too. it could possibly be replaced by
> the MQ itself
> PULL: same comment as PUSH, but the added buffer on input side is for
> the message dispatcher which listens to the message queues for new
> data and then needs some place to store that data. That's why we
> always need a buffer at input side.
> Conclusion: the MQ could play as the buffer in the middle (push nor pull)
>
> I'm mixing current implementation with a new design proposal here,
> which might be confusing. The *real* point I needed to make is: should
> the user specify push/pull or can the application always derive the
> correct places to put buffers ? I would say yes, but I might be
> overlooking why Sylvain installed this policy.

My feeling is that you should *not* hide these details nor try to
guess them. Communication channels simply are heterogeneous and have
different properties. So while it might not seem sensible to have
input and output buffers when using an MQ with a configurable size,
somebody might have a good reason to do so. You just can't know. Of
course, if possible, reasonable defaults should be provided.

Markus

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 17:07, Markus Klotzbuecher
<markus [dot] klotzbuecher [..] ...> wrote:
> Hi Peter,
>
> On Tue, Sep 22, 2009 at 03:07:26PM +0200, Peter Soetens wrote:
...
>> The end result is that it is possible to use
>> boost::serialization for real-time marshalling/demarshalling of
>> primitive types and std::vector or complexer (variable size) types.
>
> How will custom types be treated?

You need to use MQTemplateTransport<T> (for POD types) or
MQSerializationTransport<T> for complex data (like std::vector). The
former casts your data type to void*; the latter uses, and requires,
boost::serialization support for your type.

...
>
> My feeling is that you should *not* hide these details nor try to
> guess them. Communication channels simply are heterogeneous and have
> different properties. So while it might not seem sensible to have
> input and output buffers when using a MQ with a configurable size,
> somebody might have a good reason to do so. You just can't know. Of
> course, if possible, reasonable defaults should be provided.

Yes, sensible defaults, but policies can be tweaked. Some policies
will be transport-specific (it seems now that push and pull are
CORBA-specific), but they are always mapped to the lower-level
mechanisms, which you can influence nevertheless if you don't agree
with the default.

Peter

DataFlow 2.0 status and push/pull policy

On Sep 22, 2009, at 09:07 , Peter Soetens wrote:

> Lets start with a status update. As ye'all know, there is no
> rt-middleware available for inter-process communication, except the
> low-level messaging libraries (and even most don't target hard
> real-time). So we decided to write another transport (next to CORBA)
> using message queues for doing real-time data flow ipc. Conveniently,
> mqueues are supported by plain gnulinux and Xenomai, using the same
> API.

I know that Mac OS X doesn't have mqueues. I am guessing Windows is
the same. Will this new mechanism simply not be available on those
platforms, or are we intending to provide some non-real-time mechanism
with the same API?

I notice that you say "data flow" here. Is this only for data ports,
or only for buffer ports, or for both?

Am I correct in saying that the "real-time" guarantee only applies
here when you have real-time OS-level IPC (mqueues) for processes
on the same machine, and/or a real-time communications bus
(e.g. real-time Ethernet) for inter-machine communication? This is the
basic problem you are trying to solve, right?

> One of the tricky parts is that since there's no middleware (unlike
> with CORBA), there's someone needed to do the serialisation to/from
> the queue. I came across the boost::serialization library and it fits
> the purpose very well. It defines serialization algorithms independent
> of 'archiving' algorithms. The former describes how data is
> composed/decomposed in primitive types, the latter takes that
> information and writes it into some format (text, binary, xml). The
> boost::serialization library allows easy extension, but because it
> does a lot bookkeeping (read mallocs) because it also serializes
> pointers, subclasses etc. this was not real-time. The 'hard way' of
> extending this library is by implementing the Archive concept
> independent of the available helper classes. For our simple purpose,
> this was fairly doable, if it wasn't for the wrong documentation or
> implementation of the serialization part. But that's another thread on
> another list. The end result is that it is possible to use
> boost::serialization for real-time marshalling/demarshalling of
> primitive types and std::vector or complexer (variable size) types.

I've used the boost::serialization in the past. It's a good library,
though we weren't worried about the real-time aspects back then.

> Once serialization worked, it was with little effort added as a new
> transport template. One can setup data flow streams using CORBA
> ('out-of-band') or by using the createStream functions on input and
> output port. Also the return values of read are now NoData, NewData
> and OldData. The deployer has not yet been adapted.

So this is an either/or? Either we use non-real-time CORBA, or we use
RTT's custom real-time IPC? Is this alongside Sylvain's upcoming
changes (i.e. the soon-to-be default RTT data flow mechanism)?

> One thing I was struggling with is how large the buffer size of the
> message queue should be. The current implementation creates the
> requested pull (output side) or push (input side) buffer/data object
> (a connection policy) and *in addition* and by definition, the message
> queue is a buffer too. Practically this means that MQ based dataflow
> has always two buffers: the MQ itself and the policy buffer. At least,
> that's what you would think. In real practice, there is always an
> input side buffer, optionally an output side buffer and then the MQ
> buffer. That's because from the moment a message arrives on the MQ, we
> pull it (such that the MQ won't fill up), store it at the input side
> and inform the input port by raising the new data event.
>
> In case you lost me, this is how it works:
> PUSH:
> output -> (buffer + input)
> PULL:
> (output + buffer) -> input

What does the "+" indicate above?

> When using corba, this translates to:
> PUSH:
> output -> CORBA -> (buffer + input) : output.write() goes over corba
> to store data in buffer
> PULL:
> (output + buffer) -> CORBA -> input : input.read() goes over corba to
> read data from buffer
>
> When using MQ, this translates to:
> PUSH:
> output -> MQ -> (buffer + input) : all is real-time
> PULL:
> (output + buffer) -> MQ -> (buffer + input) : all is real-time, last
> buffer added de-facto by implementation (see below)
>
> I was wondering two things:
> 1. is it really necessary that the user can specify push/pull ? Won't
> this derive itself from the application architecture ?
> 2. couldnt' the MQ be the buffer/data element (regardless of
> push/pull) in the data flow channel ?

What is the overhead compared to the existing implementation? Looks
like we have an additional message-queue-based buffer, and maybe one
other buffer as well? Anything else? I'm just thinking of heavily
resource constrained embedded devices.

> If 1 is true, then 2 is answered as well. To know whether the
> application architecture itself is enough to derive where
> buffering/data storage must take place, we can test all cases:
>
> 1. in-process:
> There is no difference between push and pull. You can specify it, but
> it will result in the same topology
> Conclusion: one buffer in the middle (push nor pull)
> 2. through corba:
> PUSH: the output is punished for a remote client. This is fairly
> unacceptable, unless the remote client is a real-time process itself,
> (and output is not). Also, every sample output produced is sent over
> the wire (possible bottleneck). It is still ok if input would do more
> reads than output writes.
> PULL: input is punished for listening to remote data, so input can't
> be a real-time process. It is advantageous if output does more writes
> than input does reads.
> Conclusion: in case both sender and receiver are real-time processes,
> neither push nor pull can satisfy the necessary architecture. It would
> be therefore better to install buffers at both ends at all times and
> let the ORB threads do the delivery of the data. So a buffer on each
> side (push and pull)

If two real-time processes require real-time messaging/data-flow, they
should not be using CORBA. It has dynamic memory allocation
throughout, right? I think the above scenario simply demonstrates a
known axiom.

> 3. through MQ:
> PUSH: the buffer on input side is there for 'corba' legacy issues.
> CORBA had a buffer there, so MQ too. it could possibly be replaced by
> the MQ itself

I agree with Herman. If it's not needed for MQs, then why not just
remove it?

> PULL: same comment as PUSH, but the added buffer on input side is for
> the message dispatcher which listens to the message queues for new
> data and then needs some place to store that data. That's why we
> always need a buffer at input side.
> Conclusion: the MQ could play as the buffer in the middle (push nor
> pull)
>
> I'm mixing current implementation with a new design proposal here,
> which might be confusing. The *real* point I needed to make is: should
> the user specify push/pull or can the application always derive the
> correct places to put buffers ? I would say yes, but I might be
> overlooking why Sylvain installed this policy.

All of this is not clear to me yet. Neither the actual problem you are
trying to solve, nor where it slots in or replaces something in the
existing RTT implementation, nor the actual effects this will have on
application programmers and how they will have to change their
code. I'll keep listening though ... :-)
Stephen

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 14:38, S Roderick <kiwi [dot] net [..] ...> wrote:
> On Sep 22, 2009, at 09:07 , Peter Soetens wrote:
>
>> Lets start with a status update. As ye'all know, there is no
>> rt-middleware available for inter-process communication, except the
>> low-level messaging libraries (and even most don't target hard
>> real-time). So we decided to write another transport (next to CORBA)
>> using message queues for doing real-time data flow ipc. Conveniently,
>> mqueues are supported by plain gnulinux and Xenomai, using the same
>> API.
>
> I know that macosx doesn't have mqueues. I am guessing Windoze is the same.
> Will this new mechanism simply not be available on those platforms, or are
> we intending to provide some non-real-time mechanism with the same API?

They will keep using CORBA to transport their data between processes.
But the MQ transport will serve as an example of how to do this using
different message-passing libs (like ZeroMQ).

>
> I notice that you say "data flow" here. Is this only for data ports, or only
> for buffer ports, or for both?

Data flow is the sending and receiving of data samples between
components. Buffering or not is a connection policy, or even dependent
on the transport used. So when I say data flow, it's about everything.

>
> Am I correct in saying that the "real-time" guarantee only applies here when
> you have a real-time O/S level IPC (mqueues) for processes on the same
> machine, and/or you have a real-time communication's bus (eg real-time
> ethernet) for inter-machine communications? This is the basic problem you
> are trying to solve, right?

Yes. Real-time means an RTOS plus a real-time communication primitive.
With this new system, we can also set up (real-time) UDP<->UDP
connections for data flow quite easily.

>> Once serialization worked, it was with little effort added as a new
>> transport template. One can setup data flow streams using CORBA
>> ('out-of-band') or by using the createStream functions on input and
>> output port. Also the return values of read are now NoData, NewData
>> and OldData. The deployer has not yet been adapted.
>
> So this is an either or? Either we use non-real-time CORBA, OR we use RTT's
> custom real-time IPC? Is this alongside Sylvain's upcoming changes (ie the
> soon-to-be default RTT data flow mechanism)?

This is all built on top of Sylvain's code. You can mix any transport
in any way, so you can have a port communicating locally, over CORBA
and over any other transport transparently. So it's AND.

>
>> One thing I was struggling with is how large the buffer size of the
>> message queue should be. The current implementation creates the
>> requested pull (output side) or push (input side) buffer/data object
>> (a connection policy) and *in addition* and by definition, the message
>> queue is a buffer too. Practically this means that MQ based dataflow
>> has always two buffers: the MQ itself and the policy buffer. At least,
>> that's what you would think. In real practice, there is always an
>> input side buffer, optionally an output side buffer and then the MQ
>> buffer. That's because from the moment a message arrives on the MQ, we
>> pull it (such that the MQ won't fill up), store it at the input side
>> and inform the input port by raising the new data event.
>>
>> In case you lost me, this is how it works:
>> PUSH:
>> output -> (buffer + input)
>> PULL:
>> (output + buffer) -> input
>
> What does the "+" indicate above?

The buffer lives in the same address space as the other element.

>
>> When using corba, this translates to:
>> PUSH:
>> output -> CORBA -> (buffer + input) : output.write() goes over corba
>> to store data in buffer
>> PULL:
>> (output + buffer) -> CORBA -> input : input.read() goes over corba to
>> read data from buffer
>>
>> When using MQ, this translates to:
>> PUSH:
>> output -> MQ -> (buffer + input) : all is real-time
>> PULL:
>> (output + buffer) -> MQ -> (buffer + input) : all is real-time, last
>> buffer added de-facto by implementation (see below)
>>
>> I was wondering two things:
>> 1. is it really necessary that the user can specify push/pull ? Won't
>> this derive itself from the application architecture ?
>> 2. couldnt' the MQ be the buffer/data element (regardless of
>> push/pull) in the data flow channel ?
>
> What is the overhead compared to the existing implementation? Looks like we
> have an additional message-queue-based buffer, and maybe one other buffer as
> well? Anything else? I'm just thinking of heavily resource constrained
> embedded devices.

I'm still aiming for the MQ being the only buffer, but as it looks
now, it's at least 1 MQ (= buffer) + 1 Orocos buffer on the receiving
side.

>
>> If 1 is true, then 2 is answered as well. To know whether the
>> application architecture itself is enough to derive where
>> buffering/data storage must take place, we can test all cases:
>>
>> 1. in-process:
>> There is no difference between push and pull. You can specify it, but
>> it will result in the same topology
>> Conclusion: one buffer in the middle (push nor pull)
>> 2. through corba:
>> PUSH: the output is punished for a remote client. This is fairly
>> unacceptable, unless the remote client is a real-time process itself,
>> (and output is not). Also, every sample output produced is sent over
>> the wire (possible bottleneck). It is still ok if input would do more
>> reads than output writes.
>> PULL: input is punished for listening to remote data, so input can't
>> be a real-time process. It is advantageous if output does more writes
>> than input does reads.
>> Conclusion: in case both sender and receiver are real-time processes,
>> neither push nor pull can satisfy the necessary architecture. It would
>> be therefore better to install buffers at both ends at all times and
>> let the ORB threads do the delivery of the data. So a buffer on each
>> side (push and pull)
>
> If two real-time processes require real-time messaging/data-flow, they
> should not be using CORBA. It has dynamic memory allocation throughout,
> right? I think the above scenario simply demonstrates a known axiom.

Imagine this: you have a controller running on node A. You want to
report its data using a reporter component on node B. With the wrong
data flow setup, you'll force the controller to push the data over the
network to the reporter, breaking an otherwise working real-time
system. CORBA does not avoid this in either (old/new) way.

>
> All of this is not clear to me yet. Not the actual problem you are trying to
> solve, nor where it slots in or replaces something in the existing RTT
> implementation, nor the actual affects this will have on the application
> programmer and how they will have to change their code. I'll keep listening
> though ... :-)

The big question is: which policies do *you* wish to specify when you
set up a connection from port A to B, and what effect do these
policies have on real-time behaviour and on performance/memory
footprint?

Peter

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, S Roderick wrote:

> On Sep 22, 2009, at 09:07 , Peter Soetens wrote:
>
>> Lets start with a status update. As ye'all know, there is no
>> rt-middleware available for inter-process communication, except the
>> low-level messaging libraries (and even most don't target hard
>> real-time). So we decided to write another transport (next to CORBA)
>> using message queues for doing real-time data flow ipc. Conveniently,
>> mqueues are supported by plain gnulinux and Xenomai, using the same
>> API.
>
> I know that macosx doesn't have mqueues. I am guessing Windoze is the
> same. Will this new mechanism simply not be available on those
> platforms, or are we intending to provide some non-real-time mechanism
> with the same API?

Maybe <http://en.wikipedia.org/wiki/Message_Passing_Interface> can provide
some pointers to relevant implementations on various platforms...

Herman

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 10:58, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
> I think you are mixing two things: the introduction of push/pull was mostly
> driven by network bandwidth and marshalling CPU utilization issues. The

Yes, real-time and bandwidth/marshalling are the two forces
influencing the push/pull strategy.

> "keeping things RT" was simply a side effect. In my opinion, it would make
> sense to always have an intermediate buffer at the sending side, which would
> either be introduced "in the background" in the case of push and explicitely
> by the policy in the case of pull.

You proved my point :) We actually always need a buffer at the
receiving side in the case of stream-based (i.e. MQ) connections, to
'catch' the packets that are sent.

>
> As for the size of the MQ, I think it should map the connection policy. If
> data, then MQ of size one and otherwise MQ of the buffer size.

I realized that we don't have an option here. The MQ can't play the
role of buffer, because we need to emit an event for *each* sample
written into the MQ (so we need to empty it to know how many are in
there). At least, that's what the policy is right now. We could also
emit an event only when we go from OldData/NoData to NewData; then the
MQ could be the single buffer. It's tricky to foresee all the
consequences if we would change this, though.

>
> Given that the MQ transport is actually a mode in the CORBA one (i.e. it is
> not a transport in itself, but simply a way to communicate in the CORBA
> transport),

Your assumption is not correct. The MQ transport is independent from
CORBA. The only extra feature CORBA has is that it can create
connections using transports other than its own. I added a 'transport'
field to the connection policy struct for this.

> I think that the best way to implement and configure all of this is
> to have the policy *for the CORBA transport* have a tri-state configuration
> option instead of only a boolean push option:
>  * CORBAPush
>  * CORBAPull
>  * Realtime

If we implemented other transports, they would also have a push/pull
like setup, so the word 'CORBA' can certainly go. But we're closing in
on a solution.

Anyway, we also need to take into account Herman's remarks about how
simple we need to keep the data flow transport, putting all the
advanced stuff in components. See next mail.

Peter

DataFlow 2.0 status and push/pull policy

On Wednesday 23 September 2009 11:59:05 Peter Soetens wrote:
> > "keeping things RT" was simply a side effect. In my opinion, it would
> > make sense to always have an intermediate buffer at the sending side,
> > which would either be introduced "in the background" in the case of push
> > and explicitely by the policy in the case of pull.
>
> You proved my point :) We actually always need a buffer at receiving
> side in case of stream-based (ie MQ) connections to 'catch' the
> packets that are sent.
Well, I'm talking about having one at the sending side, so I don't think I
proved anything.

> > As for the size of the MQ, I think it should map the connection policy.
> > If data, then MQ of size one and otherwise MQ of the buffer size.
>
> I realized that we don't have an option here. the MQ can't play
> buffer, because we need to emit an event for *each* sample written
> into the MQ (so we need to empty it to know how much are in there). At
> least that's what the policy is right now. We could also emit an event
> if we go from OldData/NoData to NewData, then the MQ can be the single
> buffer. It's tricky again to forsee all consequences though if we
> would change this.
I don't know the MQ API, but don't you have a select-like type of call ? Or
even better, something that tells you how many messages are pending ?

> > Given that the MQ transport is actually a mode in the CORBA one (i.e. it
> > is not a transport in itself, but simply a way to communicate in the
> > CORBA transport),
>
> Your assumption is not correct. The MQ transport is independent from
> CORBA. The only feature that CORBA has is that it can create
> connections using different transports (than its own). I added a
> 'transport' policy to the connection policy struct for this.
If I understand you correctly, you want to have CORBA as a transport and MQ as
a "sub-transport" ? Could you be more specific on how you structure all of
this ?

> Anyway, we also need to take into account Herman's remarks about how
> simple we need to keep the data flow transport and put all advanced
> stuff in components. See next mail.
Yes, that is actually part of the "have connection() be simple, then use
transport-specific code" idea, where the transport-specific part can either be
a foreign interface (IDL with CORBA) or separate RTT components. Now, I feel
that the cases we are discussing here should be doable with the current
interface.

Sylvain

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 13:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
> On Wednesday 23 September 2009 11:59:05 Peter Soetens wrote:
>> > "keeping things RT" was simply a side effect. In my opinion, it would
>> > make sense to always have an intermediate buffer at the sending side,
>> > which would either be introduced "in the background" in the case of push
>> > and explicitely by the policy in the case of pull.
>>
>> You proved my point :) We actually always need a buffer at receiving
>> side in case of stream-based (ie MQ) connections to 'catch' the
>> packets that are sent.
> Well, I'm talking about having one at the sending side, so I don't think I
> proved anything.

I know. Never mind.

>
>> > As for the size of the MQ, I think it should map the connection policy.
>> > If data, then MQ of size one and otherwise MQ of the buffer size.
>>
>> I realized that we don't have an option here. the MQ can't play
>> buffer, because we need to emit an event for *each* sample written
>> into the MQ (so we need to empty it to know how much are in there). At
>> least that's what the policy is right now. We could also emit an event
>> if we go from OldData/NoData to NewData, then the MQ can be the single
>> buffer. It's tricky again to forsee all consequences though if we
>> would change this.
> I don't know the MQ API, but don't you have a select-like type of call ? Or
> even better, something that tells you how many messages are pending ?

Once one or more messages are in the mq, select() keeps returning
immediately (I tested this). So we are forced to empty the queue *or*
remove the file descriptor with pending data from the select() set. We
need to re-add it to select() when the queue is empty though (maybe
there's a query for that).

>
>> > Given that the MQ transport is actually a mode in the CORBA one (i.e. it
>> > is not a transport in itself, but simply a way to communicate in the
>> > CORBA transport),
>>
>> Your assumption is not correct. The MQ transport is independent from
>> CORBA. The only feature that CORBA has is that it can create
>> connections using different transports (than its own). I added a
>> 'transport' policy to the connection policy struct for this.
> If I understand you correctly, you want to have CORBA as a transport and MQ as
> a "sub-transport" ? Could you be more specific on how do you structure all of
> this ?

No. It's a transport in its own right. In the policy, you can specify a
transport parameter that will be used in the call to getTransport(
nbr ). If the user specifies the MQ number, mq's will logically be
created and not corba connections. If the transport number is left empty
or explicitly states the corba transport number, corba connections
will be created. It's really fairly simple.

Peter

DataFlow 2.0 status and push/pull policy

On Wednesday 23 September 2009 15:26:08 Peter Soetens wrote:
> > I don't know the MQ API, but don't you have a select-like type of call ?
> > Or even better, something that tells you how many
>
> Once one or more messages are in the mq, select() keeps returning (I
> tested this) immediately. So we are forced to empty the queue *or*
> remove the file descriptor with data in it from select(). We need to
> re-add to select when it is empty though (maybe there's a query for
> that).
Mmmm ... I understand now.

> >> > Given that the MQ transport is actually a mode in the CORBA one (i.e.
> >> > it is not a transport in itself, but simply a way to communicate in
> >> > the CORBA transport),
> >>
> >> Your assumption is not correct. The MQ transport is independent from
> >> CORBA. The only feature that CORBA has is that it can create
> >> connections using different transports (than its own). I added a
> >> 'transport' policy to the connection policy struct for this.
> >
> > If I understand you correctly, you want to have CORBA as a transport and
> > MQ as a "sub-transport" ? Could you be more specific on how do you
> > structure all of this ?
>
> No. It's a worthy transport. In the policy, you can specify a
> transport parameter that will be used in your call to getTransport(
> nbr ). If the user specifies the MQ nbr, logically, mq's will be
> created and not corba connections. If the transport nbr is left empty
> or explicitly states the corba transport number, corba connections
> will be created. It's really fairly simple.
Interesting. But by doing this you actually *need* to have all these
transports be *exactly the same* as the CORBA transport. The proposal of
having transport-specific configurations falls down, no ?

DataFlow 2.0 status and push/pull policy

For what it's worth, Peter and I had already a discussion about making the
CORBA transport neatly RT-friendly, in the sense that it would not make a RT
task non-RT.

The solution I thought was best was to introduce one or more CORBA-
management TaskContexts that would be the middle-man between the user's
components and the CORBA layer itself; in other words, some kind of forwarder.

I feel it can be generically implemented, reuse the current CORBA connection
establishment, and would also provide an entry point to inspect the CORBA
state (i.e. communication status, maybe throughput, lost samples; the kind of
things that are really lacking to do proper multi-robot management).

So, I actually agree with Herman here, that the dataflow should remain simple
and complex policies should be implemented by specific components/modules that
are put in the middle.

In my opinion, the pattern should be along these lines:
* on the C++ side, RTT connection establishment only maintains the stuff we
already have, minus the push/pull, which is a more CORBA-specific thing.
* some transports can add some *equally simple* parameters to create
connections; push/pull comes to mind for CORBA. These parameters should
not reflect complex policies, only simple cases that are very relevant for
the transport.
* finally, complex policies go into specific task contexts in rtt/extras/

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Sylvain Joyeux wrote:

> For what it's worth, Peter and I had already a discussion about making the
> CORBA transport neatly RT-friendly, in the sense that it would not make a RT
> task non-RT.
>
> The solution I though was best was to introduce one or multiple CORBA-
> management TaskContext that would be the middle-man between the user's
> components and the CORBA layer itself, in other words some kind of forwarder.
>
> I feel it can be generically implemented, reuse the current CORBA connection
> establishment and would also provide an entry point to inspect the CORBA state
> (i.e. communication status, maybe throughput, lost samples, this kind of
> things that are really lacking to do proper multi-robot management).
>
> So, I actually agree with Herman here, that the dataflow should remain simple
> and complex policies should be implemented by specific components/modules that
> are put in the middle.
>
> In my opinion, the pattern should be around these lines:
> * the C++-side, RTT connection establishment only maintains the stuff we
> already have, minus the push/pull which is a more CORBA-specific thing.
> * some transports can add some *equally simple* parameters to create
> connections. push/pull comes to my mind for CORBA. These parameters should
> not reflect complex policies, only simple cases that are very relevant for
> the transport.
> * finally, complex policies that go into specific task contexts in rtt/extras/
>
These three "levels" are some sort of "best practice", I think, and I agree
very much with the suggestion.

The same three conceptual levels appear in multiple systems or can be
motivated in a very rational way, so they deserve to be called "best practice" :-)
I am thinking of:
- the three top levels of the ISO stack
- the emerging BRICS component model (BRICS is a European research project
in which the Orocos@Leuven people are heavily involved).
- the traditional surface mail postal system.
The features they have in common, and that define the three levels, are
fully _coordination_ issues:
- lowest level: only asynchronous message passing without any policy or
guarantee. E.g.: the postal system delivers letters, just dropping them
in our mailboxes.
- medium level: the "asynchronous message passing" between two peers is
coordinated, such that they can synchronize their activities on it.
E.g.: the postal system has the "registered mail" concept
<http://en.wikipedia.org/wiki/Registered_mail>
that allows the sender to know that its message was received, but not
more than that.
- highest level: the applications that are sending messages to each other
agree on a protocol that has a specific meaning for both applications.
(But not for the lower levels.)
E.g.: two peers sign contracts and monitor the corresponding compliance,
by exchanging registered mail letters.
(I explicitly used a non-IT example to illustrate that this best practice
is as old as our civilized societies. Or, rather, as old as the period from
which our civilized societies became so civilized that they needed lawyers :-))

Message of this story: Peter will have to come up with very good arguments
_not_ to follow this three-level best practice in peer to peer
communication! :-)

Herman

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 14:39, Herman Bruyninckx
<Herman [dot] Bruyninckx [..] ...> wrote:
> On Wed, 23 Sep 2009, Sylvain Joyeux wrote:
>
>> For what it's worth, Peter and I had already a discussion about making the
>> CORBA transport neatly RT-friendly, in the sense that it would not make a RT
>> task non-RT.
>>
>> The solution I though was best was to introduce one or multiple CORBA-
>> management TaskContext that would be the middle-man between the user's
>> components and the CORBA layer itself, in other words some kind of forwarder.
>>
>> I feel it can be generically implemented, reuse the current CORBA connection
>> establishment and would also provide an entry point to inspect the CORBA state
>> (i.e. communication status, maybe throughput, lost samples, this kind of
>> things that are really lacking to do proper multi-robot management).
>>
>> So, I actually agree with Herman here, that the dataflow should remain simple
>> and complex policies should be implemented by specific components/modules that
>> are put in the middle.
>>
>> In my opinion, the pattern should be around these lines:
>> * the C++-side, RTT connection establishment only maintains the stuff we
>>   already have, minus the push/pull which is a more CORBA-specific thing.
>> * some transports can add some *equally simple* parameters to create
>>    connections. push/pull comes to my mind for CORBA. These parameters should
>>    not reflect complex policies, only simple cases that are very relevant for
>>    the transport.
>> * finally, complex policies that go into specific task contexts in rtt/extras/
>>
> These three "levels" are some sort of "best practice", I think, and I agree
> very much with the suggestion.
>
> The same three conceptual levels appear in multiple systems or can be
> motivated in a very rational way, so they deserve to be called "best practice" :-)
> I am thinking of:
> - the three top levels of the ISO stack
> - the emerging BRICS component model (BRICS is a European research project
>   in which the Orocos@Leuven people are heavily involved).
> - the traditional surface mail postal system.
> The features they have in common, and that define the three levels, are
> fully _coordination_ issues:
> - lowest level: only asynchronous message passing without any policy or
>   guarantee. E.g.: the postal system delivers letters, just dropping them
>   in our mailboxes.
> - medium level: the " asynchronous message passing" between two peers is
>   coordinated, such that they can synchronize their activities on it.
>   E.g.: the postal system has the "registered mail" concept
>    <http://en.wikipedia.org/wiki/Registered_mail>
>   that allows the sender to know that its message was received, but not
>   more than that.
> - highest level: the applications that are sending messages to each other
>   agree on a protocol that has a specific meaning for both applications.
>   (But not for the lower levels.)
>   E.g.: two peers sign contracts and monitor the corresponding compliance,
>   by exchanging registered mail letters.
> (I explicitly used a non-IT example to illustrate that this best practice
> is as old as our civilized societies. Or, rather, as old as the period from
> which our civilized societies became so civilized that they needed lawyers :-))
>
> Message of this story: Peter will have to come up with very good arguments
> _not_ to follow this three-level best practice in peer to peer
> communication! :-)

I, for one, on the other hand, am easily convinced by patches and use
cases from the robotics field.

Peter

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Peter Soetens wrote:

> On Wed, Sep 23, 2009 at 14:39, Herman Bruyninckx
> <Herman [dot] Bruyninckx [..] ...> wrote:
>> On Wed, 23 Sep 2009, Sylvain Joyeux wrote:
>>
>>> For what it's worth, Peter and I had already a discussion about making the
>>> CORBA transport neatly RT-friendly, in the sense that it would not make a RT
>>> task non-RT.
>>>
>>> The solution I though was best was to introduce one or multiple CORBA-
>>> management TaskContext that would be the middle-man between the user's
>>> components and the CORBA layer itself, in other words some kind of forwarder.
>>>
>>> I feel it can be generically implemented, reuse the current CORBA connection
>>> establishment and would also provide an entry point to inspect the CORBA state
>>> (i.e. communication status, maybe throughput, lost samples, this kind of
>>> things that are really lacking to do proper multi-robot management).
>>>
>>> So, I actually agree with Herman here, that the dataflow should remain simple
>>> and complex policies should be implemented by specific components/modules that
>>> are put in the middle.
>>>
>>> In my opinion, the pattern should be around these lines:
>>> * the C++-side, RTT connection establishment only maintains the stuff we
>>>   already have, minus the push/pull which is a more CORBA-specific thing.
>>> * some transports can add some *equally simple* parameters to create
>>>    connections. push/pull comes to my mind for CORBA. These parameters should
>>>    not reflect complex policies, only simple cases that are very relevant for
>>>    the transport.
>>> * finally, complex policies that go into specific task contexts in rtt/extras/
>>>
>> These three "levels" are some sort of "best practice", I think, and I agree
>> very much with the suggestion.
>>
>> The same three conceptual levels appear in multiple systems or can be
>> motivated in a very rational way, so they deserve to be called "best practice" :-)
>> I am thinking of:
>> - the three top levels of the ISO stack
>> - the emerging BRICS component model (BRICS is a European research project
>>   in which the Orocos@Leuven people are heavily involved).
>> - the traditional surface mail postal system.
>> The features they have in common, and that define the three levels, are
>> fully _coordination_ issues:
>> - lowest level: only asynchronous message passing without any policy or
>>   guarantee. E.g.: the postal system delivers letters, just dropping them
>>   in our mailboxes.
>> - medium level: the " asynchronous message passing" between two peers is
>>   coordinated, such that they can synchronize their activities on it.
>>   E.g.: the postal system has the "registered mail" concept
>>    <http://en.wikipedia.org/wiki/Registered_mail>
>>   that allows the sender to know that its message was received, but not
>>   more than that.
>> - highest level: the applications that are sending messages to each other
>>   agree on a protocol that has a specific meaning for both applications.
>>   (But not for the lower levels.)
>>   E.g.: two peers sign contracts and monitor the corresponding compliance,
>>   by exchanging registered mail letters.
>> (I explicitly used a non-IT example to illustrate that this best practice
>> is as old as our civilized societies. Or, rather, as old as the period from
>> which our civilized societies became so civilized that they needed lawyers :-))
>>
>> Message of this story: Peter will have to come up with very good arguments
>> _not_ to follow this three-level best practice in peer to peer
>> communication! :-)
>
> I for one, am on the other hand easily convinced by patches and use
> cases from the robotics field.

I consider current use cases from the robotics domain as non-normative:
robotics is definitely _not_ setting the state of the art in software
development... :-)

Herman

DataFlow 2.0 status and push/pull policy

On Sep 23, 2009, at 08:30 , Sylvain Joyeux wrote:

> For what it's worth, Peter and I had already a discussion about
> making the
> CORBA transport neatly RT-friendly, in the sense that it would not
> make a RT
> task non-RT.

That would be very nice.

> The solution I though was best was to introduce one or multiple CORBA-
> management TaskContext that would be the middle-man between the user's
> components and the CORBA layer itself, in other words some kind of
> forwarder.

We already do this manually, so a generic solution would definitely be
welcome. We route all incoming and outgoing comms from a deployed
process through an HMI component. This decouples the non-RT
communication over CORBA/ethernet from everything going on between the
deployed components. It is labor-intensive, and honestly a bit of a
pain, but it is a requirement right now.

> I feel it can be generically implemented, reuse the current CORBA
> connection
> establishment and would also provide an entry point to inspect the
> CORBA state
> (i.e. communication status, maybe throughput, lost samples, this
> kind of
> things that are really lacking to do proper multi-robot management).
>
> So, I actually agree with Herman here, that the dataflow should
> remain simple
> and complex policies should be implemented by specific components/
> modules that
> are put in the middle.
>
> In my opinion, the pattern should be around these lines:
> * the C++-side, RTT connection establishment only maintains the
> stuff we
> already have, minus the push/pull which is a more CORBA-specific
> thing.

"that we already have" in RTT v1.x or v2.x? Can you explain what part
of either of those implementations is the "push/pull" please.

> * some transports can add some *equally simple* parameters to create
> connections. push/pull comes to my mind for CORBA. These
> parameters should
> not reflect complex policies, only simple cases that are very
> relevant for
> the transport.

Can you give a simple example?

> * finally, complex policies that go into specific task contexts in
> rtt/extras/

Again, a simple example?

I'll also point out Peter's comment that there seems to be a mix here
of current implementation and new design proposal. It is definitely a
little confusing ...

Cheers
Stephen

DataFlow 2.0 status and push/pull policy

On Wednesday 23 September 2009 14:43:30 S Roderick wrote:
> > The solution I though was best was to introduce one or multiple CORBA-
> > management TaskContext that would be the middle-man between the user's
> > components and the CORBA layer itself, in other words some kind of
> > forwarder.
>
> We already do this manually, so a generic solution would definitely be
> welcome. We route all incoming and outgoing comm's from a deployed
> process, through an HMI component. This decouples the non-RT
> communication over CORBA/ethernet, from everything going between the
> deployed components. It is labor intensive, and a bit of a pain in all
> honesty, but is a requirement right now.
It is actually not so hard. Instead of linking A.p to B.p, you have C create a
pair of ports -- something the type system can do right now -- connect A.p to
the new input port, B.p to the new output port, and have the new input port be
a triggering port for C. With a bit of logic in C's updateHook() you are
done. Finally, you garbage-collect every disconnected port in the
updateHook() call.

> > In my opinion, the pattern should be around these lines:
> > * the C++-side, RTT connection establishment only maintains the
> > stuff we
> > already have, minus the push/pull which is a more CORBA-specific
> > thing.
>
> "that we already have" in RTT v1.x or v2.x? Can you explain what part
> of either of those implementations is the "push/pull" please.
What the connection interface allows you to specify right now is:
1. choosing between data and buffered connections
2. the 'init' flag, i.e. keep the last written data and push it to every new
connection (we should probably remove that one actually)
3. the push/pull
4. the locking policy (lock-free or mutex-based)

The only things that are *really* relevant from both the local and remote
connection points of view are 1 and 2. The rest is pretty much dependent on
the actual connection: push/pull is a CORBA thing and the locking policy is
only fully meaningful for local connections.

> > * some transports can add some *equally simple* parameters to create
> > connections. push/pull comes to my mind for CORBA. These
> > parameters should
> > not reflect complex policies, only simple cases that are very
> > relevant for
> > the transport.
>
> Can you give a simple example?
Well ... push/pull for CORBA ?

> > * finally, complex policies that go into specific task contexts in
> > rtt/extras/
>
> Again, a simple example?

Writing a component that ensures that, for component A sending samples to
component B, B gets an update at least every 100ms (i.e. the sample B has is
never older than 100ms). To implement this, you would probably need a feedback
mechanism from B to A, or at least a communication middleware that gives rich
information on the data being passed. I would see it as a generic component
that takes this rich information into account and uses it to meet the
specification.

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Sylvain Joyeux wrote:

> On Wednesday 23 September 2009 08:01:28 MAILER-DAEMON [..] ... wrote:
>>> 3. through MQ:
>>> PUSH: the buffer on input side is there for 'corba' legacy issues.
>>> CORBA had a buffer there, so MQ too. it could possibly be replaced by
>>> the MQ itself
>>
>> I don't like to see "corba legacy" be introduced! It's too hard a
>> precedent...
> And I don't think that there is a need for it anyway. My POV here is the
> following: when you are introducing connection, you actually choose a policy
> for it. That policy has -- obviously -- effects on the actual connection
> behaviour: will you miss data or not ?
>
> Therefore, I don't see why you can't just have the MQ *be* the DataObject and
> *that's all*. No added buffers, because the MQ is realtime already anyway.
>
>>> PULL: same comment as PUSH, but the added buffer on input side is for
>>> the message dispatcher which listens to the message queues for new
>>> data and then needs some place to store that data. That's why we
>>> always need a buffer at input side.
> Same comment, I don't see why.
>
> Sylvain
>
I follow your concerns... The MQ would be the simplest IPC primitive
available, and every more complex buffering/multiplexing/... protocol is to
be provided in customized (Communication) components.

Herman

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 08:01, <MAILER-DAEMON> wrote:
> On Tue, 22 Sep 2009, Peter Soetens wrote:
>
>> Lets start with a status update. As ye'all know, there is no
>> rt-middleware available for inter-process communication, except the
>> low-level messaging libraries (and even most don't target hard
>> real-time).
> I know about the QNX message library, and MIRPA
>  <http://www.rob.cs.tu-bs.de/research/projects/mirpa/>.
> None of them open source.

I was thinking about ZeroMQ. http://www.zeromq.org/

>
>> One thing I was struggling with is how large the buffer size of the
>> message queue should be. The current implementation creates the
>> requested pull (output side) or push (input side) buffer/data object
>> (a connection policy) and *in addition* and by definition, the message
>> queue is a buffer too. Practically this means that MQ based dataflow
>> has always two buffers: the MQ itself and the policy buffer. At least,
>> that's what you would think. In real practice, there is always an
>> input side buffer, optionally an output side buffer and then the MQ
>> buffer. That's because from the moment a message arrives on the MQ, we
>> pull it (such that the MQ won't fill up), store it at the input side
>> and inform the input port by raising the new data event.
>>
>> In case you lost me, this is how it works:
>> PUSH:
>> output -> (buffer + input)
>> PULL:
>> (output + buffer) -> input
>>
>> When using corba, this translates to:
>> PUSH:
>> output -> CORBA -> (buffer + input) : output.write() goes over corba
>> to store data in buffer
>> PULL:
>> (output + buffer) -> CORBA -> input : input.read() goes over corba to
>> read data from buffer
>>
>> When using MQ, this translates to:
>> PUSH:
>> output -> MQ -> (buffer + input) : all is real-time
>> PULL:
>> (output + buffer) -> MQ -> (buffer + input) : all is real-time, last
>> buffer added de-facto by implementation (see below)
>>
>> I was wondering two things:
>> 1. is it really necessary that the user can specify push/pull ? Won't
>> this derive itself from the application architecture ?
>
> How...? I think the architect has to define these things anyway, won't she?
>
>> 2. couldnt' the MQ be the buffer/data element (regardless of
>> push/pull) in the data flow channel ?
>
> I see no direct reason why not! But I haven't spent much time on it.
> Couldn't we find "prior art" in existing non-realtime message passing
> libraries, since this issue is not rt-specific, is it?
>
>> If 1 is true, then 2 is answered as well. To know whether the
>> application architecture itself is enough to derive where
>> buffering/data storage must take place, we can test all cases:
>>
>> 1. in-process:
>> There is no difference between push and pull. You can specify it, but
>> it will result in the same topology
>> Conclusion: one buffer in the middle (push nor pull)
>
> I tend to be in favour of stand-alone DataObject components, to implement
> policies when needed, and to let all other data producing/consuming
> components use the simplest, least guaranteed (wrt buffering) but fastest
> message passing.

I'm in favor of this 'attractor/force' in the design as well. But in
our 'robotics' domain, data flow has some characteristics that we wish
to express in our component model in order to easily model existing
concepts. Burdening the user with modeling higher-level constructs
each time from the lower-level constructs you talk about is not
justified.

I'm not particularly against or for message-based, minimalistic data
flow, but if that is the interface you want to present to users, just
point them to POSIX message queues or zeromq and we're done on this
list. For me, these low-level messages are implementation details: if
they fit the current software design well, they go in; if they don't
fit, they stay out. What should be on the table here is how users can

a. specify which data they send/receive in a component
b. specify how data is routed when components are deployed.

The a. part is indeed very 'message' based: output.write(data) *is*
send-and-forget (it returns void !), and the input indicates: I got a
message or I didn't.
When we're talking about connection policies, buffers on either side,
etc., we're talking about the b. part, and how that is implemented
is/should be hidden from the user. I'm not a fan of using components
for modeling data flow connections. A connection is not a component;
it lives in the middleware and only exists to move a message from a to
b. I *am* a fan of using components to *influence* data flow (drop
packets, dispatch, re-route, etc.).

>
>> 2. through corba:
>> PUSH: the output is punished for a remote client. This is fairly
>> unacceptable, unless the remote client is a real-time process itself,
>> (and output is not). Also, every sample output produced is sent over
>> the wire (possible bottleneck). It is still ok if input would do more
>> reads than output writes.
>> PULL: input is punished for listening to remote data, so input can't
>> be a real-time process. It is advantageous if output does more writes
>> than input does reads.
>> Conclusion: in case both sender and receiver are real-time processes,
>> neither push nor pull can satisfy the necessary architecture.
>
> You make a conceptual error here! Being a realtime process does not
> _necessarily_ mean that one must have realtime guarantees wrt message
> passing! It does mean that the process must do something sensible within
> the alotted time frame. If you start distributing realtime applications,
> you should make sure that (i) your communication hardware is fast enough,
> and _especially_ (ii) your components are robust against communication
> delays. In summary, I do not agree with your Conclusion.

When writing to a port with a corba connection behind it, you have
*no* guarantee of when the function call returns when you read/write
that port, because the CORBA middleware does not (and cannot) offer
it. It has nothing to do with bandwidth or response time. I just
wanted to point out that we need something special when using CORBA so
that reading/writing a port always returns in deterministic time.
Sylvain's push/pull mechanism does not offer this guarantee yet.

>
>> I'm mixing current implementation with a new design proposal here,
>> which might be confusing. The *real* point I needed to make is: should
>> the user specify push/pull or can the application always derive the
>> correct places to put buffers ? I would say yes, but I might be
>> overlooking why Sylvain installed this policy.
>
> I think I do not follow your conclusion... Users can have very good reasons
> to introduce explicit data management components.

I agree completely. But buffering has always been a property of a
data flow connection (even your postal mail example uses buffers in
all kinds of places), so in my opinion buffering is still in.

Peter

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Peter Soetens wrote:

> On Wed, Sep 23, 2009 at 08:01, <MAILER-DAEMON> wrote:
>> On Tue, 22 Sep 2009, Peter Soetens wrote:
>>
>>> Lets start with a status update. As ye'all know, there is no
>>> rt-middleware available for inter-process communication, except the
>>> low-level messaging libraries (and even most don't target hard
>>> real-time).
>> I know about the QNX message library, and MIRPA
>>  <http://www.rob.cs.tu-bs.de/research/projects/mirpa/>.
>> None of them open source.
>
> I was thinking about ZeroMQ. http://www.zeromq.org/

That one has been on my radar for a long time too! So long that I
forgot about it :-) Its LGPL license is a nice match to what we have in
Orocos.
Its "messaging models" webpage
<http://www.zeromq.org/whitepapers:brokerless>
also fits very well into the ongoing discussion about putting all
complex Coordination into customized Communication components.
Another nice thing is that QNX uses ZeroMQ too, to interconnect its own
message passing stuff to the external world. For me, that's a very good
reference! :-)

>>> One thing I was struggling with is how large the buffer size of the
>>> message queue should be. The current implementation creates the
>>> requested pull (output side) or push (input side) buffer/data object
>>> (a connection policy) and *in addition* and by definition, the message
>>> queue is a buffer too. Practically this means that MQ based dataflow
>>> has always two buffers: the MQ itself and the policy buffer. At least,
>>> that's what you would think. In real practice, there is always an
>>> input side buffer, optionally an output side buffer and then the MQ
>>> buffer. That's because from the moment a message arrives on the MQ, we
>>> pull it (such that the MQ won't fill up), store it at the input side
>>> and inform the input port by raising the new data event.
>>>
>>> In case you lost me, this is how it works:
>>> PUSH:
>>> output -> (buffer + input)
>>> PULL:
>>> (output + buffer) -> input
>>>
>>> When using corba, this translates to:
>>> PUSH:
>>> output -> CORBA -> (buffer + input) : output.write() goes over corba
>>> to store data in buffer
>>> PULL:
>>> (output + buffer) -> CORBA -> input : input.read() goes over corba to
>>> read data from buffer
>>>
>>> When using MQ, this translates to:
>>> PUSH:
>>> output -> MQ -> (buffer + input) : all is real-time
>>> PULL:
>>> (output + buffer) -> MQ -> (buffer + input) : all is real-time, last
>>> buffer added de-facto by implementation (see below)
>>>
>>> I was wondering two things:
>>> 1. is it really necessary that the user can specify push/pull ? Won't
>>> this derive itself from the application architecture ?
>>
>> How...? I think the architect has to define these things anyway, won't she?
>>
>>> 2. couldn't the MQ be the buffer/data element (regardless of
>>> push/pull) in the data flow channel ?
>>
>> I see no direct reason why not! But I haven't spent much time on it.
>> Couldn't we find "prior art" in existing non-realtime message passing
>> libraries, since this issue is not rt-specific, is it?
>>
>>> If 1 is true, then 2 is answered as well. To know whether the
>>> application architecture itself is enough to derive where
>>> buffering/data storage must take place, we can test all cases:
>>>
>>> 1. in-process:
>>> There is no difference between push and pull. You can specify it, but
>>> it will result in the same topology
>>> Conclusion: one buffer in the middle (neither push nor pull)
>>
>> I tend to be in favour of stand-alone DataObject components, to implement
>> policies when needed, and to let all other data producing/consuming
>> components use the simplest, least guaranteed (wrt buffering) but fastest
>> message passing.
>
> I'm in favor of this 'attractor/force' in the design as well. But in
> our 'robotics' domain, data flow has some characteristics that we wish
> to express in our component model, in order to easily model existing
> concepts. Burdening the user with modeling these higher-level
> constructs each time out of the lower-level constructs you talk about
> is not justified.

You should know by now that I always consider at least three levels of
users! (Not coincidentally the same three levels as what I advocate to have
in the component + communication models!) The current discussion is
targeted at the lowest level(s) of user, isn't it? That is, the level of
the framework builder who knows about all these tricky details. The medium
level, the system architect, should also know about them, but only as far
as _applying_ them goes. The eventual real end-user, well, (s)he just needs
a GUI! :-)

> I'm not particularly against or for message-based, minimalistic data
> flow, but if that is the interface you want to present to the user, just
> point them to POSIX message queues or ZeroMQ and we're done on this
> list. For me, these low-level messages are implementation details, and
> if they fit the current software design well, they go in; if they
> don't fit, they stay out. What should be on the table here is how
> users can:
>
> a. specify which data they send/receive in a component
> b. specify how data is routed when components are deployed.

I agree with a., not with b. At least, these two things are the
responsibilities of different "users"!

> The a. part is indeed very 'message' based. output.write(data) *is*
> send-and-forget (it returns void!); the input indicates: I got a
> message or I didn't.
> When we're talking about connection policies, buffers on either
> side etc., we're talking about the b. part, and how this is implemented
> is/should be hidden from the user. I'm not a fan of using components
> for modeling data flow connections. A connection is not a component;
> it lives in the middleware and only lives to move a message from a to
> b. I *am* a fan of using components to *influence* data flow (drop
> packets, dispatch, re-route etc.).

We disagree! Communication middleware has lots of components, and not the
lightest ones for that matter...

>>> 2. through corba:
>>> PUSH: the output is punished for a remote client. This is fairly
>>> unacceptable, unless the remote client is a real-time process itself,
>>> (and output is not). Also, every sample output produced is sent over
>>> the wire (possible bottleneck). It is still ok if input would do more
>>> reads than output writes.
>>> PULL: input is punished for listening to remote data, so input can't
>>> be a real-time process. It is advantageous if output does more writes
>>> than input does reads.
>>> Conclusion: in case both sender and receiver are real-time processes,
>>> neither push nor pull can satisfy the necessary architecture.
>>
>> You make a conceptual error here! Being a realtime process does not
>> _necessarily_ mean that one must have realtime guarantees wrt message
>> passing! It does mean that the process must do something sensible within
>> the allotted time frame. If you start distributing realtime applications,
>> you should make sure that (i) your communication hardware is fast enough,
>> and _especially_ (ii) your components are robust against communication
>> delays. In summary, I do not agree with your Conclusion.
>
> When writing to or reading from a port with a CORBA connection behind
> it, you have *no* guarantee about when the function call returns,
> because the CORBA middleware does not (and cannot) offer one. It has
> nothing to do with bandwidth or response time. I just wanted to point
> out that we need something special when using CORBA such that
> reading/writing a port always returns in a deterministic time.
> Sylvain's push/pull mechanism does not offer this guarantee yet.
>
>>
>>> I'm mixing current implementation with a new design proposal here,
>>> which might be confusing. The *real* point I needed to make is: should
>>> the user specify push/pull or can the application always derive the
>>> correct places to put buffers ? I would say yes, but I might be
>>> overlooking why Sylvain installed this policy.
>>
>> I think I do not follow your conclusion... Users can have very good reasons
>> to introduce explicit data management components.
>
> I agree completely. But buffering has always been a property of a
> data flow connection (even your postal mail example uses buffers in
> all kinds of places), so in my opinion buffering is still in.

These local buffers in the postal mail example indeed reflect the fact that
communication middleware _systems_ are full of dedicated _coordination_
components! But their functionality should be in _components_, not in the
(message passing) _library_...

Herman

DataFlow 2.0 status and push/pull policy

On Wed, Sep 23, 2009 at 15:34, Herman Bruyninckx
<Herman [dot] Bruyninckx [..] ...> wrote:
> On Wed, 23 Sep 2009, Peter Soetens wrote:
...
>> I agree completely. But buffering has always been a property of a
>> data flow connection (even your postal mail example uses buffers in
>> all kinds of places), so in my opinion buffering is still in.
>
> These local buffers in the postal mail example indeed reflect the fact that
> communication middleware _systems_ are full of dedicated _coordination_
> components! But their functionality should be in _components_, not in the
> (message passing) _library_...

The point on which we differ, again, is that I see the 'houses' as
components and everything in between them as a middleware library. Yes,
some 'houses' sort, buffer and dispatch mail. But there is still a lot
of buffering in between too (a postman's mail bag, a huge buffer, is
not a component!) and that is all in the middleware library. That's why
I say we're talking about implementation details. The only relevant
conceptual discussion is about the points a. and b. I wrote about
before: what ports look like, and which policies we allow when
connecting them. Some of these policies will have the effect of
buffering in the middleware, just as the Linux kernel buffers its TCP
or UDP packets. It's an implementation detail.
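As an illustration of treating buffer placement as a derived, hidden detail, the push/pull cases listed earlier in this thread can be encoded in a few lines. This is a sketch with made-up names, not RTT code:

```cpp
// Sketch with made-up names (not RTT code): deriving where the buffer
// element of a connection lives from the topology alone, following the
// push/pull cases quoted earlier in this thread.
enum class BufferSide { Middle, InputSide, OutputSide };

BufferSide place_buffer(bool remote, bool pull) {
    // In-process: push and pull collapse to the same topology, one
    // shared element in the middle.
    if (!remote)
        return BufferSide::Middle;
    // Remote pull: (output + buffer) -> input, the reader crosses the
    // wire to fetch. Remote push: output -> (buffer + input), the
    // writer crosses the wire to store.
    return pull ? BufferSide::OutputSide : BufferSide::InputSide;
}
```

If a rule this small covers the cases, the push/pull choice could indeed be derived rather than specified by the user, which is exactly the open question of this thread.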

Peter

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Peter Soetens wrote:

> On Wed, Sep 23, 2009 at 15:34, Herman Bruyninckx
> <Herman [dot] Bruyninckx [..] ...> wrote:
>> On Wed, 23 Sep 2009, Peter Soetens wrote:
> ...
>>> I agree completely. But buffering has always been a property of a
>>> data flow connection (even your postal mail example uses buffers in
>>> all kinds of places), so in my opinion buffering is still in.
>>
>> These local buffers in the postal mail example indeed reflect the fact that
>> communication middleware _systems_ are full of dedicated _coordination_
>> components! But their functionality should be in _components_, not in the
>> (message passing) _library_...
>
> The point on which we differ, again, is that I see the 'houses' as
> components and everything in between them as a middleware library.
> Yes, some 'houses' sort, buffer and dispatch mail. But there is still
> a lot of buffering in between too (a postman's mail bag, a huge
> buffer, is not a component!) and that is all in the middleware library.

That's a good example, but it shows that the postal system is (rightfully
so) not a _library_, but a huge collection of dedicated (and, at least
technically, optimized) components!

> That's why I say we're talking about implementation details. The only
> relevant conceptual discussion is about the points a. and b. I wrote
> about before: what ports look like, and which policies we allow when
> connecting them. Some of these policies will have the effect of
> buffering in the middleware, just as the Linux kernel buffers its TCP
> or UDP packets. It's an implementation detail.

No, not at all! It's a fundamental decoupled design issue... This is one of
the domains where Orocos can set a standard, but you don't seem motivated
to make that happen :-)

Herman

DataFlow 2.0 status and push/pull policy

On Wednesday 23 September 2009 15:34:25 Herman Bruyninckx wrote:
> > The a. part is indeed very 'message' based. output.write(data) *is*
> > send-and-forget (it returns void!); the input indicates: I got a
> > message or I didn't.
> > When we're talking about connection policies, buffers on either
> > side etc., we're talking about the b. part, and how this is implemented
> > is/should be hidden from the user. I'm not a fan of using components
> > for modeling data flow connections. A connection is not a component;
> > it lives in the middleware and only lives to move a message from a to
> > b. I *am* a fan of using components to *influence* data flow (drop
> > packets, dispatch, re-route etc.).
>
> We disagree! Communication middleware has lots of components, and not the
> lightest ones for that matter...

I agree with Herman on this. Components should definitely not influence
data flow as Peter is describing it, as they should have no clue about
where the data they are receiving comes from, where their outputs are
going, or what the requirements of the receiving components are, ...

The only thing that would make sense would be to have, on the specification
side, components saying that they need "one sample per period", and then have
middlewares that implement that specification. But we're far from it yet, and I
personally feel it is a deployment-time property.
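As a purely hypothetical illustration of such a deployment-time derivation (no API like this exists in RTT or any middleware discussed here), a deployment tool could turn a "one sample per period" requirement into a concrete buffer depth from the two components' periods:

```cpp
// Purely hypothetical helper, only to illustrate "one sample per
// period" as a derived, deployment-time property; no such API exists
// in RTT or the middlewares discussed in this thread. Given the
// producer's and consumer's periods in microseconds, it returns how
// deep a FIFO must be to hold every sample produced during one
// consumer period.
int min_buffer_depth(int producer_period_us, int consumer_period_us) {
    // ceil(consumer_period / producer_period), in integer arithmetic.
    return (consumer_period_us + producer_period_us - 1)
           / producer_period_us;
}
```

Whether such a number is computed by a tool or chosen by the system architect is precisely the specification-versus-deployment question raised above.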

DataFlow 2.0 status and push/pull policy

On Wed, 23 Sep 2009, Sylvain Joyeux wrote:

> On Wednesday 23 September 2009 15:34:25 Herman Bruyninckx wrote:
>>> The a. part is indeed very 'message' based. output.write(data) *is*
>>> send-and-forget (it returns void!); the input indicates: I got a
>>> message or I didn't.
>>> When we're talking about connection policies, buffers on either
>>> side etc., we're talking about the b. part, and how this is implemented
>>> is/should be hidden from the user. I'm not a fan of using components
>>> for modeling data flow connections. A connection is not a component;
>>> it lives in the middleware and only lives to move a message from a to
>>> b. I *am* a fan of using components to *influence* data flow (drop
>>> packets, dispatch, re-route etc.).
>>
>> We disagree! Communication middleware has lots of components, and not the
>> lightest ones for that matter...
>
> I agree with Herman on this. Components should definitely not influence
> data flow as Peter is describing it, as they should have no clue about
> where the data they are receiving comes from, where their outputs are
> going, or what the requirements of the receiving components are, ...
>
> The only thing that would make sense would be to have, on the specification
> side, components saying that they need "one sample per period", and then have
> middlewares that implement that specification. But we're far from it yet, and I
> personally feel it is a deployment-time property.

It is, indeed. But the infrastructure has to be there, independent of
whether it is going to be used at deployment-time or not...

I feel Peter's pain... :-(

Herman