Optional dataflow/services in deployer and managing disconnection/reconnection

Hello,

we are trying to implement a connection manager on top of RTT to cleanly
handle disconnection/reconnection of ports and services (operations). Here
are the requirements:

- Detect disconnection of ports and operations, which can arise in one of
  these use cases:
    1. Peer not present (connection fails at startup/deployment)
    2. Peer just "quits" at runtime (application killed or cleanly closed)
    3. Network connection problem (timeout)
- When a disconnection is detected:
    1. Disconnect all ports/services related to this peer
    2. Automatically trigger a reconnection attempt (in another thread)
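The second reaction above (an automatic reconnection attempt in another thread) can be made concrete with a retry loop. A minimal sketch in plain C++, assuming the caller supplies an `attempt` callable that tries to re-create one connection and returns `true` on success; `retryWithBackoff`, `attempt` and `sleeper` are hypothetical names, not RTT API. The wait between attempts doubles up to a cap, and the sleep is injected so the policy can be tested without actually waiting.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

using Millis = std::chrono::milliseconds;

// Hypothetical retry helper (not RTT API): calls attempt() until it
// succeeds or maxAttempts tries have been made, waiting between tries.
// The wait starts at 'initial' and doubles after each failure, up to 'cap'.
bool retryWithBackoff(const std::function<bool()>& attempt, int maxAttempts,
                      Millis initial, Millis cap,
                      const std::function<void(Millis)>& sleeper =
                          [](Millis d) { std::this_thread::sleep_for(d); }) {
    assert(maxAttempts > 0);
    Millis delay = initial;
    for (int n = 1; n <= maxAttempts; ++n) {
        if (attempt()) {
            return true;                   // connection re-established
        }
        if (n < maxAttempts) {
            sleeper(delay);                // back off before the next try
            delay = std::min(delay * 2, cap);
        }
    }
    return false;                          // give up; the caller decides what next
}
```

In a supervisor, the call would run off the component's own thread, for example in a `std::thread` started when a peer is reported lost, so an absent peer never blocks the control loop.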

We are currently looking for a good design to implement that. So far, we
were able to do it quite easily because we weren't using the deployer
(e.g. in a remote GUI): we checked ready() and wrapped all remote operation
calls in try/catch. We are, however, looking for a more generic design that
would also be usable in deployed applications. We are currently facing some
problems:

When deploying an application with remote port connections, the
deployer kick-start fails. We would like to be able to have "optional"
connections and manage the connection retries in another thread while the
components are running. Also, the port has no memory (tell me if I'm wrong)
of its remote peer.port, so we can't handle the connection later on. The
same applies to remote operations/services.
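Keeping the connection description after deployment addresses both points above: optional connections and the missing memory of the remote peer.port. A minimal sketch, assuming a caller-supplied `connect` callable that stands in for the real connection call; `ConnectionSpec`, `DeployResult` and `connectAll` are hypothetical and not part of the DeploymentComponent. A failed required connection aborts deployment, while a failed optional one is kept, peer and port names included, in a deferred list that a retry thread can process later.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of one connection, kept after deployment
// so that it can be retried later.
struct ConnectionSpec {
    std::string from;   // e.g. "Ctrl.cmd"
    std::string to;     // e.g. "Robot.cmd"
    bool optional;      // true: a failure must not abort deployment
};

struct DeployResult {
    bool ok = true;                        // false if a required connection failed
    std::vector<ConnectionSpec> deferred;  // optional connections to retry later
};

DeployResult connectAll(const std::vector<ConnectionSpec>& specs,
                        const std::function<bool(const ConnectionSpec&)>& connect) {
    DeployResult result;
    for (const auto& spec : specs) {
        if (connect(spec)) {
            continue;                          // connected, nothing to remember
        }
        if (spec.optional) {
            result.deferred.push_back(spec);   // keep it for the retry thread
        } else {
            result.ok = false;                 // required peer missing: fail
            return result;
        }
    }
    return result;
}
```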

To handle that, our first idea was to modify the DeploymentComponent to
store the connection configuration through an external plugin that would
manage disconnection/reconnection in its own thread. This plugin could
periodically "ping" all peers (e.g. with getTaskState()) to detect
disconnection, but could also be notified by components when they detect a
connection failure (e.g. a CORBA exception). I'm not really sure that this is
the best design, and I'm also worried about the thread safety of managing
connections in another thread while a component is running and possibly
interacting with these operations/ports.
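The plugin idea (a periodic ping plus notifications from components) maps onto a monitor thread guarded by a mutex and a condition variable. A minimal sketch, assuming `ping` wraps a remote call such as getTaskState() in a try/catch and returns `false` on failure; `PeerMonitor`, `reportFailure` and `onLost` are hypothetical names. Components only record a peer name under a lock and return immediately, so they never touch connections from their own thread; the actual reconnection work is left to whatever `onLost` triggers.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical peer monitor (not RTT API).
class PeerMonitor {
public:
    PeerMonitor(std::vector<std::string> peers,
                std::function<bool(const std::string&)> ping,
                std::function<void(const std::string&)> onLost)
        : peers_(std::move(peers)), ping_(std::move(ping)), onLost_(std::move(onLost)) {}
    ~PeerMonitor() { stop(); }

    // Safe to call from any component thread, e.g. after catching an exception.
    void reportFailure(const std::string& peer) {
        std::lock_guard<std::mutex> lock(mutex_);
        reported_.insert(peer);
        wake_.notify_one();
    }

    // One supervision round: collect reports, ping the other peers, then
    // call onLost for every peer found lost (without holding the lock).
    void checkOnce() {
        std::set<std::string> lost;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            lost.swap(reported_);
        }
        for (const auto& p : peers_) {
            if (!lost.count(p) && !ping_(p)) lost.insert(p);
        }
        for (const auto& p : lost) onLost_(p);
    }

    // Runs checkOnce() every 'period', or earlier when a failure is reported.
    void start(std::chrono::milliseconds period) {
        running_ = true;
        worker_ = std::thread([this, period] {
            std::unique_lock<std::mutex> lock(mutex_);
            while (running_) {
                wake_.wait_for(lock, period, [this] { return !running_ || !reported_.empty(); });
                if (!running_) break;
                lock.unlock();
                checkOnce();
                lock.lock();
            }
        });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        wake_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    std::vector<std::string> peers_;
    std::function<bool(const std::string&)> ping_;
    std::function<void(const std::string&)> onLost_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::set<std::string> reported_;
    bool running_ = false;
    std::thread worker_;
};
```

Exposing `checkOnce()` separately from the thread keeps the detection logic testable without timing assumptions.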

I would greatly appreciate any comments/suggestions about the design to
implement this in RTT-2.

Thank you,

Philippe Hamelin

Optional dataflow/services in deployer and managing disconnection/reconnection

Hi Philippe,

On Thursday 18 November 2010 16:51:34 Philippe Hamelin wrote:
> Hello,
>
> we are trying to implement a connection manager on top of RTT to cleanly
> handle disconnection/reconnection of ports and services (operations). Here
> are the requirements:

You're tackling a problem here that everyone is aware of. The general idea was
not to extend the DeploymentComponent further, but to let it act more as a
'dumb slave' and let a supervision component monitor all activity and tell the
DC which actions to take. I am against extending the DC even more than it
already is.

An example: Markus' Lua work uses the DC as a dumb slave to do the
classical port/peer connections and configurations. RTT scripting will have
similar capabilities in 2.2, but with less expressiveness than Lua. The
logic you describe could best be expressed in such a flexible scripting
language. Note that Sylvain's Ruby supervision framework also contains this
functionality, although it has been built on top of orogen and might require
that all typekits are generated by orogen/typegen (Sylvain can best comment on
the requirements).

I would rather extend the DC to provide the necessary hooks so that a
supervision program can use them to do what it does best.

The first missing hook is a notification mechanism signalling that a
component, port connection or service has appeared or disappeared. With that
information, you could really start providing simple rules for reconnection
and discovery.
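The notification hook described above could take the shape of a simple observer. A minimal sketch of what such a hook might look like; `DeploymentEvents`, `DeploymentEvent`, `Entity` and `Change` are hypothetical and do not exist in the DeploymentComponent. The deployer would publish an event whenever a component, port connection or service comes or goes, and a supervisor would subscribe and apply its reconnection rules.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical event types for a deployer notification hook.
enum class Entity { Component, PortConnection, Service };
enum class Change { Appeared, Disappeared };

struct DeploymentEvent {
    Entity entity;
    Change change;
    std::string name;   // e.g. "Robot" or "Ctrl.cmd->Robot.cmd"
};

class DeploymentEvents {
public:
    using Listener = std::function<void(const DeploymentEvent&)>;

    void subscribe(Listener l) {
        std::lock_guard<std::mutex> lock(mutex_);
        listeners_.push_back(std::move(l));
    }

    // Called by the deployer whenever something it manages comes or goes.
    void publish(const DeploymentEvent& e) {
        std::vector<Listener> copy;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            copy = listeners_;          // notify outside the lock
        }
        for (const auto& l : copy) l(e);
    }

private:
    std::mutex mutex_;
    std::vector<Listener> listeners_;
};
```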

Peter


Optional dataflow/services in deployer and managing disconnection/reconnection

Hi Peter,

Thank you for these interesting comments. As far as I can see, many people
have tried to implement something to fit their own connection management
needs. I understand the idea behind the *supervision component*, and
the plugin that I proposed was aiming at that. However, given that this
plugin would need a thread anyway, I agree that it would be better to start
from a TaskContext and implement it as an Orocos component. I don't
fully understand the design that you have in mind, though, so please let me
ask you some more questions.

- One thing that I can't see in your proposal is where the connection
data is stored. Currently, this information seems somewhat volatile, and we
will obviously need to maintain a database of the various connections to
supervise them and handle reconnections.

- I agree that the DeploymentComponent must act as a "dumb slave". My
proposal was not to make the DC handle the reconnection, but only to make it
"push" the connection configuration (coming from the XML) to
another component (e.g. a Supervision component) that will manage the
monitoring/reconnection. The idea behind that is to make the dependency on
the DC "optional", so that someone who makes connections in C++ could also
"manually" push this information to the Supervision component. Does that
make sense to you?

- I do not agree with implementing this application-wide feature on top of the
scripting/Lua service. Embedded systems, which clearly need this
"connection robustness", probably can't afford such a big dependency. I
think that using a Supervision component (possibly with the option to add
hooks in Lua/scripting) is more appropriate.

- In any case, when using the DeploymentComponent, I think that we will need
the possibility to add optional peers/ports/services. What do you think about
that?

- Finally, is there an example implementation (Lua, Ruby, ...) that I could
look at?
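The storage and "push" questions above fit together if the supervisor owns the connection rules and anything that knows about connections (an XML-reading deployer, or hand-written C++ setup code) pushes them through a small interface. A minimal sketch; `ConnectionRule`, `ConnectionRuleSink` and `Supervisor` are hypothetical names, not existing Orocos classes. `rulesFor()` returns every rule touching a given peer, i.e. what would be torn down and re-created when that peer disappears and comes back.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical connection rule, as a supervisor might store it.
struct ConnectionRule {
    std::string fromPeer, fromPort;
    std::string toPeer, toPort;
    bool optional;
};

// Anything that can receive connection rules. The supervisor implements it;
// an XML-reading deployer or manual C++ setup code calls it.
class ConnectionRuleSink {
public:
    virtual ~ConnectionRuleSink() = default;
    virtual void addRule(const ConnectionRule& rule) = 0;
};

class Supervisor : public ConnectionRuleSink {
public:
    void addRule(const ConnectionRule& rule) override {
        std::lock_guard<std::mutex> lock(mutex_);
        rules_.push_back(rule);
    }

    // All rules that involve 'peer' on either side.
    std::vector<ConnectionRule> rulesFor(const std::string& peer) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<ConnectionRule> out;
        for (const auto& r : rules_) {
            if (r.fromPeer == peer || r.toPeer == peer) out.push_back(r);
        }
        return out;
    }

private:
    mutable std::mutex mutex_;
    std::vector<ConnectionRule> rules_;
};
```

Because producers only see `ConnectionRuleSink`, the dependency on the deployer stays optional, which is the decoupling asked for in the second question.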

Thank you,

Philippe


Optional dataflow/services in deployer and managing disconnection/reconnection

On Friday 19 November 2010 14:04:23 Philippe Hamelin wrote:
> Hi Peter,
>
> Thank you for these interesting comments. As far as I can see, many people
> have tried to implement something to fit their own connection management
> needs. I understand the idea behind the *supervision component*, and
> the plugin that I proposed was aiming at that. However, given that this
> plugin would need a thread anyway, I agree that it would be better to start
> from a TaskContext and implement it as an Orocos component. I don't
> fully understand the design that you have in mind, though, so please
> let me ask you some more questions.
>
> - One thing that I can't see in your proposal is where the connection
> data is stored. Currently, this information seems somewhat volatile, and we
> will obviously need to maintain a database of the various connections to
> supervise them and handle reconnections.

The supervisor has this data. The problem we're facing today is that the
deployer is storing the data in order to be able to do 'phased' deployment
(first configure all, then start all etc). All such capabilities should be in a
supervisor, which stores the connection data as a set of 'rules', no matter
how that is concretely implemented.
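The 'phased' deployment mentioned above (configure everything first, then start everything) is simple enough to live in a supervisor. A minimal sketch, assuming each component is represented by two callables standing in for its configure and start calls; `Managed` and `phasedDeploy` are hypothetical names. Nothing is started unless every component configured successfully.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical handle on a deployed component; configure/start stand in
// for the calls a supervisor would make on the real component.
struct Managed {
    std::string name;
    std::function<bool()> configure;
    std::function<bool()> start;
};

// Phase 1 configures every component; phase 2 starts them only if phase 1
// fully succeeded, so no component runs next to an unconfigured peer.
bool phasedDeploy(const std::vector<Managed>& comps) {
    for (const auto& c : comps) {
        if (!c.configure()) return false;
    }
    for (const auto& c : comps) {
        if (!c.start()) return false;
    }
    return true;
}
```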

>
> - I agree that the DeploymentComponent must act as a "dumb slave". My
> proposal was not to make the DC handle the reconnection, but only to make
> it "push" the connection configuration (coming from the XML) to
> another component (e.g. a Supervision component) that will manage the
> monitoring/reconnection. The idea behind that is to make the dependency on
> the DC "optional", so that someone who makes connections in C++ could also
> "manually" push this information to the Supervision component. Does that
> make sense to you?

It makes sense to decouple software components if they are to be used
independently as well. If you want to use the DC as an XML reader that pushes
to Supervision, I can live with that if we come up with the proper interface.
But since I am reluctant to extend the current XML format even further, you'd
have to consider whether there are better or easier alternatives, for example
a new XML format better suited for describing such things.

>
> - I do not agree with implementing this application-wide feature on top of the
> scripting/Lua service. Embedded systems, which clearly need
> this "connection robustness", probably can't afford such a big
> dependency. I think that using a Supervision component (possibly with the
> option to add hooks in Lua/scripting) is more appropriate.

So you wouldn't consider the Roby supervision framework either (see below)?

We generally believe that hierarchical state machines are the most suitable
way to define deployment supervision. They take 'rule based' to a higher
level, since they allow you to express your rules (= transitions in the SM) in
a state-specific context. That is, rules are bound to the state of your system
and are *not* global.
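The point that rules are bound to states rather than global can be shown with a small hierarchical state machine: an event is looked up in the current state first and then in its ancestors, so a rule only fires in the state (or sub-states) where it was declared. This is a minimal sketch with hypothetical names (`Hsm`, `addRule`, `setParent`); it is not the API of RTT scripting, the Lua work or Roby.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Minimal hierarchical state machine sketch (hypothetical).
class Hsm {
public:
    explicit Hsm(std::string initial) : current_(std::move(initial)) {}

    void setParent(const std::string& child, const std::string& parent) {
        parent_[child] = parent;
    }
    void addRule(const std::string& state, const std::string& event,
                 const std::string& target) {
        rules_[std::make_pair(state, event)] = target;
    }

    // Looks for a rule in the current state, then in each ancestor.
    // Returns true if a rule fired.
    bool handle(const std::string& event) {
        for (std::string s = current_; !s.empty(); s = parentOf(s)) {
            auto it = rules_.find(std::make_pair(s, event));
            if (it != rules_.end()) {
                current_ = it->second;
                return true;
            }
        }
        return false;   // no rule for this event in the current context
    }

    const std::string& state() const { return current_; }

private:
    std::string parentOf(const std::string& s) const {
        auto it = parent_.find(s);
        return it == parent_.end() ? std::string() : it->second;
    }

    std::string current_;
    std::map<std::string, std::string> parent_;
    std::map<std::pair<std::string, std::string>, std::string> rules_;
};
```

For example, with `Nominal` and `Degraded` as children of `Running`, a `shutdown` rule declared once on `Running` applies in both sub-states, while a `peer_lost` rule declared on `Nominal` is simply ignored once the system is `Stopped`.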

Lua is extremely minimal (at least compared to Orocos scripting), and used
often in embedded systems. So I see no arguments against using Lua.

>
> - In any case, when using the DeploymentComponent, I think that we will
> need the possibility to add optional peers/ports/services. What do you
> think about that?

I agree, but we don't implement this in the DC.

>
> - Finally, is there an example implementation (Lua, Ruby, ...) that I could
> look at?

For Roby, you can find introductory examples for running applications at:
http://www.orocos.org/dfki/runtime/index.html
and more advanced ones at
http://www.orocos.org/dfki/system/index.html

Roby depends on orogen model descriptions of your components and deployments
(one deployment = one process binary). I don't know how well this works on
Windows, though. I also couldn't immediately find how it handles reconnecting
processes if only one process dies, or how one could define fall-back
scenarios, except by coding them explicitly in Ruby functions that manipulate
the task context objects.

It does handle automatic data flow configuration, i.e., calculating buffer
sizes depending on the component and deployment descriptions.

I could only find the Lua manual in Markus' git repository:

http://gitorious.org/~markusk/orocos-toolchain/ocl-mk/blobs/raw/lua/lua/doc/MANUAL.html

Markus, could you update the OCL CMake files so that this file is installed in
build/doc/xml/lua, where it will be picked up by my documentation release
scripts?

Peter

Optional dataflow/services in deployer and managing disconnection/reconnection

2010/11/22 Peter Soetens <peter [..] ...>

> On Friday 19 November 2010 14:04:23 Philippe Hamelin wrote:
> > - I do not agree with implementing this application-wide feature on top
> > of the scripting/Lua service. Embedded systems, which clearly need
> > this "connection robustness", probably can't afford such a big
> > dependency. I think that using a Supervision component (possibly with the
> > option to add hooks in Lua/scripting) is more appropriate.
>
> So you wouldn't consider the Roby supervision framework either (see below)?
>
> We generally believe that hierarchical state machines are the most suitable
> way to define deployment supervision. They take 'rule based' to a higher
> level, since they allow you to express your rules (= transitions in the SM)
> in a state-specific context. That is, rules are bound to the state of your
> system and are *not* global.
>
>
In fact, it's not so much a question of size, but rather of the amount of
work needed to maintain an additional dependency in an embedded system used in
an industrial product. There must be a really significant gain to justify
adding such a dependency. However, I really see the advantage of using
a more flexible framework to manage deployment/supervision. I must admit
that until now I hadn't looked much at the Ruby supervision framework. At
first glance, one thing that I can't live with is the fact that the
supervision framework handles all the deployment of the components.
It's a very complete solution for pure Orocos applications, but what about
external clients (e.g. a GUI)? Let's say I have a standalone GUI process that
uses a single Orocos component to communicate with different Orocos peers. I
don't want the supervision framework to load my GUI, but I want my GUI to
make use of it (indirectly). If it did load it, I would have to wrap my GUI in
an Orocos component, which is a major drawback if I want my client to support
alternative non-Orocos peers. As I see it now, this is rather a question of
decoupling deployment and supervision. It may work if I can
"create" my component in C++ and then let the Ruby supervision framework
do the rest of the deployment and the supervision. Would this be equivalent
to using the Ruby supervision framework as a third party connecting "external"
CORBA-only components? For non-pure Orocos applications, I also like the fact
that the connection data isn't hardcoded in C++. This also makes configuring
deployment alternatives easier (e.g. actual vs. simulated modes).
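Keeping the connection data out of the C++ code, as suggested above, turns switching between deployment alternatives into a configuration change. A minimal sketch using a hypothetical plain-text table with one `<mode> <source> <destination>` rule per line; neither the format nor `connectionsForMode` corresponds to an existing Orocos or Roby file format.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plain-text connection table, one rule per line:
//   <mode> <source peer.port> <destination peer.port>
// Lines for other modes (or malformed lines) are skipped.
std::vector<std::pair<std::string, std::string>>
connectionsForMode(const std::string& table, const std::string& mode) {
    std::vector<std::pair<std::string, std::string>> out;
    std::istringstream in(table);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string m, from, to;
        if (fields >> m >> from >> to && m == mode) {
            out.emplace_back(from, to);
        }
    }
    return out;
}
```

The same table can then drive either the real robot or a simulator, depending only on the mode string passed at startup.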

> Lua is extremely minimal (at least compared to Orocos scripting), and used
> often in embedded systems. So I see no arguments against using Lua.
>
>
The same arguments hold for Lua. However, as you stated, Lua seems to be more
widely used in, and better suited to, embedded systems. As far as I can see
from the documentation of the Lua bindings, although they have all the services
required to implement the supervision, they lack the connection information
that the Ruby supervision framework has, since the latter deploys everything.

> >
> > - Finally, is there somewhere that I can find an example implementation
> > (Lua, ruby, ...) ?
>
> For Roby, you can find introductory examples for running applications at:
> http://www.orocos.org/dfki/runtime/index.html
> and more advanced ones at
> http://www.orocos.org/dfki/system/index.html
>
> Roby depends on orogen model descriptions of your components and
> deployments (one deployment = one process binary). I don't know how well
> this works on Windows, though. I also couldn't immediately find how it
> handles reconnecting processes if only one process dies, or how one could
> define fall-back scenarios, except by coding them explicitly in Ruby
> functions that manipulate the task context objects.
>
>
I can't find it either.


Optional dataflow/services in deployer and managing disconnection/reconnection

On Fri, 19 Nov 2010, Philippe Hamelin wrote:

> Hi Peter,
> Thank you for these interesting comments. As far as I can see, many people
> have tried to implement something to fit their own connection management
> needs. I understand the idea behind the *supervision component*, and the
> plugin that I proposed was aiming at that. However, given that this plugin
> would need a thread anyway, I agree that it would be better to start from a
> TaskContext and implement it as an Orocos component. I don't fully
> understand the design that you have in mind, though, so please let me ask
> you some more questions.
>
> - One thing that I can't see in your proposal is where the connection data
> is stored. Currently, this information seems somewhat volatile, and we will
> obviously need to maintain a database of the various connections to
> supervise them and handle reconnections.

You are right about all these things. And that means that such dynamic
(re)deployment is an art in itself, and deserves professionally made
deployment frameworks. I would not like to see Orocos effort being spent on
this, since that will lead to (i) less effort being available for the core
Orocos work, and (ii) a sub-par result anyway.

The way to go is to look for external deployment frameworks (OSGi etc.) and
to try to use those with Orocos components.

A similar remark holds for the communication between components: except
for the collocated, in-process, hard realtime communication between
components, support for inter-component communication should come
from decent communication frameworks (DDS, CERTI, LabComm, ...), instead of
trying to extend Orocos to also do "middleware".

Herman


--
K.U.Leuven, Mechanical Eng., Mechatronics & Robotics Research Group
<http://people.mech.kuleuven.be/~bruyninc> Tel: +32 16 328056
EURON Coordinator (European Robotics Research Network) <http://www.euron.org>
Open Realtime Control Services <http://www.orocos.org>
Associate Editor JOSER <http://www.joser.org>, IJRR <http://www.ijrr.org>

Optional dataflow/services in deployer and managing disconnection/reconnection

I forgot to add one more fundamental question: is it possible to connect
ports/operations of components while they are running? I found that the
ready() method of ServiceRequester doesn't seem to be thread-safe, so I
don't see how reconnection could be managed at runtime.
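Whether RTT's ready() is thread-safe is not settled in this thread. Independently of that, a common way to change a connection while its user keeps running is to publish it through a shared pointer: the component takes a snapshot once per cycle and works only with that snapshot, while the supervisor swaps in a new one. A minimal sketch in standard C++; `Endpoint` and `EndpointSlot` are hypothetical stand-ins, not RTT types. Because it takes a mutex, this sketch suits non-real-time threads; a hard real-time component may need a lock-free variant.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a remote endpoint (a connected port or an
// operation proxy).
struct Endpoint {
    std::string address;
};

// The supervisor replaces the endpoint; the component takes a snapshot at
// the start of each cycle. A replaced endpoint stays alive until the last
// snapshot referring to it is released.
class EndpointSlot {
public:
    std::shared_ptr<const Endpoint> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;               // null means "not connected"
    }

    void replace(std::shared_ptr<const Endpoint> next) {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = std::move(next);    // pass nullptr to mark a disconnection
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Endpoint> current_;
};
```

The component checks the snapshot for null (the role ready() plays today) and never observes a connection changing halfway through a cycle.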

2010/11/19 Philippe Hamelin <philippe [dot] hamelin [..] ...>

> Hi Peter,
>
> thank you Peter for these interesting comments. As far as I see many people
> has tried to implement something to fit their needs about the connections
> management. I understand the idea behing the *supervision component*, and
> the plugin that I proposed was aiming to that. However, given that this
> plugin would anyway need a thread, I agree that it would be better to start
> from a TaskContext and implement it as an Orocos component. I actually don't
> fully understand the design that you are thinking about, so please let me
> ask you some more questions.
>
> - One thing that I can't see in your proposition is where the connections
> data is stored? Currently, this information seems somewhat volatile and we
> will obviously need to maintain a database about the various connections to
> ensure their supervision or handle the reconnections.
>
> - I agree that the DeploymentComponent must act as a "dumb slave". My
> proposal was not to make the DC handle the reconnection, but only to make it
> "push" the connection configuration (coming from the XML) to
> another component (e.g. a Supervision component) that will manage the
> monitoring/reconnection. The idea behind that is to make the dependency on
> the DC "optional", so that someone who uses C++ connections could also
> "manually" push this information to the Supervision component. Does that
> make sense to you?
>
> - I do not agree with implementing this application-wide feature on top of
> the scripting/Lua service. Embedded systems, which are the ones that clearly
> need this "connection robustness", probably can't afford such a big
> dependency. I think that using a Supervision component (maybe with the
> option to add hooks in Lua/scripting) is more appropriate.
>
> - In any case, when using the DeploymentComponent, I think that we will
> need the possibility to add optional peers/ports/services. What do you think
> about that?
>
> - Finally, is there somewhere I can find an example implementation
> (Lua, Ruby, ...)?
>
> Thank you,
>
> Philippe
>
> 2010/11/19 Peter Soetens <peter [..] ...>
>
>> Hi Philippe,
>>
>> On Thursday 18 November 2010 16:51:34 Philippe Hamelin wrote:
>> > Hello,
>> >
>> > we are trying to implement a connection manager on top of RTT to cleanly
>> > handle disconnection/reconnection of ports and services (operations).
>> > Here are the requirements:
>>
>> You're tackling a problem here that everyone is aware of. The general idea
>> was not to further extend the DeploymentComponent, but to let it act more as
>> a 'dumb slave' and let a supervision component monitor all activity and
>> inform the DC of actions to take. I am against extending the DC even more
>> than it already is.
>>
>> An example: the Lua work of Markus uses the DC as a dumb slave to do the
>> classical port/peer connections and configurations. RTT scripting will have
>> similar capabilities in 2.2, but with less expressiveness than Lua. The
>> logic you describe below could best be expressed in such a flexible
>> scripting language. Note that Sylvain's Ruby supervision framework also
>> contains this functionality, although it has been built on top of orogen
>> and might require that all typekits are generated by orogen/typegen
>> (Sylvain can comment best on the requirements).
>>
>> I would rather extend the DC to provide the necessary hooks, so that a
>> supervision program can use them to do what it does best.
>>
>> The first missing hook is a notification mechanism signalling that a
>> component, port connection or service appeared or disappeared. With that
>> information, you could really get started providing simple rules for
>> reconnection and discovery.
>>
>> Peter
>>
>> >
>> > - Detect disconnection of ports and operations, which can be one of these
>> > use cases:
>> >   1. Peer not present (connection fails at startup/deployment)
>> >   2. Peer just "quit" at runtime (application killed or cleanly closed)
>> >   3. Network connection problem (timeout)
>> > - When a disconnection is detected:
>> >   1. Disconnect all ports/services related to this peer
>> >   2. Automatically trigger a reconnection attempt (in another thread)
>> >
>> > We are currently looking for a good design to implement that. Actually,
>> > we were able to do that quite easily because we weren't using the
>> > deployer (e.g. remote GUI). To this end, we were checking ready() and
>> > try/catching all remote operations. We are however looking for a more
>> > generic design that would also be usable in deployed applications. We are
>> > currently facing some problems:
>> >
>> > When deploying an application with remote port connections, the
>> > deployer kick-start fails. We would like to be able to have "optional"
>> > connections and manage the connection retries in another thread when
>> > the components are running. Also, the port has no memory (tell me if I'm
>> > wrong) of its remote peer.port, so we can't handle the connection later
>> > on. The same would also apply to remote operations/services.
>> >
>> > To handle that, our first idea was to modify the DeploymentComponent to
>> > store the connection configuration through an external plugin that would
>> > manage the disconnection/reconnection in its own thread. This plugin
>> > could periodically "ping" (e.g. getTaskState()) all peers to detect
>> > disconnection, but could also be notified by components when they detect
>> > any connection failure (e.g. a CORBA exception). I'm not really sure that
>> > it's the best design, and I'm also worried about the "thread-safety" of
>> > managing connections in another thread while a component is running and
>> > possibly interacting with these operations/ports.
>> >
>> > I would greatly appreciate any comments/suggestions about the design to
>> > implement this in RTT-2.
>> >
>> > Thank you,
>> >
>> > Philippe Hamelin
>>
>
>
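The supervision-component design discussed in this exchange (connection specifications pushed by the deployer or by hand, a periodic "ping", failure reports from components, and reconnection in the supervisor's own thread) can be sketched in plain C++. This is a minimal sketch under stated assumptions, not RTT code: `ConnectionSpec`, `ConnectionSupervisor` and the injected ping/connect/disconnect callbacks are hypothetical names; in a real system those callbacks would wrap the actual port and OperationCaller (dis)connection calls and something like getTaskState().

```cpp
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one connection, e.g. taken from the deployer XML
// or pushed manually from C++ code. Field names are illustrative only.
struct ConnectionSpec {
    std::string peer;    // remote component this connection depends on
    std::string local;   // e.g. "A.outPort" or "A.someOperationCaller"
    std::string remote;  // e.g. "B.inPort" or "B.someOperation"
};

// Hypothetical supervisor: remembers every connection, tracks lost peers and
// retries them. The actual ping/connect/disconnect actions are injected, so
// this sketch has no dependency on RTT or CORBA.
class ConnectionSupervisor {
public:
    using PingFn = std::function<bool(const std::string&)>;
    using ConnectFn = std::function<bool(const ConnectionSpec&)>;
    using DisconnectFn = std::function<void(const ConnectionSpec&)>;

    ConnectionSupervisor(PingFn ping, ConnectFn connect, DisconnectFn disconnect)
        : ping_(std::move(ping)), connect_(std::move(connect)),
          disconnect_(std::move(disconnect)) {}

    // "Optional" connection: it is only recorded, so a missing peer at
    // deployment time does not abort the deployment; step() connects it later.
    void add(const ConnectionSpec& spec) {
        std::lock_guard<std::mutex> lock(mutex_);
        specs_.push_back(spec);
        lost_.insert(spec.peer);
    }

    // Hook for components that detect a failure themselves
    // (e.g. an exception thrown by a remote call).
    void reportFailure(const std::string& peer) {
        std::lock_guard<std::mutex> lock(mutex_);
        markLost(peer);
    }

    // One supervision cycle, meant to be called periodically from the
    // supervisor's own thread. connect_ is assumed to be idempotent.
    void step() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (const auto& spec : specs_)  // 1. detect peers that went away
            if (lost_.count(spec.peer) == 0 && !ping_(spec.peer))
                markLost(spec.peer);
        std::set<std::string> recovered;  // 2. retry every lost peer
        for (const auto& peer : lost_) {
            if (!ping_(peer)) continue;
            bool ok = true;
            for (const auto& spec : specs_)
                if (spec.peer == peer) ok = connect_(spec) && ok;
            if (ok) recovered.insert(peer);
        }
        for (const auto& peer : recovered) lost_.erase(peer);
    }

    bool isLost(const std::string& peer) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return lost_.count(peer) != 0;
    }

private:
    // On loss, disconnect every port/service related to that peer.
    void markLost(const std::string& peer) {
        if (!lost_.insert(peer).second) return;  // already known to be lost
        for (const auto& spec : specs_)
            if (spec.peer == peer) disconnect_(spec);
    }

    PingFn ping_;
    ConnectFn connect_;
    DisconnectFn disconnect_;
    mutable std::mutex mutex_;
    std::vector<ConnectionSpec> specs_;
    std::set<std::string> lost_;
};
```

For brevity, step() holds the lock while pinging; a real supervisor would probably copy the specifications and release the lock before making potentially blocking remote calls (a CORBA timeout, for instance), so that reportFailure() from a component is never delayed by a slow peer.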

Optional dataflow/services in deployer and managing disconnectio

On Friday 19 November 2010 15:59:56 Philippe Hamelin wrote:
> I forgot to add one more fundamental question: is it possible to connect
> ports/operations of components while they are running? I found that the
> ready() method of ServiceRequester doesn't seem to be thread-safe, so I
> don't see how it would be possible to manage reconnection at runtime.
>

Was it a conflict between ready() and addOperationCaller()?

In general, only the communication primitives (excluding properties) of the
RTT are thread-safe. All other functions are not, although this does not have
to stay that way. For example, we'll probably make the plugin loader
thread-safe very soon, since multiple components in the same process can try
to load a plugin at the same time. There might be other such cases, which we
can always consider making thread-safe.

Specifically, (dis-)connecting ports is certainly thread-safe in a running
system, and so is setting up an operation call.

What *isn't* thread-safe is adding ports or operations. For example, adding an
operation while ports/operations are connected will cause a crash. The same
holds for ServiceRequester::ready(): if you add OperationCallers to the SR
during a ready(), connectTo(), etc. call, it will crash.

Peter

Optional dataflow/services in deployer and managing disconnectio

2010/11/22 Peter Soetens <peter [..] ...>

> On Friday 19 November 2010 15:59:56 Philippe Hamelin wrote:
> > I forgot to add one more fundamental question: is it possible to connect
> > ports/operations of components while they are running? I found that the
> > ready() method of ServiceRequester doesn't seem to be thread-safe, so I
> > don't see how it would be possible to manage reconnection at runtime.
> >
>
> Was it a conflict between ready() and addOperationCaller()?
>
> In general, only the communication primitives (excluding properties) of the
> RTT are thread-safe. All other functions are not, although this does not
> have to stay that way. For example, we'll probably make the plugin loader
> thread-safe very soon, since multiple components in the same process can try
> to load a plugin at the same time. There might be other such cases, which we
> can always consider making thread-safe.
>
> Specifically, (dis-)connecting ports is certainly thread-safe in a running
> system, and so is setting up an operation call.
>
> What *isn't* thread-safe is adding ports or operations. For example, adding
> an operation while ports/operations are connected will cause a crash. The
> same holds for ServiceRequester::ready(): if you add OperationCallers to the
> SR during a ready(), connectTo(), etc. call, it will crash.
>
> Peter
>

Maybe my question wasn't that clear. Let's say we have two remote CORBA
components running. One has a service, while the other has an
OperationCaller (not connected yet). OperationCaller::ready() returns
"false" because it's not connected. A third-party application (e.g. the
Deployer or a Supervisor) tries to connect the OperationCaller to the remote
Operation. While connecting (i.e. creating/initializing the implementation of
the OperationCaller), the component calls OperationCaller::ready() again.
Since the connection operation isn't atomic, is it possible that ready()
returns true even if the connection process isn't completely done?

In short: can OperationCallers be disconnected/reconnected in another thread
while they are still in use by the component?

Philippe

Optional dataflow/services in deployer and managing disconnectio

On Wednesday 24 November 2010 19:51:37 Philippe Hamelin wrote:
> Maybe my question wasn't that clear. Let's say we have two remote CORBA
> components running. One has a service, while the other has an
> OperationCaller (not connected yet). OperationCaller::ready() returns
> "false" because it's not connected. A third-party application (e.g. the
> Deployer or a Supervisor) tries to connect the OperationCaller to the remote
> Operation. While connecting (i.e. creating/initializing the implementation
> of the OperationCaller), the component calls OperationCaller::ready()
> again. Since the connection operation isn't atomic, is it possible that
> ready() returns true even if the connection process isn't completely
> done?

Did you observe that? It shouldn't happen, because ready() only returns true if
an implementation is present *and* the implementation returns true (by testing
the remote connection, for example). The implementation object is created first,
and only then added to the operation caller.

That's the theory. If you saw something else, there might be a bug in
operator= of OperationCaller. If you have a test case, could you apply
the attached patch and see if something changes?

>
> In short: can OperationCallers be disconnected/reconnected in another thread
> while they are still in use by the component?

Disconnection will lead to an exception during a call or send, and ready() will
return false. Reconnecting first connects and tests a separate test object, and
only then copies it into the caller and makes ready() return true (at least with
the attached patch :-)

So from this perspective, it must/should be thread-safe.

Peter
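Peter's explanation (ready() is true only when an implementation is present *and* reports itself ready; the implementation is created and tested first and only then installed in the caller) is a publish-after-construction pattern. Below is a minimal sketch in standard C++, assuming the implementation is held in a `std::shared_ptr` and swapped with `std::atomic_load`/`std::atomic_store` (C++11; deprecated in C++20 in favour of `std::atomic<std::shared_ptr<T>>`). `CallImpl` and `SafeCaller` are hypothetical names; this is not the actual OperationCaller code or the attached patch.

```cpp
#include <atomic>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical remote-call implementation; in RTT this role is played by the
// implementation object behind an OperationCaller (names are illustrative).
struct CallImpl {
    virtual ~CallImpl() = default;
    virtual bool ready() const = 0;  // e.g. "is the remote side reachable?"
    virtual int call(int x) = 0;     // throws if the connection is gone
};

// Caller that can be (re)connected from another thread: the implementation is
// fully built and tested first, then published with a single atomic store, so
// a concurrent ready() sees either the old state or a complete new object.
class SafeCaller {
public:
    bool ready() const {
        std::shared_ptr<CallImpl> impl = std::atomic_load(&impl_);
        return impl && impl->ready();  // present *and* reports ready
    }

    int call(int x) {
        std::shared_ptr<CallImpl> impl = std::atomic_load(&impl_);
        if (!impl) throw std::runtime_error("not connected");
        return impl->call(x);  // our copy keeps impl alive during the call
    }

    // Runs in the supervisor thread.
    bool connect(std::shared_ptr<CallImpl> candidate) {
        if (!candidate || !candidate->ready()) return false;  // test first...
        std::atomic_store(&impl_, std::move(candidate));      // ...then publish
        return true;
    }

    void disconnect() { std::atomic_store(&impl_, std::shared_ptr<CallImpl>()); }

private:
    std::shared_ptr<CallImpl> impl_;
};
```

Because each call copies the pointer first, a disconnect() in the supervisor thread cannot destroy the implementation in the middle of a call. One gap remains by design: ready() and a following call() are two separate loads, so a component must still be prepared for call() to throw, as Peter notes.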

Optional dataflow/services in deployer and managing disconnectio

2010/11/24 Peter Soetens <peter [..] ...>

> On Wednesday 24 November 2010 19:51:37 Philippe Hamelin wrote:
> > Maybe my question wasn't that clear. Let's say we have two remote CORBA
> > components running. One has a service, while the other has an
> > OperationCaller (not connected yet). OperationCaller::ready() returns
> > "false" because it's not connected. A third-party application (e.g. the
> > Deployer or a Supervisor) tries to connect the OperationCaller to the
> > remote Operation. While connecting (i.e. creating/initializing the
> > implementation of the OperationCaller), the component calls
> > OperationCaller::ready() again. Since the connection operation isn't
> > atomic, is it possible that ready() returns true even if the connection
> > process isn't completely done?
>
> Did you observe that? It shouldn't happen, because ready() only returns true
> if an implementation is present *and* the implementation returns true (by
> testing the remote connection, for example). The implementation object is
> created first, and only then added to the operation caller.
>
>
No, I didn't observe that, although I haven't specifically tested it yet. I
wasn't 100% sure that the implementation would be completely initialized when
the pointer is stored in the OperationCaller.

> That's the theory. If you saw something else, there might be a bug in
> operator= of OperationCaller. If you have a test case, could you apply
> the attached patch and see if something changes?
>
> >
> > In short: can OperationCallers be disconnected/reconnected in another
> > thread while they are still in use by the component?
>
> Disconnection will lead to an exception during a call or send, and ready()
> will return false. Reconnecting first connects and tests a separate test
> object, and only then copies it into the caller and makes ready() return
> true (at least with the attached patch :-)
>
> So from this perspective, it must/should be thread-safe.
>
> Peter
>
>
Also, you mentioned that:

<quote>
What *isn't* thread-safe is adding ports or operations. For example, adding
an operation while ports/operations are connected will cause a crash. The same
holds for ServiceRequester::ready(): if you add OperationCallers to the SR
during a ready(), connectTo(), etc. call, it will crash.
</quote>

Do you mean that we can't add an OperationCaller while it's being
connected? I guess this doesn't mean that we can't add an
OperationCaller to the TaskContext interface while the component is running?

We don't know yet whether we are going with C++, Lua or Ruby for our
supervision framework. We have to draw up some use cases and see which is
going to be the best solution for our embedded systems. However, we will
certainly have to test these things extensively in the next few weeks, so I
will keep you posted if we run into problems.

Philippe

Optional dataflow/services in deployer and managing disconnectio

On Thursday 25 November 2010 14:22:50 Philippe Hamelin wrote:
> Also, you mentioned that:
>
> <quote>
> What *isn't* thread-safe is adding ports or operations. For example, adding
> an operation while ports/operations are connected will cause a crash. The
> same holds for ServiceRequester::ready(): if you add OperationCallers to the
> SR during a ready(), connectTo(), etc. call, it will crash.
> </quote>
>
> Do you mean that we can't add an OperationCaller while it's being
> connected?

Yes. We use an STL container to store the callers, and it is not protected by a mutex.

> I guess that this doesn't mean that we can't add an
> OperationCaller to the TaskContext interface while the component is
> running?

That's indeed something else. For the record: the reason you can't do it is
the lack of a mutex around an STL container, not a design/architectural
issue. We could provide a CMake option to make the interface itself
thread-safe as well, which would be desirable for components that change
their interface after they have been connected to others. This would not be
a big addition.

Peter