CORBA connection lost

Hello,
I would like to know whether Orocos can handle the loss of a network
connection between components connected through CORBA. I have tried the
following two setups:

1. "Hello" component, on computer #1, connected through CORBA to
"World" component, on computer #2. The "World" component is configured
as the server with the cdeployer. The "ctaskbrowser", on computer #1,
is connected to the "Hello" component.

2. "World" component, on computer #2, configured as a server via
the cdeployer. The "ctaskbrowser", on computer #1, is connected directly
to the "World" component.

Hence, the main difference between the two setups is that in setup #1
the ctaskbrowser is connected to the remote component via another
component. To check the behaviour of the system, I unplugged
the network cable of computer #2. Then I typed "ls" in the
ctaskbrowser and nothing happened. After 1 minute, I reconnected
the network cable of computer #2, and the "ls" executed correctly.

What is the expected behaviour of components when a connection is lost? Is
it normal that the taskbrowser just waits if the connection to a remote
peer is lost? The idea behind this test is that we want to handle
the loss of connection to a remote peer to ensure the reliability of
the entire system.

Thank you!

Philippe Hamelin

CORBA connection lost

On Wednesday 04 February 2009 21:12:45 Philippe Hamelin wrote:
> Hello,
> I would like to know if Orocos can manage the lost of network
> connection for components connected through Corba? I have tried the
> following two setups :
>
> 1. "Hello" component, on computer #1, connected through CORBA to
> "World" component, on computer #2. The "World" component is configured
> as the server with the cdeployer. The "ctaskbrowser", on computer #1,
> is connected to the "Hello" component.
>
> 2. "World" component, on computer #2 and is configured as a server via
> the cdeployer. The "ctaskbrowser", on computer #1, is connected to the
> "Hello" component.
>
> Hence, the main difference between the two setups, is that in setup #1
> the ctaskbrowser is connected to the remote component via another
> component. To validate the behaviour of the system, I have unconnected
> the network cable of computer #2. Then, I typed "ls" in the
> ctaskbrowser and nothing appends. After 1 minute, I have reconnected
> the network cable of computer #2, and the "ls" executed correctly.
>
> What's the predicted behaviour of components to a connection lost? Is
> it normal that the taskbrowser just "wait" if the connection is lost
> to a remote peer? The idea behind this test is that we want to handle
> the lost of connection with remote peer to ensure the reliability of
> the entire system.

There are three major cases which you want to protect against:

1. remote program 'crash'
2. remote program 'quits'
3. network connectivity problems but both programs keep running.

You only tested case 3 here: CORBA is TCP/IP based, which means lost packets
are compensated for until the connection comes up again. Both programs had
their sockets still alive, so both waited until the connection restored. I'm
assuming pulling the cable does not bring the 'eth' interface down.

Case 1 and 2 are different from 3:
It means the connection is still alive, but the sockets are closed. I'm not a
TCP/IP specialist, but this is detected at the other side and that socket then
closes as well, leading to a CORBA exception in our case. When we receive a
CORBA exception in the RTT, we catch it and clean up the connections. No
exception is thrown to the user (this might be a bad idea). This means ports
are disconnected and proxy objects become 'dead'.

There might be a difference between case 1 and 2, since in case 1, a CORBA call
could be in progress (which allegedly caused the crash) and might deliver a
different exception than case 2. But we don't distinguish and clean up anyway.

You can use 'ready()' on the RTT primitives to check for validity. I'm sure it
works for ports, and peers (TaskContext::ready()), I'm unsure about
properties, commands and methods... but they should support it in principle (a
unit test should confirm this).
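
As an illustration of the ready() check described above, here is a minimal
sketch. It does not use the RTT headers: RemoteHandle is a hypothetical
stand-in for a port or peer proxy, and only the name ready() is taken from
this message. The point is the calling pattern, that is, testing validity
before each remote operation instead of calling through a dead proxy.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for an RTT port or peer proxy; NOT the real RTT API.
struct RemoteHandle {
    bool alive = true;  // set to false once the connection has been cleaned up
    bool ready() const { return alive; }
    std::string call() const { return "pong"; }
};

// Calls through the handle only when it reports itself valid.
// Returns std::nullopt for a dead handle so the caller can decide what to do.
std::optional<std::string> guarded_call(const RemoteHandle& h) {
    if (!h.ready())
        return std::nullopt;
    return h.call();
}
```

A caller would treat an empty result as "peer lost" and, for example, switch
to a safe state rather than continue issuing remote calls.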

The main flaw is that once a peer crashes/quits, we don't try to reconnect.
You'd need to ask the DeploymentComponent to unload the proxy and re-create a
new one in a supervision-like task. It's not very elegant and you'd need to
verify that all links to the old proxy are removed (all except ports, which
are disconnected automatically.) and re-connected to the new proxy. You could
only do so by calling stop(), configure(), start() on all the old proxy's
peers.

A peer going lost is like a service you need for proper operation going down
permanently. Maybe exceptions should be thrown anyway for this very drastic
event. On the other hand, a 'broken wire' has severe disadvantages as well.
Your method call just blocks until the connection is restored and you have no
way to interrupt or detect that (from your current thread).

While I'm writing lengthy mails, I might add that another robot communication
framework ( http://developer.berlios.de/projects/rack/ ) solved this by
disconnecting the 'worker' thread from the 'communications' thread, such that
the worker was never stalled by communication problems. For data flow, this is
certainly more robust. For the other communication primitives ('service or
client/server' oriented) you'd probably like to specify a timeout instead. We
have neither.
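
To make the idea of separating the worker from the communication thread
concrete, here is a rough sketch using only the C++ standard library. It is
not RTT or RACK code: call_with_timeout is a hypothetical helper, and the
blocking remote call is represented by any callable that returns a value.
The worker waits for a bounded time and gets an empty result if no reply
arrived.

```cpp
#include <chrono>
#include <future>
#include <optional>
#include <thread>
#include <utility>

// Runs a possibly blocking call on its own thread and waits at most `timeout`.
// Returns std::nullopt if no reply arrived in time. `fn` must return a value.
template <typename Fn>
auto call_with_timeout(Fn fn, std::chrono::milliseconds timeout)
    -> std::optional<decltype(fn())>
{
    using Result = decltype(fn());
    std::packaged_task<Result()> task(std::move(fn));
    std::future<Result> reply = task.get_future();
    // Detached so that a stalled call cannot block the caller. The thread
    // itself keeps running until the underlying call eventually returns.
    std::thread(std::move(task)).detach();
    if (reply.wait_for(timeout) != std::future_status::ready)
        return std::nullopt;
    return reply.get();
}
```

Note the limitation: the stalled call is abandoned, not cancelled, so this
only keeps the worker responsive. A timeout enforced inside the ORB would
still be needed to free the communication thread.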

Your comments/flames (*why didn't you tell me _before_ !*) are welcome :-)

Peter

CORBA connection lost

2009/2/4 Peter Soetens <peter [dot] soetens [..] ...>:
> There are three major cases which you want to protect against:
>
> 1. remote program 'crash'
> 2. remote program 'quits'
> 3. network connectivity problems but both programs keep running.
>
> You only tested case 3. here: CORBA is TCP/IP based, which means lost packets
> are compensated for until the connection comes up again. Both programs had
> their sockets still alive, so both waited until the connection restored. I'm
> assuming pulling the cable does not bring the 'eth' interface down.

Concerning this case, I found that it's possible to manage
request/reply timeouts with TAO using the CORBA::TIMEOUT exception:

http://www.cs.wustl.edu/~schmidt/PDF/C++-report-col19.pdf
http://www.theaceorb.com/faq/index.html#097

It seems to be only a matter of overriding the messaging policies in the
ORB. As stated here:

http://objectmix.com/object/195435-corba-timeout.html

the default timeout is "infinite", which is why we never catch a
CORBA::TIMEOUT exception. Since I'm not familiar with CORBA and RTT
internals, do you know if this is something that could be added to RTT?

I think we could trap all three cases in the same way. However,
at the moment I don't know what a clean way to handle the timeouts
would be (maybe via TaskContext::ready(), as you suggested?).

>
> Case 1 and 2 are different from 3:
> It means the connection is still alive, but the sockets are closed. I'm not a
> TCP/IP specialist, but this is detected at the other side and that socket then
> closes as well, leading to a CORBA exception in our case. When we receive a
> CORBA exception in the RTT, we catch it and clean up the connections. No
> exception is thrown to the user (this might be a bad idea). This means ports
> are disconnected and proxy objects become 'dead'.
>
> There might be a difference between case 1 and 2, since in case 1, a CORBA call
> could be in progress (which allegedly caused the crash) and might deliver a
> different exception than case 2. But we don't distinguish and clean up anyway.
>
> You can use 'ready()' on the RTT primitives to check for validity. I'm sure it
> works for ports, and peers (TaskContext::ready()), I'm unsure about
> properties, commands and methods... but they should be able in principle (a
> unit test should conclude this).
>
> The main flaw is that once a peer crashes/quits, we don't try to reconnect.
> You'd need to ask the DeploymentComponent to unload the proxy and re-create a
> new one in a supervision-like task. It's not very elegant and you'd need to
> verify that all links to the old proxy are removed (all except ports, which
> are disconnected automatically.) and re-connected to the new proxy. You could
> only do so by calling stop(), configure(), start() on all the old proxy's
> peers.
>

Also, we could add an "auto-reconnect" setting to the proxy that could
be made available in the deployer XML. Assuming that we can trap all
CORBA exceptions, there should be an elegant way to add this feature
without getting in the way of other users.

> A peer going lost is like a service you need for proper operation going down
> permanently. Maybe exceptions should be thrown anyway for this very drastic
> event. On the other hand, a 'broken wire' has severe disadvantages as well.
> Your method call just blocks until the connection is restored and you have no
> way to interrupt or detect that (from your current thread).
>
> While I'm writing lengthy mails, I might add that another robot communication
> framework ( http://developer.berlios.de/projects/rack/ ) solved this by
> disconnecting the 'worker' thread from the 'communications' thread, such that
> the worker was never stalled by communication problems. For data flow, this is
> certainly more robust. For the other communication primitives ('service or
> client/server' oriented) you'd probably like to specify a timeout instead. We
> got neither.
>
> Your comments/flames (*why didn't you tell me _before_ !*) are welcome :-)
>
> Peter

CORBA connection lost

Hi Simon,

On Thursday 05 February 2009 16:42:32 Simon Pelletier-Thibault wrote:
> I would like to know how we can reconnect the ctaskbrowser to a component
> after the component has crashed. Is there a command to reconnect the
> component, and can we program something to reconnect in C++?

I believe you can, but it will require the code from ControlTaskProxy's
constructors to be factored out in functions. The name and is_ior will need to
be stored as member variables. Then a 'reconnect()' function needs to be made
which basically calls these functions.

However, 'automatic' reconnection is a different story.

If a CORBA primitive detects a termination of the server, it could look up the
proxy TaskContext (all proxies are registered globally) if it got the
Corba::ControlTask_ptr and call the reconnect() function. The question then
remains: if it fails, who will retry (and how many times)? Also, all your
commands, methods etc. need to be rebuilt using the new server interface.

CORBA has a solution as well: persistence. It requires special configuration of
the CORBA server, such that it re-uses the same IOR as the first time, such
that it can be contacted again using the same IOR at the proxy side. But we
can't use that for reviving commands, methods and data sources (expressions),
because these are created on the fly and there's no way to recover them after a
crash of your server.
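
The open question of who retries and how often could be answered with an
explicit policy. The sketch below is only an illustration under assumptions:
reconnect_with_retry is a hypothetical helper, the reconnect() function
proposed above does not exist yet and is represented here by any callable
returning true on success, and the doubling delay is one arbitrary choice.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical policy: try `reconnect` up to `max_attempts` times, doubling
// the pause between attempts. Returns the number of the attempt that
// succeeded, or 0 if every attempt failed.
int reconnect_with_retry(const std::function<bool()>& reconnect,
                         int max_attempts,
                         std::chrono::milliseconds first_delay)
{
    auto delay = first_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (reconnect())
            return attempt;
        if (attempt < max_attempts) {
            std::this_thread::sleep_for(delay);  // back off before retrying
            delay *= 2;
        }
    }
    return 0;  // give up; the caller must treat the peer as permanently lost
}
```

Such a loop would belong in a supervision task rather than in the thread
that detected the failure, since it sleeps between attempts.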

>
> I am currently trying my own patch on ControlTaskProxy.cpp and
> ControlTaskServer.cpp using http://www.theaceorb.com/faq/index.html#097.
> Since you are using catch(...) in corba communication I think we would have
> the same behaviour for case 1,2,3.

The current implementation is not ready to handle this gracefully, especially
since each primitive has its own CORBA server. It will require the redesign of
RTT 2.0 (which doesn't exist yet) to get this fixed.

Peter

CORBA connection lost

I would like to know how we can reconnect the ctaskbrowser to a component after the component has crashed. Is there a command to reconnect the component, and can we program something to reconnect in C++?

I am currently trying my own patch on ControlTaskProxy.cpp and ControlTaskServer.cpp using http://www.theaceorb.com/faq/index.html#097. Since you are using catch(...) in the CORBA communication, I think we would get the same behaviour for cases 1, 2 and 3.

Thanks.






CORBA connection lost

I tried to apply the TAO FAQ entry "Does TAO support request/reply timeouts?" (http://www.theaceorb.com/faq/index.html#097) in ControlTaskProxy.cpp and ControlTaskServer.cpp.

I got a problem at the line:

    policies[0] =
        orb->create_policy (Messaging::RELATIVE_RT_TIMEOUT_POLICY_TYPE,
                            relative_rt_timeout_as_any);

This always raises a CORBA exception, so we are not able to set a communication timeout. Detecting network connectivity problems would be very useful in our application.

Thanks for your help.

 

For ControlTaskProxy.cpp:

    bool ControlTaskProxy::InitOrb(int argc, char* argv[] ) {
        if ( orb.in() )
            return false;

        try {
            // First initialize the ORB, that will remove some arguments...
            orb = CORBA::ORB_init (argc,
                                   const_cast<char**>(argv),
                                   "" /* the ORB name, it can be anything! */);

            // Set the timeout value as a TimeBase::TimeT (100 nanosecond units)
            // and insert it into a CORBA::Any. 1.0e7 units equal 1 second.
            TimeBase::TimeT relative_rt_timeout = 1.0 * 1.0e7;
            CORBA::Any relative_rt_timeout_as_any;
            relative_rt_timeout_as_any <<= relative_rt_timeout;

            // Create the policy and put it in a policy list.
            CORBA::PolicyList policies;
            policies.length(1);
            policies[0] =
                orb->create_policy (Messaging::RELATIVE_RT_TIMEOUT_POLICY_TYPE,
                                    relative_rt_timeout_as_any);

            // Apply the policy at the ORB level using the ORBPolicyManager.
            CORBA::Object_var obj = orb->resolve_initial_references (
                "ORBPolicyManager");
            CORBA::PolicyManager_var policy_manager =
                CORBA::PolicyManager::_narrow (obj.in());
            policy_manager->set_policy_overrides (policies, CORBA::SET_OVERRIDE);

            // Also activate the POA Manager, since we may get call-backs !
            CORBA::Object_var poa_object =
                orb->resolve_initial_references ("RootPOA");
            rootPOA =
                PortableServer::POA::_narrow (poa_object.in ());
            PortableServer::POAManager_var poa_manager =
                rootPOA->the_POAManager ();
            poa_manager->activate ();
            return true;
        }
        catch (CORBA::Exception &e) {
            log(Error) << "Orb Init : CORBA exception raised!" << Logger::nl;
            Logger::log() << e._info().c_str() << endlog();
        }
        return false;
    }
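
The timeout in the code above is expressed in 100 nanosecond units, which is
easy to get wrong by a factor of ten. The following sketch shows one way to
derive the value from a std::chrono duration instead of a hand-written
constant. It is an illustration only: to_timet is a hypothetical helper, and
the TimeT alias stands in for TimeBase::TimeT (assumed here to be a 64-bit
unsigned integer) so that the example compiles without the CORBA headers.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>

// Stand-in for TimeBase::TimeT, which counts 100 ns ticks. The 64-bit
// unsigned type is an assumption made so this sketch needs no CORBA headers.
using TimeT = std::uint64_t;

// Converts any std::chrono duration to 100 ns ticks using integer arithmetic.
template <typename Rep, typename Period>
constexpr TimeT to_timet(std::chrono::duration<Rep, Period> d) {
    using Ticks100ns = std::chrono::duration<TimeT, std::ratio<1, 10000000>>;
    return std::chrono::duration_cast<Ticks100ns>(d).count();
}
```

With such a helper, the one second timeout would be written as
to_timet(std::chrono::seconds(1)), which yields 10000000 ticks.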



CORBA connection lost

Hi Simon,

On Thursday 05 February 2009 21:32:24 Simon Pelletier-Thibault wrote:
> I tried to apply http://www.theaceorb.com/faq/index.html#097 ("Does TAO
> support request/reply timeouts?") in ControlTaskProxy.cpp and
> ControlTaskServer.cpp.
>
> I got a problem at the line :
>
> policies[0] =
>
> orb->create_policy (Messaging::RELATIVE_RT_TIMEOUT_POLICY_TYPE,
>
> relative_rt_timeout_as_any);
>
> This always raises a CORBA exception, and then we are not able to set a
> communication timeout. In our application it would be useful to detect
> network connectivity problems and this feature would be useful.
>
> Thanks for your help.

A possible reason for receiving a CORBA exception is that the POA or ORB is
wrongly configured. Since you're using the ORB-wide configuration, I'm out of
options... you copy-pasted the example code. It 'should' work.

What exception do you get ?

Peter

CORBA connection lost


Hi Peter,

I did it. It was a problem with the include files, and some TAO libraries were
missing.

I would like to give you the patch; I was waiting for Philippe to show me how
to make one (Wednesday).

I could send you the 3 files that were modified. Are you interested?




On Mon, Feb 9, 2009 at 4:31 PM, Peter Soetens <peter [dot] soetens [..] ...> wrote:
> Hi Simon,
>
> A possible reason for receiving CORBA exceptions is that the POA or ORB is
> wrongly configured. Since you're using the ORB-wide configuration, I'm out of
> options... you copy-pasted the example code. It 'should' work.
>
> What exception do you get?
>
> Peter

CORBA connection lost

On Tuesday 10 February 2009 14:46:18 Simon Pelletier-Thibault wrote:
> Hi Peter,
>
>
> I did it. It was a problem with the include files and some TAO libraries
> were missing.
> I would like to give you the patch and I was waiting for Philippe to show
> me how to make a patch. (Wednesday)

I'd rather have them as real patches: because omniORB has been merged on
trunk, your files no longer resemble the files in the trunk RTT.

You can do so by unpacking the original and comparing it with your modified
version (assuming you use 1.6.1 and src/ contains the modified orocos-rtt tree):

cd src
mkdir orocos-rtt-1.6.1.orig
tar --strip-components=1 -C orocos-rtt-1.6.1.orig -xzf orocos-rtt-1.6.1-src.tar.gz
diff -Naur orocos-rtt-1.6.1.orig/src orocos-rtt-1.6.1/src > corba-connection.patch

Or something close to that. Keep the .orig directory for creating future
patches.

Peter

CORBA connection lost


Here is your patch, from the 1.6.0 version.


On Tue, Feb 10, 2009 at 10:55 AM, Peter Soetens <peter [dot] soetens [..] ...> wrote:
> I'd rather have them as real patches: because omniORB has been merged on
> trunk, your files no longer resemble the files in the trunk RTT.
>
> You can do so by unpacking the original and comparing it with your modified
> version (assuming you use 1.6.1 and src/ contains the modified orocos-rtt tree):
>
> cd src
> mkdir orocos-rtt-1.6.1.orig
> tar --strip-components=1 -C orocos-rtt-1.6.1.orig -xzf orocos-rtt-1.6.1-src.tar.gz
> diff -Naur orocos-rtt-1.6.1.orig/src orocos-rtt-1.6.1/src > corba-connection.patch
>
> Or something close to that. Keep the .orig directory for creating future
> patches.
>
> Peter


CORBA connection lost

Is it possible to add this patch to Bugzilla and apply it on trunk?

Thank you!

Philippe

2009/2/11 Simon Pelletier-Thibault <simon [dot] pelletiert [..] ...>:
>
> Here your patch from the 1.6.0 version

CORBA connection lost

Take these files and replace the corresponding ones to try the patch.

It is now possible to set a timeout at the ORB level in InitOrb. If it is 0 or not set, the timeout stays infinite.





CORBA connection lost

On Tuesday 10 February 2009 15:03:50 Simon Pelletier-Thibault wrote:
> Take these files and replace the corresponding to try the patch.
>
> It is now possible to set a TimeOut at the Orb level in InitOrb. If 0 or
> not set it keeps the infinite TimeOut.

The hardest part will now be to do something useful when the exception occurs.
Keep us posted :-)

Peter
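
One possible way to "do something useful" once a call times out is to count consecutive timeouts and flag the peer as lost, so the component can switch to a safe state instead of hanging. The sketch below is only an illustration under assumptions: `RemoteTimeout` is a hypothetical stand-in for whatever exception the ORB raises on a timeout, and `PeerLink` is an invented wrapper, not an Orocos or TAO class.

```cpp
#include <functional>
#include <stdexcept>

// Hypothetical stand-in for the exception the ORB raises on a timeout.
struct RemoteTimeout : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical wrapper that tracks whether a remote peer looks reachable.
class PeerLink {
public:
    explicit PeerLink(int max_failures) : max_failures_(max_failures) {}

    // Run one remote call. Returns true if it completed; a timeout is
    // counted and swallowed so the calling component keeps running.
    bool invoke(const std::function<void()>& remote_call) {
        try {
            remote_call();
            failures_ = 0;      // any success means the link is back
            return true;
        } catch (const RemoteTimeout&) {
            ++failures_;
            return false;
        }
    }

    // The peer counts as lost after max_failures consecutive timeouts.
    bool connected() const { return failures_ < max_failures_; }

private:
    int max_failures_;
    int failures_ = 0;
};
```

Requiring several consecutive timeouts before declaring the peer lost is a design choice that tolerates a single slow reply; a stricter system could use a threshold of 1. The first successful call after an outage resets the counter, which matches the reconnect behaviour observed with the network cable test at the start of this thread.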