corba, non blocking call and pass by reference

Hello,

I have an issue while calling a none blocking method which take argument
by reference. I found where the issue lies, but I don't know how to
solve it.

When calling a non blocking remote method through corba (see
RTT_corba_COperationInterface_i::sendOperation in
OperationInterfaceI.cpp, line 343), the method arguments are
deserialized in AssignableDataSource objects. Then a
LocalOperationCallImpl object is created and the arguments are stored in
AStore objects owned by BindStorage. The LocalOperation is then stored
by the ExecutionEngine and evaluated later.

When passing by value, the object is copied from the
AssignableDataSource to the AStore objects, but when passing by
reference, the AStore objects point to the DataSource objects which are
deleted at the end of the sendOperation call. When the LocalOperation is
eventually executed, the arguments are invalid.

This issue does not arise when the task which provide the method is not
blocked, since the call can be executed before the DataSource is deleted.

In my case I have a task A which makes a blocking call to a task B which
makes a non blocking call to the task A. The issue arise in the second
call. I can provide a small test case if needed.

A solution would be to store the DataSource, maybe through a shared_ptr
in the LocalOperation and to delete it after the evaluation of the
operation. An other solution would be to change the storage by reference
for a storage by copy, but it will generate unwanted copies.

Regards,

corba, non blocking call and pass by reference

Hi Mathieu,

On Tuesday 07 September 2010 14:27:13 Mathieu Gautier wrote:
> Hello,
>
> I have an issue while calling a none blocking method which take argument
> by reference. I found where the issue lies, but I don't know how to
> solve it.
>
> When calling a non blocking remote method through corba (see
> RTT_corba_COperationInterface_i::sendOperation in
> OperationInterfaceI.cpp, line 343), the method arguments are
> deserialized in AssignableDataSource objects. Then a
> LocalOperationCallImpl object is created and the arguments are stored in
> AStore objects owned by BindStorage. The LocalOperation is then stored
> by the ExecutionEngine and evaluated later.
>
> When passing by value, the object is copied from the
> AssignableDataSource to the AStore objects, but when passing by
> reference, the AStore objects point to the DataSource objects which are
> deleted at the end of the sendOperation call. When the LocalOperation is
> eventually executed, the arguments are invalid.
>
> This issue does not arise when the task which provide the method is not
> blocked, since the call can be executed before the DataSource is deleted.
>
> In my case I have a task A which makes a blocking call to a task B which
> makes a non blocking call to the task A. The issue arise in the second
> call. I can provide a small test case if needed.
>
> A solution would be to store the DataSource, maybe through a shared_ptr
> in the LocalOperation and to delete it after the evaluation of the
> operation. An other solution would be to change the storage by reference
> for a storage by copy, but it will generate unwanted copies.

Thanks for this analysis. I believe it is correct.

One solution is to add a single shared_ptr to the LocalOperationCallerImpl
which keeps a reference to the FusedFunctorDataSource that holds the shared
pointers to the data source arguments. The problem here is that when the
LocalOperationCallerImpl is destroyed by the executing component, it may
cleanup the FusedFunctorDataSource too, which is not a real-time operation.

The alternative is to make copies of the arguments during the send(), which
might be acceptable in a Corba execution context (the data was earlier copied
from an any anyway).
In that case, the AStore keeps the copies and passes them by reference to
the operation. A similar problem may occur though if the AStore keeps storage
to a dynamically sized object. Also in that case, a cleanup of the AStore may
lead to not real-time usage of free().

I would first go for the latter solution, since it is the least obstrusive. I'd
need some more time to come up with a patch though.

Peter

corba, non blocking call and pass by reference

Hi,

> One solution is to add a single shared_ptr to the LocalOperationCallerImpl
> which keeps a reference to the FusedFunctorDataSource that holds the shared
> pointers to the data source arguments. The problem here is that when the
> LocalOperationCallerImpl is destroyed by the executing component, it may
> cleanup the FusedFunctorDataSource too, which is not a real-time operation.
>
> The alternative is to make copies of the arguments during the send(), which
> might be acceptable in a Corba execution context (the data was earlier copied
> from an any anyway).
> In that case, the AStore keeps the copies and passes them by reference to
> the operation. A similar problem may occur though if the AStore keeps storage
> to a dynamically sized object. Also in that case, a cleanup of the AStore may
> lead to not real-time usage of free().
>
> I would first go for the latter solution, since it is the least obstrusive. I'd
> need some more time to come up with a patch though.

Ok, I have no preferences for one or another solution. Since I'm
currently working under windows, my real-time requirements are not very
strong. I can use copy or avoid none blocking call while waiting for a
patch.

corba, non blocking call and pass by reference

On Wednesday 08 September 2010 10:38:59 Mathieu Gautier wrote:
> Hi,
>
> > One solution is to add a single shared_ptr to the
> > LocalOperationCallerImpl which keeps a reference to the
> > FusedFunctorDataSource that holds the shared pointers to the data source
> > arguments. The problem here is that when the LocalOperationCallerImpl is
> > destroyed by the executing component, it may cleanup the
> > FusedFunctorDataSource too, which is not a real-time operation.
> >
> > The alternative is to make copies of the arguments during the send(),
> > which might be acceptable in a Corba execution context (the data was
> > earlier copied from an any anyway).
> > In that case, the AStore keeps the copies and passes them by reference to
> > the operation. A similar problem may occur though if the AStore keeps
> > storage to a dynamically sized object. Also in that case, a cleanup of
> > the AStore may lead to not real-time usage of free().
> >
> > I would first go for the latter solution, since it is the least
> > obstrusive. I'd need some more time to come up with a patch though.
>
> Ok, I have no preferences for one or another solution. Since I'm
> currently working under windows, my real-time requirements are not very
> strong. I can use copy or avoid none blocking call while waiting for a
> patch.

There is actually a third option. Since we are using the SendHandleC and the
OperationCallerC, we can transfer the shared_ptr from the OperationCallerC to
the SendHandleC, such that all refcounts stay up. We can then synchronize the
destruction of the SendHandleC with the effect of the 'send', meaning, we block
in SendHandleC's destructor, until the operation has been processed, or
reports to have failed.

This would solve all related synchronisation issues, also in non-corba cases.

I have implemented this theory with the patch below, but it breaks API
(internal one though) so I can't include it in this form on the toolchain-2.0
branch, but if you could test it and report success/failure, that would be
great.

The only thing I'm afraid of is that the SendHandleC destructor dead-locks. We
need a unit test urgently to cover this case.

Another thing is that Corba's CSendHandle (+1 for hilarious naming) server has
no means to be cleaned up. I believe they are all leaked, which means that
repetitive calls to sendOperation() will cause leaks on the server side. We
can easily solve this by limiting the numbers of servers alive in one
interface (the API says that CSendHandle can be cleaned up at any time), and
providing a dispose() method such that clients can trigger cleanups too.

Still some work to do here :-)

Peter

corba, non blocking call and pass by reference

Hi,

> The only thing I'm afraid of is that the SendHandleC destructor dead-locks. We

that's the case, I've got a deadlock when SendHandleC is deleted in
OperationCallerC::send(). The call to evaluate on its DataSource never
returns.

corba, non blocking call and pass by reference

On Friday 10 September 2010 15:08:44 Mathieu Gautier wrote:
> Hi,
>
> > The only thing I'm afraid of is that the SendHandleC destructor
> > dead-locks. We
>
> that's the case, I've got a deadlock when SendHandleC is deleted in
> OperationCallerC::send(). The call to evaluate on its DataSource never
> returns.

Ok. Can you apply the patch in attachment in addition to the previous patch ?

It makes sure that only the last SendHandleC actually blocks on the operation.

Since we currently leak them in a CSendHandle servant, this should now work.
The next step is to keep a list of CSendHandle servants and allow client code
to free them or to free them in the server itself.

I hope this to be fixed in 2.1.

Peter

corba, non blocking call and pass by reference

> Ok. Can you apply the patch in attachment in addition to the previous patch ?

I am unable to test this, because the patch made my test crash with a
simple blocking call :

{
Msg message("something")
RTT::OperationCaller<void(const Msg&)> Send =
peer->getOperation("Send");
Send(message);
}

I have a corrupted heap when the OperationCaller is deleted. It's
actually happening afer all destructors are called, when the memory is
freed. I'm trying to locate where it went wrong but I'm still unable to
pin it down. The crash doesn't happen with a standalone test, ie. using
LocalOperationCaller.

corba, non blocking call and pass by reference

On Tuesday 14 September 2010 14:07:16 Mathieu Gautier wrote:
> > Ok. Can you apply the patch in attachment in addition to the previous
> > patch ?
>
> I am unable to test this, because the patch made my test crash with a
> simple blocking call :
>
> {
> Msg message("something")
> RTT::OperationCaller<void(const Msg&)> Send =
> peer->getOperation("Send");
> Send(message);
> }
>
> I have a corrupted heap when the OperationCaller is deleted. It's
> actually happening afer all destructors are called, when the memory is
> freed. I'm trying to locate where it went wrong but I'm still unable to
> pin it down. The crash doesn't happen with a standalone test, ie. using
> LocalOperationCaller.

I have pushed a series of patches to the toolchain-2.0-fix-corba-send branch on
my *github* account. The patches are based on the latest toolchain-2.0 status
(although the patches will never end up in that branch due to binary
compatibility issues).

There is also a new unit test in the corba-ipc-test example that does a series
of call/sends recursively. The CSendHandle object is still leaked though.

When I block on the sendhandle during destruction (see comments in latest
patch), the unit test still deadlocks, but I need to find out why exactly, I
think it's the unit test itself being broken in that respect. To be sorted out
further. You could test your code without this latest patch
(b375a229368d1521c45e) too.

Finally: be sure to specify the second argument to the OperationCaller when
your code might get callbacks. For example:

OperationCaller<void(TaskContext*, string const&)> op1 (peer-
>getOperation(opname), this->engine());

We should print a warning when the 'this->engine()' is omitted, since it will
almost certainly cause deadlocks in callback scenario's. The OperationCaller
*needs to know* which component is doing the call in order to wait in the
correct ExecutionEngine. If there is no callback, then it works either way.

Peter

corba, non blocking call and pass by reference

Hi,

> I have pushed a series of patches to the toolchain-2.0-fix-corba-send
branch on
> my *github* account. The patches are based on the latest
toolchain-2.0 status
> (although the patches will never end up in that branch due to binary
> compatibility issues).

how do you plan to integrate this change ? Will you use this branch as
the base for 2.1 version of the toolchain ? Is there any problems for
further merges if these changes are put in Jean's repository ?

> There is also a new unit test in the corba-ipc-test example that does
a series
> of call/sends recursively. The CSendHandle object is still leaked though.

> When I block on the sendhandle during destruction (see comments in latest
> patch), the unit test still deadlocks, but I need to find out why
exactly, I
> think it's the unit test itself being broken in that respect. To be
sorted out
> further. You could test your code without this latest patch
> (b375a229368d1521c45e) too.

It's working fine with the tip of your branch. I've only had to cherry
pick the last commit from the Jean's win32 repository and change some
symbols.

corba, non blocking call and pass by reference

On Thursday 16 September 2010 15:20:07 Mathieu Gautier wrote:
> Hi,
>
> > I have pushed a series of patches to the toolchain-2.0-fix-corba-send
>
> branch on
>
> > my *github* account. The patches are based on the latest
>
> toolchain-2.0 status
>
> > (although the patches will never end up in that branch due to binary
> > compatibility issues).
>
> how do you plan to integrate this change ? Will you use this branch as
> the base for 2.1 version of the toolchain ? Is there any problems for
> further merges if these changes are put in Jean's repository ?

I will base 2.1 on this branch. You may extend this branch as you see fit,
we'll have to cherry-pick to toolchain-2.0 if it contains important fixes.

>
> > There is also a new unit test in the corba-ipc-test example that does
>
> a series
>
> > of call/sends recursively. The CSendHandle object is still leaked
> > though.
> >
> > When I block on the sendhandle during destruction (see comments in
> > latest patch), the unit test still deadlocks, but I need to find out why
>
> exactly, I
>
> > think it's the unit test itself being broken in that respect. To be
>
> sorted out
>
> > further. You could test your code without this latest patch
> > (b375a229368d1521c45e) too.
>
> It's working fine with the tip of your branch. I've only had to cherry
> pick the last commit from the Jean's win32 repository and change some
> symbols.

I have pushed new updates to this branch on github. As good as all memleaks in
the corba transport have been fixed. I have also refined the solution to your
problem to make automatic collecting optional, which is necessary in case you
only want to send-and-forget over a remote connection. This solution should
work even better than what you have now. The unit tests seem to agree (both
tao/omniorb) for what it's worth. I hope we're very close now to the final
solution.

Peter