Seg fault in user-created Corba toolkit

Looking for suggestions on debugging user-created Corba toolkits, based on the attached seg-fault and GDB backtrace. I've commented out all code that actually does anything within the AnyConversion< IplImageWrapper > class, but am still getting this seg fault. The image type appears to be registered correctly in both RTT and Corba.

I start a cdeployer with ImageCapture and HMI components, and then start a deployer-corba with a proxy ImageCapture component and a display component (which connects to the capture proxy). These are in two shells on the same machine. Starting the deployer-corba causes the seg-fault in the cdeployer when it tries to marshal the Image type to Corba::Any (I think) ...

Any directions to try ...? Any suggestions ...?

TIA
S

Attachment: trace-corba-opencv.txt (8.52 KB)

Seg fault in user-created Corba toolkit

On Wednesday 10 December 2008 22:14:24 kiwi [dot] net [..] ... wrote:
> Looking for suggestions on debugging user-created Corba toolkits, based on
> the attached seg-fault and GDB backtrace. I've commented out all code that
> actually does anything within the AnyConversion< IplImageWrapper > class,
> but am still getting this seg fault. The image type appears to be
> registered correctly in both RTT and Corba.

The first thing I do to test new CORBA features is to run everything in the
same process. So you load/create a normal component, then a server for that
component, then a proxy for that server and finally let the client component
connect to that proxy. All in the same process. You can do this using a
deployer or statically in a main.cpp (easier). gdb or valgrind make much more
sense in such a setting.

>
> I start a cdeployer with ImageCapture and HMI components, and then start a
> deployer-corba with a proxy ImageCapture component and a display component
> (which connects to the capture proxy). These are in two shells on the same
> machine. Starting the deployer-corba causes the seg-fault in the cdeployer
> when it tries to marshal the Image type to Corba::Any (I think) ...
>
> Any directions to try ...? Any suggestions ...?

Most caveats are listed in the source code in the interface/template classes.
If you want a review, you can post your openCV toolkit code...

Peter

Seg fault in user-created Corba toolkit

On Dec 11, 2008, at 03:57 , Peter Soetens wrote:

> On Wednesday 10 December 2008 22:14:24 kiwi [dot] net [..] ... wrote:
>> Looking for suggestions on debugging user-created Corba toolkits,
>> based on
>> the attached seg-fault and GDB backtrace. I've commented out all
>> code that
>> actually does anything within the AnyConversion< IplImageWrapper >
>> class,
>> but am still getting this seg fault. The image type appears to be
>> registered correctly in both RTT and Corba.
>
> The first thing I do to test new CORBA features is to run everything
> in the
> same process. So you load/create a normal component, then a server
> for that
> component, then a proxy for that server and finally let the client
> component
> connect to that proxy. All in the same process. You can do this
> using a
> deployer or statically in a main.cpp (easier). gdb or valgrind makes
> much more
> sense in such a setting.

BTW, when I do this in the same process, it completely bypasses Corba
and appears to go straight through RTT, even though I'm using
CorbaTask servers and proxies. Corba-related problems do not
manifest themselves in this situation, but they do pop up as soon as I
use two processes.

>> I start a cdeployer with ImageCapture and HMI components, and then
>> start a
>> deployer-corba with a proxy ImageCapture component and a display
>> component
>> (which connects to the capture proxy). These are in two shells on
>> the same
>> machine. Starting the deployer-corba causes the seg-fault in the
>> cdeployer
>> when it tries to marshal the Image type to Corba::Any (I think) ...
>>
>> Any directions to try ...? Any suggestions ...?
>
> Most caveats are listed in the source code in the interface/template
> classes.
> If you want a review, you can post your openCV toolkit code...

Do the types being sent through a CORBA toolkit *have* to be in an RTT
namespace? I'm trying to send some standard OpenCV types through
Orocos/Corba. They work fine in Orocos without using Corba, but as
soon as I try to send them through Corba it throws an assert(). If I
change the type used to an identical one but declared inside an RTT
namespace (and don't change any of my code besides renaming the type),
it works fine.

If this is the situation, then it's not very friendly to being able to
integrate other packages and their types into Orocos. What are our
options to eliminate this requirement?

Cheers
S

Seg fault in user-created Corba toolkit

On Monday 22 December 2008 21:18:13 S Roderick wrote:
> On Dec 11, 2008, at 03:57 , Peter Soetens wrote:
> > On Wednesday 10 December 2008 22:14:24 kiwi [dot] net [..] ... wrote:
> >> Looking for suggestions on debugging user-created Corba toolkits,
> >> based on
> >> the attached seg-fault and GDB backtrace. I've commented out all
> >> code that
> >> actually does anything within the AnyConversion< IplImageWrapper >
> >> class,
> >> but am still getting this seg fault. The image type appears to be
> >> registered correctly in both RTT and Corba.
> >
> > The first thing I do to test new CORBA features is to run everything
> > in the
> > same process. So you load/create a normal component, then a server
> > for that
> > component, then a proxy for that server and finally let the client
> > component
> > connect to that proxy. All in the same process. You can do this
> > using a
> > deployer or statically in a main.cpp (easier). gdb or valgrind makes
> > much more
> > sense in such a setting.
>
> BTW, when I do this in the same process, it completely bypasses Corba
> and appears to go straight through RTT, even though I'm using
> CorbaTask servers and proxies. Any Corba-related problems do not
> manifest themselves in this situation, but they do pop up as soon as I
> use two processes.

You're right. Sylvain pointed this out once too. Orocos/CORBA has a mechanism
that looks up components to make sure they do not go through a proxy when that
is not necessary. So this means only setting things up manually in a main(), not
using the deployer. We could also disable this caching in some way to allow
easier debugging. Debugging multi-process CORBA problems is almost
impossible.

> > If you want a review, you can post your openCV toolkit code...
>
> Do the types being sent through a CORBA toolkit *have* to be in an RTT
> namespace?

Not at all. Look at KDL where Frame etc are not in the RTT namespace.

> I'm trying to send some standard OpenCV types through
> Orocos/Corba. They work fine in Orocos without using Corba, but as
> soon as I try to send them through Corba it throws an assert(). If I
> change the type used to an identical one but declared inside an RTT
> namespace (and don't change any of my code besides renaming the type),
> it works fine.

Then we need to figure out what went wrong and how to fix it (submit a bug
report).

>
> If this is the situation, then it's not very friendly to being able to
> integrate other packages and their types into Orocos. What are our
> options to eliminate this requirement?

Hunt 'n kill.

Peter

Seg fault in user-created Corba toolkit

>>> If you want a review, you can post your openCV toolkit code...
>>
>> Do the types being sent through a CORBA toolkit *have* to be in an
>> RTT
>> namespace?
>
> Not at all. Look at KDL where Frame etc are not in the RTT namespace.
>
>> I'm trying to send some standard OpenCV types through
>> Orocos/Corba. They work fine in Orocos without using Corba, but as
>> soon as I try to send them through Corba it throws an assert(). If I
>> change the type used to an identical one but declared inside an RTT
>> namespace (and don't change any of my code besides renaming the
>> type),
>> it works fine.
>
> Then we need to figure out what went wrong and how to fix it (submit
> a bug
> report).

I can verify that moving one of my types from inside an RTT namespace
to outside the namespace causes a seg-fault. I've hit this seg-fault
quite a bit while debugging my recent work with CORBA toolkits, and
got nowhere with it. It is buried deep down in TAO. It sucks down there.

Here's what I did: I had working opencv RTT and RTT/Corba toolkits
that transmit images, 2D points, and vectors of 2D points between
processes. I was using a 2D point structure that I had created, after
I hit this same issue while trying to use OpenCV's native C version of
this structure. I literally moved the namespace closing brace from
after my 2D point and vector<2DPoint> classes, to before the classes.
I then went through and fixed all associated namespace resolutions
throughout the code. Did a clean build. Started the deployer and then
when I start the GUI process ... Voila! ... seg-fault in deployer-
corba-gnulinux! :-( I went back and reran the deployer and got it to
dump the type repository for me.

I'm attaching the gdb backtrace from the deployer and the type
repository dump, as well as the entire orocos.log file from the gdb'd
deployer run. Hopefully the ML won't cut them off.

I'll look into this but I think I'm going to need some help. Like I
said above, I bypassed this recently as trying to debug down into TAO
is a right nightmare. :-(( Any suggestions definitely appreciated.
S


Seg fault in user-created Corba toolkit

On Tuesday 23 December 2008 22:02:31 S Roderick wrote:
> >>> If you want a review, you can post your openCV toolkit code...
> >>
> >> Do the types being sent through a CORBA toolkit *have* to be in an
> >> RTT
> >> namespace?
> >
> > Not at all. Look at KDL where Frame etc are not in the RTT namespace.
> >
> >> I'm trying to send some standard OpenCV types through
> >> Orocos/Corba. They work fine in Orocos without using Corba, but as
> >> soon as I try to send them through Corba it throws an assert(). If I
> >> change the type used to an identical one but declared inside an RTT
> >> namespace (and don't change any of my code besides renaming the
> >> type),
> >> it works fine.
> >
> > Then we need to figure out what went wrong and how to fix it (submit
> > a bug
> > report).
>
> I can verify that moving one of my types from inside an RTT namespace
> to outside the namespace causes a seg-fault. I've hit this seg-fault
> quite a bit while debugging my recent work with CORBA toolkits, and
> got nowhere with it. It is buried deep down in TAO. It sucks down there.

You'll have to compile the RTT with the -g option to get more meaningful
backtraces (change CMAKE_BUILD_TYPE to Debug, or add -g -O0 to
CMAKE_CXX_FLAGS_RTT).

>
> Here's what I did: I had working opencv RTT and RTT/Corba toolkits
> that transmit images, 2D points, and vectors of 2D points between
> processes. I was using a 2D point structure that I had created, after
> I hit this same issue while trying to use OpenCV's native C version of
> this structure. I literally moved the namespace closing brace from
> after my 2D point and vector<2DPoint> classes, to before the classes.
> I then went through and fixed all associated namespace resolutions
> throughout the code. Did a clean build. Started the deployer and then
> when I start the GUI process ... Voila! ... seg-fault in deployer-
> corba-gnulinux! :-( I went back and reran the deployer and got it to
> dump the type repository for me.

The only relevant thing I know is that partial template specialisations do
need to happen in the same namespace and the only bug I could think of is that
a wrong specialisation is taken in the RTT code (when the type/data conversion
functions are chosen by the compiler). But since you're talking about a data-
type namespace, even this seems not to apply.

As you put it, it should be dead-easy to reproduce. Could you send over the
file with the crashing type classes ?

Peter

Seg fault in user-created Corba toolkit


>> Here's what I did: I had working opencv RTT and RTT/Corba toolkits
>> that transmit images, 2D points, and vectors of 2D points between
>> processes. I was using a 2D point structure that I had created, after
>> I hit this same issue while trying to use OpenCV's native C version of
>> this structure. I literally moved the namespace closing brace from
>> after my 2D point and vector<2DPoint> classes, to before the classes.
>> I then went through and fixed all associated namespace resolutions
>> throughout the code. Did a clean build. Started the deployer and then
>> when I start the GUI process ... Voila! ... seg-fault in
>> deployer-corba-gnulinux! :-( I went back and reran the deployer and
>> got it to dump the type repository for me.
>
> The only relevant thing I know is that partial template specialisations do
> need to happen in the same namespace and the only bug I could think of is that
> a wrong specialisation is taken in the RTT code (when the type/data conversion
> functions are chosen by the compiler). But since you're talking about a data-
> type namespace, even this seems not to apply.

I've been working on a second Corba toolkit, with a simpler set of data structures than the first one that had these problems. The second one showed the same problems, but I was able to analyze it a bit more. It is not a namespace issue, nor a problem with C "typedef struct" data structures. However, if I copy their "typedef struct" into one of my header files, things work. As soon as I use their header file instead, things fail. I also tried copying the contents of their header file that precede the typedef struct I need (includes, other structs) into my header file, and it still worked. Something very bizarre here ... I will keep at it and let you know ...

Cheers
S

Seg fault in user-created Corba toolkit

Need some more help with this one ... :-(

It throws an exception when it tries to narrow() as part of ControlTaskProxy (line 130, rtt/src/corba/ControlTaskProxy.cpp). I can't figure out why. The deployer knows that the type of Image is "RTT::IplImageWrapper", and this type is registered with the RTT type system and has a registered Corba transport.

Had to add .txt extensions to the .xml files to make the forum happy ...

Any help appreciated.
S


Seg fault in user-created Corba toolkit

Ok, 30 mins of playing Halo cleared my brain. The above is a transient startup race condition ... solved. On to the next issue ...
S

Seg fault in user-created Corba toolkit

On Monday 12 January 2009 14:18:21 S Roderick wrote:
>
> I've been working on a second Corba toolkit, a simpler set of data
> structures than the first one with these problems. The second one also
> had the same problems, but I was able to analyze it a bit more. It is
> not a namespace issue, nor a problem with C "typedef struct" data
> structures. However, if I copied their "typedef struct" into one of my
> header files things worked. As soon as I used their header file
> instead, things fail. I also tried copying the contents of their
> header file (includes, other struct's) prior to the typedef struct
> that I needed, into my header file, it still worked. Something very
> bizarre here ... I will keep at it and let you know ...

Would you dare to test Sylvain's omniORB port and see if that works instead of
TAO? It *should* only require recompiling the RTT.

Peter

Seg fault in user-created Corba toolkit

On Friday 12 December 2008 20:08:57 kiwi [dot] net [..] ... wrote:
> Ok, 30 mins of playing Halo cleared my brain. The above is a transient
> startup race condition ... solved. On to next issue ... S

What should we change in the RTT/deployer to avoid/detect such races ?

Peter