[Bug 765] New: Compilation time regression in 2.0-mainline

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=765

Summary: Compilation time regression in 2.0-mainline
Product: RTT
Version: 2.0-mainline
Platform: AMD 64bit
OS/Version: All
Status: NEW
Severity: minor
Priority: P3
Component: Real-Time Toolkit (RTT)
AssignedTo: orocos-dev [..] ...
ReportedBy: peter [..] ...
CC: orocos-dev [..] ...
Estimated Hours: 0.0

I've investigated the compilation time regression in 2.0. The reason is that
too much code is emitted by the compiler due to an explosion of
C++ templated classes. For example, when compiling the typekits, it takes the
assembler (!) more than 30s to convert the asm file to an .o file, on a very
fast machine.

There are multiple reasons why so much code is emitted when type info
structures (like 'TemplateTypeInfo') are created:
* We build in/out ports for registered types, so all port-related code is
emitted for that type, including data objects, buffers, connections etc.
* The method/operation API uses more classes, which end up in 'data sources', i.e.
an encapsulating structure that exposes them to scripting, CORBA etc.
* For each DataSource<T>, the DataSource<[const] T[&]> variants are emitted as
well (so 4 variants), and the same happens to 'derivatives' of DataSource,
which are a bunch of helper classes that also go x4 (a minimal sketch follows
this list). This allows users to write functions like [const] R [&] foo( [const] A [&] a)
and still use them in scripting or over CORBA.
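
To make the last bullet concrete, here is a minimal, self-contained sketch of the
multiplication effect; the class names are illustrative stand-ins, not RTT's
actual DataSource hierarchy:

// Illustration only: generic stand-ins for the RTT data-source helpers.
template <class T> struct data_source { T get(); };                            // read access
template <class T> struct assignable_source : data_source<T> { void set(T); }; // write access

struct A {};

int main() {
    // Registering one user type can pull in all qualification variants of every helper:
    data_source<A>         by_value;
    data_source<const A&>  by_const_ref;
    assignable_source<A>   writable;
    assignable_source<A&>  writable_ref;
    (void)by_value; (void)by_const_ref; (void)writable; (void)writable_ref;
    return 0;   // N helper templates x 4 variants, emitted again for every type
}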

These three points combined allow for very powerful introspection, but also a
combinatorial(!) explosion of generated code. The remedies are few:

1) Play with 'extern template' declarations for each type T (see the sketch
after this list). This will reduce the amount of generated code in user code
(i.e. components), but won't stop the emission when building the typekits.
2) Hide internal data structures behind a void* API, so that fewer template
classes are needed. This has the disadvantage that you can't copy void* data
(no access to the copy constructor), only pass the pointer, which limits its
applicability.
3) Rework the data source API such that the [const] T [&] variants are no longer
needed. This is basically changing the return type of DataSource<T>::get() to
const T& instead of T.
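
As a minimal, self-contained sketch of the 'extern template' mechanism from 1),
using a hypothetical my_typeinfo template instead of the real RTT classes (one
caveat: an explicit instantiation instantiates *all* members of the class, which
may not compile for every user type):

// --- my_typeinfo.hpp (shared header) ---
template <class T>
struct my_typeinfo {
    const char* name() const { return "unknown"; }
};
extern template struct my_typeinfo<int>;   // includers must not emit this instantiation

// --- my_typekit.cpp (compiled once, into the typekit library) ---
template struct my_typeinfo<int>;          // the single explicit instantiation

// --- component.cpp (user code) ---
int user_code() {
    my_typeinfo<int> ti;                   // reuses the typekit's instantiation,
    return ti.name()[0];                   // nothing re-emitted in this translation unit
}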

Even with these changes, it's now clear that the Method/Operation API produces
more code than the Method/Command API of 1.x. It's more powerful, at a price. I
think this change was the 'tipping point' that caused the slow-downs to get
annoying.

I believe only remedy 3) will relieve some of the slow-downs on the typekit
side, and probably 1) is necessary to speed up component compilation times /
reduce code sizes.
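
A rough sketch of what remedy 3) amounts to, again with illustrative classes
rather than the real DataSource hierarchy:

// Before: returning by value forces separate <const T&> data-source variants elsewhere.
template <class T>
struct data_source_by_value {
    T get() const;
};

// After: handing out a const reference lets a single variant cover all read-only uses.
template <class T>
struct data_source_by_ref {
    const T& get() const;
};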

If you want to investigate this yourself, just compile (-c) this piece of code:

cat > ttinfo.cpp << EOF
#include <rtt/types/TemplateTypeInfo.hpp>
#include <rtt/types/Types.hpp>
 
struct A {};
using namespace RTT::types;
 
void bar() {
  Types()->addType( new TemplateTypeInfo<A>("A") );
}
EOF
g++ -I/opt/2.0/include -DOROCOS_TARGET=gnulinux   -c -o ttinfo.o ttinfo.cpp
nm -C ttinfo.o | grep RTT | wc -l
4917
ls -lh ttinfo.o
-rw-r--r-- 1 kaltan kaltan 3.5M 2010-05-11 12:35 ttinfo.o
strip ttinfo.o
ls -lh ttinfo.o
-rw-r--r-- 1 kaltan kaltan 1.7M 2010-05-11 12:41 ttinfo.o
nm -C ttinfo.o | grep RTT | less -S

That's 4917 functions generated for 1 user type (in 463 classes). This is
without serialization or transport support, but with scripting support. All the
numbers above (except the class count) are roughly halved if you enable -O3.
It's still huge.

I wonder what should get priority now. I'm very close to the serialization code
generation, and tackling the above issue first would delay that by another week.
On the other hand, I find myself more and more applying tricks (disabling tests
or functionality) to speed up compilation time when iterating.

Peter

[Bug 765] Compilation time regression in 2.0-mainline

https://www.fmtc.be/bugzilla/orocos/show_bug.cgi?id=765

Peter Soetens <peter [..] ...> changed:

What             |Removed                  |Added
----------------------------------------------------------------------------
Target Milestone |---                      |next-major
Component        |Real-Time Toolkit (RTT)  |RTT
Version          |2.0-mainline             |unspecified
Product          |RTT                      |Toolchain

--- Comment #1 from Peter Soetens <peter [..] ...> 2010-12-21 15:37:36 ---
This regression is still an issue on the 2.x line. Migrating this bug to
Toolchain/RTT.

[Bug 765] New: Compilation time regression in 2.0-mainline

Peter Soetens wrote:
> These three points combined allow for very powerful introspection, but also a
> combinatory(!) explosion of generated code. The remedies are few:
>
> 1) play with 'extern template' definitions for each type T. This will reduce
> amount of generated code in user code (ie components), but won't stop the
> emission when building the typekits
> 2) hide internal data structures behind a void* api, such that less templated
> classes are needed. This has the disadvantage that you can't copy void* data
> (no access to copy constructor), only pass the pointer, which limits its
> applicability.
> 3) Rework the datasource API such that the [const] T [&] variants are no longer
> needed. This is basically changing the return type of DataSource<T>::get() to
> const& T instead of T.
>
> Even with these changes, it's now clear that the Method/Operation API produces
> more code than the Method/Command API of 1.x. It's more powerful, at a price. I
> think this change was the 'tipping point' that caused the slow-downs to get
> annoying.
>
For the record, oroGen already uses "extern template" to reduce code size and
compilation time. I had to make it emit the templates in per-type files (i.e.
there is one file that emits all the needed templates for one type). This has
multiple advantages: each file requires less code, and -jX speeds up
compilation.

All in all, maybe we should investigate how to reduce the amount of
templating. I know that templating is cool, but it has a lot of
drawbacks as well (obviously).
> That's 4917 functions generated for 1 user type (in 463 classes). This is
> without the serialization or transport but with scripting support. All the
> numbers (except class count) above go in half if you enable -O3. It's still
> huge.
>
> I wonder which priority should be set first now. I'm very close to the
> serialization code generation now, tackling the above issue first would delay
> that again with a week. On the other hand, I find myself more and more applying
> tricks (disable tests or functionality) to speed up compilation time when
> iterating.
>
I'll play the devil's advocate, but maybe introducing yet another
templating engine (boost serialization) is not really the best thing
ever, since we already have this problem, no ?

Moreover, I just realized something:

template<class Archive>
void serialize(Archive& a, Mystruct& m) {
    a & boost::serialization::make_nvp("a", m.a);
    a & boost::serialization::make_nvp("b", m.b);
    a & boost::serialization::make_nvp("c", m.c);
    a & boost::serialization::make_nvp("stamps", boost::serialization::make_array(m.stamps, 10));
}

*is* a template !

It means that, if we want to reuse the serialize methods across typekits
(and we want to!), we will have to put all the serialize methods in
headers and recompile them for every typekit. *Really* not nice. Or not
reuse them, but that stinks a bit.

As I stated in the original "new serialization" thread, code generators
don't mind generating code. I liked the new serialization idea, but I'm
starting to think that we should weigh the pros and cons of it. As I see
it right now:

(+) hand-writing typekits is much easier
(+) common serialization API (no need to have code generators generate
typekit serialization for every transport)

(-) amount of generated code (serialization functions are templates)
(-) code generators don't mind generating code

Peter: you also stated that it would integrate nicely in scripting.
Could you expand on that one a little bit more ?

[Bug 765] New: Compilation time regression in 2.0-mainline

On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:

> Peter Soetens wrote:
>
>> These three points combined allow for very powerful introspection, but
>> also a
>> combinatory(!) explosion of generated code. The remedies are few:
>>
>> 1) play with 'extern template' definitions for each type T. This will
>> reduce
>> amount of generated code in user code (ie components), but won't stop the
>> emission when building the typekits
>> 2) hide internal data structures behind a void* api, such that less
>> templated
>> classes are needed. This has the disadvantage that you can't copy void*
>> data
>> (no access to copy constructor), only pass the pointer, which limits its
>> applicability.
>> 3) Rework the datasource API such that the [const] T [&] variants are no
>> longer
>> needed. This is basically changing the return type of DataSource<T>::get()
>> to
>> const& T instead of T.
>>
>> Even with these changes, it's now clear that the Method/Operation API
>> produces
>> more code than the Method/Command API of 1.x. It's more powerful, at a
>> price. I
>> think this change was the 'tipping point' that caused the slow-downs to
>> get
>> annoying.
>>
>>
> For the record, oroGen already uses "extern template" to reduce code and
> compilation size. I had to make it emit the bunch of templates on a per-type
> file (i.e. there is one file that emits all the needed templates for one
> type). It has multiple advantages: each file requires less code, and -jX
> speeds up compilation.
>
> All and all, maybe we should investigate how to reduce the amount of
> templating. I know that templating is cool, but it has a lot of drawbacks as
> well (obviously).

I'd like to use templates as 'code generators', but only for type-specific
stuff, i.e. stuff for which a real code generator would generate code too
anyway. Basically, this boils down to putting as much as possible in
template-less base classes and writing infrastructure code (like transports)
that is type independent.

The new data flow code actually violated this rule, since the channels and
factories are type specific, while only the port and the storage should be.
We're suffering from the rule that user data must be usable even when it's not
in the type system. The only solutions to that are dropping that 'freedom' OR
using 'extern template' and then linking with the typekit, as oroGen does.
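
To sketch that rule with made-up names (not the actual RTT data-flow classes):
all the behaviour that transports and connection management need lives in a
template-less base, and the per-type layer only carries the storage:

#include <iostream>

// Type-independent part: infrastructure (transports, connection handling) is
// written once against this interface, so it emits no per-type code.
struct channel_base {
    virtual bool signal() = 0;          // type-independent wake-up of the reader
    virtual ~channel_base() {}
};

// Per-type layer: only the storage depends on T.
template <class T>
struct typed_channel : channel_base {
    T storage;
    virtual bool signal() { return true; }
};

// Compiled exactly once, for all types:
void poke(channel_base& c) {
    if (c.signal())
        std::cout << "data available" << std::endl;
}

int main() {
    typed_channel<int> c;
    c.storage = 42;
    poke(c);
    return 0;
}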

>
> That's 4917 functions generated for 1 user type (in 463 classes). This is
>> without the serialization or transport but with scripting support. All the
>> numbers (except class count) above go in half if you enable -O3. It's
>> still
>> huge.
>> I wonder which priority should be set first now. I'm very close to the
>> serialization code generation now, tackling the above issue first would
>> delay
>> that again with a week. On the other hand, I find myself more and more
>> applying
>> tricks (disable tests or functionality) to speed up compilation time when
>> iterating.
>>
>>
> I'll play the devil's advocate, but maybe introducing yet another
> templating engine (boost serialization) is not really the best thing ever,
> since we already have this problem, no ?
>
> Moreover, I just realized something:
>
> template<class Archive>
> void serialize(Archive& a, Mystruct& m) {
> a & BOOST_NVP("a",m.a);
> a & BOOST_NVP("b",m.b);
> a & BOOST_NVP("c",m.c);
> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
> }
>
> *is* a template !
>
> It means that, if we want to reuse the serialize methods across typekit
> (and we want to !), we will have to get all the serialize methods in headers
> and recompile them for every typekit. *Really* not nice. Or not reuse them,
> but that stinks a bit.
>

The default typekit can offer this functionality to all other extensions
such as transports. Serialization to/from binary is now done in the MQueue
transport, but we could migrate it to the typekit. MQueue is a good example
of a pain point: we need to compile an mqueue-type transport for every user
type, i.e. a lot of duplication; what we want is that mqueue should just be
able to get the data in binary form from the typekit itself. If we solve
this, the boost::serialization template is only used by the typekit, so not
much duplication there anymore. But I 100% share your concerns...
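
The interface below is a hypothetical shape of that idea (made-up names, not the
actual RTT TypeInfo API): the transport only ever sees a type-erased blob, so the
boost::serialization template gets instantiated in the typekit and nowhere else.

#include <cstddef>
#include <cstring>
#include <vector>

// Type-erased serialization hook a typekit could implement once per type.
struct binary_type_info {
    virtual bool toBlob(const void* sample, std::vector<char>& blob) const = 0;
    virtual bool fromBlob(void* sample, const std::vector<char>& blob) const = 0;
    virtual ~binary_type_info() {}
};

// Trivial implementation for a fixed-size POD type; a real typekit would invoke
// the type's boost::serialization serialize() here instead of memcpy.
struct pod_type_info : binary_type_info {
    std::size_t size;
    explicit pod_type_info(std::size_t s) : size(s) {}
    virtual bool toBlob(const void* sample, std::vector<char>& blob) const {
        blob.resize(size);
        std::memcpy(&blob[0], sample, size);
        return true;
    }
    virtual bool fromBlob(void* sample, const std::vector<char>& blob) const {
        if (blob.size() != size) return false;
        std::memcpy(sample, &blob[0], size);
        return true;
    }
};

// An mqueue-like transport, written once against the interface only:
bool send_sample(const binary_type_info& ti, const void* sample) {
    std::vector<char> blob;
    if (!ti.toBlob(sample, blob)) return false;
    // mq_send(queue, &blob[0], blob.size(), 0);   // actual transmission elided
    return true;
}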

>
> As I stated in the original "new serialization" thread, code generators
> don't mind generating code. I liked the new serialization idea, but I start
> to think that we should weight the pros and cons of it. As I see it right
> now:
>
> (+) hand-writing typekits is much easier
> (+) common serialization API (no need to have code generators generate
> typekit serialization for every transport)
>
> (-) amount of generated code (serialization functions are templates)
> (-) code generators don't mind generating code
>
> Peter: you also stated that it would integrate nicely in scripting. Could
> you expand on that one a little bit more ?
>

Well, I just submitted a patch on the ros-dev mailing list (non-public, but
see https://code.ros.org/trac/ros/ticket/2544) to generate this
boost::serialization code for each ROS type. I also tested it with the ROS
point cloud, which has 1,000,000 3D points in it. Clearly, the point cloud is
purely defined as a ROS MessageFormat, yet adding it to RTT would just be
this[*]:

#include <ros/PointCloud.hpp>
#include <rtt/types/StructTypeInfo.hpp>
...

// note: also register sub-structs of the PointCloud struct
// (http://pr.willowgarage.com/wiki/MessageFormats/PointCloud):
RTT::types::Types()->addType( new RTT::types::StructTypeInfo<Points32>("Points32") );
RTT::types::Types()->addType( new RTT::types::StructTypeInfo<ChannelFloat32>("ChannelFloat32") );
RTT::types::Types()->addType( new RTT::types::StructTypeInfo<PointCloud>("PointCloud") );

Which allows this code in scripting (random code...):

var PointCloud pc1, pc2;
pc1.width = pc1.height = 1000;
this.fillCloud( pc1 );
pc2 = pc1;
for( var int i=0; i != pc1.pts.size(); i = i+1) {
    pc1.pts[i] = pc2.pts[i];
}

That's because the new typekit code knows a struct or sequence and its parts,
and the parts of those parts, all because of the serialize() function. It's not
perfect yet; sequences of sequences are not yet properly handled. But that's a
matter of integration; most of the required code is already written.

This also allows serialization to XML (or any other format) to be handled. The
properties' compose/decompose functions now rely on a generic, template-less
function that uses the same type information as scripting does.

So in short, the rtt::mqueue transport, rtt::scripting and rtt::properties
rely on this single function, or to put it another way: one typekit
plugin should be able to provide all that is needed to get these three
working for that type. For CORBA, I see no easy solution to make it number
four, unless you start sending sequences of anys or binary blobs over the
wire, both not very popular on this list, and a pita for users using the
CORBA IDL to talk to Orocos components...

Peter

[*] Actually, PointCloud is the worst example I could have taken, since it's the
only ROS type where the MessageFormat mapping is not followed and the
serializers are hand-coded to allow a more efficient C representation, all for
the sake of performance. Ugly. UglyUglyUgly.

[Bug 765] New: Compilation time regression in 2.0-mainline

Peter Soetens wrote:
> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> These three points combined allow for very powerful
> introspection, but also a
> combinatory(!) explosion of generated code. The remedies are few:
>
> 1) play with 'extern template' definitions for each type T.
> This will reduce
> amount of generated code in user code (ie components), but
> won't stop the
> emission when building the typekits
> 2) hide internal data structures behind a void* api, such that
> less templated
> classes are needed. This has the disadvantage that you can't
> copy void* data
> (no access to copy constructor), only pass the pointer, which
> limits its
> applicability.
> 3) Rework the datasource API such that the [const] T [&]
> variants are no longer
> needed. This is basically changing the return type of
> DataSource<T>::get() to
> const& T instead of T.
>
> Even with these changes, it's now clear that the
> Method/Operation API produces
> more code than the Method/Command API of 1.x. It's more
> powerful, at a price. I
> think this change was the 'tipping point' that caused the
> slow-downs to get
> annoying.
>
>
> For the record, oroGen already uses "extern template" to reduce
> code and compilation size. I had to make it emit the bunch of
> templates on a per-type file (i.e. there is one file that emits
> all the needed templates for one type). It has multiple
> advantages: each file requires less code, and -jX speeds up
> compilation.
>
> All and all, maybe we should investigate how to reduce the amount
> of templating. I know that templating is cool, but it has a lot of
> drawbacks as well (obviously).
>
>
> I'd like to use templates as 'code generators' but only for type
> specific stuff, ie, for stuff for which a real code generator would
> generate code too anyway. Basically, this boils down to putting as
> much as possible in template-less base classes and write
> infrastructure code (like transports) that is type independent.
>
> The new data flow code actually violated this rule, since channels,
> factories and channels are type specific, while only the port and the
> storage should be. We're suffering from the rule that user data must
> be usable, even when it's not in the type system. The only solution to
> that are dropping that 'freedom' OR using external templates and then
> linking with the typekit, as orogen does.
Maybe we should consider dropping that.

>
> That's 4917 functions generated for 1 user type (in 463
> classes). This is
> without the serialization or transport but with scripting
> support. All the
> numbers (except class count) above go in half if you enable
> -O3. It's still
> huge.
> I wonder which priority should be set first now. I'm very
> close to the serialization code generation now, tackling the
> above issue first would delay
> that again with a week. On the other hand, I find myself more
> and more applying
> tricks (disable tests or functionality) to speed up
> compilation time when
> iterating.
>
>
> I'll play the devil's advocate, but maybe introducing yet another
> templating engine (boost serialization) is not really the best
> thing ever, since we already have this problem, no ?
>
> Moreover, I just realized something:
>
> template<class Archive>
> void serialize(Archive& a, Mystruct& m) {
> a & BOOST_NVP("a",m.a);
> a & BOOST_NVP("b",m.b);
> a & BOOST_NVP("c",m.c);
> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
> }
>
> *is* a template !
>
> It means that, if we want to reuse the serialize methods across
> typekit (and we want to !), we will have to get all the serialize
> methods in headers and recompile them for every typekit. *Really*
> not nice. Or not reuse them, but that stinks a bit.
>
>
> The default typekit can offer this functionality to all other
> extensions such as transports. Serialization to/from binary is now
> done in the MQueue transport, but we could migrate it to the typekit.
> Mqueue is a good example of a pain point: we need to compile an mqueue
> type transport for every user type, ie a lot of duplication, what we
> want is that mqueue should just be able to get the data in a binary
> form from the typekit itself. If we solve this, the
> boost::serialization template is only used by the typekit, so not much
> duplication there anymore. But I 100% share your concerns...
Not true.

The issue is that we combine types. I.e. one typekit will have to generate the
serialize method for, let's say, an image. Then another typekit will want to
transmit two frames for a stereo camera. Ideally, the second typekit should be
able to reuse the serialize methods from the first typekit.

I know that it sounds like I'm pushing for my own solution, but I think
you should consider using typelib. Typelib is *not* template-based *at
all*: it gets an in-memory representation of the types it manipulates and
*dynamically* manipulates it. There is no representation needed at
compile time.

[Bug 765] New: Compilation time regression in 2.0-mainline

On Fri, May 14, 2010 at 15:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:

> Peter Soetens wrote:
>
> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>>
>> Peter Soetens wrote:
>>
>> These three points combined allow for very powerful
>> introspection, but also a
>> combinatory(!) explosion of generated code. The remedies are few:
>>
>> 1) play with 'extern template' definitions for each type T.
>> This will reduce
>> amount of generated code in user code (ie components), but
>> won't stop the
>> emission when building the typekits
>> 2) hide internal data structures behind a void* api, such that
>> less templated
>> classes are needed. This has the disadvantage that you can't
>> copy void* data
>> (no access to copy constructor), only pass the pointer, which
>> limits its
>> applicability.
>> 3) Rework the datasource API such that the [const] T [&]
>> variants are no longer
>> needed. This is basically changing the return type of
>> DataSource<T>::get() to
>> const& T instead of T.
>>
>> Even with these changes, it's now clear that the
>> Method/Operation API produces
>> more code than the Method/Command API of 1.x. It's more
>> powerful, at a price. I
>> think this change was the 'tipping point' that caused the
>> slow-downs to get
>> annoying.
>>
>> For the record, oroGen already uses "extern template" to reduce
>> code and compilation size. I had to make it emit the bunch of
>> templates on a per-type file (i.e. there is one file that emits
>> all the needed templates for one type). It has multiple
>> advantages: each file requires less code, and -jX speeds up
>> compilation.
>>
>> All and all, maybe we should investigate how to reduce the amount
>> of templating. I know that templating is cool, but it has a lot of
>> drawbacks as well (obviously).
>>
>>
>> I'd like to use templates as 'code generators' but only for type specific
>> stuff, ie, for stuff for which a real code generator would generate code too
>> anyway. Basically, this boils down to putting as much as possible in
>> template-less base classes and write infrastructure code (like transports)
>> that is type independent.
>>
>> The new data flow code actually violated this rule, since channels,
>> factories and channels are type specific, while only the port and the
>> storage should be. We're suffering from the rule that user data must be
>> usable, even when it's not in the type system. The only solution to that are
>> dropping that 'freedom' OR using external templates and then linking with
>> the typekit, as orogen does.
>>
> Maybe we should consider dropping that.
>
>
>
>> That's 4917 functions generated for 1 user type (in 463
>> classes). This is
>> without the serialization or transport but with scripting
>> support. All the
>> numbers (except class count) above go in half if you enable
>> -O3. It's still
>> huge.
>> I wonder which priority should be set first now. I'm very
>> close to the serialization code generation now, tackling the
>> above issue first would delay
>> that again with a week. On the other hand, I find myself more
>> and more applying
>> tricks (disable tests or functionality) to speed up
>> compilation time when
>> iterating.
>>
>> I'll play the devil's advocate, but maybe introducing yet another
>> templating engine (boost serialization) is not really the best
>> thing ever, since we already have this problem, no ?
>>
>> Moreover, I just realized something:
>>
>> template<class Archive>
>> void serialize(Archive& a, Mystruct& m) {
>> a & BOOST_NVP("a",m.a);
>> a & BOOST_NVP("b",m.b);
>> a & BOOST_NVP("c",m.c);
>> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
>> }
>>
>> *is* a template !
>>
>> It means that, if we want to reuse the serialize methods across
>> typekit (and we want to !), we will have to get all the serialize
>> methods in headers and recompile them for every typekit. *Really*
>> not nice. Or not reuse them, but that stinks a bit.
>>
>>
>> The default typekit can offer this functionality to all other extensions
>> such as transports. Serialization to/from binary is now done in the MQueue
>> transport, but we could migrate it to the typekit. Mqueue is a good example
>> of a pain point: we need to compile an mqueue type transport for every user
>> type, ie a lot of duplication, what we want is that mqueue should just be
>> able to get the data in a binary form from the typekit itself. If we solve
>> this, the boost::serialization template is only used by the typekit, so not
>> much duplication there anymore. But I 100% share your concerns...
>>
> Not true.
>
> The issue is that we combine types. I.e. one will have *in one typekit* to
> generate the serialize method for, let's say, an image. Then, in another
> toolkit, it will desire to transmit two frames for a stereocamera. Ideally,
> the second typekit should be able to reuse the serialize methods from the
> first typekit.
>

Well, that's what the serialize() function certainly does: it reuses the
serialize() methods of each contained type. It's also how the typekits
work, i.e. that's why we had to register both Points32 and ChannelFloat32 in
the example. On the other hand, on the ros-dev mailing list it was said that
the best performance was obtained when the whole type's serialization was
inlined as much as possible, which is what they achieved on the 1.1
development branch.

>
> I know that it sounds like I'm pushing for my own solution, but I think you
> should consider using typelib. Typelib is *not* template-based *at all*: it
> gets a in-memory representation of the types it manipulates and
> *dynamically* manipulates it. There is no representation needed at compile
> time.

Typelib is certainly more advanced than the type system in RTT, but the only
reason it doesn't use templates is that it relies on a code generator to emit
that introspection code. Typelib knows that code will be emitted in a single
compilation unit; the C++ template system we have today doesn't (Visual Studio
has something like that, though), so it emits code in every compilation unit,
with the known consequences.

It will certainly be a topic at the rtt-dev meeting.

Peter

[Bug 765] New: Compilation time regression in 2.0-mainline

Peter Soetens wrote:
> On Fri, May 14, 2010 at 15:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> These three points combined allow for very powerful
> introspection, but also a
> combinatory(!) explosion of generated code. The
> remedies are few:
>
> 1) play with 'extern template' definitions for each type T.
> This will reduce
> amount of generated code in user code (ie components), but
> won't stop the
> emission when building the typekits
> 2) hide internal data structures behind a void* api,
> such that
> less templated
> classes are needed. This has the disadvantage that you
> can't
> copy void* data
> (no access to copy constructor), only pass the pointer,
> which
> limits its
> applicability.
> 3) Rework the datasource API such that the [const] T [&]
> variants are no longer
> needed. This is basically changing the return type of
> DataSource<T>::get() to
> const& T instead of T.
>
> Even with these changes, it's now clear that the
> Method/Operation API produces
> more code than the Method/Command API of 1.x. It's more
> powerful, at a price. I
> think this change was the 'tipping point' that caused the
> slow-downs to get
> annoying.
>
> For the record, oroGen already uses "extern template" to reduce
> code and compilation size. I had to make it emit the bunch of
> templates on a per-type file (i.e. there is one file that emits
> all the needed templates for one type). It has multiple
> advantages: each file requires less code, and -jX speeds up
> compilation.
>
> All and all, maybe we should investigate how to reduce the
> amount
> of templating. I know that templating is cool, but it has a
> lot of
> drawbacks as well (obviously).
>
>
> I'd like to use templates as 'code generators' but only for
> type specific stuff, ie, for stuff for which a real code
> generator would generate code too anyway. Basically, this
> boils down to putting as much as possible in template-less
> base classes and write infrastructure code (like transports)
> that is type independent.
>
> The new data flow code actually violated this rule, since
> channels, factories and channels are type specific, while only
> the port and the storage should be. We're suffering from the
> rule that user data must be usable, even when it's not in the
> type system. The only solution to that are dropping that
> 'freedom' OR using external templates and then linking with
> the typekit, as orogen does.
>
> Maybe we should consider dropping that.
>
>
>
> That's 4917 functions generated for 1 user type (in 463
> classes). This is
> without the serialization or transport but with scripting
> support. All the
> numbers (except class count) above go in half if you enable
> -O3. It's still
> huge.
> I wonder which priority should be set first now. I'm very
> close to the serialization code generation now,
> tackling the
> above issue first would delay
> that again with a week. On the other hand, I find
> myself more
> and more applying
> tricks (disable tests or functionality) to speed up
> compilation time when
> iterating.
>
> I'll play the devil's advocate, but maybe introducing yet
> another
> templating engine (boost serialization) is not really the best
> thing ever, since we already have this problem, no ?
>
> Moreover, I just realized something:
>
> template<class Archive>
> void serialize(Archive& a, Mystruct& m) {
> a & BOOST_NVP("a",m.a);
> a & BOOST_NVP("b",m.b);
> a & BOOST_NVP("c",m.c);
> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
> }
>
> *is* a template !
>
> It means that, if we want to reuse the serialize methods across
> typekit (and we want to !), we will have to get all the
> serialize
> methods in headers and recompile them for every typekit.
> *Really*
> not nice. Or not reuse them, but that stinks a bit.
>
>
> The default typekit can offer this functionality to all other
> extensions such as transports. Serialization to/from binary is
> now done in the MQueue transport, but we could migrate it to
> the typekit. Mqueue is a good example of a pain point: we need
> to compile an mqueue type transport for every user type, ie a
> lot of duplication, what we want is that mqueue should just be
> able to get the data in a binary form from the typekit itself.
> If we solve this, the boost::serialization template is only
> used by the typekit, so not much duplication there anymore.
> But I 100% share your concerns...
>
> Not true.
>
> The issue is that we combine types. I.e. one will have *in one
> typekit* to generate the serialize method for, let's say, an
> image. Then, in another toolkit, it will desire to transmit two
> frames for a stereocamera. Ideally, the second typekit should be
> able to reuse the serialize methods from the first typekit.
>
>
> well, that's what the serialize() function certainly does, it reuses
> the serialize() methods from each contained type. It's also how the
> typekits work, ie, that's why we had to register both Points32 and
> ChannelFloat32 in the example. On the other hand, on the ros-dev
> mailinglist, it was said that the best performance was obtained when
> the whole type serialization was inlined as much as possible, which is
> what they obtained on the 1.1 development branch.
What I was talking about is reusing serialize<> methods *across
typekits*. The only way to do that would be to have all serialize
methods available at compile time, which will probably lead to a code
explosion.

As for efficiency: my POV is that, when serializing small types, it does not
really matter (they are very fast to serialize anyway). If you serialize big
types, most of the performance cost will be concentrated in one big "thing"
(array, vector, data structure), which is serialized by one serialization
method anyway.

I.e. I am a bit skeptical about the "best performance" claim above. Did
they give you real numbers ?

[Bug 765] New: Compilation time regression in 2.0-mainline

Peter Soetens wrote:
> On Fri, May 14, 2010 at 15:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> These three points combined allow for very powerful
> introspection, but also a
> combinatory(!) explosion of generated code. The
> remedies are few:
>
> 1) play with 'extern template' definitions for each type T.
> This will reduce
> amount of generated code in user code (ie components), but
> won't stop the
> emission when building the typekits
> 2) hide internal data structures behind a void* api,
> such that
> less templated
> classes are needed. This has the disadvantage that you
> can't
> copy void* data
> (no access to copy constructor), only pass the pointer,
> which
> limits its
> applicability.
> 3) Rework the datasource API such that the [const] T [&]
> variants are no longer
> needed. This is basically changing the return type of
> DataSource<T>::get() to
> const& T instead of T.
>
> Even with these changes, it's now clear that the
> Method/Operation API produces
> more code than the Method/Command API of 1.x. It's more
> powerful, at a price. I
> think this change was the 'tipping point' that caused the
> slow-downs to get
> annoying.
>
> For the record, oroGen already uses "extern template" to reduce
> code and compilation size. I had to make it emit the bunch of
> templates on a per-type file (i.e. there is one file that emits
> all the needed templates for one type). It has multiple
> advantages: each file requires less code, and -jX speeds up
> compilation.
>
> All and all, maybe we should investigate how to reduce the
> amount
> of templating. I know that templating is cool, but it has a
> lot of
> drawbacks as well (obviously).
>
>
> I'd like to use templates as 'code generators' but only for
> type specific stuff, ie, for stuff for which a real code
> generator would generate code too anyway. Basically, this
> boils down to putting as much as possible in template-less
> base classes and write infrastructure code (like transports)
> that is type independent.
>
> The new data flow code actually violated this rule, since
> channels, factories and channels are type specific, while only
> the port and the storage should be. We're suffering from the
> rule that user data must be usable, even when it's not in the
> type system. The only solution to that are dropping that
> 'freedom' OR using external templates and then linking with
> the typekit, as orogen does.
>
> Maybe we should consider dropping that.
>
>
>
> That's 4917 functions generated for 1 user type (in 463
> classes). This is
> without the serialization or transport but with scripting
> support. All the
> numbers (except class count) above go in half if you enable
> -O3. It's still
> huge.
> I wonder which priority should be set first now. I'm very
> close to the serialization code generation now,
> tackling the
> above issue first would delay
> that again with a week. On the other hand, I find
> myself more
> and more applying
> tricks (disable tests or functionality) to speed up
> compilation time when
> iterating.
>
> I'll play the devil's advocate, but maybe introducing yet
> another
> templating engine (boost serialization) is not really the best
> thing ever, since we already have this problem, no ?
>
> Moreover, I just realized something:
>
> template<class Archive>
> void serialize(Archive& a, Mystruct& m) {
> a & BOOST_NVP("a",m.a);
> a & BOOST_NVP("b",m.b);
> a & BOOST_NVP("c",m.c);
> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
> }
>
> *is* a template !
>
> It means that, if we want to reuse the serialize methods across
> typekit (and we want to !), we will have to get all the
> serialize
> methods in headers and recompile them for every typekit.
> *Really*
> not nice. Or not reuse them, but that stinks a bit.
>
>
> The default typekit can offer this functionality to all other
> extensions such as transports. Serialization to/from binary is
> now done in the MQueue transport, but we could migrate it to
> the typekit. Mqueue is a good example of a pain point: we need
> to compile an mqueue type transport for every user type, ie a
> lot of duplication, what we want is that mqueue should just be
> able to get the data in a binary form from the typekit itself.
> If we solve this, the boost::serialization template is only
> used by the typekit, so not much duplication there anymore.
> But I 100% share your concerns...
>
> Not true.
>
> The issue is that we combine types. I.e. one will have *in one
> typekit* to generate the serialize method for, let's say, an
> image. Then, in another toolkit, it will desire to transmit two
> frames for a stereocamera. Ideally, the second typekit should be
> able to reuse the serialize methods from the first typekit.
>
>
> well, that's what the serialize() function certainly does, it reuses
> the serialize() methods from each contained type. It's also how the
> typekits work, ie, that's why we had to register both Points32 and
> ChannelFloat32 in the example. On the other hand, on the ros-dev
> mailinglist, it was said that the best performance was obtained when
> the whole type serialization was inlined as much as possible, which is
> what they obtained on the 1.1 development branch.
>
>
>
> I know that it sounds like I'm pushing for my own solution, but I
> think you should consider using typelib. Typelib is *not*
> template-based *at all*: it gets a in-memory representation of the
> types it manipulates and *dynamically* manipulates it. There is no
> representation needed at compile time.
>
>
> Typelib is certainly more advanced than the type system in RTT, but
> the only reason it doesn't use templates is because it relies on a
> code generator to emit that introspection code. Typelib knows that
> will be emitted in one compilation unit, the C++ template system we
> have today doesn't (Visual Studio has something like that though), so
> it emits code in every cu, with the known consequences.
No it does not.

Typelib is purely dynamic. You can load C type definitions (plus some
other stuff) *dynamically* and without any code generation. I originally
wrote typelib to interface *dynamically* with Genom components (at LAAS)
without requiring another code generation engine (which was the way to
go at LAAS).

The only thing for which typelib would require code generation is
writing C++ containers such as std::map and std::set (reading them would
probably be OK, though).

As for the Ruby/C++ discussion: typelib is *all* C++; there is only a
Ruby binding that makes its use easier. oroGen uses that binding to
access Typelib's functionality.

The only functionality that I plan to have only on the Ruby side is
gccxml import. But that would be to translate it into typelib's own XML
format so that the C++ side can read it.

[Bug 765] New: Compilation time regression in 2.0-mainline

On May 14, 2010, at 12:27 , Peter Soetens wrote:

> On Fri, May 14, 2010 at 15:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
> Peter Soetens wrote:
>
> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>
> Peter Soetens wrote:
>
> These three points combined allow for very powerful
> introspection, but also a
> combinatory(!) explosion of generated code. The remedies are few:
>
> 1) play with 'extern template' definitions for each type T.
> This will reduce
> amount of generated code in user code (ie components), but
> won't stop the
> emission when building the typekits
> 2) hide internal data structures behind a void* api, such that
> less templated
> classes are needed. This has the disadvantage that you can't
> copy void* data
> (no access to copy constructor), only pass the pointer, which
> limits its
> applicability.
> 3) Rework the datasource API such that the [const] T [&]
> variants are no longer
> needed. This is basically changing the return type of
> DataSource<T>::get() to
> const& T instead of T.
>
> Even with these changes, it's now clear that the
> Method/Operation API produces
> more code than the Method/Command API of 1.x. It's more
> powerful, at a price. I
> think this change was the 'tipping point' that caused the
> slow-downs to get
> annoying.
>
> For the record, oroGen already uses "extern template" to reduce
> code and compilation size. I had to make it emit the bunch of
> templates on a per-type file (i.e. there is one file that emits
> all the needed templates for one type). It has multiple
> advantages: each file requires less code, and -jX speeds up
> compilation.
>
> All and all, maybe we should investigate how to reduce the amount
> of templating. I know that templating is cool, but it has a lot of
> drawbacks as well (obviously).
>
>
> I'd like to use templates as 'code generators' but only for type specific stuff, ie, for stuff for which a real code generator would generate code too anyway. Basically, this boils down to putting as much as possible in template-less base classes and write infrastructure code (like transports) that is type independent.
>
> The new data flow code actually violated this rule, since channels, factories and channels are type specific, while only the port and the storage should be. We're suffering from the rule that user data must be usable, even when it's not in the type system. The only solution to that are dropping that 'freedom' OR using external templates and then linking with the typekit, as orogen does.
> Maybe we should consider dropping that.

Better not, not without providing some extremely simple system to get user types across the wire. I suspect that you will get a user revolt otherwise ...

>
> That's 4917 functions generated for 1 user type (in 463
> classes). This is
> without the serialization or transport but with scripting
> support. All the
> numbers (except class count) above go in half if you enable
> -O3. It's still
> huge.
> I wonder which priority should be set first now. I'm very
> close to the serialization code generation now, tackling the
> above issue first would delay
> that again with a week. On the other hand, I find myself more
> and more applying
> tricks (disable tests or functionality) to speed up
> compilation time when
> iterating.
>
> I'll play the devil's advocate, but maybe introducing yet another
> templating engine (boost serialization) is not really the best
> thing ever, since we already have this problem, no ?
>
> Moreover, I just realized something:
>
> template<class Archive>
> void serialize(Archive& a, Mystruct& m) {
> a & BOOST_NVP("a",m.a);
> a & BOOST_NVP("b",m.b);
> a & BOOST_NVP("c",m.c);
> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
> }
>
> *is* a template !
>
> It means that, if we want to reuse the serialize methods across
> typekit (and we want to !), we will have to get all the serialize
> methods in headers and recompile them for every typekit. *Really*
> not nice. Or not reuse them, but that stinks a bit.
>
>
> The default typekit can offer this functionality to all other extensions such as transports. Serialization to/from binary is now done in the MQueue transport, but we could migrate it to the typekit. Mqueue is a good example of a pain point: we need to compile an mqueue type transport for every user type, ie a lot of duplication, what we want is that mqueue should just be able to get the data in a binary form from the typekit itself. If we solve this, the boost::serialization template is only used by the typekit, so not much duplication there anymore. But I 100% share your concerns...
> Not true.
>
> The issue is that we combine types. I.e. one will have *in one typekit* to generate the serialize method for, let's say, an image. Then, in another toolkit, it will desire to transmit two frames for a stereocamera. Ideally, the second typekit should be able to reuse the serialize methods from the first typekit.
>
> well, that's what the serialize() function certainly does, it reuses the serialize() methods from each contained type. It's also how the typekits work, ie, that's why we had to register both Points32 and ChannelFloat32 in the example. On the other hand, on the ros-dev mailinglist, it was said that the best performance was obtained when the whole type serialization was inlined as much as possible, which is what they obtained on the 1.1 development branch.

Performance isn't everything. A super fast product is useless if it doesn't do the job, or is too damn hard to use.

>
> I know that it sounds like I'm pushing for my own solution, but I think you should consider using typelib. Typelib is *not* template-based *at all*: it gets a in-memory representation of the types it manipulates and *dynamically* manipulates it. There is no representation needed at compile time.
>
> Typelib is certainly more advanced than the type system in RTT, but the only reason it doesn't use templates is because it relies on a code generator to emit that introspection code. Typelib knows that will be emitted in one compilation unit, the C++ template system we have today doesn't (Visual Studio has something like that though), so it emits code in every cu, with the known consequences.
>
> It will be certainly a topic at the rtt-dev meeting.

IIRC, typelib et al use Ruby. That for me is a strike against it, simply because my team members don't need yet another language to learn.

Only one of my several projects is truly CPU and/or RAM limited, so extra code (generated or template-related) isn't a huge deal for us. Yes, it is a pain, but we will live with it for an easily used tool.

YMMV
Stephen

[Bug 765] New: Compilation time regression in 2.0-mainline

S Roderick wrote:
> On May 14, 2010, at 12:27 , Peter Soetens wrote:
>
>> On Fri, May 14, 2010 at 15:27, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>>
>> Peter Soetens wrote:
>>
>> On Fri, May 14, 2010 at 10:10, Sylvain Joyeux <sylvain [dot] joyeux [..] ...> wrote:
>>
>> Peter Soetens wrote:
>>
>> These three points combined allow for very powerful
>> introspection, but also a
>> combinatory(!) explosion of generated code. The
>> remedies are few:
>>
>> 1) play with 'extern template' definitions for each
>> type T.
>> This will reduce
>> amount of generated code in user code (ie components), but
>> won't stop the
>> emission when building the typekits
>> 2) hide internal data structures behind a void* api,
>> such that
>> less templated
>> classes are needed. This has the disadvantage that you
>> can't
>> copy void* data
>> (no access to copy constructor), only pass the
>> pointer, which
>> limits its
>> applicability.
>> 3) Rework the datasource API such that the [const] T [&]
>> variants are no longer
>> needed. This is basically changing the return type of
>> DataSource<T>::get() to
>> const& T instead of T.
>>
>> Even with these changes, it's now clear that the
>> Method/Operation API produces
>> more code than the Method/Command API of 1.x. It's more
>> powerful, at a price. I
>> think this change was the 'tipping point' that caused the
>> slow-downs to get
>> annoying.
>>
>> For the record, oroGen already uses "extern template" to
>> reduce
>> code and compilation size. I had to make it emit the bunch of
>> templates on a per-type file (i.e. there is one file that
>> emits
>> all the needed templates for one type). It has multiple
>> advantages: each file requires less code, and -jX speeds up
>> compilation.
>>
>> All and all, maybe we should investigate how to reduce the
>> amount
>> of templating. I know that templating is cool, but it has
>> a lot of
>> drawbacks as well (obviously).
>>
>>
>> I'd like to use templates as 'code generators' but only for
>> type specific stuff, ie, for stuff for which a real code
>> generator would generate code too anyway. Basically, this
>> boils down to putting as much as possible in template-less
>> base classes and write infrastructure code (like transports)
>> that is type independent.
>>
>> The new data flow code actually violated this rule, since
>> channels, factories and channels are type specific, while
>> only the port and the storage should be. We're suffering from
>> the rule that user data must be usable, even when it's not in
>> the type system. The only solution to that are dropping that
>> 'freedom' OR using external templates and then linking with
>> the typekit, as orogen does.
>>
>> Maybe we should consider dropping that.
>>
>
> Better not, not without providing some extremely simple system to get
> user types across the wire. I suspect that you will get a user revolt
> otherwise ...
>
>>
>> That's 4917 functions generated for 1 user type (in 463
>> classes). This is
>> without the serialization or transport but with scripting
>> support. All the
>> numbers (except class count) above go in half if you
>> enable
>> -O3. It's still
>> huge.
>> I wonder which priority should be set first now. I'm very
>> close to the serialization code generation now,
>> tackling the
>> above issue first would delay
>> that again with a week. On the other hand, I find
>> myself more
>> and more applying
>> tricks (disable tests or functionality) to speed up
>> compilation time when
>> iterating.
>>
>> I'll play the devil's advocate, but maybe introducing yet
>> another
>> templating engine (boost serialization) is not really the best
>> thing ever, since we already have this problem, no ?
>>
>> Moreover, I just realized something:
>>
>> template<class Archive>
>> void serialize(Archive& a, Mystruct& m) {
>> a & BOOST_NVP("a",m.a);
>> a & BOOST_NVP("b",m.b);
>> a & BOOST_NVP("c",m.c);
>> a & BOOST_NVP("stamps", boost::make_array(m.stamps,10) );
>> }
>>
>> *is* a template !
>>
>> It means that, if we want to reuse the serialize methods
>> across
>> typekit (and we want to !), we will have to get all the
>> serialize
>> methods in headers and recompile them for every typekit.
>> *Really*
>> not nice. Or not reuse them, but that stinks a bit.
>>
>>
>> The default typekit can offer this functionality to all other
>> extensions such as transports. Serialization to/from binary
>> is now done in the MQueue transport, but we could migrate it
>> to the typekit. Mqueue is a good example of a pain point: we
>> need to compile an mqueue type transport for every user type,
>> ie a lot of duplication, what we want is that mqueue should
>> just be able to get the data in a binary form from the
>> typekit itself. If we solve this, the boost::serialization
>> template is only used by the typekit, so not much duplication
>> there anymore. But I 100% share your concerns...
>>
>> Not true.
>>
>> The issue is that we combine types. I.e. one will have *in one
>> typekit* to generate the serialize method for, let's say, an
>> image. Then, in another toolkit, it will desire to transmit two
>> frames for a stereocamera. Ideally, the second typekit should be
>> able to reuse the serialize methods from the first typekit.
>>
>>
>> well, that's what the serialize() function certainly does, it reuses
>> the serialize() methods from each contained type. It's also how the
>> typekits work, ie, that's why we had to register both Points32 and
>> ChannelFloat32 in the example. On the other hand, on the ros-dev
>> mailinglist, it was said that the best performance was obtained when
>> the whole type serialization was inlined as much as possible, which
>> is what they obtained on the 1.1 development branch.
>
> Performance isn't everything. A super fast product is useless if it
> doesn't do the job, or is too damn hard to use.
>
>>
>> I know that it sounds like I'm pushing for my own solution, but
>> I think you should consider using typelib. Typelib is *not*
>> template-based *at all*: it gets a in-memory representation of
>> the types it manipulates and *dynamically* manipulates it. There
>> is no representation needed at compile time.
>>
>>
>> Typelib is certainly more advanced than the type system in RTT, but
>> the only reason it doesn't use templates is because it relies on a
>> code generator to emit that introspection code. Typelib knows that
>> will be emitted in one compilation unit, the C++ template system we
>> have today doesn't (Visual Studio has something like that though), so
>> it emits code in every cu, with the known consequences.
>>
>> It will be certainly a topic at the rtt-dev meeting.
>
> IIRC, typelib et al use Ruby. That for me is a strike against it,
> simply because my team members don't need yet another language to learn.
Typelib is a C++ library; the Ruby access is only a binding giving
access to that functionality.

> YMMV
My M varies, as I push my team members to learn new languages when that
gives them desired functionality (yes, people in my team use both Ruby
and Python).

Sylvain Joyeux (Dr. Ing.)
Researcher - Space and Security Robotics
DFKI Robotics Innovation Center
Bremen, Robert-Hooke-Straße 5, 28359 Bremen, Germany

Phone: +49 421 218-64136
Fax: +49 421 218-64150
Email: sylvain [dot] joyeux [..] ...

Weitere Informationen: http://www.dfki.de