# Tuesday, April 27, 2004

As previously mentioned, Scott and I are working on some pretty cool stuff involving generating code from XML Schema, and then doing various bits of stuff to those generated objects.  Today what I'm pondering is not the code generation part per se, but the bits of stuff afterwards part. 

For example, if you want to take your generated class and serialize it as a string, there are (at least) three options:

  • Generate the serialization code as part of the code generation itself.  E.g. code gen the objects .ToString() method at the same time as the rest of the code.
  • Insert some cues into the object at code gen time, in the form of custom attributes, then reflect over the object at runtime and decide how to serialize it.
  • Insert some cues into the object at code gen time, in the form of custom attributes, then reflect over the object once at runtime and dynamically build the serialization code, which can then be cached (a la the XmlSerializer).

The advantage to number one is that it's fairly easy to understand, the code is pretty easy to generate, and the resulting code is probably the most performant.  The disadvantages are that the code embedded in the code gen process is a bit harder to maintain, since it's "meta-code", and that if you want to support different serialization formats for the same object, you might have to generate quite a lot of code, which embeds a lot of behavior into the objects.  That may not be a bad thing, since the whole point is that you can always regenerate the objects if the serialization behavior needs to change.

The advantage to number two is that it is the most flexible.  The objects don't need to know anything at all about any of the formats they might be serialized into, with the exception of having to be marked up with any appropriate custom attributes.  You can add new serialization methods without having to touch the objects themselves.  The biggest drawback (I think) is the reliance on reflection.  It's hard for some people to understand the reflection model, which might make the code harder to maintain.  Also, there are theoretical performance issues with using reflection all the time (many of which would be solved by #3) although we've been running this code in production and it's performing fine, but that's no guarantee that it always will.  Scott is also a bit concerned that we may run into a situation where our code isn't granted reflection privileges, but in that case we'll have plenty of other problems to deal with too.  As long as our code can be fully trusted I don't think that's a big issue.

The advantage to number three is essentially that of number two, with the addition of extra performance at runtime.  We can dynamically generate the serialization code for each object, and cache the compiled assembly just like the XmlSerializer does.  The biggest disadvantages are that it still leaves open the reflection permission isse, and (first and foremost) it's a lot more work to code it that way.  Like #1, you end up writing meta-code which takes that extra level of abstraction in thinking about.

Practically speaking, so far I find #1 easier to understand but harder to maintain, and #2 easier to maintain but much harder to explain to people. 

If anyone has any opinions or experiences I'd love to hear them.