# Thursday, April 29, 2004

OK, usually I try to stay out of the whole MS/OSS thing, but I find this interesting enough to comment on.  Just to get this out of the way, I'm all in favor of OSS, and I think both Linux and Gnome are pretty cool.  I'm a dedicated Firefox user personally.  (I realize that sounds a lot like "some of my best friends are gay".)

That said, I find statements [via TSS.NET] like "Gnome and Mozilla need to align to counter [Longhorn]" to be pretty strange.  That sounds an awful lot like the statement of someone who sees Gnome/Mozilla/Linux as a commercial, for-profit enterprise.  Why else is "competition" with MS something to consider?  (One possible alternative is that the OSS community is driven by people who have way too much ego invested in OSS projects.)  I thought the whole "point" of OSS was that it was driven by a different set of concerns than commercial, for-profit software is.  Am I missing something?

Either the Gnome/Mozilla community is being run as if it were commercial, or it's being run by people who are driving the direction of the community for the sake of their own egos.  Neither is very attractive.

Thursday, April 29, 2004 11:57:55 AM (Pacific Daylight Time, UTC-07:00)
# Tuesday, April 27, 2004

As previously mentioned, Scott and I are working on some pretty cool stuff involving generating code from XML Schema, and then doing various bits of stuff to those generated objects.  Today what I'm pondering is not the code generation part per se, but the bits of stuff afterwards part. 

For example, if you want to take your generated class and serialize it as a string, there are (at least) three options:

  • Generate the serialization code as part of the code generation itself.  E.g. code gen the object's ToString() method at the same time as the rest of the code.
  • Insert some cues into the object at code gen time, in the form of custom attributes, then reflect over the object at runtime and decide how to serialize it.
  • Insert some cues into the object at code gen time, in the form of custom attributes, then reflect over the object once at runtime and dynamically build the serialization code, which can then be cached (a la the XmlSerializer).

The advantage to number one is that it's fairly easy to understand, the code is pretty easy to generate, and the resulting code is probably the most performant.  The disadvantages are that the code embedded in the code gen process is a bit harder to maintain, since it's "meta-code", and that if you want to support different serialization formats for the same object, you might have to generate quite a lot of code, which embeds a lot of behavior into the objects.  That may not be a bad thing, since the whole point is that you can always regenerate the objects if the serialization behavior needs to change.

The advantage to number two is that it is the most flexible.  The objects don't need to know anything at all about any of the formats they might be serialized into, with the exception of having to be marked up with any appropriate custom attributes.  You can add new serialization methods without having to touch the objects themselves.  The biggest drawback (I think) is the reliance on reflection.  It's hard for some people to understand the reflection model, which might make the code harder to maintain.  Also, there are theoretical performance issues with using reflection all the time (many of which would be solved by #3).  We've been running this code in production and it's performing fine, but that's no guarantee that it always will.  Scott is also a bit concerned that we may run into a situation where our code isn't granted reflection privileges, but in that case we'll have plenty of other problems to deal with too.  As long as our code can be fully trusted I don't think that's a big issue.
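To make number two a bit more concrete, here's a minimal sketch of the idea.  The SerializeAsAttribute, the Order class and the "label=value" output format are all made up for illustration; they are not the attributes or formats our real code uses.  The point is only that the object carries cues and the serializer does all the work through reflection on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical cue attribute; the name and its Label property are illustrative only.
[AttributeUsage(AttributeTargets.Property)]
public sealed class SerializeAsAttribute : Attribute
{
    public string Label { get; }
    public SerializeAsAttribute(string label) { Label = label; }
}

// Stand-in for a generated class: it carries cues but no serialization logic.
public class Order
{
    [SerializeAs("id")]
    public int Id { get; set; }

    [SerializeAs("customer")]
    public string Customer { get; set; }

    // No attribute, so the serializer skips this property.
    public string InternalNote { get; set; }
}

public static class ReflectingSerializer
{
    // Reflects over the object every time it is called (option two).
    public static string Serialize(object target)
    {
        var parts = new List<string>();
        foreach (PropertyInfo property in target.GetType().GetProperties())
        {
            var cue = property.GetCustomAttribute<SerializeAsAttribute>();
            if (cue == null) continue;
            parts.Add(cue.Label + "=" + property.GetValue(target));
        }
        // Sort so the output does not depend on reflection order.
        parts.Sort(StringComparer.Ordinal);
        return string.Join(";", parts);
    }
}
```

Calling `ReflectingSerializer.Serialize(new Order { Id = 42, Customer = "Acme", InternalNote = "x" })` gives `customer=Acme;id=42`.  Adding a second format means writing a second serializer class, and Order never changes.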

The advantage to number three is essentially that of number two, with the addition of extra performance at runtime.  We can dynamically generate the serialization code for each object, and cache the compiled assembly just like the XmlSerializer does.  The biggest disadvantages are that it still leaves open the reflection permission issue, and (first and foremost) it's a lot more work to code it that way.  Like #1, you end up writing meta-code, which takes an extra level of abstraction to think about.
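Here's a rough sketch of what number three could look like.  To keep it short it builds a delegate with expression trees rather than generating and compiling a whole assembly, so treat it as a lighter stand-in for the same "reflect once, then cache" idea.  FieldAttribute, Point and the output format are hypothetical, and the sketch assumes the marked properties never hold null.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical marker attribute used as the code-gen cue.
[AttributeUsage(AttributeTargets.Property)]
public sealed class FieldAttribute : Attribute { }

public class Point
{
    [Field] public int X { get; set; }
    [Field] public int Y { get; set; }
}

public static class CachedSerializer
{
    // One compiled delegate per type, built on first use.
    private static readonly ConcurrentDictionary<Type, Func<object, string>> Cache =
        new ConcurrentDictionary<Type, Func<object, string>>();

    public static string Serialize(object target)
    {
        return Cache.GetOrAdd(target.GetType(), type => Build(type))(target);
    }

    // Reflection happens only here; the result is ordinary compiled code.
    private static Func<object, string> Build(Type type)
    {
        ParameterExpression input = Expression.Parameter(typeof(object), "input");
        Expression typed = Expression.Convert(input, type);
        MethodInfo toString = typeof(object).GetMethod("ToString", Type.EmptyTypes);
        MethodInfo concat = typeof(string).GetMethod(
            "Concat", new[] { typeof(string), typeof(string) });

        Expression body = Expression.Constant(string.Empty);
        foreach (PropertyInfo property in type.GetProperties()
                     .Where(p => p.IsDefined(typeof(FieldAttribute), false))
                     .OrderBy(p => p.Name, StringComparer.Ordinal))
        {
            // Builds: body + "Name=" + ((T)input).Name.ToString() + ";"
            Expression value = Expression.Call(
                Expression.Convert(Expression.Property(typed, property), typeof(object)),
                toString);
            body = Expression.Call(concat, body, Expression.Constant(property.Name + "="));
            body = Expression.Call(concat, body, value);
            body = Expression.Call(concat, body, Expression.Constant(";"));
        }
        return Expression.Lambda<Func<object, string>>(body, input).Compile();
    }
}
```

`CachedSerializer.Serialize(new Point { X = 3, Y = 4 })` returns `X=3;Y=4;`, and every later call for a Point skips reflection entirely.  Even in a toy like this you can see the meta-code problem: Build is code that describes code, which is exactly the part that's hard to explain to people.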

Practically speaking, so far I find #1 easier to understand but harder to maintain, and #2 easier to maintain but much harder to explain to people. 

If anyone has any opinions or experiences I'd love to hear them.

Tuesday, April 27, 2004 10:39:53 AM (Pacific Daylight Time, UTC-07:00)
# Monday, April 19, 2004

My wife and I went to see Hellboy last night.  Humph.  I was disappointed.  The highlights were the art direction, which was very consistent and retro-cool, and Ron Perlman, who was a fabulous choice for Hellboy.

However, there was little to no character development in a script that really called for some.  Hellboy is supposed to be a much more tragic figure than he comes across as here, and the lack of characterization is why.  And John Hurt's professor was totally one-dimensional.

A pretty film, and worth a viewing, but I won't rush right out after the DVD when it comes out.

Monday, April 19, 2004 10:17:50 AM (Pacific Daylight Time, UTC-07:00)
# Saturday, April 17, 2004

I just saw the new Omnimax film that's playing at OMSI here in Portland.  What an amazing film.  I've seen a bunch of Omnimax films, and I think this was one of the more compelling.  It's composed of profiles of four people who go fast for a living: a sprinter, a mountain bike racer, a race car driver, and a race car designer.  All set to Mozart, and hosted by Tim Allen.  Very cool.  Lots of Omnimax-optimized footage of going fast.  Some of the coolest stuff was footage of the mountain bike racer, Marla Streb, who's been clocked at 67 mph, down a mountain on a bike.  That's pretty amazing. 

If it's playing anywhere near you, it's worth checking out.

Saturday, April 17, 2004 5:40:43 PM (Pacific Daylight Time, UTC-07:00)
# Friday, April 16, 2004

OK, I realize this is apropos of nothing, but I'm most of the way through The Clash's Clash On Broadway, and I've got to say it just doesn't get any better than this.   

What a great, groundbreaking band they were. 

Friday, April 16, 2004 4:09:14 PM (Pacific Daylight Time, UTC-07:00)

Now this is just plain annoying...

As I've mentioned before, I'm doing some ongoing work with code generation from XmlSchema files.  Developers mark up XmlSchema documents with some extra attributes in our namespace, and that influences how the code gets generated.  Think of it as xsd.exe, only this works.  :-)

So today a new problem was brought to my attention.  I read in schema files using the .NET XmlSchema classes.  OK, that works well.  For any element in the schema, you can ask its XmlSchemaDatatype what the corresponding .NET value type would be.  E.g. if you ask an element of type "xs:int", you get back System.Int32.  "xs:dateTime" maps to System.DateTime, etc.

When you want to serialize .NET objects using the XmlSerializer, you can add extra custom attributes to influence the behavior of the serializer.  So, if you have a property that returns a DateTime, but the schema type is "date" you need some help, since there's no underlying .NET Date type. 

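```csharp
using System;
using System.Xml.Serialization;

public class MyClass
{
    // Ask the serializer to apply the rules for xs:date rather than xs:dateTime.
    [XmlElementAttribute(DataType = "date")]
    public DateTime SomeDate { get; set; }
}
```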

The XmlElementAttribute tells the serializer to enforce the rules corresponding to the schema datatype "date" and not "dateTime".  Groovy so far.
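For anyone who hasn't seen it in action, here's a small self-contained sketch of that attribute at work.  The Appointment class and the DateDemo helper are made-up names for illustration; the XmlSerializer and XmlElementAttribute calls are the standard ones.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class with a DateTime property that should serialize as xs:date.
public class Appointment
{
    [XmlElement(DataType = "date")]
    public DateTime SomeDate { get; set; }
}

public static class DateDemo
{
    // Serializes the object to an XML string with the XmlSerializer.
    public static string Serialize(Appointment value)
    {
        var serializer = new XmlSerializer(typeof(Appointment));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }
}
```

Serializing `new Appointment { SomeDate = new DateTime(2004, 4, 16, 15, 39, 42) }` produces a document containing `<SomeDate>2004-04-16</SomeDate>`.  The time of day is dropped because the serializer is following the "date" rules.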

However, the XmlSerializer only supports certain schema types for certain CLR types.  Makes sense, as this

```csharp
[XmlElementAttribute(DataType = "date")]
public Int32 SomeDate { get; set; }
```

doesn't really make any intuitive sense. 

So now for the catch.  The CLR types that the schema reading classes (System.Xml.Schema) map to schema types don't in all cases match the CLR types that the XmlSerializer maps to schema types.  The schema reader says that "xs:integer" maps to System.Decimal.  OK, that's consistent with the XML Schema Part 2 spec.  Unfortunately, the XmlSerializer says that "xs:integer" must map to a System.String.  So does "xs:gYear", and several others.
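A quick way to see the two sides disagree is a probe along these lines.  It's a sketch, not what my generator does: Counter and MappingProbe are hypothetical names, and XmlSchemaType.GetBuiltInSimpleType is a shortcut from later versions of the framework that saves loading a schema file just to ask the question.

```csharp
using System;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical generated class that trusts the schema reader's answer for xs:integer.
public class Counter
{
    [XmlElement(DataType = "integer")]
    public decimal Total { get; set; }
}

public static class MappingProbe
{
    // The CLR type System.Xml.Schema associates with a built-in schema type.
    public static Type SchemaSideType(XmlTypeCode code)
    {
        return XmlSchemaType.GetBuiltInSimpleType(code).Datatype.ValueType;
    }

    // True if the XmlSerializer is willing to build a serializer for the type.
    public static bool SerializerAccepts(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return true;
        }
        catch (InvalidOperationException)
        {
            // Thrown when a DataType value doesn't fit the property's CLR type.
            return false;
        }
    }
}
```

`MappingProbe.SchemaSideType(XmlTypeCode.Integer)` comes back as System.Decimal, while `MappingProbe.SerializerAccepts(typeof(Counter))` should come back false, since the serializer wants a string there.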

The end result is that I can't rely on the XmlSchemaDatatype class to tell me what type to make the resulting property in my generated code.  Arrrrrggggghhhhh!!!!!!!

The two solutions are basically

  • tell people not to use integer, gYear, etc. (possibly problematic)
  • have my code embody the differences in type mapping between XmlSchema and XmlSerializer (lame)

I haven't delved, but I can only assume that xsd.exe uses the latter of the two. 
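For what it's worth, the second option doesn't have to be much code.  Here's a sketch of the shape it might take, assuming a hand-maintained override table.  PropertyTypeResolver is a made-up name, and the table only lists the two types I've actually named above; the others would have to be filled in from the XmlSerializer documentation rather than guessed.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Schema;

public static class PropertyTypeResolver
{
    // Schema types where the XmlSerializer insists on System.String.
    // Deliberately incomplete: only the types named in this post are listed.
    private static readonly Dictionary<XmlTypeCode, Type> SerializerOverrides =
        new Dictionary<XmlTypeCode, Type>
        {
            { XmlTypeCode.Integer, typeof(string) },
            { XmlTypeCode.GYear, typeof(string) },
        };

    // Picks the CLR type for a generated property: the override if there is one,
    // otherwise whatever the schema classes report.
    public static Type Resolve(XmlSchemaDatatype datatype)
    {
        Type overridden;
        return SerializerOverrides.TryGetValue(datatype.TypeCode, out overridden)
            ? overridden
            : datatype.ValueType;
    }
}
```

With this in place, an "xs:integer" element resolves to System.String while "xs:int" still resolves to System.Int32.  Still lame, but at least the lameness lives in one table.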

Work | XML
Friday, April 16, 2004 3:39:42 PM (Pacific Daylight Time, UTC-07:00)

[Update] The OPTOPlay completely rocks.  Driverless install, works right out of the box.  Absolutely no background noise, no interference from the disk, and it's got a little pre-amp, so the headphones sound great.  Woohoo!

[Update] I just ordered one of these.  We'll see if that makes a difference.

How hard can it be to put a little shielding on either a hard drive or a soundcard?  I've had 2-3 systems in a row (at work, so I can't complain too much) that don't have any decent shielding inside, so I hear the hard drive spinning over my headphones, which is annoying as all get-out.  On my current IBM box, it's not intrusive enough to keep me from listening, but on my last Compaq laptop, it totally was.  The headphones were completely useless due to all the interference from the drive.  In a laptop, OK, I'm willing to cut them some slack although it's still pretty lame.  On a desktop system it shouldn't be so hard. 

I wonder if they just assume that anyone serious enough to bother to use headphones will shell out for a soundcard and not use the crappy stuff on the motherboard.  I suppose you could also make the argument that a business PC doesn't need decent sound, but I (and others I know) spend large parts of the day wearing headphones while coding to cut down the din of living in cube-land.  And Rhapsody is just so dang handy.

It's enough to make a guy buy an external sound widget like the Extigy.  That would put things a good distance from any unshielded drives.

Friday, April 16, 2004 8:59:09 AM (Pacific Daylight Time, UTC-07:00)
# Monday, April 12, 2004

I just got asked a question at work that reminded me how many people still don't quite "get" XML.  And I still find it surprising.  We've now had a good 6-7 years of XML being fairly present, and 3-4 years of it being pretty ubiquitous.  And yet...

Once upon a time I wrote (and taught) a class on XML for a variety of customers, and when I think about the experience now, I think the hardest thing to get across to people is how to visualize the InfoSet.  It's not flat.  XML looks flat when you see it on a page (not flat in an ISAM way, there is hierarchy, but flat in a 2D kind of way), but the InfoSet isn't.  As soon as you introduce things like namespaces and ID/IDREFS, the InfoSet itself is really this big n-dimensional thing that's hard to get your head around.  If you look at the XPath spec, it should provide a big clue.  It talks about navigating InfoSets in terms of axes.  That's exactly what they are.  Namespaces are a separate axis.  They come right out of the screen at your face.  It's not flat.
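If you want to poke at the axes yourself, something like the following sketch works.  The tiny document, the urn:example namespace and the AxisDemo helper are all made up for the example; the XPath axis names are straight from the spec.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class AxisDemo
{
    // A deliberately tiny document; urn:example is a placeholder namespace.
    private const string Xml =
        "<root xmlns:p='urn:example'><a><p:b/><c/></a></root>";

    // Counts the nodes reached from the p:b element along the given axis.
    public static int Count(string axis)
    {
        var navigator = new XPathDocument(new StringReader(Xml)).CreateNavigator();
        var namespaces = new XmlNamespaceManager(navigator.NameTable);
        namespaces.AddNamespace("p", "urn:example");
        return navigator.Select("//p:b/" + axis, namespaces).Count;
    }
}
```

`AxisDemo.Count("ancestor::*")` is 2 (the a and root elements) and `AxisDemo.Count("following-sibling::*")` is 1 (the c element).  Then try `AxisDemo.Count("namespace::*")`: it walks the namespace nodes in scope for p:b, none of which appear as elements on the page.  Same starting node, a completely different direction of travel.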

And that's not even counting what happens when schema gets into the picture.  The shape of a "Post Schema Validation InfoSet" may have very little to do with what the XML looks like on the page.  That's why Binary XML shouldn't be scary to anyone.  It's just another way of serializing the InfoSet, not the XML.  Think about the thing that the XML becomes being serialized in a different manner, not the XML itself.  "Binary XML" in and of itself sounds pretty silly, since the whole point is that XML is text.  But "Binary Serialized PSVI" doesn't sound so silly, and may have some distinct advantages.

OK, in rereading that I realize it may make sense to nobody but me, but given the self-indulgent nature of blogging I don't really care.  :-)  If I can help just one person see the InfoSet and not the flat XML, I'll sleep that much better at night.

XML
Monday, April 12, 2004 3:54:49 PM (Pacific Daylight Time, UTC-07:00)

If you have any interest in builds or code maintenance (like I do) check out vil.  It's a code metrics tool that works directly on .NET assemblies.  The interesting thing (IMHO) about that approach is that when it gives metrics for lines of code, complexity, etc. that's all with regard to IL, and not the C# or VB.NET or whatever that you wrote in the first place. 

That makes sense when you consider all the stuff that gets compiled out, although with things like lines of code it may be misleading, particularly in C#, where keywords can expand out to quite a bit of code.  Most of the other metrics still make sense with regard to IL, and of course even lines of code is useful when used for comparison rather than absolutist purposes.

Monday, April 12, 2004 3:20:40 PM (Pacific Daylight Time, UTC-07:00)

Roy's released the latest version of The Regulator, which continues to build upon what has become my favorite regex tool.  You can now generate .NET code for your regex straight from the tool, and for those of us who have to work with someone else's regular expressions, there's an analyzer that explains the regex in English.  Pretty cool stuff. 

Thanks Roy.

Monday, April 12, 2004 2:53:00 PM (Pacific Daylight Time, UTC-07:00)