# Thursday, 30 June 2005

I finally solved the namespace issue I was having, although I’ll probably burn for all eternity for the solution.  In short, because of the behavior of XmlTextWriter, the only solution that could be implemented in a reasonable amount of time was to post-process the XML and strip out the extra namespace declarations. 

So I started down the path of using XmlTextReader to spin through and collect up all the namespaces that I needed, then add those to the root node.  After that I could use a regular expression to strip out all the unneeded ones.  Turns out I had overlooked the fact that the input isn’t guaranteed to be well-formed XML.  :-(

The “XML” is actually a template that our system uses to do some tag replacement.  So the output of that process is well-formed, but the input can contain the “@” character inside element names.  A no-no according to the XML spec. 

So here it is, the all-regular-expression solution.  I wouldn’t suggest you try this at home, but it does actually work, and seems to be quite fast (under a quarter of a second for a 1.5MB input, and the typical input is more like 10K). 

Note: this is made a little simpler because I know (since I just wrote out the “XML”) that all the namespace prefixes we care about start with ns, e.g. ns0, ns1, etc.

    #region begin hairy namespace rectifying code here
    // This is necessary because the XmlTextWriter puts in more namespace
    // declarations than we want, which causes file bloat.
    // Requires System.Collections, System.Text and System.Text.RegularExpressions.

    // Matches declarations such as xmlns:ns1="..." (\d+ also covers ns10 and up).
    Regex strip = new Regex(@"xmlns:ns\d+=""[^""]*""");
    ArrayList names = new ArrayList();
    MatchCollection matches = strip.Matches(result);
    foreach (Match match in matches)
    {
        string val = match.Value;
        // ns0 is expected to be declared on the root node already, so skip it;
        // keep one copy of each of the other declarations.
        if (!val.StartsWith("xmlns:ns0=") && !names.Contains(val))
            names.Add(val);
    }

    // Build the attribute text to add to the root node.
    StringBuilder sb = new StringBuilder();
    foreach (string name in names)
    {
        sb.AppendFormat(" {0}", name);
    }

    string fixedNamespaces = result;

    int pos = fixedNamespaces.IndexOf(">", 0);      // should be the end of the xml declaration
    pos = fixedNamespaces.IndexOf(">", pos + 1);    // should be the end of the root node
    fixedNamespaces = fixedNamespaces.Insert(pos, sb.ToString());

    // The insert moved the end of the root node, so find it again.
    pos = fixedNamespaces.IndexOf(">", 0);          // should be the end of the xml declaration
    pos = fixedNamespaces.IndexOf(">", pos + 1);    // should be the end of the root node

    // Strip every matching declaration that appears after the root node's start tag.
    result = strip.Replace(fixedNamespaces, "", -1, pos);
    #endregion

XML
Thursday, 30 June 2005 12:53:56 (Pacific Daylight Time, UTC-07:00)
# Friday, 24 June 2005

We’ve got some XML documents that are getting written out with way too many namespace declarations.  That probably wouldn’t be too much of a problem, except we then use those XML documents as templates to generate other documents, many with repetitive elements.  So we’re ending up with namespace bloat.  Scott and I found an example that was coming across the network at about 1.5MB.  That’s a lot.  A large part of that turned out to be namespace declarations.  Because of the way XmlTextWriter does namespace scoping, it writes a namespace declaration on the first element that uses the namespace, and that declaration is only in scope for that element and its children.  So for leaf nodes with a different namespace than their parent node, you end up with a namespace declaration on every element, like this…

<?xml version="1.0" encoding="UTF-8"?>
<ns0:RootNode xmlns:ns0="http://namespace/0">
    <ns1:FirstChild xmlns:ns1="http://namespace/1">
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
    </ns1:FirstChild>
</ns0:RootNode>

With our actual namespace strings, that’s like an additional 60 bytes per element that we don’t really need.  What we’d like to see is the namespaces declared once at the top of the file, then referenced elsewhere, like this…

<?xml version="1.0" encoding="UTF-8"?>
<ns0:RootNode xmlns:ns0="http://namespace/0" xmlns:ns1="http://namespace/1" xmlns:ns2="http://namespace/2">
    <ns1:FirstChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
    </ns1:FirstChild>
</ns0:RootNode>

When we edited the templates manually to achieve this effect, the 1.5MB document went to like 660KB.  Much better.

There doesn’t seem to be any way to get XmlTextWriter to do this, however.  Even if you explicitly write out the extra namespaces on the root element, you still get them everywhere, since the writer sees those as just attributes you chose to write, and not namespace declarations. 
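For anyone who wants to see the default behavior for themselves, here is a minimal sketch that writes a document shaped like the first example above.  The class name NamespaceBloatDemo and the method Write are made up for illustration; the XmlTextWriter calls are the standard WriteStartElement(prefix, localName, ns) overloads.  It only shows the repeated declarations, not a fix.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class: reproduces the repeated xmlns:ns2 declarations.
public static class NamespaceBloatDemo
{
    // Writes RootNode/FirstChild with childCount SecondChild leaves
    // and returns the resulting XML as a string.
    public static string Write(int childCount)
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        writer.WriteStartElement("ns0", "RootNode", "http://namespace/0");
        writer.WriteStartElement("ns1", "FirstChild", "http://namespace/1");

        for (int i = 0; i < childCount; i++)
        {
            // ns2 is not in scope on FirstChild, so the writer has to
            // declare it again on each sibling element.
            writer.WriteStartElement("ns2", "SecondChild", "http://namespace/2");
            writer.WriteString("Value");
            writer.WriteEndElement();
        }

        writer.WriteEndElement(); // FirstChild
        writer.WriteEndElement(); // RootNode
        writer.Close();

        return sw.ToString();
    }

    public static void Main()
    {
        // Each SecondChild in the output carries its own xmlns:ns2 attribute.
        Console.WriteLine(Write(3));
    }
}
```

Bumping the argument to Write gives a quick feel for how fast the declarations add up on documents with lots of repeated leaf elements.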

Curses!  I’ve spent all day on this and have no ideas.  Anyone have any input?

Work | XML
Friday, 24 June 2005 14:53:54 (Pacific Daylight Time, UTC-07:00)

Vikki and I went to see Batman Begins last weekend up in Seattle, and really enjoyed it.  I was pondering the phenomenon that is Batman during the movie, and started thinking that Batman has become such an iconic figure in our contemporary mythos that it really frees the director.  It’s like making a Robin Hood movie.  You don’t have to worry about telling the story, because everyone already knows the story.  So the director can focus on the details. 

Christian Bale was fantastic as the brooding playboy-without-conscience who beats up bad guys in his spare time.  He really brought a lot of detail to the character, and you can really start to understand what kind of guy Bruce Wayne would have to be to become Batman. 

Great supporting cast too.  Liam Neeson makes a great villain.  Good, atmospheric physical culture.  They did a good job of bringing the brooding Gothic/Art Deco style of Gotham into the modern age.  Definitely worth seeing. 

Friday, 24 June 2005 14:25:36 (Pacific Daylight Time, UTC-07:00)
# Wednesday, 15 June 2005
The New York Times has a very positive review (reg. required) of Batman Begins.  So maybe there is hope.  It’s amazing what you can do with a director who cares, and some actors who can really act.  I’ve been a fan of Christian Bale ever since he was “Falstaff’s Boy” in Henry V.  Maybe I’ll get a chance to see it this weekend…
Wednesday, 15 June 2005 15:03:35 (Pacific Daylight Time, UTC-07:00)
# Tuesday, 14 June 2005

I haven’t seen too many new films lately, but here’s a quick rundown on a few…

  • Hitchhiker’s Guide to the Galaxy: if you’ve never read the books or seen the BBC series, not a bad film.  If you are hoping that it will represent the genius of the book (or even the BBC series) forget it.  Yet another terrible adaptation to the screen.  But in its own right I thought it had its moments.  My biggest complaint was that they lost many of the great linguistic jokes Adams was so good at and replaced them with slapstick pie-in-the-face antics.  Still amusing, but nowhere near as satisfying.
  • Merchant of Venice: totally fell asleep.  The parts I did see seemed to be a bit overacted.  I’ll have to try to watch the whole thing and see what I think.
  • Elektra: in short, it blew.  Very disappointing.  I’m a big fan of Alias, so I was hoping for more.  The story line was so disjointed that it was hard to follow, but intrusive enough to spoil the martial bits.  Yuck!
  • A Series of Unfortunate Events: one of the better ones I’ve seen of late.  My kids really liked it too.  Funny both intellectually and in typical over-the-top Jim Carrey style.  He was fantastic, as were many of the character parts.  The children also did very well.  They did a great job of maintaining the ambiance.  Easily accessible for children, but plenty there for the grown-ups.
  • Angel (Season 3): I started re-watching Season 3, and I think this is the one where they were really firing on all cylinders.  They lost their way a bit in Season 4, but 3 was great.  All the characters had settled into their parts, the story arc was good, and hadn’t gotten too wacky.  Introduces some great bit characters, like Skip the Demon, who commutes to his job in Hell.  Good stuff.

I’d like to see Mr. & Mrs. Smith, as it seems to be getting some good reviews.  Maybe this weekend…

Tuesday, 14 June 2005 16:27:56 (Pacific Daylight Time, UTC-07:00)
The new 2.0 version of MaxiVista is out, and as usual it kicks complete a**.  It supports additional monitors, so I’ve got it running across 3 right now.  Plus, the new remote control feature is fantastic for times when you need to get at your other machines but don’t want to have to shuffle keyboards, etc.  If you are doing any debugging in VS.NET, you owe it to yourself to run on at least two monitors.  I’ve found it better than one big one.
Tuesday, 14 June 2005 16:15:01 (Pacific Daylight Time, UTC-07:00)
# Monday, 13 June 2005

Sigh.  It’s a constant battle.  I knew full well that XmlSerializer leaks temp assemblies if you don’t use the right constructor.  (The one that takes only a type will cache internally, so it’s not a problem.)  And then I went and wrote some code that called one of the other constructors without caching the resulting XmlSerializer instances. 

The result:  one process I looked at had over 1,500 instances of System.Reflection.Assembly on the heap.  Not so good. 

The fix?  Not as simple as I would have hoped.  The constructor that I’m using takes the Type to serialize, and an XmlRootAttribute instance.  It would be nice to be able to cache the serializers based on that XmlRootAttribute, since that’d be simple and all.  Unfortunately, two instances of an XmlRootAttribute with the same parameters return different values to GetHashCode(), so it’s not that easy.  I ended up using a string key compounded from the type’s full name and the parameters I’m using on XmlRootAttribute.  Not the most beautiful, but it’ll work.  Better than having 1,500 temp assemblies hanging around.
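
A minimal sketch of a cache along those lines, for illustration only.  The class name SerializerCache, the MakeKey and Get methods, and the exact key format are all assumptions (the real key just needs to cover whichever XmlRootAttribute properties are actually in use); only the XmlSerializer(Type, XmlRootAttribute) constructor comes from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper: hands back one XmlSerializer per (type, root) combination.
public static class SerializerCache
{
    private static readonly Dictionary<string, XmlSerializer> cache =
        new Dictionary<string, XmlSerializer>();
    private static readonly object sync = new object();

    // Assumed key format: the type's full name plus the root attribute
    // values that vary (here just ElementName and Namespace).
    public static string MakeKey(Type type, XmlRootAttribute root)
    {
        return type.FullName + "|" + root.ElementName + "|" + root.Namespace;
    }

    public static XmlSerializer Get(Type type, XmlRootAttribute root)
    {
        string key = MakeKey(type, root);
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(key, out serializer))
            {
                // This constructor generates a new temp assembly on every call,
                // so it should only run once per distinct key.
                serializer = new XmlSerializer(type, root);
                cache[key] = serializer;
            }
            return serializer;
        }
    }

    public static void Main()
    {
        XmlSerializer first = Get(typeof(string), new XmlRootAttribute("Item"));
        XmlSerializer second = Get(typeof(string), new XmlRootAttribute("Item"));
        // Two separate but equivalent XmlRootAttribute instances map to the same key.
        Console.WriteLine(object.ReferenceEquals(first, second));
    }
}
```

The string key sidesteps the GetHashCode() problem because two equivalent XmlRootAttribute instances produce the same string, even though they hash differently as objects.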

Work | XML
Monday, 13 June 2005 16:09:22 (Pacific Daylight Time, UTC-07:00)