Monday, January 23, 2006

I’ve been playing with the January CTP of WCF, and I’ve encountered what seems like a pretty major setback.  I’ve got an interface that takes a MessageContract and returns a MessageContract.  All well and good.  But then I want to use the AsyncPattern on the service side, so that my routine will get called on a different thread from the one that’s listening on the network.  So I decorate the interface like so:

    [ServiceContract()]
    public interface IThingy
    {
        [OperationContract(AsyncPattern=true,Action="Signon",IsInitiating=true)]
        IAsyncResult BeginSignon(ThingyRequestMessage<SignOnRequest> request, AsyncCallback cb, object state);

        [OperationContract]
        ThingyResponseMessage<SignOnResponse> EndSignon(IAsyncResult ar);
    }

Now I get an exception at runtime, which says that I can’t mix parameters and messages for the method “EndSignon”.  What it means is that if I return a MessageContract instead of a primitive type, my method has to take a MessageContract and not one or more primitive types.  OK, I get that.  But my EndSignon method is getting flagged because it takes an IAsyncResult (as it must according to the AsyncPattern) and returns a ThingyResponseMessage. 

Does this mean I can’t use MessageContracts with the AsyncPattern?  If so, LAME.  If not, then what am I missing?

Monday, January 23, 2006 11:05:23 PM (Pacific Standard Time, UTC-08:00)
 Tuesday, November 01, 2005

I’ve been working on a project that required me to turn some CLR types into a set of XML Schema element definitions so that they can be included in another file.  It stumped me for a while, and I envisioned having to reflect over all my types and build schema myself, which would be a total drag. 

Then I remembered that this is exactly what xsd.exe does.  Thank the heavens for Reflector!  It turns out to be really simple, just undocumented…

            XmlReflectionImporter importer1 = new XmlReflectionImporter();
            XmlSchemas schemas = new XmlSchemas();
            XmlSchemaExporter exporter1 = new XmlSchemaExporter(schemas);
            Type type = typeof(MyTypeToConvert);
            XmlTypeMapping map = importer1.ImportTypeMapping(type);
            exporter1.ExportTypeMapping(map);

It’s that easy!  The XmlSchemaExporter will do all the right things, and you can do this with a bunch of types in a loop, then check your XmlSchemas collection.  It will contain one XmlSchema per namespace, with all the right types, just as if you’d run xsd.exe over your assembly.
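
To make the "bunch of types in a loop" part concrete, here is a minimal sketch.  Customer and Invoice are made-up stand-ins for whatever types you need to convert, and SchemaDumper is a hypothetical helper name; the importer and exporter calls are the same ones used above.  It writes each schema in the collection out as text so you can check what you got.

```csharp
using System;
using System.IO;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical sample types, for illustration only.
public class Customer { public string Name; }
public class Invoice { public int Number; public Customer BillTo; }

public static class SchemaDumper
{
    // Exports every type through one importer/exporter pair, then returns
    // the text of each schema that ended up in the XmlSchemas collection.
    public static string Export(params Type[] types)
    {
        XmlReflectionImporter importer = new XmlReflectionImporter();
        XmlSchemas schemas = new XmlSchemas();
        XmlSchemaExporter exporter = new XmlSchemaExporter(schemas);

        foreach (Type type in types)
        {
            exporter.ExportTypeMapping(importer.ImportTypeMapping(type));
        }

        StringWriter output = new StringWriter();
        foreach (XmlSchema schema in schemas)
        {
            schema.Write(output);
            output.WriteLine();
        }
        return output.ToString();
    }
}
```

Calling SchemaDumper.Export(typeof(Customer), typeof(Invoice)) should give back a single schema here, since neither sample type declares a namespace.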

Even better, if there’s stuff in your CLR types that isn’t quite right, you can use XmlAttributeOverrides just like you can with the XmlSerializer.  So if you want to exclude a property called “IgnoreMe” from your MyTypeToConvert type…

            // Create the XmlAttributeOverrides and XmlAttributes objects.
            XmlAttributeOverrides xOver = new XmlAttributeOverrides();
            XmlAttributes attrs = new XmlAttributes();

            /* Use XmlIgnore to instruct the importer to ignore
               the IgnoreMe prop  */
            attrs.XmlIgnore = true;
            xOver.Add(typeof(MyTypeToConvert), "IgnoreMe", attrs);

            // Hand the overrides to the importer; the rest is the same as before.
            XmlReflectionImporter importer1 = new XmlReflectionImporter(xOver);
            XmlSchemas schemas = new XmlSchemas();
            XmlSchemaExporter exporter1 = new XmlSchemaExporter(schemas);
            Type type = typeof(MyTypeToConvert);
            XmlTypeMapping map = importer1.ImportTypeMapping(type);
            exporter1.ExportTypeMapping(map);

That’ll get rid of the IgnoreMe element in the final schema.  It took a bit of Reflectoring, but this saves me a ton of time.

Wednesday, November 02, 2005 12:32:41 AM (Pacific Daylight Time, UTC-07:00)
 Friday, July 08, 2005
I’ll be teaching Introduction to Web Services (CST 407) at OIT (Portland) this Fall.  Tell your friends!  We’ll be covering the basics of Web Services, including theory, history, best practices, and a firm grounding in underlying technologies like XML and SOAP.  Should be a good time.  If you are interested you should be prepared to write code in C#.
Friday, July 08, 2005 11:21:38 PM (Pacific Daylight Time, UTC-07:00)
 Thursday, June 30, 2005

I finally solved the namespace issue I was having, although I’ll probably burn for all eternity for the solution.  In short, because of the behavior of XmlTextWriter, the only solution that could be implemented in a reasonable amount of time was to post-process the XML and strip out the extra namespace declarations. 

So I started down the path of using XmlTextReader to spin through and collect up all the namespaces that I needed, then add those to the root node.  After that I could use a regular expression to strip out all the unneeded ones.  Turns out I had overlooked the fact that the input isn’t guaranteed to be well-formed XML.  :-(

The “XML” is actually a template that our system uses to do some tag replacement.  So the output of that process is well-formed, but the input can contain the “@” character inside element names.  A no-no according to the XML spec. 

So here it is, the all-regular-expression solution.  I wouldn’t suggest you try this at home, but it does actually work, and seems to be quite fast (under a quarter of a second for a 1.5 MB input, and the typical input is more like 10 KB). 

Note: this is made a little simpler because I know (since I just wrote out the “XML”) that all the namespace prefixes we care about start with ns, e.g. ns0, ns1, etc.

                    #region begin hairy namespace rectifying code here
                    //this is necessary because the XmlTextWriter puts in more namespace
                    //declarations than we want, which causes file bloat.

                    //\d+ so that multi-digit prefixes (ns10, ns11, ...) match too
                    Regex strip = new Regex(@"xmlns:ns\d+=""[^""]*""");
                    ArrayList names = new ArrayList();
                    foreach(Match match in strip.Matches(result))
                    {
                        //ns0 is already declared on the root node; collect each other declaration once
                        string val = match.Value;
                        if(!val.StartsWith("xmlns:ns0=") && !names.Contains(val))
                            names.Add(val);
                    }

                    StringBuilder sb = new StringBuilder();
                    foreach(string name in names)
                    {
                        sb.AppendFormat(" {0}",name);
                    }

                    string fixedNamespaces = result;

                    int pos = fixedNamespaces.IndexOf(">",0);//should be the end of the xml declaration
                    pos = fixedNamespaces.IndexOf(">",pos+1);//should be end of root node.

                    //hoist the collected declarations onto the root node
                    fixedNamespaces = fixedNamespaces.Insert(pos,sb.ToString());

                    //the root node just got longer, so find its end again...
                    pos = fixedNamespaces.IndexOf(">",0);//should be the end of the xml declaration
                    pos = fixedNamespaces.IndexOf(">",pos+1);//should be end of root node.
                    //...and strip every declaration that comes after it
                    result = strip.Replace(fixedNamespaces, "", -1, pos);

                    #endregion

Thursday, June 30, 2005 7:53:56 PM (Pacific Daylight Time, UTC-07:00)
 Friday, June 24, 2005

We’ve got some XML documents that are getting written out with way too many namespace declarations.  That probably wouldn’t be too much of a problem, except we then use those XML documents as templates to generate other documents, many with repetitive elements.  So we’re ending up with namespace bloat.  Scott and I found an example that was coming across the network at about 1.5 MB.  That’s a lot.  A large part of that turned out to be namespace declarations.  Because of the way XmlTextWriter does namespace scoping, it doesn’t write out a namespace declaration until it first sees it, which means for leaf nodes with a different namespace than their parent node, you end up with a namespace declaration on every element, like this…

<?xml version="1.0" encoding="UTF-8"?>
<ns0:RootNode xmlns:ns0="http://namespace/0">
    <ns1:FirstChild xmlns:ns1="http://namespace/1">
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
        <ns2:SecondChild xmlns:ns2="http://namespace/2">Value</ns2:SecondChild>
    </ns1:FirstChild>
</ns0:RootNode>
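
If you want to see the effect for yourself, a minimal sketch along these lines should reproduce it.  NamespaceBloatDemo is an invented name and the namespace URIs are the placeholders from the sample document; this is not our real template-writing code, just the smallest thing that shows sibling elements each getting their own declaration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class NamespaceBloatDemo
{
    // Writes a root, one child, and childCount leaf elements, each leaf in a
    // namespace that is not yet in scope when its start tag is written.
    public static string Write(int childCount)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);

        writer.WriteStartElement("ns0", "RootNode", "http://namespace/0");
        writer.WriteStartElement("ns1", "FirstChild", "http://namespace/1");
        for (int i = 0; i < childCount; i++)
        {
            // ns2 went out of scope when the previous sibling closed,
            // so the writer declares it again on every leaf.
            writer.WriteStartElement("ns2", "SecondChild", "http://namespace/2");
            writer.WriteString("Value");
            writer.WriteEndElement();
        }
        writer.WriteEndElement(); // FirstChild
        writer.WriteEndElement(); // RootNode
        writer.Close();

        return output.ToString();
    }
}
```

With NamespaceBloatDemo.Write(8) the ns2 declaration shows up eight times, once per leaf, which matches the document above.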

With our actual namespace strings, that’s like an additional 60 bytes per element that we don’t really need.  What we’d like to see is the namespaces declared once at the top of the file, then referenced elsewhere, like this…

<?xml version="1.0" encoding="UTF-8"?>
<ns0:RootNode xmlns:ns0="http://namespace/0" xmlns:ns1="http://namespace/1" xmlns:ns2="http://namespace/2">
    <ns1:FirstChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
        <ns2:SecondChild>Value</ns2:SecondChild>
    </ns1:FirstChild>
</ns0:RootNode>

When we edited the templates manually to achieve this effect, the 1.5 MB document went to like 660 KB.  Much better.

There doesn’t seem to be any way to get XmlTextWriter to do this, however.  Even if you explicitly write out the extra namespaces on the root element, you still get them everywhere, since the writer sees those as just attributes you chose to write, and not namespace declarations. 

Curses!  I’ve spent all day on this and have no ideas.  Anyone have any input?

Friday, June 24, 2005 9:53:54 PM (Pacific Daylight Time, UTC-07:00)
 Monday, June 13, 2005

Sigh.  It’s a constant battle.  I knew full well that XmlSerializer leaks temp assemblies if you don’t use the right constructor.  (The one that takes only a type will cache internally, so it’s not a problem.)  And then I went and wrote some code that called one of the other constructors without caching the resulting XmlSerializer instances. 

The result:  one process I looked at had over 1,500 instances of System.Reflection.Assembly on the heap.  Not so good. 

The fix?  Not as simple as I would have hoped.  The constructor that I’m using takes the Type to serialize, and an XmlRootAttribute instance.  It would be nice to be able to cache the serializers based on that XmlRootAttribute, since that’d be simple and all.  Unfortunately, two instances of an XmlRootAttribute with the same parameters return different values from GetHashCode(), so it’s not that easy.  I ended up using a string key compounded from the type’s full name and the parameters I’m using on XmlRootAttribute.  Not the most beautiful, but it’ll work.  Better than having 1,500 temp assemblies hanging around.
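
Roughly, the cache looks like the sketch below.  Treat it as an illustration rather than the production code: SerializerCache is a made-up name, it assumes the only XmlRootAttribute parameters in play are the element name and the namespace, and it uses a generic Dictionary (a Hashtable would do the same job on 1.1).

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

public static class SerializerCache
{
    private static readonly Dictionary<string, XmlSerializer> cache =
        new Dictionary<string, XmlSerializer>();
    private static readonly object sync = new object();

    // Returns one shared XmlSerializer per (type, root name, root namespace),
    // so the temp assembly behind it is only generated once.
    public static XmlSerializer Get(Type type, string rootName, string rootNamespace)
    {
        // String key built from the type's full name plus the root parameters,
        // because XmlRootAttribute itself is not usable as a dictionary key.
        string key = type.FullName + "|" + rootName + "|" + rootNamespace;

        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(key, out serializer))
            {
                XmlRootAttribute root = new XmlRootAttribute(rootName);
                root.Namespace = rootNamespace;
                serializer = new XmlSerializer(type, root);
                cache[key] = serializer;
            }
            return serializer;
        }
    }
}
```

Two calls with the same arguments hand back the same instance, which is the whole point.  If you use other XmlRootAttribute properties (IsNullable, DataType), they would need to go into the key as well.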

Monday, June 13, 2005 11:09:22 PM (Pacific Daylight Time, UTC-07:00)
 Tuesday, April 05, 2005

Someone asked today how to get a list of all the namespace prefixes used in an XML document, along with their associated URIs, so that the information could be used to initialize an XmlNamespaceManager.  This works…

        XPathDocument xdoc = new XPathDocument(@"c:\temp\myfile.xml");
        XPathNavigator nav = xdoc.CreateNavigator();
        // every namespace node in the document, one per element per prefix in scope
        XPathNodeIterator nodes = (XPathNodeIterator)nav.Evaluate("//namespace::*");
        Hashtable h = new Hashtable();
        while(nodes.MoveNext())
        {
            // Name is the prefix, Value is the URI; duplicates just overwrite
            h[nodes.Current.Name] = nodes.Current.Value;
        }
        foreach(string name in h.Keys)
        {
            Console.WriteLine("{0}:{1}",name,h[name]);
        }

You’ll end up with a hashtable with the prefixes as keys and the associated URIs as their values.  You could probably do something even cooler with a unique set data structure, but the hashtable works in a pinch.
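
Since the original question was about initializing an XmlNamespaceManager, here is a minimal sketch that goes straight there and skips the hashtable.  NamespaceCollector is a hypothetical name, and it reads from a TextReader instead of a file path purely to keep the example self-contained.  The one wrinkle is that the namespace axis also returns the built-in xml prefix, which AddNamespace refuses to accept, so that one gets skipped.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

public static class NamespaceCollector
{
    // Builds an XmlNamespaceManager holding every prefix/URI pair
    // declared anywhere in the document.
    public static XmlNamespaceManager Collect(TextReader xml)
    {
        XPathDocument doc = new XPathDocument(xml);
        XPathNavigator nav = doc.CreateNavigator();
        XmlNamespaceManager manager = new XmlNamespaceManager(nav.NameTable);

        XPathNodeIterator nodes = nav.Select("//namespace::*");
        while (nodes.MoveNext())
        {
            string prefix = nodes.Current.Name;   // prefix of the namespace node
            if (prefix == "xml")
                continue;                          // reserved; AddNamespace throws on it
            manager.AddNamespace(prefix, nodes.Current.Value);
        }
        return manager;
    }
}
```

If the same prefix is bound to different URIs in different parts of the document, the last one seen wins, same as with the hashtable.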

Tuesday, April 05, 2005 11:12:52 PM (Pacific Daylight Time, UTC-07:00)
 Thursday, March 17, 2005

Here we are in the year 2005.  XML has been pretty ubiquitous for at least 5–6 years now.  Namespaces have been in use for pretty much all of that time.  And yet they remain possibly the least understood part of average, everyday XML processing. 

The bottom line is that pretty much any XML parser worth its salt these days supports the namespaces spec.  Which means that

<MyElement/>

is absolutely not the same thing as

<MyElement xmlns="urn:runforthehills"/>

Furthermore, in line with the XML Namespaces spec, an application which is expecting the latter, namespace qualified element should not and must not process the former, unqualified element.

The XmlSerializer that we all know and love in .NET is particularly sensitive to this issue (as well it should be).  As far as the serializer is concerned, everything should be namespace qualified.  The way this commonly bites people is thus: a customer/partner sends you a schema representing the XML documents they are going to be sending you.  In the schema, the targetNamespace attribute is set with a value of “http://partner.com/schema”.  When you actually go to debug the application, however, it turns out they are sending you totally unqualified XML.  Nothing will work.  There are a few pretty horrible things you can do with the XmlSerializer to try and convince it not to be such a stickler about things, most involving the XmlRootAttribute and XmlAttributeOverrides.  I can share those ways if anyone really wants to see them.  Probably best to keep them under cover.  However, that’s only likely to work if your XML document is flat, meaning that the root element only has one level of child nodes under it.  Otherwise, if you use Xsd.exe to generate your serialization class, each set of sub-elements gets put in its own object, which will also be namespace qualified.  And you’re back to square one. 

The right solution, of course, is to get your partner to send you XML that’s actually correct, but often that’s just not possible for a variety of reasons with which I’m sure we’re all familiar.  As a last-ditch effort, you can pre-process the XML text before passing it to the XmlSerializer, and inject the right namespace strings.  Yucky, it’s true, but it does actually get the job done.  You will, of course, be paying some overhead costs of string processing and possibly parsing the XML twice.  But what can you do?
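
For the curious, the pre-processing trick can be as small as the sketch below.  Everything here is an assumption for illustration: Order is an invented class standing in for whatever Xsd.exe generated, NamespaceInjector is a made-up name, and the approach only holds up when the incoming document is completely unqualified (no prefixes, no xmlns on the root already).  It drops a default namespace declaration onto the root start tag, which every unprefixed descendant then inherits.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml.Serialization;

// Hypothetical serialization class, qualified the way the partner's schema says.
[XmlRoot("Order", Namespace = "http://partner.com/schema")]
public class Order
{
    public string Id;
}

public static class NamespaceInjector
{
    // First start tag that is not an XML declaration, comment, or DOCTYPE.
    private static readonly Regex RootStartTag = new Regex(@"<(?![?!])[^\s/>]+");

    // Adds xmlns="ns" to the root element's start tag (first match only).
    public static string Inject(string xml, string ns)
    {
        return RootStartTag.Replace(
            xml,
            delegate(Match m) { return m.Value + " xmlns=\"" + ns + "\""; },
            1);
    }

    // Qualifies the text, then deserializes it as usual.
    public static Order Read(string unqualifiedXml)
    {
        string qualified = Inject(unqualifiedXml, "http://partner.com/schema");
        XmlSerializer serializer = new XmlSerializer(typeof(Order));
        using (StringReader reader = new StringReader(qualified))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }
}
```

Feeding it something like <Order><Id>42</Id></Order> gets you a populated Order back instead of an exception.  It is still string surgery on XML, with all the fragility that implies.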

The other thing to keep in mind is how namespaces play out in XSD schema files.  You can only have one target namespace per schema, so anything you define in that schema file will be in that target namespace.  You can import things from other namespaces, but not from the target namespace.  You can, however, define two different schema files that use the same namespace, then import them both into another schema, as long as there are no name collisions.  If you omit the targetNamespace attribute from your schema, the targetNamespace becomes “”, meaning you are defining the schema for an unqualified XML document. 

Confusing enough?  Read the namespace spec (it’s really short) and familiarize yourself with how namespaces work in schema.  And if you see errors coming back from the XmlSerializer that look like

The element <spam xmlns=""> was not expected.

check your namespaces!  That means you are trying to deserialize an unqualified document, when a qualified one was expected.

Thursday, March 17, 2005 9:00:03 PM (Pacific Daylight Time, UTC-07:00)
 Monday, December 06, 2004
I’ll be teaching again next term at OIT (at CAPITAL Center in Beaverton), this time “Enterprise Web Services”.  We’ll be looking at what it takes to build a real-world enterprise application using web services, including such topics as asynchronous messaging, security, reliable messaging and a host of others. We’ll walk through all the stages of building an enterprise-level WS application, using .NET and WSE 2.0 to do the heavy lifting.  Required is a firm grasp of programming in C#, and a basic understanding of Web Services fundamentals such as XML, SOAP, and WSDL.
Monday, December 06, 2004 9:18:30 PM (Pacific Standard Time, UTC-08:00)
 Tuesday, October 26, 2004

As most of you probably have already heard, according to Dare, we won't be getting XQuery with Whidbey. 

LAME!

One of the reasons given for this decision is that customers want something that is compliant with W3 standards.  OK, that's true.  I would disagree that people will only use something that is compliant with a full recommendation.  Back in the day when MS first started putting out XML tools (early MSXML, SOAP Toolkit, etc.) many of those tools were built around working drafts, and we still managed to use them to get real work done.  I would argue that even if the XQuery spec were to change substantively between now and its full recommendation-hood (which I doubt) there's plenty of opportunity to get real work done with XQuery starting right now. 

The counter argument is that people don't want to make changes to their code when the real spec ships.  Guess what!  There have been breaking changes introduced in each new revision of the .NET framework.  People have to change their code all the time.  I had to unwind a huge number of problems due to the changes in remoting security between .NET 1.0 and 1.1.  Somehow we manage.  The excuse of "well, you still have XSLT" just doesn't cut it IMHO.  XSLT is a much more difficult programming model than XQuery, and most people to this day don't get how the declarative model in XSLT is supposed to work.  XPath 1.0 is very limiting, which is why there's an XPath 2/XSLT 2 (which also are not going to be supported in Whidbey!). 

I have to wonder if performance issues aren't closer to the truth of why it's not shipping.  Writing an engine that can do arbitrary XQuery against arbitrary documents certainly isn't an easy thing to do.  Think SQL.  SQL Server is a non-trivial implementation, and there's a reason for it.  I'm guessing that the reality of trying to make XQuery perform the way people would expect is a pretty daunting task. 

Either way, I think it's a drag that we won't get XQuery support, recommendation or no.

Tuesday, October 26, 2004 6:00:54 PM (Pacific Daylight Time, UTC-07:00)

Steve Maine is in the midst of the perennial debate between SOAP and REST, and I feel compelled to add my two cents...

At the XML DevCon last week I noticed that it continues to be fashionable to bash the existing Web Services standards as being too complex and unwieldy (which in several notable cases is true, but it's what we have to work with at this point) but that doesn't change the fact that they solve real problems.  I've always had a sneaking suspicion that people heavily into REST as a concept favored it mostly out of laziness, since it is undeniably a much simpler model than the SOAP/WS-* stack.  On the other hand, it fails to solve a bunch of real problems that SOAP/WS-* does.  WS-Addressing is a good example. 

I spent two years developing an application that involved hardware devices attached to large power transformers and industrial battery systems that needed to communicate back to a central data collection system.  We used SOAP to solve that particular problem, since it was easy to get the data where it needed to go, and we could use WS-Security to provide a high level of data encryption to our customers.  (Utility companies like that.)  However, we had one customer who would only allow us to get data from the monitors on their transformers through a store-and-forward mechanism, whereby the monitors would dump their data to a server inside their firewall, and we could pick up the data via FTP.  This is a great place for WS-Addressing, since all the addressing information stayed inside the SOAP document, and it didn't matter if we stored it out to disk for a bit.  There is no way that REST could have solved this particular problem.  Or, at least, no way without coming up with some truly bizarre architecture that would never be anything but gross.

REST is great for solving very simple application scenarios, but that doesn't make it a replacement for SOAP.  I agree that many of the WS-* standards are getting a bit out of hand, but I also agree with Don Box's assessment (in his "WS-Why?" talk last week) that given the constraints, WS-Addressing and WS-Security are the simplest solutions that solve the problem.  There's a reason why these are non-trivial specs.  They solve non-trivial problems.

So rather than focusing on REST vs. SOAP, it's more interesting and appropriate to look at the application scenarios and talk about which is the simplest solution that addresses all the requirements.  I don't think they need to be mutually exclusive.

Tuesday, October 26, 2004 5:01:27 PM (Pacific Daylight Time, UTC-07:00)
 Thursday, October 21, 2004
Wow, Rory drew my head.  I've arrived. :-)
Thursday, October 21, 2004 4:35:02 PM (Pacific Daylight Time, UTC-07:00)
 Wednesday, October 20, 2004

Sam Ruby is talking about problems with textual data out on the web, or more specifically in the context of RSS, having to do with bad assumptions about character encoding.  As someone who once did a lot of work in localization, it's a subject near and dear to my heart. 

I'm always amazed that still to this day people don't get that there is such a thing as character encoding. 

Sam points out that an upper case A and the Greek Alpha show up in most fonts as the same glyph.  However, they are different code points in Unicode. 
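
A quick way to convince yourself, as a minimal sketch (GlyphDemo is a made-up name, not anything from Sam's talk): the two letters are different characters, and they do not even take the same number of bytes once you encode them as UTF-8.

```csharp
using System;
using System.Text;

public static class GlyphDemo
{
    public const char LatinA = '\u0041';      // LATIN CAPITAL LETTER A
    public const char GreekAlpha = '\u0391';  // GREEK CAPITAL LETTER ALPHA

    // Returns the UTF-8 bytes for a single character.
    public static byte[] Utf8(char c)
    {
        return Encoding.UTF8.GetBytes(new char[] { c });
    }
}
```

Utf8(LatinA) is the single byte 0x41, while Utf8(GreekAlpha) is the two bytes 0xCE 0x91.  Same glyph on screen, different data on the wire.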

He's moving this idea up the stack to show why there are so many conflicts between HTML/XML/RSS/etc.  The rules for character encoding are different in all those systems and are enforced differently by different tools, which is what causes so many RSS feeds to be badly formed. 

He started the presentation talking about how XML is an "attractive nuisance" with regard to the encoding issue, in that it leads people down the primrose path to thinking all their encoding issues are solved just because XML is supposed to take care of encoding. 

All in all, the issues Sam is talking about are pretty obscure, and appeal mostly to XML wonks, but that doesn't make them any less valid.  The reality is we've all learned to deal with it most of the time, just like we're used to IE fixing up our bad HTML tables.

Wednesday, October 20, 2004 10:51:23 PM (Pacific Daylight Time, UTC-07:00)
 Monday, October 18, 2004

Just two more days until the Dev Con.  I had a great time both speaking at and attending last year, so I'm looking forward to another exciting time.  Scott and I will be talking about some of the things we're doing at work around XML Schema and using "contract first" coding in a non-Web Services context. 

I hear there are still a few more open seats...

Tuesday, October 19, 2004 1:21:17 AM (Pacific Daylight Time, UTC-07:00)
 Tuesday, October 05, 2004

I may just have a new favorite XML editor.  I caught wind of Stylus Studio 6 [via Mike Gunderloy] so I downloaded a trial copy and checked it out.  Wow.  I'm pretty impressed.  It's the same price as XMLSpy Pro, but includes support for XPath (v1 and v2), XQuery, Web Services testing, and a pretty good schema-to-schema mapping tool that creates XSLT files.  Plus it has a schema editor which looks pretty good, lots of data conversion tools, support for custom extensions (if you have your own file types), etc.  Lots of good stuff here.

What is even cooler is that they have a "Home" version for non-commercial use that has almost all of the features of the pro version (unlike the pretty well crippled XMLSpy Home) for only $50.  I'll definitely turn my students on to this next week.  That's a lot of functionality for very little money.  The schema editor in the Home version isn't quite as cool, and there are a few other features it doesn't support, like web services testing, but it looks otherwise pretty highly functional. 

If you don't care about the WSDL editor, there might be a lot to recommend in the Pro version over XMLSpy Enterprise, at about 1/3 of the price.

Tuesday, October 05, 2004 8:27:23 PM (Pacific Daylight Time, UTC-07:00)
 Friday, August 13, 2004

Looks like Scott and I will be speaking at Chris Sells' XML DevCon this year.  Last year I spoke on XML on transformer monitors.  This year Scott and I will be talking about the work we've been doing with online banking and XML Schema. 

If it's anything like last year's, the conference should be pretty amazing.  The speakers list includes some pretty serious luminaries.  In fact, it's pretty much a bunch of famous guys... and me.  :-)

Sign up now!

Friday, August 13, 2004 11:50:12 PM (Pacific Daylight Time, UTC-07:00)
 Wednesday, August 04, 2004

I'll be teaching at OIT (in Portland/Beaverton, not K-Falls) again Fall term.  This time it's "Practical Web Services".  If you're interested, sign up through OIT.  The course number is 15048.  Description follows:

Practical Web Services

Web Services sound like a great idea, but how do you actually go about using them?  How do you go about actually writing your own Web Service to expose your data or functionality?

This class will cover all the details involved in using and building your own Web Services using the Microsoft .NET platform.  The first half of the class will cover the building of a client application to consume a Web Service from the Internet.  The second half will focus on building an equivalent Web Service using ASP.NET.

Students will leave this class with a firm understanding of how to use Web Services built by other people, and how to implement their own Web Services using the .NET platform.

Students should either have taken the previous "Web Services Theory" class, or have instructor approval.  All work will be done in C#, so a firm understanding of C# is required.

Wednesday, August 04, 2004 4:47:02 PM (Pacific Daylight Time, UTC-07:00)
 Friday, July 23, 2004

This should be totally obvious to those with XML experience, but to those who don't fall into that category, keep in mind that it's of utmost importance to not mix data and meta-data when designing your XML.  For example, when creating an XML document for a purchase order, I've often seen stuff like

<PO>
 <lineItems>
  <item1>
    <name>widget x</name>
    <price>$4.99</price>
  </item1>
  <item2>
    <name>widget z</name>
    <price>$.99</price>
  </item2>
 </lineItems>
</PO>

This is what I mean by mixing data and meta-data.  By naming elements "item1" and "item2" you've mixed data (ordinal values "1" and "2") with meta-data (the description "item").  Now when you go to write a schema to match this document, what do you do?  Explicitly name elements item1 and item2?  What happens when you get a PO with 3 items?  You're screwed. 

Again, to those who are used to working with XML, this is readily apparent, but I found out from the class I taught this summer that it isn't obvious to everyone.  A much better solution would be something like

<PO>
 <lineItems>
  <item number="1">
    <name>widget x</name>
    <price>$4.99</price>
  </item>
  <item number="2">  <!--[Update:] fixed.  Thanks Haacked-->
    <name>widget z</name>
    <price>$.99</price>
  </item>
 </lineItems>
</PO>

In that case the meta-data is properly separated.  In this particular case, you don't actually have to specify the number at all, since the elements are inherently ordered, but you get the idea. 
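
One way to see why the second shape is easier to live with is that it maps straight onto an array in code.  The classes below are a minimal sketch invented for this post (PurchaseOrder, Item, and PurchaseOrderReader are not from any real system); the same three classes handle a PO with two items or two hundred, which is exactly what the item1/item2 naming makes impossible.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Item
{
    [XmlAttribute("number")] public int Number;
    [XmlElement("name")] public string Name;
    [XmlElement("price")] public string Price;
}

[XmlRoot("PO")]
public class PurchaseOrder
{
    // <lineItems> wraps any number of repeated <item> elements.
    [XmlArray("lineItems")]
    [XmlArrayItem("item")]
    public Item[] LineItems;
}

public static class PurchaseOrderReader
{
    // Deserializes a purchase order document from a string.
    public static PurchaseOrder Parse(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(PurchaseOrder));
        using (StringReader reader = new StringReader(xml))
        {
            return (PurchaseOrder)serializer.Deserialize(reader);
        }
    }
}
```

After PurchaseOrderReader.Parse, the LineItems array holds one entry per item element, in document order.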

Remember, friends don't let friends write bad XML.

Friday, July 23, 2004 8:10:29 PM (Pacific Daylight Time, UTC-07:00)
 Thursday, July 15, 2004

I finished up my Web Services Theory class at OIT last night.  Just the final to go on Monday (mwah ha ha).

We ended with some stuff on WS-* and all the various specs.  I tried to spend minimal time on the actual syntax of WS-*, since some of them are pretty hairy, and spent more time on the business cases for WS-*.  That seemed to go over pretty well.  I think it's easier to understand the business case for why we need WS-Security than it is to understand the spec itself.  Unfortunately, one of the underlying assumptions about all the GXA/WS-* specs is that eventually they will just fade into the background, and you'll never see the actual XML, since some framework piece (like WSE 2.0) will just "take care of it" for you.  What that means is that the actual XML can be pretty complex.  The unfortunate part is that we don't have all those framework bits yet, so we have to deal with all the complexity ourselves.  Thankfully more tools like WSE 2 are available to hide some of that from the average developer.  On the other hand, I'm a great believer in taking the red pill and understanding what really goes on underneath our framework implementations. 

Thursday, July 15, 2004 11:40:53 PM (Pacific Daylight Time, UTC-07:00)
 Friday, July 09, 2004

Dare Obasanjo posits that the usefulness of the W3C might be at an end, and I couldn't agree more.  Yes, the W3C was largely behind the standards that "made" the Web, but they've become so bloated and slow that they can't get anything done.

There's no reason why XQuery, XInclude, and any number of other standards that people could be using today aren't finished other than the fact that all the bureaucrats on the committee all want their pet feature in the spec, and the W3C process is all about consensus.  What that ends up meaning is that no one is willing to implement any of these specs seriously until they are full recommendations.  6 years now, and still no XQuery.  It's sufficiently complex that nobody is going to try to implement anything other than toy/test implementations until the spec is a full recommendation.

By contrast, the formerly GXA, now WS-* specs have been coming along very quickly, and we're seeing real implementation because of it.  The best thing that ever happened to Web Services was the day that IBM and Microsoft agreed to "agree on standards, compete on implementations".  That's all it took.  As soon as you get not one but two 800 lb. gorillas writing specs together, the reality is that the industry will fall behind them.  As a result, we have real implementations of WS-Security, WS-Addressing, etc.  When we in the business world are still working on "Internet time", we can't wait around 6-7 years for a real spec just so every academic in the world gets his favorite thing in the spec.  That's how you get XML Schema, and all the irrelevant junk that's in that spec. 

The specs that have really taken off and gotten wide acceptance have largely been de facto, non-W3C-blessed specs, like SAX, RSS, SOAP, etc.  It's time for us to move on and start getting more work done with real standards based on the real world.

Friday, July 09, 2004 5:35:44 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
 Thursday, June 24, 2004

I started teaching a class at OIT this week on "Web Services Theory", in which I'm trying to capture not only reality, but the grand utopian vision that Web Services were meant to fulfill (more on that later).  That got me thinking about the way the industry as a whole has approached file formats over the last 15 years or so. 

There was a great contraction of file formats in the early 90s, which resulted in way more problems than anyone had anticipated I think, followed by a re-expansion in the late 90s when everyone figured out that the whole Internet thing was here to stay and not just a fad among USENET geeks. 

Once upon a time, back when I was in college, I worked as a lab monkey in a big room full of Macs as a "support technician".  What that mostly meant was answering questions about how to format Word documents, and trying to recover the odd thesis paper from the 800k floppy that was the only copy of the 200 page paper and had somehow gotten beer spilled all over it.  (This is back when I was pursuing my degree in East Asian Studies and couldn't imagine why people wanted to work with computers all day.)

Back then, Word documents were RTF.  Which meant that Word docs written on Windows 2.0 running on PS/2 model 40s were easily translatable into Word docs running under System 7 on Mac SEs.  Life was good.  And when somebody backed over a floppy in their VW bug and just had to get their thesis back, we could scrape most of the text off the disc even if it had lost the odd sector here and there.  Sure, the RTF was trashed and you had to sift out the now-useless formatting goo, but the text was recoverable in large part.  In other sectors of the industry, files were happily being saved in CSV or fixed length text files (EDI?) and it might have been a pain to write yet another CSV parser, but with a little effort people could get data from one place to another. 

Then the industry suddenly decided that it could add lots more value to documents by making them completely inscrutable.  In our microcosm example, Word moved from RTF to OLE Structured Storage.  We support monkeys rued the day!  Sure, it made it really easy to serialize OLE embedded objects, and all kinds of neat value added junk that most people didn't take advantage of anyway.  On the other hand, we now had to treat our floppies as holy relics, because if so much as one byte went awry, forget ever recovering anything out of your document.  Best to just consider it gone.  We all learned to be completely paranoid about backing up important documents on 3-4 disks just to make sure.  (Since the entire collection of all the papers I ever wrote in college fit on a couple of 1.4Mb floppies, not a big deal, but still a hassle.)

Apple and IBM were just as guilty.  They were off inventing "OpenDoc" which was OLE Structured Storage only invented somewhere else.  And OpenDoc failed horribly, but for lots of non-technical reasons.  The point is, the industry in general was moving file formats towards mutually incomprehensible binary formats.  In part to "add value" and in part to assure "lock in".  If you could only move to another word processing platform by losing all your formatting, it might not be worth it. 

When documents were only likely to be consumed within one office or school environment, this was less of an issue, since it was relatively easy to standardize on a single platform, etc.  When the Internet entered the picture, it posed a real problem, since people now wanted to share information over a much broader range, and the fact that you couldn't possibly read a Word for Windows doc on the Mac just wasn't acceptable. 

When XML first started to be everyone's buzzword of choice in the late 90s, there were lots of detractors who said things like "aren't we just going back to delimited text files? what a lame idea!".  In some ways it was like going back to CSV text files.  Documents became human readable (and machine readable) again.  Sure, they got bigger, but compression got better too, and disks and networks became much more capable.  It was hard to shake people loose from proprietary document formats, but it's mostly happened.  Witness WordML.  OLE structured storage out, XML in.  Of course, WordML is functionally RTF, only way more verbose and bloated, but it's easy to parse and humans can understand it (given time). 

So from a world of all text, we contracted down to binary silo-ed formats, then expanded out to text files again (only with meta-data this time).  It's like a Big Bang of data compatibility.  Let's hope it's a long while before we hit another contracting cycle.  Now if we could just agree on schemas...

Thursday, June 24, 2004 6:24:31 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
 Tuesday, May 25, 2004

Scott has some comments about WSE 2.0 (which, just in case you haven't heard yet, has RTMed) and I wanted to comment on a few things...

 Question: The Basic Profile is great, but are the other specs getting too complicated?
My Personal Answer (today): Kinda feels like it!  WS-Security will be more useful when there is a more support on the Java side.  As far as WS-Policy, it seems that Dynamic Policy is where the money's at and it's a bummer WSE doesn't support it.    
[Scott]

It's the tools that are at issue here, rather than the specs I think.  I spent some time writing WS-Security by hand about a year ago, and yes, it's complicated, but I don't think unnecessarily so.  The problem is that we aren't supposed to be writing it by hand.  We take SSL totally for granted, but writing an SSL implementation from scratch is non-trivial.  We don't have to write them ourselves anymore, so we can take it for granted.  The problem (in the specific case of WS-Security) is that we have taken it for granted as far as Web Services go.  Unfortunately, that makes the assumption that Web Services are bound to HTTP.  In order to break the dependence on HTTP (which opens up many new application scenarios) we have to replace all the stuff that HTTP gives us "for free" like encryption, addressing, authentication, etc.  Because to fit with SOAP those things all have to be declarative rather than procedural, I think they feel harder than depending on the same thing from procedural code. 

If we are to realize the full potential of Web Services and SO, then we have to have all this infrastructure in place, to the point where it becomes ubiquitous.  Then we can take the WS-*s for granted just like we do SSL today.  Unfortunately the tools haven't caught up yet.  Three or four years ago we were writing an awful lot of SOAP and WSDL related code ourselves, and now the toolsets have caught up (mostly).  Given enough time the tools should be able to encompass the rest of the standards we need to open up all the new application scenarios. 

Steve Maine makes a good analogy to the corporate mailroom.  There's a lot of complexity and complex systems involved in getting mail around the postal system which we don't see on a daily basis.  But it's out there nonetheless, and we couldn't get mail around without it.  When we can take SO for granted like we do the postal system, then we'll see the full potential of what SO can do for business, etc. in the real world.

Tuesday, May 25, 2004 6:06:13 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
 Thursday, May 20, 2004
Now that I think about it some more, this is a problem that WinFS could really help to solve.  The biggest reason that people don't use things like RDF is sheer laziness (you'll notice the rich RDF on my site :-) ) but if we can use the Longhorn interface to easily enter and organize metadata about content, it might be a cool way to generate RDF or other semantic information.  Hmmmm...  It would be fun to write a WinFS -> RDF widget.  If it wasn't for that dang day job...
Thursday, May 20, 2004 7:43:55 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 

Scott mentions some difficulty he had lately in finding some information with Google, which brings to my mind the issue (long debated) of the semantic web.  Scott's problem is exactly the kind of thing that RDF was meant to solve when it first came into being, lo these 6-7 years ago. 

Has anyone taken advantage of it?  Not really.  The odd library and art gallery.  Why?  Two main reasons: 1) pure laziness.  It's extra work to tag everything with metadata. 2) RDF is nearly impossible to understand.  That's the biggest rub.  RDF, like so many other standards to come out of IETF/W3C, is almost incomprehensible to anyone who didn't write the standard.  The whole notion of writing RDF tuples in XML is something that most people just don't get.  I don't really understand how it's supposed to work myself.  And, like with WSDL and other examples, the people who came up with RDF assumed that people would use tools to write the tuples, so they wouldn't have to understand the format.  The problem with that (and with WSDL) is that since no one understands the standard, no one has written any usable tools either. 

The closest that anyone has come to using RDF in any real way is RSS, which has turned out to be so successful because it is accessible.  It's not hard to understand how RSS is supposed to work, which is why it's not really RDF.  So attaching some metadata to blog content has turned out to be not so hard, mostly because most people don't go beyond a simple category, although RSS supports quite a bit more. 

The drawback to RDF is that it was created by and for librarians, not web page authors (most of whom aren't librarians).  Since most of us don't have librarians to mark up our content with RDF for us, it just doesn't get done.  Part of the implicit assumption behind RDF and the semantic web is that authoritative information only comes from institutional sources, who have the resources to deal with semantic metadata.  If blogging has taught us anything, it's that that particular assumption just isn't true.  Most of the useful information on the internet comes from "non-authoritative" sources.  When was the last time you got a useful answer to a tech support problem from a corporate web site?  The tidbit you need to solve your tech support problem is nowadays more likely to come from a blog or a USENET post than it is from the company who made the product.  And those people don't give a fig for the "semantic web". 

 

Thursday, May 20, 2004 7:29:22 PM (Pacific Daylight Time, UTC-07:00)  #    Disclaimer  |  Comments [0]  | 
 Friday, April 16, 2004

Now this is just plain annoying...

As I've mentioned before, I'm doing some ongoing work with code generation from XmlSchema files.  Developers mark up XmlSchema documents with some extra attributes in our namespace, and that influences how the code gets generated.  Think of it as xsd.exe, only this works.  :-)

So today a new problem was brought to my attention.  I read in schema files using the .NET XmlSchema classes.  OK, that works well.  For any element in the schema, you can ask its XmlSchemaDatatype what the corresponding .NET value type would be.  E.g. if you ask an element of type "xs:int", you get back System.Int32.  "xs:dateTime" maps to System.DateTime, etc.
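
For anyone who hasn't poked at this part of the framework, here is roughly what that lookup looks like.  This is only a sketch, not my actual generator code: the inline schema and the element names ("count", "created") are made up for illustration, and it assumes the XmlSchemaSet API from later versions of the framework rather than calling Compile on an XmlSchema directly.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

class SchemaTypeDemo
{
    static void Main()
    {
        // Hypothetical schema with two global elements, for illustration only.
        const string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='count' type='xs:int'/>
  <xs:element name='created' type='xs:dateTime'/>
</xs:schema>";

        // Load and compile the schema; type information is only
        // populated on the elements after compilation.
        XmlSchemaSet set = new XmlSchemaSet();
        set.Add(null, XmlReader.Create(new StringReader(xsd)));
        set.Compile();

        // Ask each global element's datatype for its corresponding .NET type.
        foreach (XmlSchemaElement element in set.GlobalElements.Values)
        {
            XmlSchemaDatatype datatype = element.ElementSchemaType.Datatype;
            Console.WriteLine("{0} -> {1}", element.Name, datatype.ValueType);
        }
    }
}
```

The element declared as xs:int should report System.Int32 and the xs:dateTime one System.DateTime, though the order the elements come back in isn't something I'd rely on.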

When you want to serialize .NET objects using the XmlSerializer, you can add e