# Thursday, October 21, 2004
Wow, Rory drew my head.  I've arrived. :-)
Work | XML
Thursday, October 21, 2004 9:35:02 AM (Pacific Daylight Time, UTC-07:00)
# Wednesday, October 20, 2004

Sam Ruby is talking about problems with textual data on the web, specifically in RSS, that stem from bad assumptions about character encoding.  Having once done a lot of work in localization, I find it a subject near and dear to my heart. 

I'm always amazed that still, to this day, people don't get that there is such a thing as character encoding. 

Sam points out that an uppercase Latin A and a Greek capital Alpha show up in most fonts as the same glyph.  However, they are different code points in Unicode. 
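A quick Python sketch (the language is only an illustrative choice) makes the difference concrete by printing each character's code point, Unicode name, and UTF-8 bytes:

```python
import unicodedata

latin_a = "A"            # the A on any keyboard
greek_alpha = "\u0391"   # Greek capital Alpha

for ch in (latin_a, greek_alpha):
    # Code point, official Unicode name, and UTF-8 encoding of each character
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))

# Identical glyphs in most fonts, but different characters to software
print(latin_a == greek_alpha)
```

This prints U+0041 LATIN CAPITAL LETTER A with bytes b'A', then U+0391 GREEK CAPITAL LETTER ALPHA with bytes b'\xce\x91', and finally False. Anything that compares, sorts, or searches text sees two unrelated characters.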

He's moving this idea up the stack to show why there are so many conflicts between HTML/XML/RSS/etc.  The rules for character encoding are different in all those systems and are enforced differently by different tools, which is what causes so many RSS feeds to be badly formed. 
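The underlying problem is that bytes don't carry their encoding with them; every tool in the chain has to assume one. This Python sketch (the string and encodings are only illustrative) shows the same bytes read under different assumptions: one tool silently produces mojibake, a lenient one papers over the damage, and a strict one refuses the data.

```python
# Illustrative string; any non-ASCII text shows the same effect.
payload = "caf\u00e9".encode("utf-8")                # b'caf\xc3\xa9'

# A tool that assumes UTF-8 gets the right text back.
as_utf8 = payload.decode("utf-8")                    # 'café'
# A tool that assumes Latin-1 raises no error; it silently yields mojibake.
as_latin1 = payload.decode("latin-1")                # 'cafÃ©'

# The reverse mistake: Latin-1 bytes handed to a UTF-8 consumer.
legacy = "caf\u00e9".encode("latin-1")               # b'caf\xe9'
# A lenient consumer substitutes U+FFFD for the bad byte...
lenient = legacy.decode("utf-8", errors="replace")   # 'caf\ufffd'
# ...while a strict consumer rejects the data outright.
try:
    legacy.decode("utf-8")
    strict_accepted = True
except UnicodeDecodeError:
    strict_accepted = False

print(as_utf8, as_latin1, lenient, strict_accepted)
```

Three tools, three different results from the same bytes, which is roughly the situation feed consumers are in.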

He started the presentation talking about how XML is an "attractive nuisance" with regard to the encoding issue, in that it leads people down the primrose path to thinking all their encoding issues are solved just because XML is supposed to take care of encoding. 
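Part of the nuisance is that an XML declaration only *claims* an encoding. A minimal sketch using Python's standard `xml.etree.ElementTree`, with a hypothetical `<title>` fragment standing in for a feed, shows the parser accepting honestly labeled Latin-1 bytes and rejecting the very same bytes when they're labeled UTF-8:

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment encoded as Latin-1: "é" becomes the single byte 0xE9.
body = "<title>caf\u00e9</title>".encode("latin-1")

honest = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
mislabeled = b'<?xml version="1.0" encoding="UTF-8"?>' + body

# Correctly labeled: the parser decodes the bytes as declared.
print(ET.fromstring(honest).text)    # café

# Same bytes, wrong label: 0xE9 followed by "<" is not valid UTF-8,
# so expat (the parser behind ElementTree) reports it as not well-formed.
try:
    ET.fromstring(mislabeled)
except ET.ParseError as err:
    print("rejected:", err)
```

Using XML didn't solve anything here; the producer still had to get the label right.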

All in all, the issues Sam is talking about are pretty obscure, and appeal mostly to XML wonks, but that doesn't make them any less valid.  The reality is we've all learned to deal with them most of the time, just like we're used to IE fixing up our bad HTML tables.

Wednesday, October 20, 2004 3:51:23 PM (Pacific Daylight Time, UTC-07:00)
# Monday, October 18, 2004

Just two more days until the Dev Con.  I had a great time both speaking at and attending last year, so I'm looking forward to another exciting time.  Scott and I will be talking about some of the things we're doing at work around XML Schema and using "contract first" coding in a non-Web Services context. 

I hear there are still a few more open seats...

Monday, October 18, 2004 6:21:17 PM (Pacific Daylight Time, UTC-07:00)

Being a non-CS degree holder (I can tell you all about Chinese vs. Japanese Buddhism, though), I've always been a bit intimidated by the idea of parser/compiler building.  Luckily, there's Coco/R.  I'm intimidated no longer!  I've been playing around with creating a custom scripting language for some of the code generation we're doing, and this turned out to be a really easy way to parse/compile the scripts.  Coco/R is distributed under the GPL, and source is available.  There are versions for both C# and Java. 

I was really impressed at how easy it was.  Basically, you write an EBNF definition of the files you want to parse, then annotate it with native (C# or Java) code that does the compilation.  Here's an example from the sample that comes with the distribution:

```
MulOp<out int op>
=                        (. op = -1; .)
  ( '*'                  (. op = times; .)
  | '/'                  (. op = slash; .)
  ).

RelOp<out int op>
=                        (. op = -1; .)
  ( '='                  (. op = equ; .)
  | '<'                  (. op = lss; .)
  | '>'                  (. op = gtr; .)
  ).
```

The EBNF is on the left, and the native code, wrapped in `(. ... .)` semantic-action brackets, is on the right.  Super easy, and the parsers work great.  Very fast.  They're also easy to debug: the generated code is cleanly laid out and mirrors the EBNF constructs, so stepping through a parse is straightforward.
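Coco/R generates a recursive-descent parser with roughly one method per production, which is why the output is so easy to follow. The hand-written Python sketch below mirrors the MulOp and RelOp productions above to show that shape. It is not Coco/R's actual output (which is C# or Java built around its own scanner), and the token constants are hypothetical stand-ins for `times`, `slash`, and friends.

```python
# Hypothetical token codes standing in for the sample's times/slash/equ/lss/gtr.
TIMES, SLASH, EQU, LSS, GTR = range(5)


class Parser:
    """Recursive-descent parser: one method per grammar production."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        # Lookahead: the current token, or None at end of input
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        # Consume the lookahead token
        self.pos += 1

    def syntax_error(self, production):
        raise SyntaxError(f"invalid {production}: {self.peek()!r}")

    # MulOp<out int op> = ( '*' | '/' ).
    def mul_op(self):
        op = -1                  # (. op = -1; .)
        if self.peek() == "*":
            self.advance()
            op = TIMES           # (. op = times; .)
        elif self.peek() == "/":
            self.advance()
            op = SLASH           # (. op = slash; .)
        else:
            self.syntax_error("MulOp")
        return op

    # RelOp<out int op> = ( '=' | '<' | '>' ).
    def rel_op(self):
        op = -1                  # (. op = -1; .)
        if self.peek() == "=":
            self.advance()
            op = EQU             # (. op = equ; .)
        elif self.peek() == "<":
            self.advance()
            op = LSS             # (. op = lss; .)
        elif self.peek() == ">":
            self.advance()
            op = GTR             # (. op = gtr; .)
        else:
            self.syntax_error("RelOp")
        return op


p = Parser(["*", "<"])
print(p.mul_op() == TIMES, p.rel_op() == LSS)   # True True
```

Each alternative in the EBNF becomes a branch, and each semantic action lands right where it appears in the grammar, so a debugger breakpoint in a method tells you exactly which production you're in.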

If you ever find yourself needing to do some parsing, check it out.

Monday, October 18, 2004 11:32:36 AM (Pacific Daylight Time, UTC-07:00)
# Friday, October 08, 2004

Turns out that if you have multiple versions of the same assembly in your probing path, only one of them will ever get loaded.  Here's the deal...

I've got an app whose directory contains myassembly.dll v1.0.0.1 and myapp.exe.

It has a subdirectory, "new", that contains myassembly.dll v1.0.0.2, plus a new assembly, newassembly.dll, which depends on v1.0.0.2 of myassembly.dll.

In myapp.exe.config, I've included the "new" subdirectory in the application's private path, meaning that the runtime will also look there for assemblies when doing assembly binding:

```xml
<!-- inside <configuration><runtime> in myapp.exe.config -->
<assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
  <probing privatePath="new;"/>
</assemblyBinding>
```

When the application loads newassembly.dll, I would have thought that the runtime would load the version of myassembly.dll that newassembly.dll was built against (v1.0.0.2), while the application itself would load the version it was bound to (v1.0.0.1).  Alas, that turns out not to be the case.  Fusion comes back with:

```
LOG: Post-policy reference: myassembly, Version=, Culture=neutral, PublicKeyToken=xxxyyyzzz
LOG: Cache Lookup was unsuccessful.
LOG: Attempting download of new URL file:///C:/myapp/myassembly.DLL.
LOG: Assembly download was successful. Attempting setup of file: C:\myapp\myassembly.DLL
LOG: Entering run-from-source setup phase.
WRN: Comparing the assembly name resulted in the mismatch: Build Number
ERR: The assembly reference did not match the assembly definition found.
ERR: Failed to complete setup of assembly (hr = 0x80131040). Probing terminated.
```

We had assumed that Fusion would skip the wrong version and keep probing for the right one.  It looks like it just finds the wrong one and bails.  Of course, if we put both versions in the GAC, it works just like we would expect, which makes sense, that being what the GAC is for and everything. :-)
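Here's a deliberately simplified Python model of the difference between what we assumed and what the log shows. It only models "check the application base, then each privatePath directory in order"; real Fusion probing checks more locations (and, for strong-named assemblies, the GAC before probing at all), so treat the function names and the file-layout dictionary as hypothetical.

```python
# Hypothetical file layout: (directory, simple name) -> assembly version.
# "." is the application base; "new" is the privatePath subdirectory.
FILES = {
    (".", "myassembly"): "1.0.0.1",
    ("new", "myassembly"): "1.0.0.2",
}
SEARCH_DIRS = [".", "new"]  # app base first, then privatePath entries


def probe_observed(name, wanted, dirs=SEARCH_DIRS, files=FILES):
    """What the log shows: the first file with a matching name is
    checked, and a version mismatch terminates probing."""
    for d in dirs:
        found = files.get((d, name))
        if found is None:
            continue  # no file with that name here; keep probing
        if found != wanted:
            raise LookupError(
                f"{d}/{name}.dll is {found}, wanted {wanted}: probing terminated")
        return f"{d}/{name}.dll"
    raise FileNotFoundError(f"{name}.dll not found")


def probe_assumed(name, wanted, dirs=SEARCH_DIRS, files=FILES):
    """What we expected: skip mismatched versions and keep looking."""
    for d in dirs:
        if files.get((d, name)) == wanted:
            return f"{d}/{name}.dll"
    raise FileNotFoundError(f"{name}.dll {wanted} not found")


# myapp.exe wants 1.0.0.1: found in the app base either way.
print(probe_observed("myassembly", "1.0.0.1"))   # ./myassembly.dll
# newassembly.dll wants 1.0.0.2: only the assumed model finds it.
print(probe_assumed("myassembly", "1.0.0.2"))    # new/myassembly.dll
try:
    probe_observed("myassembly", "1.0.0.2")
except LookupError as err:
    print(err)
```

The observed model stops at the first name match in the app base, which is exactly the "Probing terminated" behavior in the log.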

Friday, October 08, 2004 2:20:04 PM (Pacific Daylight Time, UTC-07:00)
# Tuesday, October 05, 2004

I may just have a new favorite XML editor.  I caught wind of Stylus Studio 6 [via Mike Gunderloy] so I downloaded a trial copy and checked it out.  Wow.  I'm pretty impressed.  It's the same price as XMLSpy Pro, but includes support for XPath (v1 and v2), XQuery, Web Services testing, and a pretty good schema-to-schema mapping tool that creates XSLT files.  Plus it has a schema editor which looks pretty good, lots of data conversion tools, support for custom extensions (if you have your own file types), etc.  Lots of good stuff here.

What is even cooler is that they have a "Home" version for non-commercial use that has almost all of the features of the pro version (unlike the pretty well crippled XMLSpy Home) for only $50.  I'll definitely turn my students on to this next week.  That's a lot of functionality for very little money.  The schema editor in the Home version isn't quite as cool, and there are a few other features it doesn't support, like web services testing, but it looks otherwise pretty highly functional. 

If you don't care about the WSDL editor, there might be a lot to recommend in the Pro version over XMLSpy Enterprise, at about 1/3 of the price.

Tuesday, October 05, 2004 1:27:23 PM (Pacific Daylight Time, UTC-07:00)
# Tuesday, September 14, 2004
I've got six three, if anyone wants one.
Tuesday, September 14, 2004 11:52:53 AM (Pacific Daylight Time, UTC-07:00)
# Tuesday, September 07, 2004

By now everyone has heard that WinFS (the new SQL-based, meta-data driven file system) won't be shipping with Longhorn.  It's not really surprising.  Not only is it a fairly challenging technology, but the surrounding behavioral issues are, IMHO, an even bigger deal, and will take a good long while to resolve. 

The reason meta-data based solutions haven't dominated the world has nothing to do with technology.  RDF works just fine.  So do XSD and Web Services.  So does SQL Server, so I'm convinced that the technical hurdles to achieving WinFS are solvable.  The trouble is getting people to use it.  People just don't get it.  Nobody uses RDF, in part because it's way too complicated, but also because most people just don't get meta-data.  It's hard enough to get people to use proper keywords on their HTML pages. 

Similarly, the reason that Web Services have yet to revolutionize the world of B2B eCommerce has nothing to do with technology.  The parts of the technical picture that aren't solved by SOAP/WSDL are quickly being addressed by WS-*.  The real issue is schemas.  The barrier to real B2B isn't security, or trust, or routing/addressing, or even federation.  It's the fact that no two companies in the entire world can agree on what a PO looks like.  The barriers are institutional, not technical.

The same thing applies to WinFS.  Even if the technical side can be made to work reliably (of which I have no doubt, given enough time), it's the institutional issues that are hard.  What do you call the tags that get applied to your file system?  If any applications are going to take real advantage of them, they have to be agreed upon in common.  Anyone remember BizTalk.org?  It's not easy to reach consensus on what seems like a simple problem.  What do you call the meta-tags that are applied to your data?  It's great that you can arbitrarily add new tags through Explorer, but if no application besides Explorer supports them, is it anything more than a great new way to do sorts? 

I think in the long run it's that problem that has delayed the release of WinFS.  There has to be a plan for handling the institutional issues in place first, or MS will end up with another great piece of technology that no one knows what to do with.

Tuesday, September 07, 2004 12:55:56 PM (Pacific Daylight Time, UTC-07:00)

Many who know me know that I'm a long-time member of the SCA.  For the rest of you, it shouldn't come as too much of a surprise, 'cause I'm just that big a dork.  Anyway, we just found out this weekend that my wife Vikki (or Svava as she's known in SCA circles) will be getting her Laurel in January (assuming she doesn't blow it between now and then :-) ).  For you non-SCA types, that's equivalent to Knighthood, only for Arts & Sciences.  In other words, a big deal.  She pretty much rocks.  Actually, she totally rocks.  You go honey!

I realize 99% of you reading this probably neither know nor care what I'm talking about, but I couldn't resist the opportunity for the shout-out.

Tuesday, September 07, 2004 12:43:34 PM (Pacific Daylight Time, UTC-07:00)

Despite the lack of technical content here lately, I do actually still work for a living.  I've been working on some implementation stuff lately, and have been struck by something that I've known for a while, but that always hits home in a big way when I have to use someone else's API.  There's no substitute for use cases!  I'm working with an API right now that was obviously designed by someone who thought it seemed like a good idea at the time, but didn't spend any time thinking about how this API was actually going to be used in the real world.  The end result is that to actually use it, you have to go through way too many steps, each of which takes a different set of parameters that you may or may not already have.  It's very frustrating. 

I'm working with Scott on most of it, and from the very beginning of our involvement he predicted that we'd end up spending 6 hours a day reading docs and 2 hours coding to achieve the desired end result.  It's actually been more like 7/1.  I spend all day reading docs and trying to figure out how the API is supposed to work, then write 20 lines of code to solve the problem. 

My own solution to this problem when I've been on the API writing side has been TDD.  If I can write a test case, it at least forces me to think about how the API will be used enough to avoid some of the major pitfalls.  On a larger scale project, the only solution is full-blown use case analysis.  Write the use cases first, then figure out what the API should look like.  Too often a hard-core technologist designs the API in the way he feels best exemplifies the underlying data structures.  The problem is that the users almost never care about the nature of the underlying data structures.  They need high-level methods that answer questions, not CRUD-based data exposition.  At the very least, TDD forces us to think through what some of those questions might be.  It's easy to miss some, however, so you really need to do use cases to find out what all the questions are before you write the API.
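To illustrate the TDD point, here's a minimal Python sketch around a hypothetical billing API (the `Billing` class and its methods are invented for illustration, not taken from any real API). The test is written first, in terms of the question a caller actually asks ("what does this customer owe?"), and that pushes the design toward a single high-level `outstanding_balance` method instead of CRUD accessors the caller would have to fetch, filter, and sum by hand.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    customer: str
    amount_cents: int
    paid: bool = False


class Billing:
    """Hypothetical API whose surface was driven by the test below."""

    def __init__(self):
        self._invoices = []  # internal storage; callers never touch it

    def bill(self, customer, amount_cents):
        self._invoices.append(Invoice(customer, amount_cents))

    def settle(self, customer):
        # Mark every invoice for this customer as paid
        for inv in self._invoices:
            if inv.customer == customer:
                inv.paid = True

    def outstanding_balance(self, customer):
        # Answers the caller's question in one call
        return sum(inv.amount_cents for inv in self._invoices
                   if inv.customer == customer and not inv.paid)


def test_outstanding_balance_use_case():
    # Written before Billing existed: it describes how callers want to work.
    billing = Billing()
    billing.bill("acme", 10_000)
    billing.bill("acme", 5_000)
    billing.bill("globex", 1_000)
    assert billing.outstanding_balance("acme") == 15_000
    billing.settle("acme")
    assert billing.outstanding_balance("acme") == 0
    assert billing.outstanding_balance("globex") == 1_000


if __name__ == "__main__":
    test_outstanding_balance_use_case()
    print("use-case test passed")
```

The internal list of `Invoice` records could change completely without breaking the test, because the test only talks about the questions, not the data structures.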

Tuesday, September 07, 2004 10:34:26 AM (Pacific Daylight Time, UTC-07:00)