I’ve been slowly working on a tutorial for using entity resolution catalogs that I promised the OASIS Entity Resolution Technical Committee (ERTC) I’d do (I chair the TC). As befits a proper tutorial, I figured I should test out the bits as I’m writing them in more than one implementation, just so I can warn people of the potential pitfalls. This has proven to be a frustrating experience for me, and I can see why so many people say technology is just too hard.
The catalog specification itself is fine; not widely implemented enough in my opinion (for example, MSXML doesn’t support it) but we’re hoping to change that in the TC. Even those tools that do support catalogs often do so piecemeal; for example I’ve tested the catalog support in jEdit and it works as expected — but not in the XSLT plugin. Supporting files that the editor finds aren’t found when you come to transform the document you’re editing. The XSLT transform uses Xalan so I can probably figure out how to make it work with a catalog — but I shouldn’t need to.
And talking of Xalan: this whole system is just too hard. In the tutorial I wanted to use the XML stylesheet PI as an example of the uri element to keep the concepts simple. And Xalan, being part of the Apache XML project, is reasonably widely used. So here are the steps I went through to try to have my stylesheet live somewhere other than the directory the document says it is in.
- Download Xalan and install
- Download Norm Walsh’s entity resolver for Java (so far, so good)
- test with the catalog I’ve been using up till now. No luck
- discover that the JDK that you just installed on the new PC didn’t add itself to the PATH and so the PC is merrily using whatever java executable it found in the Windows system directory. Sigh. Add the Java bin directory by hand to the PATH environment variable.
- the catalog still isn’t being found. Hunt around on Google for potential answers
- Discover that the JDK 1.4.2 comes with its own version of Xalan, which is too old, and doesn’t allow you to reset the resolver with command line options, and you have to tell Java that the Xalan directory is “endorsed” (whose bright idea was this? Just overwriting the jar files with the new set should be sufficient, but no, you have to tell java as well that you really meant for the new Xalan to be used).
- This doesn’t work either, but the error messages are different now, so something changed… discover that you also have to put the resolver jar into the same “endorsed” directory
- the catalog still doesn’t work, and running the “verbose” option on the resolver gives you more details… now you’ve run into what I consider a bug in SAX (for details, see Leigh Dodds’s article), which means you have to rewrite the entity catalog to put in absolute URIs where you had relative URIs before, since SAX only passes absolute URIs (this does somewhat lower the usefulness of catalogs, since you can’t move your directories around as easily any more). Change all the catalogs (but not the ones for the other applications, which don’t require this).
- Now the basic catalog works, so let’s try it with the stylesheet. Command line at this stage is
java -Djava.endorsed.dirs="c:\java\xalan-j_2_6_0\bin" org.apache.xalan.xslt.Process -in example.xml -xsl example.xsl -out example.html -URIRESOLVER org.apache.xml.resolver.tools.CatalogResolver -ENTITYRESOLVER org.apache.xml.resolver.tools.CatalogResolver
- Discover that Xalan doesn’t apply the entity resolver to the command line. And using the XML stylesheet PI doesn’t seem to work either.
- Give up on Xalan and resolve to try libxml

Stay tuned for adventures from the libxml world. And when I have sufficient energy I’ll go back and see if I can nail down why I can’t get Xalan to work with the stylesheet PI (it’s meant to, apparently).
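For what it’s worth, here is roughly what the steps above look like if you skip the command line and wire things up in code. This is only a sketch under stated assumptions, not what I actually ran: it uses the standard JAXP API rather than Norm’s resolver, a hypothetical hand-rolled MapUriResolver class stands in for a real catalog, and the catalog directory and the example.xsl entry are made up. It first prints which JVM and which XSLT implementation were picked up (my first two stumbling blocks), then shows a relative catalog-style entry being made absolute at lookup time.

```java
import java.net.URI;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

class CatalogSketch {

    // Hypothetical stand-in for a catalog: each entry maps a requested href
    // to a target, much as a uri entry does in a real catalog file.
    static class MapUriResolver implements URIResolver {
        private final URI catalogBase;
        private final Map<String, String> entries;

        MapUriResolver(URI catalogBase, Map<String, String> entries) {
            this.catalogBase = catalogBase;
            this.entries = entries;
        }

        @Override
        public Source resolve(String href, String base) {
            String target = entries.get(href);
            if (target == null) {
                return null; // no entry: the processor falls back to its own lookup
            }
            // Relative targets are made absolute against the catalog's own location.
            return new StreamSource(catalogBase.resolve(target).toString());
        }
    }

    public static void main(String[] args) {
        // Which JVM and which XSLT implementation actually got picked up?
        System.out.println("java.home = " + System.getProperty("java.home"));
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println("factory   = " + factory.getClass().getName());

        MapUriResolver resolver = new MapUriResolver(
                URI.create("file:///c:/java/catalogs/"),      // made-up catalog directory
                Map.of("example.xsl", "styles/example.xsl")); // made-up entry

        // JAXP uses this resolver for document(), xsl:import and xsl:include.
        factory.setURIResolver(resolver);

        System.out.println("resolved  = "
                + resolver.resolve("example.xsl", null).getSystemId());
    }
}
```

On a recent JDK this should run as a single file with java CatalogSketch.java. Keeping the entries relative to the catalog’s own location means the directories can still be moved around, which is exactly what I lost when I had to hard-code absolute URIs.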
Seriously, it’s no wonder people say technology is just too hard. I have the advantage of being reasonably technical, and knowing people like Norm Walsh who told me about the SAX problems. How are people without those connections meant to be able to put all this together?