Painless XML Schema typing with Zorba

Posted 1 year ago

Providing a schema for your data makes your applications much more robust and can enable numerous optimizations in your query processor. However, many users shy away from the complexity of XML Schema. Well, what if you could generate a schema automatically from your data? Or what if you could generate a sample instance from a big schema and modify it with your data in just a few lines of code? Now you can!

The schema-tools module contains functions to go in both directions between instances and schemas: xsd2inst generates a sample instance from a set of schemas, and inst2xsd generates a schema from a set of instances.

Let’s dive in.

From schema to instance

This is quite useful when trying out a web-service API described by a schema, or when generating data for unit or performance testing. In other formats, such as WSDL or, more specifically, XBRL, the documents themselves contain schema definitions for some of their parts. In these cases xsd2inst helps with generating the documents needed to use such web services.

Let’s look at an example: finding out the location of an IP address using the GeoIPService. The GeoIP service has a WSDL that contains a link to the schema of a valid input document. To find out what the input document should look like, we can write the following:

import module namespace st = "http://www.zorba-xquery.com/modules/schema-tools";
import module namespace http = "http://expath.org/ns/http-client";

declare namespace sto = "http://www.zorba-xquery.com/modules/schema-tools/schema-tools-options";


let $xsd := http:send-request((), "http://www.restfulwebservices.net/wcf/GeoIPService.svc?xsd=xsd0")[2]/*

let $opt :=
  <sto:xsd2inst-options>
    <sto:network-downloads>true</sto:network-downloads>
  </sto:xsd2inst-options>
return
  st:xsd2inst(($xsd), "Analyse", $opt)

Getting this result:

<?xml version="1.0" encoding="UTF-8"?>
<ns:Analyse xmlns:ns="http://www.restfulwebservices.net/ServiceContracts/2008/01">
  <!--Optional:-->
  <ns:request>string</ns:request>
</ns:Analyse>

http:send-request() was used to retrieve the schema and st:xsd2inst() generated a valid instance for it.
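
The [2] in the query above picks the response body out of the sequence that http:send-request() returns; the first item is an http:response element carrying the status and headers. A minimal sketch of checking that status before using the body, assuming the service is reachable. The error name local:http-error is made up for this illustration and is not part of the original example:

```xquery
import module namespace http = "http://expath.org/ns/http-client";

let $uri := "http://www.restfulwebservices.net/wcf/GeoIPService.svc?xsd=xsd0"
(: item 1 is the http:response element, item 2 is the parsed body :)
let $response := http:send-request((), $uri)
let $status := xs:integer($response[1]/@status)
return
  if ($status eq 200)
  then $response[2]/*
  else
    (: local:http-error is a hypothetical error name chosen for this sketch :)
    fn:error(xs:QName("local:http-error"),
             fn:concat("GET ", $uri, " returned status ", $status))
```

Failing early like this avoids passing an HTML error page to st:xsd2inst() when the service is down.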

Now, using XQuery Update’s copy/modify/return construct, we can modify this instance to put a valid IP address into the <request> element.

let $xsd := http:send-request((), "http://www.restfulwebservices.net/wcf/GeoIPService.svc?xsd=xsd0")[2]/*
let $opt :=
  <sto:xsd2inst-options>
    <sto:network-downloads>true</sto:network-downloads>
  </sto:xsd2inst-options>
let $sampleInput :=
  st:xsd2inst(($xsd), "Analyse", $opt)
return
  copy $d := $sampleInput
  modify
  (
    replace value of node $d//*:request with "2.2.2.2"
  )
  return
    $d

Returning:

<?xml version="1.0" encoding="UTF-8"?>
<ns:Analyse xmlns:ns="http://www.restfulwebservices.net/ServiceContracts/2008/01">
  <!--Optional:-->
  <ns:request>2.2.2.2</ns:request>
</ns:Analyse>
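
The copy/modify/return step can also be tried on its own, without any network access. The following is a small sketch that assumes the sample instance has the shape shown above and writes it inline instead of generating it; it uses a namespace-qualified path rather than the *: wildcard:

```xquery
declare namespace ns = "http://www.restfulwebservices.net/ServiceContracts/2008/01";

(: inline stand-in for the instance generated by st:xsd2inst() :)
let $sampleInput :=
  <ns:Analyse>
    <ns:request>string</ns:request>
  </ns:Analyse>
return
  (: $d is a fresh copy; $sampleInput itself is left untouched :)
  copy $d := $sampleInput
  modify replace value of node $d/ns:request with "2.2.2.2"
  return $d
```

The result should be an Analyse element whose request child contains 2.2.2.2. Because the update is applied to a copy, the original sample can be reused to build several different requests.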

With the request message constructed we can call the GeoIP web-service. Everything put together should look like this:

import module namespace st = "http://www.zorba-xquery.com/modules/schema-tools";
import module namespace http = "http://expath.org/ns/http-client";

declare namespace sto = "http://www.zorba-xquery.com/modules/schema-tools/schema-tools-options";
declare namespace soap = "http://schemas.xmlsoap.org/soap/envelope/";

(: 1. fetch the schema of the input message :)
let $xsd := http:send-request((), "http://www.restfulwebservices.net/wcf/GeoIPService.svc?xsd=xsd0")[2]/*
let $opt :=
  <sto:xsd2inst-options>
    <sto:network-downloads>true</sto:network-downloads>
  </sto:xsd2inst-options>
(: 2. generate a sample instance; /* selects its root element :)
let $sampleInput :=
  st:xsd2inst(($xsd), "Analyse", $opt)/*
(: 3. put a real IP address into the sample :)
let $input :=
  copy $d := $sampleInput
  modify
  (
    replace value of node $d//*:request with "2.2.2.2"
  )
  return
    $d

(: 4. describe the HTTP request; the body content is passed separately below :)
let $req :=
  <http:request method="POST">
    <http:header name="SOAPAction" value="Analyse"/>
    <http:body media-type="text/xml">
    </http:body>
  </http:request>

(: 5. wrap the input message in a SOAP envelope :)
let $soap :=
  <soap:Envelope>
    <soap:Header/>
    <soap:Body>
      {$input}
    </soap:Body>
  </soap:Envelope>

return
  http:send-request($req, "http://www.restfulwebservices.net/wcf/GeoIPService.svc", $soap)[2]

This returns the following result:

<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
  <s:Body>
    <AnalyseResponse xmlns="http://www.restfulwebservices.net/ServiceContracts/2008/01">
      <AnalyseResult xmlns:a="http://www.restfulwebservices.net/DataContracts/2008/01" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
        <a:Registry>ARIN</a:Registry>
        <a:Assigned>772588800</a:Assigned>
        <a:CountryCode>US</a:CountryCode>
        <a:ISOCountryCode>USA</a:ISOCountryCode>
        <a:Country>UNITED STATES</a:Country>
      </AnalyseResult>
    </AnalyseResponse>
  </s:Body>
</s:Envelope>

Run it live.

From instances to schema

In many cases a schema is a good way to formally define the contract of an interface, and even the rules for how that contract may change while remaining backwards compatible. In other cases it is faster or simpler to start from an instance. For these cases, the inst2xsd tool is a good way to generate a schema.

Let’s say we are creating our own web service based on GeoIP. The result we got in the last sample is not structured exactly the way we would like, and we want something simpler, like this:

<country isoCode="USA">UNITED STATES</country>

XQuery is made for performing these kinds of transformations and makes it really simple:

let $res :=
  <s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
    ...
  </s:Envelope>

return
  <country isoCode="{fn:data($res//*:ISOCountryCode)}">
    {fn:data($res//*:Country)}
  </country>
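
For readers who want to run the transformation without calling the service, here is a self-contained sketch. It inlines a trimmed copy of the response shown earlier and uses explicit namespace prefixes instead of the *: wildcard; the prefix r is chosen for this illustration only:

```xquery
declare namespace s = "http://schemas.xmlsoap.org/soap/envelope/";
(: r is a prefix picked for this sketch; a matches the response above :)
declare namespace r = "http://www.restfulwebservices.net/ServiceContracts/2008/01";
declare namespace a = "http://www.restfulwebservices.net/DataContracts/2008/01";

(: trimmed copy of the GeoIP response, inlined so no network call is needed :)
let $res :=
  <s:Envelope>
    <s:Body>
      <r:AnalyseResponse>
        <r:AnalyseResult>
          <a:CountryCode>US</a:CountryCode>
          <a:ISOCountryCode>USA</a:ISOCountryCode>
          <a:Country>UNITED STATES</a:Country>
        </r:AnalyseResult>
      </r:AnalyseResponse>
    </s:Body>
  </s:Envelope>
let $result := $res/s:Body/r:AnalyseResponse/r:AnalyseResult
return
  <country isoCode="{fn:data($result/a:ISOCountryCode)}">{
    fn:data($result/a:Country)
  }</country>
```

Explicit paths are a little more verbose than $res//*:Country, but they will not accidentally match an element with the same local name in another namespace.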

Now getting the schema for this structure is straightforward using the inst2xsd function, which generates a schema for a given instance. The options element controls the smart detection of simple types and when to use enumeration values.

import module namespace st = "http://www.zorba-xquery.com/modules/schema-tools";

declare namespace sto = "http://www.zorba-xquery.com/modules/schema-tools/schema-tools-options";


let $inst := <country isoCode="USA">UNITED STATES</country>
let $opt :=
  <sto:inst2xsd-options>
    <sto:simple-content-types>smart</sto:simple-content-types>
    <sto:use-enumeration>2</sto:use-enumeration>
  </sto:inst2xsd-options>
return
  st:inst2xsd($inst, $opt)

This returns the full schema to represent our simple result:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified">
  <xs:element name="country" type="countryType"/>
  <xs:complexType name="countryType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute type="xs:string" name="isoCode"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:schema>

We can use this schema to make the result of our function typed or, if we make it a web-service, to describe the structure of the result. Try it live.
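
As a rough sketch of what a typed result could look like, the query below imports the generated schema and validates the constructed element against it. It assumes the schema above was saved as country.xsd where the query processor can resolve it; that file name and the function local:country are made up for this illustration:

```xquery
(: Assumption: the generated schema was saved as "country.xsd";
   the empty namespace URI means the schema has no target namespace. :)
import schema default element namespace "" at "country.xsd";

(: local:country is a hypothetical helper, not part of the original post :)
declare function local:country($iso as xs:string, $name as xs:string)
  as schema-element(country)
{
  (: validate annotates the element with the schema type countryType :)
  validate { <country isoCode="{$iso}">{$name}</country> }
};

local:country("USA", "UNITED STATES")
```

With the return type declared as schema-element(country), a result that does not conform to the schema is reported as an error instead of being passed on to callers.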

For a more complicated instance let’s look at an example that generates a sample schema for a document found on the web. This document is an instance of SportsML, which has a formal schema defined by IPTC, but for this example let’s pretend there isn’t one.

import module namespace st = "http://www.zorba-xquery.com/modules/schema-tools";
import module namespace http = "http://expath.org/ns/http-client";

declare namespace sto = "http://www.zorba-xquery.com/modules/schema-tools/schema-tools-options";


let $inst :=
  http:send-request((),
    "http://dev.iptc.org/files/SportsML-Examples/sportsml-2.2-soccer-sample.xml")[2]/*

return
  st:inst2xsd($inst, ())

The result is a schema document defining this particular instance. Try it live to see the entire result.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" attributeFormDefault="unqualified" elementFormDefault="qualified" targetNamespace="http://iptc.org/std/SportsML/2008-04-01/">
  <xs:element xmlns:ns="http://iptc.org/std/SportsML/2008-04-01/" name="sports-content" type="ns:sports-contentType"/>
  <xs:complexType name="sports-contentType">
    <xs:sequence>
      <xs:element xmlns:ns="http://iptc.org/std/SportsML/2008-04-01/" type="ns:sports-metadataType" name="sports-metadata"/>
      <xs:element xmlns:ns="http://iptc.org/std/SportsML/2008-04-01/" type="ns:sports-eventType" name="sports-event"/>
    </xs:sequence>
  </xs:complexType>
  ...
</xs:schema>

This generated schema is quite a good starting point for somebody with relatively little XML Schema experience, who can then modify it to suit their needs.

There are cases when the complete schema, or parts of it, must be embedded in the document it describes, as in WSDL or XBRL. These cases require deeper integration and knowledge of the domain.

Conclusion

As we’ve seen in the above examples, tools like inst2xsd and xsd2inst are quite useful in a variety of cases: when prototyping, while working with dynamic schemas, or in generating test samples.

More detailed information can be found in the documentation pages of the schema-tools module.