Sunday 12 July 2009

Validating XML Files against XSD Schemas (especially for files that don’t reference the schema)

This post has moved to http://www.scottleckie.com/2009/07/validating-xml-files-against-xsd-schemas-especially-for-files-that-don%e2%80%99t-reference-the-schema/

Wow, XML is a pig, isn’t it? Don’t get me wrong; it does everything I need it to do in describing multi-faceted data, but it’s a pretty steep learning curve.
At first, I used XPath and a lot of coded validation. Then I finally invested time in learning XML Schemas (XSDs) and that helped a lot because I could validate the entire document and, only when I knew it was valid, start pulling data out of it. At around the same time, I stumbled upon LINQ to XML and this shortened the code substantially.  Now, all I had to do was validate the document against the XSD and, if it passed, get to decoding it via LINQ and we’re done and dusted in a few lines of code.
The XSD validation, by the way, looks like this;
private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";
private static bool ValidateFile(string file, string schemaFile)
{
if (file == null || schemaFile == null)
throw new ArgumentNullException("Must supply non-null file and Schema file to MappingFilesParser.ValidateFile()");
XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add(SCHEMA, schemaFile);
XDocument doc = XDocument.Load(file);
doc.Validate(schemas, (o, e) =>
{
throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
});
return true;
}


So, here you define the schema used in your XML file in the static readonly string called “SCHEMA”, then call ValidateFile(name of XML File, name of schema file).


The function either returns or throws ArgumentNullException or XmlSchemaValidationException.


All good so far. Then I realised that, if you load any random XML file that does not reference your schema file then the .Validate() method will still complete successfully.


What I mean by this is that every XML file has to have an xmlns namespace declaration similar to this;


<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://schemas.axiossystems.com/DDI/SnmpMappings/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ddi="http://schemas.axiossystems.com/DDI/SnmpMappings/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">


If your XML file does not refer to the XSD then the validation passes, which is not what I intended. The solution, as outlined in Scott Hansleman’s blog is to check the namespaces in the XML file and confirm that it does reference our XSD, then run through the validation. So, now the ValidateFile() method looks like this;


private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";
private static bool ValidateFile(string file, string schemaFile)
{
if (file == null || schemaFile == null)
throw new ArgumentNullException("Must supply non-null file and Schema file to MappingFilesParser.ValidateFile()");
logger.InfoFormat("Validating {0} against schema; {1}", file, schemaFile);
XmlSchemaSet schemas = new XmlSchemaSet();
schemas.Add(SCHEMA, schemaFile);
XPathDocument x = new XPathDocument(file);
XPathNavigator nav = x.CreateNavigator();
nav.MoveToFollowing(XPathNodeType.Element);
IDictionary<string, string> schemasInFile = nav.GetNamespacesInScope(XmlNamespaceScope.All);
bool foundOurSchema = false;
foreach (KeyValuePair<string, string> namespaces in schemasInFile)
{
if (namespaces.Value.CompareTo(SCHEMA) == 0)
foundOurSchema = true;
}
if (!foundOurSchema)
throw new XmlSchemaValidationException(string.Format("The file {0} does not reference the required schema; {1}",
file, SCHEMA));
XDocument doc = XDocument.Load(file);
doc.Validate(schemas, (o, e) =>
{
throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
});
return true;
}


Here, we first confirm that the XML file references the XSD, then validate against the XSD.

No comments:

Post a Comment