Sunday, 12 July 2009

Iterating through a bunch of folders and files

This post has moved to http://www.scottleckie.com/2009/07/iterating-through-a-bunch-of-folders-and-files/

So you want to start at a top-level folder, and then process all the folders beneath… Maybe you really do want to look at every file (say, to count the total size of the folder), or maybe you only want to process the XML files there. The most obvious route is to recursively search through each folder:
using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string startFolder = @"C:\temp";
        List<string> contents = new List<string>();
        // Start at the top-level folder itself so that its own files are included
        ProcessFolder(startFolder, contents);
        foreach (string fileName in contents)
            Console.WriteLine(fileName);
        Console.ReadKey();
    }

    static void ProcessFolder(string folder, IList<string> theList)
    {
        // Directory.GetFiles() already returns full paths, so add them as-is
        foreach (string file in Directory.GetFiles(folder))
            theList.Add(file);
        // Recurse into each subfolder
        foreach (string dir in Directory.GetDirectories(folder))
            ProcessFolder(dir, theList);
    }
}


All well and good but, if the folder structure is deep enough, the recursion will eventually run out of stack space. A better way to traverse the folder structure is to do an iterative search:


// AddFolder(), AddFile() and workerThreadInfo belong to the calling application and are not shown here
private void ProcessFolder(string startingPath)
{
    int iterator = 0;
    List<string> dirList = new List<string>();
    dirList.Add(startingPath);
    string parentFolder = startingPath;
    // Every new folder found is added to the list to be searched. Continue until we have
    // found, and reported on, every folder or the calling thread wants us to stop
    while (iterator < dirList.Count && !(workerThreadInfo.StopRequested))
    {
        parentFolder = dirList[iterator];       // Each FileTreeEntry wants to know who its parent is
        try
        {
            foreach (string dir in Directory.GetDirectories(dirList[iterator]))
            {
                AddFolder(parentFolder, dir, dir);
                dirList.Add(dir);
            }
            foreach (string filename in Directory.GetFiles(dirList[iterator]))
            {
                FileInfo file = new FileInfo(filename);
                AddFile(parentFolder, file.Name, file.Length);
            }
        }
        // There are two *acceptable* exceptions that we may see, but should not consider fatal
        catch (UnauthorizedAccessException)
        {
            // Not allowed into this folder; skip it and carry on
        }
        catch (PathTooLongException)
        {
            // Path exceeds the Windows limit; skip it and carry on
        }
        iterator++;
    }
}


Now, we iterate through each discovered folder and, for each discovered file, we call an external routine (in this case “AddFile()”). Note the two caught exceptions which can occur but which, in the author’s opinion, are not important in this context:


  • UnauthorizedAccessException
    • OK; ya got me. I’m not allowed in here, so let’s continue and not break the calling app
  • PathTooLongException
    • This is a funny one. Win200x sets a maximum path length (MAX_PATH) of 260 characters. Create a big and complex structure (especially a Java one), zip it and then unravel it under a folder that is maybe 100 characters long. Windows is happy to unzip this, and even display it in the folder view. But try to open the file and you’ll be stuffed





So, ignoring these, this routine will handle any files within a structure, irrespective of how deep that structure gets.
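
As a self-contained illustration of the same idea, here is a minimal sketch that totals the size of every file under a folder without recursion. The class and method names (FolderWalker, TotalSize) are made up for this example and are not part of the original application; a Queue stands in for the list-plus-index used above, which gives the same breadth-first order.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderWalker
{
    // Hypothetical helper: returns the total size, in bytes, of every
    // readable file at or beneath startingPath.
    public static long TotalSize(string startingPath)
    {
        long total = 0;

        // Folders still waiting to be searched; no recursion, so no stack growth
        var pending = new Queue<string>();
        pending.Enqueue(startingPath);

        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            try
            {
                // Queue up each subfolder for a later pass
                foreach (string dir in Directory.GetDirectories(current))
                    pending.Enqueue(dir);

                // Add up the files directly inside this folder
                foreach (string fileName in Directory.GetFiles(current))
                    total += new FileInfo(fileName).Length;
            }
            catch (UnauthorizedAccessException)
            {
                // Not allowed in here; skip this folder
            }
            catch (PathTooLongException)
            {
                // Path exceeds the Windows limit; skip this folder
            }
        }
        return total;
    }
}
```

Calling FolderWalker.TotalSize(@"C:\temp") would then give the byte count for everything the process is able to read under that folder.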

Validating XML Files against XSD Schemas (especially for files that don’t reference the schema)

This post has moved to http://www.scottleckie.com/2009/07/validating-xml-files-against-xsd-schemas-especially-for-files-that-don%e2%80%99t-reference-the-schema/

Wow, XML is a pig, isn’t it? Don’t get me wrong; it does everything I need it to do in describing multi-faceted data, but it’s a pretty steep learning curve.
At first, I used XPath and a lot of coded validation. Then I finally invested time in learning XML Schemas (XSDs) and that helped a lot because I could validate the entire document and, only when I knew it was valid, start pulling data out of it. At around the same time, I stumbled upon LINQ to XML and this shortened the code substantially.  Now, all I had to do was validate the document against the XSD and, if it passed, get to decoding it via LINQ and we’re done and dusted in a few lines of code.
The XSD validation, by the way, looks like this;
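using System;
using System.Xml.Linq;
using System.Xml.Schema;

private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";

private static bool ValidateFile(string file, string schemaFile)
{
    // ArgumentNullException(string) expects a parameter name, so pass the name and the message separately
    if (file == null || schemaFile == null)
        throw new ArgumentNullException(file == null ? "file" : "schemaFile",
            "Must supply non-null file and Schema file to MappingFilesParser.ValidateFile()");

    // Load the XSD and register it against our namespace
    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(SCHEMA, schemaFile);

    // Validate() is the System.Xml.Schema extension method for XDocument
    XDocument doc = XDocument.Load(file);
    doc.Validate(schemas, (o, e) =>
    {
        // Turn the first validation problem into an exception
        throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
    });
    return true;
}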


So, here you define the schema used in your XML file in the static readonly string called “SCHEMA”, then call ValidateFile(name of XML File, name of schema file).


The function either returns true or throws ArgumentNullException or XmlSchemaValidationException.
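
If you would rather see every problem than stop at the first one, the handler can collect messages instead of throwing. The following is only a sketch under assumptions: the namespace urn:example:mappings, the tiny in-memory schema and the SchemaCheck class are all invented for illustration so that the example runs without any files on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class SchemaCheck
{
    // Hypothetical namespace and schema, used only for this illustration
    public const string Ns = "urn:example:mappings";

    private const string Xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
        targetNamespace='urn:example:mappings' elementFormDefault='qualified'>
      <xs:element name='mapping'>
        <xs:complexType>
          <xs:sequence>
            <xs:element name='oid' type='xs:string'/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>";

    // Returns the validation messages for the given XML text;
    // an empty list means the validator raised no problems.
    public static List<string> Validate(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (XmlReader reader = XmlReader.Create(new StringReader(Xsd)))
            schemas.Add(Ns, reader);

        var problems = new List<string>();
        XDocument doc = XDocument.Parse(xml);

        // Record each message rather than throwing on the first one
        doc.Validate(schemas, (sender, e) => problems.Add(e.Message));
        return problems;
    }
}
```

Passing a document such as `<mapping xmlns='urn:example:mappings'><oid>1.3.6.1</oid></mapping>` should give an empty list, while a document with an unexpected child element in that namespace should give at least one message.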


All good so far. Then I realised that, if you load any random XML file that does not reference your schema's namespace, the .Validate() method will still complete successfully.


What I mean by this is that every XML file has to declare, via xmlns, the namespace that the schema targets. The schema itself names that namespace in its header, similar to this:


<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://schemas.axiossystems.com/DDI/SnmpMappings/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ddi="http://schemas.axiossystems.com/DDI/SnmpMappings/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">


If your XML file does not refer to the XSD's namespace then the validation passes, which is not what I intended. The solution, as outlined in Scott Hanselman’s blog, is to check the namespaces in the XML file and confirm that it does reference our XSD, then run through the validation. So, now the ValidateFile() method looks like this:


private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";

private static bool ValidateFile(string file, string schemaFile)
{
    // ArgumentNullException(string) expects a parameter name, so pass the name and the message separately
    if (file == null || schemaFile == null)
        throw new ArgumentNullException(file == null ? "file" : "schemaFile",
            "Must supply non-null file and Schema file to MappingFilesParser.ValidateFile()");

    // "logger" is a logging object that lives elsewhere in the class and is not shown here
    logger.InfoFormat("Validating {0} against schema; {1}", file, schemaFile);

    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(SCHEMA, schemaFile);

    // Collect the namespaces in scope on the first element of the file
    XPathDocument x = new XPathDocument(file);
    XPathNavigator nav = x.CreateNavigator();
    nav.MoveToFollowing(XPathNodeType.Element);
    IDictionary<string, string> schemasInFile = nav.GetNamespacesInScope(XmlNamespaceScope.All);

    // Confirm that one of them is our schema's namespace (ordinal, not culture-sensitive, comparison)
    bool foundOurSchema = false;
    foreach (KeyValuePair<string, string> ns in schemasInFile)
    {
        if (string.Equals(ns.Value, SCHEMA, StringComparison.Ordinal))
        {
            foundOurSchema = true;
            break;
        }
    }
    if (!foundOurSchema)
        throw new XmlSchemaValidationException(string.Format("The file {0} does not reference the required schema; {1}",
            file, SCHEMA));

    // Only now run the full XSD validation
    XDocument doc = XDocument.Load(file);
    doc.Validate(schemas, (o, e) =>
    {
        throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
    });
    return true;
}


Here, we first confirm that the XML file references the XSD, then validate against the XSD.
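
One possible variation, offered as a sketch and not as the method used above: since the document is loaded with LINQ to XML anyway, the check could look at the namespace of the root element itself. This is a slightly stricter test, because a file can declare a namespace with a prefix without actually placing its root element in it. The NamespaceCheck class and the urn:example:mappings namespace in the usage note are invented for this example.

```csharp
using System;
using System.Xml.Linq;

public static class NamespaceCheck
{
    // Hypothetical helper: true only when the root element of the XML text
    // is actually in requiredNamespace.
    public static bool RootIsInNamespace(string xml, string requiredNamespace)
    {
        XDocument doc = XDocument.Parse(xml);

        // XName.NamespaceName is the namespace URI of the element ("" when it has none)
        return doc.Root != null &&
               string.Equals(doc.Root.Name.NamespaceName, requiredNamespace, StringComparison.Ordinal);
    }
}
```

With this helper, `<mapping xmlns='urn:example:mappings'/>` is accepted for that namespace, whereas `<mapping xmlns:m='urn:example:mappings'/>` is rejected because the root element only declares the prefix and is not itself in the namespace.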