Skot's code and other ponderings

Sunday, 12 July 2009

Iterating through a bunch of folders and files

This post has moved to http://www.scottleckie.com/2009/07/iterating-through-a-bunch-of-folders-and-files/

So you want to start at a top-level folder and then process all the folders beneath it… Maybe you really do want to look at every file (say, to total up the size of the folder), or maybe you only want to process the XML files in there. The most obvious route is to recursively search through each folder:

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string startFolder = @"C:\temp";
        List<string> contents = new List<string>();
        // Start at the top-level folder itself so that its own files are included
        ProcessFolder(startFolder, contents);
        foreach (string fileName in contents)
            Console.WriteLine(fileName);
        Console.ReadKey();
    }

    static void ProcessFolder(string folder, IList<string> theList)
    {
        // Directory.GetFiles() already returns each path with the folder included
        foreach (string file in Directory.GetFiles(folder))
            theList.Add(file);
        // Recurse into each subfolder
        foreach (string dir in Directory.GetDirectories(folder))
            ProcessFolder(dir, theList);
    }
}

All well and good but, at some point (probably due to the depth of the file system), you may run out of stack space. A better way to traverse the folder structure is to do an iterative search:

// workerThreadInfo, AddFolder() and AddFile() are defined elsewhere in the calling application
private void ProcessFolder(string startingPath)
{
    int iterator = 0;
    List<string> dirList = new List<string>();
    dirList.Add(startingPath);
    string parentFolder = startingPath;

    // Every new folder found is added to the list to be searched. Continue until we have
    // found, and reported on, every folder or the calling thread wants us to stop
    while (iterator < dirList.Count && !(workerThreadInfo.StopRequested))
    {
        parentFolder = dirList[iterator];       // Each FileTreeEntry wants to know who its parent is
        try
        {
            foreach (string dir in Directory.GetDirectories(dirList[iterator]))
            {
                AddFolder(parentFolder, dir, dir);
                dirList.Add(dir);
            }
            foreach (string filename in Directory.GetFiles(dirList[iterator]))
            {
                FileInfo file = new FileInfo(filename);
                AddFile(parentFolder, file.Name, file.Length);
            }
        }
        // There are two *acceptable* exceptions that we may see, but should not consider fatal
        catch (UnauthorizedAccessException)
        {
            // Not allowed into this folder; skip it and carry on
        }
        catch (PathTooLongException)
        {
            // Path exceeds the Windows limit; skip it and carry on
        }
        iterator++;
    }
}

Now we iterate through each discovered folder and, for each discovered file, call an external routine (in this case “AddFile()”). Note the two caught exceptions which can occur but which, in the author’s opinion, are not important in this context:

UnauthorizedAccessException

OK; ya got me. I’m not allowed in here, so let’s continue and not break the calling app

PathTooLongException

This is a funny one. Windows sets a maximum path length (MAX_PATH) of 260 characters. Create a big and complex structure (especially a Java one), zip it and then unravel it under a folder whose path is maybe 100 characters long. Windows is happy to unzip this, and even display it in the folder view. But try to open the file and you’ll be stuffed.

So, ignoring these, this routine will handle any files within a structure, irrespective of how deep that structure gets.
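As a small worked example of the same idea, here is a minimal sketch that totals the size of everything under a folder (the "count the total size" case mentioned at the top). The class and method names, FolderSize and GetTotalBytes, are made up for this example, and it uses a Queue rather than the indexed list above; it is one way to do it under those assumptions, not the FileTreeView code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FolderSize
{
    // Returns the total size, in bytes, of every readable file under root.
    // Folders we cannot enter, or whose paths are too long, are skipped.
    public static long GetTotalBytes(string root)
    {
        long total = 0;
        Queue<string> pending = new Queue<string>();
        pending.Enqueue(root);

        // Keep going until every discovered folder has been visited
        while (pending.Count > 0)
        {
            string current = pending.Dequeue();
            try
            {
                foreach (string dir in Directory.GetDirectories(current))
                    pending.Enqueue(dir);

                foreach (string path in Directory.GetFiles(current))
                    total += new FileInfo(path).Length;
            }
            catch (UnauthorizedAccessException)
            {
                // Not allowed in here; skip this folder
            }
            catch (PathTooLongException)
            {
                // Path exceeds the Windows limit; skip this folder
            }
        }
        return total;
    }
}
```

Calling FolderSize.GetTotalBytes(@"C:\temp") would then give the combined size of whatever it was allowed to read. Because the pending folders live in a queue on the heap, the depth of the tree has no bearing on stack usage.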

Validating XML Files against XSD Schemas (especially for files that don’t reference the schema)

This post has moved to http://www.scottleckie.com/2009/07/validating-xml-files-against-xsd-schemas-especially-for-files-that-don%e2%80%99t-reference-the-schema/

Wow, XML is a pig, isn’t it? Don’t get me wrong; it does everything I need it to do in describing multi-faceted data, but it’s a pretty steep learning curve.
At first, I used XPath and a lot of coded validation. Then I finally invested time in learning XML Schemas (XSDs) and that helped a lot because I could validate the entire document and, only when I knew it was valid, start pulling data out of it. At around the same time, I stumbled upon LINQ to XML and this shortened the code substantially. Now, all I had to do was validate the document against the XSD and, if it passed, get to decoding it via LINQ and we’re done and dusted in a few lines of code.
The XSD validation, by the way, looks like this;

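// Requires: using System; using System.Xml.Linq; using System.Xml.Schema;
private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";

private static bool ValidateFile(string file, string schemaFile)
{
    // ArgumentNullException(string) expects a parameter name, so use the (paramName, message) overload
    if (file == null)
        throw new ArgumentNullException("file", "Must supply non-null file to MappingFilesParser.ValidateFile()");
    if (schemaFile == null)
        throw new ArgumentNullException("schemaFile", "Must supply non-null Schema file to MappingFilesParser.ValidateFile()");

    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(SCHEMA, schemaFile);

    XDocument doc = XDocument.Load(file);
    // The handler is called on a validation error; throwing from it stops at the first one
    doc.Validate(schemas, (o, e) =>
    {
        throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
    });
    return true;
}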

So, here you define the schema namespace used in your XML file in the static readonly string called “SCHEMA”, then call ValidateFile(name of XML File, name of schema file).

The function either returns or throws ArgumentNullException or XmlSchemaValidationException.

All good so far. Then I realised that, if you load any random XML file that does not reference your schema file then the .Validate() method will still complete successfully.

What I mean by this is that every XML file has to have an xmlns namespace declaration for the schema’s target namespace, similar to the xmlns:ddi declaration in the schema’s own header;

<?xml version="1.0" encoding="utf-8"?>
<xs:schema targetNamespace="http://schemas.axiossystems.com/DDI/SnmpMappings/"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ddi="http://schemas.axiossystems.com/DDI/SnmpMappings/"
elementFormDefault="qualified"
attributeFormDefault="unqualified">

If your XML file does not refer to the XSD then the validation passes, which is not what I intended. The solution, as outlined in Scott Hanselman’s blog, is to check the namespaces in the XML file and confirm that it does reference our XSD, then run through the validation. So, now the ValidateFile() method looks like this:

// Requires: using System; using System.Collections.Generic; using System.Xml;
//           using System.Xml.Linq; using System.Xml.Schema; using System.Xml.XPath;
// "logger" is assumed to be defined elsewhere in the class
private static readonly string SCHEMA = "http://schemas.axiossystems.com/DDI/SnmpMappings/";

private static bool ValidateFile(string file, string schemaFile)
{
    if (file == null)
        throw new ArgumentNullException("file", "Must supply non-null file to MappingFilesParser.ValidateFile()");
    if (schemaFile == null)
        throw new ArgumentNullException("schemaFile", "Must supply non-null Schema file to MappingFilesParser.ValidateFile()");
    logger.InfoFormat("Validating {0} against schema; {1}", file, schemaFile);

    XmlSchemaSet schemas = new XmlSchemaSet();
    schemas.Add(SCHEMA, schemaFile);

    // Collect the namespaces that are in scope on the first element in the file
    XPathDocument x = new XPathDocument(file);
    XPathNavigator nav = x.CreateNavigator();
    nav.MoveToFollowing(XPathNodeType.Element);
    IDictionary<string, string> schemasInFile = nav.GetNamespacesInScope(XmlNamespaceScope.All);

    bool foundOurSchema = false;
    foreach (KeyValuePair<string, string> ns in schemasInFile)
    {
        // Namespace URIs are compared character for character
        if (string.Equals(ns.Value, SCHEMA, StringComparison.Ordinal))
            foundOurSchema = true;
    }
    if (!foundOurSchema)
        throw new XmlSchemaValidationException(string.Format("The file {0} does not reference the required schema; {1}",
            file, SCHEMA));

    XDocument doc = XDocument.Load(file);
    doc.Validate(schemas, (o, e) =>
    {
        throw new XmlSchemaValidationException(string.Format("{0} validating {1}", e.Message, file));
    });
    return true;
}

Here, we first confirm that the XML file references the XSD, then validate against the XSD.
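To see why the namespace check matters, here is a small self-contained sketch. Everything in it is made up for illustration: the http://example.com/demo/ namespace, the one-element schema and the NamespaceCheckDemo class. Validate() mirrors the validation-only approach, collecting messages instead of throwing, while ReferencesSchema() is a LINQ to XML take on the namespace check rather than the XPathNavigator code above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class NamespaceCheckDemo
{
    // Hypothetical namespace and schema, invented for this illustration
    public const string DemoNamespace = "http://example.com/demo/";

    private const string DemoXsd =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'" +
        " targetNamespace='http://example.com/demo/' elementFormDefault='qualified'>" +
        "<xs:element name='count' type='xs:int'/>" +
        "</xs:schema>";

    // Validates the XML text against the demo schema and returns any messages reported
    public static List<string> Validate(string xml)
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(DemoNamespace, XmlReader.Create(new StringReader(DemoXsd)));

        List<string> errors = new List<string>();
        XDocument.Parse(xml).Validate(schemas, (o, e) => errors.Add(e.Message));
        return errors;
    }

    // True if the root element is in, or declares a prefix for, the demo namespace
    public static bool ReferencesSchema(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        return root.Name.NamespaceName == DemoNamespace
            || root.Attributes().Any(a => a.IsNamespaceDeclaration && a.Value == DemoNamespace);
    }
}
```

With this in place, Validate("<unrelated>three</unrelated>") should come back with no errors at all, even though the document has nothing to do with the schema, whereas ReferencesSchema() returns false for it. A document that is in the namespace but has bad content, such as a count of "three", is reported by Validate() as you would expect.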

Monday, 8 June 2009

FileTreeView – a SequoiaView-like Application

This post has moved to http://www.scottleckie.com/2009/06/filetreeview-%e2%80%93-a-sequioaview-like-application/

I've long been a huge fan of the SequoiaView application released by Technische Universiteit Eindhoven, which displays disk utilization in a beautiful squarified cushion treemap format.
This was released in 2002 and does a great job of showing exactly what's eating the space on your disk, but it has one major drawback; if you point it at a 2TB volume with a million files, but you only want to see what's taking the space in a small corner of the disk, it reads the entire volume before displaying what you originally asked it to.
So, I decided to write a C# alternative to SequoiaView, partly to help us find the big files in specific folders really quickly, and partly just as an exercise.

You can download the setup from here. Do be aware that this requires the .NET Framework 2.0 or higher and, while the setup is supposed to go and fetch this if required, it does seem a little flaky. In other words, it’s probably safer to ensure you have the .NET Framework 2.0 or higher installed before starting.
You also need to install the Microsoft Data Visualization Components, which are available here.

Source

I’m going to be uploading the source, plus a background article, to CodeProject shortly. I’ll come back and update this link then.

Using FileTreeView

Refer to the following picture for each of the components within FileTreeView;

Enter the path or folder that you want to display here, or press the "..." button to browse, then press "Go"
The number of folders and files found so far will be displayed here. You can press "Cancel" if you get bored and it will display what it has so far
This rather nifty set of colours (actually seven little label controls) allows you to set the node colours
By default, the names of the files and folders will be displayed at all depths (which means from the top folder down to the deepest folder). Use this slider to de-clutter the display by displaying labels only to the depth of your choosing
This is the TreeMap control, which displays all discovered folders and their contents, grouped by the relative size of each folder
Each folder or file is a node. You can hover over a node to see its details, double-click to drill down or right-click for more options
Right clicking a node allows you to display it in Windows Explorer, open it, or zoom in and out of the tree structure

Licence

The FileTreeView application is open source, under the CodeProject Open License. Note that the Microsoft Data Visualization Components are free to use for non-commercial use only. See here for specific licence terms.

Help

If you have any suggestions or issues, please post a comment to this article.

Friday, 15 May 2009

Horrible error running against a 32bit .Net Library from a 64bit application

This post has moved to http://www.scottleckie.com/2009/05/horrible-error-running-against-a-32bit-net-library-from-a-64bit-application/

So, this is my first new dev project on my shiny new Windows 7 64bit machine (having come from a 32bit Vista box). As it happens, I want to use the Microsoft Data Visualization Components, which are only available as a set of 32bit DLLs compiled sometime in 2006.
OK, all good so far – loaded ‘em up, referenced them, threw a TreeMap control on my form, compiled fine and then ran it, and… Bang!

Got a message;
Could not load type 'Microsoft.Research.CommunityTechnologies.Treemap.NodeColor' from assembly 'TreemapGenerator, Version=1.0.1.38, Culture=neutral, PublicKeyToken=3f6121a52ebf7c82' because it contains an object field at offset 0 that is incorrectly aligned or overlapped by a non-object field.
Turns out the problem is that I’m referencing a 32bit component from a 64bit application, and that there is a union of objects on a 32bit boundary, not a 64bit one. Think that’s right, but anyhoo, the cure is obvious in hindsight – my application needs to be compiled as 32bit.
As this is my first foray into development on a 64bit machine I didn’t actually know where to set this! Found it eventually, though; on the Project Properties form, go to the Build tab and set “Platform Target” to “x86”. I’ll need to remember to make a conscious decision in the future whether a new app is supposed to be 32 or 64 bit!
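If you want to confirm which way an application actually ended up running, a quick check is the size of a pointer in the current process. This is only a sketch; the Bitness class and ProcessBits method are names invented for the example.

```csharp
using System;

public static class Bitness
{
    // IntPtr is 4 bytes wide in a 32-bit process and 8 bytes wide in a 64-bit one
    public static int ProcessBits()
    {
        return IntPtr.Size * 8;
    }
}
```

Writing Bitness.ProcessBits() to the console at startup gives 32 when the process is running as 32bit, even on a 64bit copy of Windows, and 64 otherwise.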

Downloaded CHM help file shows “Navigation to the webpage was cancelled”

This page has moved to http://www.scottleckie.com/2009/05/downloaded-chm-help-file-shows-%e2%80%9cnavigation-to-the-webpage-was-cancelled%e2%80%9d/

Huh – that was an odd one. I copied a bunch of controls over to my Windows 7 box and the DLLs work, but the help files are stuffed;

It seems this is a common problem, fixed by right-clicking the CHM file and clicking on an “Unblock” button which I’ve never seen before!

I can’t claim credit for this – found it at Rob Chandler’s blog here. Weird sense of priorities, huh? OK to copy and run any old DLL that you found lying in the bin, but a helpfile? Woooohhhh… no… that’s much more dangerous!

Thursday, 14 May 2009

Microsoft Data Visualization Components on Windows 7

This post has moved to http://www.scottleckie.com/2009/05/microsoft-data-visualization-components-on-windows-7/

I’ve been playing around with the Data Visualization Components recently (looking to incorporate the TreeMap control with a SequoiaView-a-like disk space analyser) but ran into problems getting the toolkit installed on Windows 7 RC1. Running the setup from the official page at http://research.microsoft.com/en-us/downloads/dda33e92-f0e8-4961-baaa-98160a006c27/default.aspx gets stuck looking for .Net Framework 1.1.4322;

Of course, being Windows 7, .Net 3.5 is already installed. Strictly speaking that sits on top of the 2.0 runtime rather than including .Net 1.1, but it should be more than good enough, and it looks like the Components installer is hopelessly confused. I couldn’t get the installer to believe we had something better than 1.1 already installed and I didn’t want to try and hack .Net 1.1 on top of Windows 7.
So, all I really needed were the “bunch of files” that come in the Component setup so I installed the package on an XP machine and copied them across to a folder on Windows 7, then referenced the DLL by hand.
I doubt there are any .Net developers out there, running the RC of Windows 7, who don’t have a spare XP machine lying around too(!) but, just in case there are, I’ve placed the Zip here for your convenience. The licence states that it’s OK to distribute for non-commercial use and doesn’t say it needs to be in the original MSI format, so I don’t see a legal problem with this.

Sunday, 12 April 2009

“The breakpoint will not currently be hit. No symbols have been loaded for this document.” - VS2008

This post has been moved to http://www.scottleckie.com/2009/04/%e2%80%9cthe-breakpoint-will-not-currently-be-hit-no-symbols-have-been-loaded-for-this-document-%e2%80%9d-vs2008/

I’d been hacking around with Sharp Architecture (#Arch) a few months back but haven’t touched it recently. However, the release of 1.0 to coincide with the formal release of ASP.NET MVC 1.0 got me interested again, so I downloaded the latest and greatest to see what’s adoing…
There’s a great community around #Arch and it’s pretty easy to get your head around (assuming you’ve a basic grounding in MVC and NHibernate) and it even comes with a sample/tutorial app based on the ubiquitous NorthWind database. Now, being a cautious sort I figured I’d start by getting the NorthWind sample up and running, as this would prove I had all the dependencies installed and wired up.
All you (should) need to do is restore the standard NorthWind sample database, point the nhibernate.config at your local SQL Server, and you’re up and running. Except I wasn’t; all I was getting was “Internet Explorer cannot display the webpage”. OK; double check nhibernate.config and assemblies, turn up NHibernate logging, and stick a breakpoint on Global.asax.cs – that’ll pinpoint the error, right?
Wrong. The files are not being loaded, and my breakpoints have gone a funny shade of yellow, instead of red. Hovering over them displays the message “The breakpoint will not currently be hit. No symbols have been loaded for this document.” – Crikey! Much googling and assuming that the root cause must be astoundingly complex led me up several dead ends until I figured out that the cause was much simpler. Have a look at my (truncated) IE dialogue;

It’s talking to “localhost” which obviously(!) is my PC. But check out the zone display at the bottom; Internet! What’s it doing thinking my PC is in the Internet zone (with attendant security restrictions)? I tried adding localhost to the Intranet zone but this doesn’t appear to work either. In desperation, I thought “let’s ping it” even though I did not expect that to fail. Well, it didn’t fail, but look at this;

Looks OK. WAIT! What the heck is “::1”? Well, the more new-fangled amongst you will recognise this as the IPv6 version of localhost. Quite what IPv6 actually is, is beyond the scope of this article, but the main point is that 99.999% of the Internet and attendant applications expect IPv4 addresses, not IPv6. The IPv4 version of localhost should be the much more recognisable “127.0.0.1”.
So, I went back to my IE page and tried http://127.0.0.1:2386/ instead of the localhost version and, lo and behold;

It works! Final bit of the jigsaw is, where on earth is it getting ::1 from? Well, it was getting it from the hosts file, which lives in your <windows>\System32\Drivers\etc folder. Mine contained this;

So, I overwrote the ::1 entry (the IPv6 version of localhost) to be 127.0.0.1 (the IPv4 version);

Retried my web browser and all is now good with “localhost”;

Here’s my question, though; I didn’t change the Hosts file, and localhost certainly did work last time I was playing with #Arch, so what changed? Did a Microsoft security update overwrite the Hosts’ localhost entry?
Oh, one other thing. If you are using User Account Control (UAC) then access to the Hosts file will be restricted. The solution is to go find Notepad in your Start Menu and don’t left click on it; right-click it, and select “Run as administrator” then you will be able to save the Hosts file.
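If you would rather check from code than from ping, the sketch below lists what a host name resolves to and labels each address as IPv4 or IPv6. The LoopbackInfo class and its method names are invented for this example; the calls it relies on (Dns.GetHostAddresses and IPAddress.IsLoopback) are standard .NET, but what gets printed for “localhost” depends entirely on your own machine and hosts file.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class LoopbackInfo
{
    // Describes one address, for example "::1 (IPv6, loopback)"
    public static string Describe(IPAddress address)
    {
        string family = address.AddressFamily == AddressFamily.InterNetworkV6 ? "IPv6" : "IPv4";
        string kind = IPAddress.IsLoopback(address) ? "loopback" : "not loopback";
        return string.Format("{0} ({1}, {2})", address, family, kind);
    }

    // Prints every address that the given host name resolves to on this machine
    public static void PrintAddresses(string hostName)
    {
        foreach (IPAddress address in Dns.GetHostAddresses(hostName))
            Console.WriteLine(Describe(address));
    }
}
```

LoopbackInfo.PrintAddresses("localhost") is the call to try; an IPv6 line in the output means “localhost” can be handed to applications as ::1 rather than 127.0.0.1.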