Sunday, 12 July 2009

Iterating through a bunch of folders and files

This post has moved to http://www.scottleckie.com/2009/07/iterating-through-a-bunch-of-folders-and-files/

So you want to start at a top-level folder and then process all the folders beneath it. Maybe you really do want to look at every file (say, to count the total size of the folder), or maybe you only want to process the XML files there. The most obvious route is to recursively search through each folder:
    using System;
    using System.Collections.Generic;
    using System.IO;

    class Program
    {
        static void Main(string[] args)
        {
            string startFolder = @"C:\temp";
            List<string> contents = new List<string>();

            // Start at the top-level folder itself so that its own files are included too
            ProcessFolder(startFolder, contents);

            foreach (string fileName in contents)
                Console.WriteLine(fileName);
            Console.ReadKey();
        }

        // Adds every file in this folder to the list, then recurses into each sub-folder
        static void ProcessFolder(string folder, IList<string> theList)
        {
            // Directory.GetFiles() already returns full paths, so add them as they are
            foreach (string file in Directory.GetFiles(folder))
                theList.Add(file);
            foreach (string dir in Directory.GetDirectories(folder))
                ProcessFolder(dir, theList);
        }
    }


All well and good but, because every level of nesting adds another call to the stack, a deep enough file system can run you out of stack space. A better way to traverse the folder structure is to do an iterative search:


    // Lives in a class that also provides workerThreadInfo, AddFolder() and AddFile();
    // needs "using System.Collections.Generic;" and "using System.IO;"
    private void ProcessFolder(string startingPath)
    {
        int iterator = 0;
        List<string> dirList = new List<string>();
        dirList.Add(startingPath);

        // Every new folder found is added to the list to be searched. Continue until we have
        // found, and reported on, every folder or the calling thread wants us to stop
        while (iterator < dirList.Count && !workerThreadInfo.StopRequested)
        {
            string parentFolder = dirList[iterator];    // Each FileTreeEntry wants to know who its parent is
            try
            {
                foreach (string dir in Directory.GetDirectories(parentFolder))
                {
                    AddFolder(parentFolder, dir, dir);
                    dirList.Add(dir);
                }
                foreach (string filename in Directory.GetFiles(parentFolder))
                {
                    FileInfo file = new FileInfo(filename);
                    AddFile(parentFolder, file.Name, file.Length);
                }
            }
            // There are two *acceptable* exceptions that we may see, but should not consider fatal
            catch (UnauthorizedAccessException)
            {
                // Not allowed into this folder; skip it and carry on
            }
            catch (PathTooLongException)
            {
                // Path is longer than Windows will let us open; skip it and carry on
            }
            iterator++;
        }
    }


Now we iterate through each discovered folder and, for each discovered file, call an external routine (in this case “AddFile()”). Note the two caught exceptions, which can occur but which, in the author’s opinion, are not important in this context:


  • UnauthorizedAccessException
    • OK; ya got me. I’m not allowed in here, so let’s continue and not break the calling app
  • PathTooLongException
    • This is a funny one. The classic Windows API caps a full path at MAX_PATH, which is 260 characters. Create a big and complex structure (especially a Java one), zip it and then unravel it under a folder that is maybe 100 characters long. Windows is happy to unzip this, and even display it in the folder view. But try to open the file and you’ll be stuffed





So, ignoring these, this routine will handle any files within a structure, irrespective of how deep that structure gets.
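
Since AddFolder() and AddFile() are left to the reader, here is a minimal sketch of one way to wire them up. Everything named here (FolderScanner, Scan, CollectFiles and the callback signatures) is made up for illustration and is not the code from my own app; the stop-request check is left out to keep it short. It runs the same list-driven loop as above, but takes the two routines as delegates so the caller decides what to do with each discovery:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for illustration only; the names are assumptions
public static class FolderScanner
{
    // Walks every folder under startingPath without recursion and reports each
    // discovery through the callbacks (stand-ins for AddFolder and AddFile)
    public static void Scan(
        string startingPath,
        Action<string, string> addFolder,       // (parentFolder, folderPath)
        Action<string, string, long> addFile)   // (parentFolder, fileName, sizeInBytes)
    {
        var dirList = new List<string> { startingPath };

        // dirList grows while we loop, so newly found folders get visited later on
        for (int i = 0; i < dirList.Count; i++)
        {
            string parentFolder = dirList[i];
            try
            {
                foreach (string dir in Directory.GetDirectories(parentFolder))
                {
                    addFolder(parentFolder, dir);
                    dirList.Add(dir);
                }
                foreach (string filename in Directory.GetFiles(parentFolder))
                {
                    var file = new FileInfo(filename);
                    addFile(parentFolder, file.Name, file.Length);
                }
            }
            catch (UnauthorizedAccessException) { /* not allowed in; skip this folder */ }
            catch (PathTooLongException) { /* path too long to open; skip this folder */ }
        }
    }

    // Example consumer: totals the size of every file found and collects the
    // full paths of the files whose extension matches (for example ".xml")
    public static long CollectFiles(string startingPath, string extension, IList<string> matches)
    {
        long totalBytes = 0;
        Scan(
            startingPath,
            (parent, folder) => { },            // folders are of no interest here
            (parent, name, size) =>
            {
                totalBytes += size;
                if (string.Equals(Path.GetExtension(name), extension,
                                  StringComparison.OrdinalIgnoreCase))
                    matches.Add(Path.Combine(parent, name));
            });
        return totalBytes;
    }
}
```

Calling FolderScanner.CollectFiles(@"C:\temp", ".xml", myList) would then fill myList with the XML file paths and return the total size in bytes of everything under C:\temp. Once the scan ends you can work through the list at your leisure.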

7 comments:

  1. Thank you for the code. One may want to consider Path.Combine when building file paths, because it helps one to avoid errors. Thank you. -- Mark Kamoski

  2. Thanks Mark - glad to have the feedback.
    I'm not sure where you're suggesting that I use Path.Combine, though?

    Cheers
    Scott

  3. Which namespace or object does the AddFolder belong to?

  4. Good writing, but I need some help.
    I want to implement the iteration in one of my applications, but my problem is that I don't know how to write the external routines "AddFolder" and "AddFile".
    If you could give one example, it would be appreciated.

    Many thanks, and again thanks for the script.

  5. theList.Add( System.IO.Path.Combine(folder,file));

  6. @Anonymous and @Fagelot
    sorry for the delay in responding - I thought blogspot would send me updates but it seems it doesn't!

    Anyways, AddFolder and AddFile, in this example, are your own code where you are interested in the discovery of a new folder or file. Chances are that you only care about a newly discovered file, so you could a) delete the call to AddFolder() and b) add your own AddFile() routine.

    How I handle this is to use AddFile() to add file paths to a list and, when the core search routine ends, I then work through that list. If you were adventurous you could kick off the file handlers on a separate thread while the search routine was still running.

    Hope that helps, and thanks for listening and commenting!

  7. @anonymous (17th Nov 2009) - not sure what you're suggesting? The point of the code is to add discovered paths to a list...
