Visual C# General File Index Project

mEt

I'm creating a file index project to index certain file types and add them to a SQL database.

I do have a working model of this, but I am looking to see if there is any way to accomplish my task more efficiently.

The following is the actual code that loops through the files and grabs the required data.
I am grabbing the file name, its extension, last write time, and the full path to the directory, which is then bound to a DataTable. Later this DataTable is passed into a SQL database.
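
For context, the DataTable is set up ahead of time with four string columns matching the names used in the code; roughly like this (a sketch, since the exact setup isn't shown here, and the table name is just an example):

Code Snippet

// Table the indexer fills; column names match the code below
DataTable DT = new DataTable("FileIndex");
DT.Columns.Add("fname", typeof(string)); // file name
DT.Columns.Add("fext", typeof(string));  // extension
DT.Columns.Add("mdate", typeof(string)); // last write time
DT.Columns.Add("fpath", typeof(string)); // directory path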



public DataTable processData(string path, DataTable DT)
{
    // Declare the directory to index
    DirectoryInfo dir = new DirectoryInfo(path);

    // This block goes through every file in every subdirectory of the declared path
    try
    {
        FileInfo[] fi = dir.GetFiles();

        foreach (FileInfo file in fi)
        {
            // Only index files of the specified extensions
            if (file.Extension == ".dwg" || file.Extension == ".dwf")
            {
                // Create a new row on the DataTable
                DataRow DR = DT.NewRow();

                // Commit values to the row
                DR["fname"] = file.Name;
                DR["fext"] = file.Extension;
                DR["mdate"] = file.LastWriteTime.ToString();
                DR["fpath"] = file.DirectoryName;

                // Commit the row to the DataTable
                DT.Rows.Add(DR);
            }
        }

        foreach (DirectoryInfo di in dir.GetDirectories())
        {
            // Recurse into the subdirectory we have just entered
            processData(di.FullName, DT);
        }
    }
    // Catch and display any errors
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }

    return DT;
}



This is all occurring over a network share. I am on a 10 Mbps connection but am only hitting 20% utilization, when I can typically get 80%. Is this just because of the data I am pulling back, or is there a way to increase this utilization? Also, is there a more efficient way to scour these files and pull the data back?

As a final question, does anyone have any good links to tutorials on multithreading? I'm just looking for something that's pretty straightforward.

Thanks in advance for your help.

I just remembered what my original question actually was: how can I get past the case where the indexing computer does not have permission to enter a subfolder? Right now it throws an exception, but I just want it to keep on indexing the next folder instead.



Re: Visual C# General File Index Project

sirjis

I think you can solve the problem with exceptions for permissions just by moving the "DirectoryInfo dir = new DirectoryInfo(path)" line into the try block.
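
Something like this (just a sketch of the shape, not your full method):

Code Snippet

try
{
    DirectoryInfo dir = new DirectoryInfo(path);
    // ... GetFiles loop and recursive GetDirectories loop as before ...
}
catch (Exception ex)
{
    // Each recursive call has its own try/catch, so a denied subfolder
    // only aborts that one call; the parent's loop moves on
    MessageBox.Show(ex.Message);
}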

I don't have any ideas for more efficient ways to index the data.





Re: Visual C# General File Index Project

mEt

I went ahead and moved the DirectoryInfo into the try/catch block. It still displays the error message stating:
Access to the path 'pathhere' is denied.

The problem with trying to make this more efficient is the fact that there are multiple subdirectories, and that makes it a real pain. I know calling the function recursively over and over is very taxing on the system, but I'm struggling to come up with better solutions.

*** EDIT ***

Well, if I don't show the error in a MessageBox it keeps rolling, thanks. :)
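
For anyone else who hits this, one way to keep it rolling without hiding every error is to catch the permission failure specifically; a minimal sketch (not the exact code I'm running):

Code Snippet

try
{
    // ... index the files in this directory ...
}
catch (UnauthorizedAccessException)
{
    // No permission for this folder: skip it quietly and keep indexing
}
catch (Exception ex)
{
    // Anything else is still worth seeing
    Console.WriteLine(ex.Message);
}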




Re: Visual C# General File Index Project

tlehr

You can get rid of the recursive call by passing in SearchOption.AllDirectories to list all child directories. You can also specify a search pattern.

Code Snippet

dir.GetFiles(path, "*.xml", SearchOption.AllDirectories)





Re: Visual C# General File Index Project

mEt

tlehr wrote:
You can get rid of the recursive call by passing in SearchOption.AllDirectories to list all child directories. You can also specify a search pattern.

Code Snippet

dir.GetFiles(path, "*.xml", SearchOption.AllDirectories)



If I'm capitalizing on SearchOption.AllDirectories to list all the directories and do not wish to specify a search term, would I simply input an * for my search term?
e.g.



dir.GetFiles(path, "*", SearchOption.AllDirectories)





Also, how do you get that nice code snippet box? I can't find *** in the FAQ. Thanks for your help, everyone.




Re: Visual C# General File Index Project

tlehr

* should work, though I'd go *.* to make sure the files I'm getting back have extensions. To use the code snippet box, it's a button on the toolbar, the 5th from the right.





Re: Visual C# General File Index Project

mEt

Just a side note to anyone following along: the GetFiles overload only takes 2 arguments. I think you were thinking of GetDirectory, which takes three...

The DirectoryInfo declaration already takes care of the path.

Revised Code:

Code Snippet

public DataTable mockprocessData(string path, DataTable DT)
{
    try
    {
        DirectoryInfo dir = new DirectoryInfo(path);

        // AllDirectories walks every subfolder, so no recursive call is needed
        foreach (FileInfo file in dir.GetFiles("*.*", SearchOption.AllDirectories))
        {
            // Only index files of the specified extensions
            if (file.Extension == ".dwg" || file.Extension == ".dwf")
            {
                DataRow DR = DT.NewRow();

                DR["fname"] = file.Name;
                DR["fext"] = file.Extension;
                DR["mdate"] = file.LastWriteTime.ToString();
                DR["fpath"] = file.DirectoryName;

                DT.Rows.Add(DR);
            }
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }

    return DT;
}


I shaved about a minute off indexing times. It looks like my major issue is the network.

6 Minutes 19 Seconds to Index
22 Seconds to Merge DataTable with SQL Database

I indexed 42,811 files (grabbing the name, ext, modify date, and path).
This sifted through 41.3 Gigs, broken out into 67,879 Files over 9,668 Folders.

I'm running a P4 3.0 with a gig of RAM. If anyone sees some shortcuts, please let me know.





Re: Visual C# General File Index Project

mwalts

Looks a bit better from where I'm standing. Just as a side note though, I think he was thinking of the static Directory.GetFiles, as detailed here: http://msdn2.microsoft.com/en-us/library/ms143316.aspx. But yeah, no biggie.
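
For clarity, a quick sketch of the two (equivalent walks, assuming System.IO):

Code Snippet

// Instance method: the path comes from the DirectoryInfo itself (2 arguments)
FileInfo[] infos = new DirectoryInfo(path).GetFiles("*.*", SearchOption.AllDirectories);

// Static method: the path is passed in explicitly (3 arguments),
// and it returns plain path strings instead of FileInfo objects
string[] names = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);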

As for a threading tutorial, well, threading isn't a terribly easy topic, but try this one: http://www.yoda.arachsys.com/csharp/threads/index.shtml
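
To give you a flavour of the basics, something like this would run your indexer on a worker thread so the UI stays responsive while the share is being walked (just a sketch; processData and the table are the names from your code earlier):

Code Snippet

using System.Threading;

// Run the existing indexer on a background thread
void StartIndexing(string sharePath, DataTable table)
{
    Thread worker = new Thread(delegate()
    {
        processData(sharePath, table);
    });
    worker.IsBackground = true; // don't keep the app alive just for indexing
    worker.Start();
    // Note: touching UI controls from the worker would need Control.Invoke
}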

Good luck,

-mwalts






Re: Visual C# General File Index Project

mEt

Everything looks to be in order. And I'm sure you're right that he was looking at the Directory.GetFiles().

Thanks for that article. I've been reading about this thread pooling stuff, but the articles about it are just excessive.
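
For what it's worth, the pooled version of the same idea seems to boil down to one call; a sketch (the path and names are assumed from earlier posts, not real):

Code Snippet

using System.Threading;

// Queue the indexing job onto a pool thread instead of creating one by hand
ThreadPool.QueueUserWorkItem(delegate(object state)
{
    processData(@"\\server\share", table); // placeholder path
});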

Thanks again for everyone's help.