(In which last week’s drunk guy makes a return appearance)

There’s a joke about a drunk guy who is looking for the fifty-dollar bill he lost.  He’s down on his hands and knees, searching around the base of a street light.  A helpful citizen stops to assist, and asks the drunk where he lost the bill.  “About two blocks that way.”  “Then why are you looking for it here??”  “The light’s better over here.”

Not a knee-slapper of a joke, but it illustrates something about today’s tool: A search doesn’t have to be perfect to be useful.

I have a ton of files on my computer.  Not a literal ton – the floor hasn't crashed through, and I can move my computer easily.  But there are a lot of files.  I have ripped music from many (but not all – yet) of my CDs, and I am the official on-site backup for Bettie's pictures.  Couple that with operating system files, and other downloaded pictures and music, and I have a ton of files.  MS Security Essentials says about two and a half million.

Today’s tool says about a fifth of that:

The tool is Duplicate Cleaner, and I had told it to ignore some areas of the computer:

You see, I think I have too many files.  Not like the emperor's complaint to Amadeus – "Too many notes!" – but too many duplicate files (and no, this won't turn into a Philip Glass joke).

The software works the regular way – download, extract, install, run.  Two caveats here: the thing runs for a long time, and I haven’t done anything with the results.

Long-running: It ran for about fifteen hours before producing the report.

Nicely, it handled the sleep cycles in Win7 very well.  It didn't choke and want to restart; it went to sleep and woke up with the rest of the applications, and kept on doing what it was supposed to do.  So the fifteen hours isn't wall clock time, but time that the computer was actually running.  It had a lot of work to do, comparing half a million files to each other.  I have a quad-core CPU (I'm not sure if it takes advantage of the extra horsepower), but be prepared for this thing to take a while to run.

And what about not using the results?  Why recommend a tool that doesn't produce useful results?  Well, I didn't say that the results weren't useful.  There are two parts to this reasoning.  First, the tool legitimately finds useful results.  I have multiple copies of sermons on my computer:

If you expand that image, you'll see that the same file, with the same name, the same size, and the same CRC (a Cyclic Redundancy Check, a checksum calculated from the file's contents, so matching values mean the insides are almost certainly identical), exists in two places.  For this one, I'd probably feel safe deleting the 8G Export copy.  But what else is in that 8G export that I may not need, even if it's not a duplicate?  I want to do more investigating of that directory before I remove the link to it.
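For the curious, here is roughly how a size-plus-CRC comparison can work. This is a minimal Python sketch under my own assumptions, not Duplicate Cleaner's actual code: it groups files by size first (files of different sizes can't be duplicates, which saves a lot of disk reading), then computes a CRC-32 with the standard library's `zlib.crc32` only for files that share a size with some other file. The function names `crc32_of` and `find_duplicates` are made up for illustration.

```python
import os
import zlib
from collections import defaultdict


def crc32_of(path, chunk_size=1 << 20):
    """Compute the CRC-32 of a file's contents, reading 1 MB at a time."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)  # running CRC over the whole file
    return crc


def find_duplicates(root):
    """Return groups of paths under root whose size and CRC-32 both match."""
    # Pass 1: group every file by size; different sizes can't be duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
    # Pass 2: only read and checksum files that share a size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_crc = defaultdict(list)
        for path in paths:
            try:
                by_crc[crc32_of(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_crc.values() if len(g) > 1)
    return groups
```

Calling `find_duplicates` on a folder returns a list of groups, each a sorted list of paths whose size and CRC match. Two caveats on this sketch: it ignores file names entirely, and a matching CRC is strong evidence rather than proof, so a careful tool would compare the bytes directly before offering to delete anything.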

The other reason I haven’t removed files using this tool is something called a hard link.  Even after reading about it, I’m not so certain of what it is.  Some of the files found as duplicates are apparently just shadows – a ventriloquist throwing his voice as opposed to identical twins.

The little number 1 at the end of some of the duplicates indicates a hard link.  The explanation of a hard link is:

This is, unfortunately, not a model of clarity.  I want it to tell me "Ignore the hard links".  It doesn't say that, and I haven't done enough research to tell you that.  I should re-run the tool telling it to ignore hard links (a choice seen in the "Options" screenshot), but I haven't done that yet.
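Here's my understanding of how the hard-link question could be checked, as a hedged Python sketch (the helper names are hypothetical, and I don't know how Duplicate Cleaner itself detects them). A hard link is a second directory entry pointing at the same data on disk, so deleting one name frees no space until the last name is gone. Python's `os.stat` reports a device number (`st_dev`), a file index (`st_ino`), and a link count (`st_nlink`); two names with the same device and index are shadows of one file, not identical twins.

```python
import os


def link_count(path):
    """Number of directory entries (hard links) that name this file."""
    return os.stat(path).st_nlink


def same_underlying_file(path_a, path_b):
    """True if both paths lead to one file on disk, e.g. two hard links.

    os.stat follows symbolic links, so a symlink to the file also matches.
    """
    a, b = os.stat(path_a), os.stat(path_b)
    # Same volume and same file index: one set of bytes, two names.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def drop_hard_links(paths):
    """Keep the first path seen for each underlying file, in input order."""
    seen = set()
    kept = []
    for path in paths:
        st = os.stat(path)
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            kept.append(path)
    return kept
```

Running a group of "duplicates" through `drop_hard_links` would leave only the genuine copies, which are the only ones whose deletion would actually free up disk space.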

So I present an imperfect but still useful tool.  I’d love for it to search the disk in ten minutes, and present me with a completely safe way to remove duplicates and free up disk space.  But like the inebriated gentleman who didn’t find what he was looking for, there’s still the possibility of something else good to be found.

And it’s never bad to spend time on your knees, whether you’re cleaning the floor or offering a prayer to God.