Two decades worth of unorganized files and folders to sort, delete and archive.

ElevenFingers

Limp Gawd
Joined
May 30, 2008
Messages
187
Over the past two decades I've taken on the task of backing up data from family devices as they were rotated out or showed signs of failure: mine, my parents', and my sister's data, mostly. They are all as close to computer illiterate as modern knowledge workers can be.

I was not very elegant with my methods: I would copy entire drives, user directories, and other folders that in all likelihood contained copies of data I already had. Worse, I rarely took the time to name and sort files, and if I did, I have since forgotten the method to my madness.

Now I want to sort the data and make it available to its owners without infringing on their privacy or having to sort it all by hand. I wrote a simple Python script that sorts and renames images by their metadata, which has resulted in 70,000 images sorted by date and camera model (a proxy for the person who took the photo), but I don't think the same approach would work for other file types.
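The gist of the script is roughly this (a simplified sketch using Pillow, not my actual code; the folder names and filename scheme are illustrative):

```python
# Sketch of the metadata-based photo sort described above.
# Requires Pillow (pip install Pillow); SOURCE/DEST paths are illustrative.
import shutil
from pathlib import Path
from PIL import Image, ExifTags

SOURCE = Path("unsorted_photos")
DEST = Path("photo_master")

# Map EXIF tag names to their numeric IDs once.
TAGS = {name: tag_id for tag_id, name in ExifTags.TAGS.items()}

def sort_photo(path: Path) -> None:
    try:
        exif = Image.open(path).getexif()
    except OSError:
        return  # not a readable image; leave it for manual triage
    date = exif.get(TAGS["DateTime"])   # e.g. "2013:07:04 18:22:05"
    model = exif.get(TAGS["Model"])     # camera model as a proxy for the person
    if not date or not model:
        return  # missing metadata; handle manually
    stamp = date.replace(":", "-").replace(" ", "_")
    target = DEST / model.strip() / f"{stamp}{path.suffix.lower()}"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        shutil.copy2(path, target)

for p in SOURCE.rglob("*"):
    if p.is_file():
        sort_photo(p)
```

Files that end up at the same model + timestamp collide on purpose: that is the uniqueness rule, and burst shots are the known casualty.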

My question is this: is there a silver-bullet tool that can at the very least identify duplicates across folders and help me create a single, cleaned-up and organized copy of the data (not perfect, but usable)? If not, how would you tackle this?

My idea is to either use WeTransfer or a physical drive to give them their data so I am no longer its custodian.
 
There are many ways to do this, whether with tools or PowerShell scripts. One note first, though I'm sure you'll do it anyway:

Make one final big backup of everything before organizing, and make sure it's write-protected/read-only.

I have seen over the years that some methods delete files they shouldn't, due to the rare chance of dates and file sizes being identical.
You've probably done something similar to what I do for my wife and mother-in-law: I just back up their user profile folders and OneDrive downloads and merge it all into the same directory. I know there are duplicates of crap in there, but since it's only about 100 GB each, I just leave it.
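If you do roll your own, hashing file contents sidesteps that date/size trap entirely: two files only group together if the bytes match. A rough stdlib sketch, with an illustrative directory name:

```python
# Minimal duplicate finder keyed on file contents, not name/date/size,
# so files that merely share a timestamp and size are never conflated.
# The "big_backup" directory name is illustrative.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    by_hash = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_hash[sha256_of(p)].append(p)
    # Keep only hashes seen more than once, i.e. true duplicates.
    return {h: ps for h, ps in by_hash.items() if len(ps) > 1}

# Example: print each group of identical files under the backup root.
for digest, paths in find_duplicates(Path("big_backup")).items():
    print(digest[:12], *paths)
```

Note it only reports groups; deciding which copy to keep (and deleting) should stay a separate, manual step against that read-only backup.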
 
There are several good free deduplication software tools that will help you with this. I’d start there and then work on your cleanup/organization.

https://www.avast.com/c-best-duplicate-file-finder-for-windows
Thanks for the tip on the software, I'll check it out. Also thanks for reminding me to relisten to that album.

There are many ways to do this, whether with tools or PowerShell scripts. One note first, though I'm sure you'll do it anyway:

Make one final big backup of everything before organizing, and make sure it's write-protected/read-only.

I have seen over the years that some methods delete files they shouldn't, due to the rare chance of dates and file sizes being identical.
You've probably done something similar to what I do for my wife and mother-in-law: I just back up their user profile folders and OneDrive downloads and merge it all into the same directory. I know there are duplicates of crap in there, but since it's only about 100 GB each, I just leave it.
Yeah, I noticed this with my photos. In the end I used camera model and date, down to the second, to ensure uniqueness (in the case of burst photography, I reckon some were lost, but meh). If both model and date could be determined from the file metadata, the script copied the file to the new photo master folder if it did not exist there, or if it was larger than the identically named file already present, and deleted it from the source. If not, the file was left where it was and I copied it manually (webcam photos, some camera photos, downloaded photos, etc.) into a messy folder that likely still has duplicates. That process sorted an estimated 90%.
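The copy rule boiled down to something like this (a simplified sketch, not my actual script; I've left out the EXIF parsing, and also the delete-from-source step for safety):

```python
# Sketch of the "copy if new or larger, otherwise skip" merge rule.
# Paths and the helper name are illustrative.
import shutil
from pathlib import Path

def merge_photo(src: Path, master: Path, name: str) -> bool:
    """Copy src into master under `name` if no file of that name exists,
    or if src is larger than the existing one. Returns True if copied."""
    target = master / name
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists() or src.stat().st_size > target.stat().st_size:
        shutil.copy2(src, target)
        return True
    return False
```

Preferring the larger file is a crude but effective tiebreaker: resized or thumbnail copies lose to the original.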

In my case, it's multiple TB... It's backups of their backups of their backups as well as some system and software artifacts. I didn't know where they had stored stuff or what was important, so sometimes I just grabbed entire drives. I hope to debloat it considerably in any case.
 