I'm pretty obsessive when it comes to managing my repository of electronic files (documents, stored reference literature, actual work files). I think I've got some good advice on this topic, but I will admit that this is about as boring a subject as one could find. I'll apologize in advance for this rambling....
If you never struggle with finding files or how to keep multiple computers in sync, then please stop reading. If you're interested in how I've tackled this problem, feel free to read on.
The dedication to maintaining the integrity of my stash is not just because I'm a pack-rat; I also pride myself on reusing as much work as possible (for any paranoid clients: I reuse my own work, often from hobby or general lab projects -- I do not cross-pollinate here). I've dug up a piece of source code or a basic analysis that was fifteen years old and saved a lot of time in the process. Assuming this "work" is appropriate to use on a new project, I would consider it a disservice not to take these time-saving measures. What kinds of things do I save? Anything that I believe will save me (or one of my peers) time later.
On organizing things -- my advice is simple: create a directory structure that works for you and stick with it. Consider the nature of the stuff you are storing (personal or business) and other categories, so that if you later need to remove, extract, or archive a particular segment it is easier to do. It is all about making things easier down the road. I should point out that this "individual" structure may not mesh well within a large organization, but the same principles apply.
You don't have to have a single structure for all files. For me, I have one main one that is made up of mostly my work and others for "media" (one for photos, one for video, and another for verbatim copies of literature archives). The reason for separating them in my case has to do with the size differences: the work/key archives is the smallest, most important, and intended to live on any computer I'm doing real work on.
Here are a few suggestions:
- Don't let your software store things in the default "my documents" folder or on the desktop (unless you have manually relocated where "my documents" points). Change the settings in the programs you use most so that they default to your new repository for saving new files and finding old ones.
- If you use multiple computers, I suggest making the base of this directory tree structure the same for all. There are three main reasons for this:
- You don't have to remember different paths for each machine and stumble each time
- You can use the same shortcuts and relative paths, which are occasionally important
- It is much easier to bring on a new computer and perform routine backups or synchronize between computers/drives/etc.
- Consider a few exceptions:
- My main repository of "stuff" is about 16 GB today with about 16k files; this is a pretty convenient size and is fairly easy to manage. 100k files or 50 GB would be a little tougher. By "manageable" here I mean the relative size and count of files, since any future automatic synchronization checking will need to account for the lot. Here are some ideas:
- Keep multiple repositories: one that is manageable (by your own judgement) holding the most active files, the ones changing most often. Keep others that are more archival in nature; perhaps collections of literature, photos, videos, etc. My experience is that file size is much less of a problem than file quantity -- tons of tiny files are much more painful to maintain.
- Keep only the most important piece that defines the work. By this I mean the "design" or the "artwork", not the analysis data (this is a CAE-centric example). Several EM analysis packages that I have used generate a ton of large files, and quite a collection of small ones as well... so I tend to work with these files in a separate temp directory, then copy the core (without data) back into the repository. If it is a month or a year from now, I am usually happy to re-run the analysis and suffer the delay that causes.
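That "copy the core without data" step is easy to script. Here is a minimal Python sketch of the idea; the skipped file extensions are purely hypothetical stand-ins -- substitute whatever bulky, regenerable files your own analysis tools produce:

```python
import shutil
from pathlib import Path

# Hypothetical extensions for regenerable analysis data; adjust
# to match what your own tools actually generate.
SKIP_SUFFIXES = {".dat", ".tmp", ".log"}

def copy_core(work_dir: str, repo_dir: str) -> list:
    """Copy everything except bulky generated data from the temp
    working directory back into the repository."""
    copied = []
    work, repo = Path(work_dir), Path(repo_dir)
    for src in work.rglob("*"):
        if src.is_dir() or src.suffix.lower() in SKIP_SUFFIXES:
            continue  # skip directories and regenerable data files
        dest = repo / src.relative_to(work)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 preserves timestamps
        copied.append(str(src.relative_to(work)))
    return copied
```

The point of filtering by extension is that re-running an analysis later is a cheap price to pay for keeping the repository small enough to synchronize quickly.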
File synchronization manually:
I started this endeavor in the mid-1990s, and the tool of choice for me back then was Windows Briefcase. It was quirky, but it worked reasonably well most of the time. By "sync" here I am referring to the process of making sure that the repository on my portable computer is up to date with that of a desktop, as a common example. I graduated to a program called Comparator Pro (SoftByte Labs) in more recent years. One trait that both of these share is that after you have initiated the process and the program has run, the user gets to see what has changed and which direction any file moves should go. This is really nice, as it can help you catch potential catastrophes since you have to approve the changes.
This manual method of synchronization is pretty effective, but it is time consuming and not terribly elegant. I still use it periodically to sync redundant copies of things to each other (only one of which is actively being modified).
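The compare-then-approve behavior those tools share can be sketched in a few lines of Python. This is only a rough illustration of the idea (comparison by modification time with a coarse tolerance, no conflict handling, no actual copying), not a replacement for a real sync tool:

```python
from pathlib import Path

def compare_trees(left: str, right: str) -> dict:
    """For each relative path, report which direction a copy
    should go -- the user then reviews and approves the moves."""
    report = {}
    left_p, right_p = Path(left), Path(right)
    files = {p.relative_to(left_p) for p in left_p.rglob("*") if p.is_file()}
    files |= {p.relative_to(right_p) for p in right_p.rglob("*") if p.is_file()}
    for rel in sorted(files):
        a, b = left_p / rel, right_p / rel
        if not b.exists():
            report[str(rel)] = "copy left -> right"
        elif not a.exists():
            report[str(rel)] = "copy right -> left"
        elif abs(a.stat().st_mtime - b.stat().st_mtime) < 2:
            continue  # effectively in sync (2 s tolerance for coarse timestamps)
        elif a.stat().st_mtime > b.stat().st_mtime:
            report[str(rel)] = "copy left -> right"
        else:
            report[str(rel)] = "copy right -> left"
    return report
```

Printing this report and waiting for a keypress before doing any copying is exactly the safety net that made Briefcase and Comparator Pro tolerable: you get one last chance to spot a catastrophe before files move.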
Automatic synchronization / Cloud storage:
After witnessing several near disasters with Windows Briefcase over the years, I was very skeptical of completely automating the process. During the summer of 2011 I took the leap and gave Dropbox a try. While a skeptic at first, I was blown away by how well it works. These days I "work" out of a Dropbox-synchronized folder. Dropbox does what it does extremely well -- I don't know how, but it just works.
The point is that, these days, my repositories largely live inside a Dropbox folder (all synchronized to the cloud). This means that within seconds of my writing a file, it is on its way both to the cloud and to every other powered-up computer I have that is running Dropbox. The special ingredient that seems to set it apart from the others (such as Google Drive... which I like and use as well, mostly for redundancy) is called "LAN sync". This feature is an incredible time-saver: it lets computers on the same local network transfer files directly, based on knowing what is needed where. If you want to get your repository onto a new machine, you set it up, and while the new machine will talk to the cloud for some of the information (the list of files, for instance), it can actually copy the files from the local network. This isn't magic -- it is just smart, and it turns days of sluggish downloading into hours.
That's about all I have to offer on the topic right now... If you discard all the other advice, I strongly suggest that you at least give Dropbox a try.