Thinking about data

A few posts ago I was musing about how to categorise and talk about applications. This post is about the other side of the coin: data. Across the industry, far more work has gone into organising data than into organising applications, and as a result there are plenty of good taxonomies. Everyone who has used a computer has used one of them, because every computer has a filesystem and a filesystem is an implementation of a data taxonomy.

However, there hasn’t been much progress in filesystem design over the last few decades, at least not from the user’s perspective. There have been great innovations under the covers (de-duplication, indexing, encryption, copy-on-write and so on), but in terms of helping users organise and find their files it is still hierarchical files and directories for everyone on a PC, Mac or Linux machine.

In the context of a personal computer, most people want to be able to find their files quickly, or just find them at all, particularly when faced with the prospect of moving data from one system to another. Or even just to back them up, which is probably the best reason. I recently had this challenge with my laptop: where is all my data? I wanted to copy it to a new system. In Windows terms you can copy your profile, but is that enough? What about the times you wrote something directly to C: or D:, and what about the data, such as private keys, that the operating system has written on your behalf?

There is great scope for storing, and exposing to the user, more metadata about their files than just size, dates and type.

I would like to see, as part of the filesystem and operating system, concise information about the owner of the data. Not the file, mind; that’s just the container. The file could be owned by the operating system for protection while the data inside belongs to a user. In this scenario, the OS should always prompt the user about where to store the data rather than put it in some “built-in” area of the subsystem. The OS can store its own data where it likes, but the user should always get to choose where to put their own information. The OS can give hints, but ultimately it should be down to the user.

The filesystem should remember every location that a user has specified so that it is easy to gather all of the user’s data. (In most cases it will all be on a single disk, which is fine if you can copy the disk at a block level but not fine if you just want your data.) A user could then run a search for “all my data”, optionally narrowed by other criteria such as size and date.
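To make the idea a little more concrete, here is a rough Python sketch of what such a registry might look like if it lived in user space rather than in the filesystem itself. Everything in it (the `UserDataRegistry` class, `remember`, `find` and their parameters) is invented for illustration; no operating system exposes this API as far as I know.

```python
from dataclasses import dataclass, field
from datetime import datetime
from pathlib import Path


@dataclass
class UserDataRegistry:
    """Hypothetical registry of every directory a user has chosen to save into."""

    # Maps a user name to the set of directories that user picked.
    locations: dict = field(default_factory=dict)

    def remember(self, user, directory):
        """Record that `user` chose `directory` as a place for their data."""
        self.locations.setdefault(user, set()).add(Path(directory).resolve())

    def find(self, user, min_size=0, modified_after=None):
        """Return "all my data" for `user`, optionally filtered by size and date."""
        matches = set()  # a set, so nested locations do not yield duplicates
        for directory in self.locations.get(user, set()):
            for path in directory.rglob("*"):
                if not path.is_file():
                    continue
                info = path.stat()
                if info.st_size < min_size:
                    continue
                modified = datetime.fromtimestamp(info.st_mtime)
                if modified_after is not None and modified < modified_after:
                    continue
                matches.add(path)
        return sorted(matches)
```

With something like this, a backup or migration tool would only need to call `find("alice")` instead of guessing which parts of the disk matter. A real implementation would presumably sit inside the save dialog and the filesystem driver, which is exactly the part that is missing today.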

To be fair, Windows Search can look for metadata such as author or tag, but the problem is that the metadata is application dependent and not built into the save dialog; I don’t get the option to specify author or tags in Notepad. This leads us down the route of file formats. The filesystem should take more of the burden here, and wouldn’t it be great if you could even specify your own types of metadata?
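As a thought experiment, user-defined metadata that is independent of the file format could be approximated with a simple index kept beside the files. The sketch below is purely illustrative: `MetadataStore`, `tag` and `search` are names I have made up, and a JSON index file stands in for what the filesystem would ideally store itself.

```python
import json
from pathlib import Path


class MetadataStore:
    """Hypothetical store for user-defined metadata, kept in one JSON index file."""

    def __init__(self, index_file):
        self.index_file = Path(index_file)
        # Load an existing index if there is one, otherwise start empty.
        if self.index_file.exists():
            self.index = json.loads(self.index_file.read_text())
        else:
            self.index = {}

    def tag(self, path, **fields):
        """Attach arbitrary key/value metadata (author, project, ...) to a file."""
        self.index.setdefault(str(path), {}).update(fields)
        self.index_file.write_text(json.dumps(self.index, indent=2))

    def search(self, **criteria):
        """Return the paths whose metadata matches every given key/value pair."""
        return sorted(
            path
            for path, meta in self.index.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )
```

Because the keys are free-form, a plain text file written in Notepad could carry an author or a project name just as easily as a Word document, for example `store.tag("notes.txt", author="alice", project="filesystem")` followed by `store.search(author="alice")`. The obvious weakness is that the index goes stale as soon as a file is moved or renamed, which is one more argument for the filesystem owning this job.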

Just another lifetime needed to write a better filesystem.
