Disclaimer

The content of this blog is my personal opinion only. Although I am an employee - currently of Nvidia, in the past of other companies such as Imagination Technologies, MIPS, Intellectual Ventures, Intel, AMD, Motorola, and Gould - I reveal this only so that the reader may account for any possible bias I may have towards my employer's products. The statements I make here in no way represent my employer's position, nor am I authorized to speak on behalf of my employer. In fact, this posting may not even represent my personal opinion, since occasionally I play devil's advocate.

See http://docs.google.com/View?id=dcxddbtr_23cg5thdfj for photo credits.

Friday, June 12, 2009

Implicit vs. Explicit Filenames

In an earlier post I talked about creating a UNIX command line tool for user level filesystems.

I described primitives such as
    FSinF get fileSrc fileDst
    get contents of fileSrc (in FSinF) and write to fileDst (in native filesystem)

    FSinF put fileSrc fileDst
    put contents of fileSrc (in native filesystem) to fileDst (in FSinF)
I must admit that I was thinking in terms of UNIX-like filesystems, where the user specifies the filename. However, there is a more basic sort of filesystem: one where the storage system assigns the name.

My first real encounter with such a system was the MH Mail Handling system. Other mail systems are similar: when you save or file an email, you don't have to give it a name. MH gave it a sequentially incrementing number. Outlook doesn't even do that, although there is probably a message ID somewhere. Apple's iTunes music manager is similar. The user typically looks at a listing of the files in a folder, using metadata that may be extracted from the file or otherwise associated with it. In other words, there is a browse/search/select interface. It is harder to automate than a simple filename interface, because a query selection is not guaranteed to match exactly one file.

Content based filesystems are similar. In git, for example, the name is the hash.

Some content based filesystems cannot handle conflicts - two different files with the same hash. Git seems to have little provision for conflicts, although I have been told by Linus that it handles them. My own work in content based filesystems handles hash conflicts - the filename is the hash, with a version number to handle conflicts. The comparison needed to detect conflicts can be skipped, at the risk of data loss (such as occurs in other content based filesystems).
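To make the hash-plus-version idea concrete, here is a minimal sketch. It is not my actual implementation: the class name VersionedHashStore, the "<hash>.<version>" name format, the choice of SHA-256, and the in-memory dictionary are all assumptions made for illustration. The hash function is injectable only so that a conflict can be forced in a demonstration.

```python
import hashlib


class VersionedHashStore:
    """Hypothetical in-memory content store; names look like '<hash>.<version>'."""

    def __init__(self, hash_fn=None):
        # hash_fn is injectable so that a deliberately weak hash can force conflicts.
        self._hash_fn = hash_fn or (lambda data: hashlib.sha256(data).hexdigest())
        self._blobs = {}  # name -> bytes

    def put(self, data: bytes) -> str:
        """Store data and return the name chosen by the store."""
        digest = self._hash_fn(data)
        version = 0
        while True:
            name = f"{digest}.{version}"
            existing = self._blobs.get(name)
            if existing is None:
                # First content seen under this hash and version: claim the name.
                self._blobs[name] = data
                return name
            if existing == data:
                # Same bytes already stored: reuse the existing name.
                return name
            # Same hash, different bytes: a conflict, so try the next version.
            version += 1

    def get(self, name: str) -> bytes:
        return self._blobs[name]
```

With a deliberately bad hash such as `lambda data: "h"`, storing b"a" and then b"b" yields the names "h.0" and "h.1". Dropping the `existing == data` comparison would save the byte compare, but a conflicting file would then silently alias the earlier one, which is the data loss mentioned above.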

In any case: for a strictly hash based filesystem that does not handle conflicts, the user could calculate the hash outside and install it using a filename interface. However, for a content based filesystem such as mine, only the filesystem can return the filename.
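As an illustration of calculating the name outside the filesystem: for git's SHA-1 object format, the name of a blob is the SHA-1 of a short header ("blob", a space, the decimal length, a NUL byte) followed by the content. The helper name git_blob_name below is my own; the sketch only computes the name and does not write anything into a repository.

```python
import hashlib


def git_blob_name(data: bytes) -> str:
    """Compute the name git's SHA-1 object format gives a blob with this content."""
    # Header is b"blob <decimal length>\0", followed by the raw content.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well known name in every SHA-1 git repository.
    print(git_blob_name(b""))
```

Because the name depends only on the content, anybody can compute it ahead of time. That is exactly what stops working once the store appends a version number that only it can assign.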

This suggests an interface:
    fileDst := FSinF put fileSrc
    put contents of fileSrc (in native filesystem)
    into fileDst (in FSinF). FSinF specifies the filename.
where the storage system (FSinF) returns the filename.

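Call these two types implicitly named versus explicitly named filesystems: in an implicitly named filesystem the storage system assigns the name, while in an explicitly named filesystem the user does.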

Obviously, an implicit filesystem can be built on top of an explicit filesystem. The wrapper calculates the names.
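A rough sketch of such a wrapper, assuming a hypothetical explicit store with put(name, data), get(name), and exists(name). Here the wrapper calculates MH-style sequential numbers; a hash of the content would do equally well. All class and method names are invented for the example.

```python
class ExplicitStore:
    """Hypothetical explicit store: the caller chooses every name."""

    def __init__(self):
        self._files = {}

    def put(self, name: str, data: bytes) -> None:
        self._files[name] = data

    def get(self, name: str) -> bytes:
        return self._files[name]

    def exists(self, name: str) -> bool:
        return name in self._files


class ImplicitOverExplicit:
    """Wrapper that calculates the names itself, as sequential numbers."""

    def __init__(self, backing: ExplicitStore):
        self._backing = backing
        self._next = 1

    def put(self, data: bytes) -> str:
        # Skip over any numbers already taken in the backing store.
        while self._backing.exists(str(self._next)):
            self._next += 1
        name = str(self._next)
        self._backing.put(name, data)
        self._next += 1
        return name

    def get(self, name: str) -> bytes:
        return self._backing.get(name)
```

The caller never supplies a name: `ImplicitOverExplicit(ExplicitStore()).put(b"first message")` returns "1", and the next put returns "2".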

An explicit filesystem can be built on top of an implicit filesystem, but it needs bootstrapping: one file that can be located without knowing the implicit name, that contains the mappings of explicit to implicit names.
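The bootstrapping can be sketched as follows, under several assumptions of mine: the implicit store names blobs by their SHA-256 hash, the mapping file is a JSON dictionary, and the "file that can be located without knowing the implicit name" is modelled as a single mutable attribute called root. With content based names the mapping file gets a new implicit name on every update, so the well-known location really holds a pointer to the current mapping file.

```python
import hashlib
import json


class ImplicitStore:
    """Hypothetical implicit store: put() returns a name chosen by the store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        name = hashlib.sha256(data).hexdigest()
        self._blobs[name] = data
        return name

    def get(self, name: str) -> bytes:
        return self._blobs[name]


class ExplicitOverImplicit:
    """User-chosen names layered on an implicit store via one mapping file."""

    def __init__(self, backing: ImplicitStore):
        self._backing = backing
        # Bootstrap: the only implicit name that must be kept outside the store.
        self.root = backing.put(json.dumps({}).encode())

    def _directory(self) -> dict:
        return json.loads(self._backing.get(self.root))

    def put(self, name: str, data: bytes) -> None:
        directory = self._directory()
        directory[name] = self._backing.put(data)
        # Writing the updated mapping yields a new implicit name for it.
        self.root = self._backing.put(
            json.dumps(directory, sort_keys=True).encode()
        )

    def get(self, name: str) -> bytes:
        return self._backing.get(self._directory()[name])
```

After `fs.put("notes.txt", b"hello")`, `fs.get("notes.txt")` returns b"hello", and fs.root has changed to name the new mapping file. Everything else is reachable from that one value.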
