I want to explore my ideas on how branches, lines of development, tags, and subprojects work.
I think the filesystem oriented design is the way to go. Strictly speaking, I first saw this in svn, but Linus certainly took it further.
Think of it as an abstract interface to a filesystem - a filesystem embedded in a directory tree. Or possibly in a file. E.g. an XML file. I'll concentrate on the filesystem-in-a-directory-tree idea first, but want to always keep filesystem-in-a-file, filesystem-in-XML, filesystem-in-tar-archive, etc., at the back of my mind.
The API to such a user level filesystem is not UNIX kernel level open/read/write. In particular, we cannot assume file handles. No state apart from the filesystem itself.
The API must interface to an existing filesystem.
The API should be (a) in a scripting language, and (b) at the *IX command level. The mapping between them should be 1:1 as much as possible - i.e. I am lazy, and do not want to have to define both. I want to be able to automatically generate the command line tools from the scripting modules.
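A minimal sketch of the 1:1 mapping idea, in Python rather than any language the notes commit to. The class name FSinF, its two methods, and the helpers build_parser and run are all hypothetical, made up for illustration. The only point is that the command line parser can be derived from the method signatures, so nothing has to be defined twice.

```python
import argparse
import inspect


class FSinF:
    """Hypothetical scripting-level API; each public method becomes a subcommand."""

    def copy(self, file_src, file_dst):
        """Copy file_src (in FSinF) to file_dst (in FSinF)."""
        return ("copy", file_src, file_dst)

    def move(self, file_src, file_dst):
        """Move or rename file_src (in FSinF) to file_dst (in FSinF)."""
        return ("move", file_src, file_dst)


def build_parser(api):
    """Derive one subcommand per public method, one positional per parameter."""
    parser = argparse.ArgumentParser(prog="FSinF")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, method in inspect.getmembers(api, inspect.ismethod):
        if name.startswith("_"):
            continue
        cmd = sub.add_parser(name, help=inspect.getdoc(method))
        # Bound methods do not list "self" in their signature.
        for param in inspect.signature(method).parameters:
            cmd.add_argument(param)
    return parser


def run(api, argv):
    """Parse argv and call the method with the same name as the subcommand."""
    args = vars(build_parser(api).parse_args(argv))
    method = getattr(api, args.pop("command"))
    return method(**args)
```

With this, run(FSinF(), ["copy", "a", "b"]) calls FSinF().copy("a", "b"). Optional parameters and flags would need more convention than this sketch has.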
The filesystem should be layerable. E.g. for version control, I want a layer that handles my VC concepts - branches, tags, lines, etc. I want this layered on top of a content-based, deduplicated filesystem. And I want that layered on top of a compression layer, i.e. a packing layer. I want to be able to design these layers separately, and mix and match.
I want this layering so that, ideally, I can steal other people's implementations.
E.g. I might want to steal Linus' git packfiles, or bzr's packfiles.
E.g. I will probably write my own content-based layer, deliberately using a hash that has lots of collisions, to prove that I can handle collisions. But I would like to be able to use any other existing implementation.
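A sketch of what a collision-tolerant content-based layer might look like, assuming an in-memory table and a deliberately weak hash (sum of bytes modulo a small number). The class name CollidingContentStore and the (hash, index) key format are assumptions for illustration, not a design taken from git or bzr. Contents that collide are chained under the same hash and told apart by full comparison.

```python
class CollidingContentStore:
    """Content-addressed store that survives hash collisions by chaining."""

    def __init__(self, buckets=4):
        self.buckets = buckets
        self.table = {}  # weak hash -> list of distinct contents

    def weak_hash(self, data):
        # Deliberately poor: any permutation of the same bytes collides.
        return sum(data) % self.buckets

    def put(self, data):
        """Store data once; return a key of (hash, position in chain)."""
        h = self.weak_hash(data)
        chain = self.table.setdefault(h, [])
        for i, existing in enumerate(chain):
            if existing == data:  # true duplicate, not just a collision
                return (h, i)
        chain.append(data)
        return (h, len(chain) - 1)

    def get(self, key):
        h, i = key
        return self.table[h][i]
```

Storing b"ab" and b"ba" gives two different keys with the same hash part, and storing b"ab" again returns the first key, so deduplication still works.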
E.g. I may want to implement things in a directory, or in some single file. If it is properly layered, that should just involve switching a layer. Although the performance considerations might be extreme.
I want the UNIX command line interface to support such layering. E.g. I may want to look at a filesystem in something like a .git directory tree at the VC level - or at the content-based level, or at the raw pack level.
I want the API in the scripting language so that I can have the layering in the script language, somewhat efficiently. But if layering modules are written in different scripting languages, say Perl or Python or Ruby, or even in C/C++ like git, I want to be able to layer through the command line as well.
I.e. command line layering required. Inside-script-language layering nice.
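A sketch of inside-script-language layering, assuming every layer exposes the same two calls, put(name, data) and get(name), and wraps a lower layer. RawLayer and PackLayer are hypothetical names; a dict stands in for the directory tree, and zlib stands in for a real packing format.

```python
import zlib


class RawLayer:
    """Bottom layer: basically a NOP over some storage (here, a dict)."""

    def __init__(self):
        self.blobs = {}

    def put(self, name, data):
        self.blobs[name] = data

    def get(self, name):
        return self.blobs[name]


class PackLayer:
    """Compressing layer; knows nothing about what sits below it."""

    def __init__(self, lower):
        self.lower = lower

    def put(self, name, data):
        self.lower.put(name, zlib.compress(data))

    def get(self, name):
        return zlib.decompress(self.lower.get(name))


# Mix and match: the same name can be read at either level.
raw = RawLayer()
packed = PackLayer(raw)
packed.put("notes.txt", b"hello " * 100)
```

Here packed.get("notes.txt") returns the original bytes, while raw.get("notes.txt") returns the compressed form, which is the "look at the same tree at different levels" idea. Swapping RawLayer for a single-file backend would not touch PackLayer.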
Given my history, I will probably start coding in object oriented perl. Hope to leverage lots of code out of CPAN. Call it PerlFS? No, that name is taken. How about FSinF or FSinD - FileSystem in a File and FileSystem in a Directory (Tree). Perl_FSinF, Perl_FSinD. FSinF being the generic name, since directories are just filesystems.
I'm not scared of using a scripting language. Bzr argues they have achieved good performance in Python. I like portability.
As usual, ultimately I would like the system to install as a single file. Like scons.py. Or at least a directory.
Thoughts about the FSinF (FileSystem in File/Directory) Interface
Can't have primitives like UNIX open/read/write/close. These assume state - file descriptors.
Interface must specify both files on the FSinF filesystem, as well as data on the native filesystem.
Operations such as:
Copying
FSinF copy fileSrc fileDst
Copy fileSrc (in FSinF) to fileDst (in FSinF)
FSinF get fileSrc fileDst
get contents of fileSrc (in FSinF) and write to fileDst (in native filesystem)
FSinF put fileSrc fileDst
put contents of fileSrc (in native filesystem) to fileDst (in FSinF)
If we have syntax to distinguish native and FSinF filesystem names, this might be a single command "copy", although get/put is nicely documenting.
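A sketch of how a single "copy" could dispatch on name syntax. The "fsinf:" prefix is purely an assumption for illustration; the notes do not pick a syntax. The helper names classify and dispatch_copy are likewise hypothetical.

```python
def classify(name, prefix="fsinf:"):
    """Return ("fsinf", path) or ("native", path) based on a name prefix."""
    if name.startswith(prefix):
        return ("fsinf", name[len(prefix):])
    return ("native", name)


def dispatch_copy(src, dst):
    """Decide which underlying operation a generic copy maps to."""
    kinds = (classify(src)[0], classify(dst)[0])
    return {
        ("fsinf", "fsinf"): "copy",
        ("fsinf", "native"): "get",
        ("native", "fsinf"): "put",
        ("native", "native"): "native-copy",
    }[kinds]
```

So dispatch_copy("fsinf:a", "b") maps to "get" and dispatch_copy("a", "fsinf:b") maps to "put". The explicit get/put commands could remain as self-documenting aliases.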
Moving and Renaming
FSinF move fileSrc fileDst
move or rename fileSrc (in FSinF) to fileDst (in FSinF)
Again, syntax could allow moving and renaming between FSinF and the native filesystem.
Inter-filesystem operations
get and put are inter-filesystem operations. We don't need to limit ourselves to getting from the FSinF to the native filesystem. We could transfer between two different FSinFs, FSinF1 and FSinF2. Possibly might stage through the native filesystem, but not necessarily always.
Parts of Files
Don't want always to operate on whole files. Although that is natural.
May want to specify parts of files:
- byte offset regions
- line regions
- XML clauses
- record numbers
At the command line level, these would be optional parameters.
E.g.
FSinF get -range "line 40:line 100" file
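A sketch of how a range string like the one above might be interpreted. The grammar ("unit N:unit M"), 1-based inclusive line numbers, and 0-based inclusive byte offsets are all assumptions; the notes only give the one example. parse_range and extract are hypothetical helper names.

```python
def parse_range(spec):
    """Parse "line 40:line 100" into ("line", 40, 100)."""
    start, _, end = spec.partition(":")
    unit_a, num_a = start.split()
    unit_b, num_b = end.split()
    if unit_a != unit_b:
        raise ValueError("mixed units in range: %r" % spec)
    return unit_a, int(num_a), int(num_b)


def extract(data, spec):
    """Return the part of data (bytes) selected by the range spec."""
    unit, first, last = parse_range(spec)
    if unit == "line":
        # Assumed: lines are numbered from 1, both ends inclusive.
        lines = data.splitlines(keepends=True)
        return b"".join(lines[first - 1:last])
    if unit == "byte":
        # Assumed: offsets are numbered from 0, both ends inclusive.
        return data[first:last + 1]
    raise ValueError("unsupported unit: %r" % unit)
```

XML clauses and record numbers would need format-aware parsers and are left out here.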
Hints and Requirements
Different filesystems have different semantics. E.g. "remove" may mean "remove completely", or "remove from current version, but keep earlier versions around".
Such semantics are FSinF specific. Optional arguments.
May want to indicate if the semantics are mandatory or optional. E.g. "remove completely", motivated by a desire to remove copyright infringing stuff, is mandatory - it must be done, or the caller must receive an error if it cannot be done.
E.g.
FSinF remove -require hard-removal file ...
FSinF remove -optional soft-removal file ...
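A sketch of the require/optional distinction, assuming each filesystem layer advertises the set of semantics it supports. The function name remove, the exception UnsupportedSemantics, and the "supported" set are hypothetical. A required semantic that is unavailable is an error; an optional one is silently skipped.

```python
class UnsupportedSemantics(Exception):
    """Raised when a required semantic cannot be honored."""


def remove(supported, name, require=(), optional=()):
    """Return the semantics that will be applied when removing name.

    supported: set of semantics this filesystem implements.
    """
    missing = [s for s in require if s not in supported]
    if missing:
        raise UnsupportedSemantics(
            "cannot remove %r: unsupported %s" % (name, ", ".join(missing)))
    # Optional semantics are applied only where the filesystem offers them.
    return list(require) + [s for s in optional if s in supported]
```

Against a filesystem that only supports "soft-removal", require=["hard-removal"] raises, while optional=["hard-removal"] proceeds with no special semantics.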
Other Filesystem Operations
FSinF remove files ...
FSinF create files ....
Empty files created.
Possibly specify types.
FSinF mkdir dirname
FSinF mkfs fsinf-file-or-dirname
FSinF fsck fsinf-file-or-dirname
FSinF chperm files...
Changes permissions
Also used to change owner, group name, etc.
Expect to extend:
E.g.
May start off with UNIX permissions, or ACLs
May create my own extra permission classes and ACLs, more flexible than the native OS.
FSinF metadata add file metadata...
FSinF metadata remove file metadata...
Manipulate extra metadata associated with files
Synchronization
FSinF get file1 file2 ...
atomically get consistent file versions
FSinF atomic compare-and-swap old-file oldcmpdata newfile
FSinF atomic compare-and-put old-file1 cmpdata1 newfile2
Have the "server" do the comparison, and then put/swap.
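A sketch of compare-and-put with the comparison done on the "server" side, so the client holds no lock and no handle. The Server class and its in-memory dict are stand-ins for illustration; the guard lock is internal to the server and is an implementation assumption, not part of the interface.

```python
import threading


class Server:
    """Hypothetical server side of an FSinF; clients stay stateless."""

    def __init__(self):
        self._files = {}
        self._guard = threading.Lock()  # internal only, never exposed

    def get(self, name):
        return self._files.get(name)

    def compare_and_put(self, name, expected, new_data):
        """Write new_data only if the current contents equal expected.

        expected=None means "the file must not exist yet".
        Returns True if the write happened, False otherwise.
        """
        with self._guard:
            if self._files.get(name) != expected:
                return False
            self._files[name] = new_data
            return True
```

A client that loses the race simply gets False back, re-reads, and retries, which avoids the client-visible state that lock/unlock would imply.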
Possibly
FSinF lock file ...
FSinF unlock file ...
Although I dislike the state implied by locking.
Layering, Access to
E.g. in a directory ~/myproject
with a subdirectory .git
I would want to be able to say
FSinF -type git move file1 file2
But I might also want to say, at the level of the content-based layer
FSinF -type git-sha-content-based-fs remove 0x6416fec17...
I might also have a .svn link, so I may want to say
FSinF -type svn get "-r4 file1 "
note how "-r4 file1" almost looks like a filename / specification to the svn filesystem.
I.e. I may want to have several different filesystems under a tree, such as .git / .svn. Or even my old CVS / CVS.other-repo stuff.
But I also want to be able to look at the layers in any of them.
Layers I want
Raw filesystem. I.e. basically a NOP.
FSinF pack:
Compacts the above.
Probably needs some form of delta specification, for delta compression.
FSinF content-based
Uses hashes to deduplicate.
Me, I want to be able to handle hash collisions. Want both stupid hashes, like UNIX compress, and fancier hashes almost guaranteed to be collision free.
FSinF checksummed
Adds checksums, verifies integrity
Maybe even ECC
How about:
FSinF mirrored?
FSinF encrypted.
FSinF signed
but not encrypted
FSinF version-control
version controlled
in my dreams, with my extended semantics.
FSinF dirtree, file, xml, tar ...
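As one concrete case from the list above, a checksummed layer might be sketched like this. The layout (a SHA-256 digest prepended to the data) and the class names DictStore and ChecksumLayer are assumptions for illustration. It detects corruption only; ECC-style repair is not attempted.

```python
import hashlib


class DictStore:
    """Stand-in for whatever lower layer sits underneath."""

    def __init__(self):
        self.blobs = {}

    def put(self, name, data):
        self.blobs[name] = data

    def get(self, name):
        return self.blobs[name]


class ChecksumLayer:
    """Adds a digest on put and verifies it on get."""

    DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes

    def __init__(self, lower):
        self.lower = lower

    def put(self, name, data):
        self.lower.put(name, hashlib.sha256(data).digest() + data)

    def get(self, name):
        stored = self.lower.get(name)
        digest, data = stored[:self.DIGEST_LEN], stored[self.DIGEST_LEN:]
        if hashlib.sha256(data).digest() != digest:
            raise ValueError("checksum mismatch for %r" % name)
        return data
```

Because it only relies on put and get from the layer below, it could sit above or below a packing layer, which is the kind of mix and match the layering is meant to allow.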