Moreover, I really do understand why you want to rearrange history. My classoc example is of interleaved changes
V0---(a1)--->Va1---(b1)--->Va1b1---(a2)--->Va2b1---(b2)--->Va2b2 = Vab
I.e. starting off at version V0, apply changeset (a1) to get version Va1, then changeset (b1) to get version Va1b1, and so on until you get final version Va2b2, which we will call simply version Vab.
I.e. there are really two independent changes going on, (a)=(a1)+(a2) and (b)=(b1)+(b2). But they got interleaved.
Try as you might, this happens.
Using history rewriting tools like git rebase and hg graft you can rearrange the histories.
V0---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
or in the other order
V0---(b1")--->Vb1"--->(b2")--->Vb --->(a1")--->Va1"b---(a2")--->Vab
Note that I have tried to indicated by priming, ' and ", that the changesets applied in different order may not be identical. Note: we are not talking about applying the SAME changesets in a different order, which might give different final results Vab, but instead calculating MODIFIED changesets (a1"), (b1'), etc. to give the same final result Vab.
Compactly, letting (a)=(a1)+(a2) and (b)=(b1)+(b2)
V0---(a)--->Va--->(b)--->Vab
or in the other order
V0---(b)--->Vb--->(a)--->Vab
I know how to do this with standard history editing tools.
or in the other order
V0---(b)--->Vb--->(a)--->Vab
I know how to do this with standard history editing tools.
---
But this is problematic when others have pulled and cloned and made modifications to the intermediate pre-history edited versions like Va2b1. In Mercurial, if that version is deleted from the repository, you may not be able to merge back in - or, when you merge, it may revive the deleted version from the dead.
This is bad.
Moreover, this "you must edit history before pushing" philosophy leads to people delaying sharing code. Which is bad.
---
I have been thinking more and more about recording multiple paths of history in the repo.
I.e. pushing first
V0---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
and then later, at your leisure, going and rearranging the history so that it makes sense:
V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
\ ||
+-----------(a)---------->Va--------->(b)--------------------->Vab
\ ||
+-----------(a)---------->Va--------->(b)--------------------->Vab
I.e. the final nodes Vab are equivalent.
Or, pushing my ascii art skills
V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')---+--->Vab
\ /
+-----------(a)---------->Va--------->(b)-------------------------+
\ /
+-----------(a)---------->Va--------->(b)-------------------------+
(I think I like the equivalence || drawing better.)
Still later you may create a second alternate history:
V0---+---(a1)--->Va1--->(a2')--->Va --->(b1')--->Vab1'---(b2')--->Vab
\ ||
+-----------(a)---------->Va--------->(b)--------------------->Vab
\ ||
+-----------(a)---------->Va--------->(b)--------------------->Vab
\ ||
+-----------(b')---------->Vb--------->(a')------------------>Vab
+-----------(b')---------->Vb--------->(a')------------------>Vab
The basic idea is that when you establish that you are creating alternate history paths, the operations are constrained to create the same final results.
And then, for those who want a simple history, you hide the paths, the changesets and the versions, that you do not want to appear by default. But they still are recorded in the history, so that if somebody made modifications based off an intermediate version, you can still merge meaningfully. You get to decide if that merge appears as if based of a version that is currently hidden, or if you want to roll up the visibility of its ancestors.
2 comments:
One nice thing about this approach is that you don't have to be in a hurry to do the right thing. You can edit in a quick and dirty manner, share, but also clean up the history after the fact. You don't have to decide between integrating early and often, and having a nice history.
So what is the name for a version?
Mercurial creates a hash of all of the history leading up to a version.
This can work with my multiple paths approach. A new path can be constrained to point to a version determined by an old path.
Except when you may delete versions, delete paths. Then it may not be possible to calculate the name from the history any more.
But why do we need to calculate the name? Why not store? We may lose an arbitrary integrity check.
I mean, basically the reason that we do not arbitrarily assign numbers is to handle collisions when disconnected. Content based names are very unlikely to collide. Even if part of the content has been deleted, still unlikely to collide.
However, I suppose that one of the advantages of content based names when all content is available is that you can in theory detect collisions. I don't know of anyone who does this in a DVCS (except me in my old RCS merge). If the content that went into the computation of a hash for a name is deleted, you can't verify.
But you can do perfunctory checks. For example, you can compare the current file contents (assuming you have them). This will detect different files having same history hashes. And of course all hash collisions are hopefully unlikely.
Hash the file contents.
Hash the file contents and current metadata.
Hash the file contents and metadata and history.
Post a Comment