I love git. I try to do as much as I can using it. I have converted many SVN-based
projects over to git so that I can interact with them more naturally.
I wanted a bit more of a challenge than just running git svn clone to create
the repositories.
I love Python. I use it all over the place, including for web applications. Someday I want to give back to Python with some form of contribution, and perhaps this is a first step. I tasked myself with moving Python's SVN repository into git. Python has a long history, which makes this a bit trickier.
Going local
One approach would be to run git svn clone against the public SVN
URL, but that would take a very long time. It is much better to have
access to the actual repository and clone it locally. Fortunately, Python provides
the full repository:
wget http://svn.python.org/snapshots/projects-svn-tarball.tar.bz2
tar xvfj projects-svn-tarball.tar.bz2
This gives us a directory named projects that is a bona fide SVN
repository. You can verify this by running:
svn ls file:///Users/brian/code/python-svn/projects
Note
The SVN path above is critical. Its absolute path is
/Users/brian/code/python-svn/projects. This path is referenced
in other places, so be sure to know what yours is if you are trying to
accomplish something similar.
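Because that absolute path ends up inside every file:// URL, it can help to compute it rather than type it by hand. This is a minimal sketch assuming a POSIX shell. `svn_file_url` is a hypothetical helper name and is not part of svn or git:

```shell
# Hypothetical helper (not part of svn or git): print the file:// URL for a
# local SVN repository directory, resolving a relative path to an absolute one.
svn_file_url() {
  # The command substitution runs cd in a subshell, so the caller's
  # working directory is left unchanged.
  dir=$(cd "$1" && pwd) || return 1
  printf 'file://%s\n' "$dir"
}
```

For example, from the directory where you unpacked the tarball: `svn ls "$(svn_file_url projects)"`.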
Gitify
Let's go ahead and begin moving everything from SVN into git.
We simply use git svn to handle this for us; it does a great job.
git svn clone -s --prefix=svn/ file:///Users/brian/code/python-svn/projects/python
We pass it -s, which indicates that the tree at the given URL follows the
standard SVN layout with trunk, branches and tags. This helps git
recognize those and map them to git refs. I always use --prefix=svn/ so that the
ref names for the SVN branches get prefixed, which avoids name clashes with
local branches.
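For reference, -s (long form --stdlayout) is shorthand for --trunk=trunk --branches=branches --tags=tags. Once the clone has finished, listing the remote-tracking refs shows what the prefix does. This is a small sketch, and `list_svn_refs` is a hypothetical helper name:

```shell
# Hypothetical helper: list the refs git svn created under the svn/ prefix.
# With -s and --prefix=svn/ these live under refs/remotes/svn/, for example
# svn/trunk, svn/<branch> and svn/tags/<tag>.
list_svn_refs() {
  git for-each-ref --format='%(refname:short)' refs/remotes/svn/
}
```

Run inside the clone, it should print svn/trunk along with the branch and tag refs. None of them can collide with a local branch such as trunk.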
The above command will take a very long time to complete in this case, although a small SVN repository clones relatively quickly. It took about five hours on my MacBook Pro.
Getting our hands dirty
Now we have a git repository of Python's SVN! However, we are left with some issues. We pulled down a nightly snapshot of the Python SVN repository, and we want to be able to keep it in sync with the ongoing changes upstream.
git svn stores some of its information in .git/config:
[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
[svn-remote "svn"]
	url = file:///Users/brian/code/python-svn/projects
	fetch = python/trunk:refs/remotes/svn/trunk
	branches = python/branches/*:refs/remotes/svn/*
	tags = python/tags/*:refs/remotes/svn/tags/*
It seems logical that I could adjust the repository URL above and then
run git svn fetch without an issue. I attempted this several
times, and every attempt failed with:
Unable to determine upstream SVN information from working tree history
However, this one time it actually worked. The only difference I am aware of between this attempt and the previous ones is the version of git: I am using git 1.6.0.4, which seems to handle this better. Perhaps someone knows whether there were indeed fixes in this area.
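The URL can also be changed with git config instead of editing .git/config by hand. This sketch assumes the svn-remote is named svn, as in the config shown earlier, and that the public equivalent of the local projects URL is http://svn.python.org/projects. That URL is inferred from the rewrite target used later and is an assumption. `set_svn_remote_url` is a hypothetical helper name:

```shell
# Hypothetical helper: repoint the svn-remote named "svn" (the
# [svn-remote "svn"] section in .git/config) at a new URL, then print the
# stored value so the change can be checked.
set_svn_remote_url() {
  git config svn-remote.svn.url "$1" &&
  git config --get svn-remote.svn.url
}
```

Usage would be `set_svn_remote_url http://svn.python.org/projects` followed by `git svn fetch`. As described above, the fetch can still fail if git svn cannot work out the upstream from the commit history.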
In my digging I found out that git svn fails with that "Unable to determine
upstream SVN information" message because it is unable to automatically detect
the repository information from the commit messages. If you look at git log you will see:
commit a3e5b71ea1f27e29285c7a3e686e6084d10231a7
Author: benjamin.peterson <benjamin.peterson@6015fed2-1504-0410-9fe1-9d1591cc4771>
Date:   Sun Nov 23 02:09:41 2008 +0000

    raise a better error

    git-svn-id: file:///Users/brian/code/python-svn/projects/python/trunk@67348 6015fed2-1504-0410-9fe1-9d1591cc4771
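Notice the git-svn-id line. It has meaning to git svn: it records the SVN URL,
revision and repository UUID that each commit came from. Remember how I said
earlier that the full path was important? Every one of these lines points at my
local file:// path, but my end goal for this task was to make this repository
publicly available. I need to change this in all commits.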
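Before rewriting anything, it can be useful to inspect a commit's git-svn-id and count how many commits still carry a local path. This is a minimal sketch. The helper names are hypothetical, and it only looks at refs under refs/remotes/svn/, matching the --prefix=svn/ used for the clone:

```shell
# Hypothetical helper: print the git-svn-id value of a commit (without the
# "git-svn-id: " label), e.g. "file:///.../python/trunk@67348 <uuid>".
svn_id_of() {
  git log -1 --format=%B "$1" | sed -n 's/^git-svn-id: //p'
}

# Hypothetical helper: count commits reachable from refs/remotes/svn/*
# whose git-svn-id still points at a local file:// URL.
count_local_svn_ids() {
  refs=$(git for-each-ref --format='%(refname)' refs/remotes/svn/)
  # git rev-list needs at least one starting ref; report 0 if there are none.
  [ -n "$refs" ] || { echo 0; return 0; }
  # $refs is unquoted on purpose so that each ref becomes its own argument.
  git rev-list --count --grep='^git-svn-id: file://' $refs
}
```

After the history rewrite, count_local_svn_ids should report 0. It ignores refs/original/, where filter-branch keeps its backups of the old commits.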
git filter-branch is really powerful
Maybe too powerful? git comes with filter-branch, which lets you
completely rewrite a branch's commit history. This is pretty insane. I wanted
to adjust the commit messages to contain the correct path in git-svn-id.
Let's break out some sed and awk awesomeness here:
git filter-branch \
--msg-filter \
'sed -e "s/^git-svn-id: file:\/\/\/Users\/brian\/code\/python-svn\//git-svn-id: http:\/\/svn.python.org\//g"' \
$(cat .git/packed-refs | awk '// {print $2}' | grep -v 'pack-refs')
This rewrites the entire repository history, changing
file:///Users/brian/code/python-svn/ to http://svn.python.org/ in the
commit messages. Credit goes to this blog, from which I took the
above command. That blogger goes into more detail, and I recommend
reading it.
Once again, the above command is going to take a very long time in this case.
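Below is a variant of the same rewrite, sketched under a few assumptions. It reads the refs with git for-each-ref instead of parsing .git/packed-refs, so refs that have not been packed are included too. It also uses | as the sed delimiter, which means the slashes need no escaping. `rewrite_svn_ids` is a hypothetical helper name. The old prefix is used as a sed regular expression, so neither argument may contain | or a single quote, and regex metacharacters such as . are not escaped:

```shell
# Hypothetical helper: rewrite the git-svn-id URL prefix in every commit
# message reachable from refs/remotes/svn/*.
#   $1: old URL prefix, e.g. file:///Users/brian/code/python-svn/
#   $2: new URL prefix, e.g. http://svn.python.org/
rewrite_svn_ids() {
  old=$1
  new=$2
  refs=$(git for-each-ref --format='%(refname)' refs/remotes/svn/)
  [ -n "$refs" ] || return 1
  # $old and $new are expanded here. filter-branch then runs the resulting
  # sed command once per commit, passing the message on stdin.
  git filter-branch \
    --msg-filter "sed -e 's|^git-svn-id: $old|git-svn-id: $new|'" \
    -- $refs
}
```

Usage: `rewrite_svn_ids file:///Users/brian/code/python-svn/ http://svn.python.org/`. Like the original command, it leaves backups of the old refs under refs/original/.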
Ta da!
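The very last step in this process is to delete .git/svn. At first I figured
that I didn't need to do this, but git svn rebase was taking forever. That
directory holds git svn's revision maps, which still referred to the
pre-rewrite commits. git svn can rebuild them from the git-svn-id lines in the
commit messages. So I ran: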
rm -rf .git/svn
git svn rebase --all
And voilà! Everything is working now.
I worked with Jannis Leidel to get the repository auto-updating and publicly available. You can clone it from GitHub:
git clone git://github.com/python-git/python.git
I hope this helps someone and provides a place where more people can contribute back to Python using git. Do keep in mind that this git mirror is unofficial, and all patches should be directed upstream to the Python developers.
