oebfare

Gitify Python SVN

I love git. I try to do as much as I can using it. I have converted many SVN-based projects over to git so that I can interact with them more naturally. This time I wanted a bit more of a challenge than just using git svn clone to make the repositories.

I love Python. I use it all over the place, web applications for one. Some day I want to give back to Python with a contribution of some form. Perhaps this is a first step. I tasked myself with moving Python's SVN into git. Python has a pretty long history, which makes it a bit more tricky.

Going local

One approach is to run git svn clone against the public SVN URL, but that would take a very long time. It would be much better to have access to the actual repository and do the clone locally. Fortunately, Python provides the full repository:

wget http://svn.python.org/snapshots/projects-svn-tarball.tar.bz2
tar xvfj projects-svn-tarball.tar.bz2

This gives us a directory named projects that is a bona fide SVN repository. You can verify this by running:

svn ls file:///Users/brian/code/python-svn/projects

Note

The SVN path above is critical. The absolute path here is /Users/brian/code/python-svn/projects. It is referenced in other places below, so be sure to know what yours is if you are trying to accomplish something similar.
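If you would rather not type the absolute path by hand, here is a minimal sketch that builds the file:// URL from wherever the tarball was unpacked. The helper name svn_file_url is my own invention for illustration and is not part of svn or git.

```shell
# Hypothetical helper: print the file:// URL for a local SVN repository directory.
svn_file_url() {
    # Resolve to an absolute path so the URL is the same from any working directory.
    printf 'file://%s\n' "$(cd "$1" && pwd)"
}

# Example usage, assuming the tarball was unpacked into ./projects:
#   svn ls "$(svn_file_url projects)"
```

Whatever this prints for your machine is the path to use in place of mine in the commands that follow.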

Gitify

Let's go ahead and begin the process of moving everything from SVN into git. We simply use git svn to handle this for us. It does a great job.

git svn clone -s --prefix=svn/ file:///Users/brian/code/python-svn/projects/python

We pass it -s, which indicates that the tree at the given URL follows the standard SVN layout with trunk, branches and tags. This helps git see those and make them native. I always use --prefix=svn/ so that the ref names for the SVN stuff get prefixed and avoid clashing with the names of possible local branches.

The above command will take a very long time to complete in this case; it took about five hours on my MacBook Pro. If the SVN repository is small enough it will be relatively quick.
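To make the -s flag concrete: it is shorthand for naming the trunk, branches and tags directories explicitly. The sketch below spells the same clone out in long form. The function name and the REPO_URL variable are hypothetical wrappers of mine; the git svn options themselves are the documented long forms.

```shell
# Local repository URL from this post; substitute your own absolute path.
REPO_URL="file:///Users/brian/code/python-svn/projects"

# Hypothetical wrapper: the same clone as above with -s written out in full.
clone_python_explicit_layout() {
    git svn clone \
        --trunk=trunk \
        --branches=branches \
        --tags=tags \
        --prefix=svn/ \
        "$REPO_URL/python"
}
```

The long form is mostly useful for repositories that do not follow the standard layout, where each directory can be pointed somewhere else.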

Getting our hands dirty

Now we have a git repository of Python's SVN! However, we are now left with some issues. We pulled down a nightly snapshot of the Python SVN repository, and we want to be able to keep it in sync with the ongoing changes upstream.

git svn stores some of its information in .git/config:

[core]
    repositoryformatversion = 0
    filemode = true
    bare = false
    logallrefupdates = true
    ignorecase = true
[svn-remote "svn"]
    url = file:///Users/brian/code/python-svn/projects
    fetch = python/trunk:refs/remotes/svn/trunk
    branches = python/branches/*:refs/remotes/svn/*
    tags = python/tags/*:refs/remotes/svn/tags/*

It seems logical that I could adjust the repository URL above and then git svn fetch without an issue. I have attempted this several times, and every attempt failed in one way or another, giving me:

Unable to determine upstream SVN information from working tree history

However, this one time it actually worked. The only difference I am aware of between this attempt and the previous ones is the version of git: I am using git 1.6.0.4, which seems to handle this better. Perhaps someone knows whether there were indeed some fixes in this regard.
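For reference, the URL adjustment does not have to be made by editing .git/config by hand. Here is a rough sketch using git config, assuming the remote section is named "svn" as in the listing above. The helper name is hypothetical, and the upstream URL in the usage comment is my inference from the local path, not something git svn reports.

```shell
# Hypothetical helper: point the [svn-remote "svn"] section at a new URL.
point_svn_remote_at() {
    git config svn-remote.svn.url "$1"
}

# Example usage inside the converted repository (assumed upstream URL):
#   point_svn_remote_at http://svn.python.org/projects
#   git config --get svn-remote.svn.url
```

This only changes the configuration. It does nothing about the git-svn-id lines already recorded in the commit messages.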

In my digging I found out that the reason it fails with the above message is that git svn is unable to automatically detect the repository information from the commit messages. If you look at git log you will see:

commit a3e5b71ea1f27e29285c7a3e686e6084d10231a7
Author: benjamin.peterson <benjamin.peterson@6015fed2-1504-0410-9fe1-9d1591cc4771>
Date:   Sun Nov 23 02:09:41 2008 +0000

    raise a better error

    git-svn-id: file:///Users/brian/code/python-svn/projects/python/trunk@67348 6015fed2-1504-0410-9fe1-9d1591cc4771

Notice the git-svn-id part. This has meaning to git svn. Remember how I told you that the full path was important? Every commit records the local file:// path of my snapshot, and my end goal for this task was to make this repository available to others, so I need to change it in all commits.
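To see which URL a given commit records, something like the following sketch can pull it out of the message. The helper svn_id_url is hypothetical; it assumes the git-svn-id line has the shape shown in the log output above, that is a URL, then @ and a revision number, then the repository UUID.

```shell
# Hypothetical helper: read a commit message on stdin and print the SVN URL
# recorded in its git-svn-id line (the part between the label and "@<rev>").
svn_id_url() {
    sed -n 's/^git-svn-id: \([^@]*\)@[0-9][0-9]* .*$/\1/p'
}

# Example usage inside the converted repository:
#   git log -1 --format=%B | svn_id_url
```

For the commit shown above this should print the file:// path ending in projects/python/trunk.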

git filter-branch is really powerful

Maybe too powerful? git comes with filter-branch, which lets you completely rewrite a branch's commit history. This is pretty insane. I wanted to adjust the commit messages to contain the correct path in git-svn-id.

Let's break out some sed and awk awesomeness here:

git filter-branch \
    --msg-filter \
        'sed -e "s/^git-svn-id: file:\/\/\/Users\/brian\/code\/python-svn\//git-svn-id: http:\/\/svn.python.org\//g"' \
    $(cat .git/packed-refs | awk '// {print $2}' | grep -v 'pack-refs')

This will rewrite the entire repository history adjusting the commit messages by changing file:///Users/brian/code/python-svn/ to http://svn.python.org/. Credit goes to this blog where I ripped the above command off. That blogger goes into more detail and I recommend the read.

Once again, the above command is going to take a very long time in this case.
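Since the rewrite takes so long, it can be worth trying its two pieces on sample data first. This is a sketch under my own assumptions: both helper names are hypothetical, and list_packed_refs is a slightly stricter variant of the awk and grep pipeline rather than a copy of it.

```shell
# Hypothetical helper: list the ref names stored in a packed-refs file.
# Skips the "# pack-refs ..." header and the "^<sha>" peeled-tag lines.
list_packed_refs() {
    awk '!/^#/ && NF == 2 { print $2 }' "$1"
}

# Hypothetical helper: the same substitution as the --msg-filter,
# written with "|" as the sed delimiter so the slashes need no escaping.
rewrite_svn_id() {
    sed -e 's|^git-svn-id: file:///Users/brian/code/python-svn/|git-svn-id: http://svn.python.org/|'
}

# Example usage inside the converted repository:
#   list_packed_refs .git/packed-refs
#   git log -1 --format=%B | rewrite_svn_id
```

If the second command prints a git-svn-id line starting with http://svn.python.org/projects/, the filter is doing what it should, and the first command shows exactly which refs filter-branch will be asked to rewrite.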

Ta da!

The very last step in this process is to kill .git/svn. I figured at first that I didn't need to do this, but git svn rebase was taking forever. With the directory gone, git svn rebuilds its metadata from the git-svn-id lines in the commits. I did this:

rm -rf .git/svn
git svn rebase --all

And voilà! All is working now.

I worked with Jannis Leidel to get the repository auto-updating and publicly available. You can clone it from GitHub:

git clone git://github.com/python-git/python.git

I hope I was able to help out someone and provide a place where more people can contribute back to Python using git. Do keep in mind this git mirror is unofficial and all patches should be directed upstream to the Python developers.

Entry Details

Published: Dec 8, 2008 at 10:12 AM

© 2007 - 2008 Brian Rosner