oebfare

Moving away from Google Code

Over in Pinax land we have recently moved away from Google Code, Google's project hosting for open source projects. It really is a valuable service for anyone with an open source project: the web interface includes a wiki, issue tracking, an SVN browser and a bit more.

Pinax lived under the project name django-hotclub on Google Code. It mostly still does. We were outgrowing the features Google Code was providing us. Ultimately we wanted to write our own software development platform using Pinax, so early in the project seemed to be the best time to move the version control system elsewhere.

Step One

The first step in this process is to get a local copy of the Subversion repository. I first created a local SVN repository:

svnadmin create /home/brian/svn/pinax

SVN has a handy tool named svnsync. The first thing I did was initialize the repository for using svnsync. I simply executed:

svnsync init file:///home/brian/svn/pinax/ https://django-hotclub.googlecode.com/svn/

This fails with:

svnsync: Repository has not been enabled to accept revision propchanges;
ask the administrator to create a pre-revprop-change hook

I enabled this hook by simply moving the template hook to a real hook and making it executable:

mv /home/brian/svn/pinax/hooks/pre-revprop-change.tmpl /home/brian/svn/pinax/hooks/pre-revprop-change
chmod +x /home/brian/svn/pinax/hooks/pre-revprop-change

The Pinax repository sets more props than the template hook allows (this is my best guess as to why), so I needed to modify the hook a tiny bit to allow any revprop change. I simply changed the exit value in the one error case to return a 0 status code, so svnsync sees every revprop change as accepted.
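
The same hook setup can be scripted. What follows is only a minimal sketch in Python 3, assuming a POSIX system and a repository that already has its hooks directory (svnadmin create makes one). The helper name install_permissive_hook is made up for this illustration. The hook it writes is a shell script that exits 0 for every revprop change, which seems fine for a throwaway mirror that only svnsync writes to but is probably too permissive for a repository people commit to.

```python
import os
import stat

# Hook body: accept every revision property change (assumption: mirror use only).
HOOK_BODY = "#!/bin/sh\n# Accept every revision property change.\nexit 0\n"


def install_permissive_hook(repo_path):
    """Write an always-succeeding pre-revprop-change hook and make it executable.

    Hypothetical helper for illustration; it is not part of Subversion.
    """
    hook_path = os.path.join(repo_path, "hooks", "pre-revprop-change")
    with open(hook_path, "w") as fp:
        fp.write(HOOK_BODY)
    # Equivalent of chmod +x: add the execute bits to the current mode.
    mode = os.stat(hook_path).st_mode
    os.chmod(hook_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path
```

Calling install_permissive_hook("/home/brian/svn/pinax") would cover both the rename and the chmod steps in one go.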

Once this was all set, I was able to run the sync:

svnsync sync file:///home/brian/svn/pinax/

It was on a roll.

Step Two

Step two was to remap the usernames that were in Google Code to the ones the committers normally go by. How on earth was I going to do this? I searched high and low on the Internet for some simple way and could not find anything. James Tauber mentioned that the dump file you can get from SVN is plain text. This sparked an idea: why not remap the usernames in the dump file before re-importing it? James said that the only way to set the repository up correctly on Webfaction was to dump from my synced repository and re-import it, so the dump/import was a requirement anyway.

The next step was to learn how the file format is laid out. While I was searching the web I happened to come across a Python script that can read and write dump files. This was perfect. I don't have its source available, but once I find it I will give proper credit. It was a bit dated in terms of Python code, so I rewrote it. It was a good opportunity to actually learn how it works.
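
For orientation, here is roughly what the format looks like: a record is a block of "Key: value" headers, a blank line, and then a property section of length-prefixed keys and values that ends with PROPS-END. The sketch below is Python 3 (the script in this post is Python 2) and works on a made-up revision record. SAMPLE_PROPS, SAMPLE_RECORD and parse_props are names invented for this illustration, and the parser deliberately ignores the "D" (deleted property) records that the full script handles.

```python
# A made-up property section: each key and value is prefixed by its byte length.
SAMPLE_PROPS = (
    b"K 10\nsvn:author\n"
    b"V 6\nleidel\n"
    b"K 7\nsvn:log\n"
    b"V 11\nfirst draft\n"
    b"PROPS-END\n"
)

# A made-up revision record: headers, blank line, property section, blank line.
SAMPLE_RECORD = (
    b"Revision-number: 1\n"
    + b"Prop-content-length: %d\n" % len(SAMPLE_PROPS)
    + b"Content-length: %d\n" % len(SAMPLE_PROPS)
    + b"\n"
    + SAMPLE_PROPS
    + b"\n"
)


def parse_props(data):
    """Return the (key, value) pairs of a property section, in file order."""
    props = []
    pos = 0
    while not data.startswith(b"PROPS-END", pos):
        fields = []
        for tag in (b"K ", b"V "):
            assert data.startswith(tag, pos), "expected a %r record" % tag
            nl = data.index(b"\n", pos)           # end of the "K <len>" line
            length = int(data[pos + 2:nl])        # declared payload length
            fields.append(data[nl + 1:nl + 1 + length])
            pos = nl + 1 + length + 1             # skip payload and its newline
        props.append(tuple(fields))
    return props


# Split the record at the first blank line and parse what follows the headers.
headers, _, body = SAMPLE_RECORD.partition(b"\n\n")
pairs = parse_props(body)
```

The author of a revision is just the value stored under the svn:author key, which is what makes remapping usernames in the dump file possible.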

After spending two or three hours rewriting it and understanding how it works, I was able to parse the dump file and remap the usernames to what we wanted.

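Here is the final script. It is written for Python 2, reads a dump file from standard input and writes the remapped dump to standard output: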

import sys
import copy

from cStringIO import StringIO

username_mapping = {
    "leidel": "jezdez",
    "floguy": "ericflo",
    "gregoryjnewman": "newman",
}

class Lump(object):
    def __init__(self):
        self.hdrlist = []
        self.hdrdict = {}
        self.proplist = []
        self.propdict = {}
        self.text_data = ""

    def sethdr(self, key, value):
        if key not in self.hdrdict:
            self.hdrlist.append(key)
        self.hdrdict[key] = value

    def delhdr(self, key):
        if key in self.hdrdict:
            del self.hdrdict[key]
            self.hdrlist.remove(key)

def read_rfc822_headers(fp):
    lump = Lump()
    while 1:
        line = fp.readline()
        if line == "":
            return None # eof
        if line == "\n":
            if len(lump.hdrlist) > 0:
                break # newline after headers ends them
            else:
                continue # newline before headers is simply ignored
        if line[-1:] == "\n":
            line = line[:-1]
        colon = line.find(":")
        assert colon > 0
        assert line[colon:colon+2] == ": "
        key, value = line[:colon], line[colon+2:]
        lump.sethdr(key, value)
    return lump

def props_parse(bytes):
    i = 0
    proplist = []
    propdict = {}
    while 1:
        if bytes[i:i+2] == "K ":
            need_value = True
        elif bytes[i:i+2] == "D ":
            need_value = False
        elif bytes[i:i+9] == "PROPS-END":
            break
        else:
            raise Exception("Unrecognized record in props section")

        # the position of the \n character
        nl = bytes.find("\n", i)
        assert nl > 0
        # length of the key
        kl = int(bytes[i+2:nl])
        assert bytes[nl+1+kl] == "\n"
        # key value
        key = bytes[nl+1:nl+1+kl]
        # move the index position
        i = nl + 2 + kl

        if need_value:
            assert bytes[i:i+2] == "V "
            nl = bytes.find("\n", i)
            assert nl > 0
            vl = int(bytes[i+2:nl])
            assert bytes[nl+1+vl] == "\n"
            value = bytes[nl+1:nl+1+vl]
            i = nl + 2 + vl
        else:
            value = None
        proplist.append(key)
        propdict[key] = value
    return proplist, propdict

def read_lump(fp):
    lump = read_rfc822_headers(fp)
    if lump is None:
        return None
    pcl = int(lump.hdrdict.get("Prop-content-length", 0))
    tcl = int(lump.hdrdict.get("Text-content-length", 0))
    if pcl > 0:
        lump.proplist, lump.propdict = props_parse(fp.read(pcl))
    if tcl > 0:
        lump.text_data = fp.read(tcl)
    return lump

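def write_lump(stream, lump):
    """
    Write a lump to the stream in dump file format, recomputing the
    length headers from the current properties and text data.
    """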
    lump = copy.deepcopy(lump)
    prop_stream = StringIO()
    if lump.proplist:
        for key in lump.proplist:
            value = lump.propdict[key]
            if value is None:
                prop_stream.write("D %d\n" % len(key))
                prop_stream.write("%s\n" % key)
            else:
                # write key out
                prop_stream.write("K %d\n" % len(key))
                prop_stream.write("%s\n" % key)

                # write value out
                prop_stream.write("V %d\n" % len(value))
                prop_stream.write("%s\n" % value)
        prop_stream.write("PROPS-END\n")
    prop_data = prop_stream.getvalue()

    if len(prop_data) > 0:
        lump.sethdr("Prop-content-length", str(len(prop_data)))
    else:
        lump.delhdr("Prop-content-length")

    # TODO: if lump.text_data ever changes we need to write out a new
    # Text-content-md5
    if len(lump.text_data) > 0:
        lump.sethdr("Text-content-length", str(len(lump.text_data)))
    else:
        lump.delhdr("Text-content-length")

    if len(prop_data) > 0 or len(lump.text_data) > 0:
        lump.sethdr("Content-length", str(len(prop_data) + len(lump.text_data)))
    else:
        lump.delhdr("Content-length")

    # TODO: need a better data structure here for a sorted dict
    for key in lump.hdrlist:
        value = lump.hdrdict[key]
        stream.write("%s: %s\n" % (key, value))
    stream.write("\n")
    # write prop data
    stream.write(prop_data)
    # the text data was not modified so just write it to the stream
    stream.write(lump.text_data)
    if "Prop-content-length" in lump.hdrdict or \
         "Text-content-length" in lump.hdrdict or \
         "Content-length" in lump.hdrdict:
        stream.write("\n")

def convert_username(lump):
    # replace svn:author when a mapping exists; leave other authors untouched
    author = lump.propdict.get("svn:author")
    if author is not None:
        lump.propdict["svn:author"] = username_mapping.get(author, author)
    return lump

if __name__ == "__main__":
    fp = sys.stdin
    new_fp = sys.stdout

    # copy everything that comes before the first revision unchanged
    lump = read_lump(fp)
    while lump is not None and "Revision-number" not in lump.hdrdict:
        write_lump(new_fp, lump)
        lump = read_lump(fp)
    # either the first revision header, or None if the dump has no revisions
    revhdr = lump

    while revhdr is not None:
        # read revision header
        assert "Revision-number" in revhdr.hdrdict
        print >> sys.stderr, "processing r%s" % (revhdr.hdrdict["Revision-number"],)
        revhdr = convert_username(revhdr)
        contents = []
        # read revision contents
        while 1:
            lump = read_lump(fp)
            if lump is None or "Revision-number" in lump.hdrdict:
                newrevhdr = lump
                break
            contents.append(lump)
        # write out the revision
        write_lump(new_fp, revhdr)
        for lump in contents:
            write_lump(new_fp, lump)
        revhdr = newrevhdr

    fp.close()
    new_fp.close()
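
One detail that is easy to miss is why write_lump recomputes the length headers instead of copying them. Remapping leidel to jezdez happens to keep the length, but gregoryjnewman to newman does not, and the dump format stores byte lengths for each value and for the property section as a whole. The following Python 3 sketch illustrates that with a made-up property list; serialize_props and remap_author are hypothetical names that only mirror what the script does, not code taken from it.

```python
USERNAME_MAPPING = {
    b"leidel": b"jezdez",
    b"floguy": b"ericflo",
    b"gregoryjnewman": b"newman",
}


def serialize_props(pairs):
    """Serialize (key, value) byte pairs in the dump file's K/V layout."""
    out = []
    for key, value in pairs:
        out.append(b"K %d\n%s\n" % (len(key), key))
        out.append(b"V %d\n%s\n" % (len(value), value))
    out.append(b"PROPS-END\n")
    return b"".join(out)


def remap_author(pairs, mapping):
    """Return a copy of pairs with svn:author replaced where a mapping exists."""
    return [
        (key, mapping.get(value, value) if key == b"svn:author" else value)
        for key, value in pairs
    ]


# Made-up revision properties for illustration.
original = [(b"svn:author", b"gregoryjnewman"), (b"svn:log", b"fix typo")]
remapped = remap_author(original, USERNAME_MAPPING)

# These are the values Prop-content-length would need before and after.
old_len = len(serialize_props(original))
new_len = len(serialize_props(remapped))
```

Here old_len and new_len differ, so a dump written with the old Prop-content-length and Content-length headers would no longer describe the record that follows them.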

svn commit blog-post

I hope someone out there finds this useful. Rewriting the script above to understand it was an absolute blast. I didn't spend much time thinking about how to improve it, so maybe you can suggest ways to do that.

Entry Details

Published: Nov 5, 2008 at 9:28 PM

© 2007 - 2008 Brian Rosner