Over in Pinax land we have recently moved away from using Google Code. Google provides project hosting for open source projects. It really is a valuable service for anyone with an open source project. They provide a web interface that includes a wiki, issue tracking, an SVN browser and a bit more.
Pinax lived under the project name django-hotclub on Google Code, and it mostly still does. We were sort of outgrowing the features Google Code was providing us. Ultimately we wanted to write our own software development platform using Pinax, so early in the project seemed to be the best time to move the version control system elsewhere.
Step One
The first step in this process is to get a local copy of the Subversion repository. I first created a local SVN repository:
svnadmin create /home/brian/svn/pinax
SVN has a handy tool named svnsync. Next I initialized the new repository for use with svnsync. I simply executed:
svnsync init file:///home/brian/svn/pinax/ https://django-hotclub.googlecode.com/svn/
This fails with:
svnsync: Repository has not been enabled to accept revision propchanges;
ask the administrator to create a pre-revprop-change hook
I enabled this hook by simply moving the template hook to a real hook and making it executable:
mv /home/brian/svn/pinax/hooks/pre-revprop-change.tmpl /home/brian/svn/pinax/hooks/pre-revprop-change
chmod +x /home/brian/svn/pinax/hooks/pre-revprop-change
The default template only allows modifying the svn:log property, but svnsync has to set other revision properties too (svn:author, svn:date and its own svn:sync-* bookkeeping properties), so I needed to modify the hook a tiny bit to allow any revprop change. I simply changed the exit value in the one error case to a 0 status code so every change is accepted.
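To see why the stock template gets in the way, here is a small Python sketch of the decision a pre-revprop-change hook makes. It is an illustration, not the edited shell template from this post: `template_allows` approximates the default policy, `permissive_allows` the modified one, and all function names are hypothetical. Subversion passes the hook five arguments (repository path, revision, user, property name, and an action of A, M or D); `svnsync init` adds properties such as svn:sync-from-url to revision 0, which the default policy rejects.

```python
import sys


def template_allows(propname, action):
    # Approximates the stock pre-revprop-change.tmpl policy:
    # only modifying (M) svn:log is permitted.
    return action == "M" and propname == "svn:log"


def permissive_allows(propname, action):
    # The modified hook accepts every change, so svnsync can add its
    # svn:sync-* properties and copy svn:author, svn:date and svn:log.
    return True


def hook_main(argv, allows=permissive_allows):
    # Subversion runs the hook as: pre-revprop-change REPOS REV USER PROPNAME ACTION
    repos, rev, user, propname, action = argv[1:6]
    if allows(propname, action):
        return 0  # exit status 0 lets the change through
    sys.stderr.write("Changing revision property %s is prohibited\n" % propname)
    return 1  # any non-zero status rejects the change
```

Saved as hooks/pre-revprop-change with a Python shebang line, a final `sys.exit(hook_main(sys.argv))` call and the executable bit set, the permissive version behaves like the edited template.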
Once this is all set I was able to run the sync:
svnsync sync file:///home/brian/svn/pinax/
It was on a roll.
Step Two
Step two was to remap the usernames that were in Google Code to the ones the committers normally go by. How on earth was I going to do this? I searched high and low on the Internet for some simple way, but could not find anything. James Tauber mentioned that the dump file you can get from SVN is plain text, which sparked an idea: why not handle this before re-importing it? James also said that the only way to set the repository up correctly on WebFaction was to dump my synced repository and re-import it, so the dump/import step was a requirement anyway.
The next step was to learn how the file format is laid out. While searching the web I happened to come across a Python script that can read and write dump files. This was perfect. I don't have its source available, but once I find it I will give proper credit. Its Python code was a bit dated, so I rewrote it, which was a good opportunity to actually learn how it works.
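The layout is simple once you see it: each record is a set of `Key: value` headers, a blank line, then content, and revision properties are stored as `K <length>` / key / `V <length>` / value lines terminated by `PROPS-END`. A minimal sketch of building such a record (written for Python 3 with byte strings, unlike the Python 2 script below; the function names are hypothetical and only K/V entries are handled):

```python
def encode_props(pairs):
    """Encode (key, value) byte-string pairs as a dump-file property block."""
    out = []
    for key, value in pairs:
        out.append(b"K %d\n%s\n" % (len(key), key))      # key length in bytes, then key
        out.append(b"V %d\n%s\n" % (len(value), value))  # value length in bytes, then value
    out.append(b"PROPS-END\n")
    return b"".join(out)


def revision_record(number, pairs):
    """Build a revision record: headers, a blank line, then the property block."""
    props = encode_props(pairs)
    headers = (b"Revision-number: %d\n" % number
               + b"Prop-content-length: %d\n" % len(props)
               + b"Content-length: %d\n" % len(props))
    return headers + b"\n" + props + b"\n"


record = revision_record(1, [(b"svn:author", b"floguy")])
```

With floguy as the author the property block is 37 bytes; with ericflo it is 38. That is why a username remap cannot be a plain text substitution: the Prop-content-length and Content-length headers have to be recomputed as well.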
After spending two or three hours rewriting it and understanding how it works, I was able to parse the dump file and remap the usernames to what we wanted.
Here is the final script:
import sys
import copy
from cStringIO import StringIO

username_mapping = {
    "leidel": "jezdez",
    "floguy": "ericflo",
    "gregoryjnewman": "newman",
}
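
class Lump(object):
    """One dump-file record: headers plus optional property and text content."""

    def __init__(self):
        # headers and properties are kept in a list as well as a dict
        # so they can be written back out in their original order
        self.hdrlist = []
        self.hdrdict = {}
        self.proplist = []
        self.propdict = {}
        self.text_data = ""

    def sethdr(self, key, value):
        if key not in self.hdrdict:
            self.hdrlist.append(key)
        self.hdrdict[key] = value

    def delhdr(self, key):
        if key in self.hdrdict:
            del self.hdrdict[key]
            self.hdrlist.remove(key)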

def read_rfc822_headers(fp):
    lump = Lump()
    while 1:
        line = fp.readline()
        if line == "":
            return None  # eof
        if line == "\n":
            if len(lump.hdrlist) > 0:
                break  # newline after headers ends them
            else:
                continue  # newline before headers is simply ignored
        if line[-1:] == "\n":
            line = line[:-1]
        colon = line.find(":")
        assert colon > 0
        assert line[colon:colon+2] == ": "
        key, value = line[:colon], line[colon+2:]
        lump.sethdr(key, value)
    return lump

def props_parse(data):
    """Parse a property block: K/V pairs and D (deleted) keys up to PROPS-END."""
    i = 0
    proplist = []
    propdict = {}
    while 1:
        if data[i:i+2] == "K ":
            need_value = True
        elif data[i:i+2] == "D ":
            need_value = False
        elif data[i:i+9] == "PROPS-END":
            break
        else:
            raise Exception("Unrecognized record in props section")
        # the position of the \n character
        nl = data.find("\n", i)
        assert nl > 0
        # length of the key
        kl = int(data[i+2:nl])
        assert data[nl+1+kl] == "\n"
        # the key itself
        key = data[nl+1:nl+1+kl]
        # move the index position
        i = nl + 2 + kl
        if need_value:
            assert data[i:i+2] == "V "
            nl = data.find("\n", i)
            assert nl > 0
            vl = int(data[i+2:nl])
            assert data[nl+1+vl] == "\n"
            value = data[nl+1:nl+1+vl]
            i = nl + 2 + vl
        else:
            value = None  # a deleted property has no value
        proplist.append(key)
        propdict[key] = value
    return proplist, propdict

def read_lump(fp):
    lump = read_rfc822_headers(fp)
    if lump is None:
        return None
    pcl = int(lump.hdrdict.get("Prop-content-length", 0))
    tcl = int(lump.hdrdict.get("Text-content-length", 0))
    if pcl > 0:
        lump.proplist, lump.propdict = props_parse(fp.read(pcl))
    if tcl > 0:
        lump.text_data = fp.read(tcl)
    return lump

def write_lump(stream, lump):
    """Write a lump back out in dump format.

    The length headers are recomputed because remapping svn:author changes
    the size of the property block.
    """
    lump = copy.deepcopy(lump)
    # Keep a property/text section whenever the original record had one: an
    # empty PROPS-END block or zero-length text is not the same as a missing
    # section, which can mean "unchanged" (e.g. for Node-action: change).
    has_props = bool(lump.proplist) or "Prop-content-length" in lump.hdrdict
    has_text = len(lump.text_data) > 0 or "Text-content-length" in lump.hdrdict
    prop_stream = StringIO()
    if has_props:
        for key in lump.proplist:
            value = lump.propdict[key]
            if value is None:
                # deleted property: only the key is written
                prop_stream.write("D %d\n" % len(key))
                prop_stream.write("%s\n" % key)
            else:
                # write key out
                prop_stream.write("K %d\n" % len(key))
                prop_stream.write("%s\n" % key)
                # write value out
                prop_stream.write("V %d\n" % len(value))
                prop_stream.write("%s\n" % value)
        prop_stream.write("PROPS-END\n")
    prop_data = prop_stream.getvalue()
    if has_props:
        lump.sethdr("Prop-content-length", str(len(prop_data)))
    else:
        lump.delhdr("Prop-content-length")
    # TODO: if lump.text_data ever changes we need to write out a new
    # Text-content-md5
    if has_text:
        lump.sethdr("Text-content-length", str(len(lump.text_data)))
    else:
        lump.delhdr("Text-content-length")
    if has_props or has_text:
        lump.sethdr("Content-length", str(len(prop_data) + len(lump.text_data)))
    else:
        lump.delhdr("Content-length")
    # TODO: need a better data structure here for a sorted dict
    for key in lump.hdrlist:
        value = lump.hdrdict[key]
        stream.write("%s: %s\n" % (key, value))
    stream.write("\n")
    # write prop data
    stream.write(prop_data)
    # the text data was not modified so just write it to the stream
    stream.write(lump.text_data)
    if has_props or has_text:
        stream.write("\n")

def convert_username(lump):
    # replace svn:author with its mapped name; unmapped authors are left alone
    author = lump.propdict.get("svn:author")
    if author in username_mapping:
        lump.propdict["svn:author"] = username_mapping[author]
    return lump

if __name__ == "__main__":
    # read a dump on stdin and write the remapped dump to stdout
    fp = sys.stdin
    new_fp = sys.stdout
    # copy the records before the first revision (format version, UUID)
    lump = read_lump(fp)
    while lump is not None and "Revision-number" not in lump.hdrdict:
        write_lump(new_fp, lump)
        lump = read_lump(fp)
    revhdr = lump
    while revhdr is not None:
        # read revision header
        assert "Revision-number" in revhdr.hdrdict
        print >> sys.stderr, "processing r%s" % (revhdr.hdrdict["Revision-number"],)
        revhdr = convert_username(revhdr)
        contents = []
        # read the node records that belong to this revision
        while 1:
            lump = read_lump(fp)
            if lump is None or "Revision-number" in lump.hdrdict:
                newrevhdr = lump
                break
            contents.append(lump)
        # write out the revision
        write_lump(new_fp, revhdr)
        for lump in contents:
            write_lump(new_fp, lump)
        revhdr = newrevhdr
    fp.close()
    new_fp.close()
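To decide what goes into `username_mapping`, it helps to list the authors that actually appear in the dump. The sketch below is a companion helper, not part of the script above: it assumes a non-delta dump opened in binary mode (Python 3), skips content using Prop-content-length and Text-content-length the same way `read_lump` does, and counts `svn:author` values on revision records. The function names are hypothetical.

```python
from collections import Counter


def read_headers(fp):
    """Read one block of 'Key: value' lines; return None at end of file."""
    headers = {}
    while True:
        line = fp.readline()
        if not line:
            return headers or None
        if line == b"\n":
            if headers:
                return headers
            continue  # blank lines between records are skipped
        key, _, value = line.rstrip(b"\n").partition(b": ")
        headers[key] = value


def parse_props(block):
    """Turn a K/V/D property block into a dict (D entries map to None)."""
    props, i = {}, 0
    while not block.startswith(b"PROPS-END", i):
        nl = block.index(b"\n", i)
        kind, size = block[i:nl].split(b" ")
        key = block[nl + 1:nl + 1 + int(size)]
        i = nl + 2 + int(size)
        value = None
        if kind == b"K":
            # a K entry is followed by "V <length>" and the value itself
            nl = block.index(b"\n", i)
            size = int(block[i + 2:nl])
            value = block[nl + 1:nl + 1 + size]
            i = nl + 2 + size
        props[key] = value
    return props


def count_authors(fp):
    """Count svn:author values on the revision records of a binary dump."""
    authors = Counter()
    while True:
        headers = read_headers(fp)
        if headers is None:
            return authors
        # skip content by its declared lengths, like read_lump does
        pcl = int(headers.get(b"Prop-content-length", 0))
        tcl = int(headers.get(b"Text-content-length", 0))
        props = parse_props(fp.read(pcl)) if pcl else {}
        fp.read(tcl)  # file contents are not needed here
        if b"Revision-number" in headers and props.get(b"svn:author"):
            authors[props[b"svn:author"]] += 1
```

Calling `count_authors` on a file opened with `open("pinax.dump", "rb")` (a hypothetical file name) gives a Counter of author names to copy into `username_mapping`. Reading by length rather than searching for the text `svn:author` matters because file contents stored in the dump can contain the same bytes.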
svn commit blog-post
I hope someone out there finds this useful. Rewriting the script above to understand it was an absolute blast. Maybe you can suggest ways to improve it; I didn't spend much time thinking about that.
