I used to have trouble keeping track of my files. I often couldn't remember
whether I saved a file on my desktop, laptop, or phone, or if it was floating
around in the cloud somewhere. Plus, with certain information, like passwords
and bitcoin keys, I didn't feel comfortable just sending that in an email to
myself in plain text.
What I wanted was to store my data in a git repository that was backed up to a
single location. I could view old versions of files, and wouldn't have to
worry about my data being deleted. Plus, I was familiar with using git to push
and fetch files to various computers.
But, like I said, I didn't want to just upload my secret keys and passwords to
GitHub or BitBucket, even in a private repository.
I had the cool idea of writing a tool to encrypt my repository before I pushed
it into backup. Unfortunately, I wouldn't be able to use git push
like I
normally would, and instead would have to use something like this:
$ encrypted-git push http://example.com/
At least, that's what I thought until I discovered git-remote-helpers.
Git remote helpers
Online, I found the documentation for git remote
helpers.
It turns out that if you were to run the commands
$ git remote add origin asdf://example.com/repo
$ git push --all origin
Git would first check if it had the asdf
protocol built in, and when it saw
it didn't, it would check if git-remote-asdf
was on the PATH, and if it was,
it'd run git-remote-asdf origin asdf://example.com/repo
to handle the
communications.
Similarly, you can also run
$ git clone asdf::http://example.com/repo
Which will cause git to invoke git-remote-asdf origin http://example.com/repo
.
Unfortunately, I found the documentation to be severely lacking on the details
I needed to actually implement a helper. But then, in the Git source code, I
found a shell script called
git-remote-testgit.sh
that implements a testgit
which is used to test the git remote helper system.
It basically implements pushing and fetching from local repositories on the
same filesystem. So
git clone testgit::/existing-repository
is equivalent to
git clone /existing-repository
Similarly, you can push and fetch from local repositories over the testgit
protocol.
In this article, we'll walk through the code of git-remote-testgit and
reimplement it in Go by creating a brand new helper, git-remote-go
. Along
the way, I'll explain what the code means, and the various things I had to
learn in order to implement my own remote helper,
git-remote-grave.
Some basics
To make the following sections clearer, let's establish some terminology and
basic mechanisms.
When we run
$ git remote add myremote go::http://example.com/repo
$ git push myremote master
Git will instantiate a new process by running the command
git-remote-go myremote http://example.com/repo
Notice that the first argument is the remote name, and the second argument is
the url.
When you run
$ git clone go::http://example.com/repo
the helper will be instantiated with
git-remote-go origin http://example.com/repo
This is because the remote origin
is automatically created in cloned
repositories.
When git instanties the helper as a new process, it opens up pipes for stdin,
stdout, and stderr for communicating with it. Commands are sent to the helper
over stdin, and the helper responds over stdout. Any output the helper
produces on stderr is redirected to wherever git's stderr is going—which
is probably the terminal.
The following diagram demonstrates this relationship.
The last point I want to make before we begin is to distinguish the local and
remote repository. Generally, but not always, the local repository is the
one we are running git from, and the remote is the one we are making a
connection to.
So in a push, we are sending changes from the local to the remote. In a fetch,
we are taking changes from the remote to the local. In a clone, we are cloning
from the remote into the local.
When git runs the helper, it sets the environment variable GIT_DIR
to the Git
directory of the local repository (e.g. local/.git
).
Starting the project
In this article, I'm assuming that Go is
installed, with $GOPATH
pointing to a
directory named go
.
Let's start by creating the directory go/src/git-remote-go
. This will make
it possible to install our helper just by running go install
(assuming
go/bin
is on the PATH).
With this in mind, we can write the first few lines of
go/src/git-remote-go/main.go
.
package main
import (
"log"
"os"
)
func Main() (er error) {
if len(os.Args) < 3 {
return fmt.Errorf("Usage: git-remote-go remote-name url")
}
remoteName := os.Args[1]
url := os.Args[2]
}
func main() {
if err := Main(); err != nil {
log.Fatal(err)
}
}
I've separated Main()
as a separate function because error handling is easier
when we can return errors. It also allows us to use defer
, since log.Fatal
calls os.Exit
, which doesn't run defer
'd functions.
Now let's look at the top of git-remote-testgit to see what to do next.
#!/bin/sh
# Copyright (c) 2012 Felipe Contreras
alias=$1
url=$2
dir="$GIT_DIR/testgit/$alias"
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
GIT_DIR="$url/.git"
export GIT_DIR
force=
mkdir -p "$dir"
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The variable they call alias
is what we are calling remoteName
. url
means the same thing.
The next declaration is
dir="$GIT_DIR/testgit/$alias"
This creates a namespace in the Git directory that is specific to the testgit
protocol and to the remote we are using. This way the testgit files for the
origin
remote are different from the backup
remote.
Down below, we see the statement
mkdir -p "$dir"
This will make sure the local directory is created, if it doesn't exist
already.
Let's add the creation of the local directory to our Go program.
// Add "path" to the import list
localdir := path.Join(os.Getenv("GIT_DIR"), "go", remoteName)
if err := os.MkdirAll(localdir, 0755); err != nil {
return err
}
Continuing through the script, we come across the following lines
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
Let's talk about refs really quick.
In git, refs are stored in .git/refs
:
.git
└── refs
├── heads
│ └── master
├── remotes
│ ├── gravy
│ └── origin
│ └── master
└── tags
In the above tree, remotes/origin/master
contains the SHA-hash of the most
recent commit in the master
branch of the origin
remote. heads/master
refers to the most recent commit of your local master
branch. A ref is like
a pointer to a commit.
A refspec allows us to map remote refs to local refs. In the above code,
prefix
is the directory where the remote refs will be held. If the remote
name is origin, then the remote master branch would be determined by the ref
.git/refs/testgit/origin/master
. It basically creates a protocol-specific
namespace for remote branches.
The next line is the actual refspec. The line
default_refspec="refs/heads/*:${prefix}/heads/*"
expands to
default_refspec="refs/heads/*:refs/testgit/$alias/*"
Which means that map the remote branches that look like refs/heads/*
(where
*
means any text) to refs/testgit/$alias/*
(where *
is replaced with
whatever *
was in the first one). So refs/heads/master
becomes
refs/testgit/origin/master
, for instance.
Essentially, the refspec allows testgit to add a branch to the tree for itself,
like this
.git
└── refs
├── heads
│ └── master
├── remotes
│ └── origin
│ └── master
├── testgit
│ └── origin
│ └── master
└── tags
The next line
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
Sets $refspec
to $GIT_REMOTE_TESTGIT_REFSPEC
, unless it doesn't exist, then
it becomes $default_refspec
. This is so testgit can be tested with other
refspecs. We'll assume it gets set to $default_refspec
.
Finally, the next line,
test -z "$refspec" && prefix="refs"
Seems to set $prefix
to refs
if $GIT_REMOTE_TESTGIT_REFSPEC
exists but is
empty, which we'll assume is the case.
We need our own refspec, so we'll add the line
refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", remoteName)
Following that code, we see
GIT_DIR="$url/.git"
export GIT_DIR
Another fact about $GIT_DIR
is that if it is set in the environment, the
git
binary will use the directory in $GIT_DIR
as its .git
directory,
instead of the local .git
. This command makes it so that all future git
commands run by the helper will run in the context of the remote repository.
We'll translate this to
if err := os.Setenv("GIT_DIR", path.Join(url, ".git")); err != nil {
return err
}
Remember, of course, that $dir
and our variable localdir
still refer to a
subdirectory of the repository we are fetching to or pushing from.
And the last bit of code before the main loop is
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The contents of the if statement will be executed if
$GIT_REMOTE_TESTGIT_NO_MARKS
isn't set, which we'll assume is the case.
These marks files are used by git fast-export
and git fast-import
to record
information about refs and blobs being transferred. It's important that these
marks are kept the same between multiple invocations of the helper, so they're
being stored in the localdir.
Here, $gitmarks
refers to the marks for our local repository that git writes,
while $testgitmarks
stores the marks for the remote one that the handler
writes.
The two following lines appear equivalent to touch
invocations, where if the
marks files don't exist, they are created empty.
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
We'll need these files in our own program, so let's start by writing a Touch
function.
// Create path as an empty file if it doesn't exist, otherwise do nothing.
// This works by opening a file in exclusive mode; if it already exists,
// an error will be returned rather than truncating it.
func Touch(path string) error {
file, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0666)
if os.IsExist(err) {
return nil
} else if err != nil {
return err
}
return file.Close()
}
Now we can create the marks files.
gitmarks := path.Join(localdir, "git.marks")
gomarks := path.Join(localdir, "go.marks")
if err := Touch(gitmarks); err != nil {
return err
}
if err := Touch(gomarks); err != nil {
return err
}
However, one thing I've come across is that if the helper fails for some
reason, the marks files can be left in an invalid state. To guard against
this, we can save the original contents of the files, and then rewrite them if
the Main()
function returns an error.
// add "io/ioutil" to imports
originalGitmarks, err := ioutil.ReadFile(gitmarks)
if err != nil {
return err
}
originalGomarks, err := ioutil.ReadFile(gomarks)
if err != nil {
return err
}
defer func() {
if er != nil {
ioutil.WriteFile(gitmarks, originalGitmarks, 0666)
ioutil.WriteFile(gomarks, originalGomarks, 0666)
}
}()
We can finally begin on the central command loop.
Commands are passed to helper via stdin, where each command is a string
terminated by a newline. The helper responds to the commands via stdout;
stderr is piped to the end user.
Let's make our own loop.
// Add "bufio" to import list.
stdinReader := bufio.NewReader(os.Stdin)
for {
// Note that command will include the trailing newline.
command, err := stdinReader.ReadString('\n')
if err != nil {
return err
}
switch {
case command == "capabilities\n":
// ...
case command == "\n":
return nil
default:
return fmt.Errorf("Received unknown command %q", command)
}
}
The capabilities
command
The first command to implement is capabilities
. The helper is expected to
print what commands and other capabilities it supports on separate lines,
terminated by an empty line.
echo 'import'
echo 'export'
test -n "$refspec" && echo "refspec $refspec"
if test -n "$gitmarks"
then
echo "*import-marks $gitmarks"
echo "*export-marks $gitmarks"
fi
test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
echo 'option'
echo
This list of capabilities states that this helper supports the import
,
export
and option
commands. The option
command allows git to change the
verbosity and such of our helper.
signed-tags
means that when git creates a fast-export stream for the export
command, it will pass --signed-tags=verbatim
to git-fast-export
.
no-private-update
instructs git to not update a private ref when it's been
successfully pushed. I've never seemed to need this feature.
refspec $refspec
tells git what refspec we want to use.
The *import-marks $gitmarks
and *export-marks $gitmarks
means git should
save the marks it generates to the gitmarks
files. The *
means that if git
does not understand these lines, it must fail instead of ignoring them. This
is because the helper depends on the marks files being saved, and won't work
with versions of git that don't support this.
Let's ignore signed-tags
, no-private-update
and option
, as they are
provided in git-remote-testgit for completeness of testing, and we don't need
them for this example. We can implement the above simply as
case command == "capabilities\n":
fmt.Printf("import\n")
fmt.Printf("export\n")
fmt.Printf("refspec %s\n", refspec)
fmt.Printf("*import-marks %s\n", gitmarks)
fmt.Printf("*export-marks %s\n", gitmarks)
fmt.Printf("\n")
The list
command
The next command is list
. This isn't provided in the capabilities list
because it must always be supported by the helper.
When the helper receives a list
command, it should print out the refs of the
remote repository as a series of lines of the format $objectname $refname
,
followed by an empty line. $refname
is the name of the ref, while
$objectname
is what the ref points to. $objectname
can be a commit hash,
refer to another ref by name with @$refname
, or be ?
, which means the ref's
value was unable to be acquired.
git-remote-testgit's implementation is the following.
git for-each-ref --format='? %(refname)' 'refs/heads/'
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo
Remembering that $GIT_DIR
causes git for-each-ref
to run in the remote
repository, this will print a line ? $refname
for every branch in the remote
repository, as well as @$head HEAD
, where $head
is the name of the ref that
the HEAD of the repository refers to.
In an ordinary repository with two branches, master and development, the output
of this might look like
? refs/heads/master
? refs/heads/development
@refs/heads/master HEAD
<blank>
Now let's write it ourselves. Let's write a function GitListRefs()
, because
we'll need it again later.
// Add "os/exec" and "bytes" to the import list.
// Returns a map of refnames to objectnames.
func GitListRefs() (map[string]string, error) {
out, err := exec.Command(
"git", "for-each-ref", "--format=%(objectname) %(refname)",
"refs/heads/",
).Output()
if err != nil {
return nil, err
}
lines := bytes.Split(out, []byte{'\n'})
refs := make(map[string]string, len(lines))
for _, line := range lines {
fields := bytes.Split(line, []byte{' '})
if len(fields) < 2 {
break
}
refs[string(fields[1])] = string(fields[0])
}
return refs, nil
}
Now we'll write GitSymbolicRef()
.
func GitSymbolicRef(name string) (string, error) {
out, err := exec.Command("git", "symbolic-ref", name).Output()
if err != nil {
return "", fmt.Errorf(
"GitSymbolicRef: git symbolic-ref %s: %v", name, out, err)
}
return string(bytes.TrimSpace(out)), nil
}
We can implement the list
command like so.
case command == "list\n":
refs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command list: %v", err)
}
head, err := GitSymbolicRef("HEAD")
if err != nil {
return fmt.Errorf("command list: %v", err)
}
for refname := range refs {
fmt.Printf("? %s\n", refname)
}
fmt.Printf("@%s HEAD\n", head)
fmt.Printf("\n")
The import
command
Next up is the import
command, which git uses when trying to fetch or clone.
This command actually comes in a batch; it is sent as a series of lines import $refname
followed by a blank line. When git sends this command to the helper,
it executes the git fast-import
binary, and pipes the helper's stdout into
its stdin. In other words, the helper is expected to return a git fast-export
stream on stdout.
Let's look at git-remote-testgit's implementation.
# read all import lines
while true
do
ref="${line#* }"
refs="$refs $ref"
read line
test "${line%% *}" != "import" && break
done
if test -n "$gitmarks"
then
echo "feature import-marks=$gitmarks"
echo "feature export-marks=$gitmarks"
fi
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
echo "feature done"
exit 1
fi
echo "feature done"
git fast-export \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
$refs |
sed -e "s#refs/heads/#${prefix}/heads/#g"
echo "done"
The loop at the top, true to the comment, accumulates all the import $refname
commands into a single variable $refs
, which is a list of the refs separated
by spaces.
Following that, if the script is using a gitmarks file (which we're assuming it
is), it prints out feature import-marks=$gitmarks
and feature export-marks=$gitmarks
. This tells git to pass --import-marks=$gitmarks
and
--export-marks=$gitmarks
to git fast-import.
The next branch fails the helper if $GIT_REMOTE_TESTGIT_FAILURE
is set for
testing purposes.
After that, feature done
is printed, signalling that the export stream
follows.
Finally, git fast-export is called in the remote repository, setting the marks
files to the remote marks, $testgitmarks
, and then passing the list of refs
we want to export.
The output of git-fast-export is piped through a sed script that maps
refs/heads/
to refs/testgit/$alias/heads/
. The refspec that we passed to
git will take care of this mapping when we export.
After the export stream, done
is printed.
Let's try this in go.
case strings.HasPrefix(command, "import "):
refs := make([]string, 0)
for {
// Have to make sure to trim the trailing newline.
ref := strings.TrimSpace(strings.TrimPrefix(command, "import "))
refs = append(refs, ref)
command, err = stdinReader.ReadString('\n')
if err != nil {
return err
}
if !strings.HasPrefix(command, "import ") {
break
}
}
fmt.Printf("feature import-marks=%s\n", gitmarks)
fmt.Printf("feature export-marks=%s\n", gitmarks)
fmt.Printf("feature done\n")
args := []string{
"fast-export",
"--import-marks", gomarks,
"--export-marks", gomarks,
"--refspec", refspec}
args = append(args, refs...)
cmd := exec.Command("git", args...)
cmd.Stderr = os.Stderr
cmd.Stdout = os.Stdout
if err := cmd.Run(); err != nil {
return fmt.Errorf("command import: git fast-export: %v", err)
}
fmt.Printf("done\n")
The export
command
Next up is the export
command. When we finish this one, our helper is done.
Git issues this command when we are pushing to the remote repository. After
sending the command over stdin, git follows it with a stream produced by git fast-export
, which we can git fast-import
into the remote repository.
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
# consume input so fast-export doesn't get SIGPIPE;
# git would also notice that case, but we want
# to make sure we are exercising the later
# error checks
while read line; do
test "done" = "$line" && break
done
exit 1
fi
before=$(git for-each-ref --format=' %(refname) %(objectname) ')
git fast-import \
${force:+--force} \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
--quiet
# figure out which refs were updated
git for-each-ref --format='%(refname) %(objectname)' |
while read ref a
do
case "$before" in
*" $ref $a "*)
continue ;; # unchanged
esac
if test -z "$GIT_REMOTE_TESTGIT_PUSH_ERROR"
then
echo "ok $ref"
else
echo "error $ref $GIT_REMOTE_TESTGIT_PUSH_ERROR"
fi
done
echo
The first if statement is, again, just for testing purposes.
The next line is more interesting. It creates a space separated list of $refname $objectname
pairs of the refs which we will use to determine which
refs were updated in the import.
The next command is rather self explanatory. git fast-import
is run on the
stream we receive on stdin, passing --force
if specified, --quiet
, and the
remote marks files.
Next it runs git for-each-ref
again to see what refs have changed. For every
ref this command returns, it checks to see if the $refname $objectname
pair
is in the $before
list. If it is, nothing changed and it continues onto the
next. If the ref isn't in the list, however, it prints ok $refname
to
signify to git that the ref updated successfully. Printing error $refname $message
tells git that a ref failed to be imported on the remote end.
Finally, it prints a blank line to show that the import is done.
Now we can write it ourselves. We can use the GitListRefs()
function we
defined earlier.
case command == "export\n":
beforeRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command export: collecting before refs: %v", err)
}
cmd := exec.Command("git", "fast-import", "--quiet",
"--import-marks="+gomarks,
"--export-marks="+gomarks)
cmd.Stderr = os.Stderr
cmd.Stdin = os.Stdin
if err := cmd.Run(); err != nil {
return fmt.Errorf("command export: git fast-import: %v", err)
}
afterRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command export: collecting after refs: %v", err)
}
for refname, objectname := range afterRefs {
if beforeRefs[refname] != objectname {
fmt.Printf("ok %s\n", refname)
}
}
fmt.Printf("\n")
Trying it out
Run go install
, which should build and install git-remote-go
to go/bin
.
You can try the following; first we create two empty git repositories, then
make a commit in testlocal, and push it to testremote using our new helper.
$ cd $HOME
$ git init testremote
Initialized empty Git repository in $HOME/testremote/.git/
$ git init testlocal
Initialized empty Git repository in $HOME/testlocal/.git/
$ cd testlocal
$ echo 'Hello, world!' >hello.txt
$ git add hello.txt
$ git commit -m "First commit."
[master (root-commit) 50d3a83] First commit.
1 file changed, 1 insertion(+)
create mode 100644 hello.txt
$ git remote add origin go::$HOME/testremote
$ git push --all origin
To go::$HOME/testremote
* [new branch] master -> master
$ cd ../testremote
$ git checkout master
$ ls
hello.txt
$ cat hello.txt
Hello, world!
Uses for git remote helpers
Git remote helpers have been used to implement interfaces to other source
control (like
felipec/git-remote-hg), or push
code into CouchDBs
(peritus/git-remote-couch),
among others. You could probably think of more.
I wrote a git remote helper for my original motivation,
git-remote-grave. You can use
it to push and fetch from encrypted archives on your file system or over
HTTP/HTTPS.
$ git remote add usb grave::/media/usb/backup.grave
$ git push --all backup
Using a couple compression
tricks, the
archives are typically 22% the size of the original repository.
Discussion of this article is taking place on Hacker
News and
/r/programming.