I used to have trouble keeping track of my files. I often couldn’t
remember whether I saved a file on my desktop, laptop, or phone, or
if it was floating around in the cloud somewhere. Plus, with certain
information, like passwords and bitcoin keys, I didn’t feel
comfortable just sending that in an email to myself in plain text.
What I wanted was to store my data in a git repository that was
backed up to a single location. I could view old versions of files,
and wouldn’t have to worry about my data being deleted. Plus, I was
familiar with using git to push and fetch files to various computers.
But, like I said, I didn’t want to just upload my secret keys and
passwords to GitHub or BitBucket, even in a private repository.
I had the cool idea of writing a tool to encrypt my repository before
I pushed it into backup. Unfortunately, I wouldn’t be able to use
“git push” like I normally would, and instead would have to use
something like this:
$ encrypted-git push http://example.com/
At least, that’s what I thought until I discovered
git-remote-helpers.
Git remote helpers
Online, I found the documentation for git remote
helpers.
It turns out that if you were to run the commands
$ git remote add origin asdf://example.com/repo
$ git push --all origin
Git would first check if it had the asdf protocol built in, and when
it saw it didn’t, it would check if git-remote-asdf was on the PATH,
and if it was, it’d run “git-remote-asdf origin
asdf://example.com/repo” to handle the communications.
Similarly, you can also run
$ git clone asdf::http://example.com/repo
Which will cause git to invoke “git-remote-asdf origin
http://example.com/repo”.
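To make the two URL forms concrete, here is a minimal sketch in Go of how a remote URL maps to a helper name and argument. The function name helperCommand is my own invention, and the sketch ignores the check for built-in protocols that Git performs first; it only illustrates the naming rule described above.

```go
package main

import (
	"fmt"
	"strings"
)

// helperCommand returns the helper binary name and the URL argument that
// would be passed to it. This is a simplification: real Git checks its
// built-in transports first, which this sketch does not model.
func helperCommand(remoteURL string) (helper, arg string, ok bool) {
	// Form 1: "<name>::<address>" passes only the address to the helper.
	if i := strings.Index(remoteURL, "::"); i > 0 {
		return "git-remote-" + remoteURL[:i], remoteURL[i+2:], true
	}
	// Form 2: "<name>://<rest>" passes the whole URL to the helper.
	if i := strings.Index(remoteURL, "://"); i > 0 {
		return "git-remote-" + remoteURL[:i], remoteURL, true
	}
	return "", "", false
}

func main() {
	urls := []string{"asdf://example.com/repo", "asdf::http://example.com/repo"}
	for _, u := range urls {
		helper, arg, _ := helperCommand(u)
		fmt.Printf("%s -> %s origin %s\n", u, helper, arg)
	}
}
```

Note how the double-colon form strips the helper name, which is handy when the helper wraps an existing protocol such as HTTP.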
Unfortunately, I found the documentation to be severely lacking on the details
I needed to actually implement a helper. But then, in the Git source code, I
found a shell script called git-remote-testgit.sh
that implements a “testgit” protocol, which is used to test the git remote helper
system. It basically implements pushing and fetching from local repositories
on the same filesystem. So
git clone testgit::/existing-repository
is equivalent to
git clone /existing-repository
Similarly, you can push and fetch from local repositories over the
testgit protocol.
In this article, we’ll walk through the code of git-remote-testgit
and reimplement it in Go by creating a brand new helper,
git-remote-go. Along the way, I’ll explain what the code means, and
the various things I had to learn in order to implement my own remote
helper,
git-remote-grave.
Some basics
To make the following sections clearer, let’s establish some
terminology and basic mechanisms.
When we run
$ git remote add myremote go::http://example.com/repo
$ git push myremote master
Git will instantiate a new process by running the command
git-remote-go myremote http://example.com/repo
Notice that the first argument is the remote name, and the second
argument is the URL.
When you run
$ git clone go::http://example.com/repo
the helper will be instantiated with
git-remote-go origin http://example.com/repo
This is because the remote “origin” is automatically created in
cloned repositories.
When git instantiates the helper as a new process, it opens up pipes
for stdin, stdout, and stderr for communicating with it. Commands are
sent to the helper over stdin, and the helper responds over stdout.
Any output the helper produces on stderr is redirected to wherever
git’s stderr is going—which is probably the terminal.
The last point I want to make before we begin is to distinguish the
local and remote repository. Generally, but not always, the local
repository is the one we are running git from, and the remote is the
one we are making a connection to.
So in a push, we are sending changes from the local to the remote. In
a fetch, we are taking changes from the remote to the local. In a
clone, we are cloning from the remote into the local.
When git runs the helper, it sets the environment variable GIT_DIR
to the Git directory of the local repository (e.g. local/.git).
Starting the project
In this article, I’m assuming that Go is
installed, with $GOPATH pointing to
a directory named “go.”
Let’s start by creating the directory go/src/git-remote-go. This
will make it possible to install our helper just by running “go
install” (assuming go/bin is on the PATH).
With this in mind, we can write the first few lines of
go/src/git-remote-go/main.go.
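package main

import (
	"fmt"
	"log"
	"os"
)

// Main does the real work and returns an error instead of exiting. The
// named result retErr lets deferred cleanup code inspect the error.
func Main() (retErr error) {
	if len(os.Args) < 3 {
		return fmt.Errorf("Usage: git-remote-go remote-name url")
	}
	remoteName := os.Args[1]
	url := os.Args[2]
	// Placeholder so this compiles until later code uses both variables.
	_, _ = remoteName, url
	return nil
}

func main() {
	if err := Main(); err != nil {
		log.Fatal(err)
	}
}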
I’ve split Main() out into its own function because error handling
is easier when we can return errors. It also allows us to use defer:
log.Fatal calls os.Exit, which doesn’t run deferred functions, so we
only call it from main().
Now let’s look at the top of git-remote-testgit to see what to do
next.
#!/bin/sh
# Copyright (c) 2012 Felipe Contreras
alias=$1
url=$2
dir="$GIT_DIR/testgit/$alias"
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
GIT_DIR="$url/.git"
export GIT_DIR
force=
mkdir -p "$dir"
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The variable the script calls “alias” is what we are calling remoteName.
url means the same thing.
The next declaration is
dir="$GIT_DIR/testgit/$alias"
This creates a namespace in the Git directory that is specific to the
testgit protocol and to the remote we are using. This way the testgit
files for the origin remote are different from the backup remote.
Down below, we see the statement
mkdir -p "$dir"
This makes sure the local directory is created if it doesn’t already
exist.
Let’s add the creation of the local directory to our Go program.
localdir := path.Join(os.Getenv("GIT_DIR"), "go", remoteName)
if err := os.MkdirAll(localdir, 0755); err != nil {
return err
}
Continuing through the script, we come across the following lines
prefix="refs/testgit/$alias"
default_refspec="refs/heads/*:${prefix}/heads/*"
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
test -z "$refspec" && prefix="refs"
Let’s talk about refs really quick.
In git, refs are stored in .git/refs:
.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   ├── gravy
    │   └── origin
    │       └── master
    └── tags
In the above tree, remotes/origin/master contains the SHA-hash of
the most recent commit in the master branch of the origin remote.
heads/master refers to the most recent commit of your local master
branch. A ref is like a pointer to a commit.
A refspec allows us to map remote refs to local refs. In the above
code, prefix is the directory where the remote refs will be held. If
the remote name is origin, then the remote master branch would be
tracked by the ref .git/refs/testgit/origin/heads/master. It basically
creates a protocol-specific namespace for remote branches.
The next line is the actual refspec. The line
default_refspec="refs/heads/*:${prefix}/heads/*"
expands to
default_refspec="refs/heads/*:refs/testgit/$alias/heads/*"
This means: map remote branches that look like refs/heads/*
(where * matches any text) to refs/testgit/$alias/heads/* (where * is
replaced with whatever the first * matched). So
refs/heads/master becomes refs/testgit/origin/heads/master, for
instance.
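The wildcard substitution can be sketched in a few lines of Go. This is only an illustration of the mapping rule, not how git implements it: mapRef is a made-up function, the refspec in main is a made-up example, and real refspecs have more features (such as a leading + for forced updates) that are ignored here.

```go
package main

import (
	"fmt"
	"strings"
)

// mapRef applies a single-wildcard refspec of the form "src/*:dst/*" to ref.
// It reports false if the refspec is not in that form or ref does not match.
func mapRef(refspec, ref string) (string, bool) {
	parts := strings.SplitN(refspec, ":", 2)
	if len(parts) != 2 ||
		!strings.HasSuffix(parts[0], "*") ||
		!strings.HasSuffix(parts[1], "*") {
		return "", false
	}
	srcPrefix := strings.TrimSuffix(parts[0], "*")
	dstPrefix := strings.TrimSuffix(parts[1], "*")
	if !strings.HasPrefix(ref, srcPrefix) {
		return "", false
	}
	// Whatever the source * matched is appended to the destination prefix.
	return dstPrefix + strings.TrimPrefix(ref, srcPrefix), true
}

func main() {
	// Illustrative refspec; "example" stands in for a protocol name.
	refspec := "refs/heads/*:refs/example/origin/*"
	mapped, ok := mapRef(refspec, "refs/heads/master")
	fmt.Println(mapped, ok)
}
```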
Essentially, the refspec allows testgit to add a branch to the tree
for itself, like this
.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   └── origin
    │       └── master
    ├── testgit
    │   └── origin
    │       └── heads
    │           └── master
    └── tags
The next line
refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"
sets $refspec to $GIT_REMOTE_TESTGIT_REFSPEC, unless that variable is
unset, in which case it falls back to $default_refspec. This is so
testgit can be tested with other refspecs. We’ll assume it gets set to
$default_refspec.
Finally, the next line,
test -z "$refspec" && prefix="refs"
sets $prefix to refs if $GIT_REMOTE_TESTGIT_REFSPEC is set but
empty, which we’ll assume is not the case.
We need our own refspec, so we’ll add the line
refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", remoteName)
Following that code, we see
GIT_DIR="$url/.git"
export GIT_DIR
Another fact about $GIT_DIR is that if it is set in the environment,
the git binary will use the directory in $GIT_DIR as its .git
directory, instead of the local .git. This command makes it so that
all future git commands run by the helper will run in the context of
the remote repository.
We’ll translate this to
if err := os.Setenv("GIT_DIR", path.Join(url, ".git")); err != nil {
return err
}
Remember, of course, that $dir and our variable localdir still refer
to a subdirectory of the repository we are fetching to or pushing
from.
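As an aside, changing GIT_DIR for the whole process mirrors the shell script, but it isn’t the only option. A possible alternative, sketched below under my own assumptions, is to set GIT_DIR only for each git command the helper runs. The function gitCommand is hypothetical and isn’t used in the rest of this article.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// gitCommand builds a git command whose GIT_DIR points at repo/.git,
// without changing the helper's own environment.
func gitCommand(repo string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	// If a key appears more than once in Env, os/exec uses the last value,
	// so appending overrides any GIT_DIR inherited from git.
	cmd.Env = append(os.Environ(), "GIT_DIR="+filepath.Join(repo, ".git"))
	return cmd
}

func main() {
	// The command is only constructed here, not run.
	cmd := gitCommand("/path/to/remote", "for-each-ref", "refs/heads/")
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-1])
}
```

The trade-off is that every call site has to remember to use the wrapper, whereas os.Setenv affects all later commands at once.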
And the last bit of code before the main loop is
if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
gitmarks="$dir/git.marks"
testgitmarks="$dir/testgit.marks"
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
fi
The contents of the if statement will be executed if
$GIT_REMOTE_TESTGIT_NO_MARKS isn’t set, which we’ll assume is the
case.
These marks files are used by git fast-export and git fast-import
to record information about refs and blobs being transferred. It’s
important that these marks are kept the same between multiple
invocations of the helper, so they’re being stored in the localdir.
Here, $gitmarks refers to the marks for our local repository that
git writes, while $testgitmarks stores the marks for the remote one
that the helper writes.
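If you’re curious what is inside a marks file, it is plain text with one “:markid objectname” pair per line. Here is a rough Go sketch of a parser for that format; parseMarks is a name I made up, the object names in main are placeholders, and the helper itself never needs to parse these files since git fast-export and git fast-import read and write them directly.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseMarks parses marks-file contents, one ":<id> <object name>" pair per
// line, into a map from mark (including the leading colon) to object name.
func parseMarks(data string) (map[string]string, error) {
	marks := make(map[string]string)
	scanner := bufio.NewScanner(strings.NewReader(data))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		fields := strings.Fields(line)
		if len(fields) != 2 || !strings.HasPrefix(fields[0], ":") {
			return nil, fmt.Errorf("malformed marks line %q", line)
		}
		marks[fields[0]] = fields[1]
	}
	return marks, scanner.Err()
}

func main() {
	// Placeholder object names, not real commits.
	data := ":1 1111111111111111111111111111111111111111\n" +
		":2 2222222222222222222222222222222222222222\n"
	marks, err := parseMarks(data)
	fmt.Println(len(marks), marks[":2"], err)
}
```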
The two following lines appear equivalent to “touch” invocations,
where if the marks files don’t exist, they are created empty.
test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"
We’ll need these files in our own program, so let’s start by writing
a Touch function.
// Create path as an empty file if it doesn't exist, otherwise do
// nothing. This works by opening a file in exclusive mode; if it
// already exists, an error will be returned rather than truncating
// it.
func Touch(path string) error {
file, err := os.OpenFile(
path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0666,
)
if os.IsExist(err) {
return nil
} else if err != nil {
return err
}
return file.Close()
}
Now we can create the marks files.
gitmarks := path.Join(localdir, "git.marks")
gomarks := path.Join(localdir, "go.marks")
if err := Touch(gitmarks); err != nil {
return err
}
if err := Touch(gomarks); err != nil {
return err
}
However, one thing I’ve come across is that if the helper fails for
some reason, the marks files can be left in an invalid state. To
guard against this, we can save the original contents of the files,
and then rewrite them if the Main() function returns an error.
// Remember the current contents so they can be restored on failure.
originalGitmarks, err := ioutil.ReadFile(gitmarks)
if err != nil {
	return err
}
originalGomarks, err := ioutil.ReadFile(gomarks)
if err != nil {
	return err
}
// retErr is Main's named result, i.e. func Main() (retErr error).
defer func() {
	if retErr != nil {
		if err := ioutil.WriteFile(gitmarks, originalGitmarks, 0666); err != nil {
			log.Printf("Reverting %q: %s", gitmarks, err)
		}
		if err := ioutil.WriteFile(gomarks, originalGomarks, 0666); err != nil {
			log.Printf("Reverting %q: %s", gomarks, err)
		}
	}
}()
We can finally begin on the central command loop.
Commands are passed to the helper via stdin, where each command is a
string terminated by a newline. The helper responds to the commands
via stdout; stderr is passed through to the end user.
Let’s make our own loop.
stdinReader := bufio.NewReader(os.Stdin)
for {
// Note that command will include the trailing newline.
command, err := stdinReader.ReadString('\n')
if err != nil {
return err
}
switch {
case command == "capabilities\n":
// ...
case command == "\n":
return nil
default:
return fmt.Errorf("Received unknown command %q", command)
}
}
The capabilities command
The first command to implement is “capabilities.” The helper is
expected to print what commands and other capabilities it supports on
separate lines, terminated by an empty line.
echo 'import'
echo 'export'
test -n "$refspec" && echo "refspec $refspec"
if test -n "$gitmarks"
then
echo "*import-marks $gitmarks"
echo "*export-marks $gitmarks"
fi
test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && \
echo "signed-tags"
test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && \
echo "no-private-update"
echo 'option'
echo
This list of capabilities states that this helper supports the
import, export and option commands. The option command allows
git to change the verbosity and such of our helper.
signed-tags means that when git creates a fast-export stream for the
export command, it will pass --signed-tags=verbatim to
git-fast-export.
no-private-update instructs git to not update a private ref when it’s
been successfully pushed. I’ve never seemed to need this feature.
“refspec $refspec” tells git what refspec we want to use.
The “*import-marks $gitmarks” and “*export-marks $gitmarks” lines tell
git to load previously saved marks from the gitmarks file and to save
the marks it generates back to it. The * means that if git does not
understand these lines, it must fail instead of ignoring them. This is
because the helper depends on the marks files being saved, and won’t
work with versions of git that don’t support this.
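To make the line format concrete, here is a small Go sketch that splits a capability line into its parts from the reader’s point of view. The capability type and parseCapability function are hypothetical names for illustration; this is not how git itself parses the response, and the marks path in main is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// capability is one parsed line of a helper's "capabilities" response.
type capability struct {
	Name      string
	Arg       string
	Mandatory bool // line started with "*"
}

// parseCapability splits a line such as "*import-marks /path/to/marks"
// into the mandatory flag, the capability name, and its optional argument.
func parseCapability(line string) capability {
	var c capability
	if strings.HasPrefix(line, "*") {
		c.Mandatory = true
		line = line[1:]
	}
	parts := strings.SplitN(line, " ", 2)
	c.Name = parts[0]
	if len(parts) == 2 {
		c.Arg = parts[1]
	}
	return c
}

func main() {
	lines := []string{"import", "refspec refs/heads/*:refs/go/origin/*",
		"*export-marks /tmp/example/git.marks"}
	for _, line := range lines {
		fmt.Printf("%+v\n", parseCapability(line))
	}
}
```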
Let’s ignore signed-tags, no-private-update and option, as they are
provided in git-remote-testgit for completeness of testing, and we
don’t need them for this example. We can implement the above simply
as
case command == "capabilities\n":
fmt.Printf("import\n")
fmt.Printf("export\n")
fmt.Printf("refspec %s\n", refspec)
fmt.Printf("*import-marks %s\n", gitmarks)
fmt.Printf("*export-marks %s\n", gitmarks)
fmt.Printf("\n")
The list command
The next command is “list.” This isn’t provided in the capabilities
list because it must always be supported by the helper.
When the helper receives a list command, it should print out the
refs of the remote repository as a series of lines of the format
“$objectname $refname”, followed by an empty line. $refname is the
name of the ref, while $objectname is what the ref points to.
$objectname can be a commit hash, refer to another ref by name with
@$refname, or be “?”, which means the ref’s value was unable to be
acquired.
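The three kinds of value can be told apart by their first character. Here is a minimal Go sketch of a parser for one line of list output, assuming only the format just described; listEntry and parseListLine are names I invented, and any optional attributes after the ref name are ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// listEntry is one parsed line of a helper's "list" response.
type listEntry struct {
	Ref     string
	Object  string // object name, when the value is a hash
	Symref  string // target ref, when the value is "@<refname>"
	Unknown bool   // true when the value is "?"
}

// parseListLine parses a "<value> <refname>" line.
func parseListLine(line string) (listEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return listEntry{}, fmt.Errorf("malformed list line %q", line)
	}
	value := fields[0]
	entry := listEntry{Ref: fields[1]}
	switch {
	case value == "?":
		entry.Unknown = true
	case strings.HasPrefix(value, "@"):
		entry.Symref = value[1:]
	default:
		entry.Object = value
	}
	return entry, nil
}

func main() {
	for _, line := range []string{"? refs/heads/master", "@refs/heads/master HEAD"} {
		entry, err := parseListLine(line)
		fmt.Printf("%+v %v\n", entry, err)
	}
}
```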
git-remote-testgit’s implementation is the following.
git for-each-ref --format='? %(refname)' 'refs/heads/'
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo
Remembering that $GIT_DIR causes “git for-each-ref” to run in the
remote repository, this will print a line “? $refname” for every
branch in the remote repository, as well as “@$head HEAD”, where
$head is the name of the ref that the HEAD of the repository refers
to.
In an ordinary repository with two branches, master and development,
the output of this might look like
? refs/heads/master
? refs/heads/development
@refs/heads/master HEAD
<blank>
Now let’s write it ourselves. Let’s write a function GitListRefs(),
because we’ll need it again later.
// Returns a map of refnames to objectnames.
func GitListRefs() (map[string]string, error) {
out, err := exec.Command(
"git", "for-each-ref", "--format=%(objectname) %(refname)",
"refs/heads/",
).Output()
if err != nil {
return nil, err
}
lines := bytes.Split(out, []byte{'\n'})
refs := make(map[string]string, len(lines))
for _, line := range lines {
fields := bytes.Split(line, []byte{' '})
if len(fields) < 2 {
break
}
refs[string(fields[1])] = string(fields[0])
}
return refs, nil
}
Now we’ll write GitSymbolicRef().
// GitSymbolicRef returns the ref that the symbolic ref name points to.
func GitSymbolicRef(name string) (string, error) {
	out, err := exec.Command("git", "symbolic-ref", name).Output()
	if err != nil {
		return "", fmt.Errorf(
			"GitSymbolicRef: git symbolic-ref %s: %w", name, err)
	}
	return string(bytes.TrimSpace(out)), nil
}
We can implement the list command like so.
case command == "list\n":
refs, err := GitListRefs()
if err != nil {
return fmt.Errorf("command list: %w", err)
}
head, err := GitSymbolicRef("HEAD")
if err != nil {
return fmt.Errorf("command list: %w", err)
}
for refname := range refs {
fmt.Printf("? %s\n", refname)
}
fmt.Printf("@%s HEAD\n", head)
fmt.Printf("\n")
The import command
Next up is the “import” command, which git uses when trying to fetch
or clone. This command actually comes in a batch; it is sent as a
series of lines “import $refname” followed by a blank line. When git
sends this command to the helper, it executes the “git fast-import”
binary, and pipes the helper’s stdout into its stdin. In other words,
the helper is expected to return a git fast-export stream on stdout.
Let’s look at git-remote-testgit’s implementation.
# read all import lines
while true
do
ref="${line#* }"
refs="$refs $ref"
read line
test "${line%% *}" != "import" && break
done
if test -n "$gitmarks"
then
echo "feature import-marks=$gitmarks"
echo "feature export-marks=$gitmarks"
fi
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
echo "feature done"
exit 1
fi
echo "feature done"
git fast-export \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
$refs |
sed -e "s#refs/heads/#${prefix}/heads/#g"
echo "done"
The loop at the top, true to the comment, accumulates all the “import
$refname” commands into a single variable $refs, which is a list of
the refs separated by spaces.
Following that, if the script is using a gitmarks file (which we’re
assuming it is), it prints out “feature import-marks=$gitmarks” and
“feature export-marks=$gitmarks”. This tells git to pass
--import-marks=$gitmarks and --export-marks=$gitmarks to git
fast-import.
The next branch fails the helper if $GIT_REMOTE_TESTGIT_FAILURE is
set for testing purposes.
After that, “feature done” is printed. This tells git fast-import to
expect an explicit “done” command at the end of the stream, so that a
truncated stream is reported as an error.
Finally, git fast-export is called in the remote repository, setting
the marks files to the remote marks, $testgitmarks, and then passing
the list of refs we want to export.
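The output of git fast-export is piped through a sed script that maps
refs/heads/ to refs/testgit/$alias/heads/, so that the exported refs
land in the private namespace named by the refspec. In our Go version,
we’ll pass the refspec to git fast-export’s --refspec option to get
the same mapping without sed.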
After the export stream, “done” is printed.
Let’s try this in Go.
case strings.HasPrefix(command, "import "):
refs := make([]string, 0)
for {
// Have to make sure to trim the trailing newline.
ref := strings.TrimSpace(strings.TrimPrefix(command, "import "))
refs = append(refs, ref)
command, err = stdinReader.ReadString('\n')
if err != nil {
return err
}
if !strings.HasPrefix(command, "import ") {
break
}
}
fmt.Printf("feature import-marks=%s\n", gitmarks)
fmt.Printf("feature export-marks=%s\n", gitmarks)
fmt.Printf("feature done\n")
args := []string{
"fast-export",
"--import-marks", gomarks,
"--export-marks", gomarks,
"--refspec", refspec}
args = append(args, refs...)
cmd := exec.Command("git", args...)
cmd.Stderr = os.Stderr
cmd.Stdout = os.Stdout
if err := cmd.Run(); err != nil {
return fmt.Errorf("command import: git fast-export: %w", err)
}
fmt.Printf("done\n")
The export command
Next up is the export command. When we finish this one, our helper
is done.
Git issues this command when we are pushing to the remote repository.
After sending the command over stdin, git follows it with a stream
produced by “git fast-export”, which we can “git fast-import” into
the remote repository.
if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
# consume input so fast-export doesn't get SIGPIPE;
# git would also notice that case, but we want
# to make sure we are exercising the later
# error checks
while read line; do
test "done" = "$line" && break
done
exit 1
fi
before=$(git for-each-ref --format=' %(refname) %(objectname) ')
git fast-import \
${force:+--force} \
${testgitmarks:+"--import-marks=$testgitmarks"} \
${testgitmarks:+"--export-marks=$testgitmarks"} \
--quiet
# figure out which refs were updated
git for-each-ref --format='%(refname) %(objectname)' |
while read ref a
do
case "$before" in
*" $ref $a "*)
continue ;; # unchanged
esac
if test -z "$GIT_REMOTE_TESTGIT_PUSH_ERROR"
then
echo "ok $ref"
else
echo "error $ref $GIT_REMOTE_TESTGIT_PUSH_ERROR"
fi
done
echo
The first if statement is, again, just for testing purposes.
The next line is more interesting. It creates a space-separated list
of “$refname $objectname” pairs, which we will use to determine which
refs were updated in the import.
The next command is rather self-explanatory. “git fast-import” is run
on the stream we receive on stdin, passing --force if specified,
--quiet, and the remote marks files.
Next it runs “git for-each-ref” again to see what refs have changed.
For every ref this command returns, it checks to see if the “$refname
$objectname” pair is in the $before list. If it is, nothing changed
and it continues onto the next. If the ref isn’t in the list,
however, it prints “ok $refname” to signify to git that the ref
updated successfully. Printing “error $refname $message” tells git
that a ref failed to be imported on the remote end.
Finally, it prints a blank line to show that the import is done.
Now we can write it ourselves. We can use the GitListRefs() function
we defined earlier.
case command == "export\n":
beforeRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("export: collecting before refs: %w", err)
}
cmd := exec.Command("git", "fast-import", "--quiet",
"--import-marks="+gomarks,
"--export-marks="+gomarks)
cmd.Stderr = os.Stderr
cmd.Stdin = os.Stdin
if err := cmd.Run(); err != nil {
return fmt.Errorf("export: git fast-import: %w", err)
}
afterRefs, err := GitListRefs()
if err != nil {
return fmt.Errorf("export: collecting after refs: %w", err)
}
for refname, objectname := range afterRefs {
if beforeRefs[refname] != objectname {
fmt.Printf("ok %s\n", refname)
}
}
fmt.Printf("\n")
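The before/after comparison can also be pulled out into a small pure function, which makes it possible to test without running git. The sketch below is one way to do that; changedRefs is a hypothetical name, the object names are placeholders, and sorting the result is my own addition so the output order is deterministic (Go map iteration order is not).

```go
package main

import (
	"fmt"
	"sort"
)

// changedRefs returns, in sorted order, the names of refs in after whose
// object name differs from the one in before, including newly created refs.
func changedRefs(before, after map[string]string) []string {
	var changed []string
	for ref, object := range after {
		if before[ref] != object {
			changed = append(changed, ref)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Placeholder object names for illustration.
	before := map[string]string{"refs/heads/master": "aaa"}
	after := map[string]string{
		"refs/heads/master": "bbb",
		"refs/heads/dev":    "ccc",
	}
	for _, ref := range changedRefs(before, after) {
		fmt.Printf("ok %s\n", ref)
	}
	fmt.Println() // blank line ends the response
}
```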
Trying it out
Run “go install”, which should build and install git-remote-go to
go/bin.
You can try the following; first we create two empty git
repositories, then make a commit in testlocal, and push it to
testremote using our new helper.
$ cd $HOME
$ git init testremote
Initialized empty Git repository in $HOME/testremote/.git/
$ git init testlocal
Initialized empty Git repository in $HOME/testlocal/.git/
$ cd testlocal
$ echo 'Hello, world!' >hello.txt
$ git add hello.txt
$ git commit -m "First commit."
[master (root-commit) 50d3a83] First commit.
1 file changed, 1 insertion(+)
create mode 100644 hello.txt
$ git remote add origin go::$HOME/testremote
$ git push --all origin
To go::$HOME/testremote
* [new branch] master -> master
$ cd ../testremote
$ git checkout master
$ ls
hello.txt
$ cat hello.txt
Hello, world!
Uses for git remote helpers
Git remote helpers have been used to implement interfaces to other
source control (like
felipec/git-remote-hg),
or push code into CouchDBs
(peritus/git-remote-couch),
among others. You could probably think of more.
I wrote a git remote helper for my original motivation,
git-remote-grave. You
can use it to push and fetch from encrypted archives on your file
system or over HTTP/HTTPS.
$ git remote add usb grave::/media/usb/backup.grave
$ git push --all usb
Discussion of this article is taking place on Hacker
News and
/r/programming.