How to write a new git protocol

2020-01-30

I used to have trouble keeping track of my files. I often couldn't remember whether I saved a file on my desktop, laptop, or phone, or if it was floating around in the cloud somewhere. Plus, with certain information, like passwords and bitcoin keys, I didn't feel comfortable just sending that in an email to myself in plain text.

What I wanted was to store my data in a git repository that was backed up to a single location. I could view old versions of files, and wouldn't have to worry about my data being deleted. Plus, I was familiar with using git to push and fetch files to various computers.

But, like I said, I didn't want to just upload my secret keys and passwords to GitHub or BitBucket, even in a private repository.

I had the cool idea of writing a tool to encrypt my repository before I pushed it into backup. Unfortunately, I wouldn't be able to use git push like I normally would, and instead would have to use something like this:

$ encrypted-git push http://example.com/

At least, that's what I thought until I discovered git-remote-helpers.

Git remote helpers

Online, I found the documentation for git remote helpers.

It turns out that if you were to run the commands

$ git remote add origin asdf://example.com/repo
$ git push --all origin

Git would first check if it had the asdf protocol built in, and when it saw it didn't, it would check if git-remote-asdf was on the PATH, and if it was, it'd run git-remote-asdf origin asdf://example.com/repo to handle the communications.

Similarly, you can also run

$ git clone asdf::http://example.com/repo

Which will cause git to invoke git-remote-asdf origin http://example.com/repo.
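The naming rule above can be sketched in Go. This is only an illustration of the rule as described here, not git's actual lookup code: helperFor is a hypothetical name, and the short list of built-in schemes is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// builtin lists a few schemes git handles itself. Assumption: abbreviated
// for illustration only.
var builtin = map[string]bool{"ssh": true, "git": true, "file": true}

// helperFor returns the helper binary git would look for on the PATH and the
// URL argument it would pass, or ok=false when no helper is involved.
// Hypothetical function; git's real logic handles more cases.
func helperFor(url string) (helper, arg string, ok bool) {
	// transport::address form: the helper receives only the address.
	if i := strings.Index(url, "::"); i > 0 && !strings.ContainsAny(url[:i], ":/") {
		return "git-remote-" + url[:i], url[i+2:], true
	}
	// transport://address form: the helper receives the whole URL.
	if i := strings.Index(url, "://"); i > 0 && !builtin[url[:i]] {
		return "git-remote-" + url[:i], url, true
	}
	return "", "", false
}

func main() {
	for _, url := range []string{
		"asdf://example.com/repo",
		"asdf::http://example.com/repo",
	} {
		helper, arg, _ := helperFor(url)
		fmt.Printf("%s -> %s origin %s\n", url, helper, arg)
	}
}
```

Both example URLs resolve to git-remote-asdf; only the URL argument differs.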

Unfortunately, I found the documentation to be severely lacking on the details I needed to actually implement a helper. But then, in the Git source code, I found a shell script called git-remote-testgit.sh that implements a testgit protocol, which is used to test the git remote helper system. It basically implements pushing and fetching from local repositories on the same filesystem. So

git clone testgit::/existing-repository

is equivalent to

git clone /existing-repository

Similarly, you can push and fetch from local repositories over the testgit protocol.

In this article, we'll walk through the code of git-remote-testgit and reimplement it in Go by creating a brand new helper, git-remote-go. Along the way, I'll explain what the code means, and the various things I had to learn in order to implement my own remote helper, git-remote-grave.

Some basics

To make the following sections clearer, let's establish some terminology and basic mechanisms.

When we run

$ git remote add myremote go::http://example.com/repo
$ git push myremote master

Git will instantiate a new process by running the command

git-remote-go myremote http://example.com/repo

Notice that the first argument is the remote name, and the second argument is the url.

When you run

$ git clone go::http://example.com/repo

the helper will be instantiated with

git-remote-go origin http://example.com/repo

This is because the remote origin is automatically created in cloned repositories.

When git instantiates the helper as a new process, it opens pipes to the helper's stdin and stdout. Commands are sent to the helper over stdin, and the helper responds over stdout. Any output the helper produces on stderr goes wherever git's stderr is going, which is probably the terminal.

The following diagram demonstrates this relationship.

The last point I want to make before we begin is to distinguish the local and remote repository. Generally, but not always, the local repository is the one we are running git from, and the remote is the one we are making a connection to.

So in a push, we are sending changes from the local to the remote. In a fetch, we are taking changes from the remote to the local. In a clone, we are cloning from the remote into the local.

When git runs the helper, it sets the environment variable GIT_DIR to the Git directory of the local repository (e.g. local/.git).

Starting the project

In this article, I'm assuming that Go is installed, with $GOPATH pointing to a directory named go.

Let's start by creating the directory go/src/git-remote-go. This will make it possible to install our helper just by running go install (assuming go/bin is on the PATH).

With this in mind, we can write the first few lines of go/src/git-remote-go/main.go.

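package main

import (
  "fmt"
  "log"
  "os"
)

func Main() (er error) {
  if len(os.Args) < 3 {
    return fmt.Errorf("Usage: git-remote-go remote-name url")
  }

  remoteName := os.Args[1]
  url := os.Args[2]

  // The snippets in the rest of this article go here; they use
  // remoteName and url.

  return nil
}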

func main() {
  if err := Main(); err != nil {
    log.Fatal(err)
  }
}

I've put the real work in a separate Main() function because error handling is easier when we can return errors. It also lets us use defer: log.Fatal calls os.Exit, which doesn't run deferred functions, so anything deferred has to live in a function that returns before log.Fatal is called.

Now let's look at the top of git-remote-testgit to see what to do next.

#!/bin/sh
# Copyright (c) 2012 Felipe Contreras

alias=$1
url=$2

dir="$GIT_DIR/testgit/$alias"
prefix="refs/testgit/$alias"

default_refspec="refs/heads/*:${prefix}/heads/*"

refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"

test -z "$refspec" && prefix="refs"

GIT_DIR="$url/.git"
export GIT_DIR

force=

mkdir -p "$dir"

if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
  gitmarks="$dir/git.marks"
  testgitmarks="$dir/testgit.marks"
  test -e "$gitmarks" || >"$gitmarks"
  test -e "$testgitmarks" || >"$testgitmarks"
fi

The variable they call alias is what we are calling remoteName. url means the same thing.

The next declaration is

dir="$GIT_DIR/testgit/$alias"

This creates a namespace in the Git directory that is specific to the testgit protocol and to the remote we are using. This way the testgit files for the origin remote are different from the backup remote.

Down below, we see the statement

mkdir -p "$dir"

This will make sure the local directory is created, if it doesn't exist already.

Let's add the creation of the local directory to our Go program.

// Add "path" to the import list

localdir := path.Join(os.Getenv("GIT_DIR"), "go", remoteName)

if err := os.MkdirAll(localdir, 0755); err != nil {
  return err
}

Continuing through the script, we come across the following lines

prefix="refs/testgit/$alias"

default_refspec="refs/heads/*:${prefix}/heads/*"

refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"

test -z "$refspec" && prefix="refs"

Let's talk about refs really quick.

In git, refs are stored in .git/refs:

.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   ├── gravy
    │   └── origin
    │       └── master
    └── tags

In the above tree, remotes/origin/master contains the SHA-hash of the most recent commit in the master branch of the origin remote. heads/master refers to the most recent commit of your local master branch. A ref is like a pointer to a commit.

A refspec allows us to map remote refs to local refs. In the above code, prefix is the directory where the remote refs will be held. If the remote name is origin, then the remote master branch is tracked by the ref .git/refs/testgit/origin/heads/master. It basically creates a protocol-specific namespace for remote branches.

The next line is the actual refspec. The line

default_refspec="refs/heads/*:${prefix}/heads/*"

expands to

default_refspec="refs/heads/*:refs/testgit/$alias/heads/*"

This maps remote branches that look like refs/heads/* (where * matches any text) to refs/testgit/$alias/heads/* (where * is replaced with whatever the first * matched). So refs/heads/master becomes refs/testgit/origin/heads/master, for instance.

Essentially, the refspec allows testgit to add a branch to the tree for itself, like this

.git
└── refs
    ├── heads
    │   └── master
    ├── remotes
    │   └── origin
    │       └── master
    ├── testgit
    │   └── origin
    │       └── heads
    │           └── master
    └── tags

The next line

refspec="${GIT_REMOTE_TESTGIT_REFSPEC-$default_refspec}"

Sets $refspec to $GIT_REMOTE_TESTGIT_REFSPEC if that variable is set (even to an empty string), and to $default_refspec otherwise. This is so testgit can be tested with other refspecs. We'll assume it gets set to $default_refspec.

Finally, the next line,

test -z "$refspec" && prefix="refs"

Sets $prefix to refs if $refspec is empty, which only happens when $GIT_REMOTE_TESTGIT_REFSPEC is set to an empty string. We'll assume that isn't the case.

We need our own refspec, so we'll add the line

refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", remoteName)
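To see what this refspec does to a ref name, here is a minimal sketch of applying a single-wildcard refspec. mapRef is a hypothetical helper written for this illustration; real refspecs also allow a leading + and forms without wildcards, which it ignores.

```go
package main

import (
	"fmt"
	"strings"
)

// mapRef applies a refspec of the form "src/*:dst/*" to ref and reports
// false when ref does not match the source side. Hypothetical helper.
func mapRef(refspec, ref string) (string, bool) {
	parts := strings.SplitN(refspec, ":", 2)
	if len(parts) != 2 {
		return "", false
	}
	src, dst := parts[0], parts[1]
	if !strings.HasSuffix(src, "*") || !strings.HasSuffix(dst, "*") {
		return "", false
	}
	srcPrefix := strings.TrimSuffix(src, "*")
	dstPrefix := strings.TrimSuffix(dst, "*")
	if !strings.HasPrefix(ref, srcPrefix) {
		return "", false
	}
	// Whatever the source * matched is appended to the destination prefix.
	return dstPrefix + strings.TrimPrefix(ref, srcPrefix), true
}

func main() {
	refspec := fmt.Sprintf("refs/heads/*:refs/go/%s/*", "origin")
	mapped, ok := mapRef(refspec, "refs/heads/master")
	fmt.Println(mapped, ok)
}
```

With the remote named origin, refs/heads/master maps to refs/go/origin/master, while a ref outside refs/heads/ (a tag, say) does not match at all.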

Following that code, we see

GIT_DIR="$url/.git"
export GIT_DIR

Another fact about $GIT_DIR is that if it is set in the environment, the git binary will use the directory in $GIT_DIR as its .git directory, instead of the local .git. This command makes it so that all future git commands run by the helper will run in the context of the remote repository.

We'll translate this to

if err := os.Setenv("GIT_DIR", path.Join(url, ".git")); err != nil {
  return err
}

Remember, of course, that $dir and our variable localdir still refer to a subdirectory of the repository we are fetching to or pushing from.

And the last bit of code before the main loop is

if test -z "$GIT_REMOTE_TESTGIT_NO_MARKS"
then
  gitmarks="$dir/git.marks"
  testgitmarks="$dir/testgit.marks"
  test -e "$gitmarks" || >"$gitmarks"
  test -e "$testgitmarks" || >"$testgitmarks"
fi

The contents of the if statement will be executed if $GIT_REMOTE_TESTGIT_NO_MARKS isn't set, which we'll assume is the case.

These marks files are used by git fast-export and git fast-import to remember which mark in a stream corresponds to which object (commit or blob) across runs. It's important that these marks are kept the same between multiple invocations of the helper, so they're being stored in the localdir.

Here, $gitmarks refers to the marks for our local repository that git writes, while $testgitmarks stores the marks for the remote one that the helper writes.

The two following lines appear equivalent to touch invocations, where if the marks files don't exist, they are created empty.

test -e "$gitmarks" || >"$gitmarks"
test -e "$testgitmarks" || >"$testgitmarks"

We'll need these files in our own program, so let's start by writing a Touch function.

// Create path as an empty file if it doesn't exist, otherwise do nothing.
// This works by opening a file in exclusive mode; if it already exists,
// an error will be returned rather than truncating it.
func Touch(path string) error {
  file, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0666)
  if os.IsExist(err) {
    return nil
  } else if err != nil {
    return err
  }

  return file.Close()
}

Now we can create the marks files.

gitmarks := path.Join(localdir, "git.marks")
gomarks := path.Join(localdir, "go.marks")

if err := Touch(gitmarks); err != nil {
  return err
}

if err := Touch(gomarks); err != nil {
  return err
}

However, one thing I've come across is that if the helper fails for some reason, the marks files can be left in an invalid state. To guard against this, we can save the original contents of the files, and then rewrite them if the Main() function returns an error.

// add "io/ioutil" to imports

originalGitmarks, err := ioutil.ReadFile(gitmarks)
if err != nil {
  return err
}

originalGomarks, err := ioutil.ReadFile(gomarks)
if err != nil {
  return err
}

defer func() {
  if er != nil {
    ioutil.WriteFile(gitmarks, originalGitmarks, 0666)
    ioutil.WriteFile(gomarks, originalGomarks, 0666)
  }
}()

We can finally begin on the central command loop.

Commands are passed to the helper via stdin, where each command is a string terminated by a newline. The helper responds to the commands via stdout; stderr is piped to the end user.

Let's make our own loop.

// Add "bufio" to import list.

stdinReader := bufio.NewReader(os.Stdin)

for {
  // Note that command will include the trailing newline.
  command, err := stdinReader.ReadString('\n')
  if err != nil {
    return err
  }

  switch {
  case command == "capabilities\n":
    // ...
  case command == "\n":
    return nil
  default:
    return fmt.Errorf("Received unknown command %q", command)
  }
}

The capabilities command

The first command to implement is capabilities. The helper is expected to print what commands and other capabilities it supports on separate lines, terminated by an empty line.

echo 'import'
echo 'export'
test -n "$refspec" && echo "refspec $refspec"
if test -n "$gitmarks"
then
  echo "*import-marks $gitmarks"
  echo "*export-marks $gitmarks"
fi
test -n "$GIT_REMOTE_TESTGIT_SIGNED_TAGS" && echo "signed-tags"
test -n "$GIT_REMOTE_TESTGIT_NO_PRIVATE_UPDATE" && echo "no-private-update"
echo 'option'
echo

This list of capabilities states that this helper supports the import, export and option commands. The option command allows git to change the verbosity and such of our helper.

signed-tags means that when git creates a fast-export stream for the export command, it will pass --signed-tags=verbatim to git-fast-export.

no-private-update instructs git to not update a private ref when it's been successfully pushed. I've never seemed to need this feature.

refspec $refspec tells git what refspec we want to use.

The lines *import-marks $gitmarks and *export-marks $gitmarks mean git should load and save the marks it generates in the gitmarks file. The * means that if git does not understand these lines, it must fail instead of ignoring them. This is because the helper depends on the marks files being saved, and won't work with versions of git that don't support this.

Let's ignore signed-tags, no-private-update and option, as they are provided in git-remote-testgit for completeness of testing, and we don't need them for this example. We can implement the above simply as

case command == "capabilities\n":
  fmt.Printf("import\n")
  fmt.Printf("export\n")
  fmt.Printf("refspec %s\n", refspec)
  fmt.Printf("*import-marks %s\n", gitmarks)
  fmt.Printf("*export-marks %s\n", gitmarks)
  fmt.Printf("\n")
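To make the line format concrete, here is a sketch of how a reader of this response could split a capability line into its name, its argument, and the mandatory flag given by the leading *. It only illustrates the format described above: parseCapability and the capability type are hypothetical, and this is not how git itself is structured.

```go
package main

import (
	"fmt"
	"strings"
)

// capability is one line of a helper's capabilities response.
type capability struct {
	Name      string // e.g. "import", "refspec", "export-marks"
	Arg       string // text after the first space, if any
	Mandatory bool   // true when the line started with "*"
}

// parseCapability splits a capabilities line into its parts.
// Hypothetical helper for illustration.
func parseCapability(line string) capability {
	var c capability
	if strings.HasPrefix(line, "*") {
		c.Mandatory = true
		line = line[1:]
	}
	parts := strings.SplitN(line, " ", 2)
	c.Name = parts[0]
	if len(parts) == 2 {
		c.Arg = parts[1]
	}
	return c
}

func main() {
	for _, line := range []string{
		"import",
		"refspec refs/heads/*:refs/go/origin/*",
		"*export-marks .git/go/origin/git.marks",
	} {
		fmt.Printf("%+v\n", parseCapability(line))
	}
}
```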

The list command

The next command is list. This isn't provided in the capabilities list because it must always be supported by the helper.

When the helper receives a list command, it should print out the refs of the remote repository as a series of lines of the format $objectname $refname, followed by an empty line. $refname is the name of the ref, while $objectname is what the ref points to. $objectname can be a commit hash, a reference to another ref by name written as @$refname, or ?, which means the helper could not determine the ref's value.
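The three kinds of value can be told apart by their first character. The sketch below parses one line of a list response; listedRef and parseListLine are hypothetical names, and it ignores any optional attributes that may follow the ref name.

```go
package main

import (
	"fmt"
	"strings"
)

// listedRef is one line of a list response. Exactly one of Hash, Target, or
// Unknown is set, depending on the form of the value.
type listedRef struct {
	Name    string
	Hash    string // set when the value is an object name
	Target  string // set when the value is "@<refname>"
	Unknown bool   // set when the value is "?"
}

// parseListLine parses a "$objectname $refname" line. Hypothetical helper.
func parseListLine(line string) (listedRef, error) {
	fields := strings.Fields(line)
	if len(fields) < 2 {
		return listedRef{}, fmt.Errorf("malformed list line %q", line)
	}
	value, name := fields[0], fields[1]
	ref := listedRef{Name: name}
	switch {
	case value == "?":
		ref.Unknown = true
	case strings.HasPrefix(value, "@"):
		ref.Target = value[1:]
	default:
		ref.Hash = value
	}
	return ref, nil
}

func main() {
	for _, line := range []string{
		"? refs/heads/master",
		"@refs/heads/master HEAD",
	} {
		ref, err := parseListLine(line)
		fmt.Printf("%+v %v\n", ref, err)
	}
}
```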

git-remote-testgit's implementation is the following.

git for-each-ref --format='? %(refname)' 'refs/heads/'
head=$(git symbolic-ref HEAD)
echo "@$head HEAD"
echo

Remembering that $GIT_DIR causes git for-each-ref to run in the remote repository, this will print a line ? $refname for every branch in the remote repository, as well as @$head HEAD, where $head is the name of the ref that the HEAD of the repository refers to.

In an ordinary repository with two branches, master and development, the output of this might look like

? refs/heads/master
? refs/heads/development
@refs/heads/master HEAD
<blank>

Now let's write it ourselves. Let's write a function GitListRefs(), because we'll need it again later.

// Add "os/exec" and "bytes" to the import list.

// Returns a map of refnames to objectnames.
func GitListRefs() (map[string]string, error) {
  out, err := exec.Command(
    "git", "for-each-ref", "--format=%(objectname) %(refname)",
    "refs/heads/",
  ).Output()
  if err != nil {
    return nil, err
  }

  lines := bytes.Split(out, []byte{'\n'})
  refs := make(map[string]string, len(lines))

  for _, line := range lines {
    fields := bytes.Split(line, []byte{' '})

    if len(fields) < 2 {
      break
    }

    refs[string(fields[1])] = string(fields[0])
  }

  return refs, nil
}

Now we'll write GitSymbolicRef().

func GitSymbolicRef(name string) (string, error) {
  out, err := exec.Command("git", "symbolic-ref", name).Output()
  if err != nil {
    return "", fmt.Errorf(
      "GitSymbolicRef: git symbolic-ref %s: %v", name, err)
  }

  return string(bytes.TrimSpace(out)), nil
}

We can implement the list command like so.

case command == "list\n":
  refs, err := GitListRefs()
  if err != nil {
    return fmt.Errorf("command list: %v", err)
  }

  head, err := GitSymbolicRef("HEAD")
  if err != nil {
    return fmt.Errorf("command list: %v", err)
  }

  for refname := range refs {
    fmt.Printf("? %s\n", refname)
  }

  fmt.Printf("@%s HEAD\n", head)
  fmt.Printf("\n")

The import command

Next up is the import command, which git uses when trying to fetch or clone. This command actually comes in a batch; it is sent as a series of lines import $refname followed by a blank line. When git sends this command to the helper, it executes the git fast-import binary, and pipes the helper's stdout into its stdin. In other words, the helper is expected to return a git fast-export stream on stdout.

Let's look at git-remote-testgit's implementation.

# read all import lines
while true
do
  ref="${line#* }"
  refs="$refs $ref"
  read line
  test "${line%% *}" != "import" && break
done

if test -n "$gitmarks"
then
  echo "feature import-marks=$gitmarks"
  echo "feature export-marks=$gitmarks"
fi

if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
  echo "feature done"
  exit 1
fi

echo "feature done"
git fast-export \
  ${testgitmarks:+"--import-marks=$testgitmarks"} \
  ${testgitmarks:+"--export-marks=$testgitmarks"} \
  $refs |
sed -e "s#refs/heads/#${prefix}/heads/#g"
echo "done"

The loop at the top, true to the comment, accumulates all the import $refname commands into a single variable $refs, which is a list of the refs separated by spaces.

Following that, if the script is using a gitmarks file (which we're assuming it is), it prints out feature import-marks=$gitmarks and feature export-marks=$gitmarks. These lines are part of the stream that git fast-import reads, and they make it behave as though --import-marks=$gitmarks and --export-marks=$gitmarks had been passed on its command line.

The next branch exists only for testing: if $GIT_REMOTE_TESTGIT_FAILURE is set, the helper exits with a failure.

After that, feature done is printed, which tells git fast-import that the stream will end with an explicit done command.

Finally, git fast-export is called in the remote repository, setting the marks files to the remote marks, $testgitmarks, and then passing the list of refs we want to export.

The output of git-fast-export is piped through a sed script that maps refs/heads/ to refs/testgit/$alias/heads/. The refspec that we passed to git will take care of this mapping when we export.

After the export stream, done is printed.

Let's try this in Go.

// Add "strings" to the import list.

case strings.HasPrefix(command, "import "):
  refs := make([]string, 0)

  for {
    // Have to make sure to trim the trailing newline.
    ref := strings.TrimSpace(strings.TrimPrefix(command, "import "))

    refs = append(refs, ref)
    command, err = stdinReader.ReadString('\n')
    if err != nil {
      return err
    }

    if !strings.HasPrefix(command, "import ") {
      break
    }
  }

  fmt.Printf("feature import-marks=%s\n", gitmarks)
  fmt.Printf("feature export-marks=%s\n", gitmarks)
  fmt.Printf("feature done\n")

  args := []string{
    "fast-export",
    "--import-marks", gomarks,
    "--export-marks", gomarks,
    "--refspec", refspec}
  args = append(args, refs...)

  cmd := exec.Command("git", args...)
  cmd.Stderr = os.Stderr
  cmd.Stdout = os.Stdout

  if err := cmd.Run(); err != nil {
    return fmt.Errorf("command import: git fast-export: %v", err)
  }

  fmt.Printf("done\n")

The export command

Next up is the export command. When we finish this one, our helper is done.

Git issues this command when we are pushing to the remote repository. After sending the command over stdin, git follows it with a stream produced by git fast-export, which we can git fast-import into the remote repository.

if test -n "$GIT_REMOTE_TESTGIT_FAILURE"
then
  # consume input so fast-export doesn't get SIGPIPE;
  # git would also notice that case, but we want
  # to make sure we are exercising the later
  # error checks
  while read line; do
    test "done" = "$line" && break
  done
  exit 1
fi

before=$(git for-each-ref --format=' %(refname) %(objectname) ')

git fast-import \
  ${force:+--force} \
  ${testgitmarks:+"--import-marks=$testgitmarks"} \
  ${testgitmarks:+"--export-marks=$testgitmarks"} \
  --quiet

# figure out which refs were updated
git for-each-ref --format='%(refname) %(objectname)' |
while read ref a
do
  case "$before" in
  *" $ref $a "*)
    continue ;; # unchanged
  esac
  if test -z "$GIT_REMOTE_TESTGIT_PUSH_ERROR"
  then
    echo "ok $ref"
  else
    echo "error $ref $GIT_REMOTE_TESTGIT_PUSH_ERROR"
  fi
done

echo

The first if statement is, again, just for testing purposes.

The next line is more interesting. It creates a space separated list of $refname $objectname pairs of the refs which we will use to determine which refs were updated in the import.

The next command is rather self explanatory. git fast-import is run on the stream we receive on stdin, passing --force if specified, --quiet, and the remote marks files.

Next it runs git for-each-ref again to see what refs have changed. For every ref this command returns, it checks to see if the $refname $objectname pair is in the $before list. If it is, nothing changed and it continues onto the next. If the ref isn't in the list, however, it prints ok $refname to signify to git that the ref updated successfully. Printing error $refname $message tells git that a ref failed to be imported on the remote end.

Finally, it prints a blank line to show that the import is done.

Now we can write it ourselves. We can use the GitListRefs() function we defined earlier.

case command == "export\n":
  beforeRefs, err := GitListRefs()
  if err != nil {
    return fmt.Errorf("command export: collecting before refs: %v", err)
  }

  cmd := exec.Command("git", "fast-import", "--quiet",
    "--import-marks="+gomarks,
    "--export-marks="+gomarks)

  cmd.Stderr = os.Stderr
  cmd.Stdin = os.Stdin

  if err := cmd.Run(); err != nil {
    return fmt.Errorf("command export: git fast-import: %v", err)
  }

  afterRefs, err := GitListRefs()
  if err != nil {
    return fmt.Errorf("command export: collecting after refs: %v", err)
  }

  for refname, objectname := range afterRefs {
    if beforeRefs[refname] != objectname {
      fmt.Printf("ok %s\n", refname)
    }
  }

  fmt.Printf("\n")

Trying it out

Run go install, which should build and install git-remote-go to go/bin.

You can try the following; first we create two empty git repositories, then make a commit in testlocal, and push it to testremote using our new helper.

$ cd $HOME
$ git init testremote
Initialized empty Git repository in $HOME/testremote/.git/
$ git init testlocal
Initialized empty Git repository in $HOME/testlocal/.git/
$ cd testlocal
$ echo 'Hello, world!' >hello.txt
$ git add hello.txt
$ git commit -m "First commit."
[master (root-commit) 50d3a83] First commit.
 1 file changed, 1 insertion(+)
 create mode 100644 hello.txt
$ git remote add origin go::$HOME/testremote
$ git push --all origin
To go::$HOME/testremote
 * [new branch]      master -> master
$ cd ../testremote
$ git checkout master
$ ls
hello.txt
$ cat hello.txt
Hello, world!

Uses for git remote helpers

Git remote helpers have been used to implement interfaces to other source control (like felipec/git-remote-hg), or push code into CouchDBs (peritus/git-remote-couch), among others. You could probably think of more.

I wrote a git remote helper for my original motivation, git-remote-grave. You can use it to push and fetch from encrypted archives on your file system or over HTTP/HTTPS.

$ git remote add usb grave::/media/usb/backup.grave
$ git push --all usb

Using a couple of compression tricks, the archives are typically 22% of the size of the original repository.

Discussion of this article is taking place on Hacker News and /r/programming.