One of my favorite parts about Go is its unwavering focus on utility. Sometimes we place so much emphasis on language design that we forget all the other things programming involves. For example:
go get X, which downloads and installs a package (for example, go get code.google.com/p/go.net/websocket).
gofmt name_of_file.go, which formats your source code.
go fix, which can automatically convert Go code designed for earlier versions to newer versions.
go test /path/to/package, which runs a package's tests. It can do benchmarks too.
Those are just a few examples, but I want to focus on one that's not generally well known: Go can seamlessly use functions written in assembly.
Suppose we want to write an assembly version of a sum function. First create a file called sum.go that contains this:
package sum
func Sum(xs []int64) int64 {
	var n int64
	for _, v := range xs {
		n += v
	}
	return n
}
This function adds up a slice of integers and returns the total. To test it, create a file called sum_test.go that contains this:
package sum
import (
	"testing"
)
type (
	testCase struct {
		n  int64
		xs []int64
	}
)
var (
	cases = []testCase{
		{0, []int64{}},
		{15, []int64{1, 2, 3, 4, 5}},
	}
)
func TestSum(t *testing.T) {
	for _, tc := range cases {
		n := Sum(tc.xs)
		if tc.n != n {
			t.Error("Expected", tc.n, "got", n, "for", tc.xs)
		}
	}
}
Writing tests for your code is generally a good idea, but for library code (anything that isn't package main) it is also a good way to experiment. Just type go test from the command line and it will run your tests.
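Since go test can also run benchmarks, a benchmark for Sum gives a baseline to compare an assembly version against later. The following is a minimal sketch and not one of the original files: BenchmarkSum, the 1024-element input and the sink variable are assumptions made for illustration, and Sum is repeated so the block compiles on its own. It uses testing.Benchmark so that it can run outside of go test.

```go
package main

import (
	"fmt"
	"testing"
)

// Sum is the pure Go version from sum.go, repeated so the sketch is self-contained.
func Sum(xs []int64) int64 {
	var n int64
	for _, v := range xs {
		n += v
	}
	return n
}

// sink keeps the compiler from discarding the result of Sum (illustrative name).
var sink int64

// BenchmarkSum is a hypothetical benchmark; the input size of 1024 is arbitrary.
func BenchmarkSum(b *testing.B) {
	xs := make([]int64, 1024)
	for i := range xs {
		xs[i] = int64(i)
	}
	b.ResetTimer() // exclude the setup above from the timing
	for i := 0; i < b.N; i++ {
		sink = Sum(xs)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function without the go test command.
	result := testing.Benchmark(BenchmarkSum)
	fmt.Println(result)
}
```

In the layout used in this post, BenchmarkSum would instead live in sum_test.go (package sum), where go test -bench . picks it up.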
Now let's replace this function with one written in assembly. We can start by examining what the Go compiler produces. Instead of go test or go build, run this command (for a 64-bit binary): go tool 6g -S sum.go. (On Go 1.5 and later the compiler is invoked as go tool compile -S sum.go, and the output format differs.) You should see something like this:
--- prog list "Sum" ---
0000 (sum.go:3) TEXT Sum+0(SB),$16-24
0001 (sum.go:4) MOVQ $0,SI
0002 (sum.go:5) MOVQ xs+0(FP),BX
0003 (sum.go:5) MOVQ BX,autotmp_0000+-16(SP)
0004 (sum.go:5) MOVL xs+8(FP),BX
0005 (sum.go:5) MOVL BX,autotmp_0000+-8(SP)
0006 (sum.go:5) MOVL xs+12(FP),BX
0007 (sum.go:5) MOVL BX,autotmp_0000+-4(SP)
0008 (sum.go:5) MOVL $0,AX
0009 (sum.go:5) MOVL autotmp_0000+-8(SP),DI
0010 (sum.go:5) LEAQ autotmp_0000+-16(SP),BX
0011 (sum.go:5) MOVQ (BX),CX
0012 (sum.go:5) JMP ,14
0013 (sum.go:5) INCL ,AX
0014 (sum.go:5) CMPL AX,DI
0015 (sum.go:5) JGE ,20
0016 (sum.go:5) MOVQ (CX),BP
0017 (sum.go:5) ADDQ $8,CX
0018 (sum.go:6) ADDQ BP,SI
0019 (sum.go:5) JMP ,13
0020 (sum.go:8) MOVQ SI,.noname+16(FP)
0021 (sum.go:8) RET ,
sum.go:3: Sum xs does not escape
Assembly can be quite difficult to understand, and we will look at this in more detail in a bit. But first let's go ahead and use this as a template. Create a new file called sum_amd64.s in the same folder as sum.go which contains this:
// func Sum(xs []int64) int64
TEXT ·Sum(SB),$0
	MOVQ $0,SI
	MOVQ xs+0(FP),BX
	MOVQ BX,autotmp_0000+-16(SP)
	MOVL xs+8(FP),BX
	MOVL BX,autotmp_0000+-8(SP)
	MOVL xs+12(FP),BX
	MOVL BX,autotmp_0000+-4(SP)
	MOVL $0,AX
	MOVL autotmp_0000+-8(SP),DI
	LEAQ autotmp_0000+-16(SP),BX
	MOVQ (BX),CX
	JMP L2
L1:	INCL AX
L2:	CMPL AX,DI
	JGE L3
	MOVQ (CX),BP
	ADDQ $8,CX
	ADDQ BP,SI
	JMP L1
L3:	MOVQ SI,.noname+16(FP)
	RET
Basically all I did was replace the hardcoded line numbers for jumps (JMP, JGE) with labels and add a middle dot (·) before the function name. (Make sure to save the file as UTF-8.) Next we remove the function body from sum.go, leaving only the declaration:
package sum
func Sum(xs []int64) int64
Now you should be able to run the tests again with go test and our custom assembly version of the function will be used.
This type of assembly is described in more detail here. I will briefly explain what it's doing.
MOVQ $0,SI
First we put 0 in the SI register, which holds our running total. The Q means quadword, which is 8 bytes. Later we'll see L, which is for 4 bytes. The operands are in (source, destination) order.
MOVQ xs+0(FP),BX
MOVQ BX,autotmp_0000+-16(SP)
MOVL xs+8(FP),BX
MOVL BX,autotmp_0000+-8(SP)
MOVL xs+12(FP),BX
MOVL BX,autotmp_0000+-4(SP)
Next we take the slice parameter and copy it onto the stack. A Go slice is made up of 3 parts: a pointer to its location in memory, a length and a capacity. With the toolchain used here the pointer is 8 bytes and the length and capacity are 4 bytes each (since Go 1.1, int is 64 bits on amd64, so on newer versions they take 8 bytes each). This code copies those values through the BX register. (See this for more details about slices)
MOVL $0,AX
MOVL autotmp_0000+-8(SP),DI
LEAQ autotmp_0000+-16(SP),BX
MOVQ (BX),CX
Next we put 0 in AX, which we'll use as the loop counter. We load the length of the slice into DI and the pointer to xs' elements into CX.
JMP L2
L1: INCL AX
L2: CMPL AX,DI
JGE L3
Now we get to the meat of the code. First we jump down to L2, where we compare AX and DI. If AX is greater than or equal to DI, we've consumed all the items in the slice, so we go to L3 (basically i >= len(xs)).
MOVQ (CX),BP
ADDQ $8,CX
ADDQ BP,SI
JMP L1
This does the actual addition. First we load the value that CX points to and store it in BP. Then we move CX 8 bytes ahead, to the next element. Finally we add BP to SI and jump to L1. L1 increments AX and starts the loop again.
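The control flow of the loop can be easier to follow when written back in Go with goto, one variable per register. This is a sketch for reading alongside the assembly, not what the compiler works from: the function name sumGoto and the variable names are made up, and an index stands in for the element pointer held in CX.

```go
package main

import "fmt"

// sumGoto mirrors the compiler-generated loop, one variable per register.
// An index replaces the CX pointer; that substitution is a choice made for this sketch.
func sumGoto(xs []int64) int64 {
	var si int64  // SI: running total
	ax := 0       // AX: loop counter
	di := len(xs) // DI: length of the slice
	cx := 0       // CX: position of the next element (a pointer in the assembly)
	goto L2
L1:
	ax++ // INCL AX
L2:
	if ax >= di { // CMPL AX,DI ; JGE L3
		goto L3
	}
	si += xs[cx] // MOVQ (CX),BP ; ADDQ BP,SI
	cx++         // ADDQ $8,CX
	goto L1
L3:
	return si // MOVQ SI,.noname+16(FP) ; RET
}

func main() {
	fmt.Println(sumGoto([]int64{1, 2, 3, 4, 5})) // prints 15
}
```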
L3: MOVQ SI,.noname+16(FP)
RET
After we've completed our summation, we store the result just past the function's arguments (16 bytes from the start of the argument area, because the slice takes up 16 bytes). Then we return.
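The offsets used in the walkthrough (0, 8, 12 and 16 from FP) can be checked by modelling the argument area as a struct. The frame type below is hypothetical and exists only for this arithmetic. It assumes 8-byte pointers and the 4-byte length and capacity described above, so it models the layout in this post rather than what every Go version does.

```go
package main

import (
	"fmt"
	"unsafe"
)

// frame is a hypothetical model of Sum's argument area as described in the
// walkthrough: an 8-byte pointer, a 4-byte length, a 4-byte capacity,
// then the 8-byte return value.
type frame struct {
	data unsafe.Pointer // xs+0(FP)
	len  int32          // xs+8(FP)
	cap  int32          // xs+12(FP)
	ret  int64          // .noname+16(FP)
}

// offsets returns the byte offset of each field from the start of the frame.
func offsets() [4]uintptr {
	var f frame
	return [4]uintptr{
		unsafe.Offsetof(f.data),
		unsafe.Offsetof(f.len),
		unsafe.Offsetof(f.cap),
		unsafe.Offsetof(f.ret),
	}
}

func main() {
	// On a platform with 8-byte pointers this prints [0 8 12 16].
	fmt.Println(offsets())
}
```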
Here's my rewrite of the code:
// func Sum2(xs []int64) int64
TEXT ·Sum2(SB),7,$0
	MOVQ $0, SI // n
	MOVQ xs+0(FP), BX // BX = &xs[0]
	MOVL xs+8(FP), CX // len(xs)
	MOVLQSX CX, CX // len as int64
	INCQ CX // CX++
start:
	DECQ CX // CX--
	JZ done // jump if CX = 0
	ADDQ (BX), SI // n += *BX
	ADDQ $8, BX // BX += 8
	JMP start
done:
	MOVQ SI, .noname+16(FP) // return n
	RET
Hopefully it's a little easier to understand.
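For comparison, here is the same countdown structure written in Go. It is a sketch for reading alongside the rewritten routine, not a replacement for it: sumCountdown is a made-up name, and an index stands in for the pointer kept in BX.

```go
package main

import "fmt"

// sumCountdown follows the control flow of the rewritten assembly:
// the counter starts at len(xs)+1 and is decremented before each check.
func sumCountdown(xs []int64) int64 {
	var n int64       // SI: running total
	i := 0            // stands in for BX, the pointer to the next element
	cx := len(xs) + 1 // load the length into CX, then INCQ CX
	for {
		cx-- // DECQ CX
		if cx == 0 { // JZ done
			break
		}
		n += xs[i] // ADDQ (BX), SI
		i++        // ADDQ $8, BX
	}
	return n
}

func main() {
	fmt.Println(sumCountdown([]int64{1, 2, 3, 4, 5})) // prints 15
}
```

Starting the counter one too high and decrementing first means an empty slice falls straight through to done without touching memory.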
It's pretty cool that you can do this, but it's not without its caveats:
Still it's useful for at least two reasons: