One of the most common bugs that you may encounter in go are race conditions. This is often the case when working with goroutines and non-atomic data types. Plenty of articles have covered goroutine race conditions, but why don’t we take another look and, this time, debug a couple race conditions to death by looking at what happens under the hood?
Let’s take a look at a typical example. We will examine what happens when we dereference a method receiver and assign it an object that we later modify. First, we will set up a couple structs:
type Student struct{
FirstName string
LastName string
Age int
Address Address
}
type Address struct{
State string
City string
}
Now we are going to create a method that returns an array of students.
func getCoolStudents() []core.Student {
return []core.Student{
{
FirstName: "Abed",
LastName: "Nadir",
Age: 27,
Address: core.Address{
City: "New York City",
State: "NY",
},
},
{
FirstName: "Troy",
LastName: "Barnes",
Age: 28,
Address: core.Address{
City: "Denver",
State: "CO",
},
},
}
}
Let’s write a method for the Student
struct. This method takes a pointer to a student and copies it to itself (the receiver):
func (s *Student) SetStudent(student *Student){
*s = *student
}
We are now ready to test our race condition:
func testRace() {
// Create an empty student struct
student := core.Student{}
// get a list of rad students
coolStudents := getCoolStudents()
// set the student to the first student from the array of coolStudents
go student.SetStudent(&coolStudents[0])
//overwrite Abed with Troy
coolStudents[0] = coolStudents[1]
// simiulate some other operations
time.Sleep(1000 * time.Millisecond)
// print the student
println("\nshould be Abed:", student.FirstName)
fmt.Printf("%+v\n", student)
}
The process is straightforward, and it is explained in the comments. The method student.SetStudent
should set the student
to Abed. We then decide to overwrite the first student from our coolStudents
array to Troy. We should expect that the student struct is set to Abed, but we get different results:
$ ./race
should be Abed: Troy
{FirstName:Troy LastName:Barnes Age:28 Address:{State:CO City:Denver}}
Instead of printing Abed
as the student, we get Troy
. We set Student
to Abed
, never to Troy
, so what happened? In a nutshell, the method student.SetStudent
is called asynchronously, and it does not get a chance to use coolStudents[0]
before it gets updated in the next line. This may be a sufficient explanation, but let’s take a look at what is actually happening.
We will start by taking a closer look at student.SetStudent
:
func (s *Student) SetStudent(student *Student){
*s = *student
}
In the method, we dereference the student s
(the method’s receiver), setting it to the value pointed by the student
argument. Let’s take a look at what that looks like in go assembly (I did say we were going to debug this thing to death):
$ go tool objdump -s SetStudent race
TEXT <redacted>
utils.go:61 0x409eb90 65488b0c2530000000 MOVQ GS:0x30, CX
utils.go:61 0x409eb99 483b6110 CMPQ 0x10(CX), SP
utils.go:61 0x409eb9d 7671 JBE 0x409ec10
utils.go:61 0x409eb9f 4883ec20 SUBQ $0x20, SP
utils.go:61 0x409eba3 48896c2418 MOVQ BP, 0x18(SP)
utils.go:61 0x409eba8 488d6c2418 LEAQ 0x18(SP), BP
utils.go:62 0x409ebad 488b442428 MOVQ 0x28(SP), AX
utils.go:62 0x409ebb2 8400 TESTB AL, 0(AX)
utils.go:62 0x409ebb4 488b4c2430 MOVQ 0x30(SP), CX
utils.go:62 0x409ebb9 8401 TESTB AL, 0(CX)
utils.go:62 0x409ebbb 833d1ee20f0000 CMPL $0x0, runtime.writeBarrier(SB)
utils.go:62 0x409ebc2 7530 JNE 0x409ebf4
utils.go:62 0x409ebc4 488b11 MOVQ 0(CX), DX
utils.go:62 0x409ebc7 488910 MOVQ DX, 0(AX)
utils.go:62 0x409ebca 0f104108 MOVUPS 0x8(CX), X0
utils.go:62 0x409ebce 0f114008 MOVUPS X0, 0x8(AX)
utils.go:62 0x409ebd2 0f104118 MOVUPS 0x18(CX), X0
utils.go:62 0x409ebd6 0f114018 MOVUPS X0, 0x18(AX)
utils.go:62 0x409ebda 0f104128 MOVUPS 0x28(CX), X0
utils.go:62 0x409ebde 0f114028 MOVUPS X0, 0x28(AX)
utils.go:62 0x409ebe2 0f104138 MOVUPS 0x38(CX), X0
utils.go:62 0x409ebe6 0f114038 MOVUPS X0, 0x38(AX)
utils.go:63 0x409ebea 488b6c2418 MOVQ 0x18(SP), BP
utils.go:63 0x409ebef 4883c420 ADDQ $0x20, SP
utils.go:63 0x409ebf3 c3 RET
utils.go:62 0x409ebf4 488d1525800200 LEAQ runtime.rodata+158816(SB), DX
utils.go:62 0x409ebfb 48891424 MOVQ DX, 0(SP)
utils.go:62 0x409ebff 4889442408 MOVQ AX, 0x8(SP)
utils.go:62 0x409ec04 48894c2410 MOVQ CX, 0x10(SP)
utils.go:62 0x409ec09 e8a237f7ff CALL runtime.typedmemmove(SB)
utils.go:62 0x409ec0e ebda JMP 0x409ebea
utils.go:61 0x409ec10 e81b46fbff CALL runtime.morestack_noctxt(SB)
utils.go:61 0x409ec15 e976ffffff JMP github.com/hex0punk/go-re/core.(*Student).SetStudent(SB)
Now, that is a lot of instructions for what seems like a simple operation. However, the revelant part occurs here:
utils.go:62 0x409ebbb 833d1ee20f0000 CMPL $0x0, runtime.writeBarrier(SB)
utils.go:62 0x409ebc2 7530 JNE 0x409ebf4
<more instructions>
utils.go:62 0x409ebf4 488d1525800200 LEAQ runtime.rodata+158816(SB), DX
utils.go:62 0x409ebfb 48891424 MOVQ DX, 0(SP)
utils.go:62 0x409ebff 4889442408 MOVQ AX, 0x8(SP)
utils.go:62 0x409ec04 48894c2410 MOVQ CX, 0x10(SP)
utils.go:62 0x409ec09 e8a237f7ff CALL runtime.typedmemmove(SB)
First, go checks whether the garbage collector is busy to perform the copy operation, in which case it asks it to do so by calling runtime.typedmemmove
, otherwise, the method itself performs the copy operation. Now let’s take a look at runtime.typedmemmove
in the go source, found here:
//go:nosplit
func typedmemmove(typ *_type, dst, src unsafe.Pointer) {
if dst == src {
return
}
if typ.ptrdata != 0 {
bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.size)
}
// There's a race here: if some other goroutine can write to
// src, it may change some pointer in src after we've
// performed the write barrier but before we perform the
// memory copy. This safe because the write performed by that
// other goroutine must also be accompanied by a write
// barrier, so at worst we've unnecessarily greyed the old
// pointer that was in src.
memmove(dst, src, typ.size)
if writeBarrier.cgo {
cgoCheckMemmove(typ, dst, src, 0, typ.size)
}
}
According to the comments, a race condition may occur given that, while the GC is creating a write barrier on the source object, another operation may change it. By the time the write barrier is created, and before the copy operation occurs, the src
, which is passed to typedmemmove
as an unsafe
pointer, may be overwritten. This is exactly what happened in our case. That is, coolStudents[0] = coolStudents[1]
overwrites src
before student.SetStudent
finishes the copy operation *s = *student
.
Now let’s see what happens if instead we create a copy of the source student that we want to assing to our student
struct by using this method instead, which also avoids dereferencing student
:
func CopyStudent(student *Student, src Student){
student.FirstName = src.FirstName
student.LastName = src.LastName
student.Address = src.Address
student.Age = src.Age
}
func testRace() {
student := core.Student{}
coolStudents := getCoolStudents()
go CopyStudent(&student, coolStudents[0])
coolStudents[0] = coolStudents[1]
time.Sleep(1000 * time.Millisecond)
println("\nshould be Abed:", student.FirstName)
fmt.Printf("%+v\n", student)
}
When we call the above, we get the expected result:
$ ./race
should be Abed: Abed
{FirstName:Abed LastName:Nadir Age:27 Address:{State:NY City:New York City}}
Eventhough we are still overwriting coolStudents[0]
with coolStudents[1]
after calling CopyStudent
as a goroutine, coolStudents[0]
is copied when passed to CopyStudent
, as it is passed by value. Furthermore we are not dereferencing student
, so runtime.typedmemmove
is never called, as shown in the go dissembly of the CopyStudent
function:
$ go tool objdump -s CopyStudent race
github.com/hex0punk/go-re/core.CopyStudent(SB) <redacted>
utils.go:57 0x409d5a0 65488b0c2530000000 MOVQ GS:0x30, CX
utils.go:57 0x409d5a9 483b6110 CMPQ 0x10(CX), SP
utils.go:57 0x409d5ad 0f86c7000000 JBE 0x409d67a
utils.go:57 0x409d5b3 4883ec08 SUBQ $0x8, SP
utils.go:57 0x409d5b7 48892c24 MOVQ BP, 0(SP)
utils.go:57 0x409d5bb 488d2c24 LEAQ 0(SP), BP
utils.go:58 0x409d5bf 488b4c2420 MOVQ 0x20(SP), CX
utils.go:58 0x409d5c4 488b442418 MOVQ 0x18(SP), AX
utils.go:58 0x409d5c9 488b7c2410 MOVQ 0x10(SP), DI
utils.go:58 0x409d5ce 48894f08 MOVQ CX, 0x8(DI)
utils.go:58 0x409d5d2 833d07c80f0000 CMPL $0x0, runtime.writeBarrier(SB)
utils.go:58 0x409d5d9 0f8591000000 JNE 0x409d670
utils.go:58 0x409d5df 488907 MOVQ AX, 0(DI)
utils.go:59 0x409d5e2 488b4c2430 MOVQ 0x30(SP), CX
utils.go:59 0x409d5e7 488b442428 MOVQ 0x28(SP), AX
utils.go:59 0x409d5ec 48894f18 MOVQ CX, 0x18(DI)
utils.go:59 0x409d5f0 833de9c70f0000 CMPL $0x0, runtime.writeBarrier(SB)
utils.go:59 0x409d5f7 7563 JNE 0x409d65c
utils.go:59 0x409d5f9 48894710 MOVQ AX, 0x10(DI)
utils.go:60 0x409d5fd 488b442440 MOVQ 0x40(SP), AX
utils.go:60 0x409d602 488b4c2458 MOVQ 0x58(SP), CX
utils.go:60 0x409d607 488b542450 MOVQ 0x50(SP), DX
utils.go:60 0x409d60c 488b5c2448 MOVQ 0x48(SP), BX
utils.go:60 0x409d611 48895f30 MOVQ BX, 0x30(DI)
utils.go:60 0x409d615 48894f40 MOVQ CX, 0x40(DI)
utils.go:60 0x409d619 833dc0c70f0000 CMPL $0x0, runtime.writeBarrier(SB)
utils.go:60 0x409d620 751a JNE 0x409d63c
utils.go:60 0x409d622 48894728 MOVQ AX, 0x28(DI)
utils.go:60 0x409d626 48895738 MOVQ DX, 0x38(DI)
utils.go:61 0x409d62a 488b442438 MOVQ 0x38(SP), AX
utils.go:61 0x409d62f 48894720 MOVQ AX, 0x20(DI)
utils.go:62 0x409d633 488b2c24 MOVQ 0(SP), BP
utils.go:62 0x409d637 4883c408 ADDQ $0x8, SP
utils.go:62 0x409d63b c3 RET
utils.go:60 0x409d63c 488d4f28 LEAQ 0x28(DI), CX
utils.go:57 0x409d640 4889fb MOVQ DI, BX
utils.go:60 0x409d643 4889cf MOVQ CX, DI
utils.go:60 0x409d646 e87580fbff CALL runtime.gcWriteBarrier(SB)
utils.go:60 0x409d64b 488d7b38 LEAQ 0x38(BX), DI
utils.go:60 0x409d64f 4889d0 MOVQ DX, AX
utils.go:60 0x409d652 e86980fbff CALL runtime.gcWriteBarrier(SB)
utils.go:61 0x409d657 4889df MOVQ BX, DI
utils.go:60 0x409d65a ebce JMP 0x409d62a
utils.go:59 0x409d65c 488d4f10 LEAQ 0x10(DI), CX
utils.go:57 0x409d660 4889fa MOVQ DI, DX
utils.go:59 0x409d663 4889cf MOVQ CX, DI
utils.go:59 0x409d666 e85580fbff CALL runtime.gcWriteBarrier(SB)
utils.go:60 0x409d66b 4889d7 MOVQ DX, DI
utils.go:59 0x409d66e eb8d JMP 0x409d5fd
utils.go:58 0x409d670 e84b80fbff CALL runtime.gcWriteBarrier(SB)
utils.go:58 0x409d675 e968ffffff JMP 0x409d5e2
utils.go:57 0x409d67a e8e160fbff CALL runtime.morestack_noctxt(SB)
utils.go:57 0x409d67f e91cffffff JMP github.com/hex0punk/go-re/core.CopyStudent(SB)
Even though more instructions are required, we are no longer changing the src
, and the write barriers can be created either by the GC or the function itself on the student
object. We fixed the bug. However, CopyStudent
has a lot more code. A better solution would be to use sync.Mutex
, but we will examine that futher in future blog posts.