July 22, 2020

typedmemmove, or Debugging a Goroutine to Death

One of the most common bugs that you may encounter in go are race conditions. This is often the case when working with goroutines and non-atomic data types. Plenty of articles have covered goroutine race conditions, but why don’t we take another look and, this time, debug a couple race conditions to death by looking at what happens under the hood?

Let’s take a look at a typical example. We will examine what happens when we dereference a method receiver and assign it an object that we later modify. First, we will set up a couple structs:

type Student struct{
  FirstName string
  LastName  string
  Age     int
  Address   Address
}

type Address struct{
  State string
  City string
}

Now we are going to create a method that returns an array of students.

func getCoolStudents() []core.Student {
  return []core.Student{
    {
      FirstName: "Abed",
      LastName:  "Nadir",
      Age:       27,
      Address: core.Address{
        City:  "New York City",
        State: "NY",
      },
    },
    {
      FirstName: "Troy",
      LastName:  "Barnes",
      Age:       28,
      Address: core.Address{
        City:  "Denver",
        State: "CO",
      },
    },
  }
}

Let’s write a method for the Student struct. This method takes a pointer to a student and copies it to itself (the receiver):

func (s *Student) SetStudent(student *Student){
  *s = *student
}

We are now ready to test our race condition:

func testRace() {
  // Create an empty student struct
  student := core.Student{}
  // get a list of rad students
  coolStudents := getCoolStudents()

  // set the student to the first student from the array of coolStudents
  go student.SetStudent(&coolStudents[0])

  //overwrite Abed with Troy
  coolStudents[0] = coolStudents[1]

  // simiulate some other operations
  time.Sleep(1000 * time.Millisecond)
  
  // print the student
  println("\nshould be Abed:", student.FirstName)
  fmt.Printf("%+v\n", student)
}

The process is straightforward, and it is explained in the comments. The method student.SetStudent should set the student to Abed. We then decide to overwrite the first student from our coolStudents array to Troy. We should expect that the student struct is set to Abed, but we get different results:

$ ./race

should be Abed: Troy
{FirstName:Troy LastName:Barnes Age:28 Address:{State:CO City:Denver}}

Instead of printing Abed as the student, we get Troy. We set Student to Abed, never to Troy, so what happened? In a nutshell, the method student.SetStudent is called asynchronously, and it does not get a chance to use coolStudents[0] before it gets updated in the next line. This may be a sufficient explanation, but let’s take a look at what is actually happening.

We will start by taking a closer look at student.SetStudent:

func (s *Student) SetStudent(student *Student){
  *s = *student
}

In the method, we dereference the student s (the method’s receiver), setting it to the value pointed by the student argument. Let’s take a look at what that looks like in go assembly (I did say we were going to debug this thing to death):

$ go tool objdump -s SetStudent race

TEXT <redacted> 
  utils.go:61   0x409eb90   65488b0c2530000000  MOVQ GS:0x30, CX
  utils.go:61   0x409eb99   483b6110            CMPQ 0x10(CX), SP
  utils.go:61   0x409eb9d   7671                JBE 0x409ec10
  utils.go:61   0x409eb9f   4883ec20            SUBQ $0x20, SP
  utils.go:61   0x409eba3   48896c2418          MOVQ BP, 0x18(SP)
  utils.go:61   0x409eba8   488d6c2418          LEAQ 0x18(SP), BP
  utils.go:62   0x409ebad   488b442428          MOVQ 0x28(SP), AX
  utils.go:62   0x409ebb2   8400                TESTB AL, 0(AX)
  utils.go:62   0x409ebb4   488b4c2430          MOVQ 0x30(SP), CX
  utils.go:62   0x409ebb9   8401                TESTB AL, 0(CX)
  utils.go:62   0x409ebbb   833d1ee20f0000      CMPL $0x0, runtime.writeBarrier(SB)
  utils.go:62   0x409ebc2   7530                JNE 0x409ebf4
  utils.go:62   0x409ebc4   488b11              MOVQ 0(CX), DX
  utils.go:62   0x409ebc7   488910              MOVQ DX, 0(AX)
  utils.go:62   0x409ebca   0f104108            MOVUPS 0x8(CX), X0
  utils.go:62   0x409ebce   0f114008            MOVUPS X0, 0x8(AX)
  utils.go:62   0x409ebd2   0f104118            MOVUPS 0x18(CX), X0
  utils.go:62   0x409ebd6   0f114018            MOVUPS X0, 0x18(AX)
  utils.go:62   0x409ebda   0f104128            MOVUPS 0x28(CX), X0
  utils.go:62   0x409ebde   0f114028            MOVUPS X0, 0x28(AX)
  utils.go:62   0x409ebe2   0f104138            MOVUPS 0x38(CX), X0
  utils.go:62   0x409ebe6   0f114038            MOVUPS X0, 0x38(AX)
  utils.go:63   0x409ebea   488b6c2418          MOVQ 0x18(SP), BP
  utils.go:63   0x409ebef   4883c420            ADDQ $0x20, SP
  utils.go:63   0x409ebf3   c3                  RET
  utils.go:62   0x409ebf4   488d1525800200      LEAQ runtime.rodata+158816(SB), DX
  utils.go:62   0x409ebfb   48891424            MOVQ DX, 0(SP)
  utils.go:62   0x409ebff   4889442408          MOVQ AX, 0x8(SP)
  utils.go:62   0x409ec04   48894c2410          MOVQ CX, 0x10(SP)
  utils.go:62   0x409ec09   e8a237f7ff          CALL runtime.typedmemmove(SB)
  utils.go:62   0x409ec0e   ebda                JMP 0x409ebea
  utils.go:61   0x409ec10   e81b46fbff          CALL runtime.morestack_noctxt(SB)
  utils.go:61   0x409ec15   e976ffffff          JMP github.com/hex0punk/go-re/core.(*Student).SetStudent(SB)

Now, that is a lot of instructions for what seems like a simple operation. However, the revelant part occurs here:

  utils.go:62   0x409ebbb   833d1ee20f0000      CMPL $0x0, runtime.writeBarrier(SB)
  utils.go:62   0x409ebc2   7530                JNE 0x409ebf4
   <more instructions>
  utils.go:62   0x409ebf4   488d1525800200      LEAQ runtime.rodata+158816(SB), DX
  utils.go:62   0x409ebfb   48891424            MOVQ DX, 0(SP)
  utils.go:62   0x409ebff   4889442408          MOVQ AX, 0x8(SP)
  utils.go:62   0x409ec04   48894c2410          MOVQ CX, 0x10(SP)
  utils.go:62   0x409ec09   e8a237f7ff          CALL runtime.typedmemmove(SB)

First, go checks whether the garbage collector is busy to perform the copy operation, in which case it asks it to do so by calling runtime.typedmemmove, otherwise, the method itself performs the copy operation. Now let’s take a look at runtime.typedmemmove in the go source, found here:

//go:nosplit
func typedmemmove(typ *_type, dst, src unsafe.Pointer) {
  if dst == src {
    return
  }
  if typ.ptrdata != 0 {
    bulkBarrierPreWrite(uintptr(dst), uintptr(src), typ.size)
  }
  // There's a race here: if some other goroutine can write to
  // src, it may change some pointer in src after we've
  // performed the write barrier but before we perform the
  // memory copy. This safe because the write performed by that
  // other goroutine must also be accompanied by a write
  // barrier, so at worst we've unnecessarily greyed the old
  // pointer that was in src.
  memmove(dst, src, typ.size)
  if writeBarrier.cgo {
    cgoCheckMemmove(typ, dst, src, 0, typ.size)
  }
}

According to the comments, a race condition may occur given that, while the GC is creating a write barrier on the source object, another operation may change it. By the time the write barrier is created, and before the copy operation occurs, the src, which is passed to typedmemmove as an unsafe pointer, may be overwritten. This is exactly what happened in our case. That is, coolStudents[0] = coolStudents[1] overwrites src before student.SetStudent finishes the copy operation *s = *student.

Now let’s see what happens if instead we create a copy of the source student that we want to assing to our student struct by using this method instead, which also avoids dereferencing student:

func CopyStudent(student *Student, src Student){
  student.FirstName = src.FirstName
  student.LastName = src.LastName
  student.Address = src.Address
  student.Age = src.Age
}
func testRace() {
  student := core.Student{}
  coolStudents := getCoolStudents()

  go CopyStudent(&student, coolStudents[0])

  coolStudents[0] = coolStudents[1]

  time.Sleep(1000 * time.Millisecond)
  println("\nshould be Abed:", student.FirstName)
  fmt.Printf("%+v\n", student)
}

When we call the above, we get the expected result:

$ ./race

should be Abed: Abed
{FirstName:Abed LastName:Nadir Age:27 Address:{State:NY City:New York City}}

Eventhough we are still overwriting coolStudents[0] with coolStudents[1] after calling CopyStudent as a goroutine, coolStudents[0] is copied when passed to CopyStudent, as it is passed by value. Furthermore we are not dereferencing student, so runtime.typedmemmove is never called, as shown in the go dissembly of the CopyStudent function:

$ go tool objdump -s CopyStudent race

github.com/hex0punk/go-re/core.CopyStudent(SB) <redacted>
  utils.go:57   0x409d5a0   65488b0c2530000000  MOVQ GS:0x30, CX
  utils.go:57   0x409d5a9   483b6110            CMPQ 0x10(CX), SP
  utils.go:57   0x409d5ad   0f86c7000000        JBE 0x409d67a
  utils.go:57   0x409d5b3   4883ec08            SUBQ $0x8, SP
  utils.go:57   0x409d5b7   48892c24            MOVQ BP, 0(SP)
  utils.go:57   0x409d5bb   488d2c24            LEAQ 0(SP), BP
  utils.go:58   0x409d5bf   488b4c2420          MOVQ 0x20(SP), CX
  utils.go:58   0x409d5c4   488b442418          MOVQ 0x18(SP), AX
  utils.go:58   0x409d5c9   488b7c2410          MOVQ 0x10(SP), DI
  utils.go:58   0x409d5ce   48894f08            MOVQ CX, 0x8(DI)
  utils.go:58   0x409d5d2   833d07c80f0000      CMPL $0x0, runtime.writeBarrier(SB)
  utils.go:58   0x409d5d9   0f8591000000        JNE 0x409d670
  utils.go:58   0x409d5df   488907              MOVQ AX, 0(DI)
  utils.go:59   0x409d5e2   488b4c2430          MOVQ 0x30(SP), CX
  utils.go:59   0x409d5e7   488b442428          MOVQ 0x28(SP), AX
  utils.go:59   0x409d5ec   48894f18            MOVQ CX, 0x18(DI)
  utils.go:59   0x409d5f0   833de9c70f0000      CMPL $0x0, runtime.writeBarrier(SB)
  utils.go:59   0x409d5f7   7563                JNE 0x409d65c
  utils.go:59   0x409d5f9   48894710            MOVQ AX, 0x10(DI)
  utils.go:60   0x409d5fd   488b442440          MOVQ 0x40(SP), AX
  utils.go:60   0x409d602   488b4c2458          MOVQ 0x58(SP), CX
  utils.go:60   0x409d607   488b542450          MOVQ 0x50(SP), DX
  utils.go:60   0x409d60c   488b5c2448          MOVQ 0x48(SP), BX
  utils.go:60   0x409d611   48895f30            MOVQ BX, 0x30(DI)
  utils.go:60   0x409d615   48894f40            MOVQ CX, 0x40(DI)
  utils.go:60   0x409d619   833dc0c70f0000      CMPL $0x0, runtime.writeBarrier(SB)
  utils.go:60   0x409d620   751a                JNE 0x409d63c
  utils.go:60   0x409d622   48894728            MOVQ AX, 0x28(DI)
  utils.go:60   0x409d626   48895738            MOVQ DX, 0x38(DI)
  utils.go:61   0x409d62a   488b442438          MOVQ 0x38(SP), AX
  utils.go:61   0x409d62f   48894720            MOVQ AX, 0x20(DI)
  utils.go:62   0x409d633   488b2c24            MOVQ 0(SP), BP
  utils.go:62   0x409d637   4883c408            ADDQ $0x8, SP
  utils.go:62   0x409d63b   c3                  RET
  utils.go:60   0x409d63c   488d4f28            LEAQ 0x28(DI), CX
  utils.go:57   0x409d640   4889fb              MOVQ DI, BX
  utils.go:60   0x409d643   4889cf              MOVQ CX, DI
  utils.go:60   0x409d646   e87580fbff          CALL runtime.gcWriteBarrier(SB)
  utils.go:60   0x409d64b   488d7b38            LEAQ 0x38(BX), DI
  utils.go:60   0x409d64f   4889d0              MOVQ DX, AX
  utils.go:60   0x409d652   e86980fbff          CALL runtime.gcWriteBarrier(SB)
  utils.go:61   0x409d657   4889df              MOVQ BX, DI
  utils.go:60   0x409d65a   ebce                JMP 0x409d62a
  utils.go:59   0x409d65c   488d4f10            LEAQ 0x10(DI), CX
  utils.go:57   0x409d660   4889fa              MOVQ DI, DX
  utils.go:59   0x409d663   4889cf              MOVQ CX, DI
  utils.go:59   0x409d666   e85580fbff          CALL runtime.gcWriteBarrier(SB)
  utils.go:60   0x409d66b   4889d7              MOVQ DX, DI
  utils.go:59   0x409d66e   eb8d                JMP 0x409d5fd
  utils.go:58   0x409d670   e84b80fbff          CALL runtime.gcWriteBarrier(SB)
  utils.go:58   0x409d675   e968ffffff          JMP 0x409d5e2
  utils.go:57   0x409d67a   e8e160fbff          CALL runtime.morestack_noctxt(SB)
  utils.go:57   0x409d67f   e91cffffff          JMP github.com/hex0punk/go-re/core.CopyStudent(SB)

Even though more instructions are required, we are no longer changing the src, and the write barriers can be created either by the GC or the function itself on the student object. We fixed the bug. However, CopyStudent has a lot more code. A better solution would be to use sync.Mutex, but we will examine that futher in future blog posts.

© hex0punk 2023