August 31, 2021

Into the rabbit hole of nil-not-nil bugs in Go

One of the most interesting Go gotchas is the nil-not-nil bug. It happens when a function declares an interface as its return type but returns a value of a concrete pointer type. The returned interface then always carries a type, so it never compares equal to nil, even when the pointer inside it is nil. That leads to unexpected behavior and, yes, panics with an anarchistic flair. Much of the documentation on this kind of bug leaves a lot to the imagination and makes generalizations that are not necessarily true. This post describes the issue in detail and in highly verbose mode.

Let’s examine this issue in depth until our eyes hurt and we begin to develop a hate for Go interfaces while appreciating their complexity. Let’s begin with a simple (but buggy) program:

package main

import (
    "errors"
    "fmt"
    "io/ioutil"
    "os"
)

func main() {
    fmt.Println("file path: ")
    var filePath string
    fmt.Scanln(&filePath)

    err := PrintFile(filePath)
    if err != nil {
        myErr := err.Error()
        fmt.Println(myErr)
    }
}

func PrintFile(filePath string) error {
    var pathError *os.PathError

    _, err := os.Stat(filePath)
    if err != nil {
        pathError = &os.PathError{
            Path: filePath,
            Err: errors.New("File not found"),
        }
        return pathError
    }
    content, _ := ioutil.ReadFile(filePath)
    fmt.Println(string(content))

    return pathError
}

The above is only a slightly more realistic version of the kind of issue documented here and here. All the program does is ask the user for a file path and print the contents of that file to STDOUT. If the path specified by the user does not exist, we return an error of type *os.PathError. In main we print the returned error to STDOUT whenever it is not nil. If you have already checked out either of the links above, then you already have an idea of what might happen when no error is returned (that is, when filePath exists and PrintFile returns pathError as nil):

$ go run clean.go
file path:
mitch-joke.txt
Every book is a children's book if the kid can read.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x10a0ae6]

goroutine 1 [running]:
os.(*PathError).Error(0x0, 0xe, 0x10fcdc0)
    /usr/local/go/src/os/error.go:56 +0x26
main.main()
    /Users/au/Projects/ToB/not-going-anywhere/cmd/test/clean.go:17 +0x142
exit status 2

The err != nil check in main passes even though no error occurred, so we call err.Error() on a nil *os.PathError and get a segmentation fault.

Go interfaces - beautiful and a bit complicated

The error above happens because the check if err != nil in main evaluates to true (that is, err is never nil) in all cases. To understand why err is never nil here, we need a better understanding of Go interfaces, because error is an interface, not a concrete type.

Most documentation on Go interfaces will tell you that they consist of a Type and a Value, but let’s take that one step further by looking at how the runtime represents an interface value in the Go source code (declared here):

type iface struct {
    tab  *itab
    data unsafe.Pointer
}

The first field, tab, is a pointer to an itab, which records both the interface type and the concrete type stored in the interface, if there is one. The data field holds a pointer to the concrete value. The itab struct is defined here:

// layout of Itab known to compilers
// allocated in non-garbage-collected memory
// Needs to be in sync with
// ../cmd/compile/internal/gc/reflect.go:/^func.dumptabs.
type itab struct {
    inter *interfacetype
    _type *_type
    hash  uint32 // copy of _type.hash. Used for type switches.
    _     [4]byte
    fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}

As we can see, the itab struct includes a pointer to an interfacetype and a pointer to a _type. The type of the interface is stored in inter, and the concrete type is stored in _type (technically, interfacetype is a wrapper around _type). In fact, we can examine both the interface type and the concrete type returned by the PrintFile function by compiling our program like so:

$ go tool compile -S nilnotnil.go | grep -A 7 '^go.itab.\*os.PathError'
go.itab.*os.PathError,error SRODATA dupok size=32
    0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    0x0010 d8 aa 89 b7 00 00 00 00 00 00 00 00 00 00 00 00  ................
    rel 0+8 t=1 type.error+0
    rel 8+8 t=1 type.*os.PathError+0
    rel 24+8 t=1 os.(*PathError).Error+0

The relocation for the first 8 bytes points to the interface type (error), whereas the relocation for the next 8 bytes points to the concrete type, *os.PathError.
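
To make the two words of an interface value tangible, here is a minimal sketch that peeks at them with unsafe. It assumes the gc toolchain's two-word layout for non-empty interfaces, which is an implementation detail and not something the language guarantees. The names ifaceWords, words, and typedNil are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"os"
	"unsafe"
)

// ifaceWords mirrors the two-word layout of runtime.iface.
// Assumption: this matches the gc toolchain's representation of a
// non-empty interface; it is not part of the language specification.
type ifaceWords struct {
	tab  unsafe.Pointer // *itab in the runtime
	data unsafe.Pointer // pointer to the concrete value
}

// words reinterprets an error interface value as its two raw words.
func words(err error) ifaceWords {
	return *(*ifaceWords)(unsafe.Pointer(&err))
}

// typedNil returns a nil *os.PathError wrapped in the error interface.
func typedNil() error {
	var p *os.PathError
	return p
}

func main() {
	var nilIface error // carries neither a type nor a value

	a, b := words(nilIface), words(typedNil())
	fmt.Println("nil interface: tab set?", a.tab != nil, "data set?", a.data != nil)
	fmt.Println("typed nil:     tab set?", b.tab != nil, "data set?", b.data != nil)
}
```

Under that assumption, only the typed nil should report a non-nil tab, and both should report a nil data word.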

Now let’s simplify our PrintFile function so that we can easily examine what exactly gets returned when we don’t assign any value to pathError:

//go:noinline
func PrintFile(filePath string) error {
    var pathError *os.PathError

    return pathError
}

Let’s build our program and examine the compiler output for the PrintFile function:

$ go build
$ go tool objdump -S -s main.PrintFile -gnu interfaces
TEXT main.PrintFile(SB) /Users/au/Projects/ToB/not-going-anywhere/cmd/interfaces/nilnotnil.go
    return pathError
  0x10b4d00     488d05f9840400      LEAQ go.itab.*os.PathError,error(SB), AX // lea 0x484f9(%rip),%rax
  0x10b4d07     4889442418          MOVQ AX, 0x18(SP)                        // mov %rax,0x18(%rsp)
  0x10b4d0c     48c744242000000000  MOVQ $0x0, 0x20(SP)                      // movq $0x0,0x20(%rsp)
  0x10b4d15     c3                  RET                                      // retq

We return a pointer to an itab, which in this case holds two types, *os.PathError and error. Additionally, a nil value is returned in the third instruction (at 0x10b4d0c). Here we can clearly see that we are not returning nil. In fact, we are returning an interface with a non-nil type (or itab) and a nil value. Let’s add some debugging print statements to main to confirm this is the case.

//go:noinline
func main() {
    fmt.Println("file path: ")
    var filePath string
    fmt.Scanln(&filePath)
    
    err := PrintFile(filePath)
    
    fmt.Println("DESC OF err FROM CALLER:")
    fmt.Println("err = ", err)
    fmt.Printf("Type: %T, Value: %v\n", err, err)
    fmt.Println("Is err nil? ", err == nil)
    
    if err != nil {
        myErr := err.Error()
        fmt.Println(myErr)
    }
}

When running our updated code, we get the following:

$ go run nilnotnil.go
file path:
mitch-joke.txt
DESC OF err FROM CALLER:
err =  <nil>
Type: *os.PathError, Value: <nil>
Is err nil?  false

Even though the fmt.Println("err = ", err) statement prints nil, the following print statement outputs a type of *os.PathError and a nil value, which is why err is not nil, as the print statement after it shows. This is also why this type of check wouldn’t work either:

//go:noinline
func main() {
    fmt.Println("file path: ")
    var filePath string
    fmt.Scanln(&filePath)

    var emptyErr error

    err := PrintFile(filePath)

    if err != emptyErr {
        myErr := err.Error()
        fmt.Println(myErr)
    }
}

Although main receives an interface from PrintFile, the comparison takes the concrete (dynamic) type into account: err carries the type *os.PathError, while emptyErr carries no type at all, so the two are never equal.
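
The rule at work is that two interface values are equal only when they carry the same concrete type and equal values. A minimal sketch of that rule follows; typedNil and sameInterface are hypothetical helpers introduced for this illustration, not part of the original program.

```go
package main

import (
	"fmt"
	"os"
)

// typedNil returns a nil *os.PathError wrapped in the error interface.
func typedNil() error {
	var p *os.PathError
	return p
}

// sameInterface reports whether two error values compare equal with ==.
func sameInterface(a, b error) bool {
	return a == b
}

func main() {
	var emptyErr error // nil interface: no type, no value
	err := typedNil()

	// Different dynamic types (*os.PathError versus none).
	fmt.Println("err == emptyErr:  ", sameInterface(err, emptyErr))
	// Same dynamic type and both pointers are nil.
	fmt.Println("err == typedNil():", sameInterface(err, typedNil()))
	// Two nil interfaces.
	fmt.Println("emptyErr == nil:  ", sameInterface(emptyErr, nil))
}
```

Only the first comparison should come out false: a typed nil equals another typed nil of the same type, but never a nil interface.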

Checking Type against nil

Now, some documentation on this matter will tell you that when we do err != nil we are comparing <*os.PathError, nil> against <nil, nil>, and that is why err is never nil. However, the generated assembly tells us that only the type (itab) is being compared against nil, not the value (data).

Let’s update our code one more time so we include our comparison against nil and against our new, empty emptyErr interface:

//go:noinline
func main() {
    fmt.Println("file path: ")
    var filePath string
    fmt.Scanln(&filePath)

    var emptyErr error

    err := PrintFile(filePath)

    if err != nil {
        return
    }

    if err != emptyErr {
        fmt.Println("not equal")
    }
}

Now let’s compile the above and examine the generated assembly:

$ go build
$ go tool objdump -S -s main.main interfaces
TEXT main.main(SB) /Users/au/Projects/ToB/not-going-anywhere/cmd/interfaces/nilnotnil.go
  0x10b2eda     e8c1000000      CALL main.PrintFile(SB)
  0x10b2edf     488b442410      MOVQ 0x10(SP), AX
  0x10b2ee4     488b4c2418      MOVQ 0x18(SP), CX
    if err != nil {
  0x10b2ee9     4885c0          TESTQ AX, AX
  0x10b2eec     0f8587000000    JNE 0x10b2f79
    if err != emptyErr {
  0x10b2ef2     7462            JE 0x10b2f56

The main function collects the return values from the CALL to main.PrintFile and stores them in AX and CX. The AX register contains the itab pointer (which holds the type), and CX holds the value. The TEST instruction only checks the tab. How can we be sure that AX holds the type or itab? If we jump to the interface comparison, we find the following:

if err != emptyErr {
  0x10b2f56     48890424            MOVQ AX, 0(SP)
  0x10b2f5a     48894c2408          MOVQ CX, 0x8(SP)
  0x10b2f5f     48c744241000000000  MOVQ $0x0, 0x10(SP)
  0x10b2f68     e8d307f5ff          CALL runtime.ifaceeq(SB)

When comparing two interfaces, Go calls the runtime.ifaceeq function defined here. The first argument is a pointer to an itab, followed by two unsafe.Pointer arguments. Because AX has not been updated until now, we know AX holds the itab. Likewise, we know that CX holds a pointer to the value of err. The last value, $0x0, represents nil for the value of emptyErr.

OK, so given what we have learned so far, it would follow that the value of pathError in PrintFile can never be nil, as Go checks against the type in that function as well, right? Not quite (and you thought we were done). Let’s go ahead and add the same print statements to the simplified PrintFile function:

//go:noinline
func PrintFile(filePath string) error {
    var pathError *os.PathError
    
    fmt.Println("DESC OF pathError FROM CALLEE:")
    fmt.Println("pathErr = ", pathError)
    fmt.Printf("Type: %T, Value: %v\n", pathError, pathError)
    fmt.Println("Is pathErr nil? ", pathError == nil)
    fmt.Println("----------------------------------\n")
    
    return pathError
} 

The output looks like this. This includes the output from the print statements we added to main earlier too. Just remember that main is the caller, and PrintFile is the callee:

$ go run nilnotnil.go
file path:
somepath
DESC OF pathError FROM CALLEE:
pathErr =  <nil>
Type: *os.PathError, Value: <nil>
Is pathErr nil?  true
----------------------------------

DESC OF err FROM CALLER:
err =  <nil>
Type: *os.PathError, Value: <nil>
Is err nil?  false

The callee, PrintFile, sees a Type of *os.PathError and a nil Value, which is the same thing the caller sees when examining the returned error. At this point, both values look exactly the same when printed. Yet, while fmt.Println("Is err nil? ", err == nil) printed false in main, fmt.Println("Is pathErr nil? ", pathError == nil) printed true in PrintFile. This indicates that, in the callee, Go does not perform the comparison to nil using the same logic we saw in main, where the comparison was done only against the interface’s concrete type, not its value.

Sometimes it is all about the value

To determine what happens in the callee, let’s update our PrintFile function once more:

//go:noinline
func PrintFile(filePath string) error {
    var pathError *os.PathError

    _, err := os.Stat(filePath)
    if err != nil {
        pathError = &os.PathError{
            Path: filePath,
            Err: errors.New("File not found"),
        }
    }
    if pathError == nil {
        return nil
    }
    
    return pathError
}

Now let’s recompile the code and examine the resulting assembly. I found that in this case it was easier to see what happens by building the code without optimizations, but the resulting core logic will be the same:

$ go build -gcflags '-N -l'
$ go tool objdump -S -s main.PrintFile interfaces
        pathError = &os.PathError{
  0x10b5288     488b442438          MOVQ 0x38(SP), AX
  0x10b528d     4889442430          MOVQ AX, 0x30(SP)
  0x10b5292     eb00                JMP 0x10b5294
    if pathError == nil {
  0x10b5294     48837c243000        CMPQ $0x0, 0x30(SP)
  0x10b529a     7502                JNE 0x10b529e
  0x10b529c     eb2b                JMP 0x10b52c9
    return pathError
  0x10b529e     488b442430          MOVQ 0x30(SP), AX
  0x10b52a3     4889442440          MOVQ AX, 0x40(SP)
  0x10b52a8     488d0d51850400      LEAQ go.itab.*os.PathError,error(SB), CX
  0x10b52af     48898c2488000000    MOVQ CX, 0x88(SP)
  0x10b52b7     4889842490000000    MOVQ AX, 0x90(SP)
  0x10b52bf     488b6c2468          MOVQ 0x68(SP), BP
  0x10b52c4     4883c470            ADDQ $0x70, SP
  0x10b52c8     c3                  RET

The comparison of pathError to nil happens after we create a new os.PathError struct. In this case, whatever is stored at 0x30(SP), which was just copied from AX, is used to determine whether pathError == nil. The question is, does that slot hold the Type or the Value? The answer is in the return statement shown above, where the same slot is loaded back into AX.

First, we load the address of the itab (which contains the Type) in CX. Then, CX is moved to the stack of the caller.

0x10b52a8       488d0d51850400      LEAQ go.itab.*os.PathError,error(SB), CX
0x10b52af       48898c2488000000    MOVQ CX, 0x88(SP)

The only thing left to do is to move the Value of pathError to the stack as well (so the caller can access it), which in this case is stored in AX. After that, we restore the stack and return to the caller:

  0x10b52b7     4889842490000000    MOVQ AX, 0x90(SP)
  0x10b52bf     488b6c2468          MOVQ 0x68(SP), BP
  0x10b52c4     4883c470            ADDQ $0x70, SP
  0x10b52c8     c3                  RET

This confirms that AX holds the value, which tells us that the pathError == nil check is performed against the value of pathError, not its type. While this may seem strange (or fascinating, depending on how tired you are by now), it makes sense. The compiler is aware that pathError was declared locally with the type *os.PathError, so inside PrintFile it is a plain pointer and there is no type to compare. Instead, the compiler checks the pointer value itself. This is in contrast to what we saw in main, the caller, where Go cannot make assumptions regarding the type of the result of PrintFile (as the signature for the PrintFile function shows that it returns a generic error interface), so it performed the comparison against the Type instead.
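
The same contrast can be reproduced without reading assembly. In the sketch below, one nil pointer is checked twice: once through a parameter whose static type is the pointer type, and once through a parameter whose static type is the error interface. The helper names pointerIsNil, interfaceIsNil, and check are hypothetical and introduced only for this illustration.

```go
package main

import (
	"fmt"
	"os"
)

// pointerIsNil compares the pointer itself, as the callee does.
func pointerIsNil(p *os.PathError) bool {
	return p == nil
}

// interfaceIsNil compares an interface value, as the caller does.
func interfaceIsNil(err error) bool {
	return err == nil
}

// check runs both comparisons on the same nil pointer.
func check() (asPointer, asInterface bool) {
	var pathError *os.PathError
	// The second call converts pathError to error implicitly,
	// which attaches the *os.PathError type to the nil pointer.
	return pointerIsNil(pathError), interfaceIsNil(pathError)
}

func main() {
	asPointer, asInterface := check()
	fmt.Println("nil as *os.PathError?", asPointer)
	fmt.Println("nil as error?        ", asInterface)
}
```

The pointer check should report true and the interface check false, matching the callee and caller output shown earlier.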

Engineering fixes

All this chaos, destruction, and panics are fun. But we are engineers, so a solution to this is also in order.

So how do we fix this? One option is to return an explicit nil when you know you have to return nil:

//go:noinline
func PrintFile(filePath string) error {
    var pathError *os.PathError

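    // ... return pathError on the failure paths above ...

    // Explicit nil: the caller receives an interface with no type and no value.
    return nil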
}

Another way, as suggested here, is to declare and return the base error interface rather than a concrete type.

//go:noinline
func PrintFile(filePath string) error {
    var pathError error // declared as the interface, so its zero value is a true nil
    _, err := os.Stat(filePath)
    if err != nil {
        pathError = &os.PathError{
            Path: filePath,
            Err: errors.New("File not found"),
        }
    }
    return pathError
}

In both cases, PrintFile will return a nil Type (when the value returned is nil as well), and the returned error will evaluate to nil when it is actually nil.
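
As a quick check of both fixes, here is a minimal sketch that strips PrintFile down to the os.Stat call. The function names statExplicitNil and statInterfaceVar, the Op value, and the missing path are assumptions made for this illustration; the sketch also assumes the current directory "." exists and the missing path does not.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// statExplicitNil applies the first fix: return a literal nil on success.
func statExplicitNil(path string) error {
	if _, err := os.Stat(path); err != nil {
		return &os.PathError{Op: "stat", Path: path, Err: errors.New("File not found")}
	}
	return nil
}

// statInterfaceVar applies the second fix: hold the result in a variable
// of the interface type, so its zero value carries no concrete type.
func statInterfaceVar(path string) error {
	var result error
	if _, err := os.Stat(path); err != nil {
		result = &os.PathError{Op: "stat", Path: path, Err: errors.New("File not found")}
	}
	return result
}

func main() {
	paths := []string{".", "no-such-dir-for-this-sketch/missing.txt"}
	for _, path := range paths {
		fmt.Printf("%q: explicit nil? %v, interface var nil? %v\n",
			path, statExplicitNil(path) == nil, statInterfaceVar(path) == nil)
	}
}
```

With either version, a caller's err != nil check behaves as expected for both the existing and the missing path.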

Side note on reflection

I looked into the reflection package and thought that it was possible to extract the Type from a Value by doing this:

//go:noinline
func main() {
    var filePath string
    fmt.Println("file path: ")
    fmt.Scanln(&filePath)

    err := PrintFile(filePath)
    
    fmt.Println("\n\nREFLECTION OF err FROM CALLER:")
    v := reflect.ValueOf(err)
    
    fmt.Println("Value of err: ", v)
    fmt.Println("extracting type from value of err...:", v.Type())
}

The above works; however, when looking at the code for reflect.ValueOf here, it is clear that go snags Type information when calling that function. ValueOf calls unpackEface(i), which extracts the type from the interface struct.
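
Reflection can also be used defensively when you cannot change the function that returns the typed nil. The following is a sketch of one possible caller-side guard, not a recommendation over the fixes above; isNilError, typedNil, and realError are hypothetical helpers. Note the early return for a true nil interface: reflect.ValueOf returns the zero Value in that case, and calling Type on it would panic.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"reflect"
)

// isNilError reports whether err is a nil interface or an interface
// that wraps a nil pointer. Only pointer kinds are checked here.
func isNilError(err error) bool {
	if err == nil {
		return true // no type and no value at all
	}
	v := reflect.ValueOf(err)
	return v.Kind() == reflect.Ptr && v.IsNil()
}

// typedNil returns a nil *os.PathError wrapped in the error interface.
func typedNil() error {
	var p *os.PathError
	return p
}

// realError returns an ordinary non-nil error.
func realError() error {
	return errors.New("boom")
}

func main() {
	fmt.Println("nil interface:", isNilError(nil))
	fmt.Println("typed nil:    ", isNilError(typedNil()))
	fmt.Println("real error:   ", isNilError(realError()))
}
```

The helper should treat both the nil interface and the typed nil as nil, and the real error as non-nil.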

© hex0punk 2023