Pacuna's Blog

Implementing a concurrent Set in Go – Part II

In the first part of this series, we reviewed the basic structure for a list-based Set. We defined three methods to interact with the Set: Add(), Remove() and Contains().

Even though that implementation works fine for a single goroutine, it doesn’t work correctly when multiple goroutines modify the list concurrently. Since the algorithm that traverses the linked list uses two dependent pointers, goroutines can interleave and create nasty data races.

One of the most common techniques for managing concurrent data structures is locking. And when you use locks, you have a couple of strategies to choose from.

In this post, we will review the most general locking approach: coarse-grained synchronization.

Coarse-grained synchronization

The most obvious approach to solving the concurrency problem is to add a lock to the data structure and hold it for the entire critical section of every operation.

Let’s add a new field to the list type:

type list struct {
    mu   sync.Mutex // guards head, size, and every node reachable from head
    head *node
    size int
}
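
The rest of the supporting code stays the same as in Part I. For reference, here is a minimal sketch of what the node type, the New() constructor, and a Size() method might look like; the sentinel keys and helper names are assumptions rather than the original Part I code, but the important detail is that the head and tail sentinels bound every real key, so the traversals below never run off the end of the list:

package coarselist

import (
    "math"
    "sync"

    "github.com/mitchellh/hashstructure"
)

type node struct {
    key  uint64
    item []byte
    next *node
}

// newNode hashes the item to get its key, just like the methods below do.
func newNode(item []byte) *node {
    key, _ := hashstructure.Hash(item, nil)
    return &node{key: key, item: item}
}

// New builds a list whose head and tail sentinels hold the minimum and
// maximum possible keys, so every real key falls strictly between them.
func New() *list {
    tail := &node{key: math.MaxUint64}
    head := &node{key: 0, next: tail}
    return &list{head: head}
}

// Size returns the number of items, guarded by the same mutex.
func (l *list) Size() int {
    l.mu.Lock()
    defer l.mu.Unlock()
    return l.size
}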

And then lock the mutex before the critical sections of all methods:

func (l *list) Add(item []byte) bool {
    var pred, curr *node
    key, _ := hashstructure.Hash(item, nil)

    l.mu.Lock()
    defer l.mu.Unlock()
    pred = l.head
    curr = pred.next

    // Walk the list until curr is the first node whose key is >= the new key.
    for curr.key < key {
    	pred = curr
    	curr = curr.next
    }

    // The item is already in the set.
    if key == curr.key {
    	return false
    }

    // Splice the new node in between pred and curr.
    n := newNode(item)
    n.next = curr
    pred.next = n
    l.size++
    return true
}

func (l *list) Remove(item []byte) bool {
    var pred, curr *node
    key, _ := hashstructure.Hash(item, nil)

    l.mu.Lock()
    defer l.mu.Unlock()
    pred = l.head
    curr = pred.next

    for curr.key < key {
    	pred = curr
    	curr = curr.next
    }

    if key == curr.key {
    	pred.next = curr.next
    	l.size-- // keep the counter in sync with Add
    	return true
    }
    return false
}

func (l *list) Contains(item []byte) bool {
    var pred, curr *node
    key, _ := hashstructure.Hash(item, nil)

    l.mu.Lock()
    defer l.mu.Unlock()
    pred = l.head
    curr = pred.next

    for curr.key < key {
    	pred = curr
    	curr = curr.next
    }
    return key == curr.key
}

As you can see, the mutex is locked before initializing the current and predecessor nodes.

Then we use Go’s handy defer statement to make sure we always unlock the mutex before returning.

If one thread wants to add an item, it will lock the mutex and continue with its operation. If another thread comes along and wants to add or remove an item, it will hit the critical section, see that the mutex is locked, and wait until it gets unlocked again. Once the first thread finishes and releases the lock, the waiting thread can enter the critical section safely.

The main advantage of this implementation is that it’s easy to understand and we know it’s correct: two goroutines will never step on each other’s toes. Every time a goroutine wants to execute a method, it locks the entire critical section, and all the other goroutines have to wait.

It’s also easy to see that this strategy is starvation-free: no thread will be stuck waiting for access forever, because every operation must finish and release the lock at some point.

If we run the same code that previously generated data races, but now against our new version, we get the correct output on every run:

package main

import (
    "fmt"
    "sortedlist/coarselist"
    "strconv"
    "sync"
)

func main() {
    l := coarselist.New()
    var wg sync.WaitGroup
    for i := 0; i < 200; i++ {
    	wg.Add(1)
    	go func(it int) {
    		item := []byte(strconv.Itoa(it))
    		l.Add(item)
    		wg.Done()
    	}(i)
    }
    wg.Wait()

    fmt.Println(l.Size()) // 1st run: 200, 2nd run: 200, 3rd run: 200...
}

If the contention is low, meaning that not many threads are competing for the locks, this approach would be a reliable option. On the other hand, if you have many threads adding, removing, and checking for items, then there’s going to be a lot of wasted time.
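
To get a feel for that cost, here is a rough benchmark sketch (the package layout follows the example above; the benchmark itself is an illustration, not code from the original series) that runs all three operations from many goroutines against a single list:

package coarselist_test

import (
    "strconv"
    "testing"

    "sortedlist/coarselist"
)

// BenchmarkCoarseList hammers one coarse-grained list from many goroutines;
// with a single global mutex, the work is effectively serialized.
func BenchmarkCoarseList(b *testing.B) {
    l := coarselist.New()
    b.RunParallel(func(pb *testing.PB) {
        i := 0
        for pb.Next() {
            item := []byte(strconv.Itoa(i))
            l.Add(item)
            l.Contains(item)
            l.Remove(item)
            i++
        }
    })
}

Running it with different levels of parallelism, for example go test -bench=. -cpu=1,4,8, should show throughput flattening out as cores are added, since every operation serializes on the single mutex.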

Note: Here, we use thread and goroutine interchangeably even though technically they are not the same.

Can we do better? The answer is yes. But everything comes with a price.

In the next part, we will review a more granular approach that allows more than one thread to traverse the list concurrently.

#go #concurrency
