Pacuna's Blog

Buffered vs. Unbuffered writes

Let's talk about buffering. What does it mean to buffer data? And what do buffered and unbuffered writes look like in practice?

Let's use the cat command as an example. The following is an excerpt from its man page:

The cat utility reads files sequentially, writing them to the standard output. The file operands are processed in command-line order. If file is a single dash (`-') or absent, cat reads from the standard input.

So cat always writes to stdout, but it can read from different sources. If you don't pass any arguments, it will read from stdin.

One thing to notice is that when we type a line on stdin and send it by pressing Enter, it gets written back to stdout immediately, regardless of the number of characters we send:

$ cat
hello # enter
hello
world # enter
world

This is an example of unbuffered writes. There's no buffer between the source and the destination (at least from our perspective), so each write happens as soon as the data is sent.

We can replicate this behavior with this simple Go program:

package main

import (
    "io"
    "log"
    "os"
)

func main() {
    dst := os.Stdout
    src := os.Stdin

    // io.Copy copies until EOF is reached or an error occurs
    if _, err := io.Copy(dst, src); err != nil {
        log.Fatal(err)
    }
}

The program reads from stdin and writes to stdout without explicitly using any buffer. As a consequence, every line we send gets written right after it is read. You can use Ctrl-d to send EOF, which makes io.Copy return and ends the program.

$ go run cat.go
hello # enter
hello
world # enter
world

Now, what if instead of writing directly to stdout we write to a new object that has a buffer, plus an underlying writer that receives the data after it passes through the buffer? For that, we can use the NewWriter function from the bufio package:

package main

import (
    "bufio"
    "io"
    "log"
    "os"
)

func main() {
    // We write to the buffer, the buffer writes to stdout
    dst := bufio.NewWriter(os.Stdout)
    src := os.Stdin

    // bufio.Writer implements io.Writer, so we can still use io.Copy
    if _, err := io.Copy(dst, src); err != nil {
        log.Fatal(err)
    }
}

If you run this program and write some data, you'll notice that nothing gets written back to stdout, even after pressing Ctrl-d to send EOF. Why is that?

$ go run bufcat.go
hello # enter
world # enter
$ 

The documentation for NewWriter mentions the following:

NewWriter returns a new Writer whose buffer has the default size.

And if you take a look at the bufio package source code you will find the default size defined as:

const (
    defaultBufSize = 4096
)

OK, so now we know the default buffer size is 4096 bytes. But how does it work?

When we create the variable dst := bufio.NewWriter(os.Stdout), we are creating an object (a struct) that holds a default-sized buffer and an underlying writer to which data is written when the buffer is flushed. dst writes the buffered data to the underlying writer once the buffer fills up, or when you call its Flush() method yourself. In this example we send fewer than 4096 bytes, so the buffer never fills, and we never call Flush(), so the data never reaches stdout.
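To make these mechanics concrete, here is a minimal sketch that uses an in-memory bytes.Buffer as a stand-in for stdout and a deliberately tiny 8-byte buffer, so the flush is easy to observe. The sink variable, the 8-byte size, and the fillAndFlush helper are assumptions made for illustration; they are not part of the programs in this post.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// fillAndFlush writes "hello" and then "world" through an 8-byte bufio.Writer
// and records what has reached the underlying sink after each step.
func fillAndFlush() (afterFirst, afterSecond, afterFlush string) {
	var sink bytes.Buffer // stand-in for stdout (assumption for illustration)
	w := bufio.NewWriterSize(&sink, 8)

	w.WriteString("hello") // 5 bytes fit in the buffer, nothing reaches sink
	afterFirst = sink.String()

	w.WriteString("world") // the buffer fills, is flushed, and the rest is buffered
	afterSecond = sink.String()

	w.Flush() // push whatever is still sitting in the buffer
	afterFlush = sink.String()
	return afterFirst, afterSecond, afterFlush
}

func main() {
	a, b, c := fillAndFlush()
	fmt.Printf("after first write:  %q\n", a)
	fmt.Printf("after second write: %q\n", b)
	fmt.Printf("after Flush:        %q\n", c)
}
```

Running it should print an empty string after the first write, "hellowor" after the second (the 8 bytes that filled the buffer), and "helloworld" after Flush.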

To test this hypothesis, let's create a file with 4095 bytes and send it to our program. If we are right, we expect no output. Then we will repeat the experiment with a file with 4096 bytes and see what happens.

$ perl -E 'say "1" x 4094' > junk
$ wc junk
1       1    4095 junk

We created a file with 4094 1s plus a trailing newline, for a total of 4095 bytes. The program doesn't take any arguments, but we can always redirect the file to stdin:

$ go build bufcat.go
$ ./bufcat <junk
$

As expected, we get no output. Now let's try with 4096 bytes:

$ perl -E 'say "1" x 4095' > junk
$ wc junk
1       1    4096 junk

$ ./bufcat <junk
11111111111111111111111111111111....

We get the output. This time the data filled the buffer completely, which triggered a Flush() call, and the data was written to the underlying object, which is stdout.

You can also create a Writer with a custom buffer size by using:

dst := bufio.NewWriterSize(os.Stdout, 10)

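Now, if you run the program and send data through stdin, it will flush once 10 bytes have been buffered. In the session below, the first three lines are typed: "123", a newline, "456", a newline, and "78" add up to 10 bytes, which fills the buffer and triggers the flush that prints the last three lines.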

$ go run bufcat.go
123
456
78
123
456
78
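The same session can be replayed without a terminal. The sketch below is a rough reconstruction under stated assumptions: chunkRecorder and replaySession are hypothetical names, and the recorder stands in for stdout so we can see exactly which chunk the bufio.Writer hands over. It calls WriteString directly instead of going through io.Copy, because io.Copy may delegate to the ReadFrom method of the writers involved, and the details of that path can differ between Go versions. If the terminal experiment behaves differently on your Go version, this sketch should still show the flush-on-full behavior.

```go
package main

import (
	"bufio"
	"fmt"
)

// chunkRecorder is a hypothetical io.Writer that remembers every chunk it receives.
type chunkRecorder struct {
	chunks []string
}

func (c *chunkRecorder) Write(p []byte) (int, error) {
	c.chunks = append(c.chunks, string(p))
	return len(p), nil
}

// replaySession feeds the three typed lines through a 10-byte bufio.Writer
// and returns the chunks that reached the underlying writer. It does not
// call Flush, so anything still buffered at the end is not included.
func replaySession() []string {
	rec := &chunkRecorder{}
	w := bufio.NewWriterSize(rec, 10)
	for _, line := range []string{"123\n", "456\n", "78\n"} {
		w.WriteString(line)
	}
	return rec.chunks
}

func main() {
	for i, chunk := range replaySession() {
		fmt.Printf("chunk %d: %q (%d bytes)\n", i+1, chunk, len(chunk))
	}
}
```

The recorder should receive a single 10-byte chunk, "123\n456\n78", while the final newline stays in the buffer because nothing flushes it.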

In the case of the default buffer size, we can always explicitly call Flush() after we receive EOF to write the buffered data to stdout:

func main() {
    dst := bufio.NewWriter(os.Stdout)
    src := os.Stdin

    if _, err := io.Copy(dst, src); err != nil {
        log.Fatal(err)
    }
    // Flush writes whatever is still in the buffer to stdout
    if err := dst.Flush(); err != nil {
        log.Fatal(err)
    }
}

Now, if you send data and then press Ctrl-d, the data will be written to stdout even though the buffer is not full.

Buffered writes play a critical role in systems that need efficient I/O, where reads and writes are done in blocks or chunks. Every write to the destination has a cost of its own, so most storage systems perform better when data is written or read in chunks up to a certain size rather than a few bytes at a time. It's a good technique to know about, and the bufio package gives us great tools to implement it.
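To get a feel for the difference, the sketch below counts how many times the destination is actually written to, with and without a buffer in front of it. countingWriter and countWrites are hypothetical names introduced for this illustration; the writer simply discards the data and stands in for a destination where each write is expensive.

```go
package main

import (
	"bufio"
	"fmt"
)

// countingWriter is a hypothetical io.Writer that discards data and counts
// how many times Write is called.
type countingWriter struct {
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return len(p), nil
}

// countWrites sends the same 5-byte line `lines` times, first directly and
// then through a default-sized bufio.Writer, and returns the number of
// Write calls that reached the destination in each case.
func countWrites(lines int) (direct, buffered int) {
	line := []byte("line\n")

	d := &countingWriter{}
	for i := 0; i < lines; i++ {
		d.Write(line)
	}

	b := &countingWriter{}
	w := bufio.NewWriter(b)
	for i := 0; i < lines; i++ {
		w.Write(line)
	}
	w.Flush() // do not forget the tail that never filled the buffer

	return d.calls, b.calls
}

func main() {
	direct, buffered := countWrites(1000)
	fmt.Println("direct writes:  ", direct)
	fmt.Println("buffered writes:", buffered)
}
```

With 1000 five-byte lines (5000 bytes in total), this should report 1000 direct writes against 2 buffered ones: one when the 4096-byte buffer fills up and one for the final Flush.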

Thanks for reading!

#programming #go #golang #files
