Performance-friendly I/O interfaces

read()/write() vs. mmap()/write() vs. sendfile()

For applications with high input/output performance requirements (including network I/O), it is worthwhile to look at operating system support for efficient I/O routines.

As an example, here is a simple C function that reads the contents of an open file descriptor in and writes them to an open socket out - this code could be part of a file server. A straightforward way of coding this uses the read()/write() system calls to copy the bytes through a memory buffer:

#define BUFSIZE 4096
long send_file (int in, int out) {
  unsigned char buffer[BUFSIZE];
  ssize_t result; long written = 0;
  while ((result = read (in, buffer, BUFSIZE)) > 0) {
    if (write (out, buffer, result) != result)
      return -1;
    written += result;
  }
  return (result == 0 ? written : result);
}

Unfortunately, this common programming paradigm results in high memory traffic and inefficient use of a system's caches. Also, if a small buffer is used, the number of system operations and, in particular, of user/kernel context switches will be quite high.

On systems that support memory mapping of files using mmap(), the following is more efficient if the source is an actual file:

long send_file (int in, int out) {
  unsigned char *b;
  struct stat st;
  long result;
  if (fstat (in, &st) == -1) return -1;
  b = mmap (NULL, st.st_size, PROT_READ, MAP_PRIVATE, in, 0);
  if (b == MAP_FAILED)
    return -1;
  madvise (b, st.st_size, MADV_SEQUENTIAL);
  result = write (out, b, st.st_size);
  munmap (b, st.st_size);
  return result;
}

An even more efficient - and also more concise - variant is the sendfile() call, which directly copies the bits from the file to the network.

long send_file (int in, int out) {
  struct stat st;
  if (fstat (in, &st) == -1) return -1;
  off_t offset = 0;
  return sendfile (out, in, &offset, st.st_size);
}

Note that an operating system could optimize this internally up to the point where data blocks are copied directly from the disk controller to the network controller without any involvement of the CPU.

For more complex situations, the sendfilev() interface can be used to send data from multiple files and memory buffers to construct complex protocol units with a single call.
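sendfilev() is Solaris-specific; its vector entries can refer either to file ranges or (via SFV_FD_SELF) to in-memory buffers. For the in-memory part alone, the portable POSIX analogue is writev(), which gathers several buffers into a single call. A minimal sketch in that spirit - the function name and the header/body split are illustrative, not part of any standard API:

```c
#include <sys/types.h>
#include <sys/uio.h>
#include <string.h>

/* Gather a protocol header and a payload into one system call,
   in the spirit of sendfilev()'s multi-buffer vector.  writev()
   handles only memory buffers, not file ranges. */
ssize_t send_header_and_body (int out, const char *hdr, const char *body) {
  struct iovec iov[2];
  iov[0].iov_base = (void *) hdr;  iov[0].iov_len = strlen (hdr);
  iov[1].iov_base = (void *) body; iov[1].iov_len = strlen (body);
  /* May return a short count; real-world code should loop. */
  return writev (out, iov, 2);
}
```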

-- SimonLeinen - 26 Jun 2005

One thing to note: the above uses of write() and sendfile() are simplified. These system calls can stop partway through and return the number of bytes actually transferred, so real-world code should wrap them in a loop that sends the remainder and retries when a signal interrupts the call (EINTR).

The first loop should be written as:

#define BUFSIZE 4096
long send_file (int in, int out) {
  unsigned char buffer[BUFSIZE];
  ssize_t result; long written = 0;
  while ((result = read (in, buffer, BUFSIZE)) > 0) {
    ssize_t tosend = result;
    ssize_t offset = 0;
    while (tosend > 0) {
      result = write (out, buffer + offset, tosend);
      if (result == -1) {
        if (errno == EINTR || errno == EAGAIN)
          continue;
        else
          return -1;
      }
      written += result;
      offset += result;
      tosend -= result;
    }
  }
  return (result == 0 ? written : result);
}
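The same caveat applies to the sendfile() variant: it may transfer fewer bytes than requested. A minimal sketch of a retry loop, assuming Linux sendfile(2) semantics (the call advances the offset argument itself; on recent kernels the destination may also be a regular file):

```c
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <errno.h>

/* Send the whole file on 'in' to 'out', retrying after short
   transfers and EINTR.  Returns bytes sent, or -1 on error.
   (For a non-blocking socket, EAGAIN would busy-wait here;
   real code would poll for writability instead.) */
long send_file_all (int in, int out) {
  struct stat st;
  if (fstat (in, &st) == -1) return -1;
  off_t offset = 0;
  while (offset < st.st_size) {
    ssize_t sent = sendfile (out, in, &offset, st.st_size - offset);
    if (sent == -1) {
      if (errno == EINTR || errno == EAGAIN)
        continue;
      return -1;
    }
  }
  return (long) offset;
}
```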

References

  • Project Volo Design Document, OpenSolaris, 2008. This document contains, in section 4.5 ("zero-copy interface") a description of how sendfilev is implemented in current Solaris, as well as suggestions on generalizing the internal kernel interfaces that support it.

-- BaruchEven - 05 Jan 2006
-- SimonLeinen - 12 Sep 2008
