Remote Direct Memory Access
RDMA is a feature of the Network Interface Card (NIC) that allows one computer to access the memory of another directly, bypassing the operating system kernel. The technology permits high throughput and low latency by reducing CPU involvement and demands on memory bandwidth.
Traditionally, an incoming packet is first transferred from the NIC into a kernel buffer in DRAM, at a speed determined by the bandwidth of the I/O subsystem. The CPU then loads the packet from DRAM, processes it, and stores it into the application buffer, again in DRAM. This process imposes a significant load on both the CPU and the memory subsystem, and as connection speeds increase these memory bottlenecks become more severe.
RDMA supports zero-copy networking, where "zero-copy" refers to operations in which the CPU is not involved in copying data from one memory area to another. RDMA implements a reliable transport protocol in hardware on the NIC, which enables the NIC itself to transfer data directly to or from application memory without executing a kernel call. In essence, through RDMA an application reads from or writes to the memory of a remote application, and the remote virtual memory address for the operation is carried within the RDMA message. The only thing the application needs to do is register the relevant memory buffer with its local NIC.
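As an illustration, the sketch below uses the libibverbs API to register a local buffer and post a one-sided RDMA write. It assumes a protection domain (pd) and an already-connected queue pair (qp) were set up elsewhere, and that the peer's buffer address and rkey were exchanged out of band (for example over a TCP connection); names such as BUF_SIZE and rdma_write_example are illustrative.

```c
/* Sketch: register a local buffer and issue a one-sided RDMA write.
 * Assumes pd (protection domain) and qp (connected queue pair) already
 * exist, and that remote_addr/remote_rkey were exchanged out of band. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 4096

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       uint64_t remote_addr, uint32_t remote_rkey)
{
    /* Allocate and register the local buffer with the NIC; the returned
     * memory region (MR) pins the pages and provides lkey/rkey handles. */
    char *buf = malloc(BUF_SIZE);
    if (!buf)
        return -1;
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, BUF_SIZE,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    if (!mr) {
        free(buf);
        return -1;
    }

    strcpy(buf, "hello over RDMA");

    /* Scatter/gather entry describing the local source buffer. */
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = BUF_SIZE,
        .lkey   = mr->lkey,
    };

    /* Work request: the RDMA WRITE carries the remote virtual address
     * and rkey, so the target CPU is not involved in placing the data. */
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr)) {
        fprintf(stderr, "ibv_post_send failed\n");
        ibv_dereg_mr(mr);
        free(buf);
        return -1;
    }
    /* In a complete program the completion would be reaped from the
     * send completion queue with ibv_poll_cq() before deregistering
     * the MR and freeing the buffer. */
    return 0;
}
```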
While RDMA permits low latency and high throughput, the technology also has disadvantages, chiefly that the target node is not notified when a request completes (one-sided communication). The usual workaround is to change a memory byte once the data has been delivered, but this requires the target to poll on that byte. Not only does this polling consume CPU cycles, but the memory footprint and the latency increase linearly with the number of possible peer nodes, which limits the use of RDMA in High Performance Computing (HPC).
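A minimal sketch of this polling pattern on the target side, assuming the initiator places the payload and then sets the last byte of the target's registered buffer to a known sentinel (the buffer layout, DONE_FLAG value, and function name are illustrative, and a real implementation would also need appropriate memory-ordering guarantees):

```c
/* Sketch: target side polls a flag byte to detect that a one-sided
 * RDMA write has arrived. `buf` is the registered target buffer; the
 * initiator is assumed to write the payload and then set buf[BUF_SIZE - 1]. */
#include <stdint.h>

#define BUF_SIZE  4096
#define DONE_FLAG 0x1

void wait_for_rdma_write(volatile uint8_t *buf)
{
    /* Busy-wait on the flag byte: low latency, but it burns a CPU core,
     * and with one such buffer per potential peer the memory footprint
     * and polling cost grow linearly with the number of peers. */
    while (buf[BUF_SIZE - 1] != DONE_FLAG)
        ; /* spin */

    /* The payload written before the flag can now be processed. */
}
```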
Common RDMA implementations are InfiniBand and iWARP.
— AlessandraScicchitano - 14 Aug 2012