Data Dissemination

In this generalization of the FileTransfer application, a data set has to be distributed from one source to a (possibly large) set of receivers.

Example Applications

There are many use cases for this; here are some examples:

  • The LHC case: Data is generated at one point (CERN, the "Tier-0"), and identical copies must be transferred to a distributed set of "Tier-1" centers for storage, processing, and further (partial) dissemination.
  • Software download: A new version of, e.g., a GNU/Linux distribution is released by a publisher, and many thousands of users want to download it within the first few hours or days.
  • OS/VM image distribution for data centers and clouds: A disk image containing an OS installation should be replicated to many servers so that virtual machines can be started from it.

Protocols and Systems

  • BitTorrent is a peer-to-peer protocol that distributes the dissemination work among a large and dynamic set of nodes.
  • Mirror Servers can be used to store copies of popular files; clients are directed (for example through DNS or HTTP redirection) to a mirror that is "close" to them and/or has spare capacity.
  • Multicast is useful for efficiently replicating data when many destinations are interested in the same content; however, building reliable transmission protocols on top of it is not easy. Examples of such attempts are udpcast and FLUTE. The Ghost software by Symantec (originally Norton) is a popular commercial system that can use multicast for image distribution.
  • USENET News solves an even more general problem: It distributes data (articles) from many sources to many receivers in a (rather static) peer-to-peer network.
  • scp-wave from the OpenNebula cloud toolkit builds on SSH's scp and uses a distribution tree.
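Both the distribution tree of scp-wave and the peer-to-peer approach of BitTorrent exploit the fact that a receiver holding a complete copy can relay it to others, so the source does not have to serve every receiver itself. The Python sketch below illustrates why this helps, under an idealized model that is an assumption for illustration only and not a description of how scp-wave or BitTorrent actually schedule transfers: every full copy takes one time unit, each holder sends to at most one new receiver per unit, and links are never a bottleneck. The function names `tree_schedule` and `sequential_rounds` are hypothetical.

```python
from collections import deque


def tree_schedule(receivers):
    """Hypothetical 'doubling' distribution tree (idealized model).

    Assumptions: each transfer of the full data set takes one round, and
    every node holding a copy sends to at most one new node per round.
    Returns a list of rounds; each round is a list of (sender, receiver) pairs.
    """
    holders = ["source"]
    pending = deque(receivers)
    rounds = []
    while pending:
        transfers = []
        # Every current holder serves one pending receiver in this round.
        for sender in holders:
            if not pending:
                break
            transfers.append((sender, pending.popleft()))
        # New receivers may only forward data once their copy is complete,
        # so they join the set of senders after the round has finished.
        holders.extend(receiver for _, receiver in transfers)
        rounds.append(transfers)
    return rounds


def sequential_rounds(n):
    """Baseline: the source copies to each of n receivers in turn."""
    return n


if __name__ == "__main__":
    nodes = [f"node{i:02d}" for i in range(1, 16)]
    rounds = tree_schedule(nodes)
    for i, transfers in enumerate(rounds, start=1):
        print(f"round {i}: {transfers}")
    source_copies = sum(1 for r in rounds for s, _ in r if s == "source")
    print("tree:", len(rounds), "rounds,", source_copies, "copies sent by the source")
    print("sequential:", sequential_rounds(len(nodes)), "rounds")
    # With doubling, n receivers need ceil(log2(n + 1)) == n.bit_length() rounds.
    assert len(rounds) == len(nodes).bit_length()
```

Under these assumptions, 15 receivers are served in 4 rounds instead of 15, and the source itself sends only 4 copies. Real deployments deviate from this picture because transfer times depend on heterogeneous link and disk capacities, which is one reason systems like BitTorrent exchange data in smaller pieces rather than whole copies.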

-- SimonLeinen - 08 Dec 2010

Topic revision: r1 - 2010-12-08 - SimonLeinen
Copyright © 2004-2009 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.