/sys/doc/ Documentation archive

A DHT-based Backup System

Note: Russ Cox says that this was “an internal MIT workshop submission, a draft of a draft of a draft” it is only provided here for the sake of completeness and for historical reference purposes.

Introduction

Distributed hashtables have been proposed as a way to simplify the construction of large-scale distributed applications(e.g.[1,6]). DHTs are completely decentralized systems that provide block storage on a changing collection of nodes spread throughout the Internet. Each block is identified by aunique key. DHTs spread the load of storing and serving blocks across all of the active nodes and keep the blocks available as nodes join and leave the system.

This paper presents the design and implementation of a cooperative off-site backup system, Venti-DHash. Venti-DHash is based on a DHT infrastructure and is designed to support recovery of data after a disaster by keeping regular snapshots of filesystems distributed off-site, on peers on the Internet. Where as conventional backup systems incur significant equipment costs, manual effort and high administrative overhead, we hope that a distributed backup system can alleviate these problems, making backups easy and feasible. By building this system on top of a DHT, the backup application inherits the properties of the DHT, and serves to evaluate the feasibility of using a DHT to build larg escale applications.

The backup system is based around the Venti archival storage system[9], replacing the storage back-end with the DHash distributed hashtable[5]. Venti-DHash operates as an archiver that takes complete filesystem snapshots, at a block level. Each unique block is only stored once, even across snapshots. DHash is used to balance storage and network load, aswell as to provide adequate availability blocks.

A number of changes were made the internals of DHash in order to meet our desired performance and availability goals. Our improved version of DHash is a DHT with good read and write performance, and 5 nines of availability perblock (assuming an average node reliability of 90%). The resulting system is now being tested byrunning backups of our primary fileserver.

The rest of the paper is structured as follows. Section 2 briefly surveys related work. The design of the backup system is presented in Section 3. Next, we describe how DHash was changed to achieve the desired performance and availability goals in Section 4. Section 5 describes some preliminary performance benchmarks and analysis we have conducted on our prototype. Finally, weconclude in Section 6.