Review of GlusterFS - Distributed Filesystem
By Dipankar Das, Gaea News Network
Monday, January 4, 2010
Overview:
GlusterFS is an open source parallel distributed network filesystem. The software integrates a file system, an operating system and a management UI, and can scale to several petabytes of data. It has two components, a server and a client. The storage server runs glusterfsd, and each client node runs the glusterfs client to mount the filesystem. To mount a GlusterFS filesystem, the client must have FUSE support in the kernel; the server has no such requirement and can run on any node. The GlusterFS server runs on Linux, FreeBSD and OpenSolaris, while the client runs only on Linux. The platform supports TCP/IP as its network transport.
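As a rough way to check the FUSE requirement on a client, the sketch below looks for a `fuse` entry in text formatted like Linux's `/proc/filesystems`. The function name `has_fuse` and the sample text are illustrative assumptions, not part of GlusterFS. A kernel that ships FUSE as a module that has not been loaded yet would not show the entry, so a negative result is only a hint.

```python
def has_fuse(proc_filesystems_text):
    """Return True if 'fuse' is listed as a registered filesystem type."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Each line is either "<fstype>" or "nodev <fstype>".
        if fields and fields[-1] == "fuse":
            return True
    return False


# Hypothetical sample in the same layout as /proc/filesystems.
sample = "nodev\tsysfs\nnodev\tproc\n\text4\nnodev\tfuse\n\tfuseblk\n"
print(has_fuse(sample))                      # True
print(has_fuse("nodev\tsysfs\n\text4\n"))    # False

# On a Linux client the real input would come from the kernel:
#     with open("/proc/filesystems") as f:
#         print(has_fuse(f.read()))
```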
- GlusterFS is a clustered filesystem for storing unstructured data. One of its important features is a global namespace. Files are located using a hash algorithm.
- Gluster is not only meant for storing data. Users can also store multiple copies of virtual machine images and use the unique features of Gluster Storage Platform to create failover capabilities for their VM environments.
- The Storage Platform is designed to withstand a hardware failure and lets users recover VMs without disrupting services. It uses checksum healing to do this.
“We compare the live file to the copy and that allows you to figure out the error within the file as opposed to working at the file level,” O’Brien, VP of Marketing, says. “So you don’t have to rebuild the entire file. VM images can be very large so stopping to rebuild it can be very disruptive.”
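The idea in the quote, repairing only the damaged part of a file rather than rebuilding the whole thing, can be sketched with per-block checksums. This is a minimal illustration and not Gluster's implementation: the block size, the use of SHA-256, the function names, and the assumption that both copies have the same length and that one of them is known to be good are all choices made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose so the example is easy to follow


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def heal(good, damaged, block_size=BLOCK_SIZE):
    """Copy only the mismatching blocks from `good` into `damaged`.

    Returns the repaired bytes and the indexes of the replaced blocks.
    Assumes both inputs have the same length.
    """
    repaired = bytearray(damaged)
    replaced = []
    pairs = zip(block_checksums(good, block_size),
                block_checksums(damaged, block_size))
    for index, (good_sum, bad_sum) in enumerate(pairs):
        if good_sum != bad_sum:
            start = index * block_size
            repaired[start:start + block_size] = good[start:start + block_size]
            replaced.append(index)
    return bytes(repaired), replaced


good_copy = b"AAAABBBBCCCCDDDD"
live_file = b"AAAABxBBCCCCDDDD"  # one corrupted byte in the second block
fixed, blocks = heal(good_copy, live_file)
print(fixed == good_copy, blocks)  # True [1]
```

Only one of the four blocks is rewritten here, which is why this approach matters for large VM images.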
- One interesting feature of Gluster is Automatic File Replication (AFR). It replicates your I/O in real time, which gives Gluster the ability to tolerate hardware failure.
- The other cool feature is the user space design. Its advantages are that no kernel module is required, complex features can be added quickly, and bugs do not crash the OS. GlusterFS is also claimed to run faster than kernel-based filesystems.
- GlusterFS addresses parallelization at the volume management and I/O scheduling level, as opposed to the block level used by other cluster filesystems. Removing centralized metadata provides better scaling and reliability.
- Once data grows beyond 32 TB, the downtime caused by a file system check (fsck) becomes a big issue. Gluster has no fsck; it relies on self-healing instead.
- Every feature in GlusterFS, from network and scheduler to cache and disk, is defined as a logical volume. Users can build these up into a customized, optimized storage solution.
- GlusterFS has a stackable modular design. Features such as performance options, distributed locking and replication are all implemented as stackable modules.
- The filesystem includes a rot-13 encryption module (a simple letter-rotation cipher).
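Several of the points above (hash-based file lookup with no central metadata, the stackable modular design, and the rot-13 module) can be pictured together in a small sketch. Everything here is hypothetical: the class names `Brick`, `Rot13` and `Distribute`, the in-memory storage, and the use of CRC32 as the hash are assumptions made for illustration and do not reflect GlusterFS's actual modules or hash function.

```python
import codecs
import zlib


class Brick:
    """Bottom of the stack: an in-memory store standing in for a disk."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


class Rot13:
    """A layer that rot-13 encodes text on write and decodes it on read."""

    def __init__(self, child):
        self.child = child

    def write(self, path, data):
        self.child.write(path, codecs.encode(data, "rot_13"))

    def read(self, path):
        return codecs.decode(self.child.read(path), "rot_13")


class Distribute:
    """A layer that picks one child per file by hashing the path.

    No lookup table is kept: the same path always hashes to the same child.
    """

    def __init__(self, children):
        self.children = children

    def _pick(self, path):
        # CRC32 is an illustrative choice, not the hash GlusterFS uses.
        index = zlib.crc32(path.encode("utf-8")) % len(self.children)
        return self.children[index]

    def write(self, path, data):
        self._pick(path).write(path, data)

    def read(self, path):
        return self._pick(path).read(path)


# Stack the layers: Distribute -> Rot13 -> Brick.
bricks = [Brick(), Brick(), Brick()]
volume = Distribute([Rot13(brick) for brick in bricks])

volume.write("/vm/image-a", "Hello Gluster")
print(volume.read("/vm/image-a"))             # Hello Gluster
stored = [b.files for b in bricks if b.files]
print(stored)                                 # [{'/vm/image-a': 'Uryyb Tyhfgre'}]
```

Because every layer exposes the same `read` and `write` interface, layers can be reordered or added without changing the ones around them, which is the appeal of a stackable design.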
Tags: Distributed File system, Encryption, File System Check, Kernel, Linux, Metadata, Stackable Module, TCP/IP, User Space Design, Virtual Machine