Computer data storage - Network storage

These are primarily notes.
They won't be complete in any sense;
they exist to contain fragments of useful information.


Filesystems

(with a focus on distributed filesystems)

http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_parallel_fault-tolerant_file_systems
http://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems

NFS notes

See NFS notes

Relevant here: pNFS / PanFS / Panasas

SMB notes

See SMB, CIFS, Samba, Windows File Sharing notes


GlusterFS

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
Fault tolerance:  with replication there is a self-healing scrub operation 
Speed:            scales well. Metadata not distributed so some fs operations not as fast as read/write
Load balancing:   implied by striping, distribution
Access:           library (gfapi), POSIX-semantics mount (via FUSE), built-in NFSv3, or block device (since  2013)
Expandable:       yes   (add bricks, update configuration. with some important side notes, though)
Networking:       TCP/IP, Infiniband (RDMA), or SDP


CAP-wise it seems to be AP.

Seems relatively easy to manage compared to various others - though resolving problems is reportedly more involved.

Fairly widely used, which implies a good amount of support in the form of experience (forums, IRC, but also Red Hat support).


Any host that runs glusterfsd is effectively a server. Each server can have one or more storage bricks (see terms), which can be dynamically included into one storage pool.

How bricks are used is up to configuration, which is client-controlled, can be changed at will, and is part of a storage pool's state.

Striping is based on a hashing algorithm known to all servers (also part of that state), which avoids the need for a metadata server.


To deal (consistency-wise) with bricks that were temporarily offline, a daemon that handles heal operations was introduced. (Before that it was more manual; you basically didn't want that to happen.)
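
For reference, heals can be inspected and triggered per volume; a minimal sketch, with volname as a placeholder (exact syntax can vary a little between gluster versions):

gluster volume heal volname info
gluster volume heal volname full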


Seems to be better than various others at (this is in part due to the somewhat larger default block size):

  • streaming throughput
  • sometimes IOPS

Seems to be slower than others at:

  • a number of metadata operations

(so arguably good for large-dataset stuff, less ideal for many-client stuff).


On latency
Some operations (most on metadata) slow down in proportion to RTT, some because they are sequential (e.g. ls), some because all servers involved in a volume must be contacted (for example for self-heal checks, done at file open time).

This also depends on the translators in place - consider for example what replication implies. For this reason, something like geo-replication using the basic replication translator is probably a bad idea (there is clever geo-replication you can use instead).


Terminology

brick - typically corresponds to a distinct backing storage disk. Any particular network node may easily have a few. In more concrete terms, it is a directory on an existing local filesystem location exposed as usable by gluster.

client - will have a configuration file mentioning bricks on servers, and translators to use.

server - a host that can expose bricks to clients

storage pool - a trusted network of storage servers.

translator - given a brick/subvolume, applies some feature/behaviour, and presents it as a subvolume

subvolume - a brick once processed by a translator

volume - the final subvolume, the one considered mountable


On translators

Auth and networking

tl;dr:

No auth
You can use your general-purpose firewall to keep the wrong machines out (...or...)
gluster itself can have per-volume host whitelists, e.g.
gluster volume set volname auth.allow 192.168.1.*
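
There is also a reject counterpart, and set options can be reset again; a sketch, with volname as a placeholder:

gluster volume set volname auth.reject 10.0.0.*
gluster volume reset volname auth.allow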

Getting started

Peers
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

To see the status of peer servers:

gluster peer status

To add hosts to the storage pool:

gluster peer probe servername

You can do this from any member.


Do a status again to see what happened.

...on all nodes if you wish. If you don't use DNS, you may find that the host you probed knows the prober only by IP; you may wish to do an explicit probe back from that host, just so it learns the prober's name.


You can detach peers too, though not while they are part of a volume.
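
For example (servername again being a placeholder):

gluster peer detach servername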



Creating volumes
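
A minimal sketch, assuming two servers already in the pool and a brick directory prepared on each (all names below are placeholders; whether you want replication, striping, and/or plain distribution depends on your goals):

# volname, server1, server2, and the brick paths are placeholders
gluster volume create volname replica 2 server1:/data/brick1 server2:/data/brick1
gluster volume start volname
gluster volume info volname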

Mounting volumes

One-shot:

mount -t glusterfs host:/volname /mnt/point

fstab:

host:/volname /mnt/point glusterfs defaults,_netdev 0 0


You mount a host:/volname reference because the client doesn't have to be a server; the named host is only used to fetch the volume configuration, after which the client talks to the bricks directly.
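
If you worry about that named host being down at mount time, there is a mount option to name a fallback server to fetch the volume configuration from; a sketch (the option's name has varied between versions, so check yours):

mount -t glusterfs -o backupvolfile-server=host2 host1:/volname /mnt/point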


expand, shrink; migrate, rebalance
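
Roughly, these amount to commands like the following; a sketch with placeholder names, assuming a plain distributed volume (remove-brick needs the start/status/commit dance so that data gets migrated off the brick first):

# volname, server3, and the brick path are placeholders
gluster volume add-brick volname server3:/data/brick1
gluster volume rebalance volname start
gluster volume rebalance volname status

gluster volume remove-brick volname server3:/data/brick1 start
gluster volume remove-brick volname server3:/data/brick1 status
gluster volume remove-brick volname server3:/data/brick1 commit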

On failure

Use RAID?

Errors

peer probe: failed: Error through RPC layer, retry again later

In my case, this was caused by having mismatched versions of glusterfs.


See also

and/or TOREAD myself:

MooseFS

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Similar to Google File System, Lustre, Ceph

Fault tolerance: replication (per file or directory)
Speed: Striping (for more aggregate bandwidth)
Load balancing: Yes
Security: user auth, POSIX-style permissions

Userspace.

Fault tolerance: MooseFS uses replication; data can be replicated across chunkservers, and the replication ratio (N) is set per file or directory.

Easy enough to set up.

Hot-add/remove

Single metadata server?

http://en.wikipedia.org/wiki/Moose_File_System

LizardFS

Fork of MooseFS


Ceph FS

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
Fault tolerance: Replication (settings are pool-wide(verify)), journaling (verify).
Speed:           scales well, generally good
Load balancing:  implied by striping
Access:          POSIX-semantics mount, library, block device. Integrates with some VMs.
Expandable:      yes
Networking:      
License? Paid?   Open source and free. Paid support offered.

Seems to focus more on scalability, failure resistance, some features useful in virtualization environments, and to some degree on ease of management.

...at some cost in throughput for typical use (e.g. compared to gluster), though some of that can be mitigated with informed tuning.

Drive failure is dealt with well, so there is no critical replacement window as there is with RAID5, RAID6.

Commonly used, apparently still a bit ahead of gluster.

Still marked as a work in progress with hairy bits - but quite mature in many ways. Opinion seems to be "a bit of a bother, but works very well".

Its documentation is not quite as mature yet, so it is not the easiest to set up.


Can be used as a block device as well as for files (verify)
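
For illustration, the block side is RBD and the file side is CephFS; a minimal sketch with placeholder pool/image/monitor names, assuming keys are already set up (the mapped device may also appear as /dev/rbd0, depending on udev rules):

# mypool, myimage, mon1, and mount points are placeholders
rbd create mypool/myimage --size 1024
rbd map mypool/myimage
mkfs.ext4 /dev/rbd/mypool/myimage
mount /dev/rbd/mypool/myimage /mnt/rbd

mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret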


See also:


Lustre

Sheepdog

BeeGFS (previously Fraunhofer Parallel File System, FhGFS)

RozoFS

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Fault tolerance:  can deal with missing nodes
Speed:            seems good, and better than some on small IO
Load balancing:   distributed storage
Security:         
Access:           POSIX-like

Roughly: like gluster, but deals with missing nodes RAID-style (more specifically, with an erasure coding algorithm).

Has a single(?) metadata server

SeaweedFS

MogileFS

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
Fault tolerance:  configurable replication
                  avoids single point of failure - all components can be run on multiple machines
Speed:            
Load balancing:   
Access:           
Expandable:       
Networking:       


Userspace (no kernel modules)

Files are replicated according to the class they are in, so you can have different kinds of files be safer while saving disk space on things you could cheaply rebuild.


See also:


XtreemFS

HDFS

Gfarm

CXFS (Clustered XFS)

https://en.wikipedia.org/wiki/CXFS


pCIFS - clustered Samba

(with gluster underneath?) http://wiki.samba.org/index.php/CTDB_Setup


OCFS2

PVFS2

OrangeFS

OpenAFS

Tahoe-LAFS

DFS

http://www.windowsnetworking.com/articles-tutorials/windows-2003/Windows2003-Distributed-File-System.html


GPFS (IBM General Parallel File System)

Object stores

Block devices