So lately I’ve been approaching maximum capacity on my server's file storage. Seriously, how could I have so many files and not fit them on 8TB+? Well, to be fair, it all comes down to my paranoia. The 8TB is spread over 8 disks, set up as 2 RAID5 arrays of 4 disks each. Nightly, a script synchronizes the two RAIDs, so if I delete a file, I still have a backup from the day before. So in essence, I only have 2.7TB available. A 4-disk RAID5 stores data on the equivalent of 3 disks, and the 4th disk's worth of capacity holds parity, which RAID5 spreads across all the disks. Then the gap between the decimal terabytes on the drive labels and the binary units the OS reports, plus some filesystem overhead, explains the ~300GB loss.
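The capacity arithmetic above can be checked with a few lines of Python. This is a minimal sketch assuming 1TB (10^12-byte) disks, which is what 8TB over 8 disks implies; the helper name is hypothetical.

```python
# Assumption: each disk is 1 TB = 10**12 bytes (8TB over 8 disks).
TB = 10**12   # decimal terabyte, as printed on drive labels
TIB = 2**40   # binary tebibyte, as commonly reported by the OS

def raid5_usable_bytes(disk_count: int, disk_bytes: int) -> int:
    """RAID5 gives up one disk's worth of capacity to parity."""
    if disk_count < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return (disk_count - 1) * disk_bytes

usable = raid5_usable_bytes(4, 1 * TB)
# Same byte count, shown in decimal and binary units.
print(f"usable: {usable / TB:.2f} TB = {usable / TIB:.2f} TiB")
# prints: usable: 3.00 TB = 2.73 TiB
```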
Let’s get to my point. I’ve been playing around with ZFS, mainly because I wanted to try deduplication. On disks with so much data, there have to be some redundant blocks. The point of deduplication is that each identical block is stored only once. ZFS also supports compression, which shrinks each block as it is written, so the data takes up even less space.
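ZFS deduplicates whole blocks rather than arbitrary byte runs: a block whose checksum is already known is not written again. The Python sketch below models that idea plus per-block compression under simplifying assumptions: fixed 128 KiB blocks, SHA-256 as the block key, zlib standing in for ZFS's compression algorithm, and an in-memory dict as a toy dedup table. The `store`/`load` helpers are hypothetical and only illustrate the concept, not how ZFS is implemented.

```python
import hashlib
import zlib

BLOCK_SIZE = 128 * 1024  # assumption: fixed 128 KiB blocks

def store(data: bytes, table: dict[str, bytes]) -> list[str]:
    """Split data into blocks, keep each unique block once (compressed),
    and return the block keys needed to rebuild the data."""
    refs = []
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        key = hashlib.sha256(block).hexdigest()
        if key not in table:                   # dedup: only unseen blocks are stored
            table[key] = zlib.compress(block)  # compression: shrink each stored block
        refs.append(key)
    return refs

def load(refs: list[str], table: dict[str, bytes]) -> bytes:
    """Rebuild the original data by decompressing the referenced blocks."""
    return b"".join(zlib.decompress(table[k]) for k in refs)

table: dict[str, bytes] = {}
data = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE  # 4 blocks, 3 of them identical
refs = store(data, table)
physical = sum(len(v) for v in table.values())
print(len(refs), "blocks referenced,", len(table), "stored")  # 4 blocks referenced, 2 stored
print("logical:", len(data), "bytes; physical:", physical, "bytes")
assert load(refs, table) == data
```

Dedup saves space only when whole blocks match exactly, while compression helps every block, which is why the two features complement each other.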
Now, ZFS was originally made for Solaris, but there are implementations for Linux now. I’m running Ubuntu 11.10, so I installed zfs-fuse, which runs ZFS in userspace through FUSE. Installing it is left as an exercise for the reader.
I had 2 identical 230GB drives spinning idly in my server, so I decided to test ZFS on them. I created the pool like so:
zpool create zfs_230GB mirror /dev/disk/by-id/ata-HDS722525VLAT80_VN693ECFEBYTVD /dev/disk/by-id/scsi-SATA_HDS722525VLAT80_VN693ECFEBYTVD
This puts the two drives into a pool in a mirrored configuration. Upon completion, the pool is automatically mounted at /zfs_230GB. There is nothing more to do; no formatting is needed.
Compression and deduplication are both off by default in ZFS, so I also issued the following commands:
zfs set compression=on zfs_230GB
zfs set dedup=on zfs_230GB
Then I proceeded to copy 270GB of stuff from my RAID5 onto the pool. It actually fit! I was a little surprised, but pleased. Now I wanted to find out how much space it actually took up. I tried running df, which returned:
Filesystem Size Used Avail Use% Mounted on
zfs_230GB 237G 236G 1.4G 100% /zfs_230GB
Okay. So that didn’t help (as expected, because df doesn’t understand deduplication). I wanted to find out how much physical space was still free on the drives after compression and deduplication. I tried:
$ sudo zfs list
NAME USED AVAIL REFER MOUNTPOINT
zfs_230GB 236G 1.32G 235G /zfs_230GB
Clearly that didn’t help. But then I tried:
$ sudo zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zfs_230GB 232G 227G 5.09G 97% 1.04x ONLINE -
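If you want to keep an eye on this from a script, the zpool list text can be parsed. A minimal sketch, assuming the column layout shown above and sizes with K/M/G/T/P/E suffixes treated as powers of 1024; `parse_size` and `parse_zpool_list` are hypothetical helpers. On newer ZFS versions, `zpool list -H -p` (no header, exact values) is sturdier to parse, if your version supports it.

```python
# Assumption: suffixes are binary multiples, as in zpool's human-readable output.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3,
         "T": 1024**4, "P": 1024**5, "E": 1024**6}

def parse_size(text: str) -> float:
    """Convert a size like '5.09G' to bytes."""
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]

def parse_zpool_list(output: str) -> list[dict[str, str]]:
    """Turn 'zpool list' text into one dict per pool, keyed by column name."""
    lines = [line for line in output.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]

# Output copied from the post.
sample = """\
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
zfs_230GB 232G 227G 5.09G 97% 1.04x ONLINE -
"""

for pool in parse_zpool_list(sample):
    free_gib = parse_size(pool["FREE"]) / 1024**3
    print(f"{pool['NAME']}: {free_gib:.2f}G free, dedup {pool['DEDUP']}")
# prints: zfs_230GB: 5.09G free, dedup 1.04x
```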
So sweet! I still have 5.09GB free. Though I read in some documentation that ZFS reserves 1/64th of the pool as reserve space. That would be 230/64 ≈ 3.59GB for me, so I still have some space left. The same amount of data stored in 84% of the physical space (227GB allocated for 270GB copied)? Sure, I’ll take it.
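The back-of-the-envelope numbers above can be reproduced in Python. This is a minimal sketch; the 1/64 reservation comes from the documentation mentioned above and is not verified here, and the variable names are only illustrative.

```python
# Figures copied from the post (GB as reported by the tools).
pool_size_gb = 230   # nominal pool size used for the 1/64 estimate
free_gb = 5.09       # FREE column of zpool list
allocated_gb = 227   # ALLOC column of zpool list
copied_gb = 270      # logical amount of data copied in

reserved_gb = pool_size_gb / 64              # assumed 1/64 reservation
usable_gb = free_gb - reserved_gb            # free space left after the reservation
physical_fraction = allocated_gb / copied_gb  # physical space per logical byte

print(f"reserved ~{reserved_gb:.2f}GB, usable ~{usable_gb:.2f}GB")
print(f"data occupies {physical_fraction:.0%} of its logical size")
# prints:
# reserved ~3.59GB, usable ~1.50GB
# data occupies 84% of its logical size
```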
Thanks for reading!