ZFS is the coolest

So lately I’ve been approaching maximum capacity on my server for file storage. Seriously, how could I have so many files and not fit in 8TB? Well, to be fair, it all has to do with my paranoia. The 8TB is spread over 8 disks: two RAID5 arrays of 4 disks each. Nightly, a script synchronizes the two RAIDs so that if I delete a file, I still have a backup from the day before. So in essence, I only have about 2.7TB available. In a 4-disk RAID5, you only store data on the equivalent of 3 disks while the 4th disk’s worth of space holds parity. Then there is some filesystem overhead, which explains the ~300GB loss.
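A quick sanity check of that arithmetic (a sketch, assuming eight 1TB disks split into two 4-disk RAID5 sets):

```shell
# RAID5 keeps one disk's worth of parity per array, so a 4-disk
# array stores data on the equivalent of 3 disks (sizes assumed).
disks_per_raid=4
disk_size_tb=1
usable_tb=$(( (disks_per_raid - 1) * disk_size_tb ))
echo "usable per RAID5: ${usable_tb} TB"
# filesystem overhead then shaves roughly 0.3 TB off that figure
```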

Let’s get to my point. I’ve been playing around with ZFS, mainly because I wanted to do deduplication. On disks with this much data, there are bound to be redundant series of bytes, and the point of deduplication is that each redundant series is stored only once. ZFS also allows for compression, making those series take up even less space.
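A toy illustration of the idea (not ZFS internals): block-level dedup identifies each block by a checksum, and identical blocks hash identically, so only one physical copy per unique checksum needs to be kept.

```shell
# Three logical blocks, two of them identical. Hash each one and
# count unique checksums -- that is how many copies dedup must store.
for block in AAAA AAAA BBBB; do
    printf '%s' "$block" | sha256sum
done | sort -u | wc -l
# three blocks written, but only two unique ones kept on disk
```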

Now, ZFS was originally made for Solaris systems, but there are ports for Linux now. I’m running Ubuntu 11.10, so I installed zfs-fuse. Installing it will be left as an exercise to the reader.

I had 2 identical 230GB drives spinning idly in my server so I decided to test out ZFS on them. I created the filesystem like so:
zpool create zfs_230GB mirror /dev/disk/by-id/ata-HDS722525VLAT80_VN693ECFEBYTVD /dev/disk/by-id/scsi-SATA_HDS722525VLAT80_VN693ECFEBYTVD

This puts the two drives in a pool as a mirrored configuration. Upon completion, the drives are automatically mounted to /zfs_230GB. Nothing more to do, no formatting.

Not sure if I needed to, but I also issued the following commands:
zfs set compression=on zfs_230GB
zfs set dedup=on zfs_230GB
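To confirm those settings took effect, the properties can be read back (a sketch, assuming the standard zfs/zpool command set that zfs-fuse provides):

```shell
# Read back the properties we just set on the pool's root dataset:
zfs get compression,dedup zfs_230GB
# ...and check the health of the mirror while we're at it:
zpool status zfs_230GB
```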

Then I proceeded to copy 270GB of stuff from my RAID5 onto the drive. It actually fit! I was a little bit surprised, but pleased. So now I wanted to find out how much space it actually took up. Tried doing a ‘df’ which returned:

Filesystem Size Used Avail Use% Mounted on
zfs_230GB 237G 236G 1.4G 100% /zfs_230GB

Okay, so that didn’t help (as expected, since df doesn’t understand deduplication). Next I wanted to find out how much physical space on the drives was still available for compressed data. Tried to do:

zfs_230GB 236G 1.32G 235G /zfs_230GB

Clearly that didn’t help. But then I tried:

zfs_230GB 232G 227G 5.09G 97% 1.04x ONLINE -
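That 1.04x figure is the dedup ratio: logical data divided by the physical space it occupies. It can be sanity-checked against the numbers above (a sketch, assuming the 236G that df reported and the 227G the pool shows as allocated):

```shell
# dedup ratio = logical bytes referenced / physical bytes allocated
awk 'BEGIN { printf "%.2fx\n", 236 / 227 }'
# prints 1.04x
```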

So sweet! I still have 5GB available. Though I read in some documentation that ZFS reserves 1/64th of your drive as reserve storage, which would be 230/64 = 3.59GB for me, so I still have some space left. The same amount of data stored in 84% of the physical space? Sure, I’ll take it.
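The two numbers in that paragraph check out (using the 230GB pool size and the 270GB of source data from above):

```shell
# ZFS holds back roughly 1/64 of the pool:
awk 'BEGIN { printf "%.2f GB reserved\n", 230 / 64 }'
# prints 3.59 GB reserved

# 270 GB of source data landed in 227 GB of allocated space:
awk 'BEGIN { printf "%.0f%% of the original footprint\n", 227 / 270 * 100 }'
# prints 84% of the original footprint
```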

Thanks for reading!

Fekete András
