Backup Bliss
This entry was originally published in our developer’s wiki; we’ve pasted it here to conserve mouse clicks.
Back it up here
So, internally we’ve found a need to do a lot of file transfer and encryption with off-site backups. It took most of the day, but we’ve got a cool solution that works well, and we thought we’d share it with you.
Background and problem
- I need to accept tons of transfers from a variety of clients in a secure way
- I want to have these files backed up in a large device on my network
- I want to make sure that these backups are stored encrypted
- I also need to make sure that the remote copies of these backups (on Amazon S3) are stored encrypted
Our Solution
- Secure transfer via sftp + rssh
- Use Ubuntu’s encfs as the encryption method
- Use Jungledisk and rsync to back up the stored files to S3
Hey, you think you’re so cool using standard tools. Why the need to write a wiki entry on this?
Not everything works out of the box…what to do?
sftp+rssh+chroot == wtf?!?
For something that purports to be secure, we found one glaring nuisance (not an outright security flaw, but something that would raise eyebrows). When you use chroot to build the jail for your sequestered / directory, you still need to grant your limited scp/sftp users execution access to the entire (limited) filesystem, and that includes the /home directory. So if your user testuser logs in and does a cd .., they can scan through and see whether there are other clients this server is serving. Definitely a big no-no in our book. Some forums we ran across raised this issue, but the threads went unanswered. Since disk space is cheap, we ultimately made a new chroot jail for each customer, so to speak.
We largely followed the directions found at this ubuntu forum entry, but with the modification that you’ll need to run chroot.sh <jail location> for each jail you want to set up.
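For example, a minimal sketch, assuming the forum’s chroot.sh takes the jail path as its only argument (that’s our reading of the thread; your copy may differ):

# one jail per customer; the jail paths are illustrative
for jail in /home/jail1 /home/jail2; do
  sudo ./chroot.sh "$jail"
done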
Then, for each user you add, you will need to modify /etc/passwd accordingly to set the user’s home directory to the correct jail root.
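For illustration, a hypothetical /etc/passwd entry for user1 might look like the line below. The uid/gid and comment field are made up, the home directory points inside the jail, and the shell is set to rssh so only scp/sftp get through:

user1:x:1001:1001:backup client 1:/home/jail1/home/user1:/usr/bin/rssh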
In the end, we put these entries directly at the end of /etc/rssh.conf, in the per-user options block:
user=user1:011:00011:"/home/jail1"
user=user2:011:00011:"/home/jail2"
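For reference, the fields are username, umask, access bits, and the chroot path. If we’re reading the rssh.conf man page right, the five access bits correspond to rsync, rdist, cvs, sftp, and scp in that order, so 00011 grants sftp and scp only.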
Deciphering Encryption
Ubuntu’s latest releases come with two different flavors of folder encryption: encfs and ecryptfs. We chose encfs because it gives a bit more control over where the encrypted store lives and, more importantly, over the ways in which one can mount the folder. According to this blog, the performance difference between the two seems negligible.
encfs allows you to store data in an encrypted folder; let’s call it /var/encrypteddata. In order to manipulate it, you need to mount it using:
encfs /var/encrypteddata ~/letmesee
Now, if I look at ~/letmesee, I can see all the correct filenames and can read/write to it. If I look at /var/encrypteddata, the filenames are gibberish, as are the contents.
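When you’re done, unmounting is a one-liner (fusermount ships with fuse):

fusermount -u ~/letmesee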
Another cool part is that since the directories are locked with a passphrase, I can mount them across the network so long as I have encfs running on my machine. Of note, I had to make sure the encfs versions lined up: I could not mount a volume encrypted with version 1.4.2 (Intrepid Ibex) on the default 1.3.2 (Hardy Heron).
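A quick sketch of what that can look like (sshfs is our assumption here, just one way to reach the encrypted directory remotely; the hostname and paths are illustrative):

# pull the raw encrypted directory over the network, then decode it locally
sshfs backupserver:/var/encrypteddata /tmp/remote-encrypted
encfs /tmp/remote-encrypted ~/letmesee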
Why not truecrypt?
This program intrigues us greatly, and we use it in other contexts, but for our needs we found that it might have caused more issues, so ultimately we didn’t build our solution with it.
The issues:
- truecrypt needs to either pre-allocate a large container file as the encrypted store
- or it needs to mount an entire drive
These two constraints made it difficult to do a few things:
- We wanted the flexibility of just dumping data into a directory without having to allocate the storage ahead of time.
- We also wanted to send incremental backups of the data to S3. With truecrypt, we would have had to decrypt the data out of the volume, deposit it to S3, and re-encrypt it (a higher CPU cost for the decrypt/re-encrypt cycle, though that’s not necessarily a bad thing). encfs lets us copy the encrypted individual files directly to S3.
Implementing encfs
So, ultimately, for our solution we attached our NAS to the machine running the sftp+rssh server, and we made the home directories of our jailed/chroot-ed users the visible endpoint of an encfs mount. I had some trouble at first, but found that fuse’s multi-user option (allow_other) makes the mounted directory visible to the rssh’ed user over sftp. This page got me started on the road to using encfs.
To reiterate:
encfs -o allow_other /mnt/nasdrive/encrypted /home/user1/data-to-encrypt
rssh user1 home dir: /home/jail1/home/user1/data-to-encrypt
Note: the -o allow_other flag is the fuse option that allows your jailed user to access a folder mounted by root.
Jungledisk is awesome
There, I said it. Using the Linux command line tool, we mounted our S3 filesystem with this fstab entry:
jungledisk /mnt/s3 fuse noauto,config=/etc/jungledisk-settings.xml 0 0
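Since the entry is marked noauto it won’t mount at boot; with the fstab line above in place, bringing it up by hand is just:

sudo mount /mnt/s3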
If you get any weird errors when trying to write files to the S3 mount, make sure that the temp directory it points at actually exists. I copied the XML settings over verbatim from Windows, and that gave me two problems: the Amazon secret key is not stored in the Windows file by default, and the temp directory setting pointed at some Win32-specific filepath.
After that, rsync makes it a breeze to get the /mnt/nasdrive/encrypted directory to somewhere in S3. The advantage here is that we’re uploading the already-encrypted data in place. Because it’s encrypted on a per-file basis, we can run rsync (or just cp, for that matter) to copy only the delta of files that changed (instead of, say, a huge block of a truecrypt volume).
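A minimal sketch (the destination path under /mnt/s3 is ours, not gospel; point it wherever you want the backups to land):

# copy only new/changed encrypted files up to the S3 mount
rsync -av /mnt/nasdrive/encrypted/ /mnt/s3/backups/encrypted/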