I’m writing this mainly so I can remember what I did. Much of this is just going to serve to supplement my own memory. If it helps someone else, that’s cool too.
I have a half-rack here at home with two servers that I use for about 20 qemu VMs. I have an instance with a 3 TB external disk attached that I use for bacula. I use this to back up all of my Linux virtuals and reals.
I’ve had issues cleaning up old bacula backups both in the database and on disk. They seem to stay in the database and on disk in perpetuity despite their expiry date, especially when there are problems with bacula. The USB SATA case I use is flaky and needs to be replaced, so sometimes backups quit working without me being able to give them attention…for a few months in this case.
This stuff really should be scripted, but it’s also handy to have here.
Recently, the bacula server quit working not because the disk flaked out, but because I ran out of disk space. I needed to purge old backups, regardless of their retention date. I used find to locate the files and remove them from both the database and the filesystem. Postgres and bacula refused to start, and since this has mostly been hobby stuff, I didn’t fix it for months.
Before I could do that, there was some prework required: I needed to resize the root disk so I could get Postgres back onto it.
Resize the Root Disk
The root disk on the VM nearly filled up months ago and I moved the Postgres database to external storage (/home/postgresql). This wasn’t an optimal solution. I needed to free up enough disk space to do maintenance, so my first step was to resize the root disk and move the Postgres database back to it, freeing up a few gigs on the external drive used for backup storage.
First I had to grow the root QCOW volume on the real. The steps listed below grow the volume to 40 GB. I had to use parted to grow the partition to the new extents of the virtual disk; a rough sketch of that interactive session follows the commands. Note the use of qemu-nbd to attach the volume so that it can be accessed as a block device.
The use of these commands is left as an exercise for the reader.
root@fs4:~# cd /srv/vms/array1/
root@fs4:/srv/vms/array1# virsh shutdown backserv01
root@fs4:/srv/vms/array1# qemu-img info backserv01.qcow2
root@fs4:/srv/vms/array1# qemu-img resize backserv01.qcow2 40G
root@fs4:/srv/vms/array1# qemu-nbd -c /dev/nbd1 backserv01.qcow2
root@fs4:/srv/vms/array1# parted /dev/nbd1
root@fs4:/srv/vms/array1# qemu-nbd -d /dev/nbd1
root@fs4:/srv/vms/array1# virsh start backserv01; sleep 1; virsh console $(virsh list | grep backserv01 | awk '{ print $1 }')
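For reference, the interactive parted session went roughly like this. It’s a sketch from memory: partition 2 is assumed to be the LUKS container holding root (it’s what maps to /dev/mapper/sda2_crypt below), and on a GPT disk parted will usually offer to fix the backup header to use the new space when you run print.
(parted) print
(parted) resizepart 2 100%
(parted) print
(parted) quit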
Once I did that on the real, I had to grow the root filesystem inside the VM. Although it’s encrypted with LUKS, that’s fortunately easy. If I were using LVM, I probably could have used pvresize and lvresize in conjunction with e2fsck and resize2fs to accomplish the same; a rough sketch of that follows the commands below.
root@backserv01:~# e2fsck -f /dev/mapper/sda2_crypt
root@backserv01:~# resize2fs /dev/mapper/sda2_crypt
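For comparison, the LVM route mentioned above would have looked something like the following. This is only a sketch with made-up names (a volume group called vg0 with a logical volume called root), not what I actually ran:
root@backserv01:~# pvresize /dev/mapper/sda2_crypt
root@backserv01:~# lvresize -l +100%FREE /dev/vg0/root
root@backserv01:~# e2fsck -f /dev/vg0/root
root@backserv01:~# resize2fs /dev/vg0/root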
Move Postgres Back to the Root Volume and Get bacula Running Again
The next step was to move Postgres back to my newly enlarged root disk. This would allow me to start Postgres and give me a little space on the backup disk to work. I shut down Postgres and made a backup of the postgresql directory. Then I cleared out the old postgresql directory on the root disk and copied the files over with tar. This is an old-school way of doing it; cp -R should work these days, but the tar method has always kept the relevant metadata intact for me.
Once that’s done, postgresql can be restarted and tested, and then the postgresql data under /home/postgresql can be deleted.
root@backserv01:~# systemctl stop postgresql
root@backserv01:~# cd /home/postgresql/
root@backserv01:/home/postgresql# tar cvfzip /tmp/home_postgres.tgz .
root@backserv01:/home/postgresql# rm -fr /var/lib/postgresql/*
root@backserv01:/home/postgresql# tar cfip - . | (cd /var/lib/postgresql/; tar xvfip -)
root@backserv01:/home/postgresql# vi /etc/postgresql/11/main/postgresql.conf
root@backserv01:/home/postgresql# usermod -d /var/lib/postgresql postgres
root@backserv01:/home/postgresql# systemctl start postgresql
root@backserv01:/home/postgresql# rm -fr /home/postgresql/11
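The postgresql.conf edit above just points the data directory back at the packaged default. I didn’t keep the exact diff, but it amounts to something like this:
#data_directory = '/home/postgresql/11/main'
data_directory = '/var/lib/postgresql/11/main'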
bacula can then be restarted and its functionality verified.
root@backserv01:~# systemctl start bacula-director
You can verify that bacula is running, and list its volumes, with the following command:
root@backserv01:~# echo "list volumes" | bconsole | more
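Another quick sanity check is the director status, which shows scheduled, running, and recently terminated jobs:
root@backserv01:~# echo "status dir" | bconsole | more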
Deleting Old bacula Volumes and Recovering Disk Space
First I needed to find any volumes with errors and remove them from the database. It turned out that they were all very old, so I was able to leave removing their volume files on disk for later.
root@backserv01:~# for i in $(echo "list volumes" | bconsole | grep -i error | awk '{ print $4 }'); do echo "delete volume=${i} yes" | bconsole; done
Then I needed to purge old volumes. I decided not to use the pools’ volume retention periods since the server hadn’t been running in a while. I wanted to make sure that I had *some* backups, even if they were dated.
root@backserv01:~# cd /home/bacula/backup/
root@backserv01:/home/bacula/backup# for i in $(find . -mtime +180 -name 'Diff-*' | xargs basename -a); do echo "delete volume=${i} yes" | bconsole; rm -v ${i}; done
root@backserv01:/home/bacula/backup# for i in $(find . -mtime +180 -name 'Inc-*' | xargs basename -a); do echo "delete volume=${i} yes" | bconsole; rm -v ${i}; done
root@backserv01:/home/bacula/backup# for i in $(find . -mtime +270 -name 'Full-*' | xargs basename -a); do echo "delete volume=${i} yes" | bconsole; rm -v ${i}; done
This freed up about 750 GB of disk so I could start running my backups again.
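If you want to see how much space one of these passes will reclaim before deleting anything, a rough estimate (this assumes GNU find) looks something like:
root@backserv01:/home/bacula/backup# find . -mtime +180 \( -name 'Diff-*' -o -name 'Inc-*' \) -printf '%s\n' | awk '{ s += $1 } END { printf "%.1f GB\n", s/1e9 }'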
I still have an 835 GB volume in the Default pool that needs to be purged. However, it contains 3 years of Catalog backups. They were configured to go to the Default pool, and the Default pool was configured so that it would never rotate.
I’ve since reconfigured bacula to have a separate storage pool for the Catalog. I should now get a week of Catalog backups in each volume and retain those volumes for 182 days, which is just slightly longer than the retention time for Full backups.
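For reference, the Catalog pool resource in bacula-dir.conf ends up looking roughly like this; the label format and exact numbers below are a sketch rather than a copy of my config:
Pool {
  Name = Catalog
  Pool Type = Backup
  Label Format = "Catalog-"
  Maximum Volume Jobs = 7          # roughly a week of nightly Catalog backups per volume
  Volume Retention = 182 days      # just longer than the Full retention
  AutoPrune = yes
  Recycle = yes
}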
Once I know I have a good month of backups, I’ll remove the Default-0006 volume and recover the 835 GB of space it’s occupying. I may have to do it sooner if I run out of disk again.
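When that time comes, it’s the same drill as above, assuming the volume file lives in /home/bacula/backup with the others:
root@backserv01:/home/bacula/backup# echo "delete volume=Default-0006 yes" | bconsole
root@backserv01:/home/bacula/backup# rm -v Default-0006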