ICT – Blog – [20230105] – Consolidation of New rsync Backup Scripts

As a follow-up to yesterday's work on my rsync backup scripts, today I consolidated them into a new organization.

Motivation

The main reason for the reorganization was to stop the divergence of the various backup scripts across hosts. Over the years, when provisioning a new (virtual) machine, I used to simply copy and adapt backup scripts from an arbitrary other local machine. However, from time to time the existing scripts underwent bug fixes and functional improvements, and they started to diverge. Needless to say, syncing the changes among the scripts became increasingly difficult as the number of local machines increased.

Requirements

I had the following main requirements:

  • Optimal reuse of existing script code;
  • No dependence on centralized storage of (parts of) the backup scripts;
  • No change in the existing cron.daily or log-files structure;
  • Preservation of backup-script names.

Design

I had to choose between the following two alternatives:

  • Design a script that takes arguments through the command line.
  • Design a script that takes arguments through shell variables.

I chose the second approach and created a Bash script named backup_template [shown below] that checks that several variables describing the required backup action are defined, and then executes that action. This script is not meant to be run on its own; without the required variables it aborts. The idea is that the host-specific backup scripts (which replace the original, diverging backup scripts) simply define these variables for the applicable backup action and then source the shared backup script.

Input Variables of the Shared Backup Script

The shared backup script backup_template takes the following variables as input (refer to the comments in the scripts below for their explanations):

  • Mandatory:
    • is_volume_backup
    • FSARR
    • REMOTE_HOST
    • REMOTE_PREFIX
  • Optional:
    • BWLIMIT
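
The mandatory variables above could also be checked in a single loop instead of one if-block per variable. The following is only a minimal sketch and not the code used in backup_template: the function name require_vars is hypothetical, and it relies on Bash indirect expansion, which for an array such as FSARR looks at the first element only.

```bash
#!/bin/bash

# Hypothetical helper (not part of backup_template): report every variable
# named in "$@" that is unset or empty; return non-zero if any is missing.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable whose
    # name is stored in $name. For an array this yields element 0.
    if [[ -z "${!name}" ]]; then
      echo "$name is NOT set; aborting." >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values taken from the Volume Backup script in this post.
is_volume_backup="true"
FSARR=( st )
REMOTE_HOST=user@host.rsync.net
REMOTE_PREFIX=backup

require_vars is_volume_backup FSARR REMOTE_HOST REMOTE_PREFIX \
  && echo "All mandatory variables are set."
```

A benefit of this shape is that all missing variables are reported in one run, rather than only the first one.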

Suggestions for Future Improvements

  • Use VOLUMES and DIRECTORIES variable names instead of FSARR so the backup_template script can do hybrid Volume Backups and Directory Backups. This would also make is_volume_backup obsolete.
  • For Volume Backups, consider requiring the specification of an absolute mount point instead of something relative to /mnt.
  • Consider renaming REMOTE_HOST to something more descriptive. In many cases, it is not just the host…
  • Consider the use of Ansible for this.
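
The first suggestion might look roughly like the sketch below. It is an untested idea, not an implementation: VOLUMES, DIRECTORIES and list_backup_jobs are hypothetical names, and the function only prints the source/destination pairs that would be handed to rsync. It follows the path rules of the current script, where volumes are relative to /mnt and get no host name on the remote, and directories do get one.

```bash
#!/bin/bash

# Mount directory prefix, as in backup_template.
MNT=/mnt

# Hypothetical replacement for FSARR + is_volume_backup: print one
# "source destination" line per backup job.
# Reads the arrays VOLUMES and DIRECTORIES (either may be empty), plus
# REMOTE_HOST and REMOTE_PREFIX; $1 is the host name used for directories.
# Note: this simple line format assumes paths without whitespace.
list_backup_jobs() {
  local host="$1" v d
  for v in "${VOLUMES[@]}"; do
    echo "$MNT/$v $REMOTE_HOST:$REMOTE_PREFIX"
  done
  for d in "${DIRECTORIES[@]}"; do
    echo "$d $REMOTE_HOST:$REMOTE_PREFIX/$host"
  done
}

# Example values only; the host name is made up for the illustration.
VOLUMES=( st )
DIRECTORIES=( /home )
REMOTE_HOST=rush.jdj
REMOTE_PREFIX=/mnt/rush/backup2
list_backup_jobs myhost.example
```

With both arrays in one script, the mount and /etc/fstab checks would apply only to the entries in VOLUMES.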

Deployment

The new backup-script structure was deployed and tested today on:

Volume Backup

A typical Volume Backup looks like this:

#!/bin/bash

# Whether this is a Volume Backup or a Directory Backup.
is_volume_backup="true"

# Local filesystems/volumes or directories to be backed up.
FSARR=( st )

# Remote host and prefix.
REMOTE_HOST=user@host.rsync.net
REMOTE_PREFIX=backup

# The bandwidth limit, in KB/s (optional).
BWLIMIT=256

. ./backup_template

Directory Backup

Below is a typical Directory Backup script:

#!/bin/bash

# Whether this is a Volume Backup or a Directory Backup.
is_volume_backup="false"

# Local filesystems/volumes or directories to be backed up.
FSARR=( /mnt/st /home )

# Remote host and prefix.
REMOTE_HOST=rush.jdj
REMOTE_PREFIX=/mnt/rush/backup2

# The bandwidth limit, in KB/s (optional).
# BWLIMIT=256

. ./backup_template

Shared Backup Script

Here’s the backup_template script:

#!/bin/bash

# Backup specified filesystems or directories.

# Whether this is a Volume Backup or a Directory Backup.
#
# A Volume Backup insists that the source directories are (all) partitions
# for which an entry exists in /etc/fstab.
# In addition, a Volume Backup does not get the source station's name in its path on the remote.
#
# The alternative Directory Backup only insists on the existence of the source directories (as directory).
# A Directory Backup's suffix may not be unique among source stations,
# and therefore the source station's name is included in the path on the remote.
#
# Note: This script does not support mixing 'Volume' and 'Directory' backups (yet).
if [[ -z "$is_volume_backup" ]]; then
  echo "is_volume_backup is NOT set; aborting." >&2
  exit 1
fi

# Our hostname; used to separate data from multiple machines on the destination.
# Only relevant for Directory Backups.
HOSTNAME=$(hostname -f)

# Mount directory prefix.
# Only relevant for Volume Backups.
MNT=/mnt

# Local filesystems/volumes or directories to be backed up.
if [[ ${#FSARR[@]} -eq 0 ]]; then
  echo "FSARR is NOT set; aborting." >&2
  exit 1
fi
FSARR_SIZE=${#FSARR[@]}

# Remote host and prefix.
if [[ -z "$REMOTE_HOST" ]]; then
  echo "REMOTE_HOST is NOT set; aborting." >&2
  exit 1
fi
if [[ -z "$REMOTE_PREFIX" ]]; then
  echo "REMOTE_PREFIX is NOT set; aborting." >&2
  exit 1
fi

# The bandwidth limit, in KB/s (optional).
# BWLIMIT=128

# Options to rsync for the backup.
# -a		: Archive mode; preserves pretty much everything (owner, group, permissions, mod time, symlinks, device and special files).
# -v		: Verbose.
# --delete	: Delete extraneous files on target.
# --fake-super  : Store/recover privileged attrs using xattrs.
# --bwlimit     : Sets bandwidth limit (128 KB/s ~ 1 Mbps).
RSYNC_OPTS="-av --delete --fake-super"
if [[ -v BWLIMIT ]];
then
  RSYNC_OPTS+=" --bwlimit=$BWLIMIT"
fi

# Check to see if the remote host is up and accepts ssh.
#
# Note that
# - Our public key (~/.ssh/id_rsa.pub) has to be present on
#   the remote host (in ~/.ssh/authorized_keys on the remote host).
# - The host key of the remote host has to be present in
#   ~/.ssh/known_hosts. This is done automatically by ssh on the
#   first ssh connection to the remote host, but requires user confirmation
#   (only once).
# Therefore: always dry-run this script interactively at least once.
#
# The remote host must be supplied in $1.
function check_ssh_to_remote_host {
  # For some odd reason, the 'true' command on $REMOTE_HOST (at least on rsync.net) exits with code 1.
  # ssh $REMOTE_HOST true || { echo "No ssh to $REMOTE_HOST, abort." >&2; exit -1; }
  ssh "$1" ls >/dev/null 2>&1 || return 1
}

# A naive yet effective check to see that our backup destination is either mounted as a filesystem or exists as a directory.
# $1 is the remote host; $2 is the remote prefix (must be a directory).
function check_target_dir_remote_host {
  ssh "$1" "mount | grep -q $2" || ssh "$1" ls "$2/" >/dev/null 2>&1 || return 1
}

# Do the actual backup from $1 to $2.
function backup {
  echo "->Backing up from $1 to $2, starting:  `date -u`"
  if [[ -v BWLIMIT ]];
  then
    echo "  -> Bandwidth Limit set to $BWLIMIT KB/s."
  else
    echo "  -> Bandwidth Limit not set."
  fi
  echo "  Disk usage on $1: "
  echo "    " `df -h $1`
  echo "  Disk usage on target:"
  echo "    " `ssh $REMOTE_HOST "df -h $REMOTE_PREFIX"`
  # $RSYNC_OPTS is deliberately unquoted so it splits into separate options.
  rsync $RSYNC_OPTS "$1" "$2" 2>&1
  RETVAL=$?
  if [ $RETVAL -eq 0 ]; then
    echo "->Success!"
  else
    echo "->Failed!"
  fi
  echo "  Disk usage on $1: "
  echo "    " `df -h $1`
  echo "  Disk usage on target:"
  echo "    " `ssh $REMOTE_HOST "df -h $REMOTE_PREFIX"`
  echo "Done, end-time: `date -u`"
}

# Perform the backup for $1, which should contain the full (absolute) path.
function check_and_backup {
  path="$1"
  echo "=============================="
  echo -n "Backing up $path, status: "
  if [ "$is_volume_backup" = true ]; then
    # Volume Backup
    mount | grep -q "$path"
    RETVAL=$?
    if [ $RETVAL -eq 0 ]; then
      echo "mounted!"
      backup $path $REMOTE_HOST:$REMOTE_PREFIX
    else
      # Attempt to mount the local filesystem if found in /etc/fstab.
      echo "not mounted!"
      grep -q "$path" /etc/fstab
      RETVAL=$?
      if [ $RETVAL -eq 0 ]; then
        echo -n "Filesystem in /etc/fstab; trying to mount..."
        mount $path >/dev/null 2>&1
        RETVAL=$?
        if [ $RETVAL -eq 0 ]; then
          echo "success!"
          backup $path $REMOTE_HOST:$REMOTE_PREFIX
          echo -n "Trying to umount $path..."
          umount $path >/dev/null 2>&1
          RETVAL=$?
          if [ $RETVAL -eq 0 ]; then
            echo "success!"
          else
            echo "failed, ignoring!"
          fi
        else
          echo "failed, skipping!"
        fi
      else
        echo "Filesystem not in /etc/fstab; skipping!"
      fi
    fi
  else
    # Directory Backup
    backup $path $REMOTE_HOST:$REMOTE_PREFIX/$HOSTNAME
  fi
  echo "=============================="
}

function check_and_backup_all {
  echo ""
  echo "===== RUNNING " $0 " at " `date` " on " `hostname` "====="
  echo ""
  check_ssh_to_remote_host $REMOTE_HOST
  if [ $? -eq 0 ]; then
    echo "-> SSH to $REMOTE_HOST -> success!"
    echo
    check_target_dir_remote_host $REMOTE_HOST $REMOTE_PREFIX
    if [ $? -eq 0 ]; then
      # Consider each filesystem in turn.
      for fs in "${FSARR[@]}"; do
        if [ "$is_volume_backup" = true ]; then
          # Volume Backup
          check_and_backup "$MNT/$fs"
        else
          # Directory Backup
          check_and_backup "$fs"
        fi
      done
    else
      echo "-> Target volume $REMOTE_PREFIX not mounted and non-existent as a directory on $REMOTE_HOST, abort." >&2
    fi
  else
    echo "-> No SSH to $REMOTE_HOST, abort." >&2
  fi
  echo ""
  echo "===== FINISHED " $0 " at " `date` " on " `hostname` "====="
  echo ""
}

check_and_backup_all
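
The comments in backup_template advise dry-running the script interactively at least once. One way to support that could be an optional switch that adds rsync's --dry-run option. The sketch below is an assumption on my part and not part of the script above: DRY_RUN, RSYNC_ARGS and build_rsync_args are hypothetical names. It collects the options in an array rather than a string, and tests BWLIMIT for being non-empty instead of using -v.

```bash
#!/bin/bash

# Hypothetical helper: build the rsync option list into the global array
# RSYNC_ARGS from the optional variables BWLIMIT and DRY_RUN.
build_rsync_args() {
  RSYNC_ARGS=( -av --delete --fake-super )
  if [[ -n "$BWLIMIT" ]]; then
    RSYNC_ARGS+=( "--bwlimit=$BWLIMIT" )
  fi
  if [[ "$DRY_RUN" = true ]]; then
    # --dry-run makes rsync report what it would transfer without
    # changing anything on the destination.
    RSYNC_ARGS+=( --dry-run )
  fi
}

BWLIMIT=256
DRY_RUN=true
build_rsync_args

# Print the command instead of running it, so this sketch has no side effects.
echo rsync "${RSYNC_ARGS[@]}" /mnt/st user@host.rsync.net:backup
```

A dry run still connects to the remote host over ssh, so it also exercises the key and host-key setup described in the script's comments.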