six demon bag

Wind, fire, all that kind of thing!

2020-05-23

Shell Patterns (5) - Locking

This is a short series describing some Bash constructs that I frequently use in my scripts.

Sometimes you need to ensure that a script performs an operation exclusively, to avoid race conditions in case it has been launched several times in parallel. This kind of concurrency control is called mutual exclusion, or mutex for short. In Bash, a mutex can be implemented by terminating or suspending execution unless the script is able to create a lock file.

Setting the option noclobber (set -o noclobber or set -C) instructs Bash to fail when trying to overwrite an existing file via redirection operators. Since creating a file this way is an atomic operation on most operating systems (which avoids a race condition during file creation) and noclobber prevents replacing an already existing file, a script that manages to create the file can be sure that no concurrent script will be able to do the same. This ensures exclusivity until the file is removed. Use a construct like the one below and you have ensured that the critical section of your code runs exclusively. Remove the lock file after the critical section has completed to release the mutex.

lockfile='/var/lock/foo.lock'
if (set -o noclobber; >"$lockfile"); then
  # critical section of the code
  # no concurrent access to the shared resource used here should
  # occur while running this part of the code
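
  # release the mutex (only the script that created the lock file
  # may remove it, hence inside the if block)
  rm -f "$lockfile"
fi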
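
To see the noclobber behaviour in isolation, here is a minimal sketch that tries to take the same lock three times. The temporary directory and the helper name try_lock are assumptions made for this demonstration; they are not part of the construct above.

```shell
#!/usr/bin/env bash
# Demonstration only: the lock lives in a throwaway directory (an assumption
# for this sketch), not in /var/lock.
demo_dir=$(mktemp -d)
demo_lock="$demo_dir/demo.lock"

# Hypothetical helper: returns 0 if the lock file could be created.
try_lock() {
  # The subshell keeps noclobber from leaking into the rest of the script.
  # stderr is silenced to hide the "cannot overwrite existing file" message.
  (set -o noclobber; >"$demo_lock") 2>/dev/null
}

if try_lock; then first='acquired'; else first='refused'; fi
if try_lock; then second='acquired'; else second='refused'; fi

rm -f "$demo_lock"   # release the mutex
if try_lock; then third='acquired'; else third='refused'; fi

echo "first=$first second=$second third=$third"
# prints: first=acquired second=refused third=acquired

rm -rf "$demo_dir"
```

The second attempt is refused because the file already exists, and the third one succeeds only after the lock file has been removed.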

So far, so good. But what if an error occurs and the script terminates before releasing a mutex it has acquired? The script (or at least its critical section) could then never run again unless someone manually removes the lock file. Which is very bad if your script is part of an automated task, because nobody wants to get up in the middle of the night just to remove a stale lock file. Therefore you want to make sure that a mutex always gets released when your script terminates.

Releasing a mutex upon termination can be accomplished by defining a trap that will remove the lock file on exit. Do this after the lock file was successfully created and return a non-zero status code otherwise. Put both operations in a function, and you have reusable code for acquiring a mutex.

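acquire_mutex() {
  lockfile='/var/lock/foo.lock'
  # creating the lock file fails if it already exists
  if ! (set -o noclobber; >"$lockfile"); then
    return 1
  fi
  # remove the lock file on exit and preserve the exit status;
  # "$lockfile" is expanded (and properly quoted) when the trap fires
  trap 'rc=$?; rm -f -- "$lockfile"; exit "$rc"' EXIT
  return 0
}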

if acquire_mutex; then
  # critical section of the code
  :  # placeholder, Bash rejects an if body that contains only comments
fi
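
The following sketch shows the trap doing its job when the critical section fails. It is a lightly adapted copy of the function above: the lock path points into a temporary directory, and a subshell stands in for a separate run of the script. Both are assumptions made so the example can be run safely anywhere.

```shell
#!/usr/bin/env bash
# Assumption for this sketch: the lock file lives in a throwaway directory.
demo_dir=$(mktemp -d)
lockfile="$demo_dir/foo.lock"

acquire_mutex() {
  if ! (set -o noclobber; >"$lockfile") 2>/dev/null; then
    return 1
  fi
  # "$lockfile" is expanded when the trap fires, not when it is defined.
  trap 'rc=$?; rm -f -- "$lockfile"; exit "$rc"' EXIT
  return 0
}

# The subshell plays the role of one script run that dies with an error
# in the middle of its critical section.
child_rc=0
(
  if acquire_mutex; then
    exit 3   # simulated failure, the mutex is never released explicitly
  fi
) || child_rc=$?

if [ -e "$lockfile" ]; then lock_state='left behind'; else lock_state='removed'; fi
echo "exit status: $child_rc, lock file: $lock_state"
# prints: exit status: 3, lock file: removed

rm -rf "$demo_dir"
```

The EXIT trap removes the lock file and then passes the original exit status on, so the caller still sees that the run failed.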

If your code has only one critical section the above will usually suffice. However, if you have multiple critical sections, or want to release a mutex early (e.g. because you have a long-running script where most of the code is non-critical), you can define a second function for releasing the mutex.

lockfile='/var/lock/foo.lock'

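acquire_mutex() {
  # creating the lock file fails if it already exists
  if ! (set -o noclobber; >"$lockfile"); then
    return 1
  fi
  # remove the lock file on exit and preserve the exit status;
  # "$lockfile" is expanded (and properly quoted) when the trap fires
  trap 'rc=$?; rm -f -- "$lockfile"; exit "$rc"' EXIT
  return 0
}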

release_mutex() {
  if [ -n "$(trap -p EXIT)" ]; then
    rm -f "$lockfile"
    trap - EXIT
  fi
}

if acquire_mutex; then
  # critical section of the code
  release_mutex
fi

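The condition in release_mutex() checks whether an EXIT trap is installed, which acquire_mutex() only does after it has successfully created the lock file. This makes sure that a lock file created by another process doesn't get deleted by mistake, even if the function gets called before acquire_mutex(). Note that the check assumes the script doesn't define any other EXIT trap.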
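
A small sketch of both cases, assuming Bash (trap -p is not available in every shell), a lock path in a temporary directory, and a script without any other EXIT trap. The functions are lightly adapted copies of the ones above.

```shell
#!/usr/bin/env bash
# Assumption for this sketch: the lock file lives in a throwaway directory.
demo_dir=$(mktemp -d)
lockfile="$demo_dir/foo.lock"

acquire_mutex() {
  if ! (set -o noclobber; >"$lockfile") 2>/dev/null; then
    return 1
  fi
  trap 'rc=$?; rm -f -- "$lockfile"; exit "$rc"' EXIT
  return 0
}

release_mutex() {
  if [ -n "$(trap -p EXIT)" ]; then
    rm -f "$lockfile"
    trap - EXIT
  fi
}

# Case 1: the lock file belongs to "another process". No EXIT trap is set
# here, so release_mutex must leave the file alone.
: >"$lockfile"
release_mutex
if [ -e "$lockfile" ]; then foreign='kept'; else foreign='deleted'; fi
rm -f "$lockfile"   # clean up the simulated foreign lock

# Case 2: we acquired the mutex ourselves, so release_mutex removes the
# lock file and clears the EXIT trap again.
if acquire_mutex; then
  release_mutex
fi
if [ -e "$lockfile" ]; then own='left behind'; else own='removed'; fi
if [ -z "$(trap -p EXIT)" ]; then exit_trap='cleared'; else exit_trap='still set'; fi

echo "foreign lock: $foreign, own lock: $own, EXIT trap: $exit_trap"
# prints: foreign lock: kept, own lock: removed, EXIT trap: cleared

rm -rf "$demo_dir"
```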

However, with multiple critical sections in your code another problem arises: you may not want the script to fail immediately if a mutex cannot be acquired, but instead wait for the other process to release the mutex and only fail if that doesn't happen within a certain period of time. This timeout can be implemented for instance with the waitfor() function from part 4 of this series.

# try to acquire mutex every 5 seconds for 3 minutes
if waitfor 3 5 acquire_mutex; then
  # critical section of the code
  release_mutex
else
  # terminate if mutex could not be acquired within 3 minutes
  exit 1
fi
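
In case you don't have part 4 at hand, the idea can be sketched with a simple polling loop. Note that retry_until is a hypothetical stand-in and not the waitfor() function from this series: its name, its argument order and its units (seconds for both values) are assumptions made for this example. The helper try_lock and the temporary lock path are assumptions as well.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for waitfor(): run a command every INTERVAL seconds
# until it succeeds or TIMEOUT seconds have passed.
# Usage: retry_until TIMEOUT INTERVAL COMMAND [ARGS...]
retry_until() {
  local timeout=$1 interval=$2 deadline
  shift 2
  deadline=$(( $(date +%s) + timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      return 1   # gave up, the command never succeeded in time
    fi
    sleep "$interval"
  done
  return 0
}

# Assumption for this sketch: the lock file lives in a throwaway directory.
demo_dir=$(mktemp -d)
lockfile="$demo_dir/foo.lock"

# Bare lock attempt without the EXIT trap, to keep the demo free of side effects.
try_lock() { (set -o noclobber; >"$lockfile") 2>/dev/null; }

: >"$lockfile"                    # the lock is held by "someone else" ...
(sleep 1; rm -f "$lockfile") &    # ... who releases it after one second

if retry_until 5 1 try_lock; then waited='acquired'; else waited='timed out'; fi

# Now we hold the lock and nobody releases it, so a second wait has to give up.
if retry_until 2 1 try_lock; then again='acquired'; else again='timed out'; fi

wait
echo "first wait: $waited, second wait: $again"
rm -rf "$demo_dir"
```

The first wait succeeds once the background job has removed the lock file, while the second one runs into its timeout because the lock is never released.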

Posted 15:33