Why Your Barman Backups Keep Failing (And How to Actually Fix It)

So you finally set up Barman to handle your PostgreSQL backups. You followed the docs, configured your server, ran barman check and... a wall of FAILED messages stares back at you. Cool. Very reassuring for your disaster recovery strategy.

I've been through this exact pain on multiple projects. Barman is genuinely excellent backup tooling for PostgreSQL, but the initial setup has several moving parts that all need to work together. Let me walk you through the most common failures and how to systematically fix each one.

The Symptom: `barman check` Looks Like a Crime Scene

Here's what a broken Barman setup typically looks like:

bash

$ barman check mydb
Server mydb:
	PostgreSQL: OK
	is_superuser: OK
	PostgreSQL streaming: FAILED
	WAL archive: FAILED (no WAL file archived yet)
	replication slot: FAILED (slot not found)
	SSH: FAILED
	backup maximum age: FAILED
	compression settings: OK

Five failures. They aren't independent: several can't pass until an earlier one is fixed, so the order you tackle them in matters. They're actually a dependency chain.
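When the output gets long, it can help to pull out just the failing checks before deciding what to fix first. Here's a minimal sketch, assuming `barman check` prints one `name: STATUS` pair per line as in the sample above. `failed_checks` is a hypothetical helper name, not something Barman ships.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every check marked FAILED.
# Reads `barman check` output on stdin, one "name: STATUS" pair per line.
failed_checks() {
  sed -n 's/^[[:space:]]*\(.*\): FAILED.*$/\1/p'
}

# Typical use on the Barman host (not run here):
#   barman check mydb | failed_checks
```

It only filters text, so it is safe to run as often as you like while you work through the list.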

Root Cause 1: SSH Isn't Configured Both Ways

This catches everyone. Barman needs passwordless SSH in both directions: from the barman OS user to the postgres OS user on the database host (used for rsync backups and remote recovery), AND from postgres back to barman (used by archive_command to ship WAL). Most people only set up one direction.

bash

# On the Barman host, as the barman user
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
ssh-copy-id postgres@your-db-host

# On the DB host, as the postgres user
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
ssh-copy-id barman@your-barman-host

Verify both directions actually work without a password prompt:

bash

# From barman host
sudo -u barman ssh postgres@your-db-host "echo ok"

# From db host
sudo -u postgres ssh barman@your-barman-host "echo ok"

If either one asks for a password, your backups won't work. Period. Check ~/.ssh/authorized_keys permissions — SSH is picky about this. The .ssh directory needs 700 and the authorized_keys file needs 600.
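If you'd rather not eyeball the permissions by hand, a small check like this can do it. It's a sketch under a couple of assumptions: GNU `stat` is available (the `-c` flag differs on BSD and macOS), and `check_ssh_perms` is a hypothetical helper name of my own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify the modes recommended above under a user's home.
# Assumes GNU stat (`stat -c`); BSD/macOS stat uses a different flag.
check_ssh_perms() {
  local home_dir="$1"
  local rc=0
  local dir="$home_dir/.ssh"
  local keys="$dir/authorized_keys"

  if [ "$(stat -c '%a' "$dir" 2>/dev/null)" != "700" ]; then
    echo "$dir: expected mode 700"
    rc=1
  fi
  if [ "$(stat -c '%a' "$keys" 2>/dev/null)" != "600" ]; then
    echo "$keys: expected mode 600"
    rc=1
  fi
  return "$rc"
}

# Typical use (not run here):
#   check_ssh_perms ~postgres || echo "fix with chmod 700 / chmod 600"
```

Run it once per user on each host, since both sides have their own authorized_keys file.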

Root Cause 2: WAL Archiving Isn't Actually Enabled

Barman relies on receiving WAL (Write-Ahead Log) files from PostgreSQL to enable point-in-time recovery. There are two ways to get WAL to Barman, and mixing them up is a classic source of confusion.

Method 1: archive_command (push model)

PostgreSQL pushes WAL files to Barman via SSH. You need to configure this in postgresql.conf:

ini

# postgresql.conf on the database server
wal_level = replica
archive_mode = on
# Syntax: barman-wal-archive BARMAN_HOST SERVER_NAME WAL_PATH
archive_command = 'barman-wal-archive your-barman-host mydb %p'

# Requires barman-cli package installed on the DB host

The gotcha here: archive_mode requires a full server restart, not just a reload. I've lost an embarrassing amount of time wondering why archive_command wasn't firing, only to realize archive_mode was still off because I only did pg_ctl reload.
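PostgreSQL can tell you when a reload wasn't enough: the pg_settings view has a pending_restart column (PostgreSQL 9.5 and later). A minimal sketch of checking it from the shell follows; `settings_pending_restart` is a hypothetical helper name, and the two settings queried are just the ones relevant here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list settings that were changed in the config file
# but still need a restart. Input is "name|pending_restart" rows, the
# unaligned format produced by `psql -At`.
settings_pending_restart() {
  awk -F'|' '$2 == "t" { print $1 }'
}

# Typical use on the DB host (not run here):
#   psql -At -c "SELECT name, pending_restart FROM pg_settings
#                WHERE name IN ('archive_mode', 'wal_level')" \
#     | settings_pending_restart
```

If archive_mode shows up in the output, the change is sitting in the config file and waiting for a restart.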

Method 2: Streaming via pg_receivewal (pull model)

Barman pulls WAL using PostgreSQL's streaming replication protocol. This is more reliable and my preferred approach. In your Barman server config:

ini

# /etc/barman.d/mydb.conf
[mydb]
description = "Production DB"
conninfo = host=your-db-host user=barman dbname=postgres
streaming_conninfo = host=your-db-host user=streaming_barman
backup_method = postgres
streaming_archiver = on
slot_name = barman

You can actually run both methods simultaneously for redundancy, which is what I do in production. Belt and suspenders.

Root Cause 3: The Replication Slot Doesn't Exist Yet

If you set slot_name in the config (and you should, to prevent WAL files from being recycled before Barman grabs them), you need to explicitly create it:

bash

# Create the replication slot
barman receive-wal --create-slot mydb

# Then start the WAL receiver (this runs in the foreground;
# barman cron normally starts it in the background for you)
barman receive-wal mydb

A warning here: if a replication slot exists but Barman isn't consuming from it, PostgreSQL retains every WAL segment from the slot's restart_lsn onward, unless max_slot_wal_keep_size (PostgreSQL 13 and later) caps it. I've seen this fill up a production disk at 3 AM. Not fun. Monitor your replication slot lag.

You can check the slot status from PostgreSQL directly:

sql

SELECT slot_name, active, restart_lsn,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'barman';
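To turn that lag_bytes number into something a monitoring system can act on, a threshold check is enough. This is a sketch, not a recommendation: the 1 GiB default is an arbitrary assumption you should size against your own disk, and `slot_lag_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a lag_bytes value into a Nagios-style status.
# The 1 GiB default threshold is an arbitrary assumption; tune it to your disk.
slot_lag_status() {
  local lag_bytes="$1"
  local limit_bytes="${2:-1073741824}"

  # An empty value means the query returned no row (slot missing) or a NULL lag.
  if [ -z "$lag_bytes" ]; then
    echo "UNKNOWN: no lag value"
    return 3
  fi
  if [ "$lag_bytes" -gt "$limit_bytes" ]; then
    echo "CRITICAL: slot lag is $lag_bytes bytes"
    return 2
  fi
  echo "OK: slot lag is $lag_bytes bytes"
  return 0
}

# Typical use (not run here), feeding in the lag_bytes column from the query above:
#   slot_lag_status "$(psql -At -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots WHERE slot_name = 'barman'")"
```

The exit codes follow the usual Nagios convention (0 OK, 2 critical, 3 unknown), so the function can be dropped into most plugin-style checks.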

Root Cause 4: The Cron Job Is Missing

This is the sneaky one. Barman doesn't run as a daemon. It relies on barman cron being executed regularly — typically every minute — to perform WAL archiving, manage pg_receivewal processes, and enforce retention policies.

bash

# Add to the barman user's crontab
sudo -u barman crontab -e

# Add this line:
* * * * * /usr/bin/barman cron

Without this, pg_receivewal won't start, WAL files won't be processed from the incoming directory, and old backups will pile up ignoring your retention policy. I've audited setups where everything was configured perfectly but nobody added the cron entry. barman check just silently showed failures.
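Since the missing entry is so easy to overlook, it's worth checking for it explicitly. A minimal sketch, assuming the entry lives in the barman user's crontab; `has_barman_cron` is a hypothetical helper name, and it also accepts the `barman -q cron` form in case your entry uses the quiet flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if crontab text on stdin contains an active
# (non-commented) line that runs "barman cron" or "barman -q cron".
has_barman_cron() {
  grep -Eq '^[^#]*barman([[:space:]]+-q)?[[:space:]]+cron'
}

# Typical use on the Barman host (not run here):
#   sudo -u barman crontab -l | has_barman_cron || echo "barman cron is not scheduled"
```

Some packages schedule the job from a system-wide cron file instead of the user crontab, so a miss here means "look elsewhere" before it means "broken".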

The Fix: A Systematic Checklist

Here's the order I follow every time I set up Barman on a new server:

1. Install Barman on the backup host and barman-cli on the database host
2. Set up bidirectional SSH between the barman and postgres users
3. Configure the PostgreSQL side: archive_mode, WAL level, connection permissions in pg_hba.conf
4. Create the Barman server config in /etc/barman.d/
5. Create the replication slot: barman receive-wal --create-slot mydb
6. Set up the cron job for barman cron
7. Force a WAL switch to verify the pipeline: barman switch-wal mydb
8. Run barman check; everything should be green now
9. Take your first backup: barman backup mydb

Prevention: Don't Wait for Disaster

Once everything is green, set up monitoring. A few things to watch:

Run barman check in your monitoring system — it returns non-zero exit codes on failure, so it plugs into Nagios, Prometheus exporters, or a simple cron-based alerting script
Set a retention policy so old backups get cleaned up automatically:

ini

# In your server config
retention_policy = RECOVERY WINDOW OF 7 DAYS

Test recovery regularly. A backup you've never restored is a backup you don't have. Schedule a monthly test restore to a scratch server:

bash

# Restore latest backup to a temporary location
barman recover mydb latest /tmp/pg_restore_test \
  --remote-ssh-command "ssh postgres@test-host"

Monitor replication slot lag to catch the disk-filling scenario I mentioned earlier
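For the first item in that list, the simple cron-based variant can be as small as a wrapper around the exit status. A sketch only: `check_or_report` is a hypothetical function name, and the `printf` to stderr stands in for whatever mail or pager command you actually use.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run `barman check` and report when it fails.
# Relies only on the non-zero exit status mentioned above.
check_or_report() {
  local server="$1"
  local output
  if output=$(barman check "$server" 2>&1); then
    return 0
  fi
  # Replace this printf with your mail or pager command of choice.
  printf 'barman check failed for %s:\n%s\n' "$server" "$output" >&2
  return 1
}

# Typical use from cron on the Barman host (not run here):
#   check_or_report mydb
```

Capturing the output means the alert carries the list of failing checks, so you know where to start before you even log in.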

Wrapping Up

Barman's initial setup friction is real, but it's a one-time cost. Once it's running, it's genuinely solid tooling — I've relied on it across multiple production Postgres deployments and it's saved me more than once during actual incidents.

The key insight is that most Barman failures aren't Barman problems. They're SSH permission issues, PostgreSQL configuration oversights, or missing cron entries. Fix the foundation and Barman just works.

If barman check is still showing failures after going through all of this, the official Barman documentation is thorough and well-organized. The barman diagnose command is also your friend — it dumps the full configuration and system state into a format you can paste into a GitHub issue if you're truly stuck.
