So you finally set up Barman to handle your PostgreSQL backups. You followed the docs, configured your server, ran barman check and... a wall of FAILED messages stares back at you. Cool. Very reassuring for your disaster recovery strategy.
I've been through this exact pain on multiple projects. Barman is genuinely excellent backup tooling for PostgreSQL, but the initial setup has several moving parts that all need to work together. Let me walk you through the most common failures and how to systematically fix each one.
The Symptom: barman check Looks Like a Crime Scene
Here's what a broken Barman setup typically looks like:
$ barman check mydb
Server mydb:
PostgreSQL: OK
is_superuser: OK
PostgreSQL streaming: FAILED
WAL archive: FAILED (no WAL file archived yet)
replication slot: FAILED (slot not found)
SSH: FAILED
backup maximum age: FAILED
compression settings: OK

Five failures. Each one blocks the next. The trick is knowing the correct order to fix them, because they're actually a dependency chain.
Root Cause 1: SSH Isn't Configured Both Ways
This catches everyone. Barman needs passwordless SSH in both directions — from the barman OS user to the postgres OS user on the database host, AND from postgres back to barman. Most people only set up one direction.
# On the Barman host, as the barman user
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
ssh-copy-id postgres@your-db-host
# On the DB host, as the postgres user
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519
ssh-copy-id barman@your-barman-host

Verify both directions actually work without a password prompt:
# From barman host
sudo -u barman ssh postgres@your-db-host "echo ok"
# From db host
sudo -u postgres ssh barman@your-barman-host "echo ok"

If either one asks for a password, your backups won't work. Period. Check your ~/.ssh permissions, because SSH is picky about this: the .ssh directory needs 700 and the authorized_keys file needs 600.
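If the permission bits are the problem, here's a quick fix, run on each host as the user that receives the connection (a small sketch; adjust if your home directories are non-standard):

# Fix SSH permission bits for the user being connected to
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
# A group- or world-writable home directory also breaks key auth
chmod go-w ~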
Root Cause 2: WAL Archiving Isn't Actually Enabled
Barman relies on receiving WAL (Write-Ahead Log) files from PostgreSQL to enable point-in-time recovery. There are two ways to get WAL to Barman, and mixing them up is a classic source of confusion.
Method 1: archive_command (push model)

PostgreSQL pushes WAL files to Barman via SSH. You need to configure this in postgresql.conf:
# postgresql.conf on the database server
archive_mode = on
archive_command = 'barman-wal-archive your-barman-host mydb %p'
# Requires the barman-cli package installed on the DB host

The gotcha here: changing archive_mode requires a full server restart, not just a reload. I've lost an embarrassing amount of time wondering why archive_command wasn't firing, only to realize archive_mode was still off because I only did pg_ctl reload.
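To confirm the archiver is actually firing after the restart, PostgreSQL's built-in pg_stat_archiver view tracks successes and failures. A quick check, assuming you can run psql as the postgres user:

# On the database server: a growing archived_count means WAL is flowing
sudo -u postgres psql -c "SELECT archived_count, last_archived_wal, failed_count FROM pg_stat_archiver;"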
Method 2: streaming (pull model)

Barman pulls WAL using PostgreSQL's streaming replication protocol. This is more reliable and my preferred approach. In your Barman server config:
# /etc/barman.d/mydb.conf
[mydb]
description = "Production DB"
conninfo = host=your-db-host user=barman dbname=postgres
streaming_conninfo = host=your-db-host user=streaming_barman
backup_method = postgres
streaming_archiver = on
slot_name = barman

You can actually run both methods simultaneously for redundancy, which is what I do in production. Belt and suspenders.
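One thing this config quietly assumes: the barman and streaming_barman roles exist on the database server, and pg_hba.conf lets them connect. A minimal sketch, with a placeholder CIDR you'd replace with your Barman host's address:

# On the database server
sudo -u postgres createuser --superuser --pwprompt barman
sudo -u postgres createuser --replication --pwprompt streaming_barman

# pg_hba.conf entries (reload PostgreSQL afterwards)
host    all            barman            10.0.0.0/24    scram-sha-256
host    replication    streaming_barman  10.0.0.0/24    scram-sha-256

Put the passwords in the barman user's ~/.pgpass on the Barman host so connections stay non-interactive. On PostgreSQL 10 and later, the defaults for max_wal_senders and max_replication_slots (both 10) are usually enough.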
Root Cause 3: The Replication Slot Doesn't Exist Yet
If you set slot_name in the config (and you should, to prevent WAL files from being recycled before Barman grabs them), you need to explicitly create it:
# Create the replication slot
barman receive-wal --create-slot mydb
# Then start the WAL receiver
barman receive-wal mydb

A warning here: if a replication slot exists but Barman isn't consuming from it, PostgreSQL will keep every WAL file forever. I've seen this fill up a production disk at 3 AM. Not fun. Monitor your replication slot lag.
You can check the slot status from PostgreSQL directly:
SELECT slot_name, active, restart_lsn,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS lag_bytes
FROM pg_replication_slots
WHERE slot_name = 'barman';

Root Cause 4: The Cron Job Is Missing
This is the sneaky one. Barman doesn't run as a daemon. It relies on barman cron being executed regularly — typically every minute — to perform WAL archiving, manage pg_receivewal processes, and enforce retention policies.
# Add to the barman user's crontab
sudo -u barman crontab -e
# Add this line:
* * * * * /usr/bin/barman cron

Without this, pg_receivewal won't start, WAL files won't be processed from the incoming directory, and old backups will pile up ignoring your retention policy. I've audited setups where everything was configured perfectly but nobody added the cron entry, so barman check kept reporting failures with no obvious cause.
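To verify the cron wiring, run one cycle by hand and confirm the WAL receiver actually comes up. (Also worth checking whether your distro package already installed an entry in /etc/cron.d/barman before adding a duplicate.)

# Run one cron cycle manually as the barman user
sudo -u barman barman cron

# The streaming WAL receiver should now be running
ps aux | grep -E '[r]eceive-wal|[p]g_receivewal'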
The Fix: A Systematic Checklist
Here's the order I follow every time I set up Barman on a new server:
1. Set up passwordless SSH in both directions between the barman and postgres users
2. Configure PostgreSQL: archive_mode, WAL level, connection permissions in pg_hba.conf
3. Write the server config in /etc/barman.d/
4. Create the replication slot: barman receive-wal --create-slot mydb
5. Add the cron entry for barman cron
6. Force a WAL switch to test archiving: barman switch-wal mydb
7. Run barman check; everything should be green now
8. Take your first backup: barman backup mydb

Prevention: Don't Wait for Disaster
Once everything is green, set up monitoring. A few things to watch:
- Run barman check in your monitoring system. It returns a non-zero exit code on failure, so it plugs into Nagios, Prometheus exporters, or a simple cron-based alerting script (see the sketch after this list)
- Set a retention policy so old backups get cleaned up automatically:
# In your server config
retention_policy = RECOVERY WINDOW OF 7 DAYS

- Test recovery regularly. A backup you've never restored is a backup you don't have. Schedule a monthly test restore to a scratch server:
# Restore latest backup to a temporary location
barman recover mydb latest /tmp/pg_restore_test \
--remote-ssh-command "ssh postgres@test-host"

- Monitor replication slot lag to catch the disk-filling scenario I mentioned earlier
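For the cron-based alerting option from the first bullet, here's a minimal sketch; the script path, mail command, and address are placeholders for whatever alerting you already have:

#!/bin/bash
# Hypothetical /usr/local/bin/barman-alert.sh, run from cron every few minutes.
# barman check exits non-zero when any check fails, so capture and forward its output.
if ! output=$(barman check mydb 2>&1); then
    echo "$output" | mail -s "Barman check failed for mydb" ops@example.com
fi

If you're on Nagios or Icinga, barman check --nagios prints a one-line, plugin-friendly summary instead.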
Wrapping Up
Barman's initial setup friction is real, but it's a one-time cost. Once it's running, it's genuinely solid tooling — I've relied on it across multiple production Postgres deployments and it's saved me more than once during actual incidents.
The key insight is that most Barman failures aren't Barman problems. They're SSH permission issues, PostgreSQL configuration oversights, or missing cron entries. Fix the foundation and Barman just works.
If barman check is still showing failures after going through all of this, the official Barman documentation is thorough and well-organized. The barman diagnose command is also your friend — it dumps the full configuration and system state into a format you can paste into a GitHub issue if you're truly stuck.
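It's a single command, and redirecting it to a file keeps the output easy to attach:

# Dump the full configuration and system state as JSON
barman diagnose > barman-diagnose.json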
