
How to Migrate Between Cloud Providers Without Losing Your Mind (or Data)

A step-by-step guide to migrating your production infrastructure between cloud providers without downtime or data loss.

Alan West
Authon Team

You finally made the decision. Maybe your cloud bill doubled overnight. Maybe the provider changed their pricing model and your side project now costs more than your car payment. Whatever the reason, you're staring at a dozen VMs, three managed databases, and a DNS setup you configured two years ago at 2am — and you need to move all of it.

I recently went through this exact exercise, migrating a production stack from one major cloud provider to another. Here's what I learned about doing it without downtime disasters or data loss.

The Real Problem Isn't Moving Files

When most people think "cloud migration," they picture copying files between servers. That's maybe 10% of the actual work. The real challenges are:

  • Stateful services — databases, queues, persistent volumes
  • DNS propagation — the silent killer of migrations
  • TLS certificates — because nothing says "fun" like expired certs at 3am
  • Service discovery — everything that references hardcoded IPs or provider-specific endpoints
  • Firewall rules — that one port you opened six months ago and forgot about

Let me walk through how to handle each one.
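
Before that, it's worth flushing out the service-discovery item early, since it hides everywhere. A rough sketch of the kind of grep that surfaces hardcoded addresses and provider endpoints (the paths and the example domain are placeholders for your own setup):

bash
# Find hardcoded IPv4 addresses in app code and config.
# /srv/app and /etc/nginx are example paths; point these at your own tree.
grep -rnE '([0-9]{1,3}\.){3}[0-9]{1,3}' /srv/app /etc/nginx | grep -v '127\.0\.0\.1'

# Find references to provider-specific endpoints (example domain)
grep -rn 'old-provider.example.com' /srv/app /etc/nginx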

Step 1: Audit Everything First

Before you touch anything, document what you actually have running. I wrote a quick script to inventory my infrastructure:

bash
#!/bin/bash
# inventory.sh — snapshot your current infrastructure state

echo "=== Running Services ==="
systemctl list-units --type=service --state=running

echo "=== Listening Ports ==="
ss -tlnp

echo "=== Cron Jobs ==="
for user in $(cut -f1 -d: /etc/passwd); do
  crontab -l -u "$user" 2>/dev/null | grep -v '^#' | while read -r line; do
    echo "$user: $line"
  done
done

echo "=== Disk Usage ==="
df -h

echo "=== Docker Containers ==="
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Ports}}' 2>/dev/null

Run this on every server. Save the output. You'll thank yourself later when you're wondering why the new setup is missing that one background worker process.
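
To run it fleet-wide without logging into each box by hand, a small SSH loop works; the host list here is a placeholder for your own servers:

bash
# Run inventory.sh on every host and keep a dated report for each.
# The host list is a placeholder; substitute your actual servers.
# Note: the cron section of inventory.sh needs root to read other
# users' crontabs, so use a privileged account if you want that part.
for host in web-1 web-2 db-1 worker-1; do
  ssh "deploy@$host" 'bash -s' < inventory.sh \
    > "inventory_${host}_$(date +%Y%m%d).txt"
done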

Step 2: Make Your Infrastructure Reproducible

If your servers were set up manually (no judgment — we've all been there), now is the time to codify everything. You don't need a full Terraform setup, though that's ideal. At minimum, write a provisioning script:

bash
#!/bin/bash
# provision.sh — reproducible server setup
set -euo pipefail

# System updates
apt update && apt upgrade -y

# Essential packages
apt install -y \
  nginx \
  certbot python3-certbot-nginx \
  postgresql-client \
  docker.io docker-compose-v2 \
  fail2ban \
  ufw

# Firewall — only open what you actually need
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp   # SSH
ufw allow 80/tcp   # HTTP (for cert challenges)
ufw allow 443/tcp  # HTTPS
ufw --force enable

# Harden SSH — disable password auth
# These patterns also match the commented-out defaults shipped in sshd_config
sed -i 's/^#\?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sed -i 's/^#\?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
systemctl restart sshd

# Set up Docker permissions (assumes your 'deploy' user already exists)
usermod -aG docker deploy

The key insight: write this script against your current setup, test it on a fresh VM at your new provider, and fix every gap before you even think about migrating data.
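
Testing on a fresh VM is just a copy-and-run away. A minimal sketch, assuming your new provider gives you root SSH on the test box (the IP is a placeholder):

bash
# Push the provisioning script to a throwaway VM and run it.
# 203.0.113.50 is a placeholder for your fresh test VM.
scp provision.sh root@203.0.113.50:/root/
ssh root@203.0.113.50 'bash /root/provision.sh'

# The script disables root SSH login, so follow-up checks go
# through your regular user:
ssh deploy@203.0.113.50 'systemctl is-active nginx docker'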

Step 3: Database Migration (The Scary Part)

This is where most migrations go sideways. Here's the approach that's worked reliably for me:

For PostgreSQL:
bash
# On the source server — create a consistent dump
pg_dump -Fc -Z 6 --no-owner --no-acl \
  -h localhost -U appuser -d myapp_production \
  > myapp_$(date +%Y%m%d_%H%M%S).dump

# Transfer to new server (use compression — databases are surprisingly large)
rsync -avz --progress myapp_*.dump deploy@new-server:/tmp/

# On the destination server — restore
createdb -U postgres myapp_production
pg_restore -U postgres -d myapp_production \
  --no-owner --no-acl --jobs=4 /tmp/myapp_*.dump

# Verify row counts on critical tables
psql -U postgres -d myapp_production -c "
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
"

That --jobs=4 flag tells pg_restore to run four restore workers in parallel. On a database with lots of independent tables, this cuts restore time dramatically.

Critical tip: Don't just dump and restore once. Practice the entire cycle at least twice. Time it. My first attempt took 45 minutes; after optimizing the dump format and restore parallelism, I got it down to 12.
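
Timing the rehearsals is worth automating too. A sketch, where dump.sh and restore.sh are hypothetical wrappers around the pg_dump and pg_restore commands shown above:

bash
# rehearse.sh: time each phase so you know your real downtime window.
# dump.sh and restore.sh are hypothetical wrappers around the commands above.
{
  echo "=== Rehearsal $(date) ==="
  echo "--- dump ---";    time ./dump.sh
  echo "--- restore ---"; time ./restore.sh
} 2>&1 | tee -a rehearsal_times.log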

Step 4: The DNS Dance

DNS propagation is the reason zero-downtime migrations are hard. Here's the strategy:

  • Before migration day: Lower your TTL to 60 seconds (do this 48 hours ahead — old TTL values are cached)
  • During migration: Run both old and new servers simultaneously
  • Switch DNS: Point records to new IPs
  • Wait: Keep the old server running for at least 48 hours after the switch
  • After confirming: Raise TTL back to something reasonable (3600-86400)
bash
# Check the current record (the TTL is the second column of the answer)
dig +noall +answer myapp.example.com
dig +trace myapp.example.com | tail -5

# Verify the new server is responding correctly before switching.
# The URL must use the real hostname for --resolve (and TLS SNI) to kick in.
curl --resolve myapp.example.com:443:NEW_IP https://myapp.example.com/

That curl --resolve trick is gold. It lets you test your new server with the real hostname before DNS propagates. Catches TLS issues, routing problems, everything.
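
Once you do flip the records, you can watch propagation directly instead of guessing. A rough sketch that polls a few public resolvers until they all return the new address (NEW_IP, the hostname, and the resolver list are placeholders):

bash
# watch-dns.sh: poll public resolvers until they all see the new IP.
NEW_IP="203.0.113.10"
HOST="myapp.example.com"

while true; do
  all_done=true
  for resolver in 1.1.1.1 8.8.8.8 9.9.9.9; do
    answer=$(dig +short @"$resolver" "$HOST" | head -n1)
    echo "$resolver -> ${answer:-no answer}"
    [ "$answer" = "$NEW_IP" ] || all_done=false
  done
  "$all_done" && { echo "All resolvers see $NEW_IP"; break; }
  sleep 30
done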

Step 5: The Actual Cutover

Here's my cutover checklist, in order:

  1. Put the app in maintenance mode on the old server
  2. Take a final database dump (you need consistency, not speed here)
  3. Transfer and restore the final dump to the new server
  4. Run your verification queries — row counts, checksums on critical tables (see the sketch after this list)
  5. Start the app on the new server
  6. Test everything via curl --resolve before touching DNS
  7. Switch DNS records
  8. Monitor error rates for the next hour
  9. Keep the old server alive for 48 hours as a safety net

The total downtime with this approach? For my setup it was about 15 minutes — mostly the final database dump and restore.
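
For the verification step, one quick check is running the same row-count query against both databases and diffing the output. A minimal sketch; OLD_HOST, NEW_HOST, and the credentials are placeholders:

bash
# verify-migration.sh: compare row counts between old and new databases.
# Run ANALYZE on the new database first so its statistics are populated.
# Note: n_live_tup is a statistics-based estimate; for truly critical
# tables, follow up with exact SELECT count(*) checks.
QUERY="SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname;"

psql -h OLD_HOST -U appuser -d myapp_production -At -c "$QUERY" > /tmp/old_counts.txt
psql -h NEW_HOST -U appuser -d myapp_production -At -c "$QUERY" > /tmp/new_counts.txt

diff /tmp/old_counts.txt /tmp/new_counts.txt && echo "Row counts match"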

Things That Will Bite You

A few gotchas I ran into that aren't obvious:

  • Let's Encrypt rate limits — if you're issuing new certs on the new server, remember there are rate limits per domain. Use the staging environment for testing, then switch to production when you're ready.
  • Outbound email — new server IPs might not have established reputation. If you send transactional email directly, your deliverability might tank. Use a dedicated email service instead.
  • Timezone differences — check that your new server's timezone matches your old one, or better yet, standardize on UTC everywhere.
  • Kernel parameters — things like net.core.somaxconn or vm.swappiness that you tuned on the old server. These don't transfer with your application (one way to carry them over is sketched after this list).
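
For that last one, an easy fix is to dump the tuned values into a sysctl drop-in file and ship it to the new server. A sketch; the parameter list is an example, so substitute whatever you actually tuned:

bash
# On the old server: capture tuned values into a drop-in file.
# The parameter list is an example; substitute your own.
for param in net.core.somaxconn vm.swappiness net.ipv4.tcp_fin_timeout; do
  echo "$param = $(sysctl -n "$param")"
done > 99-migrated-tuning.conf

# Then, on the new server:
#   sudo mv 99-migrated-tuning.conf /etc/sysctl.d/
#   sudo sysctl --system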

Prevention: Make the Next Migration Easier

The best thing I did during this migration was containerize everything. My next move (whenever that happens) will be dramatically simpler because:

  • Docker Compose defines the entire application stack
  • Environment variables handle all provider-specific configuration
  • Automated backups are scripted and provider-agnostic (sketched below)
  • Infrastructure as Code means I can spin up the whole stack in minutes
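
As a concrete example of a provider-agnostic backup, here's a rough sketch that ships a nightly dump to any S3-compatible object store. The bucket, endpoint URL, and database name are assumptions to fill in; credentials come from the standard AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables:

bash
#!/bin/bash
# backup.sh: nightly dump to any S3-compatible object store.
# BUCKET and ENDPOINT are placeholders for your own values.
set -euo pipefail

BUCKET="s3://myapp-backups"
ENDPOINT="https://objects.example-provider.com"
STAMP=$(date +%Y%m%d_%H%M%S)

pg_dump -Fc -Z 6 --no-owner --no-acl -d myapp_production \
  > "/tmp/myapp_${STAMP}.dump"

# --endpoint-url is what keeps this portable across S3-compatible providers
aws s3 cp "/tmp/myapp_${STAMP}.dump" "${BUCKET}/" --endpoint-url "$ENDPOINT"

rm "/tmp/myapp_${STAMP}.dump"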

The less your application knows about where it's running, the easier it is to move. Avoid provider-specific services where a portable alternative exists — use standard PostgreSQL instead of a proprietary managed database, use S3-compatible object storage with an abstraction layer, keep your DNS in a provider that isn't your hosting provider.

Migrations are never fun, but they don't have to be terrifying. Document what you have, automate what you can, practice before you go live, and keep the old server running longer than you think you need to. Your future self will appreciate it.
