You built a flight status dashboard. It looks great. Users love the UI. Then someone tweets a screenshot showing your app says their flight is "on time" while they're literally sitting on a delayed plane at JFK. Cool.
I've been there. Twice. The problem isn't your frontend, your caching layer, or your websocket implementation. It's that real-time airport and flight data is significantly harder to get right than most developers expect.
Let me walk through why this happens and how to build something that actually reflects reality.
The Root Cause: FAA Data Is Messier Than You Think
Most flight tracking projects start by pulling from the FAA's public data sources — things like the Airport Status API or SWIM (System Wide Information Management). The assumption is: government data source = authoritative = accurate.
Not quite.
FAA delay data has a few gotchas that will burn you:
- Ground Delay Programs (GDP) are reported at the program level, not per-flight. Your flight might be delayed 45 minutes due to a GDP, but the API won't tell you that directly.
- Status updates lag by 5-20 minutes depending on the data source and time of day. During peak hours at busy airports, the lag gets worse.
- Cancellations sometimes appear as delays first. A flight might show as "delayed 3 hours" before flipping to "cancelled" — and your app showed stale optimism the whole time.
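One defensive pattern that falls out of these quirks: normalize raw statuses and treat very long delays as a possible cancellation in disguise. Here's a minimal sketch — the function name and the 180-minute threshold are my own assumptions for illustration, not FAA semantics:

```python
def normalize_status(raw_status: str, delay_minutes: int,
                     cancel_threshold: int = 180) -> dict:
    """Map a raw upstream status into something safer to display.

    The 180-minute cutoff is an invented heuristic, not an FAA rule.
    """
    status = raw_status.strip().lower()
    if status == "cancelled":
        return {"status": "cancelled", "delay_minutes": None}
    if delay_minutes >= cancel_threshold:
        # very long delays often precede a cancellation, so surface
        # that possibility instead of showing stale optimism
        return {"status": "delayed_possible_cancellation",
                "delay_minutes": delay_minutes}
    if delay_minutes > 0:
        return {"status": "delayed", "delay_minutes": delay_minutes}
    return {"status": "on_time", "delay_minutes": 0}
```

Even if the upstream feed never says "cancelled" until the last minute, your UI can at least warn that a three-hour delay may not end in a departure.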
Step 1: Don't Rely on a Single Data Source
The first fix is triangulating across multiple feeds. Here's a basic architecture I've used:
```python
import asyncio
import aiohttp
from datetime import datetime, timedelta


class FlightStatusAggregator:
    def __init__(self):
        self.sources = [
            FAAAirportStatusSource(),
            ADSBExchangeSource(),   # ADS-B radio signals from actual aircraft
            AirlineAPISource(),     # some airlines expose semi-public APIs
        ]

    async def get_delay_info(self, airport_code: str) -> dict:
        # fetch from all sources concurrently
        tasks = [source.fetch(airport_code) for source in self.sources]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        # filter out failed sources — don't let one bad API kill your data
        valid = [r for r in results if not isinstance(r, Exception)]
        if not valid:
            return {"status": "unknown", "confidence": 0.0}
        return self._reconcile(valid)

    def _reconcile(self, reports: list) -> dict:
        delays = [r.get("delay_minutes", 0) for r in reports]
        # take the WORST reported status — optimistic defaults burn users
        delay_minutes = max(delays)
        # confidence drops when sources disagree significantly
        spread = max(delays) - min(delays)
        confidence = max(0.3, 1.0 - (spread / 120))  # normalize against 2hr spread
        return {
            "delay_minutes": delay_minutes,
            "confidence": round(confidence, 2),
            "source_count": len(reports),
            "timestamp": datetime.utcnow().isoformat(),
        }
```

The key insight: always bias toward the worst-case report. Users will forgive you for saying a flight is delayed when it's actually on time. They will not forgive the opposite.
Step 2: Use ADS-B Data as Ground Truth
ADS-B (Automatic Dependent Surveillance-Broadcast) is the radio signal that aircraft transmit with their position, altitude, and speed. Open-source projects like dump1090 and networks like ADS-B Exchange aggregate this from hobbyist receivers worldwide.
This is the closest thing to ground truth you'll get without working at an airline.
```python
async def check_adsb_departure_status(flight_icao: str,
                                      scheduled_departure: datetime) -> dict:
    """Check if a flight has actually departed by looking at ADS-B signals."""
    async with aiohttp.ClientSession() as session:
        # query an ADS-B aggregator for recent positions
        async with session.get(
            f"https://your-adsb-source/api/aircraft/{flight_icao}"
        ) as resp:
            data = await resp.json()

    if not data.get("positions"):
        # no ADS-B signal — plane is likely still on the ground
        if datetime.utcnow() > scheduled_departure + timedelta(minutes=15):
            return {"status": "likely_delayed", "airborne": False}
        return {"status": "waiting", "airborne": False}

    latest = data["positions"][-1]
    # altitude check — taxiing planes sit near field elevation, so anything
    # above ~2000ft barometric is safely airborne
    if latest["alt_baro"] > 2000:
        return {"status": "departed", "airborne": True}
    return {"status": "taxiing", "airborne": False}
```

This approach lets you catch a common failure mode: the official API says "departed" but the plane is actually still taxiing. That 20-minute gap matters to people waiting at the arrival gate.
Step 3: Cache Smart, Not Hard
The instinct is to cache aggressively because you're hitting rate-limited APIs. But stale cache is exactly how you end up showing "on time" for a delayed flight.
Here's what actually works:
```python
import time


class AdaptiveTTLCache:
    """Cache that shortens TTL when delays are detected."""

    def __init__(self, default_ttl=300):  # 5 min default
        self.store = {}
        self.default_ttl = default_ttl

    def get(self, key: str):
        if key not in self.store:
            return None
        entry = self.store[key]
        if time.time() > entry["expires_at"]:
            del self.store[key]
            return None
        return entry["value"]

    def set(self, key: str, value: dict):
        # if there's an active delay, cache for much less time —
        # delays change fast, a 30min delay can become 2hrs quickly
        if value.get("delay_minutes", 0) > 0:
            ttl = 60   # 1 minute when delays are active
        elif value.get("confidence", 1.0) < 0.7:
            ttl = 90   # sources disagree, check again soon
        else:
            ttl = self.default_ttl
        self.store[key] = {
            "value": value,
            "expires_at": time.time() + ttl,
        }
```

The idea is simple: when things are normal, cache longer. When things are disrupted, cache shorter. This keeps your API costs reasonable on calm days while staying responsive during weather events when accuracy matters most.
Step 4: Show Your Uncertainty
This is the one most developers skip. If your confidence score is low — say, your sources disagree or you're only getting data from one feed — tell the user.
Don't show a confident green "On Time" badge when you're actually guessing. A simple "Last updated 12 min ago — status may have changed" goes a long way. Users can handle uncertainty. What they can't handle is false confidence.
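One way to wire that into the frontend is a single function that downgrades the badge text whenever the data is stale or the sources disagree. This is a hypothetical helper — the name, the 10-minute staleness cutoff, and the 0.7 confidence floor are all illustrative choices, not anything from a real API:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def status_badge(status: str, confidence: float,
                 last_updated: datetime,
                 now: Optional[datetime] = None) -> str:
    """Render a status label that admits uncertainty when warranted."""
    now = now or datetime.now(timezone.utc)
    age_min = int((now - last_updated).total_seconds() // 60)
    if age_min >= 10:
        # stale data: say so, rather than implying it's current
        return f"{status} (last updated {age_min} min ago, may have changed)"
    if confidence < 0.7:
        # sources disagree: show the status but flag it as unconfirmed
        return f"{status} (unconfirmed, sources disagree)"
    return status
```

The point isn't the exact thresholds; it's that the uncertainty your aggregator already computes should survive all the way to the pixels the user sees.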
Prevention: Monitor the Monitors
Set up alerts for when your data sources go stale:
- Track the timestamp of the last successful fetch per source
- Alert if any source hasn't returned fresh data in 2x its normal interval
- Log the disagreement rate between sources — if it spikes, something is wrong
- Monitor your cache hit rate during known disruption events (storms, ATC issues) to make sure your adaptive TTL is actually kicking in
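The first two bullets can be sketched as a small watchdog. Class and method names here are my own; the only rule it encodes is the one above — alert when a source hasn't delivered fresh data in 2x its normal interval:

```python
import time


class StalenessMonitor:
    """Track per-source fetch times and flag sources that go quiet."""

    def __init__(self):
        self.last_fetch = {}       # source name -> unix timestamp
        self.normal_interval = {}  # source name -> expected seconds between fetches

    def record_fetch(self, source: str, interval_s: float, when: float = None):
        # call this on every successful fetch from a source
        self.last_fetch[source] = when if when is not None else time.time()
        self.normal_interval[source] = interval_s

    def stale_sources(self, now: float = None) -> list:
        # a source is stale once 2x its normal interval passes with no data
        now = now if now is not None else time.time()
        return [
            name for name, ts in self.last_fetch.items()
            if now - ts > 2 * self.normal_interval[name]
        ]
```

Run `stale_sources()` on a timer and page yourself on any non-empty result — a quiet feed is indistinguishable from an "everything's on time" feed unless you're watching for it.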
The Bigger Lesson
Real-time data systems are hard not because of the "real-time" part — websockets and streaming are well-solved problems. They're hard because the upstream data is unreliable, inconsistent, and delayed in ways that aren't documented.
The fix is never just "poll faster." It's building a reconciliation layer that treats every data source as potentially wrong, biases toward the worst case for user-facing status, and is honest about its own uncertainty.
That's not just a flight tracking lesson. That's a distributed systems lesson.
