A University of Michigan study recently made waves on Hacker News. Researchers found that the nitrile and latex gloves scientists wear while counting microplastics in samples were themselves shedding tiny particles — inflating the very counts they were trying to measure. The tools meant to keep the experiment clean were contaminating it.
I read that and immediately thought: I've seen this exact bug in production. Not with gloves and plastic particles, but with monitoring agents, logging frameworks, and data pipelines that quietly corrupt the thing they're supposed to observe.
This is the observer effect applied to data engineering, and it's more common than you'd think.
The Problem: Your Instrumentation Is Part of the Signal
Here's the pattern. You set up a system to measure something — request latency, error rates, user behavior, resource consumption. You trust the numbers. You make decisions based on them. But the measurement apparatus itself is introducing noise, bias, or outright false signals into your data.
A few real examples I've hit:
- A logging library that allocated so much memory it triggered the garbage collection pauses we were trying to diagnose
- An APM agent adding 12ms of latency to every request, making our p99 look terrible when the app itself was fine
- A metrics collector sampling so aggressively it caused CPU spikes that showed up as anomalies in... the CPU metrics
The microplastics researchers had the exact same class of problem. Their protective equipment was shedding particles indistinguishable from the ones they were counting. The instrument became part of the measurement.
Root Cause: Failing to Account for Instrument Overhead
The root cause is almost always the same: we treat our observability layer as zero-cost. We assume the probe doesn't affect the patient. But it does.
Let's look at a concrete example. Say you're benchmarking a function:
```python
import time
import logging

logger = logging.getLogger(__name__)

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        # This log call takes ~0.5ms each due to synchronous I/O
        logger.debug(f"Transformed item {item.id}: {result.status}")
        results.append(result)
    elapsed = time.perf_counter() - start
    # elapsed now includes all that logging overhead
    logger.info(f"Batch processed in {elapsed:.3f}s")
    return results
```

With 1,000 items, you're adding roughly 500ms of logging overhead to your measurement. Your dashboard says `process_batch` takes 800ms. The actual computation? 300ms. You just overestimated your processing time by nearly 2.7x because the measurement tool contaminated the result.
This is exactly what happened with the gloves and microplastics — the protective layer was shedding particles that looked like the thing being measured.
Step-by-Step: How to Detect and Fix Measurement Contamination
Step 1: Establish a Baseline Without Instrumentation
The researchers' approach was methodical: measure with and without the suspected contaminant. Do the same thing with your code.
```python
import time
import os
import logging

logger = logging.getLogger(__name__)

# Environment flag to run "clean room" benchmarks
CLEAN_BENCHMARK = os.getenv("CLEAN_BENCHMARK", "false") == "true"

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        if not CLEAN_BENCHMARK:
            logger.debug(f"Transformed item {item.id}: {result.status}")
        results.append(result)
    elapsed = time.perf_counter() - start
    return results, elapsed
```

Run your workload both ways. If there's a significant delta, your instrumentation is part of the signal.
Step 2: Quantify the Overhead
Don't just notice it — measure it. You need to know the magnitude.
```bash
# Run with instrumentation
for i in $(seq 1 10); do
  CLEAN_BENCHMARK=false python bench.py >> with_instrumentation.txt
done

# Run without instrumentation
for i in $(seq 1 10); do
  CLEAN_BENCHMARK=true python bench.py >> without_instrumentation.txt
done

# Compare the distributions
python -c "
import statistics
with_inst = [float(l) for l in open('with_instrumentation.txt')]
without_inst = [float(l) for l in open('without_instrumentation.txt')]
print(f'With: mean={statistics.mean(with_inst):.3f}s, stdev={statistics.stdev(with_inst):.4f}')
print(f'Without: mean={statistics.mean(without_inst):.3f}s, stdev={statistics.stdev(without_inst):.4f}')
print(f'Overhead: {statistics.mean(with_inst) - statistics.mean(without_inst):.3f}s')
"
```

If the overhead is less than 1-2% of your measured value, it's probably fine. If it's 10% or more, you have a contamination problem.
Step 3: Decouple Collection from the Hot Path
The fix is usually the same: move your measurement out of band.
```python
import queue
import threading
import time
import logging

logger = logging.getLogger(__name__)

# Async log drain: observations go into a buffer,
# and a background thread handles the I/O
log_queue = queue.Queue(maxsize=10000)

def _drain_logs():
    while True:
        msg = log_queue.get()
        if msg is None:  # sentinel to shut the drain down
            break
        logger.debug(msg)

log_thread = threading.Thread(target=_drain_logs, daemon=True)
log_thread.start()

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        # Non-blocking: just enqueue, don't wait for I/O
        try:
            log_queue.put_nowait(f"Transformed {item.id}: {result.status}")
        except queue.Full:
            pass  # drop the log rather than slow the measurement
        results.append(result)
    elapsed = time.perf_counter() - start
    return results, elapsed
```

The principle: your observation mechanism should never block or significantly slow the thing being observed. (Python's standard library ships this pattern as `logging.handlers.QueueHandler` and `QueueListener`, if you'd rather not hand-roll the drain thread.)
Step 4: Subtract Known Overhead (When You Can't Eliminate It)
Sometimes you can't fully remove the instrumentation from the hot path. In that case, measure the overhead independently and subtract it — which is reportedly what the microplastics researchers recommended for handling glove contamination.
```python
import time
import logging

logger = logging.getLogger(__name__)

def measure_instrumentation_overhead(sample_size=1000):
    """Measure the cost of our logging alone."""
    start = time.perf_counter()
    for i in range(sample_size):
        logger.debug(f"Calibration message {i}")
    return (time.perf_counter() - start) / sample_size

# Use this as a correction factor
per_item_overhead = measure_instrumentation_overhead()
```

This is essentially running a blank control, same as you'd do in a lab.
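Applying the correction factor might look like the sketch below. `corrected_duration` is a hypothetical helper, and it assumes the overhead scales linearly with item count, which holds for a fixed-cost synchronous log call per item but should be re-checked if your logger buffers or batches:

```python
def corrected_duration(measured_elapsed, n_items, per_item_overhead):
    """Subtract calibrated instrumentation cost from a measured duration.

    Assumes one log call per item at a fixed per-call cost
    (the linearity assumption; re-verify for buffering loggers).
    """
    correction = n_items * per_item_overhead
    # Clamp at zero so a noisy calibration never yields a negative duration
    return max(measured_elapsed - correction, 0.0)

# e.g. 0.8s measured, 1,000 items, 0.5ms of logging per item
# leaves roughly 0.3s of actual computation
```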
Prevention: Building Contamination-Resistant Pipelines
After getting burned by this a few times, here's what I do now on every project:
- Separate the data plane from the control plane. Metrics collection should never share resources (threads, I/O channels, memory pools) with the workload being measured.
- Run periodic "clean room" benchmarks. Automated tests that run your critical paths with all instrumentation disabled, so you always have a contamination-free baseline to compare against.
- Set overhead budgets. We treat observability overhead like a performance budget — if any single probe adds more than 1% overhead, it gets flagged in code review.
- Use sampling instead of exhaustive collection. You almost never need to log every event. Sample at 1-10% and extrapolate. Less instrumentation means less contamination.
- Audit your dependencies. That APM library you pulled in? It might be wrapping every HTTP call, adding headers, buffering traces in memory. Read what it actually does.
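The sampling point from the list above fits in a few lines. `SAMPLE_RATE` and `maybe_log` are illustrative names, not from any particular library:

```python
import random

SAMPLE_RATE = 0.01  # log roughly 1% of events (tune per workload)

def maybe_log(logger, msg):
    """Probabilistic logging: a statistically useful trace
    at about 1/100th of the instrumentation cost."""
    if random.random() < SAMPLE_RATE:
        logger.debug(msg)

def extrapolate(sampled_count, rate=SAMPLE_RATE):
    """Estimate the true event count from a sampled count."""
    return sampled_count / rate
```

The extrapolation step matters: a dashboard fed sampled counts must multiply by the inverse sampling rate, or every chart will understate reality by 100x.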
The Bigger Lesson
What I love about the microplastics study is how universal the lesson is. Whether you're counting plastic particles in ocean water or measuring request latency in a distributed system, the same principle applies: your measurement apparatus is part of the system, and ignoring that leads to wrong conclusions.
The researchers reportedly found that simply wearing gloves during sample processing could inflate microplastic counts. In software, I've seen teams spend weeks optimizing code paths that were only slow because of the profiler attached to them.
Before you trust any metric, ask yourself: how much of this signal is the thing I'm measuring, and how much is the tool I'm measuring it with?
That question alone would've saved me a lot of late nights.
