Authon Blog
debugging · 6 min read

Why Your Measurement Tools Might Be Corrupting Your Data

How measurement tools can contaminate the data they collect — lessons from microplastics research applied to software observability and benchmarking.

Alan West
Authon Team

A University of Michigan study recently made waves on Hacker News. Researchers found that the nitrile and latex gloves scientists wear while counting microplastics in samples were themselves shedding tiny particles — inflating the very counts they were trying to measure. The tools meant to keep the experiment clean were contaminating it.

I read that and immediately thought: I've seen this exact bug in production. Not with gloves and plastic particles, but with monitoring agents, logging frameworks, and data pipelines that quietly corrupt the thing they're supposed to observe.

This is the observer effect applied to data engineering, and it's more common than you'd think.

The Problem: Your Instrumentation Is Part of the Signal

Here's the pattern. You set up a system to measure something — request latency, error rates, user behavior, resource consumption. You trust the numbers. You make decisions based on them. But the measurement apparatus itself is introducing noise, bias, or outright false signals into your data.

A few real examples I've hit:

  • A logging library that allocated so much memory it triggered the garbage collection pauses we were trying to diagnose
  • An APM agent adding 12ms of latency to every request, making our p99 look terrible when the app itself was fine
  • A metrics collector sampling so aggressively it caused CPU spikes that showed up as anomalies in... the CPU metrics

The microplastics researchers had the exact same class of problem. Their protective equipment was shedding particles indistinguishable from the ones they were counting. The instrument became part of the measurement.

Root Cause: Failing to Account for Instrument Overhead

The root cause is almost always the same: we treat our observability layer as zero-cost. We assume the probe doesn't affect the patient. But it does.

Let's look at a concrete example. Say you're benchmarking a function:

python
import time
import logging

logger = logging.getLogger(__name__)

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        # This log call takes ~0.5ms each due to synchronous I/O
        logger.debug(f"Transformed item {item.id}: {result.status}")
        results.append(result)
    
    elapsed = time.perf_counter() - start
    # elapsed now includes all that logging overhead
    logger.info(f"Batch processed in {elapsed:.3f}s")
    return results

With 1,000 items, you're adding roughly 500ms of logging overhead to your measurement. Your dashboard says process_batch takes 800ms. The actual computation? 300ms. You just overestimated your processing time by roughly 2.7x because the measurement tool contaminated the result.
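The arithmetic is worth making explicit. A quick back-of-the-envelope check, using the hypothetical numbers from the example above (not measured values):

```python
items = 1_000
log_overhead_s = 0.0005    # ~0.5 ms of synchronous I/O per log call
actual_compute_s = 0.300   # cost of the real work

measured_s = actual_compute_s + items * log_overhead_s  # what the dashboard sees
inflation = measured_s / actual_compute_s

print(f"measured={measured_s:.3f}s  actual={actual_compute_s:.3f}s  inflation={inflation:.1f}x")
```

Half a millisecond per call sounds negligible until you multiply it by your batch size.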

This is exactly what happened with the gloves and microplastics — the protective layer was shedding particles that looked like the thing being measured.

Step-by-Step: How to Detect and Fix Measurement Contamination

Step 1: Establish a Baseline Without Instrumentation

The researchers' approach was methodical: measure with and without the suspected contaminant. Do the same thing with your code.

python
import time
import os
import logging

logger = logging.getLogger(__name__)

# Environment flag to run "clean room" benchmarks
CLEAN_BENCHMARK = os.getenv("CLEAN_BENCHMARK", "false") == "true"

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        if not CLEAN_BENCHMARK:
            logger.debug(f"Transformed item {item.id}: {result.status}")
        results.append(result)
    
    elapsed = time.perf_counter() - start
    return results, elapsed

Run your workload both ways. If there's a significant delta, your instrumentation is part of the signal.

Step 2: Quantify the Overhead

Don't just notice it — measure it. You need to know the magnitude.

bash
# Run with instrumentation
for i in $(seq 1 10); do
  CLEAN_BENCHMARK=false python bench.py >> with_instrumentation.txt
done

# Run without instrumentation
for i in $(seq 1 10); do
  CLEAN_BENCHMARK=true python bench.py >> without_instrumentation.txt
done

# Compare the distributions
python -c "
import statistics
with_inst = [float(l) for l in open('with_instrumentation.txt')]
without_inst = [float(l) for l in open('without_instrumentation.txt')]
print(f'With:    mean={statistics.mean(with_inst):.3f}s, stdev={statistics.stdev(with_inst):.4f}')
print(f'Without: mean={statistics.mean(without_inst):.3f}s, stdev={statistics.stdev(without_inst):.4f}')
print(f'Overhead: {statistics.mean(with_inst) - statistics.mean(without_inst):.3f}s')
"

If overhead is less than 1-2% of your measured value, it's probably fine. If it's 10%+, you have a contamination problem.
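Turning that delta into a pass/fail check is straightforward. A sketch using the rule of thumb above (the 2% budget is this article's heuristic, not a standard, and the sample timings are made up):

```python
import statistics

def contamination_verdict(with_inst, without_inst, budget_pct=2.0):
    """Compare benchmark runs with/without instrumentation and flag overhead."""
    mean_with = statistics.mean(with_inst)
    mean_without = statistics.mean(without_inst)
    overhead_pct = 100.0 * (mean_with - mean_without) / mean_without
    return overhead_pct, ("ok" if overhead_pct <= budget_pct else "contaminated")

# Hypothetical timings (seconds) from the two benchmark files
overhead, status = contamination_verdict(
    with_inst=[0.82, 0.79, 0.81, 0.80],
    without_inst=[0.31, 0.30, 0.29, 0.30],
)
print(f"overhead={overhead:.1f}%  verdict={status}")
```

Wire this into CI and the contamination check runs on every benchmark, not just the day you remember to do it.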

Step 3: Decouple Collection from the Hot Path

The fix is usually the same: move your measurement out of band.

python
import queue
import threading
import time
import logging

logger = logging.getLogger(__name__)

# Async log drain — observations go into a buffer,
# a background thread handles the I/O
log_queue = queue.Queue(maxsize=10000)

def _drain_logs():
    while True:
        msg = log_queue.get()
        if msg is None:
            break
        logger.debug(msg)

log_thread = threading.Thread(target=_drain_logs, daemon=True)
log_thread.start()

def process_batch(items):
    start = time.perf_counter()
    results = []
    for item in items:
        result = transform(item)
        # Non-blocking: just enqueue, don't wait for I/O
        try:
            log_queue.put_nowait(f"Transformed {item.id}: {result.status}")
        except queue.Full:
            pass  # drop the log rather than slow the measurement
    
    elapsed = time.perf_counter() - start
    return results, elapsed

The principle: your observation mechanism should never block or significantly slow the thing being observed.

Step 4: Subtract Known Overhead (When You Can't Eliminate It)

Sometimes you can't fully remove the instrumentation from the hot path. In that case, measure the overhead independently and subtract it — which is reportedly what the microplastics researchers recommended for handling glove contamination.

python
import time
import logging

logger = logging.getLogger(__name__)

def measure_instrumentation_overhead(sample_size=1000):
    """Measure the cost of our logging alone."""
    start = time.perf_counter()
    for i in range(sample_size):
        logger.debug(f"Calibration message {i}")
    return (time.perf_counter() - start) / sample_size

# Use this as a correction factor
per_item_overhead = measure_instrumentation_overhead()

This is essentially running a blank control, same as you'd do in a lab.
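Applying the correction factor is then a subtraction, just like subtracting a blank reading in the lab. A sketch, assuming the raw elapsed time and item count come from `process_batch` as in the earlier examples:

```python
def corrected_elapsed(raw_elapsed, n_items, per_item_overhead):
    """Subtract the calibrated per-item instrumentation cost from a raw timing."""
    corrected = raw_elapsed - n_items * per_item_overhead
    # A negative result means the calibration is noisier than the signal;
    # re-run the blank control rather than report a negative time.
    return max(corrected, 0.0)

# e.g. 0.8s raw, 1,000 items, 0.5ms of logging per item -> ~0.3s of real work
real_work = corrected_elapsed(0.8, 1_000, 0.0005)
```

One caveat: subtraction assumes the overhead is roughly constant per item, so re-run the calibration whenever the logging configuration changes.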

Prevention: Building Contamination-Resistant Pipelines

After getting burned by this a few times, here's what I do now on every project:

  • Separate the data plane from the control plane. Metrics collection should never share resources (threads, I/O channels, memory pools) with the workload being measured.
  • Run periodic "clean room" benchmarks. Automated tests that run your critical paths with all instrumentation disabled, so you always have a contamination-free baseline to compare against.
  • Set overhead budgets. We treat observability overhead like a performance budget — if any single probe adds more than 1% overhead, it gets flagged in code review.
  • Use sampling instead of exhaustive collection. You almost never need to log every event. Sample at 1-10% and extrapolate. Less instrumentation means less contamination.
  • Audit your dependencies. That APM library you pulled in? It might be wrapping every HTTP call, adding headers, buffering traces in memory. Read what it actually does.
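The sampling bullet is nearly a one-liner in practice. A minimal stdlib sketch (the 5% rate and the `maybe_log` helper are illustrative, not from any particular library):

```python
import random
import logging

logger = logging.getLogger(__name__)
SAMPLE_RATE = 0.05  # log ~5% of events; extrapolate counts by 1/SAMPLE_RATE

def maybe_log(msg):
    """Probabilistically skip the logging I/O for most events."""
    if random.random() < SAMPLE_RATE:
        logger.debug("[sampled ~1/%d] %s", round(1 / SAMPLE_RATE), msg)
        return True   # logged
    return False      # dropped — no I/O on the hot path
```

At steady state this cuts logging I/O roughly 20x while keeping event counts recoverable: multiply any sampled count by 1/SAMPLE_RATE.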

The Bigger Lesson

What I love about the microplastics study is how universal the lesson is. Whether you're counting plastic particles in ocean water or measuring request latency in a distributed system, the same principle applies: your measurement apparatus is part of the system, and ignoring that leads to wrong conclusions.

The researchers reportedly found that simply wearing gloves during sample processing could inflate microplastic counts. In software, I've seen teams spend weeks optimizing code paths that were only slow because of the profiler attached to them.

Before you trust any metric, ask yourself: how much of this signal is the thing I'm measuring, and how much is the tool I'm measuring it with?

That question alone would've saved me a lot of late nights.
