Why self-hosted ebook servers choke at 150k books (and how to fix it)

Last month I helped a friend migrate his personal library to a self-hosted server. He'd been collecting ebooks for fifteen years across academic PDFs, fiction, and technical manuals. The total: roughly 150,000 files spread across an old NAS. He picked a popular open-source ebook server, pointed it at the directory, and went to bed.

The next morning, the import was still running. By the third day, the web UI was timing out on every search. By the fifth day, we gave up and started over with a different approach.

If you've ever tried to self-host a library past ~50k books, you know this story. Here's what actually breaks at scale, and what I'd do differently now.

The frustrating problem

Most self-hosted ebook stacks were designed for collections in the hundreds or low thousands: a personal Calibre library, maybe a family's shared reading list. The architecture works fine until you cross some invisible line, usually somewhere between 20k and 80k items, and then everything gets weird.

Symptoms I've seen:

Initial import takes days, sometimes silently failing partway
Web UI loads but searching for a title hangs for 30+ seconds
Metadata refreshes lock the database for hours
Memory usage balloons past available RAM, OOM killer triggers nightly
Adding a single new book triggers a full re-scan

The weird part: CPU and disk look fine. Nothing looks broken. The server just gets slower and slower until it's unusable.

Root cause: it's almost always the database

Most ebook servers ship with SQLite by default. That's a great choice for the 99% case. It's zero-config, file-based, and plenty fast for libraries up to maybe 20k items.

But SQLite itself is rarely the bottleneck people assume it is. The problem is how these apps use it. Three problems compound at scale:

1. Full-table scans on search. A lot of ebook apps don't build proper indexes on title, author, or tag columns, so with 150k rows even an exact-match or prefix lookup reads every row. A LIKE '%term%' query is worse: the leading wildcard means an ordinary index can't narrow the search, so it scans regardless.
2. Metadata refresh holds a write lock. SQLite serializes writes. A metadata sweep that touches every book holds the lock for the duration. Everything else (searches, new imports, OPDS feeds) blocks.
3. N+1 queries on listing pages. Showing 50 books per page often means 1 query for the list + 50 queries for cover art + 50 more for tags. That's 101 queries per page load; multiply by user count and you've got thousands of queries hitting the database at once.
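To make the N+1 pattern concrete, here's a minimal sketch against a throwaway in-memory database. The books_tags_link(book, tag) layout matches the table used later in this post; the tags table, the sample data, and both function names are assumptions for illustration, not any particular app's code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books_tags_link (book INTEGER, tag INTEGER);
""")
conn.executemany("INSERT INTO books (id, title) VALUES (?, ?)",
                 [(i, f"Book {i}") for i in range(1, 101)])
conn.executemany("INSERT INTO tags (id, name) VALUES (?, ?)",
                 [(1, "fiction"), (2, "scifi")])
conn.executemany("INSERT INTO books_tags_link (book, tag) VALUES (?, ?)",
                 [(i, 1 + i % 2) for i in range(1, 101)])


def tags_n_plus_one(book_ids):
    """One query per book: 50 books on a page means 50 round trips."""
    result = {}
    for book_id in book_ids:
        rows = conn.execute(
            "SELECT t.name FROM books_tags_link l "
            "JOIN tags t ON t.id = l.tag "
            "WHERE l.book = ? ORDER BY t.name",
            (book_id,),
        ).fetchall()
        result[book_id] = [r[0] for r in rows]
    return result


def tags_batched(book_ids):
    """One query for the whole page, grouped per book in Python."""
    placeholders = ",".join("?" * len(book_ids))
    rows = conn.execute(
        "SELECT l.book, t.name FROM books_tags_link l "
        "JOIN tags t ON t.id = l.tag "
        f"WHERE l.book IN ({placeholders}) ORDER BY l.book, t.name",
        book_ids,
    ).fetchall()
    result = {book_id: [] for book_id in book_ids}
    for book_id, name in rows:
        result[book_id].append(name)
    return result


page = list(range(1, 51))  # one listing page of 50 books
```

Both functions return the same mapping for a page; the batched version just gets there in one statement instead of fifty. If your app does the first thing, no amount of indexing fully hides it.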

You can verify this yourself. Most apps store their DB at a known path. Open it with the SQLite CLI and run:

sql

-- Check what indexes exist on your books table
SELECT name, sql FROM sqlite_master 
WHERE type = 'index' AND tbl_name = 'books';

-- See what a search actually does
EXPLAIN QUERY PLAN 
SELECT * FROM books WHERE title LIKE '%foundation%';

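If you see SCAN books instead of SEARCH books USING INDEX (older SQLite versions print SCAN TABLE books), the query is reading every row. One caveat: a leading-wildcard pattern like '%foundation%' scans even when an index on title exists, because an ordinary index can only help with exact matches and prefix patterns such as 'foundation%'. Run the same EXPLAIN QUERY PLAN with a prefix pattern to check whether your index is actually being used.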
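If you'd rather see the two plans side by side before poking at a real library, here's a small sketch using a throwaway in-memory database. The table layout, sample rows, and the plan() helper are assumptions for illustration; the index name matches the one used in the fix section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_sort TEXT)"
)
conn.executemany(
    "INSERT INTO books (title, author_sort) VALUES (?, ?)",
    [(f"Book {i}", f"Author {i}") for i in range(1000)],
)
# LIKE is case-insensitive by default, so the index needs NOCASE to be usable
conn.execute("CREATE INDEX idx_books_title ON books(title COLLATE NOCASE)")


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Leading wildcard: the index cannot narrow the search
contains_plan = plan("SELECT * FROM books WHERE title LIKE '%foundation%'")

# Prefix pattern: SQLite can turn this into a range lookup on the index
prefix_plan = plan("SELECT * FROM books WHERE title LIKE 'foundation%'")

print(contains_plan)
print(prefix_plan)
```

The exact wording of the plan text varies a little between SQLite versions, but the first plan should start with SCAN and the second with SEARCH and name idx_books_title.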

Step-by-step fix

Here's the approach that worked for the 150k library. It's not glamorous but it's reliable.

1. Split the library before importing

This is the single biggest win. Instead of one monolithic library, partition by category: fiction, non-fiction, academic, comics, whatever makes sense. Run a separate instance per partition, each with its own database and its own port.

bash

# Directory layout I ended up with
/srv/books/fiction/      # ~60k files, one server instance
/srv/books/academic/     # ~45k files, separate instance  
/srv/books/technical/    # ~25k files, separate instance
/srv/books/comics/       # ~20k files, separate instance

A reverse proxy in front (Caddy or nginx) makes this feel like one site. Each backend stays well under the size where things break.
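In practice Caddy or nginx does the routing in its own config, so you wouldn't write this yourself. Still, the logic is simple enough to sketch, and seeing it spelled out helps when planning URL prefixes. The ports and the backend_for() helper below are hypothetical; only the four partition names come from the layout above.

```python
# Hypothetical partition-to-backend map; the ports are placeholders.
BACKENDS = {
    "fiction": "http://127.0.0.1:8081",
    "academic": "http://127.0.0.1:8082",
    "technical": "http://127.0.0.1:8083",
    "comics": "http://127.0.0.1:8084",
}


def backend_for(path):
    """Map a request path like /fiction/opds to (backend URL, remaining path).

    Returns None when the first path segment is not a known partition.
    """
    parts = path.strip("/").split("/", 1)
    partition = parts[0]
    if partition not in BACKENDS:
        return None
    rest = "/" + parts[1] if len(parts) > 1 else "/"
    return BACKENDS[partition], rest


print(backend_for("/fiction/opds"))
print(backend_for("/manga/opds"))
```

The design choice worth noticing is that the first path segment is the only thing that picks a backend, so each instance never needs to know the others exist.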

2. Add the missing indexes manually

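If your app doesn't index search columns properly, you can add them yourself. Adjust the table and column names below to your app's schema. These indexes speed up exact-match lookups, prefix searches (LIKE 'term%'), and sorting; a leading-wildcard LIKE '%term%' still scans, which is what step 3 is for. Stop the server first, then: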

sql

-- Index the columns you actually search on
CREATE INDEX IF NOT EXISTS idx_books_title 
  ON books(title COLLATE NOCASE);

CREATE INDEX IF NOT EXISTS idx_books_author_sort 
  ON books(author_sort COLLATE NOCASE);

-- For tag/category filtering
CREATE INDEX IF NOT EXISTS idx_books_tags 
  ON books_tags_link(book, tag);

-- Rebuild statistics so the query planner picks the indexes
ANALYZE;

On one library this turned 12-second searches into 80ms searches. The indexes added maybe 40MB to the DB file.

3. Move full-text search to a dedicated engine

For anything beyond exact-match and prefix lookups, SQLite's FTS5 extension works surprisingly well and ships with most SQLite builds. Better still: offload search to something like Meilisearch or Typesense and let your ebook app handle metadata only.
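If you want to try the FTS5 route first, here's a minimal sketch using Python's sqlite3 module on an in-memory database. It assumes your Python's SQLite build has FTS5 compiled in (most do). The books_fts table name, the sample rows, and the search() helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author_sort TEXT)"
)
conn.executemany(
    "INSERT INTO books (id, title, author_sort) VALUES (?, ?, ?)",
    [
        (1, "Foundation", "Asimov, Isaac"),
        (2, "Foundation and Empire", "Asimov, Isaac"),
        (3, "Dune", "Herbert, Frank"),
    ],
)

# Hypothetical side table holding only the searchable text.
# Its rowid mirrors books.id so hits can be joined back to the real rows.
conn.execute("CREATE VIRTUAL TABLE books_fts USING fts5(title, author_sort)")
conn.execute(
    "INSERT INTO books_fts (rowid, title, author_sort) "
    "SELECT id, title, author_sort FROM books"
)


def search(term):
    """Return ids of books whose title or author contains the given word."""
    rows = conn.execute(
        "SELECT rowid FROM books_fts WHERE books_fts MATCH ? ORDER BY rank",
        (term,),
    ).fetchall()
    return [r[0] for r in rows]


print(search("foundation"))
print(search("herbert"))
```

Two things to keep in mind: the FTS table is a copy, so it has to be refreshed when metadata changes, and MATCH has its own query syntax, so raw user input with quotes or operators needs escaping before it goes in.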

A minimal Meilisearch indexing pipeline looks like:

python

import sqlite3
import meilisearch

client = meilisearch.Client('http://localhost:7700', 'masterKey')
index = client.index('books')

# Pull just the searchable fields out of your existing DB
conn = sqlite3.connect('/srv/books/metadata.db')
cursor = conn.execute('''
    SELECT id, title, author_sort, series, tags
    FROM books
''')

# Push to Meilisearch in batches. fetchmany() keeps only one batch
# in memory at a time instead of loading all 150k rows up front.
batch_size = 1000
while True:
    rows = cursor.fetchmany(batch_size)
    if not rows:
        break
    docs = [
        {'id': r[0], 'title': r[1], 'author': r[2],
         'series': r[3], 'tags': r[4]}
        for r in rows
    ]
    index.add_documents(docs)

conn.close()

Now your ebook server only handles file delivery and metadata edits. Search hits a service designed for it.

4. Throttle metadata refreshes

Don't let metadata sweeps run unbounded. Schedule them for off-hours and cap concurrency. A cron job that processes a few thousand books at a time is much friendlier than a single all-night job that locks everything.
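Here's a rough sketch of what "a few thousand at a time" can look like if you drive the refresh yourself. It assumes a books table with an integer id, and refresh_one stands in for whatever your app actually does per book; both are placeholders, not any real server's API.

```python
import sqlite3
import time


def refresh_in_batches(conn, refresh_one, batch_size=2000, pause_seconds=1.0):
    """Refresh every book in short transactions, pausing between batches.

    refresh_one(conn, book_id) is a caller-supplied callback (hypothetical).
    Returns the number of books processed.
    """
    last_id = 0
    total = 0
    while True:
        # Keyset pagination: pick up where the previous batch ended
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM books WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return total
        with conn:  # commits at the end of the batch, releasing the write lock
            for book_id in ids:
                refresh_one(conn, book_id)
        last_id = ids[-1]
        total += len(ids)
        time.sleep(pause_seconds)  # give searches and imports a turn
```

The point is that the write lock is held for one batch, not for the whole sweep. Tune batch_size and pause_seconds to taste; the defaults here are guesses.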

Prevention: things I'd do from day one

If I were starting over with a large library, I'd plan for scale upfront:

Partition early. Even if you only have 10k books today, set up category-based libraries. Migrating later is painful.
Use OPDS instead of the web UI for clients. OPDS is an Atom-based XML catalog format with built-in pagination. It scales much better than rendering thousands of book cards in HTML.
Back up the metadata DB separately from the files. The DB is small and changes constantly. The files are huge and barely change. Different backup cadences make sense.
Monitor query times, not just CPU. A 99th-percentile query time graph will warn you about index regressions before users notice.
Keep covers on a separate volume. Cover thumbnails are 80% of your I/O once the library is large. Putting them on SSD while bulk files stay on spinning disk is a cheap win.
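For the query-time monitoring point, you don't need a full metrics stack to get started. A minimal sketch, assuming you can wrap the queries you care about (both helper names are made up): time each call, keep the samples, and look at the 99th percentile.

```python
import math
import sqlite3
import time


def timed_query(conn, sql, params=(), samples=None):
    """Run a query, record its wall-clock duration in ms, return the rows."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    if samples is not None:
        samples.append((time.perf_counter() - start) * 1000.0)
    return rows


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Usage: collect timings for a repeated search, then inspect the tail
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
durations = []
for _ in range(200):
    timed_query(conn, "SELECT id FROM books WHERE title = ?", ("Dune",), durations)
print(f"p99: {percentile(durations, 99):.3f} ms")
```

Averages hide the slow queries; the tail is where a dropped index or a bloated table shows up first.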

The broader lesson is one I keep relearning: tools optimized for the median case quietly fall apart at the tails. A library of 1,500 books and a library of 150,000 books are different engineering problems, even if the software looks identical. Plan accordingly.