TL;DR: We trained an AI on 1,000+ publicly available back-of-the-book indexes. The resulting model, IndexLM-1, consistently outperforms general-purpose LLMs on subject indexing and produces accurate, elegant, publication-ready indexes comparable to professional work.
We're launching our subject indexing service, powered by IndexLM-1.
IndexLM-1 is a fine-tuned variant of Mistral 3 Large, trained on a dataset of 1,000+ real-world indexes from the open web totaling ~250 million tokens. In our evaluations, IndexLM-1 measurably outperforms general-purpose LLMs at back-of-the-book indexing and produces structurally consistent indexes that align closely with the work of experienced human indexers in both coverage and style.
Mainstream LLMs are highly capable across many tasks, but they're trained primarily for broad conversation and general reasoning, not for the craft of professional subject indexing.
Some of that gap can be bridged with careful prompting and scaffolding, but we've found that excellent indexing depends on hundreds of subjective judgment calls—decisions about what to include, what to omit, how to name concepts, and how to structure subentries. You can't reliably compress that "taste" into a prompt any more than a human can become a veteran indexer by reading a single book on indexing.
Our approach is to teach that judgment directly by training on many high-quality examples.
Although the internet likely contains millions of indexes embedded in scanned books, we started with a curated sample of ~1,000 that are publicly available across the open web.
In our own tests, IndexLM-1 shows substantial convergence with human-made indexes for the same input text, not because there's one "correct" index, but because good indexes for the same text tend to make similar decisions about metatopics, entry phrasing, and structure.
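To make "convergence" concrete, one simple way to quantify coverage overlap between two indexes is Jaccard similarity over normalized main-entry headings. This is an illustrative sketch only, not necessarily the metric used in our evaluations:

```python
def normalize(entry: str) -> str:
    """Lowercase and strip whitespace so trivial differences don't count."""
    return entry.strip().lower()

def coverage_overlap(index_a: list[str], index_b: list[str]) -> float:
    """Jaccard similarity of the two sets of main entries."""
    a = {normalize(e) for e in index_a}
    b = {normalize(e) for e in index_b}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical example entries, for illustration:
human = ["Photosynthesis", "chlorophyll", "Calvin cycle"]
model = ["photosynthesis", "Calvin cycle", "stomata"]
print(coverage_overlap(human, model))  # 2 shared / 4 total = 0.5
```

Real comparisons would also need to account for synonymous headings and subentry structure, which simple set overlap ignores.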
You can view a demo of such overlap here: indexerlabs.com/subject-demo-1
We will continue to add demos to our website over the coming weeks.
We can also control the output length of the indexes our model produces. Given an input document and a page or entry budget, IndexLM-1 gets fairly close (within ~5-10%) to the requested length.
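As a sketch of what "within ~5-10%" means in practice, the check below tests whether a generated index's entry count falls within a given tolerance of the request. The function name and defaults are hypothetical, not part of any real API:

```python
def within_budget(actual_entries: int, requested_entries: int,
                  tolerance: float = 0.10) -> bool:
    """True if the actual entry count is within `tolerance` (as a
    fraction of the request) of the requested entry count."""
    return abs(actual_entries - requested_entries) <= tolerance * requested_entries

print(within_budget(190, 200))  # |190 - 200| = 10 <= 20 -> True
print(within_budget(160, 200))  # |160 - 200| = 40 > 20 -> False
```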
Indexing is often expensive and slow. Our goal is to democratize access to professional-quality indexes by automating the workflow end-to-end without compromising on quality.
Get started today or contact support@indexerlabs.com for any questions!