Subject Indexing Demos

Subject Indexing Demo

Oxford History of the French Revolution

Total pages: 425
Human processing time: 7-8 days
IndexerLabs processing time: 2 hours
Human cost: $2,500-$4,000
IndexerLabs cost: $99

Note: The IndexerLabs-generated index was produced entirely by our automated pipeline. No human editing or manual correction was performed on the output before comparison.

One of the hardest parts of evaluating AI-generated subject indexes is that there is no single obvious benchmark. Even two skilled human indexers can produce different indexes for the same book, and both can remain professionally defensible. They may phrase headings differently, emphasize different themes, split or merge topics differently, or make different choices about which subjects merit an entry.

That makes subject indexing harder to evaluate by direct comparison than tasks with a single correct output. Still, comparison is possible, and we think it should be done as rigorously as possible.

For one recent backtest, we ran our subject-indexing pipeline on The Oxford History of the French Revolution, a 425-page scholarly book with an existing professional printed index. Our goal was not to ask whether our system reproduced the printed index word for word, but whether it converged on a substantial portion of the same subjects, at a comparable length, with accurate locators. In other words, we wanted to see whether our AI indexing system could produce an index equivalent to one made by a professional indexer.

Full Index Comparison

We compared the generated index and the printed human index in several ways.

First, we compared the top-level entry counts and overall lengths of the two indexes and controlled for them, since a much longer index can gain an artificial advantage simply by including more access points.

Second, we compared entry overlap. Some matches were exact string matches, while others required manual semantic adjudication. For example, slightly differently worded entries may still clearly refer to the same person or topic, such as "William V, Stadtholder of Orange" (professional index) and "William V (Prince of Orange)" (our index).
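The two-stage matching described above can be sketched in Python. This is a minimal illustration, not the tooling used for this backtest: it assumes each index is a list of heading strings, counts a match as exact only when the strings are identical, and represents manual adjudication as a hypothetical lookup table (`ADJUDICATED`) filled in by a human reviewer. The `match_entries` function and the toy headings are also hypothetical.

```python
# Hypothetical adjudication table: generated heading -> human heading that a
# reviewer judged to name the same subject.
ADJUDICATED = {
    "William V (Prince of Orange)": "William V, Stadtholder of Orange",
}


def match_entries(generated, human, adjudicated):
    """Pair generated headings with human headings.

    Returns {generated_heading: (human_heading, kind)}, where kind is
    "exact" for identical strings and "adjudicated" for reviewer-approved
    semantic equivalents. Unmatched generated headings are omitted.
    """
    human_set = set(human)
    matches = {}
    for heading in generated:
        if heading in human_set:
            matches[heading] = (heading, "exact")
        elif adjudicated.get(heading) in human_set:
            matches[heading] = (adjudicated[heading], "adjudicated")
    return matches


# Toy headings for illustration only.
generated = ["Bastille", "William V (Prince of Orange)", "Tennis Court Oath"]
human = ["Bastille", "William V, Stadtholder of Orange", "Varennes, flight to"]

for gen, (hum, kind) in match_entries(generated, human, ADJUDICATED).items():
    print(f"{gen!r} -> {hum!r} ({kind})")
```

Keeping adjudications in an explicit table makes every non-exact match auditable after the fact.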

Third, within matched entries, we compared locator overlap to see how often the generated index pointed to the same pages as the printed index.
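Locator overlap for a matched pair can be scored in several ways; one common choice, assumed here, is the Jaccard similarity of the two page sets (pages cited by both indexes divided by pages cited by either). The sketch assumes locators have already been expanded into integer page numbers, so a range such as 45-47 becomes 45, 46, 47. The helper names and page numbers are hypothetical.

```python
def locator_overlap(gen_pages, human_pages):
    """Jaccard similarity of two locator sets: shared pages / all cited pages."""
    gen, hum = set(gen_pages), set(human_pages)
    if not gen and not hum:
        return 1.0  # two empty locator lists agree trivially
    return len(gen & hum) / len(gen | hum)


def mean_locator_overlap(matches, gen_locators, human_locators):
    """Average Jaccard score over matched pairs (generated heading -> human heading)."""
    scores = [
        locator_overlap(gen_locators[g], human_locators[h])
        for g, h in matches.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data: matched headings and their expanded page numbers.
matches = {
    "Bastille": "Bastille",
    "William V (Prince of Orange)": "William V, Stadtholder of Orange",
}
gen_locators = {"Bastille": [12, 45, 46], "William V (Prince of Orange)": [88, 90]}
human_locators = {"Bastille": [12, 45, 47], "William V, Stadtholder of Orange": [88, 90]}

print(mean_locator_overlap(matches, gen_locators, human_locators))  # 0.75
```

Here "Bastille" scores 0.5 (two shared pages out of four cited) and the William V pair scores 1.0, giving a mean of 0.75.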

In this backtest, our generated index contained 1,058 top-level entries, compared with 1,066 in the printed human index. Of those 1,058 entries, 608 (about 57%) matched entries in the human index, counting both exact matches and manually adjudicated semantic equivalents.

Restricting the comparison to top-level entries, we highlighted the entries shared by the two indexes in green, as shown below:
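The counts above translate into simple overlap rates. The sketch below assumes each of the 608 matches pairs one generated entry with a distinct human entry, so the same count can be divided by either index's size; the harmonic mean of the two shares (an F1-style score) then reduces to 2 * matched / (generated + human). The `overlap_rates` helper is hypothetical.

```python
def overlap_rates(matched, generated, human):
    """Overlap shares, assuming one-to-one matches between the two indexes."""
    return {
        "share_of_generated": matched / generated,
        "share_of_human": matched / human,
        # Harmonic mean of the two shares, simplified to 2m / (g + h).
        "f1": 2 * matched / (generated + human),
    }


rates = overlap_rates(matched=608, generated=1058, human=1066)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
# share_of_generated: 57.5%
# share_of_human: 57.0%
# f1: 57.3%
```

Because the two indexes are almost the same length, all three figures land at roughly 57%, which is why controlling for length matters: a much longer generated index could raise the human-side share simply by listing more headings.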

Marked Overlap Comparison

We believe this indicates substantial agreement between the generated and human indexes; the observed overlap appears far greater than would be expected by chance.

We discuss this overlap in greater detail in our blog post.


Neomania

View source book (Open Book Publishers)

Total pages: 141
Human processing time: 2-3 days
IndexerLabs processing time: 1 hour
Human cost: $600-$700
IndexerLabs cost: $99

Notes & Analysis

Notes

The book used for this test is Neomania, by Dr. Krist Vaesen, published by Open Book Publishers.
No human intervention or editing occurred when generating the index, aside from setting the target page range.
The book's original index was removed before generation to ensure our system could not be influenced by (or inadvertently reproduce) the existing index.
The original Neomania index contains no subentries, so we disabled subentry generation to allow a fair, direct comparison. For an example with subentries, see our separate demo of this book with subentry generation enabled.

Analysis

While indexing is inherently subjective, two professional indexes of the same manuscript should still show substantial overlap because the underlying text is unchanged. That makes it reasonable to compare our AI-generated index against a professionally produced human index.

A few observations:

There is significant overlap in entries and locators.
Many entries appear in both indexes with identical phrasing and the same page references, for example: "arXiv," "Bacon, Francis," "bullshit jobs," "academic freedom," and "Daston, Lorraine." Overall, the two indexes have approximately 55% exact-match overlap (same heading phrasing and matching locators).
Some differences reflect indexing judgment and specificity.
One notable example is "coordination." The human index groups the broader theme under a single entry ("coordination"), whereas our system takes a narrower approach and indexes only the more specific concept ("Peirce-style coordination").

Looking closely at the human index locators, page 123 appears under both "coordination" and "Peirce-style coordination." But the passage on that page is explicitly about Peirce-style coordination:

... In this future, science funders incentivize researchers to engage in Peirce-style coordination, and to define and commit to a number of well-selected research programmes. Journals and universities do the same: they too change their incentive structure such that researchers are stimulated to take up research programme work...

Because the coordination discussed here is clearly the Peirce-style variant, listing page 123 under both headings may be redundant. More generally, the passages the "coordination" heading points to are overwhelmingly about Peirce-style coordination, so the separate broader heading adds little navigational value. In practice, a large share of its locators overlap with "Peirce-style coordination," making the broader entry function as a near-duplicate rather than a distinct set of occurrences.
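One way to make the near-duplicate observation measurable is to check, for each pair of headings, what fraction of one heading's locators also appear under the other. The sketch below flags pairs above a threshold; the 0.8 cutoff, the function names, and the locator sets are all hypothetical (only the shared page 123 comes from the passage above).

```python
def containment(broad, narrow):
    """Fraction of `broad`'s locators that also appear in `narrow`."""
    broad, narrow = set(broad), set(narrow)
    return len(broad & narrow) / len(broad) if broad else 0.0


def near_duplicates(locators, threshold=0.8):
    """Ordered pairs (a, b) where at least `threshold` of a's pages also sit under b."""
    pairs = []
    for a, pages_a in locators.items():
        for b, pages_b in locators.items():
            if a != b and containment(pages_a, pages_b) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical locator sets; only the shared page 123 reflects the passage above.
locators = {
    "coordination": [41, 97, 123],
    "Peirce-style coordination": [41, 97, 110, 123],
    "arXiv": [12, 30],
}

print(near_duplicates(locators))  # [('coordination', 'Peirce-style coordination')]
```

Containment is deliberately asymmetric: every page under the broad heading also appears under the narrow one (1.0), but not the reverse (0.75), which matches the intuition that the broader entry is the redundant one.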

This may therefore be less an error than a tradeoff between space and specificity (or simply a difference in indexing style). Much of the remaining 45% reflects differences in phrasing, such as "Artificial Intelligence (AI)" versus "AI (Artificial Intelligence)", along with various judgment calls.
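Phrasing variants such as "AI (Artificial Intelligence)" versus "Artificial Intelligence (AI)" can be reconciled with a normalization step before exact matching. The sketch below is a simple heuristic assumed for illustration: it treats a heading of the form "X (Y)" as the unordered pair {X, Y}, so both orderings compare equal, and it counts an exact match only when the normalized heading and the full locator set agree. The function names are hypothetical, and although some headings echo those above, the page numbers are made up.

```python
import re

# Matches headings of the form "X (Y)".
PAREN = re.compile(r"^\s*(.+?)\s*\(([^)]+)\)\s*$")


def canonical_heading(heading):
    """Heuristic key: treat "X (Y)" as the unordered pair {X, Y}, case-insensitively."""
    m = PAREN.match(heading)
    if m:
        parts = sorted(part.strip().lower() for part in m.groups())
        return " | ".join(parts)
    return heading.strip().lower()


def exact_match_rate(gen, human, key=lambda h: h):
    """Share of generated entries whose heading (under `key`) and locator set both match."""
    human_by_key = {key(h): set(pages) for h, pages in human.items()}
    hits = sum(
        1 for h, pages in gen.items() if human_by_key.get(key(h)) == set(pages)
    )
    return hits / len(gen) if gen else 0.0


# Hypothetical entries: heading -> page numbers (made up for illustration).
gen = {"AI (Artificial Intelligence)": [5, 9], "arXiv": [14], "bullshit jobs": [60]}
human = {"Artificial Intelligence (AI)": [5, 9], "arXiv": [14], "bullshit jobs": [61]}

print(exact_match_rate(gen, human))                         # 1 of 3 match as written
print(exact_match_rate(gen, human, key=canonical_heading))  # 2 of 3 after normalization
```

Sorting the two parts would also merge headings where the parenthetical is a qualifier rather than an alias (a place and its country, say), so a stricter version would apply the rule only when one part looks like an acronym.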

Indexing is not purely mechanical, and neither is our system. It has a pronounced subjective streak that we suspect is attributable in part to its extensive fine-tuning on hundreds of indexes. Our system prioritizes themes, chooses emphases, and commits to particular interpretations of what the reader will look for. That judgment can produce different tradeoffs than a conservative "include everything" approach, but it also avoids the generic, flattening style that many automated indexes tend to produce.
