Subject Indexing Demo

  • Total pages: 141
  • Human processing time: 2–3 days
  • IndexerLabs processing time: 1 hour
  • Human cost: $600–$700
  • IndexerLabs cost: $70.50
Blind test: which index is human, and which is IndexerLabs?

Index A

Index B

Notes & Analysis

Notes

  • The book used for this test is Neomania, by Dr. Krist Vaesen, published by Open Book Publishers.
  • The index was generated with no human intervention or editing, aside from setting the target page range.
  • The book’s original index was removed before generation to ensure our system could not be influenced by (or inadvertently reproduce) the existing index.
  • The original Neomania index contains no subentries, so we disabled subentry generation to enable a fair, direct comparison. (If you’d like to see an example with subentries, see our other demo for this book with subentries.)

Analysis

While indexing is inherently subjective, two professional indexes of the same manuscript should still show substantial overlap because the underlying text is unchanged. That makes it reasonable to compare our AI-generated index against a professionally produced human index.

A few observations:

  1. There is significant overlap in entries and locators.

    Many entries appear in both indexes with identical phrasing and the same page references, for example: “arXiv,” “Bacon, Francis,” “bullshit jobs,” “academic freedom,” and “Daston, Lorraine.” Overall, the two indexes have approximately 55% exact-match overlap (same heading phrasing and matching locators).

  2. Some differences reflect indexing judgment and specificity.

    One notable example is “coordination.” The human index groups the broader theme under a single entry (“coordination”), whereas our system takes a narrower approach and indexes only the more specific concept (“Peirce-style coordination”).

    Looking closely at the human index locators, page 123 appears under both “coordination” and “Peirce-style coordination.” But the passage on that page is explicitly about Peirce-style coordination:

    ... In this future, science funders incentivize researchers to engage in Peirce-style coordination, and to define and commit to a number of well-selected research programmes. Journals and universities do the same: they too change their incentive structure such that researchers are stimulated to take up research programme work...

    Because the “coordination” mentioned here is clearly the Peirce-style variant, listing page 123 under both headings is arguably redundant. More broadly, the instances that “coordination” points to are overwhelmingly Peirce-style coordination, so the broader heading adds little navigational value: most of its locators overlap with “Peirce-style coordination,” making it function as a near-duplicate rather than a distinct set of occurrences. Some of the remaining locators are also arguably low-value (for example, indexing the acknowledgements), which further weakens the case for keeping the broader heading as a standalone entry.

    In other words, this may not be an error so much as a space-versus-specificity tradeoff (or a difference in indexing style). Much of the remaining ~45% consists of phrasing differences, such as “Artificial Intelligence (AI)” versus “AI (Artificial Intelligence),” along with similar judgment calls.
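To make the overlap figure above concrete, here is a minimal sketch of one way to compute exact-match overlap between two indexes. The representation (a mapping from heading phrasing to a set of page locators) and the denominator (the union of headings) are our assumptions for illustration; the published figure may have been computed under a different convention.

```python
def exact_match_overlap(index_a, index_b):
    """Fraction of entries that appear identically in both indexes.

    Each index is a dict mapping a heading (exact phrasing) to a set
    of page locators. An entry counts as an exact match only when both
    the heading and its full locator set are identical.
    """
    matches = sum(
        1
        for heading, pages in index_a.items()
        if index_b.get(heading) == pages
    )
    # Denominator: union of all headings across both indexes.
    # This is one reasonable convention, not necessarily the one
    # used for the 55% figure in the text.
    total = len(index_a.keys() | index_b.keys())
    return matches / total if total else 0.0


human = {"arXiv": {12, 87}, "coordination": {45, 123}}
ai = {"arXiv": {12, 87}, "Peirce-style coordination": {123}}
print(exact_match_overlap(human, ai))  # 1 match over 3 headings
```

Under this convention, near-misses such as “coordination” versus “Peirce-style coordination” count as differences even when their locators largely coincide, which is exactly the kind of judgment-driven divergence discussed above.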

Indexing isn’t purely mechanical, and neither is our system. It has a pronounced subjective streak, which we suspect stems in part from its extensive fine-tuning on hundreds of indexes. Our system prioritizes themes, chooses emphases, and commits to particular interpretations of what the reader will look for. That judgment can produce different tradeoffs than a conservative “include everything” approach, but it also avoids the generic, flattening style that many automated indexes tend to produce.