The Index and the Vector

October 20, 2025

Converting ambiguity into precision can help a broader audience discover and learn from collections

by Dan Cohen

J. M. W. Turner, “Port Ruysdael,” exhibited 1827, Yale Center for British Art
This is the third part in a series that explores what a beneficial relationship might be between the collections of libraries, archives, and museums and the technologies associated with artificial intelligence — focusing not on AI training and generation, but on forging better pathways to human expression. If you are new to this newsletter, it might be helpful to rewind to the first two pieces, “AI and Libraries, Archives, and Museums, Loosely Coupled” and “The Library’s New Entryway.”
When I began teaching history, I was fortunate to be at a university with two marvelous art museums, and I regularly took my students to those museums to view works of art from the period we were studying. It swiftly became clear that while the students could describe and analyze the texts we were reading, they had a much harder time characterizing and explaining what was happening in the art. Much of the vocabulary they needed to engage with a particular book was right there on the page; for paintings or sculpture, they simply lacked the expressive words to capture the artwork fully. I encouraged them not to worry about abstract art concepts but to describe as richly as they could what they actually saw in front of them, to think aloud.
The result was a jumble of words, often rather prosaic, in pursuit of the style and purpose of the art. We would refine this improvised lexicon when we returned to the classroom, exchanging nickel-and-dime words for terms of greater currency. Paintings that appeared “woodsy” and “mythical” with “long-haired women,” for instance, crystallized into the “Pre-Raphaelite” movement and its associated historical context.
My students, like all novice learners of a subject or medium, lacked specialized vocabulary, and they would have had similar trouble describing and locating works of art in a digital or physical collection, because finding aids are composed in precise, specialized terms. Cultural heritage professionals helpfully provide detailed descriptive metadata for individual items, and we have sophisticated indexes of aggregate collections against which to check search terms. But few first-year students walk into a museum, or visit a museum’s website, knowing about the Pre-Raphaelite Brotherhood and the exact terms for its visual and conceptual characteristics. Indeed, many audiences for cultural works don’t have the right words at hand for what they seek: the names of artists, art movements, places (real or imagined), or time periods. They have lay descriptors instead, like “woodsy.”
Fortunately, at the heart of large language models is an alchemical technology that is extremely good at transforming leaden descriptions into golden terms that unlock discovery and understanding. Instead of representing human expression through a rigid set of terms and metadata, assembled in an index, LLMs represent words and works mathematically, as vectors in a multidimensional space. Within that opaque numerical domain, vectors can be compared with each other for proximity, which can convert ambiguity into precision. For example, the vector for the word “boxer” will be nearly identical to the vector for “pugilist”; a word like “fighter” might also be close but slightly more distant; and words like “gloves,” “ring,” or “bell” might each be only partially close to “boxer,” because each is also close to many other words (a “ring” can be a wedding ring, a “bell” a church bell). Together, however, “gloves,” “ring,” and “bell” will add up to a vector that is highly correlated with “boxing.”
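To make the idea of proximity concrete, here is a minimal sketch using hand-made, five-dimensional toy vectors. The numbers and the dimension labels are invented for illustration only; real models learn their embeddings from data, with hundreds or thousands of dimensions. Cosine similarity is one common way to measure how closely two vectors point in the same direction, and simply summing the clue vectors is a deliberate simplification of how a model combines context.

```python
import math

# Toy, hand-made vectors for illustration only (NOT real LLM embeddings).
# Hypothetical dimensions: [boxing, weddings/jewelry, churches, cold weather, aviation]
VECTORS = {
    "boxer":    [1.0, 0.0, 0.0, 0.0, 0.0],
    "pugilist": [0.95, 0.0, 0.0, 0.0, 0.05],
    "fighter":  [0.7, 0.0, 0.0, 0.0, 0.7],  # also a "fighter jet"
    "gloves":   [0.6, 0.0, 0.0, 0.8, 0.0],  # also winter gloves
    "ring":     [0.6, 0.8, 0.0, 0.0, 0.0],  # also a wedding ring
    "bell":     [0.6, 0.0, 0.8, 0.0, 0.0],  # also a church bell
    "wedding":  [0.0, 1.0, 0.0, 0.0, 0.0],
    "church":   [0.0, 0.0, 1.0, 0.0, 0.0],
    "winter":   [0.0, 0.0, 0.0, 1.0, 0.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(words):
    """Sum the vectors for `words`, then return the closest *other* word in the vocabulary."""
    query = [sum(dims) for dims in zip(*(VECTORS[w] for w in words))]
    candidates = [w for w in VECTORS if w not in words]
    return max(candidates, key=lambda w: cosine(query, VECTORS[w]))

print(round(cosine(VECTORS["boxer"], VECTORS["pugilist"]), 3))  # 0.999
print(round(cosine(VECTORS["boxer"], VECTORS["fighter"]), 3))   # 0.707
print(nearest(["ring"]))                    # wedding
print(nearest(["gloves", "ring", "bell"]))  # boxer
```

On its own, each clue drifts toward its other sense (“ring” toward “wedding,” “bell” toward “church”), but the summed vector lands nearest “boxer,” which stands in here for “boxing.”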
For this reason, an AI chatbot has an easier time than an index at digesting vague, roundabout queries that begin with phrases such as, “You know that art that often has a long-haired woman in the woods and it’s got mythical vibes…” The standard museum or library index, sitting behind the search box, will desperately try to parse this phrase for words it can match against its keyword list, and will likely fail; the LLM instead tosses all of the words into its vector space, where together they can invoke the implicit, rather than explicit, meaning of the query.
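The contrast between the two approaches can be sketched with a tiny, hypothetical catalog. Everything here is invented for illustration: the subject index, the item IDs, and the toy word vectors with their dimension labels. Summing word vectors is a crude stand-in for what an LLM does with a whole query. Note that the vector side does not replace the index. It routes a vague query to an index term, and the index then supplies the items.

```python
import math
import re

# Hypothetical subject index: controlled terms -> item IDs (IDs invented for illustration).
SUBJECT_INDEX = {
    "pre-raphaelite": ["item-001", "item-002"],
    "cubism": ["item-003"],
    "impressionism": ["item-004"],
}

# Toy, hand-made vectors (NOT real embeddings). Hypothetical dimensions:
# [nature, myth/legend, female figures, geometric fragmentation, light effects]
WORD_VECTORS = {
    "long-haired": [0.1, 0.4, 0.9, 0.0, 0.1],
    "woman":       [0.0, 0.2, 1.0, 0.0, 0.2],
    "woods":       [1.0, 0.3, 0.0, 0.0, 0.3],
    "mythical":    [0.2, 1.0, 0.1, 0.0, 0.0],
}
TERM_VECTORS = {
    "pre-raphaelite": [0.6, 0.8, 0.8, 0.0, 0.2],
    "cubism":         [0.0, 0.0, 0.3, 1.0, 0.0],
    "impressionism":  [0.8, 0.0, 0.3, 0.0, 1.0],
}

def tokenize(text):
    """Lowercase and split into words, keeping hyphens and apostrophes inside words."""
    return re.findall(r"[a-z'-]+", text.lower())

def keyword_search(query):
    """Index approach: return items only when an index term appears verbatim in the query."""
    tokens = set(tokenize(query))
    return [item for term, items in SUBJECT_INDEX.items() if term in tokens for item in items]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def vector_search(query):
    """Vector approach: sum the vectors of all known words, then pick the nearest index term."""
    known = [WORD_VECTORS[t] for t in tokenize(query) if t in WORD_VECTORS]
    if not known:
        return None
    q = [sum(dims) for dims in zip(*known)]
    best = max(TERM_VECTORS, key=lambda term: cosine(q, TERM_VECTORS[term]))
    return best, SUBJECT_INDEX[best]

query = "You know that art that often has a long-haired woman in the woods and it's got mythical vibes"
print(keyword_search(query))  # []
print(vector_search(query))   # ('pre-raphaelite', ['item-001', 'item-002'])
```

With these toy numbers, the single word “woods” would point toward impressionism instead. It is the combination of clues that settles on the Pre-Raphaelites.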
It is this understanding of ambiguous words and goals — which can be refined further in the iterative process of conversation — that can serve as a helpful interface for cultural collections, which by their very nature are large, complex, and heterogeneous. Robust indexes of such collections may have parallel descriptions of items, including synonyms and variant spellings, but vectors are far more flexible at figuring out exactly which topics and creators a constellation of words might be orbiting around. That’s an excellent first step in pointing a novice toward the forms of human expression they seek, and setting them on the pathway to expertise.
Speaking of art: since the last newsletter, as part of our MCP (Model Context Protocol) initiative at the Northeastern University Library, I have begun adding art museum MCP connectors to my custom rig in Claude, including connectors to the collections of the Met and the Art Institute of Chicago. Combined with the article databases our team has attached to Claude through our MCP server, my test setup is now comprehensive enough that I can turn off web-based retrieval-augmented generation (RAG) entirely and rely solely on library- and museum-augmented responses. The results are very promising. When I ask about “Cubism,” for instance, instead of regurgitating web content, Claude returns a good array of articles on Cubism, along with representative artworks that museums have digitized and related curatorial text.
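The piece does not say how the museum connectors are built. As a hypothetical sketch of the kind of lookup such a connector might perform, the functions below query the public collection APIs of the Met and the Art Institute of Chicago using only the Python standard library. The function names, the result limit, and the choice of returned fields are assumptions, and the MCP wiring itself is omitted.

```python
import json
import urllib.parse
import urllib.request

# Public endpoints of the Met Collection API and the Art Institute of Chicago API.
MET_BASE = "https://collectionapi.metmuseum.org/public/collection/v1"
AIC_SEARCH = "https://api.artic.edu/api/v1/artworks/search"

def met_search_url(term):
    """Met keyword search; the response lists matching object IDs only."""
    return f"{MET_BASE}/search?" + urllib.parse.urlencode({"q": term, "hasImages": "true"})

def met_object_url(object_id):
    """Met object record (title, artist, date, image links) for one object ID."""
    return f"{MET_BASE}/objects/{object_id}"

def aic_search_url(term, limit=5):
    """AIC full-text search, requesting only a few display fields."""
    params = {"q": term, "limit": limit, "fields": "id,title,artist_display,date_display"}
    return f"{AIC_SEARCH}?" + urllib.parse.urlencode(params)

def fetch_json(url):
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def search_collections(term, limit=5):
    """Hypothetical connector tool: query both museums and return compact records."""
    met_ids = (fetch_json(met_search_url(term)).get("objectIDs") or [])[:limit]
    met = []
    for object_id in met_ids:  # the Met search returns IDs, so fetch each record
        obj = fetch_json(met_object_url(object_id))
        met.append({"title": obj.get("title"),
                    "artist": obj.get("artistDisplayName"),
                    "date": obj.get("objectDate")})
    aic = [
        {"title": a.get("title"), "artist": a.get("artist_display"), "date": a.get("date_display")}
        for a in fetch_json(aic_search_url(term, limit)).get("data", [])
    ]
    return {"met": met, "aic": aic}
```

Calling `search_collections("Cubism")` requires network access. In an MCP setup, a function like this would be exposed as a tool that the model can call, and its compact, institution-sourced records are what the model would draw on instead of general web results.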

Read more:

August 18, 2025
AI and Libraries, Archives, and Museums, Loosely Coupled
A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better

October 10, 2025
The Library’s New Entryway
An interface that combines the advantages of the traditional index with the power of LLMs is the path forward