AI and Libraries, Archives, and Museums, Loosely Coupled
A new framework provides a way for cultural heritage institutions to take advantage of the technology with fewer misgivings, and to serve students, scholars, and the public better
by Dan Cohen

The relationship between AI and cultural heritage institutions has gotten off to a rocky start. In one corner are fast-moving tech companies that are gobbling up as much of the content in libraries, archives, and museums as they can get, by any means necessary, blending those institutions’ rigorously selected and validated texts and images into a training set with raw masses of tweets and Reddit threads, and surmising that their technology will replace intellectual production in part or whole in the future. In the other corner are those entrusted with carefully preserving the textual, visual, and audible forms of the human spirit from the past, who are naturally disposed to honoring the long term over the short, and who by profession describe and value idiosyncratic items in their collections rather than seeing them as cells in an undifferentiated blob. You don’t need to be a marriage counselor to see the conflict here.
And yet there are good reasons to bridge the divide between AI and libraries, archives, and museums. AI cannot access most of the materials safeguarded by these institutions, materials that could counteract AI’s still-iffy ability to produce trustworthy responses and satisfy the desire of students, researchers, and the public to examine and understand art, books, and other cultural artifacts. Cultural heritage institutions could, in turn, use better tools for searching their vast, heterogeneous collections, and AI’s ability to provide good-enough translations and to identify details within a work that are absent from general descriptive metadata could make the history of human expression more accessible and discoverable.
Given the baggage of the last few years, since the launch of ChatGPT in 2022, is such a rapprochement possible? I believe it is, but only if we stop treating all-consuming training corpora and multimodal generative outputs as the definition of “AI,” which is hard to do when the media treat each AI model release the way they once treated the unveiling of the latest iPhone. AI is, however, a constellation of approaches and technologies, not just the training and generating of the latest large language models. In the past year, a new way of associating the less generative, more research- and learning-oriented aspects of AI with the unique and important materials in cultural heritage institutions has emerged, one that holds the potential for a less extractive, more controlled, and ultimately more fruitful interaction between this new technology and cultural collections.
In November 2024, Anthropic released the Model Context Protocol (MCP), an open framework for connecting AI tools with repositories of documents, data, and software. (Disclosures: Northeastern University, where I work, is one of the launch partners for Claude in Education, and the library I oversee is working with Anthropic; as a historian and author of books, I am also, potentially, a member of the class in a class action lawsuit against Anthropic for their use of purchased and downloaded books in the training of Claude. Life is complicated.) We need not get into the weeds too deeply for the purposes of this piece, but the key point is that MCP enables a more web-like model for AI and the materials it operates upon than the train-everything, generate-everything approach of big centralized models. MCP offers a loose rather than a tight coupling of resources and software. This decentralization expands greatly upon earlier ideas for supplementing AI models with additional materials, such as Retrieval-Augmented Generation (RAG), which you may have encountered when a chatbot goes out to the web to find pertinent pages to chew on as it tries to answer your query. MCP allows any organization to provide access to its collections, in specific, configurable ways, for users of MCP-enabled AI programs. Following Anthropic’s lead, OpenAI, Google, and other AI companies have adopted MCP as a sensible, standardized framework, and many open-source packages now work with it as well.
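To make the “loose coupling” concrete: under the hood, MCP messages are JSON-RPC 2.0, and a server advertises named tools that a client can list and call. The sketch below is a schematic, hand-rolled dispatcher, not the official MCP SDK; the two-item catalog and the `search_catalog` tool are invented for illustration, though the `tools/list` and `tools/call` method names and the general response shapes follow the protocol.

```python
import json

# Hypothetical catalog; a real institution would query its own systems.
CATALOG = [
    {"id": "ms-001", "title": "Medieval Psalter", "year": 1380},
    {"id": "ph-042", "title": "Boston Harbor, albumen print", "year": 1868},
]

def handle_request(request: dict) -> dict:
    """Dispatch a JSON-RPC 2.0 request the way an MCP server would.
    Only two illustrative methods are sketched here."""
    method = request.get("method")
    if method == "tools/list":
        # Advertise what this server lets AI clients do.
        result = {"tools": [{
            "name": "search_catalog",
            "description": "Search collection metadata by keyword.",
            "inputSchema": {"type": "object",
                            "properties": {"query": {"type": "string"}}},
        }]}
    elif method == "tools/call":
        # Run the requested tool against local data and return text content.
        query = request["params"]["arguments"]["query"].lower()
        hits = [item for item in CATALOG if query in item["title"].lower()]
        result = {"content": [{"type": "text", "text": json.dumps(hits)}]}
    else:
        # Standard JSON-RPC "method not found" error.
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

The point of the shape is the decentralization: the institution’s data never leaves its server except in the specific forms its tools choose to return, and any MCP-enabled client can discover those tools without prior arrangement.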
This is good news for libraries, archives, and museums. There are more complicated ways these organizations can engage with AI, from training their own models using local collections to partnering more closely with myriad vendors of AI tools, but MCP aligns well with the work cultural heritage institutions have already done on the web, from the basic (informational sites and search and discovery tools on top of their collections) to the more advanced (protocols such as the International Image Interoperability Framework (IIIF), which shares and combines collections across institutions, and enables sophisticated forms of visual manipulation by researchers). Moreover, unlike forking over entire collections for model training—in effect ceding all functionality to a central AI model—MCP allows libraries, archives, and museums to maintain control of their digital collections and shape them for external use in whatever formats they wish, within the limits set by the organization or by stakeholders such as artists, authors, or communities.
MCP also opens up a pathway to a more fluid and responsive discovery interface for cultural collections, or multiple interfaces for different audiences, moving beyond the dated Boolean searches still common on our institutional websites. For cultural heritage practitioners who want to serve students, scholars, and the public, and who understand that many of those patrons could use AI for at least some purposes (finding an item in a collection without knowing the right keywords, grasping the meaning of opaque details in a document or artwork), it also opens new avenues for accessing collections for learning, research, and appreciation. At the same time, a cultural heritage institution running its own MCP server can configure precisely what can be accessed and how, for example, limiting AI’s access to metadata rather than to the entire object. Such a limit would help students find non-hallucinated citations and links to primary sources of interest, but would then encourage them to read or view the full text or high-resolution image themselves, on the website or in the stacks of the institution, rather than having those sources pulled, summarized, and analyzed entirely by the AI. Because MCP is an open protocol, those who are concerned about the environmental impact of Big AI or who seek complete privacy for their work could use smaller, locally run models that have the same connections to cultural heritage servers. In short, more varied and thoughtful possibilities emerge from this kind of decentralized association.
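What configuring “precisely what can be accessed” might look like in practice can be sketched as a simple server-side policy: before any record leaves the institution’s MCP server, every field the institution has not approved is stripped, so AI clients receive descriptive metadata and a link back to the collection but never the object itself. The field names and record below are invented for illustration, not a real library schema.

```python
# Fields the institution has approved for exposure to AI tools
# (hypothetical; a real policy would come from institutional governance).
PUBLIC_FIELDS = {"id", "title", "creator", "date", "description", "url"}

def apply_metadata_only_policy(record: dict) -> dict:
    """Strip everything not explicitly approved for AI access, such as
    full text or high-resolution images."""
    return {key: value for key, value in record.items()
            if key in PUBLIC_FIELDS}

record = {
    "id": "ms-001",
    "title": "Medieval Psalter",
    "creator": "Unknown scribe",
    "date": "c. 1380",
    "url": "https://example.org/collections/ms-001",
    "full_text": "(complete transcription, withheld from AI tools)",
    "hi_res_image": "(binary image data, withheld from AI tools)",
}

exposed = apply_metadata_only_policy(record)
# "full_text" and "hi_res_image" never leave the server; the AI gets
# a citable description and a URL that routes the reader to the source.
```

Because the filter runs on the institution’s side of the connection, the same server can apply different policies for different audiences or different classes of material, within whatever limits stakeholders set.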
Adding libraries, archives, and museums to the portfolio of sources available as ground-truth checks during an AI query would sustain the values—and value—of cultural heritage institutions. (In Claude, these are called “integrations”; we are working on a Northeastern University Library integration, but this same MCP server and plug-in should work for any MCP-enabled platform or tool.) Right now chatbots, at best, go out to the web to supplement information already stored in their models; imagine a world in which these bots are instead routed to services and resources provided by society’s trustworthy organizations. We can act as a firmer foundation for a technology that so far has advanced with a probabilistic, fingers-crossed approach that still produces mistakes big and small. Previously, we thought we had to give our collections wholesale to AI as part of mass training, and perhaps through open access initiatives we already have, inadvertently, done a considerable amount of that (while also helping human audiences who are unable to visit our collections in person, a positive good). But we don’t need a marriage between AI and libraries, archives, and museums. We can collaborate at arm’s length and still make considerable progress toward beneficial goals in the public interest.
Additional notes from the summer:
The latest update on our Mellon grant on AI + books is out, from our meeting with many smart people in New York City in July. My thanks to NYU for hosting.
My article “When Information Is Networked,” on the life and ideas of Clifford Lynch and what they have meant to the progress of research and scholarship, posted here in the spring, is now out in the full festschrift for Cliff. There are many other chapters in the volume worth your attention.
We are in the process of relaunching Digital Humanities Now, which I will write about soon, but if you would like a preview, subscribe (for free, as it has been since its inception 16 years ago) to get the latest on the creative use of digital methods and technology in history, art history, literature, philosophy, religion, archeology, and other fields.