August 2021

Advisory Committee on Historical Diplomatic Documentation August 30, 2021

Note: The minutes from this meeting will be posted after they have been approved at the next meeting. In the meantime, a video edition and transcript of the lecture presented by Dr. Joseph Wicentowski during the public session is provided here. An essay on this topic can be found in H-Diplo Forum 2021-2, “Scholars and Digital Archives: Living the Dream?” (6 October 2021).

Report on the Origins and Evolution of the Foreign Relations Digital Edition (Video and transcript)

Welcome everyone, and thank you for attending this presentation. I’m pleased to see so many here who have contributed to the initiatives I’ll be discussing, and I’d like to thank Richard Immerman and all of the members of the HAC for your support of these efforts.

Today’s presentation has five parts. I’ll begin by tracing the origins of today’s FRUS digital edition by examining several earlier efforts. Then I’ll describe the goals we had and the hurdles we faced in designing and creating history.state.gov and its new interface to FRUS in 2007–2008. Then I’ll describe the launch of history.state.gov in 2009, followed by our expansion of FRUS coverage and other improvements, notably search, and I’ll discuss some future improvements that we’re exploring. Finally I’ll show how to access the open data repositories that contain all of the Office’s publications and datasets and discuss the possibilities this opens up for research that go beyond the capabilities of our website.

Before the launch of history.state.gov in 2009, the Office had carried out several early online publishing initiatives. These began in the mid-1990s, with DOSFAN, the Department of State Foreign Affairs Network collection, hosted by the University of Illinois at Chicago. This effort was followed by two major incarnations of state.gov in the late 1990s and early aughts. Taken together, these 3 efforts put 88 volumes online covering the post-WWII era but centering on the Kennedy, Johnson, Nixon, and Ford administrations. Separate from the Office’s efforts, in the early aughts the University of Wisconsin Digital Collections Center carried out an ambitious project to offer access to earlier volumes in the series. The University scanned 375 volumes covering the series’ first 99 years in print. These were the resources that the Office reviewed in 2007 when we began planning the FRUS digital edition that we see today on history.state.gov. Let’s take a look at each of these sites and note their evolution and distinct qualities.

This is the first FRUS volume posted online to DOSFAN: the Eisenhower Eastern Europe volume, published in 1993. A second was published in 1995. The DOSFAN volumes established a model for publishing FRUS online that endured for over a decade. It presented FRUS documents, a chapter at a time, on long webpages, using very minimal markup. Chapter headings, document titles, body text, and footnotes were all encoded as paragraphs. Footnotes received special treatment. They were demarcated by a pair of slash symbols signaling a footnote reference inside the text. Footnote text appeared in the following paragraph, using the same double slash symbol. This presentation was simple but effective, a reasonable use of 1990s-era HTML. This site lacked search capabilities, but its goal was clearly to get the full text of the volumes online, for free, public access.

The Office’s first homepage on the Department’s new state.gov website in 1997 followed the DOSFAN model of presenting a chapter’s worth of documents on a single webpage. But it made two improvements in the experience of reading FRUS documents online: (1) Bold text for document headings and centered titles; and (2) A distinct, blue appearance for footnotes to set them off from the body text of documents. My colleagues who applied the HTML coding to turn footnotes blue attest that it was a manual and very time-consuming process. The text of 28 volumes was released by the Office on this incarnation of state.gov. While the site had a search engine, the search results could not be limited to FRUS, so they often included many other resources.

With the launch of the second incarnation of the Office’s homepage on the state.gov website in 2001, the Office made two more improvements on the previous model:

First, with the advent of the PDF format, which preserved the page layout of the printed edition, the Office could post chapters and entire volumes as downloadable PDFs. But the Office didn’t cease publishing the text of printed volumes on chapter-length webpages. Plain old webpages were (and remain) faster to load than PDFs, and offered many other advantages. A PDF the length of a typical FRUS volume (about a thousand pages) was difficult to navigate on a computer, especially without internal links or bookmarks. The text in a PDF had fixed dimensions, so users with small screens might struggle to read the text at a natural size, whereas web pages would easily reflow to the user’s screen size and desired font size. Also, users of search engines like Google had a better chance of finding FRUS documents released as web pages because search engines would typically only index the first few pages of PDFs but would always fully index web pages.

The second innovation was also related to the PDF format: a new type of publication known in the Office as the “e-pub,” or the “electronic-only publication.” E-pubs allowed visitors to download PDFs of scanned images of the original archival documents for offline viewing. In addition, a transcribed copy of the text was posted for convenient search and for accessibility reasons, namely to allow visitors with visual disabilities to use screen reader technology to read the documents. These volumes were intended to be electronic-only and never bound into books.

In many ways, e-pubs were an internet-age counterpart to the microfiche supplements that the Office published in the 1980s and 1990s; they offered the reader a facsimile view of the original archival documents, but at a cost. The technological limitations of both microfiche and e-pub formats prevented editors from using footnotes to place contextual information at the point of reference. Instead, editors were limited to writing a brief summary that was prepended to the document. Also, as with microfiche, the hope was that e-pubs could allow the Office to release more documents more quickly and cheaply than printed books. However, the labor associated with the e-pubs (scanning, transcribing, and uploading per-document PDFs and transcriptions, all performed in house) was substantial. Furthermore, every document had to undergo the same declassification procedures as printed documents, so the savings in time and resources were somewhat illusory. Nonetheless, the e-pub experiment produced 10 uniquely accessible volumes, contributing to a total of 58 volumes on this second incarnation of the Office’s state.gov website. In terms of search, this version of state.gov’s search engine remained unable to limit searches to just FRUS documents.

Separate from the Office’s initiatives, the University of Wisconsin Digital Collections Center took a completely different and very ambitious approach to presenting FRUS online. The University had selected FRUS as its first experiment in mass digitization, and over 5 years, from 2003–2008, collected 375 volumes from their holdings and partner libraries, sheared off the bindings of each book, scanned nearly 400,000 pages at high resolution, and created an interface for browsing this material, which put these scanned images of the printed page at the center of the online edition. This edition also allowed visitors to search within the series.

But because the University used optical character recognition, or OCR, to convert the scanned images into text for the search engine, the text was subject to typos, which could mean that a search for “Mr. McMath” or “Mr. Hunter” or “Empire” would miss documents that contained these phrases. In practice, such typos were not a problem for terms that appeared multiple times in a short span, since the OCR engine might correctly read one of the other instances. But for rarer terms, a typo meant that a visitor might overlook a document. By faithfully reproducing the scanned images of the printed volumes, the University of Wisconsin edition was more a database of pages than documents. Advancing from one document to the next required paging ahead an unknown number of times. Similarly, search results were listed at the page level, rather than at the document level. But the University’s amazing accomplishment—digitizing 99 years’ worth of volumes with such fidelity to the original and offering serviceable full text search—demonstrated to historians throughout the field—and in the Office—the power of a unified portal for browsing and searching the FRUS series.

As the Office surveyed this field in 2007 and 2008, we observed that while FRUS was available online, it was very fragmented, both in location and in form. Students and scholars had to sift across four different sites to find particular volumes, and each site organized and presented documents differently. If this situation persisted, it might even get worse; a new administration might overhaul the state.gov website again and further bifurcate the series’ online presence. We saw great potential in the idea of drawing on the best qualities of each of these models to build a consolidated portal for the series.

However, we thought we could go further. We imagined a new model for FRUS that was more than a collection of books, scanned pages, or text on long webpages—a home that understood that FRUS is a documentary edition—a collection of documents, selected by historians, annotated with rich footnotes that give readers essential context for understanding each document, and organized into volumes with rich internal cross-references and common editorial methodologies and purpose that knit individual volumes together into a cohesive series. So, the Office began planning a new website to begin tackling these challenges that had accumulated in the series’ first 15 or so years on the world wide web.

The Office set out the following 5 goals for an improved website:

Our first goal was to bring the 88 FRUS volumes that the Office had released online to date on its various sites under a single roof, with a capable search engine. Besides the 88, several volumes from the Eisenhower and Kennedy administrations weren’t yet available on either the Wisconsin or Department’s websites. These would need to be scanned and converted to text. We also hoped to scan any of the 88 volumes that pre-dated the PDF format, so we would have PDFs or archival quality scans of all volumes for reference. We adopted Wisconsin’s digitization guidelines for high resolution scanning.

Some of us at this phase dreamed of creating a complete archive, but with so many digitization tasks just for these 88 volumes—and so many questions yet to answer about creating the new site—it seemed to be an unattainable goal. So we limited our scope to the Department’s existing online offerings.

Second, we resolved to make the document the primary unit of the publication, rather than the printed page or the chapter. We felt each webpage should only have a single document, with a unique, persistent URL for citations. The documents would be displayed complete, all on one page. We realized, as we considered scanning older volumes, that prioritizing the text over the scanned image might mean a slower and more costly conversion process, because we would need to find a way to eliminate or minimize OCR errors, but we felt this was a worthwhile investment for two reasons: First, federal accessibility laws require scanned images to be accompanied by transcriptions for use by visitors who rely on screen reader technology to access content. And, second, as the official documentary record of U.S. foreign relations, the online FRUS series could not be rife with OCR errors. A clean text would benefit all users.

Our third goal was to improve navigation within volumes by displaying internal cross-references as hyperlinks, showing footnote text both inline and at the foot of a page, and placing reference aids from a volume’s glossaries right beside the text of a document instead of on a separate page.

Goal four was to build a stable foundation for the future. The Office reasoned that a modern digital format for FRUS could be extended, as resources allowed, with new layers of analysis and annotation. If designed correctly, the FRUS data would not need to be continuously overhauled as the Office’s website underwent inevitable redesigns or server migrations.

Our fifth goal was to adopt open, standards-based solutions and avoid proprietary ones—a vital strategy in an uncertain budgetary environment. If we could keep the costs of software licenses and development low, perhaps we could allocate some of our precious resources to digitizing more in the series?

To achieve these goals, we first had to answer 3 fundamental questions: (1) What master digital format would allow us to achieve our goals? (2) What web server & search engines were compatible with this format? (3) How could we adapt our existing publications into this format?

First, what master digital format would allow us to achieve our goals? This was a straightforward choice:

The Text Encoding Initiative, or TEI, is the de facto standard for digital humanities text projects. It offered a mature set of guidelines for capturing all aspects of documentary editions like FRUS—and a far richer digital vocabulary than the paragraphs and blue footnotes from DOSFAN and state.gov. It was non-proprietary and based on the open XML standard. If we could just wrap our heads around it, we could use it for free and rely on it as the long term master digital storage format for our publications.

For example, we learned that TEI could capture the entire hierarchical structure of a volume in a single file. Here we see the text of the first page of the body of a Nixon China volume. The words on the page are shown in black, and the TEI tags that describe and decorate the text are shown in blue, with further details, or attributes, shown in orange. The TEI tags we see here identify the headings and relative hierarchical nesting of 3 types of divisions in FRUS volumes: compilations, chapters, and documents. TEI also had a vocabulary for capturing footnotes and original footnote numbers. This suggested we could do more with footnotes than had been possible before. TEI also offered tags for capturing datelines (which consist of the place and date a document was written). Besides the plain English form of a date (February 21, 1972), the TEI Guidelines explained how to use attributes to capture dates in standard, machine-readable formats, even including time zones. The TEI vocabulary extended to identifying people. Here, person name tags identify references to Nixon, Chiang Kai-shek, and Zhou Enlai, and the attributes contain unique identifiers linking the people to glossaries or authority files. The TEI vocabulary would allow us to identify signatures in documents. This could be useful for research to find all documents signed by a particular official. And the TEI could identify terms and link them to authority files. Here, the term GRC is linked to the glossary definition for this term from the volume’s front matter, Government of the Republic of China. Perhaps we could use this and the similar tagging of persons to help readers understand the document they’re viewing, so they didn’t have to flip the page to reference a glossary.
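To make the tagging described above more concrete, here is a minimal Python sketch that reads a small TEI fragment and pulls out the division type, heading, machine-readable date, person references, glossary terms, and footnote numbers. The fragment is invented for illustration and is not copied from a FRUS volume; the identifiers (p_NRM1, t_GRC1, d1fn1) and the choice of the corresp and target attributes for linking are assumptions about how such tagging might look.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI_NS}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: wording, place, and identifiers are illustrative only.
SAMPLE = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="document" xml:id="d1" n="1">
  <head>Memorandum of Conversation</head>
  <dateline><placeName>Beijing</placeName>,
    <date when="1972-02-21">February 21, 1972</date></dateline>
  <p><persName corresp="#p_NRM1">President Nixon</persName> and
     <persName corresp="#p_ZE1">Zhou Enlai</persName> discussed the
     <gloss target="#t_GRC1">GRC</gloss>.<note n="1" xml:id="d1fn1">Source
     note text (placeholder).</note></p>
</div>
"""

def summarize(xml_text):
    """Return the structural and semantic tagging found in one document div."""
    root = ET.fromstring(xml_text.strip())
    date = root.find(".//tei:dateline/tei:date", NS)
    return {
        "id": root.get(XML_ID),
        "type": root.get("type"),
        "heading": root.findtext("tei:head", namespaces=NS),
        # The plain-English date is the element text; @when is machine-readable.
        "when": date.get("when") if date is not None else None,
        "people": [p.get("corresp") for p in root.iter(f"{{{TEI_NS}}}persName")],
        "terms": [g.get("target") for g in root.iter(f"{{{TEI_NS}}}gloss")],
        "footnotes": [n.get("n") for n in root.iter(f"{{{TEI_NS}}}note")],
    }

if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(f"{key}: {value}")
```

Because the meaning of each span is recorded in the tags rather than in typography, the same file can answer questions such as "which people are mentioned" without any guessing from the rendered text.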

The TEI Guidelines described even more ways that a project could enrich a text, but we had to be selective. We chose the elements that were vital for capturing the structure of the volume and the documents and annotations contained within. We gave greater weight to tagging that we imagined could be used to answer research questions about documents. If there was anything the TEI format didn’t support out of the box, it also had a customization facility for adding new kinds of annotations.

But even if we could snap our fingers and transform all of our publications into TEI, we wouldn’t have a new website. The state.gov content management system didn’t support TEI, so we would have to find a web server that was capable of transforming TEI into the HTML format used by web browsers, with a search engine that could handle TEI as well. Most databases struggle to support large collections of text, especially in a richly tagged form such as TEI. Thanks to kind insights from James Cummings at Oxford and Mark Saunders, Holly Shulman, and David Sewell at Rotunda, the digital imprint of the University of Virginia Press, we learned of a special kind of database, called a native XML database. Unlike their more common counterparts, relational databases, native XML databases could readily ingest and process TEI files. And they allowed us to query and transform our data using free, open standards, such as XQuery and XSLT. After reviewing the available software packages, we selected eXist-db, a free, open source native XML database. Free meant that we wouldn’t have to pay any license costs to use the software. Open source, in this case, meant that eXist’s community of users could make additions and that all could benefit from the improvements.

With the format and software questions settled, that left one concrete problem: how could we convert our existing digital publications into TEI? How could we take a text like the one on the left here, and achieve the kind of intricate tagging on the right that would be needed to achieve the goals we had in mind for leveraging the unique structures of documentary editions to bring annotations to readers’ fingertips? Here again, thankfully, our friends at Rotunda shared valuable experience they learned in finding and working with vendors to scan and convert numerous volumes of documentary editions from the American Founding Era collection.

The ideal vendor would have the capability to overcome the OCR typo problem, by using a technique called “double keying”—having two technicians retype the entire text. The idea is that it’s unlikely that two different people would make the same mistake, so the resulting two texts could be compared against each other for differences, and only the differences would need to be reviewed. This technique has been shown to achieve 99.99% accuracy for the keyed text. The vendors wouldn’t always have to resort to this technique if a source text was so clean that OCR could achieve the desired levels of accuracy, but we wanted it to be a part of our vendor’s arsenal.
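The comparison step in double keying can be sketched in a few lines of Python. This is only an illustration of the idea, not the vendor's actual tooling: it assumes the two keyed transcriptions arrive as plain strings, compares them word by word with the standard library's difflib, and returns only the spans where the two typists disagree, which are the spans a reviewer would need to check.

```python
import difflib

def keying_disagreements(first_pass, second_pass):
    """Return (first, second) word spans where two keyed texts differ."""
    a = first_pass.split()
    b = second_pass.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # identical stretches need no review
    ]

if __name__ == "__main__":
    typist_one = "Mr. McMath arrived in the Empire"
    typist_two = "Mr. McMath arrived in the Ernpire"
    for left, right in keying_disagreements(typist_one, typist_two):
        print(f"review: {left!r} vs {right!r}")
```

The design relies on the point made above: an error only slips through if both typists make the same mistake in the same place.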

These were the key decisions that shaped the development of history.state.gov: We would encode our publications in TEI, host the website in an eXist-db server, and convert our publications to TEI with the help of qualified, experienced vendors.

We combined these ingredients and launched the website in March 2009. Some of you may remember our original homepage. With 11 TEI-formatted FRUS volumes available at launch, we continued to send volumes to our vendor for scanning and conversion. A year after launch, the site offered 100 volumes.

Here we see the same document whose TEI fragments we viewed earlier, rendered on the fly in HTML by our XQuery code running in the eXist-db server. The volume’s table of contents is on the left, highlighting the current chapter; the text of the document is in the main section of the page, with the document heading, its dateline, the participants list, and the body of the document below. Here’s how we handled footnotes: When the user hovered their mouse over the footnote reference, a pop-up would appear, containing the text of the footnote. You could also click on a footnote reference to jump to the bottom of the page to view all of the footnotes in the document. And we tagged cross-references, so that readers could jump directly to the referenced documents. The tagging, rendering, and user experience aspects of footnotes were a major focus of our efforts on the new FRUS interface.
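The footnote behavior described above can be approximated with a short Python sketch. The site itself does this in XQuery; Python is used here only as a stand-in. The sketch assumes footnotes are encoded as note elements with an n attribute directly inside a paragraph, and it uses the HTML title attribute as a simple substitute for the hover pop-up, with a link to a footnote list at the foot of the page.

```python
import html
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

def render_paragraph(p_xml):
    """Render one TEI paragraph as HTML with inline and end-of-page footnotes."""
    p = ET.fromstring(p_xml)
    body = [html.escape(p.text or "")]
    notes = []
    for child in p:
        if child.tag == TEI + "note":
            n = child.get("n")
            text = html.escape("".join(child.itertext()))
            # The title attribute shows the note on hover; the link jumps to the list.
            body.append(f'<sup><a href="#fn{n}" title="{text}">{n}</a></sup>')
            notes.append(f'<li id="fn{n}">{text}</li>')
        else:
            # Other inline elements are flattened to plain text in this sketch.
            body.append(html.escape("".join(child.itertext())))
        body.append(html.escape(child.tail or ""))
    return f"<p>{''.join(body)}</p><ol>{''.join(notes)}</ol>"

if __name__ == "__main__":
    sample = (
        '<p xmlns="http://www.tei-c.org/ns/1.0">See the telegram.'
        '<note n="1">Not printed.</note> It was sent.</p>'
    )
    print(render_paragraph(sample))
```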

In the right sidebar we placed additional contextual information: links to scanned page images of the printed volume, and filtered listings of the people and terms from the volume’s front matter. Here’s how it worked: When a visitor would hover their mouse over one of the names, they would see the description of the person or the definition of the term. The idea was to save visitors the step of leaving the document to manually scroll through the long lists of names and terms. Behind the scenes, we had tagged the individual instances of names and terms in the document’s TEI source, referencing these entities’ unique identifiers in the front matter lists. Using XQuery, we could present a filtered list of just those names and terms that appeared in the document. We pulled content into this page from a completely different location in the volume. It seemed to us like magic.
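The filtered sidebar list can be illustrated with a small Python sketch. Again, the production code is XQuery; this is a stand-in written under assumptions: the volume below is invented, the people list is modeled as a list of items in the front matter with an xml:id on each name, and references in documents point to those identifiers through a corresp attribute.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical volume: names, roles, and identifiers are placeholders.
VOLUME = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <front>
      <list type="persons">
        <item><persName xml:id="p_A1">Adams, Jane</persName>, Ambassador (hypothetical)</item>
        <item><persName xml:id="p_B1">Baker, John</persName>, Desk Officer (hypothetical)</item>
      </list>
    </front>
    <body>
      <div type="document" xml:id="d1"><p><persName corresp="#p_B1">Baker</persName> reported.</p></div>
      <div type="document" xml:id="d2"><p><persName corresp="#p_A1">Adams</persName> replied to
        <persName corresp="#p_B1">Baker</persName>.</p></div>
    </body>
  </text>
</TEI>
"""

def people_in_document(volume_xml, doc_id):
    """Return front-matter entries for only the people tagged in one document."""
    root = ET.fromstring(volume_xml.strip())
    # Build a lookup from identifier to the full front-matter entry.
    glossary = {}
    for item in root.iter(TEI + "item"):
        name = item.find(TEI + "persName")
        if name is not None and name.get(XML_ID):
            glossary[name.get(XML_ID)] = " ".join("".join(item.itertext()).split())
    doc = next(d for d in root.iter(TEI + "div") if d.get(XML_ID) == doc_id)
    mentioned = {p.get("corresp", "").lstrip("#") for p in doc.iter(TEI + "persName")}
    # Keep the front matter's own ordering for the sidebar.
    return [entry for key, entry in glossary.items() if key in mentioned]

if __name__ == "__main__":
    print(people_in_document(VOLUME, "d1"))
```

The key point is that the description lives in one place, the front matter, and each document only carries a pointer to it.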

The features shown here were what we had in mind when designing the FRUS digital edition on history.state.gov: a document-centered view of the volume, with a clean presentation of the digitized text, formatted in a way that closely adhered to the printed source but was actually based on TEI’s semantic tagging rather than raw typographic style. We showed readers where the document was located within the volume’s table of contents and provided access to the scanned page images. We presented filtered views of contextual information from the volume’s front matter. And we exposed footnotes and cross-references in a way that allowed visitors to traverse the rich lattice of FRUS annotations within volumes and throughout the series.

After completing the conversion of the volumes from our earlier sites, we expanded the digitization program. This was possible thanks to a partnership with the University of Wisconsin Digital Collections Center, which we proposed in February 2009 and which Peter Gorman, the head of UWDCC, accepted. The University provided us with their scanned images, and we sent the images to our vendor to extract the text from the images and apply TEI tagging to the volumes.

To ensure the vendor’s submissions met our stringent requirements for text accuracy, we performed random samplings on their submissions and performed additional levels of reviews and enrichment. In return, we provided the University with our enriched master TEI files for each volume. The partnership with the University allowed us to save the cost of scanning these 375 volumes and build on the excellent work they had performed on the series’ first 99 years. We are truly grateful to the University for their contributions to this project.

In 2018, we completed the decade-long project to digitize the series’ 535 printed volumes. Today, history.state.gov houses over 310,000 documents from nearly 550 volumes, covering 1861–1989.

While completing the digitization project, we also added a new edition of all FRUS: ebooks. This was the Office’s contribution to the 2012 Digital Government Strategy, which asked agencies to make their online resources more mobile-friendly. Thanks to this initiative, visitors can download FRUS ebooks for free and read them on their Kindles or iPads, search within volumes, and highlight passages and take notes. We also provided an ebook catalog, using the OPDS, or Open Publication Distribution System, format. This allowed third party ereader apps like Shubooks to embed our ebook catalog into their apps, letting users browse the catalog and download FRUS ebooks. We didn’t even envision ebooks when we launched history.state.gov, but thanks to our adoption of a media-neutral format like TEI, supporting ebooks was simply a matter of writing some XQuery to transform any of our volumes into the ebook format. Today, free, open source tools like TEI Publisher make transforming TEI documents into customized PDF and ebook formats, not to mention searchable websites, much simpler than it was in 2009. But here are all the nearly 550 volumes available today for browsing on history.state.gov and for download as ebooks.

Besides FRUS, the Office’s website also houses numerous publications and databases on the history of U.S. foreign relations and the Department itself. These include: (1) Principal Officers and Chiefs of Mission (a database of ambassadors and senior leaders in the Department), (2) Travels of the President and Secretary of State (a database of their official travel abroad), (3) Visits of Foreign Leaders (a database of official visits of foreign leaders and heads of state to the U.S.), and (4) the Office’s newest publication: Administrative Timeline of the Department of State. Our Countries section houses Recognitions and Relations, a collection of essays on the history of U.S. relations with every country in the world, focusing on dates of recognition and key changes in bilateral relationships. Our About section houses the minutes from every meeting of the Advisory Committee on Historical Diplomatic Documentation (HAC) since 1996. Additional HAC minutes and related documents can be found in our FRUS history section, where the Office’s 2015 monograph on the history of the FRUS series can also be found. That monograph is truly an invaluable resource for understanding the series in its institutional and political context.

While the experience of browsing FRUS on history.state.gov has remained consistent even across the site redesign in 2016 that made the website mobile-friendly, the experience of searching FRUS has taken a leap forward, with the addition of date search in late 2018. So let me demonstrate this and review the options for searching FRUS.

There are two ways to start a search in FRUS. First, you can select the site-wide search field, enter some keywords, and hit submit. This method searches the entire website. If you want to just search within a specific FRUS volume, go to the volume’s landing page, and look in the right sidebar for the “search inside this volume” field. Both ways work well.

When you arrive at the search page, you’ll see a lot of options for refining your search, but at the very top of the page, you’ll see a link to our Search Tips page. Search Tips explains the search engine’s default behavior for keyword searches, and explains how to refine your searches. For example, you can perform a phrase search by wrapping quotation marks around your keywords to find documents that contain the entire phrase rather than just the individual keywords. You can use boolean operators like AND, OR, and NOT to perform more specific searches, such as finding documents that contain one word OR another word, or that contain one word but NOT another word. You can use wildcards to handle spelling variations or different forms of words. And you can use proximity operators to find words that occur within a certain number of words of each other. Anytime you need a refresher, you can return to the Search Tips page to review these options.

Besides these keyword options, the website supports two main ways to limit the search results. First, filtering by section. By default a search includes results from the entire website, so to limit results to a specific section, such as FRUS, you would select the Historical Documents section. Doing so unlocks a second filter, the date filter. The date filter allows you to search or filter results by an exact date or a date range. If you are looking for documents from a specific date, like December 7, 1941, enter it into the Start Date field using the formatting shown in the form and submit your search. A complete date returns documents from that date, but if you enter just 1941 or just December 1941, the search engine will return all documents from the entire year of 1941 or the entire month of December 1941. If you have a specific date range in mind, fill in the end date. Again, you can enter a specific end date, or a year or a month in a year, and the search engine will be inclusive. You can even specify a time in the from and to fields. It’s time zone sensitive, but it assumes the US Eastern time zone, so adjust your searches accordingly. You can perform date searches independent of keyword searches, so you can find all documents from a specific date or range of dates, without specifying any search terms. Or you can combine keyword and date search to find documents across the series from a specific time period that contain specific keywords. With 310,000 documents covering over 128 years’ worth of international history, being able to perform not only keyword search but also date filtering and sorting is essential.
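The inclusive handling of partial dates described above can be sketched in Python. This is an illustration of the behavior, not the site's implementation, and it assumes dates are typed as YYYY, YYYY-MM, or YYYY-MM-DD; the form on the website shows its own expected formatting. A partial start date expands to the first day it could mean, and a partial end date expands to the last.

```python
import calendar
from datetime import date

def expand_partial_date(text, end=False):
    """Expand 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' to its first or last day."""
    parts = [int(p) for p in text.split("-")]
    year = parts[0]
    if len(parts) == 1:
        return date(year, 12, 31) if end else date(year, 1, 1)
    month = parts[1]
    if len(parts) == 2:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day if end else 1)
    return date(year, month, parts[2])

def search_range(start, end=None):
    """Return the inclusive (first_day, last_day) range a search would cover."""
    # With no end date, the start value alone defines the whole range.
    return expand_partial_date(start), expand_partial_date(end or start, end=True)

if __name__ == "__main__":
    print(search_range("1941-12-07"))
    print(search_range("1941-12"))
    print(search_range("1941", "1942-02"))
```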

Enriching FRUS TEI documents with dates for every one of these documents was a major project, made particularly challenging by the presence of undated documents, documents dated using non-Gregorian calendars, and changing national and local time zones. But for users of the website, know that even undated or imprecisely dated documents are included in your searches. Our TEI captures the full range of possibilities for every document, depending on its individual circumstances.
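One way to see why undated or imprecisely dated documents can still appear in date searches is to treat every document as an interval with an earliest and a latest possible moment. The Python sketch below is a simplified model under stated assumptions: the minimum and maximum values are hypothetical, and US Eastern time is approximated by a fixed UTC-5 offset, which ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Simplification: a fixed offset stands in for US Eastern time.
EASTERN = timezone(timedelta(hours=-5))

def overlaps(doc_min, doc_max, query_start, query_end):
    """True if the document's possible date range intersects the query range."""
    return doc_min <= query_end and query_start <= doc_max

if __name__ == "__main__":
    query_start = datetime(1941, 12, 7, 0, 0, 0, tzinfo=EASTERN)
    query_end = datetime(1941, 12, 7, 23, 59, 59, tzinfo=EASTERN)

    # A hypothetical undated document known only to be from December 1941.
    month_min = datetime(1941, 12, 1, 0, 0, 0, tzinfo=EASTERN)
    month_max = datetime(1941, 12, 31, 23, 59, 59, tzinfo=EASTERN)
    print(overlaps(month_min, month_max, query_start, query_end))

    # A hypothetical telegram dated 2 a.m. on December 8 in a UTC+9 zone,
    # which is still December 7 in Eastern time.
    tokyo = datetime(1941, 12, 8, 2, 0, 0, tzinfo=timezone(timedelta(hours=9)))
    print(overlaps(tokyo, tokyo, query_start, query_end))
```

A precisely dated document is just the special case where the minimum and maximum are the same instant.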

Besides the date filter, the search engine also offers a volume filter, which shows the set of volumes that contain hits for the given search. For example, a search for the phrase “human rights” returns these results as the first of over 3,000 results, sorted chronologically. So how to narrow these down, other than by date? By examining the volumes filter to see the volumes that contained these 3,000 documents, you might be interested in focusing on the 1894 volume covering Affairs in Hawaii. Applying that filter, you would see this document from July 8, 1894. In this way, the search and filter options allow you to cast a wide or a narrow net, and use any combination of these options to explore a period, topic, or set of volumes across the entire FRUS series.

Looking ahead, we are continuing to digitize the FRUS Microfiche Supplements (the 13 publications released in the 1980s and 1990s), arguably the least accessible of all volumes in the series. So far 2 of these have been digitized and released online. We are scanning the microfiche and applying the same digitization workflow to these volumes as we did for the earlier volumes. Given the poor quality of some of the original microfiche, this is painstaking work. So our plan is to release one each year or so, pending the availability of resources, until they’re complete, hopefully within a decade or so.

Second, the Office is investigating providing visitors with even more powerful tools to narrow searches, by adding filters for the rich metadata found in each FRUS document’s heading and source note, such as sender and recipient, document type, provenance, original classification, and people, places, and organizations mentioned. The largest obstacle to expanding the selection of filters is the fact that the contemporary annotation practices for capturing this information varied across and within epochs in the series. Human readers can cope with such variation, but in its current form, this information is not sufficiently regularized to be “machine readable.” Just as the date filter and sorting options required a major effort to identify dates in documents and make them uniform and machine readable, every additional “facet” or “dimension” of metadata that the website could expose will require considerable investment of effort and resources. The Office continues to investigate possibilities for enhancements in these areas.

Third, the Office is exploring the use of digital annotation tools to aid in indexing documents. I am particularly excited about the latest release of TEI Publisher, a free, open source software project for making TEI collections browsable and searchable online. Version 7.1 added the ability to annotate TEI documents in the web browser. Here is a view of the TEI Publisher annotation interface, showing a document from the Reagan Soviet 85–86 volume released last year, with annotations of the document’s people, places, dates, and topics. Besides applying entity types to the text, we can also link individual instances to an entity database, as is shown in this description of the Chernobyl disaster in 1986. We’ve begun using this tool to help prepare back-of-book indexes for these volumes and look forward to being able to post the completed indexes.

Finally, I’d like to briefly introduce our open data repositories on GitHub. GitHub is a free website that many researchers and developers working on humanities, government data, and open-source software projects use to publish their data and source code. The Office posts all of its publications and datasets, as well as the full source code for history.state.gov, on GitHub, under the HistoryAtState organization. The FRUS source files can be found in the “frus” repository. Every new FRUS volume published to history.state.gov is simultaneously uploaded to GitHub, and every edit is time-stamped and logged with descriptive comments explaining each change. With an almost radical level of transparency, you can see the precise changes we make anytime we edit files. In this view, you can see that we fixed a typo reported via our mailbox and deleted a Z that found its way into our TEI file. By establishing a GitHub account, readers can fork our repositories (that is, make complete copies under their own accounts), report issues to us (such as typos or bugs), and even propose fixes, which we’ll evaluate. We’ve even posted directions you can use to download and install a complete, live copy of the history.state.gov website on your personal computer. This could offer a practical way to perform research when working without a stable internet connection. Besides these benefits, we believe that posting our raw data and source code can allow researchers, including you, to perform new kinds of analysis that the history.state.gov website does not facilitate. Thanks to our use of open standards and non-proprietary formats, researchers can load our documents into their databases for further analysis.
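As a rough illustration of loading TEI documents into a database, the following Python sketch parses a small TEI fragment and stores one row per document in an in-memory SQLite database. The sample XML is invented for this example and is far simpler than a real FRUS volume; the `div type="document"` layout, the table schema, and the function names are assumptions for illustration, so check the files in the “frus” repository before relying on them. Only the TEI namespace itself is standard.

```python
import sqlite3
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # standard TEI namespace
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the xml:id attribute

# Invented sample; a real volume is much larger and richer.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="document" xml:id="d1">
      <head>Memorandum of Conversation</head>
      <p><persName>Example Person</persName> met officials in
         <placeName>Geneva</placeName>.</p>
    </div>
    <div type="document" xml:id="d2">
      <head>Telegram From an Embassy</head>
      <p>Reporting from <placeName>Moscow</placeName>.</p>
    </div>
  </body></text>
</TEI>"""


def extract_documents(tei_xml):
    """Return (id, title, places) tuples, one per document division."""
    root = ET.fromstring(tei_xml)
    rows = []
    for div in root.iterfind(".//tei:div[@type='document']", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        title = "".join(head.itertext()).strip() if head is not None else ""
        places = ["".join(p.itertext()).strip()
                  for p in div.iterfind(".//tei:placeName", TEI_NS)]
        rows.append((div.get(XML_ID), title, "; ".join(places)))
    return rows


def load_into_sqlite(rows):
    """Create an in-memory database holding the extracted rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents (id TEXT PRIMARY KEY, title TEXT, places TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = load_into_sqlite(extract_documents(SAMPLE))
    for row in conn.execute("SELECT id, title, places FROM documents"):
        print(row)
```

To try this on real data, you could replace `SAMPLE` with the contents of a volume file downloaded from the repository, then adjust the element names to match what that file actually contains.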

Historians and researchers in other humanities fields who do not have a background in computer science but want to gain such skills are in luck: today such training is readily available through university libraries, digital humanities centers, and summer institutes; through online tutorials such as the Programming Historian at programminghistorian.org (one word); through videos and course materials posted online; and through books for humanists on learning programming languages like XQuery for mining TEI and XML sources. For scholars who have advanced text mining skills (or collaborate with those who do), the Office’s sources are natural targets for the application of natural language processing, text modeling, and other computational analysis techniques. And we’re excited that we’ve begun to see the fruits of such projects.

Thanks to invaluable university partnerships, the Office of the Historian built on earlier efforts and established a modern foundation for the FRUS digital edition on the basis of open standards and open-source software. The website, history.state.gov, houses a complete collection of the printed FRUS volumes in a full-text, searchable format and as downloadable ebooks. It also offers other vital publications for the study of U.S. foreign relations and the history of the Department of State. All raw data and source code are available on GitHub.

Besides working to digitize the least accessible volumes—the microfiche supplements—the Office continues to explore ways to improve our search tools and improve the utility and quality of all of the Office’s publications for students, scholars, and the general public.

Thank you for listening! We invite your feedback. Please email us at history@state.gov.