Advisory Committee on Historical Diplomatic Documentation August 30, 2021
- James Goldgeier, Chairman
- Kristin Hoganson
- Richard Immerman
- William Inboden
- Adriane Lentz-Smith
- Sharon Leon
- Melani McAlister
- Nancy McGovern
- Deborah Pearlstein
Office of the Historian
- Kristin Ahlberg
- Carl Ashley
- Margaret Ball
- Forrest Barnum
- Sara Berndt
- Josh Botts
- Myra Burton
- Tiffany Cabrera
- Mandy Chalou
- Elizabeth Charles
- Thomas Faith
- David Geyer
- Renée Goings
- Charles Hawley
- Kerry Hite
- Adam Howard
- Aiyaz Husain
- Virginia Kinniburgh
- Michael McCoyer
- Christopher Morrison
- Mircea Munteanu
- David Nickles
- Paul Pitman
- Alexander Poster
- Kathleen Rasmussen
- Matthew Regan
- Amanda Ross
- Seth Rotramel
- Daniel Rubin
- Nathaniel Smith
- Melissa Jane Taylor
- Chris Tudda
- Dean Weatherhead
- Joseph Wicentowski
- Alexander Wieland
- James Wilson
- Louise Woodroofe
Bureau of Administration
- Jeff Charlston
- Corynne Gerow
- Marvin Russell
- Eric Stein
- Susan Weetman
National Archives and Records Administration
- Cathleen Brennan
- Robert Fahs
- Beth Fidler
- David Langbart
- John Powers
- Amy Reytar
- Mark Sgambettera
Department of Defense
- John D. Smith
- Over 40 members of the public
Open Session, August 30
Approval of the Record
James Goldgeier, Committee Chair, opened the meeting and moved to approve the minutes of the June meeting. The minutes were unanimously approved. Goldgeier welcomed Ambassador Julieta Valls Noyes, Acting FSI Director.
Remarks by the Acting Director of the Foreign Service Institute
Noyes expressed delight to be speaking to the committee despite an incredibly busy daily agenda. She remarked that it was important to take the time to mark and to honor the exceptional contributions of departing Committee Chair Richard Immerman. She noted that Immerman, in addition to his leadership and counsel on the committee, had undertaken special trips from his home to OH on an entirely voluntary basis to advise on rather complicated declassification issues with FRUS. Noyes thanked Immerman for his service and offered best wishes for his future endeavors. Next, Noyes outlined recent contributions made by OH Historians regarding awareness of diversity and accessibility issues in the Department. Historian Sara Berndt delivered a virtual presentation to Mission Mexico, organized by Consulate Hermosillo, featuring standout U.S. diplomats from one or more underrepresented groups and highlighting the careers of Ambassador Raul H. Castro, Assistant Secretary of State for Democracy, Human Rights, and Labor Patricia Derian, and Ambassador Terence Todman. Berndt also organized an OH co-hosted event with Mission Mexico, Consulate Nogales, for Juneteenth, featuring a discussion between scholars Maria Esther Hammack and Alice Baumgartner who discussed southern routes to freedom for enslaved people along the 19th century U.S.-Mexico border. Historians Melissa Jane Taylor and Laura Kolar presented on the history of the Department, with a special emphasis on diversity issues, to this year's cohort of Foreign Affairs IT Fellows. Taylor also presented her History of Diversity in the Foreign Service presentation to U.S. Foreign Service Internship Program fellows, as well as to the Special Advisor’s Office for Diversity and Inclusion and the Department’s Office of Civil Rights. Noyes concluded her remarks by requesting that all virtual attendees unmute themselves and join in a hearty round of applause for Immerman’s contributions.
Report by the Executive Secretary
Goldgeier invited comments from Adam Howard, OH Director and Executive Secretary. Howard amplified Noyes’ positive comments regarding Immerman’s service. He stated that it was a pleasure to work with someone so dedicated to the work of the office and who believed in the important contributions FRUS and the office as a whole make to the historical community and government transparency. Howard noted that Immerman willingly gave up much of his personal time to speak on the phone, and to travel from Philadelphia to OH in some cases, to help the office with FRUS publishing and to help the committee function smoothly. Howard thanked Immerman again and read into the record a May 6, 2021, letter from Secretary of State Antony J. Blinken:
“Dear Dr. Immerman: I am deeply grateful for your service on the Department’s Advisory Committee on Historical Diplomatic Documentation and particularly your decade of outstanding leadership as its Chair. Your initiative, ingenuity, and unflinching candor proved invaluable during a challenging period for the Foreign Relations of the United States series. Additionally, your attention to the proper retention and retirement of the Department’s records has been a great service to the American people. Thank you for your dedication and unfailing support for the work of the Office of the Historian. I wish you the best for the future.”
Next, Howard referred to an article in State magazine co-authored by OH Historians Elizabeth Charles and James Wilson on the Cold War legacy of former Secretary of State George Shultz, who passed away in February.
Howard noted that five FRUS volumes are relevant to the article:
• 1981–1988, Volume III, Soviet Union, January 1981–January 1983
• 1981–1988, Volume IV, Soviet Union, January 1983–March 1985
• 1981–1988, Volume V, Soviet Union, March 1985–October 1986
• 1981–1988, Volume VI, Soviet Union, October 1986–January 1989
• 1981–1988, Volume XI, START I
Howard concluded by announcing that the November issue of the American Historical Association’s newsmagazine, Perspectives on History, will include an article about OH and FRUS in honor of two anniversaries that fall this year: the 160th anniversary of FRUS and the 100th anniversary of OH, which the Department created in 1921.
Goldgeier expressed trepidation about assuming the role of Committee Chair following Immerman’s impressive 10-year term and noted that, due to the new term limits, no one will outlast Immerman in the position. Goldgeier also thanked Mary Dudziak for her years of helpful service on the committee.
Report by the General Editor
Goldgeier invited comments by Kathleen Rasmussen, General Editor.
Rasmussen described a bout with writer’s block in attempting to summarize her views on the importance of Immerman’s contributions both personally and professionally. She added her appreciation for Immerman’s unwavering support for the FRUS series throughout more than a decade of HAC service. Rasmussen noted Immerman’s consistently thoughtful engagement at the quarterly meetings, his willingness to review the unpublished Iran, 1951–1954 manuscript, enthusiastic participation on many FRUS-related panels at scholarly conferences, and his public and private advocacy, liaison, and outreach efforts on behalf of the series. Rasmussen exclaimed: “You are a true champion of FRUS, Richard. It has been both a professional and a personal pleasure to work with you!”
Next Rasmussen reported that OH and the National Archives and Records Administration have signed a Memorandum of Understanding (MOU) that will enable FRUS historians to conduct research in the classified Presidential Library records that are being transferred to the National Declassification Center. The MOU, which is essential to ensuring that FRUS can continue to meet its “thorough, accurate, and reliable” mandate, would not have been possible without the hard work and support of Jay Bosanko, Bill Fischer, John Laster, and Don McIlwain at NARA, as well as the entire OH FRUS management team, especially Assistant to the General Editor, Kristin Ahlberg.
Rasmussen next described FRUS and the return to the office, noting that even after OH shifted to maximum telework in March 2020, a core of FRUS historians were able to continue to work regularly at the office. The advent of widespread vaccine availability and declining caseload numbers this past spring resulted in an expanded in-office presence, although childcare issues and individual risk factors have kept most FRUS historians working largely from home. Over the last year and a half, the ability of FRUS historians to conduct research in classified records has steadily increased; it’s important to note, however, that Presidential Library records, among the most important sources for the series, are still inaccessible. Moreover, a number—but not all—of our interagency declassification partners are actively working through (and making excellent progress on) FRUS declassification referrals.
Rasmussen concluded by noting that four FRUS volumes had been published during the first half of 2021: Soviet Union, January 1983–March 1985; the second, revised edition of Documents on Western Europe, 1973–1976; the digital edition of the microfiche supplement for American Republics, 1961–1963, Cuba, 1961–1962, and Cuban Missile Crisis and Aftermath; and START I, 1981–1988. The office plans to publish one more volume by year’s end: Foundations of Foreign Policy, 1981–1988.
Remarks from the Acting Deputy Assistant Secretary, Office of Global Information Services (A/GIS)
Goldgeier invited comments from Eric Stein, Acting Deputy Assistant Secretary, A/GIS.
On behalf of the Department’s records, transparency, and declassification programs, Stein thanked Immerman for his service on the committee. Stein added that Immerman had helped to share knowledge about the mission of these programs and to improve their processes. Stein also thanked Mary Dudziak for her service on the committee. He noted that the Department’s declassification programs are functioning the best that they can given the Covid-19 pandemic. On Friday, August 27, the Department posted 728 records totaling 3,340 pages to the public FOIA website (foia.state.gov) as part of the Department’s monthly postings of records released through the FOIA program. Stein concluded by noting that the monthly reports to the committee will continue.
Remarks from the Historical Advisory Committee
Goldgeier invited other members of the Committee to speak.
Inboden expressed bittersweet emotions regarding Immerman’s departure and thanked Immerman for his mentorship regarding service. Inboden described Immerman as “a model for the rest of us” and concluded with three points. First, that Immerman did his job fantastically well. Second, that Goldgeier is a very knowledgeable and capable leader. And third, that the committee will continue to ask for Immerman’s views and input on important issues.
McAlister also honored Immerman for his leadership and expressed the wish that she could thank Immerman in person following the conclusion of the pandemic.
Hoganson noted that Immerman had alerted her to the importance of document retention and declassification over a decade ago and that he had educated a generation of historians on archival issues. She described Immerman’s departure as a loss but added that it was “so great that you served so wonderfully for so many years.”
Goldgeier invited Immerman to comment.
Immerman announced: “I’m tempted to shock everyone and say that I am speechless.” He continued: “I can’t express adequately my appreciation for all the comments” and described a successful transformation of the committee into an “incredibly effective advisory organ.” While essential to the work of historians, and in fact to the functioning of democracy, the work of the committee, FRUS, and declassification is mostly “unknown, underappreciated, and unacknowledged by our broader community.” Immerman gave special thanks to William McAllister, Steven Randolph, Howard, Deputy Historian Renée Goings, and Rasmussen. He also thanked former committee members Robert McMahon, Thomas Zeiler, and Katherine Sibley. Additionally, he thanked Stein, David Langbart, Bill Fischer and “all of the others that have reached out and assisted with cordiality.” Immerman concluded his spoken remarks by noting his interest in reading future annual reports by the committee, “which I won’t have to write anymore!” In the chat feature of the virtual meeting, he wrote: “I look forward to continuing to work with all of you, in whatever capacity, in the pursuit of our vital mission to promote transparency, democracy, citizenship, accountability, and 'good government.'”
Goldgeier expressed appreciation of Immerman’s remarks and described the importance of transparency in democracy. He stated that FRUS makes a tremendous contribution to democracy and that without transparency the country would be less free. Goldgeier also thanked William “Bill” Burr of the National Security Archive for his helpful role.
In the chat feature of the virtual meeting Burr wrote: “My colleagues at the National Security Archive and I thank you Richard for your extraordinary work on the HAC. The impact of your work will be lasting and your contributions, not least to the cause of openness in government, are unparalleled. We know that you will continue having an impact, not least by chairing the AHA’s National Archives Committee.”
Report on the Origins and Evolution of the Foreign Relations of the United States Digital Edition
Howard introduced Joseph Wicentowski, Digital History Advisor for the Office of the Historian. Wicentowski provided a presentation on the origins and evolution of the FRUS digital edition and ongoing digital initiatives in the Office. From the first iteration on DOSFAN, through the launch of the current website, history.state.gov, Wicentowski traced the Office’s efforts to provide and improve access to FRUS as well as other resources. As he explained, five principles guided the early efforts and continue to inform and drive future improvements: provide a single site to access all FRUS volumes; prioritize the document as the unit of publication; improve access to annotation; build a foundation for continued expansion; and use open standards-based solutions that would be agile and affordable. As a result, the Office chose TEI as a master format, a choice that required an infrastructure capable of supporting the enormous volume of FRUS. With input from University of Virginia’s Rotunda project, the Office selected eXist DB and launched the new version of the Office website in 2009 with just eleven digitized volumes. Collaboration with the University of Wisconsin helped to digitize previously published volumes.
Recent improvements to the digital edition of FRUS include availability in eBook format (2012) and improved date filters and search capabilities. The Office plans to improve and expand available search filters and add the rest of the microfiche FRUS document supplements to the website. These would augment the additional resources already available on the site, including the databases of Travels, Visits, Principal Officers and senior officials, Guide to Countries, and the new Department of State Administrative Timeline. Wicentowski indicated the Office also plans to work with eXist programmers to allow the encoding of document-level citation data to better interface with existing digital citation management software. Noting that open source tools make this and other forms of data analysis more possible than ever, he pointed interested individuals to the Office’s open data repository at https://github.com/HistoryAtState, where they can explore other means of interacting with the data, as well as trace the Office’s updates.
Rasmussen thanked Wicentowski for the work he has done on improving the website and increasing accessibility to the FRUS series. Immerman mentioned that an article on H-Diplo detailing the new features of the website is scheduled for publication.
Derek Chollet, Counselor to the Department of State, joined the HAC meeting during the Q&A session. He had been scheduled to speak earlier in the day and apologized for the delay, which was caused by a ministerial meeting on Afghanistan that he was attending alongside Secretary Blinken. “I am living history,” Chollet joked.
Chollet recounted his long connection to the Office of the Historian, both as a collaborator with historians in the office on a special project on the Dayton Accords and as an end user of the FRUS series. He thanked Richard Immerman for his service on the HAC and as chairman of the Committee over the past decade. “Richard has been a mentor of mine over the years,” Chollet added, stressing that Immerman has left a mark on the historical community, and that during his leadership, the Advisory Committee worked through numerous tough and important issues which “leaves you with a fine legacy.” Chollet then welcomed the new chairman of the Advisory Committee, James Goldgeier, “a longtime friend, colleague, and co-author for a long, long time.”
While recent weeks had been overwhelmed by current events and living minute to minute, “history is never far from our minds,” Chollet said. Department leadership had already asked the Office of the Historian to work on lessons learned from past efforts dealing with massive refugee situations. Beyond that, Chollet continued, all of us have been thinking, reflecting, and learning the lessons from Afghanistan from the past six months to the past 20 years. That learning, Chollet stressed, will continue through the work of historians for many decades to come. The work of the Office of the Historian is vital to that effort, both in providing the documents and the access to the public and, indeed, the world, and in educating the public, something that continues to be very important.
Excusing himself for having to run to additional meetings, Chollet thanked Richard Immerman for his service on the HAC and thanked the members of the HAC for continuing their advisory and support work with the office. Finally, he thanked the office leadership and staff for their efforts on behalf of the U.S. public and promised to visit the office once that becomes possible and to continue integrating the work of the office into the day-to-day work of the Department.
Goldgeier thanked Chollet for taking time to speak to the HAC and stressed the contributions the Office of the Historian makes to our government’s strategic competition with non-democratic actors through its transparent treatment of this nation’s history, and to our ability to learn from the history.
Returning to the Q&A with Wicentowski, Leon asked about the status of encoding document-level information for citation managers. Wicentowski admitted that it remains a work in progress, but that the resources to get it done in the near future are available. Hoganson asked if data on site usage and GitHub usage is available. The website receives over 19 million unique visitors a year, Wicentowski said, but no data on GitHub use is presently available. A member of the public asked if the HAC session was being recorded. Rasmussen responded that presently sessions are not being recorded but that the issue is under consideration.
Leon inquired about the status of integrating FRUS data with external data authorities. Wicentowski responded that there are no plans for integration at this time, but that he has external authority files in mind as possible candidates. He also mentioned that, in the process of creating the office’s authority file, many external data authorities were consulted and referenced.
Goldgeier asked why volumes haven’t expanded as electronic publishing took primacy and whether, as documents become available, they may be added to the electronic version of the FRUS volumes. Rasmussen answered that while historians always want to add more documents to volumes, there are some limitations. First and foremost, there are physical limitations, since the volumes are still being printed in hardbound copies. Second, the office learned during the publication of the Nixon-Ford volumes that e-publications do not always reach the public faster. Adding more documents, Rasmussen pointed out, means additional time in editing and declassification. Lastly, even if additional documents become available, integrating them into previously published volumes would present incredible challenges. Howard pointed out that adding documents after publication is an exception, one that happened only once, in a very specific case where documents from the Camp David Accords were found after the publication of the volume because they had been filed in an unexpected place. The discussion continued, including on how the FRUS-law mandate of “thorough, accurate, and reliable” plays a role in how the volumes are compiled and published.
Below is a video edition and transcript of the lecture presented by Dr. Joseph Wicentowski during the public session. An essay on this topic can be found in H-Diplo Forum 2021-2, “Scholars and Digital Archives: Living the Dream?” (6 October 2021).
Report on the Origins and Evolution of the Foreign Relations Digital Edition (Video and transcript)
Welcome everyone, and thank you for attending this presentation. I’m pleased to see so many here who have contributed to the initiatives I’ll be discussing, and I’d like to thank Richard Immerman and all of the members of the HAC for your support of these efforts.
Today’s presentation has five parts. I’ll begin by tracing the origins of today’s FRUS digital edition by examining several earlier efforts. Then I’ll describe the goals we had and the hurdles we faced in designing and creating history.state.gov and its new interface to FRUS in 2007–2008. Then I’ll describe the launch of history.state.gov in 2009, followed by our expansion of FRUS coverage and other improvements, notably search, and I’ll discuss some future improvements that we’re exploring. Finally I’ll show how to access the open data repositories that contain all of the Office’s publications and datasets and discuss the possibilities this opens up for research that go beyond the capabilities of our website.
Before the launch of history.state.gov in 2009, the Office had carried out several early online publishing initiatives. These began in the mid-1990s, with DOSFAN, the Department of State Foreign Affairs Network collection, hosted by the University of Illinois at Chicago. This effort was followed by two major incarnations of state.gov in the late 1990s and early aughts. Taken together, these 3 efforts put 88 volumes online covering the post-WWII era but centering on the Kennedy, Johnson, Nixon, and Ford administrations. Separate from the Office’s efforts, in the early aughts the University of Wisconsin Digital Collections Center carried out an ambitious project to offer access to earlier volumes in the series. The University scanned 375 volumes covering the series’ first 99 years in print. These were the resources that the Office reviewed in 2007 when we began planning the FRUS digital edition that we see today on history.state.gov. Let’s take a look at each of these sites and note their evolution and distinct qualities.
This is the first FRUS volume posted online to DOSFAN: the Eisenhower Eastern Europe volume, published in 1993. A second was published in 1995. The DOSFAN volumes established a model for publishing FRUS online that endured for over a decade. It presented FRUS documents, a chapter at a time, on long webpages, using very minimal markup. Chapter headings, document titles, body text, and footnotes were all encoded as paragraphs. Footnotes received special treatment. They were demarcated by a pair of slash symbols signaling a footnote reference inside the text. Footnote text appeared in the following paragraph, using the same double slash symbol. This presentation was simple but effective: a reasonable use of 1990s-era HTML. This site lacked search capabilities, but its goal was clearly to get the full text of the volumes online, for free, public access.
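To make the slash convention concrete, here is a minimal Python sketch of how such pages could be read back into body text and footnotes. It assumes the marker is a footnote number enclosed in a pair of slashes (for example /1/); that exact form, the function names, and the sample text are assumptions for illustration and are not taken from the DOSFAN pages themselves.

```python
import re

# Assumed marker form: a footnote number enclosed in a pair of slashes, e.g. /1/.
MARKER = re.compile(r"/(\d+)/")

def split_footnotes(paragraphs):
    """Separate body paragraphs from footnote paragraphs.

    A paragraph that begins with a marker is treated as footnote text;
    any other paragraph is body text that may contain references.
    """
    body, notes = [], {}
    for para in paragraphs:
        match = MARKER.match(para)  # match() only looks at the start of the paragraph
        if match:
            notes[match.group(1)] = para[match.end():].strip()
        else:
            body.append(para)
    return body, notes

def references(paragraph):
    """Return the footnote numbers referenced inside a body paragraph."""
    return MARKER.findall(paragraph)

# Hypothetical sample, not taken from an actual volume.
sample = [
    "The Ambassador reported on the meeting./1/ No reply was sent.",
    "/1/ Source: hypothetical telegram used only for this example.",
]
body, notes = split_footnotes(sample)
print(references(body[0]))
print(notes["1"])
```

A pattern this simple would also match slash-delimited numbers such as the middle of a date written 2/21/72, which hints at why plain-paragraph markup is fragile compared with a format that tags footnotes explicitly.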
The Office’s first homepage on the Department’s new state.gov website in 1997 followed the DOSFAN model of presenting a chapter’s worth of documents on a single webpage. But it made two improvements in the experience of reading FRUS documents online: (1) Bold text for document headings and centered titles; and (2) A distinct, blue appearance for footnotes to set them off from the body text of documents. My colleagues who applied the HTML coding to turn footnotes blue attest that it was a manual and very time-consuming process. The text of 28 volumes was released by the Office on this incarnation of state.gov. While the site had a search engine, the search results could not be limited to FRUS, so they often included many other resources.
With the launch of the second incarnation of the Office’s homepage on the state.gov website in 2001, the Office made two more improvements on the previous model:
First, with the advent of the PDF format, which preserved the page layout of the printed edition, the Office could post chapters and entire volumes as downloadable PDFs. But the Office didn’t cease publishing the text of printed volumes on chapter-length webpages. Plain old webpages were (and remain) faster to load than PDFs, and offered many other advantages. A PDF the length of a typical FRUS volume (about a thousand pages) was difficult to navigate on a computer, especially without internal links or bookmarks. The text in a PDF had fixed dimensions, so users with small screens might struggle to read the text at a natural size, whereas web pages would easily reflow to the user’s screen size and desired font size. Also, users of search engines like Google had a better chance of finding FRUS documents released as web pages because search engines would typically only index the first few pages of PDFs but would always fully index web pages.
The second innovation was also related to the PDF format: a new type of publication known in the Office as the “e-pub,” or the “electronic-only publication.” E-pubs allowed visitors to download PDFs of scanned images of the original archival documents for offline viewing. In addition, a transcribed copy of the text was posted for convenient search and for accessibility reasons, allowing visitors with visual disabilities to use screen reader technology to read the documents. These volumes were intended to be electronic-only and never bound into books.
In many ways, e-pubs were an internet-age counterpart to the microfiche supplements that the Office published in the 1980s and 1990s; they offered the reader a facsimile view of original archival documents, but at a cost. The technological limitations of both microfiche and e-pub formats prevented editors from using footnotes to place contextual information at the point of reference. Instead, editors were limited to writing a brief summary that was prepended to the document. Also, as with microfiche, the hope was that e-pubs could allow the Office to release more documents more quickly and cheaply than printed books. However, the labor associated with the e-pubs (scanning, transcribing, and uploading per-document PDFs and transcriptions, all performed in house) was substantial. Furthermore, every document had to undergo the same declassification procedures as printed documents, so the savings in time and resources were somewhat illusory. Nonetheless, the e-pub experiment produced 10 uniquely accessible volumes, contributing to a total of 58 volumes, on this second incarnation of the Office’s state.gov website. In terms of search, this version of state.gov’s search engine remained unable to limit searches to just FRUS documents.
Separate from the Office’s initiatives, the University of Wisconsin Digital Collections Center took a completely different and very ambitious approach to presenting FRUS online. The University had selected FRUS as its first experiment in mass digitization, and over 5 years, from 2003–2008, collected 375 volumes from their holdings and partner libraries, sheared off the bindings of each book, scanned nearly 400,000 pages at high resolution, and created an interface for browsing this material, which put these scanned images of the printed page at the center of the online edition. This edition also allowed visitors to search within the series.
But because the University used optical character recognition, or OCR, to convert the scanned images into text for the search engine, the text was subject to typos, which could mean that a search for “Mr. McMath” or “Mr. Hunter” or “Empire” would miss documents that contained these phrases. In practice, such typos were not a problem for terms that appeared multiple times in a short span, since the OCR engine might correctly read one of the other instances. But for rarer terms, a typo meant that a visitor might overlook a document. By faithfully reproducing the scanned images of the printed volumes, the University of Wisconsin edition was more a database of pages than documents. Advancing from one document to the next required paging ahead an unknown number of times. Similarly, search results were listed at the page level, rather than at the document level. But the University’s amazing accomplishment—digitizing 99 years’ worth of volumes with such fidelity to the original and offering serviceable full text search—demonstrated to historians throughout the field—and in the Office—the power of a unified portal for browsing and searching the FRUS series.
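The effect of OCR typos on search can be illustrated with a short Python sketch. It contrasts exact substring search with approximate matching from the standard library's difflib. The sample pages, the "rn" for "m" misreading, and the 0.75 similarity threshold are all assumptions chosen for illustration; nothing here describes how the Wisconsin search engine actually worked.

```python
from difflib import SequenceMatcher

def exact_hits(term, pages):
    """Indexes of pages whose OCR text contains the term verbatim."""
    return [i for i, text in enumerate(pages) if term in text]

def fuzzy_hits(term, pages, threshold=0.75):
    """Indexes of pages containing a word that is similar enough to the term.

    The threshold is an arbitrary value chosen for this sketch.
    """
    hits = []
    for i, text in enumerate(pages):
        for word in text.split():
            cleaned = word.strip(".,;:").lower()
            if SequenceMatcher(None, term.lower(), cleaned).ratio() >= threshold:
                hits.append(i)
                break  # one similar word is enough to count the page
    return hits

# Hypothetical OCR output: page 0 misreads "m" as "rn".
pages = [
    "The Ernpire faced unrest.",
    "Trade with the Empire grew.",
    "No relevant terms here.",
]
print(exact_hits("Empire", pages))  # [1]
print(fuzzy_hits("Empire", pages))  # [0, 1]
```

Approximate matching recovers the misread page but can also return false positives on large collections, which is one reason a clean, corrected text is preferable to compensating at search time.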
As the Office surveyed this field in 2007 and 2008, we observed that while FRUS was available online, it was very fragmented, both in location and in form. Students and scholars had to sift across four different sites to find particular volumes, and each site organized and presented documents differently. If this situation persisted, it might even get worse; a new administration might overhaul the state.gov website again and further bifurcate the series’ online presence. We saw great potential in the idea of drawing on the best qualities of each of these models to build a consolidated portal for the series.
However, we thought we could go further. We imagined a new model for FRUS that was more than a collection of books, scanned pages, or text on long webpages—a home that understood that FRUS is a documentary edition—a collection of documents, selected by historians, annotated with rich footnotes that give readers essential context for understanding each document, and organized into volumes with rich internal cross-references and common editorial methodologies and purpose that knit individual volumes together into a cohesive series. So, the Office began planning a new website to begin tackling these challenges that had accumulated in the series’ first 15 or so years on the world wide web.
The Office set out the following 5 goals for an improved website:
Our first goal was to bring the 88 FRUS volumes that the Office had released online to date on its various sites under a single roof, with a capable search engine. Besides the 88, several volumes from the Eisenhower and Kennedy administrations weren’t yet available on either the Wisconsin or Department’s websites. These would need to be scanned and converted to text. We also hoped to scan any of the 88 volumes that pre-dated the PDF format, so we would have PDFs or archival quality scans of all volumes for reference. We adopted Wisconsin’s digitization guidelines for high resolution scanning.
Some of us at this phase dreamed of creating a complete archive, but with so many digitization tasks just for these 88 volumes—and so many questions yet to answer about creating the new site—it seemed to be an unattainable goal. So we limited our scope to the Department’s existing online offerings.
Second, we resolved to make the document the primary unit of the publication, rather than the printed page or the chapter. We felt each webpage should only have a single document, with a unique, persistent URL for citations. The documents would be displayed complete, all on one page. We realized, as we considered scanning older volumes, that prioritizing the text over the scanned image might mean a slower and more costly conversion process (because we would need to find a way to eliminate or minimize OCR errors), but we felt this was a worthwhile investment for two reasons: First, federal accessibility laws require scanned images to be accompanied by transcriptions for use by visitors who rely on screen reader technology to access content. And, second, as the official documentary record of U.S. foreign relations, the online FRUS series could not be rife with OCR errors. A clean text would benefit all users.
Our third goal was to improve navigation within volumes by displaying internal cross-references as hyperlinks, showing footnote text both inline and at the foot of a page, and placing reference aids from a volume’s glossaries right beside the text of a document instead of on a separate page.
Goal four was to build a stable foundation for the future. The Office reasoned that a modern digital format for FRUS could be extended, as resources allowed, with new layers of analysis and annotation. If designed correctly, the FRUS data would not need to be continuously overhauled as the Office’s website underwent inevitable redesigns or server migrations.
Our fifth goal was to adopt open, standards-based solutions and avoid proprietary ones—a vital strategy in an uncertain budgetary environment. If we could keep the costs of software licenses and development low, perhaps we could allocate some of our precious resources to digitizing more in the series?
To achieve these goals, we first had to answer 3 fundamental questions: (1) What master digital format would allow us to achieve our goals? (2) What web server & search engines were compatible with this format? (3) How could we adapt our existing publications into this format?
First, what master digital format would allow us to achieve our goals? This was a straightforward choice:
The Text Encoding Initiative, or TEI, is the de facto standard for digital humanities text projects. It offered a mature set of guidelines for capturing all aspects of documentary editions like FRUS—and a far richer digital vocabulary than the paragraphs and blue footnotes from DOSFAN and state.gov. It was non-proprietary and based on the open XML standard. If we could just wrap our heads around it, we could use it for free and rely on it as the long term master digital storage format for our publications.
For example, we learned that TEI could capture the entire hierarchical structure of a volume in a single file. Here we see the text of the first page of the body of a Nixon China volume. The words on the page are shown in black, and the TEI tags that describe and decorate the text are shown in blue, with further details, or attributes, shown in orange. The TEI tags we see here identify the headings and relative hierarchical nesting of 3 types of divisions in FRUS volumes: compilations, chapters, and documents. TEI also had a vocabulary for capturing footnotes and original footnote numbers. This suggested we could do more with footnotes than had been possible before. TEI also offered tags for capturing datelines (which consist of the place and date a document was written). Besides the plain English form of a date (February 21, 1972), the TEI Guidelines explained how to use attributes to capture dates in standard, machine-readable formats, even including time zones. The TEI vocabulary extended to identifying people. Here, person name tags identify references to Nixon, Chiang Kai-shek, and Zhou Enlai, and the attributes contain unique identifiers linking the people to glossaries or authority files. The TEI vocabulary would allow us to identify signatures in documents. This could be useful for research to find all documents signed by a particular official. And the TEI could identify terms and link them to authority files. Here, the term GRC is linked to the glossary definition for this term from the volume’s front matter, Government of the Republic of China. Perhaps we could use this and the similar tagging of persons to help readers understand the document they’re viewing, so they didn’t have to flip the page to reference a glossary.
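As a rough illustration of the tagging described above, the following Python sketch parses a small TEI-style fragment and reads back the structure, the machine-readable date, and the links to people and terms. The fragment is invented for illustration: the headings, the identifiers (such as p_NRM1 and t_GRC1), the place name, and the exact choice of elements and attributes are assumptions, not a copy of the Office's actual encoding.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative fragment only: headings, identifiers, and attribute values
# are hypothetical and do not reproduce the published volume.
SAMPLE = """\
<div xmlns="http://www.tei-c.org/ns/1.0" type="compilation" xml:id="comp1">
  <head>Hypothetical compilation heading</head>
  <div type="chapter" xml:id="ch1">
    <head>Hypothetical chapter heading</head>
    <div type="document" xml:id="d1" n="1">
      <head>1. Memorandum of Conversation</head>
      <dateline><placeName>Beijing</placeName>,
        <date when="1972-02-21">February 21, 1972</date></dateline>
      <p><persName corresp="#p_NRM1">Nixon</persName> and
        <persName corresp="#p_ZE1">Zhou Enlai</persName> discussed the
        <gloss target="#t_GRC1">GRC</gloss>.</p>
    </div>
  </div>
</div>
"""


def describe_documents(xml_text):
    """Summarize every division tagged as a document in a TEI fragment."""
    root = ET.fromstring(xml_text)
    summaries = []
    for div in root.iter(TEI + "div"):
        if div.get("type") != "document":
            continue  # skip compilation and chapter divisions
        date = div.find(".//" + TEI + "date")
        summaries.append({
            "id": div.get(XML_ID),
            # the machine-readable form of the date lives in an attribute
            "date": date.get("when") if date is not None else None,
            # pointers from names and terms to glossary entries
            "people": [p.get("corresp") for p in div.iter(TEI + "persName")],
            "terms": [g.get("target") for g in div.iter(TEI + "gloss")],
        })
    return summaries
```

Calling describe_documents(SAMPLE) returns one summary for document d1, with its date and the identifiers it points to. The point of the sketch is that nesting, dates, and entity links are all recoverable from a single file without guessing from typography.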
The TEI Guidelines described even more ways that a project could enrich a text, but we had to be selective. We chose the elements that were vital for capturing the structure of the volume and the documents and annotations contained within. We gave greater weight to tagging that we imagined could be used to answer research questions about documents. If there was anything the TEI format didn’t support out of the box, it also had a customization facility for adding new kinds of annotations.
But even if we could snap our fingers and transform all of our publications into TEI, we wouldn’t have a new website. The state.gov content management system didn’t support TEI, so we would have to find a web server that was capable of transforming TEI into the HTML format used by web browsers, with a search engine that could handle TEI as well. Most databases struggle to support large collections of text, especially in a richly tagged form such as TEI. Thanks to kind insights from James Cummings at Oxford and Mark Saunders, Holly Shulman, and David Sewell at Rotunda, the digital imprint of the University of Virginia Press, we learned of a special kind of database, called a native XML database. Unlike their more common counterparts, relational databases, native XML databases could readily ingest and process TEI files. And they allowed us to query and transform our data using free, open standards, such as XQuery and XSLT. After reviewing the available software packages, we selected eXist-db, a free, open source native XML database. Free meant that we wouldn’t have to pay any license costs to use the software. Open source, in this case, meant that eXist’s community of users could make additions and that all could benefit from the improvements.
With the format and software questions settled, that left one concrete problem: how could we convert our existing digital publications into TEI? How could we take a text like the one on the left here, and achieve the kind of intricate tagging on the right that would be needed to achieve the goals we had in mind for leveraging the unique structures of documentary editions to bring annotations to readers’ fingertips? Here again, thankfully, our friends at Rotunda shared valuable experience they learned in finding and working with vendors to scan and convert numerous volumes of documentary editions from the American Founding Era collection.
The ideal vendor would have the capability to overcome the OCR typo problem, by using a technique called “double keying”—having two technicians retype the entire text. The idea is that it’s unlikely that two different people would make the same mistake, so the resulting two texts could be compared against each other for differences, and only the differences would need to be reviewed. This technique has been shown to achieve 99.99% accuracy for the keyed text. The vendors wouldn’t always have to resort to this technique if a source text was so clean that OCR could achieve the desired levels of accuracy, but we wanted it to be a part of our vendor’s arsenal.
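The comparison step of double keying can be sketched in a few lines of Python. This is a minimal illustration using the standard library's difflib module, not the vendor's actual tooling; the function name and the sample sentences are invented.

```python
import difflib


def keying_differences(first_keying, second_keying):
    """Return the word-level spans where two independent transcriptions disagree."""
    a = first_keying.split()
    b = second_keying.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # only mismatched spans need a human reviewer
            differences.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return differences
```

For example, keying_differences("The Secretary met the Ambassador on Monday", "The Secretary met the Arnbassador on Monday") flags only the one disputed word, so a reviewer checks a single spot against the page image rather than proofreading the whole sentence.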
These were the key decisions that shaped the development of history.state.gov: We would encode our publications in TEI, host the website in an eXist-db server, and convert our publications to TEI with the help of qualified, experienced vendors.
We combined these ingredients and launched the website in March 2009. Some of you may remember our original homepage. With 11 TEI-formatted FRUS volumes available at launch, we continued to send volumes to our vendor for scanning and conversion. A year after launch, the site offered 100 volumes.
Here we see the same document whose TEI fragments we viewed earlier, rendered on the fly in HTML by our XQuery code running in the eXist-db server. The volume’s table of contents is on the left, highlighting the current chapter; the text of the document is in the main section of the page, with the document heading, its dateline, the participants list, and the body of the document below. Here’s how we handled footnotes: When the user hovered their mouse over the footnote reference, a pop-up would appear, containing the text of the footnote. You could also click on a footnote reference to jump to the bottom of the page to view all of the footnotes in the document. And we tagged cross-references, so that readers could jump directly to the referenced documents. The tagging, rendering, and user experience aspects of footnotes were a major focus of our efforts on the new FRUS interface.
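The footnote behavior described above can be approximated with a short transformation. The site itself does this in XQuery; the sketch below uses Python instead, and the HTML it produces (a title attribute for the hover text, a list at the foot for the full notes) is an assumed simplification rather than the site's real markup.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = "{http://www.tei-c.org/ns/1.0}"


def render_paragraph(p):
    """Render one TEI <p> as HTML, showing each <note> inline and at the foot."""
    parts = [escape(p.text or "")]
    notes = []
    for child in p:
        text = " ".join("".join(child.itertext()).split())
        if child.tag == TEI + "note":
            n = child.get("n", str(len(notes) + 1))
            notes.append((n, text))
            # hover text comes from the title attribute; the link jumps to the foot
            parts.append('<sup><a href="#fn{0}" title="{1}">{0}</a></sup>'.format(
                escape(n), escape(text)))
        else:
            parts.append(escape(text))
        parts.append(escape(child.tail or ""))
    body = "<p>" + "".join(parts) + "</p>"
    foot = "".join('<li id="fn{0}">{1}</li>'.format(escape(n), escape(t))
                   for n, t in notes)
    return body + ("<ol>" + foot + "</ol>" if notes else "")
```

Because the note is tagged once in the source, the same text can feed both the pop-up and the list at the bottom of the page without being typed twice.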
In the right sidebar we placed additional contextual information: links to scanned page images of the printed volume, and filtered listings of the people and terms from the volume’s front matter. Here’s how it worked: When a visitor would hover their mouse over one of the names, they would see the description of the person or the definition of the term. The idea was to save visitors the step of leaving the document to manually scroll through the long lists of names and terms. Behind the scenes, we had tagged the individual instances of names and terms in the document’s TEI source, referencing these entities’ unique identifiers in the front matter lists. Using XQuery, we could present a filtered list of just those names and terms that appeared in the document. We pulled content into this page from a completely different location in the volume. It seemed to us like magic.
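The filtered sidebar list amounts to a join between the names tagged in one document and the list in the volume's front matter. The site does this with XQuery; the Python sketch below shows the same idea on an invented miniature volume. The element layout and the identifiers (p_NRM1 and so on) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical miniature volume: a persons list in the front matter and one document.
VOLUME = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <front>
      <list type="persons">
        <item><persName xml:id="p_CKS1">Chiang Kai-shek</persName>, President of the Republic of China</item>
        <item><persName xml:id="p_NRM1">Nixon, Richard M.</persName>, President of the United States</item>
        <item><persName xml:id="p_ZE1">Zhou Enlai</persName>, Premier of the People's Republic of China</item>
      </list>
    </front>
    <body>
      <div type="document" xml:id="d1">
        <p><persName corresp="#p_NRM1">Nixon</persName> met with
          <persName corresp="#p_ZE1">Zhou</persName>.
          <persName corresp="#p_NRM1">The President</persName> spoke first.</p>
      </div>
    </body>
  </text>
</TEI>
"""


def people_in_document(volume_xml, doc_id):
    """List front-matter person entries that are referenced in one document."""
    root = ET.fromstring(volume_xml)
    doc = next(d for d in root.iter(TEI + "div") if d.get(XML_ID) == doc_id)
    # collect the identifiers the document points to, dropping the leading '#'
    referenced = {p.get("corresp", "").lstrip("#") for p in doc.iter(TEI + "persName")}
    entries = []
    for item in root.iter(TEI + "item"):
        name = item.find(TEI + "persName")
        if name is not None and name.get(XML_ID) in referenced:
            entries.append(" ".join("".join(item.itertext()).split()))
    return entries
```

For document d1 this returns the entries for Nixon and Zhou Enlai only, in front-matter order, and leaves out Chiang Kai-shek because the document never refers to him.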
The features shown here were what we had in mind when designing the FRUS digital edition on history.state.gov: a document-centered view of the volume, with a clean presentation of the digitized text, formatted in a way that closely adhered to the printed source but was actually based on TEI’s semantic tagging rather than raw typographic style. We showed readers where the document was located within the volume’s table of contents and provided access to the scanned page images. We presented filtered views of contextual information from the volume’s front matter. And we exposed footnotes and cross-references in a way that allowed visitors to traverse the rich lattice of FRUS annotations within volumes and throughout the series.
After completing the conversion of the volumes from our earlier sites, we expanded the digitization program. This was thanks to a partnership with the University of Wisconsin Digital Collections Center, which we proposed in February 2009 and which Peter Gorman, the head of UWDCC, accepted. The University provided us with their scanned images, and we sent the images to our vendor to extract the text from the images and apply TEI tagging to the volumes.
To ensure the vendor’s submissions met our stringent requirements for text accuracy, we performed random samplings on their submissions and performed additional levels of reviews and enrichment. In return, we provided the University with our enriched master TEI files for each volume. The partnership with the University allowed us to save the cost of scanning these 375 volumes and build on the excellent work they had performed on the series’ first 99 years. We are truly grateful to the University for their contributions to this project.
In 2018, we completed the decade-long project to digitize the series’ 535 printed volumes. Today, history.state.gov houses over 310,000 documents from nearly 550 volumes, covering 1861–1989.
While completing the digitization project, we also added a new edition of all FRUS volumes: ebooks. This was the Office’s contribution to the 2012 Digital Government Strategy, which asked agencies to make their online resources more mobile-friendly. Thanks to this initiative, visitors can download FRUS ebooks for free and read them on their Kindles or iPads, search within volumes, and highlight passages and take notes. We also provided an ebook catalog, using the OPDS, or Open Publication Distribution System, format. This allowed third party ereader apps like Shubooks to embed our ebook catalog into their apps, letting users browse the catalog and download FRUS ebooks. We didn’t even envision ebooks when we launched history.state.gov, but thanks to our adoption of a media-neutral format like TEI, supporting ebooks was simply a matter of writing some XQuery to transform any of our volumes into the ebook format. Today, free, open source tools like TEI Publisher make transforming TEI documents into customized PDF and ebook formats, not to mention searchable websites, much simpler than it was in 2009. Here are all of the nearly 550 volumes available today for browsing on history.state.gov and for download as ebooks.
Besides FRUS, the Office’s website also houses numerous publications and databases on the history of U.S. foreign relations and the Department itself. These include: (1) Principal Officers and Chiefs of Mission (a database of ambassadors and senior leaders in the Department), (2) Travels of the President and Secretary of State (a database of their official travel abroad), (3) Visits of Foreign Leaders (a database of official visits of foreign leaders and heads of state to the U.S.), and (4) the Office’s newest publication: Administrative Timeline of the Department of State. Our Countries section houses Recognitions and Relations, a collection of essays on the history of U.S. relations with every country in the world, focusing on dates of recognition and key changes in bilateral relationships. Our About section houses the minutes from every meeting of the Advisory Committee on Historical Diplomatic Documentation (HAC) since 1996. Additional HAC minutes and related documents can be found in our FRUS history section, where the Office’s 2015 monograph on the history of the FRUS series can also be found. That monograph is truly an invaluable resource for understanding the series in its institutional and political context.
While the experience of browsing FRUS on history.state.gov has remained consistent even across the site redesign in 2016 that made the website mobile-friendly, the experience of searching FRUS has taken a leap forward, with the addition of date search in late 2018. So let me demonstrate this and review the options for searching FRUS.
There are two ways to start a search in FRUS. First, you can select the site-wide search field, enter some keywords, and hit submit. This method searches the entire website. If you want to just search within a specific FRUS volume, go to the volume’s landing page, and look in the right sidebar for the “search inside this volume” field. Both ways work well.
When you arrive at the search page, you’ll see a lot of options for refining your search, but at the very top of the page, you’ll see a link to our Search Tips page. Search Tips explains the search engine’s default behavior for keyword searches, and explains how to refine your searches. For example, you can perform a phrase search by wrapping quotation marks around your keywords to find documents that contain the entire phrase rather than just the individual keywords. You can use boolean operators like AND, OR, and NOT to perform more specific searches, such as finding documents that contain one word OR another word, or that contain one word but NOT another word. You can use wildcards to handle spelling variations or different forms of words. And you can use proximity operators to find words that occur within a certain number of words of each other. Anytime you need a refresher, you can return to the Search Tips page to review these options.
Besides these keyword options, the website supports two main ways to limit the search results. First, filtering by section. By default a search includes results from the entire website, so to limit results to a specific section, such as FRUS, you would select the Historical Documents section. Doing so unlocks a second filter, the Date Filter. The date filter allows you to search or filter results by an exact date or a date range. If you are looking for documents from a specific date, like December 7, 1941, enter it into the Start Date field using the formatting shown in the form and submit your search. A complete date returns documents from that date, but if you enter just 1941 or just December 1941, the search engine will return all documents from the entire year of 1941 or the entire month of December 1941. If you have a specific date range in mind, fill in the end date. Again, you can enter a specific end date, or a year or a month in a year, and the search engine will be inclusive. You can even specify a time in the from and to fields. The search is time zone sensitive, but it assumes the U.S. Eastern time zone, so adjust your searches accordingly. You can perform date searches independent of keyword searches, so you can find all documents from a specific date or range of dates, without specifying any search terms. Or you can combine keyword and date search to find documents across the series from a specific time period that contain specific keywords. With 310,000 documents covering over 128 years’ worth of international history, being able to perform not only keyword search but also date filtering and sorting is essential.
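The inclusive handling of partial dates can be illustrated with a small Python sketch. It assumes dates are entered as YYYY, YYYY-MM, or YYYY-MM-DD; that input format and the function names are assumptions made for the example (the search form shows the format the site actually expects), and times and time zones are left out.

```python
import calendar
from datetime import date


def expand_partial_date(value):
    """Expand 'YYYY', 'YYYY-MM', or 'YYYY-MM-DD' into an inclusive (first, last) day pair."""
    parts = [int(part) for part in value.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        year, month = parts
        last_day = calendar.monthrange(year, month)[1]  # handles leap years
        return date(year, month, 1), date(year, month, last_day)
    if len(parts) == 3:
        day = date(*parts)
        return day, day
    raise ValueError("expected YYYY, YYYY-MM, or YYYY-MM-DD: " + value)


def search_window(start_value, end_value=None):
    """Combine a start date and an optional end date into one inclusive window."""
    start, _ = expand_partial_date(start_value)
    _, end = expand_partial_date(end_value if end_value else start_value)
    return start, end
```

Under these assumptions, search_window("1941-12") covers December 1 through December 31, 1941, and search_window("1941-12", "1942") runs from December 1, 1941, through the last day of 1942.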
Enriching FRUS TEI documents with dates for every one of these documents was a major project, made particularly challenging by the presence of undated documents, documents dated using non-Gregorian calendars, and changing national and local time zones. But for users of the website, know that even undated or imprecisely dated documents are included in your searches. Our TEI captures the full range of possibilities for every document, depending on its individual circumstances.
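One way to picture how undated or imprecisely dated documents can still be found is to treat every document as carrying an earliest and a latest possible date, and to match any document whose range overlaps the search window. The Python sketch below illustrates that idea only; the field names and the sample records are invented, and the site's real matching logic may differ.

```python
from datetime import date
from typing import NamedTuple


class DatedDocument(NamedTuple):
    doc_id: str
    not_before: date  # earliest possible date of the document
    not_after: date   # latest possible date of the document


def filter_by_date(documents, start, end):
    """Keep documents whose possible date range overlaps [start, end], oldest first."""
    hits = [d for d in documents if d.not_before <= end and d.not_after >= start]
    return sorted(hits, key=lambda d: (d.not_before, d.not_after))


# Hypothetical records: one precisely dated, one known only to the month, one out of range.
EXAMPLES = [
    DatedDocument("d3", date(1942, 1, 5), date(1942, 1, 5)),
    DatedDocument("d2", date(1941, 12, 1), date(1941, 12, 31)),
    DatedDocument("d1", date(1941, 12, 7), date(1941, 12, 7)),
]
```

A search for December 7, 1941, over EXAMPLES returns both d2 and d1: the document dated only to December 1941 is included because it could have been written that day.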
Besides the date filter, the search engine also offers a volume filter, which shows the set of volumes that contain hits for the given search. For example, a search for the phrase “human rights” returns these results as the first of over 3,000 results, sorted chronologically. So how to narrow these down, other than by date? By examining the volumes filter to see the volumes that contained these 3,000 documents, you might be interested in focusing on the 1894 volume covering Affairs in Hawaii. Applying that filter, you would see this document from July 8, 1894. In this way, the search and filter options allow you to cast a wide or a narrow net, and use any combination of these options to explore a period, topic, or set of volumes across the entire FRUS series.
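The volume filter is a form of faceting: count the hits per volume, show the counts, and let the reader narrow to one volume. A minimal Python sketch of that idea follows; the hit tuples and volume identifiers are hypothetical and do not reflect how the site stores its results.

```python
from collections import Counter


def volume_facets(hits):
    """Count search hits per volume, most hits first, ties broken by volume identifier."""
    counts = Counter(volume_id for volume_id, _doc_id in hits)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


def apply_volume_filter(hits, volume_id):
    """Narrow the hits to a single volume chosen from the facet list."""
    return [hit for hit in hits if hit[0] == volume_id]


# Hypothetical (volume identifier, document identifier) pairs for one search.
HITS = [("vol-b", "d12"), ("vol-a", "d3"), ("vol-b", "d40"), ("vol-c", "d7")]
```

With HITS above, volume_facets lists vol-b first with two hits, and apply_volume_filter(HITS, "vol-b") leaves just those two documents.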
Looking ahead, we are continuing to digitize the FRUS Microfiche Supplements—the 13 publications released in the 1980s and 1990s—arguably the least accessible of all volumes in the series. So far 2 of these have been digitized and released online. We are scanning the microfiche and applying the same digitization workflow to these volumes as we had the earlier volumes. Given the poor quality of some of the original microfiche, this is painstaking work. So our plan is to release one each year or so, pending the availability of resources, until they’re complete. Hopefully within a decade or so.
Second, the Office is investigating providing visitors with even more powerful tools to narrow searches, by adding filters for the rich metadata found in each FRUS document’s heading and source note, such as sender and recipient, document type, provenance, original classification, and people, places, and organizations mentioned. The largest obstacle to expanding the selection of filters is the fact that the contemporary annotation practices for capturing this information varied across and within epochs in the series. Human readers can cope with such variation, but in its current form, this information is not sufficiently regularized to be “machine readable.” Just as the date filter and sorting options required a major effort to identify dates in documents and make them uniform and machine readable, every additional “facet” or “dimension” of metadata that the website could expose will require considerable investment of effort and resources. The Office continues to investigate possibilities for enhancements in these areas.
Third, the Office is exploring the use of digital annotation tools to aid in indexing documents. I am particularly excited about the latest release of TEI Publisher, a free, open source software project for making TEI collections browsable and searchable online. Version 7.1 added the ability to annotate TEI documents in the web browser. Here is a view of the TEI Publisher annotation interface, showing a document from the Reagan Soviet 85–86 volume released last year, with annotations of the document’s people, places, dates, and topics. Besides applying entity types to the text, we can also link individual instances to an entity database, as is shown in this description of the Chernobyl disaster in 1986. We’ve begun using this tool to help prepare back-of-book indexes for these volumes and look forward to being able to post the completed indexes.
Finally, I’d like to briefly introduce our open data repositories on GitHub. GitHub is a free website that many researchers and developers working on humanities, government data, and open-source software projects use to publish their data and source code. The Office posts all of its publications and datasets, as well as the full source code for history.state.gov, on GitHub, under the HistoryAtState organization. The FRUS source files can be found in the “frus” repository. Every new FRUS volume published to history.state.gov is simultaneously uploaded to GitHub, and every edit is time-stamped and logged with descriptive comments explaining each change. With an almost radical level of transparency, you can see the precise changes we make anytime we edit files. In this view, you can see that we fixed a typo reported via our mailbox and deleted a Z that found its way into our TEI file. By establishing a GitHub account, readers can fork our repositories (or make complete copies under their own accounts), report issues to us (such as typos or bugs), and even propose fixes, which we’ll evaluate. We’ve even posted directions you can use to download and install a complete, live copy of the history.state.gov website on your personal computers. This could offer a practical way to perform research when working without a stable internet connection. Besides these benefits, we believe that posting our raw data and source code can allow researchers, including you, to perform new kinds of analysis that the history.state.gov website does not facilitate. Thanks to our use of open standards and non-proprietary formats, researchers can load our documents into their databases for further analysis.
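As a small example of the kind of analysis a local copy makes possible, the Python sketch below counts the documents in every TEI file in a folder. It assumes you have already copied the "frus" repository to your computer, that the volume files are .xml files gathered in one folder, and that documents are divisions tagged with type="document"; the folder layout and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

TEI = "{http://www.tei-c.org/ns/1.0}"


def count_documents(volume_dir):
    """Map each TEI file name in a folder to the number of document divisions it contains."""
    counts = {}
    for path in sorted(Path(volume_dir).glob("*.xml")):
        root = ET.parse(str(path)).getroot()
        counts[path.stem] = sum(
            1 for div in root.iter(TEI + "div") if div.get("type") == "document"
        )
    return counts
```

Pointing count_documents at the folder that holds the volume files would give a per-volume tally, which could then be charted or compared across administrations without going through the website at all.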
Historians and researchers in other humanities fields who lack a background in computer science but want to gain such skills are in luck: today, such training is readily available through university libraries, digital humanities centers, and summer institutes; through online tutorials such as the Programming Historian at programminghistorian.org (one word); through videos and course materials posted online; and through books for humanists on learning programming languages like XQuery for mining TEI and XML sources. For scholars who have advanced text mining skills (or collaborate with those who do), the Office’s sources are natural targets for the application of natural language processing, text modeling, and other computational analysis techniques. And we’re excited that we’ve begun to see the fruits of such projects.
Thanks to invaluable university partnerships, the Office of the Historian built on earlier efforts and established a modern foundation for the FRUS digital edition on the basis of open standards and open-source software. The website, history.state.gov, houses a complete collection of the printed FRUS volumes in a full-text, searchable format and as downloadable ebooks. It offers not just FRUS but also other vital publications for the study of U.S. foreign relations and the history of the Department of State. All raw data and source code are available on GitHub.
Besides working to digitize the least accessible volumes—the microfiche supplements—the Office continues to explore ways to improve our search tools and improve the utility and quality of all of the Office’s publications for students, scholars, and the general public.
Thank you for listening! We invite your feedback. Please email us at email@example.com.