Category Archives: Archive-It

Columbus Neighborhood Newspapers Showcase the City’s Diverse Communities

Posted on November 5, 2024 by Anna Trammell

The following guest post from Aaron O’Donovan (aodonovan@columbuslibrary.org), Columbus Metropolitan Library Special Collections Manager, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.

As a local history and genealogy department in a public library, our materials run the gamut from books from the 1700s about the creation of our country to yearbooks of local high schools that patrons like to peruse for nostalgia’s sake. In addition to our approximately 90,000 reference books, our archives room holds approximately 2,500 linear feet of photographic material, records, and manuscript material. We are constantly seeking new opportunities to expand access to our collections for our patrons, and when the opportunity arose to digitize materials as part of the Community Webs program, I knew what I wanted to digitize first: local neighborhood newspapers of Columbus.

We joined the Community Webs program in 2017 to archive important cultural and local government websites of Columbus, Ohio. The catalyst for the project was the belief that we had done a good job of telling the story of Columbus in its first 150 years, but we were neglecting the story of the city’s evolution in the more recent past, as well as failing to record the present. With the goal of capturing more recent changes to our city, we focused on archiving our city government website, as well as archiving social service websites, especially those helping new immigrants in our city. Because of the Community Webs program, we were able to take a snapshot of the diverse populations that were making their homes in Columbus, and the medium of web archiving was the only way we were able to tell the stories of these new immigrant communities, including the Somalian, Nepalese, Bhutanese, and Mexican populations. To further this focus on migration patterns into Columbus, we felt it was important to make the neighborhood newspapers that we had on microfilm accessible, because they featured stories and obituaries on immigrant populations who came to Columbus in the mid-19th century and early 20th century.

The newspapers had been preserved on microfilm for decades, but we were never able to digitize them due to the time commitment involved for a project that size. During my time in the local history field in Columbus, it has become clear to me that our library patrons crave hyper-local history material that personally connects their stories to the place they live. While general local history topics about Columbus are popular, nothing is more popular in our library than content generated from Columbus neighborhoods. The opportunity to finally digitize neighborhood newspapers and make them accessible to our patrons was one that I could not pass up.

The most important newspaper for the library to digitize was the Columbus Call and Post, a historic Black newspaper that served Columbus from 1962 to 2007. For years patrons have asked us if the newspaper was digitized, but unfortunately all the library had was microfilm starting in 1972, which was very difficult to browse and ultimately did not serve our patrons’ needs for accessibility. Because the Internet Archive performed optical character recognition (OCR) on the text of the newspapers, researchers can now use keyword searching to find an address, a business name, or a personal name in news stories that mention the people and places that they hold in their memory.

Digitizing the microfilm of the Call and Post also complemented another project we began several years ago when we partnered with the King Arts Complex to digitize the photograph archive of the iconic newspaper, which was donated to the organization in the mid-1990s. Many of the photographs in the collection have little to no information attached to them (information written on the back of the photographic prints, the name of the photographer, etc.). Digitization of the Call and Post provided additional information to match and apply to the photographs in the archive, adding an enhanced level of searchability and accessibility to this collection. The collections work together to preserve Black history in a way that was not possible before because much of the content from the Call and Post was unique and rare. Being able to bring this newspaper back into the public consciousness has been a thrilling experience for us.

*Congressman Adam Clayton Powell Jr. in Columbus, Columbus Call and Post Photograph Collection*

As the project continued to take shape, we felt it was important to represent Columbus neighborhoods geographically, which also enabled us to represent economically and ethnically diverse communities throughout Columbus history. Our most accessed newspaper thus far has been the Hilltop Record, a title that focused on a local neighborhood with strong Appalachian ties and has a long history of covering the issues of working-class citizens on the westside of Columbus. Other digitized community newspapers include:

· The Eastern Spectator and Eastern Review offer perspectives from the city’s Jewish community.

· The Southside Booster and Southside Leader share the industrial and union history of Columbus.

· The Linden NE News showcases stories from north Columbus, an area that has experienced several demographic shifts throughout its 100 years of history.

The rarest newspapers digitized for this project were also some of the oldest newspapers that were preserved on microfilm in our collection. Among those titles is the Ohio Columbian (1853-1856), an anti-slavery newspaper that reported on Underground Railroad activities as they were happening in Ohio and surrounding states. It has the potential to illuminate our understanding of individuals who were involved in assisting enslaved people seeking freedom in the 1850s. Other newspapers with great research potential include early (and shorter) runs of Black newspapers that had not been digitized before this project, including The Columbus Recorder (1927), The Columbus Voice (1929), which was edited by Florence W. Oakfield, and The Ohio Torch (1928-1930), the longest running newspaper for the Black community during the 1920s. We are excited to report that researchers are already using these resources to understand Columbus history more objectively and completely.

With this support from the Internet Archive and the National Historical Publications and Records Commission, we have been able to help our local users find information that was not available elsewhere. Recently, we had a researcher request an obituary from June of 1964 when our two major newspapers were on strike. Thankfully, the South Side Spectator had been digitized and was available through the Internet Archive. Our librarian was able to locate the obituary that was only available in that newspaper. We also got this enthusiastic email from a regular library patron after we informed them that we had digitized the Hilltop Record and it was now keyword searchable on the Internet Archive: “OH MY GOSH! ARE YOU SERIOUS!?! THAT’S FANTASTIC! Have I told you lately how much I love you guys? You rock my world! Thank you so much for everything you do. I am so grateful for everyone in Local History & Genealogy.”

Moreover, the librarians are using the digitized newspapers in regular programming, furthering our promotion of these new digitized collections. Every month the library hosts a virtual Black Heritage Collection Spotlight on a notable person or topic from Black history in Columbus. The images and news articles from the digitized Call and Post are used frequently for the program, and we look forward to learning about more ways the digitized newspapers are used in local research to highlight and deepen our community’s connections to Columbus’ past.

Browse the Columbus Neighborhood Newspapers Collection on archive.org.

The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Internet Archive Services Update: 2024-10-17

Posted on October 18, 2024 by Brewster Kahle

[Washington Post piece]

Last week, along with a DDoS attack and exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to bring the site down to assess and improve our security.

The stored data of the Internet Archive is safe and we are working on resuming services safely. This new reality requires heightened attention to cyber security and we are responding. We apologize for the impact of these library services being unavailable.

The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as well as email, blog, helpdesk, and social media communications. Our team is working around the clock across time zones to bring other services back online. In coming days more services will resume, some starting in read-only mode as full restoration will take more time.

We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.

As a library community, we are seeing other cyber attacks—for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend.

For the latest updates, please check this blog and our official social media accounts: X/Twitter, Bluesky and Mastodon.

Thank you for your patience and ongoing support.

Illuminating the Stories of Brooklynites Through Digitized Directories

Posted on September 24, 2024 by Anna Trammell

The following guest post from Dee Bowers (they/them), Archives Manager at the Brooklyn Public Library Center for Brooklyn History, is part of a series written by members of the Internet Archive’s Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices.

Some say as many as one in seven Americans have family roots in Brooklyn, and I expect the newly digitized Brooklyn city directories now available through the Internet Archive will get heavy use from genealogists, historians, authors, journalists, students, and even artists to trace connections to the diverse and ever-changing borough.

Black and white two-page spread of directory title page including map of Brooklyn. — Title page, Spooner’s Brooklyn Directory 1822. Brooklyn Public Library, Center for Brooklyn History.

What is now the Center for Brooklyn History first joined the Internet Archive’s Community Webs program in 2017 as part of the original cohort. This program gave us the tools and training we needed to save over 2TB of web-based Brooklyn history content, including over 1,000 individual URLs. We also host our digitized high school newspapers and audiovisual material on the Internet Archive.

In addition to helping us preserve this web-based content, Community Webs has now also made it possible to increase access to our physical collections through digitization. As part of the Collaborative Access to Diverse Public Library Local History Collections project, made possible by a grant from the National Historical Publications and Records Commission, we were able to partner with the Internet Archive to digitize 236 microfiche sheets of Brooklyn city directories.

Microfiche sheet from the Brooklyn city directories, 1822. Brooklyn Public Library, Center for Brooklyn History.

These directories show the movement, growth, and changing nature of immigrant populations in Brooklyn in the early to mid-19th century and help document the immigrant experience by providing data on the residency and, in some cases, ethnicities of Brooklynites over time. We knew that expanding digital access would be extremely useful to the many researchers who use our online resources, especially since our number one research topic is genealogy. The project is also directly in line with our mission:

“Democratize access to Brooklyn’s history and be dedicated to expanding and diversifying representation of the history of the borough by unifying resources and expertise, and broadening reach and impact.”

By increasing the visibility of these collections through digitization and freely available public access, researchers and historians will have a richer, more accessible view into the diversity of American history. The history of Brooklyn is extraordinarily diverse but, like many archives, our collections don’t always tell the fullness of those stories. By expanding access to our city directories, we provide insight into earlier residents of Brooklyn and enable diverse communities to trace their Brooklyn roots to a greater degree.

Screenshot of digitized directory page in Internet Archive viewer. — Screenshot of the early Brooklyn directories in the Internet Archive.

Here’s an example of how the directories look in the Internet Archive. As the screenshot above shows, they include content beyond directory listings. In this case, there’s a chronological listing of “memoranda” – notable moments in Brooklyn history – including “June 11, 1812 – News received in Brooklyn, of the declaration of war between the United States and Great Britain.”

One example of research that can be conducted with these directories is finding out more about early Black Brooklynites. Slavery was abolished in New York State in 1827, so the earliest days of post-enslavement Brooklyn are represented in the digitized directories.

Screenshot of digitized directory page in Internet Archive viewer with the purple highlighted surname “Hodges.” — Screenshot of 1857 directory on the Internet Archive with the highlighted surname “Hodges.”

By searching the text of the directories using keywords, I picked out an individual to learn more about, Rev. William J. Hodges, who lived on Broadway in Brooklyn in 1857. By cross-referencing with our digitized newspapers, I was able to find out more about him and his abolitionist activism in Brooklyn and beyond. It turns out he was not born in Brooklyn, nor did he reside there very long, but he did make an impact during his time there, as he founded the Colored Political Association of Kings County (which is the modern-day borough of Brooklyn).

Black and white newspaper clipping describing a “colored indignation meeting” in which William Hodges took part. — “Local Items,” June 5 1856, *Brooklyn Times Union*, page 2.

If not for the digitized city directories, I doubt I ever would have learned of Rev. Hodges and his time in Brooklyn. I hope that many more stories like these will emerge once researchers start digging into these directories.

Black and white image of buildings on a tree-lined street with information about T. Reeve, architect. — Directory advertisement for T. Reeve, Architect and Builder.

The directories also contain items like this – an advertisement showing this architect and builder’s office on Schermerhorn Street in Downtown Brooklyn. This part of Brooklyn looks very different now, and this insight into what it looked like pre-photography is invaluable, particularly for people conducting house, building, and neighborhood research.

The directories are linked on our Search Our Collections page. We also have a tutorial for using the digitized directories. Additionally, we have several related research guides which assist researchers in exploring various topics. These materials are in the public domain, and we hope they will be used for a broad spectrum of applications, from family research to demographic research to writing to artwork. We are grateful to Community Webs, the Internet Archive, and the National Historical Publications and Records Commission for making this material available and searchable online and allowing us to expand access across the borough, city, and beyond.

Browse the Brooklyn City Directories on archive.org.

Public Libraries Meet to Advance Community Archiving

Posted on August 28, 2024 by Anna Trammell

On August 13, Community Webs members from all over the US and Canada gathered in Chicago for the 2024 Community Webs National Symposium. Launched in 2017, Internet Archive’s Community Webs program empowers public libraries and other cultural heritage organizations to document their communities. Members of the program receive access to Internet Archive’s Archive-It web archiving service and Vault digital preservation service as well as training, technical support, and opportunities for professional development.

Members of Internet Archive’s Community Webs program at the Community Webs National Symposium

This event was made possible in part by support from the Mellon Foundation. Held at the Museum of Contemporary Art Chicago, this year’s symposium was an opportunity for members to learn together and connect with each other. The day was organized around two workshops designed to support the community archiving and digital preservation work happening across Community Webs member institutions.

The first workshop, “Collective Wisdom: Collaborative Learning to Support Your Community Archiving Projects,” was taught by Natalie Milbrodt, CUNY University Archivist and co-founder of the Queens Memory Project. Attendees spent time working in small groups to create definitions of “Community Archiving” and reflect on some of the shared challenges and opportunities they were experiencing when engaging in community-centered work. This workshop emphasized the value of the collective wisdom of Community Webs members and will inform future educational opportunities. The community archives focus of this workshop also supported the Community Webs Affiliates Program, which encourages relationship-building among public libraries and other community-focused cultural heritage and social service organizations to broaden access to archiving tools for documenting the lives of their patrons.

Attendees work together to discuss strategies for documenting their communities

In the second half of the day, Stacey Erdman and Jaime Schumacher of Digital POWRR led a “Walk the Workflow” workshop which demonstrated a step-by-step digital preservation process using a variety of free preservation tools including Internet Archive’s Vault digital preservation system.

A main goal for the symposium was to provide an opportunity for Community Webs members to connect and learn from each other. Throughout the day, attendees discussed projects, shared ideas, described lessons learned, and brainstormed possible avenues for future collaboration.

A digital preservation workshop provided attendees with strategies for supporting long-term preservation of digital collections

The following day, Community Webs members toured the Chicago Public Library Special Collections. Johanna Russ, Unit Head for Special Collections, gave a presentation about the complex, multi-year project CPL undertook to preserve and provide access to the records of the Chicago Park District. Highlights from this collection were available for attendees to view in the reading room.

That afternoon, the Archive-It Partner Meeting provided opportunities for Community Webs members and other Archive-It users to spend some time with Internet Archive staff to discuss topics such as strategies for capturing social media and making web archives more useful.

Community Webs members view highlights from Chicago Public Library’s special collections

In-person events like this are instrumental in achieving a key goal of the program: offering opportunities for networking and professional development for Community Webs members. Internet Archive’s support for this national network of practitioners empowers their work on a local level to preserve and provide access to digital heritage sources reflecting the unique life and culture of their communities.

Interested in learning more about Community Webs? Explore Community Webs collections, read the latest program news, or apply to join!

Addressing Underrepresentation in Rural New England Community Archives: Documenting the History of Black Lives in Rural New England

Posted on March 20, 2024 by tpadilla

A family in Hatfield, ca. 1889. L.H. Kingsley, photographer.

Guest post by Dylan Gaffney, Information Services Associate for Local History & Special Collections, Forbes Library.

This post is part of a series written by members of the Community Webs program. Community Webs advances the capacity of community-focused memory organizations to build web and digital archives documenting local histories and underrepresented voices. For more information, visit communitywebs.archive-it.org.

Forbes Library has been a member of Community Webs since its inception in 2017. At that time, we were hopeful that the program would allow us to create an archive which more fully represented the community in which we live, and provide a more diverse history/record of our region and the people we serve. This project inspired archives staff to examine the many silences in our archives, and make plans for the ethical collection and preservation of materials that would help fill in these gaps in our historical record. At the same time, the library had begun to shift its focus toward collaboration with other local historical and community organizations.

In the years following the kickoff of the Community Webs Project, Forbes Library co-hosted multiple series of exhibits, films, workshops, walking tours, and community reads on themes of mass incarceration, the Underground Railroad, and the history of slavery in our region. These events, and the passionate response of the community to them, inspired us to continue seeking out collaborations, large and small, and solidified our view that surfacing stories of people who had been underrepresented in the archives should be a core value in our work as an institution.

This work inspired Forbes Library, Historic Northampton, UMass Amherst, and the Pioneer Valley History Network to take lead roles in the 2021 Documenting Early Black Lives in the Connecticut River Valley project, which seeks to gather the fragmentary information about Black lives from the wide range of sources and archives in Western Massachusetts so that a whole might be perceived that is larger than the sum of those parts. The project, to date, has surfaced over 3500 records or references to people of color, enslaved and free, in Western Massachusetts from the 17th through 19th centuries. These histories are being made available through the project’s database and on the project website. We contributed an essay titled Searching for Black History in a Public Library Archive to the Project Handbook on the experiences and takeaways of doing this work from a public librarian’s perspective.

We know too little about Black lives in rural and small-town New England, and the places Black residents were able to carve out for themselves in these communities. With this project, we hoped to uncover names, details of their lives, and some small sense of how people of color survived in the Connecticut River Valley before and after the abolition of slavery in Massachusetts in 1783. At the kickoff event for the project, UMass Amherst professor Gretchen Holbrook Gerzina mentioned challenging the assumptions of others (sometimes called Gatekeepers) who “might be quick to discourage a researcher interested in Black History, reporting that they don’t have much…or not thinking about ways that records of white families might be useful to this research.” Gerzina remarked that researchers, curators, and librarians should “start from the perspective of presence.”

As the Documenting Black Lives project was undertaken with grant funding, and the time thus limited, we needed to develop an approach that would be productive right away. We identified several collections in the library’s Hampshire Room for Local History that we expected could be productive resources for identifying enslaved people in the area. The most promising of these was the Judd Manuscript Collection, a collection of 60+ volumes created by local newspaper editor and historian Sylvester Judd in the 1840s. The manuscript was originally purchased from the Judd estate by local historian James Trumbull and subsequently sold to the trustees of the library. It has been the property of the library since 1904, but use has been limited to a small group of academics and local historians who were aware of the contents and could physically visit during our few open archives hours. Those who knew of its tremendous historical value had discovered that it features content documenting Indigenous lives, enslaved people, and free Black people in New England and had used it to research Indigenous culture, the history of colonial settlement, enslavement, and the early abolitionist movement in the area.

Sylvester Judd’s Account of Sojourner Truth speaking and singing at his grandson Hall’s funeral. Sylvester Judd Notebook Vol. 3.

Public Historian and Author Marla Miller on the value of Judd:

“Sylvester Judd, in his transcriptions of historic documents as well as the conversations he described with local residents, preserves extraordinary details that survive nowhere else. Because of Judd’s meticulous, wide-ranging work, I was able to gain insight into the lives of laboring people that would never otherwise have been possible…Judd’s notes preserve genealogical information about enslaved people that is found nowhere else. The Judd manuscript is almost archaeological in nature, with shards of evidence that can be unearthed via careful scrutiny. As he records, for instance, who had the first piano in town, who laid the first carpet, the sound of the geese squawking through Sunday sermons, and a hundred other small details of daily life, a picture emerges that simply cannot be found in any other kind of more formal or systematic archival material. These pages, filled from edge to edge with his notes, cross references, sketches, and other materials, simply teem with the kinds of details that historians crave, but cannot hope to find—except in Northampton.”

If we start from an assumption of presence (of underrepresented people both in the community and in the archives), the primary obstacles to discovering and surfacing information in collections like ours often revolve around issues of access and methodologies for search and discovery. We had long dreamed of digitizing all 60+ bound volumes of the collection to make them available to a wider group of researchers and the public at large. When the Community Webs program began to explore funding for a digitization program dedicated to expanding the amount and diversity of locally-focused community archives available online to users, the Judd Manuscript Collection seemed a good fit.

Now that the volumes have been digitized, our mission is to spread the word about their value and availability, so that the materials within can inform and inspire new research and discovery. As an illustration of the value of the collection and its contents, it is useful to look at how the increased availability of this resource could lead to new discoveries in long hidden collections. As an example, I will examine how Judd enriched our understanding of one local Black family.

Judd entry for the Hull Family. Northampton Genealogies Volume 4, p. 380.

Judd devoted entire volumes to genealogies of local families, but the 600+ page volume on Northampton Genealogies contains, to our knowledge, only two Black families, both listed without last names. The work we had done in the Documenting Black Lives project enabled us to compile a list of 3500+ entries for Black residents of the region in the period between 1650 and 1900. Using that list, we recognized the names in one of these entries as those of Amos and Bathsheba Hull and their children. Bathsheba can be found elsewhere in our own archives as a member of the Church of Christ during Jonathan Edwards’s ministry between 1729 and 1750, in records written in Edwards’s own hand.

Goods purchased by Amos Hull between 1754 and 1759, as listed in Judd’s transcription of Ebenezer Hunt’s Account Book. Northampton Account Books, p. 68.

This entry, transcribed from a local merchant’s account book, shows items purchased by Amos Hull, the services he would perform in exchange for goods received, and the rate at which he was paid. It notes that in 1761, the same year their daughter Margaret was born, Amos Hull died. Afterward, his widow Bathsheba paid for his and her accounts by washing. Bathsheba would surely have had a difficult time supporting multiple children without her husband, and documents subsequently found elsewhere in our archives and in other institutions prove this to be the case.

Asaph Hull’s 1762 Indenture Record. Northampton Manuscript Collection.

By 1762, a document found elsewhere in our archives records their son Asaph indentured to Seth Pomeroy, who is well known for his service in the French and Indian War and would go on to fight at the Battle of Bunker Hill and achieve the rank of Major General.

Judd entry describing the town seizing Bathsheba Hull’s land. Northampton Vol. 2, p. 300.

Bathsheba and her family come up again in several entries in Judd, including multiple mentions of the town seizing her land and displacing her from it in 1765. This cruel act forced Bathsheba and her young children from the town. Bathsheba and her son Agrippa relocated to Stockbridge, Massachusetts. It was in Stockbridge that Agrippa Hull enlisted in May of 1777. He served in the Continental Army for the remainder of the Revolutionary War, witnessing the surrender of British General John Burgoyne at Saratoga, New York, enduring the winter of 1777-78 at Valley Forge, and taking part in the battle at Monmouth Courthouse, New Jersey, in June 1778. He then served as a personal assistant to the famed Polish general, revolutionary, and engineer Tadeusz Kosciuszko and became a close friend of the general during their years of war service together. Agrippa’s story and friendship with Kosciuszko, along with Kosciuszko’s friendship with Thomas Jefferson, are examined in Gary Nash and Graham Hodges’s 2012 book “Friends of Liberty: Thomas Jefferson, Tadeusz Kosciuszko, and Agrippa Hull”.

Portrait of Agrippa Hull, Courtesy of the Stockbridge Library, Museum & Archives.

Agrippa Hull went on to become the most prominent Black landowner in Stockbridge, MA, and is buried along with his wife and children in Stockbridge Cemetery. His brother Amos Hull, Jr. also fought in the Continental Army, and surfaces in Belchertown, MA, records documented as part of the Documenting Black Lives project.

This is just one brief example of the elaborate web of information that can be revealed when we prioritize the surfacing of stories that had previously been hidden in our collections, increase access through digitization, and collaborate to research and promote the information within.

As Marla Miller wrote in her letter of support for the NHPRC Archives Collaboratives grant:

“ We can hardly wait to learn— alongside the many other academic and avocational historians whose work will be enriched and transformed by these records—what else remains to be discovered. Once available in digital form, available for scouring by researchers with their own wide range of questions, these materials will certainly spark, inform, and enrich generations of new research, from student papers to dissertations to academic monographs. It is almost impossible to predict all the ways the volumes might reshape historiography, as well as conventional historical wisdom, because the contents at present are comparatively difficult to ferret out. But to be sure, these volumes have the potential to transform local and regional historical understanding, and once digitized, will certainly come to the attention of researchers nationwide.”

Click here to browse the Sylvester Judd Manuscript Collection on archive.org.

The Internet Archive and Community Webs are thankful for the support from the National Historical Publications & Records Commission for Collaborative Access to Diverse Public Library Local History Collections, which will digitize and provide access to a diverse range of local history archives that represent the experiences of immigrant, indigenous, and African American communities throughout the United States.

Community Webs Receives $750,000 Grant to Expand Community Archiving by Public Libraries

Posted on February 1, 2024 by jefferson

Started in 2017, our Community Webs program has over 175 public libraries and local cultural organizations working to build digital archives documenting the experiences of their communities, especially those patrons often underrepresented in traditional archives. Participating public libraries have created over 1,400 collections documenting local civic life totaling nearly 100 terabytes and tens of millions of individual documents, images, audio/video files, blogs, websites, social media, and more. You can browse many of these collections at the Community Webs website. Participants have also collaborated on digitization efforts to bring minority newspapers online, held public programming and outreach events, and formed local partnerships to help preservation efforts at other mission-aligned organizations. The program has conducted numerous workshops and national symposia to help public librarians gain expertise in digital preservation and cohort members have done dozens of presentations at professional conferences showcasing their work. In the past, Community Webs has received support from the Institute of Museum and Library Services, the Mellon Foundation, the Kahle Austin Foundation, and the National Historical Publications and Records Commission.

We are excited to announce that Community Webs has received $750,000 in funding from The Mellon Foundation to continue expanding the program. The award will allow additional public libraries to join the program and will enable new and existing members to continue building their web archive collections using our Archive-It service. The funding will also give members access to Internet Archive’s new Vault digital preservation service, enabling them to build and preserve collections of any type of digital materials. Lastly, leveraging members’ prior success in local partnerships, Community Webs will now include an “Affiliates” program so member public libraries can nominate local nonprofit partners that can also receive access to archiving services and resources. Funding will also support the continuation of the program’s professional development training in digital preservation and community archiving, as well as its cohort and community building activities: workshops, events, and symposia.

We thank The Andrew W. Mellon Foundation for their generous support of Community Webs. We are excited to continue to expand the program and empower hundreds of public librarians to build archives that document the voices, lives, and events of their communities and to ensure this material is permanently available to patrons, students, scholars, and citizens.

Moving Getty.edu “404-ward” With Help From The Internet Archive API

Posted on November 2, 2023 by jefferson

This is a guest post from Teresa Soleau (Digital Preservation Manager), Anders Pollack (Software Engineer), and Neal Johnson (Senior IT Project Manager) from the J. Paul Getty Trust.

Project Background

Getty pursues its mission in Los Angeles and around the world through the work of its constituent programs—Getty Conservation Institute, Getty Foundation, J. Paul Getty Museum, and Getty Research Institute—serving the general interested public and a wide range of professional communities to promote a vital civil society through an understanding of the visual arts.

In 2019, Getty began a website redesign project, changing the technology stack and updating the way we interact with our communities online. The legacy website contained more than 19,000 web pages and we knew many were no longer useful or relevant and should be retired, possibly after being archived. This led us to leverage the content we’d captured using the Internet Archive’s Archive-It service.

We’d been crawling our site since 2017, but had treated the results more as a record of institutional change over time than as an archival resource to be consulted after deletion of a page. We needed to direct traffic to our Wayback Machine captures, thus ensuring deleted pages remain accessible when a user requests a deprecated URL. We decided to dynamically display a link to the archived page on our site’s 404 error “Page not found” page.

Getty.edu 404 page: *Getty.edu 404 error “Page not found” message including the dynamically generated instructions and Internet Archive page link.*

The project to audit all existing pages required us to educate content owners across the institution about web archiving practices and purpose. We developed processes for completing human reviews of large amounts of captured content. This work is described in more detail in a 2021 Digital Preservation Coalition blog post that mentions the Web Archives Collecting Policy we developed.

In this blog post we’ll discuss the work required to use the Internet Archive’s data API to add the necessary link on our 404 pages pointing to the most recent Wayback Machine capture of a deleted page.

Technical Underpinnings

Implementation of our Wayback Machine integration was very straightforward from a technical point of view. The first example in the Wayback Machine APIs documentation page gave us the technical guidance needed for our use case: displaying a link to the most recent capture of any page deleted from our website. With no requirements for authentication, key management, or platform-specific software development kit (SDK) dependencies, our development process was simplified. We chose to incorporate the Wayback API using Nuxt.js, the web framework used to build the new Getty.edu site.

Since the Wayback Machine API is highly performant for simple queries, with a typical response delay in milliseconds, we are able to query the API before rendering the page using a Nuxt route middleware module. API error handling and a request timeout were added to ensure that edge cases such as API failures or network timeouts do not block rendering of the 404 response page.
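For readers who want to try something similar, here is a minimal TypeScript sketch of the kind of lookup described above. It is not Getty's actual code. The endpoint and JSON fields follow the public Wayback Machine availability API, while the function names, the sample URL, and the 1500 ms timeout are assumptions chosen for illustration. Wiring the lookup into a Nuxt route middleware is left out.

```typescript
// Sketch only: look up the most recent Wayback Machine capture of a URL.
// Function names and the default timeout are illustrative assumptions.

interface ClosestSnapshot {
  available: boolean;
  url: string;
  timestamp: string;
  status: string;
}

interface AvailabilityResponse {
  archived_snapshots?: { closest?: ClosestSnapshot };
}

const AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available";

// Build the availability query for a given page address.
function buildAvailabilityUrl(pageUrl: string): string {
  return `${AVAILABILITY_ENDPOINT}?url=${encodeURIComponent(pageUrl)}`;
}

// Return the capture URL when the payload reports an available snapshot, else null.
function extractSnapshotUrl(payload: AvailabilityResponse): string | null {
  const closest = payload.archived_snapshots?.closest;
  return closest && closest.available ? closest.url : null;
}

// Query the API, giving up after timeoutMs so that a slow or failed lookup
// never blocks rendering of the 404 page. Any failure resolves to null.
async function findArchivedUrl(pageUrl: string, timeoutMs = 1500): Promise<string | null> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(buildAvailabilityUrl(pageUrl), { signal: controller.signal });
    if (!response.ok) return null;
    return extractSnapshotUrl((await response.json()) as AvailabilityResponse);
  } catch {
    return null; // network error, timeout, or malformed JSON
  } finally {
    clearTimeout(timer);
  }
}
```

A 404 handler could call `findArchivedUrl` with the requested address and show the archive link only when the result is not `null`. Returning `null` on every failure path keeps the "Page not found" message itself independent of the API.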

The only Internet Archive API feature missing from our initial list of requirements was access to snapshot page thumbnails in the JSON data payload received from the API. Access to these images would allow us to enhance our 404 page with a visual cue of archived page content.

Results and Next Steps

Our ability to include a link to an archived version of a deleted web page on our 404 response page helped ease the tough decisions content stakeholders were obliged to make about what content to archive and then delete from the website. We could guarantee availability of content in perpetuity without incurring the long term cost of maintaining the information ourselves.

The API returns the most recent Wayback Machine capture by default, which is sometimes not one we created and has not necessarily passed through our archive quality assurance process. We intend to develop our application further so that we privilege the display of Getty’s own page captures. This will ensure we’re delivering the highest quality capture to users.

Google Analytics has been configured to report on traffic to our 404 pages and will track clicks on links pointing to Internet Archive pages, providing useful feedback on what portion of archived page traffic is referred from our 404 error page.

To work around the challenge of providing navigational affordances to legacy content and to ensure web page titles of old content remain accessible to search engines, we intend to provide an up-to-date index of all archived getty.edu pages.

As we continue to retire obsolete website pages and complete this monumental content archiving and retirement effort, we’re grateful for the Internet Archive API which supports our goal of making archived content accessible in perpetuity.

IMLS National Leadership Grant Supports Expansion of the ARCH Computational Research Platform

Posted on September 19, 2023 by jefferson

In June, we announced the official launch of Archives Research Compute Hub (ARCH), our platform for supporting computational research with digital collections. The Archiving & Data Services group at IA has long provided computational research services via collaborations, dataset services, product features, and other partnerships and software development. In 2020, in partnership with our close collaborators at the Archives Unleashed project, and with funding from the Mellon Foundation, we pursued cooperative technical and community work to make text and data mining services available to any institution building, or researcher using, archival web collections. This led to the release of ARCH, with more than 35 libraries and 60 researchers and curators participating in beta testing and early product pilots. Additional work expanded the community of scholars doing computational research with contemporary web collections by providing technical and research support to multi-institutional research teams.

We are pleased to announce that ARCH recently received funding from the Institute of Museum and Library Services (IMLS), via their National Leadership Grants program, supporting ARCH expansion. The project, “Expanding ARCH: Equitable Access to Text and Data Mining Services,” entails two broad areas of work. First, the project will create user-informed workflows and conduct software development that enables a diverse set of partner libraries, archives, and museums to add digital collections of any format (e.g., image collections, text collections) to ARCH for users to study via computational analysis. Working with these partners will help ensure that ARCH can support the needs of organizations of any size that aim to make their digital collections available in new ways. Second, the project will work with librarians and scholars to expand the number and types of data analysis jobs and resulting datasets and data visualizations that can be created using ARCH, including allowing users to build custom research collections that are aggregated from the digital collections of multiple institutions. Expanding the ability for scholars to create aggregated collections and run new data analysis jobs, potentially including artificial intelligence tools, will enable ARCH to significantly increase the type, diversity, scope, and scale of research it supports.

Collaborators on the Expanding ARCH project include a set of institutional partners that will be closely involved in guiding functional requirements, testing designs, and using the newly-built features intended to augment researcher support. Primary institutional partners include University of Denver, University of North Carolina at Chapel Hill, Williams College Museum of Art, and Indianapolis Museum of Art, with additional institutional partners joining in the project’s second year.

Thousands of libraries, archives, museums, and memory organizations work with Internet Archive to build and make openly accessible digitized and born-digital collections. Making these collections available to as many users in as many ways as possible is critical to providing access to knowledge. We are thankful to IMLS for providing the financial support that allows us to expand the ARCH platform to empower new and emerging types of access and research.

Build, Access, Analyze: Introducing ARCH (Archives Research Compute Hub)

Posted on June 26, 2023 by tpadilla

We are excited to announce the public availability of ARCH (Archives Research Compute Hub), a new research and education service that helps users easily build, access, and analyze digital collections computationally at scale. ARCH combines two things: the Internet Archive’s more than a decade of experience supporting computational research, by providing large-scale data to researchers and dataset-oriented service integrations like ARS (Archive-It Research Services), and a collaboration with the Archives Unleashed project of the University of Waterloo and York University. Development of ARCH was generously supported by the Mellon Foundation.

ARCH Dashboard

What does ARCH do?

ARCH helps users easily conduct and support computational research with digital collections at scale – e.g., text and data mining, data science, digital scholarship, machine learning, and more. Users can build custom research collections relevant to a wide range of subjects, generate and access research-ready datasets from collections, and analyze those datasets. In line with best practices in reproducibility, ARCH supports open publication and preservation of user-generated datasets. ARCH is currently optimized for working with tens of thousands of web archive collections, covering a broad range of subjects, events, and timeframes, and the platform is actively expanding to include digitized text and image collections. ARCH also works with various portions of the overall Wayback Machine global web archive totaling 50+ PB going back to 1996, representing an extensive archive of contemporary history and communication.

ARCH, In-Browser Visualization

Who is ARCH for?

ARCH is for any user that seeks an accessible approach to working with digital collections computationally at scale. Possible users include but are not limited to researchers exploring disciplinary questions, educators seeking to foster computational methods in the classroom, journalists tracking changes in web-based communication over time, and librarians and archivists seeking to support the development of computational literacies across disciplines. Recent research efforts making use of ARCH include but are not limited to analysis of COVID-19 crisis communications, health misinformation, Latin American women’s rights movements, and post-conflict societies during reconciliation.

ARCH, Generate Datasets

What are core ARCH features?

Build: Leverage ARCH capabilities to build custom research collections that are well scoped for specific research and education purposes.

Access: Generate more than a dozen different research-ready datasets (e.g., full text, images, PDFs, graph data, and more) from digital collections with the click of a button. Download generated datasets directly in-browser or via API.

Analyze: Easily work with research-ready datasets in interactive computational environments and applications like Jupyter Notebooks, Google Colab, Gephi, and Voyant, and produce in-browser visualizations.

Publish and Preserve: Openly publish datasets in line with best practices in reproducible research. All published datasets will be preserved in perpetuity.

Support: Make use of synchronous and asynchronous technical support, online trainings, and extensive help center documentation.

How can I learn more about ARCH?

To learn more about ARCH please reach out via the following form.

Collective Web-Based Art Preservation and Access at Scale

Posted on May 17, 2023 by tpadilla

Art historians, critics, curators, humanities scholars, and many others rely on the records of artists, galleries, museums, and arts organizations to conduct historical research and to understand and contextualize contemporary artistic practice. Yet many of the art-related materials that were once published in print form are now available primarily or solely on the web and are thus ephemeral by nature. In response to this challenge, more than 40 art libraries spent the last 3 years developing a collective approach to preservation of web-based art materials at scale.

Supported by the Institute of Museum and Library Services and the National Endowment for the Humanities, the Collaborative ART Archive (CARTA) community has successfully aligned effort across libraries large and small, from Manoa, Hawaii to Toronto, Ontario and back, resulting in preservation of and access to 800 web-based art resources. These are organized into 8 collections (art criticism, art fairs and events, art galleries, art history and scholarship, artists websites, arts education, arts organizations, auction houses), totaling nearly 9 TB of data with continued growth. All collections are preserved in perpetuity by the Internet Archive.

Today, CARTA is excited to launch the CARTA portal – providing unified access to CARTA collections.

🎨 CARTA portal 🎨

The CARTA portal includes web archive collections developed jointly by CARTA members, as well as preexisting art-related collections from CARTA institutions, and non-CARTA member collections. CARTA portal development builds on the Internet Archive’s experience creating the COVID-19 Web Archive and Community Webs portal.

CARTA collections are searchable by contributing organization, collection, site, and page text. Advanced search supports more granular exploration by host, results per host, file types, and beginning and end dates.

🔭 CARTA search 🔭

In addition to the CARTA portal, CARTA has worked to promote research use of collections through a series of day-long computational research workshops – Working to Advance Library Support for Web Archive Research – backed by ARCH (Archives Research Compute Hub). A call for applications for the next workshop, held concurrently with the annual Society of American Archivists meeting, is now open.

Moving forward, CARTA aims to grow and diversify its membership in order to increase collective ability to preserve web-based art materials. If your art library would like to join CARTA, please express interest here.
