Category Archives: Television Archive

Vanishing Culture: Q&A with Philip Bump, The Washington Post

The following Q&A between writer Caralee Adams and journalist Philip Bump of The Washington Post is part of our Vanishing Culture series, highlighting the power and importance of preservation in our digital age. Read more essays online or download the full report now.

Philip Bump is a columnist for The Washington Post based in New York. He writes the weekly newsletter How To Read This Chart. He’s also the author of The Aftermath: The Last Days of the Baby Boom and the Future of Power in America.

Caralee Adams: What does it mean for an individual journalist to have their work preserved? Why is it important to have easy access to news stories from the past?

Philip Bump: One of the nice things about my career has been that I’ve worked for outlets that I feel confident are doing their own preservation, like The Washington Post. I’m not particularly worried about losing access to my writing. However, it’s less of a concern for me than it is for other outlets, unfortunately. It is unquestionably the case that I find the Internet Archive useful and use it regularly for a variety of things—both for its preservation of online content and collection of closed captioning for news programs.

Any recent examples of when you’ve found the Internet Archive particularly useful?

I use the search tool on closed captioning more than anything else. The other day I was trying to find an old copy of a webpage. I was writing about Donald Trump’s comments on Medal of Honor recipients. As it turns out, there is not an immediately accessible resource for when Medals of Honor were granted to members of the military. You can see aggregated—how many there are—but you can’t see who was given a medal and when they served. I actually used the Internet Archive to see how the metrics changed between the beginning of Trump’s presidency and by the end of it. I was able to see that there were medals awarded to about 11 people who served during the War on Terror, three who served in Vietnam, and one during World War II. Then, I was able to go back and double check against the Trump White House archive, which is done by the National Archives, and see the people to whom he had given this award. That’s a good example of being able to take those two snapshots in time and then compare them in order to see what the difference was to get this problem solved.

Why is it important for the public to have free public access to an archive of the news for television or print?

It’s the same reason that it’s important, in general, to have any sort of archive: it increases accountability and increases historical accuracy. The Internet Archive is essential at ensuring that we have an understanding of what was happening on the internet at a given point in time. That is not something that is constantly useful, but it is something that is occasionally extremely useful. I do a lot of work in politics and get to see what people are saying at certain points in time, which are important checks and accountability for elected officials.  The public can know what they were saying when they were running in the primary as compared with the general [election]. The Archive allows anyone to be able to get information from websites that are no longer active. If you’re looking for something and you have the old link to Gawker or the old link to a tweet, you can often [find] it archived.  The Internet Archive doesn’t capture everything—it couldn’t possibly do so. But it captures enough to generally answer the questions that need to get answered. There’s nowhere else that does that. There are other archiving sites, but none that do so as comprehensively, or none with an archive that goes back that far.

Has any of your journalism vanished from the public? Do you have any examples where you’ve been looking for something and it’s been missing?

Yes. One of the challenges is that multimedia content has often, in the past, been overlooked. There are old news reports that I’ve been unable to find because they’re on video in the era before there was a lot of accessibility and transcripts. Therefore, yes, there are certainly things like that which come up with some regularity. Also, particularly in the era of 2005 to 2015, there were a lot of independent sites that had useful news reports—particularly since we’re talking about the cast of political characters that have been around in the public eye at that point in time. It’s often the case that it’s hard to track those things down. Or if you’re trying to track down the original source or verify a rumor, you might need to dip into the Archive. There are a lot of sites from that era of “bespoke” blogs that the Internet Archive often captures. 

How does limited access to historical data or previous coverage impact you as a journalist?

It is hard to say, because relatively speaking, I am advantaged by the fact that I live in this era.  If I were doing this in 1990, [I’d use] basically whatever was at the New York Public Library and on microfiche. It is far better than it used to be, but the amount of content being produced is also far larger. It is both a positive and a negative that it is far easier to do that sort of research here from my desk at home than it would possibly have been 30 years ago. In fact, I was working on a project where I relied heavily on a local newspaper in a small town in Pennsylvania that wasn’t available online. I literally had to hire someone in the town to go to the library, find [coverage from] the particular date and the local paper and to get the scans done. It cost me hundreds of dollars, but that was the only way to do it. You can see how getting these things done is problematic and challenging.

When Paramount deleted the MTV News Archive in June, there was a lot of dismay, but some say it was frivolous, disposable, and kind of meant to be thrown away. How do you feel about that?

My first writing gig online was at MTV News in college, so that actually had a personal resonance for me. I was at Ohio State in the early to mid 1990s, and I got this little internship with MTV News. I wrote one piece about this band called The Hairy Patt Band. It ended up on the MTV News website. I was very excited. I haven’t seen that in 30 years. It’s one of those things where I wondered what ever happened to that story or if it exists anywhere, in any form. So, that [news] actually had resonance. It’s a bummer. Is it as important to maintain the archives of MTV News as it is The Washington Post? I’m biased, but I would say, no. But it is still a loss of culture—and it is a unique loss of culture. This was a unique and novel form of information that was emergent in the 1990s and now is lost. In the moment, its very existence captured the culture in a way that is worth preserving.

How do you feel about the future of digital preservation of news, data, and information?

I’m more pessimistic than I used to be. I came of age with the internet. When it was new, I used to describe it as the emergence from a new dark age. We had all this information and there was no more going back. All this existed. Everything was online, and we had archives. Now, we see, in part because the scale has increased so quickly that economic considerations come into play, and all of a sudden… the internet isn’t just an endless archive anymore. There are very few places that are doing what libraries do to capture these things on microfiche or store books for the public’s benefit. There is so much of it and that becomes the problem.

Why is it important to pay attention to this issue and preserve journalism for future reporters?

It is obviously the case that we are creating information, culture, and benchmarks for society faster than we can figure out how we’re going to make sure they’re preserved. I think that’s probably always been the case, except that what’s different now is that we are more cognizant of the process of preservation and the challenges of preservation. We expect there to be this thing that exists forever. We don’t yet know how to balance the interest in having as few things be ephemeral as possible, versus the value in doing that… maybe it’s not even possible to preserve everything in the way that we would want to at scale. We have created a process by which it is possible to record and observe nearly everything, and now we’re realizing that that is potentially in conflict with our desire to also store and preserve all this information indefinitely.

Anything you’d like to add?

I think it’s worth noting that preservation is one of the few areas in which I think artificial intelligence bears some potential benefit. One of the things that I’ve long found frustrating is that The New York Times, The Washington Post, and other major news outlets, have enormous storehouses of information—not all of it textual. The New York Times must have, in its archives, photos of every square inch of New York City at some point in time over the course of the past 100 years. Artificial intelligence is a great tool for indexing and documenting. We now have tools that allow us to go deeper into our archives and extract more information from them, which I think is a positive development, and is something I’ve advocated for a long time publicly. Only with the advent of artificial intelligence does large-scale preservation become something that seems feasible. One can go through the National Archive and extract an enormous amount of information that is currently stored there in an accessible form, which saves someone from having to stumble upon a particular image. I think that is beneficial. I don’t think that necessarily solves the storage at scale issue, but it does address the fact that so much information is currently locked away and inaccessible, which is another facet of the challenge.  

About the author

Caralee Adams is a journalist based in Bethesda, Maryland. She is a graduate of Iowa State University and received her master’s in political science at the University of New Orleans. After working at newspapers and magazines, she has been a freelancer covering education, science, tech and health for a variety of publications for more than 30 years.

Internet Archive Services Update: 2024-10-17

[Washington Post piece]

Last week, along with a DDoS attack and exposure of patron email addresses and encrypted passwords, the Internet Archive’s website JavaScript was defaced, leading us to bring the site down to assess and improve our security.

The stored data of the Internet Archive is safe and we are working on resuming services safely. This new reality requires heightened attention to cyber security and we are responding. We apologize for the impact of these library services being unavailable.

The Wayback Machine, Archive-It, scanning, and national library crawls have resumed, as well as email, blog, helpdesk, and social media communications.  Our team is working around the clock across time zones to bring other services back online. In coming days more services will resume, some starting in read-only mode as full restoration will take more time. 

We’re taking a cautious, deliberate approach to rebuild and strengthen our defenses. Our priority is ensuring the Internet Archive comes online stronger and more secure.

As a library community, we are seeing other cyber attacks—for instance the British Library, Seattle Public Library, Toronto Public Library, and now Calgary Public Library. We hope these attacks are not indicative of a trend.

For the latest updates, please check this blog and our official social media accounts: X/Twitter, Bluesky and Mastodon.

Thank you for your patience and ongoing support.

Coming this October: The Vanishing Culture Report

This October, we are publishing The Vanishing Culture Report, a new open access report examining the power and importance of preservation in our digital age. 

As more content is created digitally and provided to individuals and memory institutions through temporary licensing deals rather than ownership, materials such as sound recordings, books, television shows, and films are at constant risk of being removed from streaming platforms. This means they are vanishing from our culture without ever being archived or preserved by libraries.

But the threat of vanishing is not exclusive to digital content. As time marches on, analog materials on obsolete formats—VHS tapes, 78rpm recordings, floppy disks—are deteriorating and require urgent attention to ensure their survival. Without proper archiving, digitization, and access, the cultural artifacts stored in these formats are in danger of being lost forever.

By highlighting the importance of ownership and preservation in the digital age, The Vanishing Culture Report aims to inform individuals, institutions, and policymakers about the breadth and scale of cultural loss thus far, and inspire them to take proactive steps in ensuring that our cultural record remains accessible for future generations.

Share Your Story!

As part of the Vanishing Culture report, we’d like to hear from you. We invite you to share your stories about why preservation is important for the media you use on our site. Whether it’s a website crawl in the Wayback Machine, a rare book that shaped your perspective, a vintage film that captured your imagination, or a collection that you revisit often, we want to know why preserving these items is important to you. Share your story now!

Internet Archive Celebrates Research and Research Libraries at Annual Gathering

At this year’s annual celebration in San Francisco, the Internet Archive team showcased its innovative projects and rallied supporters around its mission of “Universal Access to All Knowledge.”

Brewster Kahle, Internet Archive’s founder and digital librarian, welcomes hundreds of guests to the annual celebration on October 12, 2023.

“People need libraries more than ever,” said Brewster Kahle, founder of the Internet Archive, at the October 12 event. “We have a set of forces that are making libraries harder and harder to happen—so we have to do something more about it.”

Efforts to ban books and defund libraries are worrisome trends, Kahle said, but there are hopeful signs and emerging champions.

Watch the full live stream of the celebration

Among the headliners of the program was Connie Chan, Supervisor of San Francisco’s District 1, who was honored with the 2023 Internet Archive Hero Award. In April, she authored a resolution, passed unanimously by the San Francisco Board of Supervisors, backing the Internet Archive and the digital rights of all libraries.

Chan spoke at the event about her experience as a first-generation, low-income immigrant who relied on books in Chinese and English at the public library in Chinatown.  

Watch Supervisor Chan’s acceptance speech

“Having free access to information was a critical part of my education—and I know I was not alone,” said Chan, who is a supporter of the Internet Archive’s role as a digital, online library. “The Internet Archive is a hidden gem…It is very critical to humanity, to freedom of information, diversity of information and access to truth…We aren’t just fighting for libraries, we are fighting for our humanity.”

Several users shared testimonials about how resources from the Internet Archive have enabled them to advance their research, fact-check politicians’ claims, and inspire their creative works. Content in the collection is helping improve machine translation of languages. It is preserving international television news coverage and Ukrainian memes on social media during the war with Russia.  

Quinn Dombrowski, of the Saving Ukrainian Cultural Heritage Online project, shows off Ukrainian memes preserved by the project.

Technology is changing things—some for the worse, but a lot for the better, said David McRaney, speaking via video to the audience in the auditorium at 300 Funston Ave. “And when [technology] changes things for the better, it’s going to expand the limited capabilities of human beings. It’s going to extend the reach of those capabilities, both in speed and scope,” he said. “It’s about a newfound freedom of mind, and time, and democratizing that freedom so everyone has access to it.”

Open Library developer Drini Cami explained how the Internet Archive is using artificial intelligence to improve access to its collections.

Digitizing a book used to require scanning operators to manually crop the photographs of its pages. The Internet Archive recently trained a custom machine learning model to automatically suggest page boundaries, allowing staff to double their processing rate. Also, an open-source machine learning tool converts images into text, making it possible for books to be searchable, and for the collection to be available for bulk research, cross-referencing and text analysis, as well as to be read aloud to people with print disabilities.

Open Library developer Drini Cami.

“Since 2021, we’ve made 14 million books, documents, microfiche, records—you name it—discoverable and accessible in over 100 languages,” Cami said.

As AI technology advanced this year, Internet Archive  engineers piloted a metadata extractor, a tool that automatically pulls key data elements from digitized books. This extra information helps librarians match the digitized book to other cataloged records, beginning to resolve the backlog of books with limited metadata in the Archive’s collection. AI is also being leveraged to assist in writing descriptions of magazines and newspapers—reducing the time from 40 to 10 minutes per item.

“Because of AI, we’ve been able to create new tools to streamline the workflows of our librarians and the data staff, and make our materials easier to discover, and work with patrons and researchers,” Cami said. “With new AI capabilities being announced and made available at a breakneck rate, new ideas of projects are constantly being added.”

Jamie Joyce & AI hackathon participants.

A recent Internet Archive hackathon explored the risks and opportunities of AI by using the technology itself to generate content, said Jamie Joyce, project lead with the organization’s Democracy’s Library project. One of the hackathon volunteers created an autonomous research agent to crawl the web and identify claims related to AI. With a prompt-based model, the machine was able to generate nearly 23,000 claims from 500 references. The information could be the basis for creating economic, environmental and other arguments about the use of AI technology. Joyce invited others to get involved in future hackathons as the Internet Archive continues to expand its AI potential.

Peter Wang, CEO and co-founder at Anaconda, said interesting kinds of people and communities have emerged around cultures of sharing. For example, those who participate in the DWeb community are often both humanists and technologists, he said, with an understanding about the importance of reducing barriers to information for the future of humanity. Wang said rather than a scarcity mindset, he embraces an abundant approach to knowledge sharing and applying community values to technology solutions.

Peter Wang, CEO and co-founder at Anaconda.

“With information, knowledge and open-source software, if I make a project, I share it with someone else, they’re more likely to find a bug,” he said. “They might improve the documentation a little bit. They might adapt it for a novel use case that I can then benefit from. Sharing increases value.”

The Internet Archive’s Joy Chesbrough, director of philanthropy, closed the program by expressing appreciation for those who have supported the digital library, especially in these precarious times.

“We are one community tied together by the internet, this connected web of knowledge sharing. We have a commitment to an inclusive and open internet, where there are many winners, and where ethical approaches to genuine AI research are supported,” she said. “The real solution lies in our deep human connection. It inspires the most amazing acts of generosity and humanity.”

***

If you value the Internet Archive and our mission to provide “Universal Access to All Knowledge,” please consider making a donation today.

A New Approach To Understanding War Through Television News: Introducing The TV News Visual Explorer & The Belarusian, Russian & Ukrainian TV News Archive

For more than 20 years, the Internet Archive’s Television News Archive has monitored television news, preserving more than 9.5 million broadcasts totaling more than 6.6 million hours from across the world, with a continuous archive spanning the past decade. Today just a small sliver of that archive is accessible to journalists and scholars because video at this scale is so hard to navigate: no human could make sense of that much television news by fast forwarding through it. The small fraction of programs that contain closed captioning, speech recognition transcripts or OCR’d onscreen text can be keyword searched through the TV Explorer and TV AI Explorer, but for the majority of this global multi-decade archive, there has until now been no way for researchers to assess and understand the narratives of television news at scale, especially the visual landscape that distinguishes television from other forms of media and which is so central to understanding many of the world’s biggest stories from war to pandemics to the economy.

As the TV News Archive enters its third decade, it is increasingly exploring the ways in which it can preserve the domestic and international response to global events as it did with 9/11 two decades ago. As a first step towards this vision, over the last few months the Archive has preserved more than 46,000 broadcasts from domestic Belarusian, Russian and Ukrainian television news channels, including (in the order they were added to the Archive) Russia Today (part of the Archive since July 2010 but included in this collection starting January 1), Russian channels 1TV, NTV and Russia 1 (from March 26) and Russia 24 (from April 25), Ukrainian channel Espreso (from April 25) and Belarusian channel Belarus 24 (from May 16).

Why preserve television news coverage in a time of war? For journalists today it makes it possible to digest and report on how the war is being framed and narrated, with an eye towards how these narratives influence and shape popular support for the conflict and its potential future trajectory. For future generations of scholars, it makes it possible to look back at the contemporary information environment and prevailing public information, perspectives, and narratives.

While there are myriad options for the general public to watch these channels today in realtime, there is no research-oriented archival interface designed for journalists and scholars to understand their coverage at the scale of days to months, to scan for key visuals and events and to comment, discuss and illustrate how nations are portraying major stories.

To address this critical need, today we are tremendously excited to unveil the Television News Visual Explorer, a collaboration of the GDELT Project, the Internet Archive’s Television News Archive and the Media-Data Research Consortium to explore new approaches to enabling rapid exploration and understanding of the visual landscape of television news.

The Visual Explorer converts each broadcast into a grid of thumbnails, one every 4 seconds, displayed in a grid six frames wide and scrolling vertically through the entire program, making it possible to skim an hour-long broadcast in a matter of seconds. Clicking on any thumbnail plays a brief 30 second clip of the broadcast at that point, making it trivial to rapidly triage a broadcast for key moments. The underlying thumbnails can even be downloaded as a ZIP file to enable non-consumptive computational analysis, from OCR to augmented search.
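The grid layout described above comes down to simple arithmetic. The following is a minimal sketch in Python, not the Visual Explorer's actual code: the function names are hypothetical, and it assumes the first thumbnail is taken at the start of the broadcast and that rows and columns are counted from zero.

```python
import math

SECONDS_PER_THUMBNAIL = 4  # from the text: one thumbnail every 4 seconds
GRID_WIDTH = 6             # from the text: the grid is six frames wide


def thumbnail_count(duration_seconds):
    """Thumbnails needed for a broadcast (assumes the first frame is at t=0)."""
    return math.ceil(duration_seconds / SECONDS_PER_THUMBNAIL)


def grid_position(offset_seconds):
    """Zero-based (row, column) of the thumbnail covering a playback offset."""
    index = int(offset_seconds // SECONDS_PER_THUMBNAIL)
    return divmod(index, GRID_WIDTH)


def offset_for_cell(row, column):
    """Playback offset in seconds where the thumbnail at (row, column) starts."""
    return (row * GRID_WIDTH + column) * SECONDS_PER_THUMBNAIL


hour = 60 * 60
print(thumbnail_count(hour))                           # thumbnails in one hour
print(math.ceil(thumbnail_count(hour) / GRID_WIDTH))   # rows to scroll through
print(grid_position(125))                              # cell showing 2:05 into the show
```

Under these assumptions an hour-long broadcast yields 900 thumbnails in 150 rows, which is why a whole program can be scanned by scrolling.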

Machines today can catalog the basic objects and activities they see in video and generate transcripts of their spoken and written words, but the ability to contextualize and understand the meaning of all that coverage remains a uniquely human capability. No person could watch the entirety of the Archive’s 6.6 million hours of broadcasts, yet even just the 46,000 broadcasts in this new collection would be difficult for a single researcher to watch or even fast forward through in their entirety. Television’s linear format means coverage has historically been consumed a single moment at a time like a flashlight in a darkened warehouse. In contrast, this new interface makes it possible to see an entire broadcast all at once in a single display, making television news “skimmable” for the first time.

The Visual Explorer and this new research collection of Belarusian, Russian and Ukrainian television news coverage represent early glimpses into a new initiative reimagining how memory institutions like the Archive can make their vast television news archives more accessible to scholars, journalists and informed citizens. Beneath the simple and intuitive interface lies an immensely complex and highly experimental set of workflows prototyping both an entirely new scholarly and journalistic interface to television news and entirely new approaches to rapidly archiving international television coverage of global events.

Over the coming weeks, additional channels from the TV News Archive will become available through the new Visual Explorer, as well as a variety of experiments with the new lenses that tools like automatic transcription and translation can offer in helping journalists and scholars make sense of such vast realtime archives.

Get Started With The Television News Visual Explorer!

About Kalev Leetaru

For more than 25 years, GDELT’s creator, Dr. Kalev H. Leetaru, has been studying the web and building systems to interact with and understand the way it is reshaping our global society. One of Foreign Policy Magazine’s Top 100 Global Thinkers of 2013, his work has been featured in the presses of over 100 nations and fundamentally changed how we think about information at scale and how the “big data” revolution is changing our ability to understand our global collective consciousness.

Library as Laboratory Recap: Opening Television News for Deep Analysis and New Forms of Interactive Search

Watching a single episode of the evening news can be informative. Tracking trends in broadcasts over time can be fascinating. 

The Internet Archive has preserved nearly 3 million hours of U.S. local and national TV news shows and made the material open to researchers for exploration and non-consumptive computational analysis. At a webinar April 13, TV News Archive experts shared how they’ve curated the massive collection and leveraged technology so scholars, journalists and the general public can make use of the vast repository.

Roger Macdonald, founder of the TV News Archive, and Kalev Leetaru, collaborating data scientist and GDELT Project founder, spoke at the session. Chris Freeland, director of Open Libraries, served as moderator and Internet Archive founder Brewster Kahle offered opening remarks.

Watch video

“Growing up in the television age, [television] is such an influential, important medium—persuasive, yet not something you can really quote,” Kahle said. “We wanted to make it so that you could quote, compare and contrast.” 

The Internet Archive built on the work of the Vanderbilt Television Archive, and the UCLA Library Broadcast NewsScape to give the public a broader “macro view,” said Kahle. The trends seen in at-scale computational analyses of news broadcasts can be used to understand the bigger picture of what is happening in the world and the lenses through which we see the world around us.

In 2012, with donations from individuals and philanthropies such as the Knight Foundation, the Archive started repurposing the closed captioning data stream required of all U.S. broadcasters into a search index. “This simple approach transformed the antiquated experience of searching for specific topics within video,” said Macdonald, who helped lead the effort. “The TV caption search enabled discovery at internet speed with the ability to simultaneously search millions of programs and have your results plotted over time, down to individual broadcasters and programs.”

Scholars and journalists were quick to embrace this opportunity, but the team kept experimenting with deeper indexing. Techniques like audio fingerprinting, Optical Character Recognition (OCR) and Computer Vision made it possible to capture visual elements of the news and improve access, Macdonald said. 

Sub-collections of political leaders’ speeches and interviews have been created, including an extensive Donald Trump Archive. Some of the Archive’s most productive advances have come from collaborating with outsiders who have requested more access to the collection than is available through the public interface, Macdonald said. With appropriate restrictions to maintain respect for broadcasters and distribution platforms, the Archive has worked with select scientists and journalists as partners to use data in the collection for more complex analyses.

Treating television as data

Treating television news as data creates vast opportunities for computational analysis, said Leetaru. Researchers can track how often words are used in the news and how that has changed over time. For instance, it’s possible to look at mentions of COVID-related words across selected news programs and see when they surged and leveled off with each wave before plummeting, as shown in the graph below.
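To make the idea of tracking word frequency concrete, here is a minimal sketch in Python. It is an illustration only, not the TV News Archive's or GDELT's pipeline: the `daily_mentions` function, the `(date, caption text)` input format and the sample captions are all hypothetical.

```python
import re
from collections import Counter


def daily_mentions(captions, keywords):
    """Count keyword occurrences per day.

    captions: iterable of (date_string, caption_text) pairs (hypothetical format).
    keywords: words to count, matched case-insensitively as whole tokens.
    """
    wanted = {k.lower() for k in keywords}
    counts = Counter()
    for day, text in captions:
        # Tokenize on letters, digits, apostrophes and hyphens so "covid-19" stays whole.
        tokens = re.findall(r"[a-z0-9'-]+", text.lower())
        counts[day] += sum(1 for token in tokens if token in wanted)
    return dict(sorted(counts.items()))


# Invented sample captions, for illustration only.
sample = [
    ("2020-03-01", "Officials confirm a new coronavirus case."),
    ("2020-03-01", "Markets fall on virus fears."),
    ("2020-03-02", "COVID-19 testing expands as coronavirus spreads; covid-19 hotline opens."),
]
print(daily_mentions(sample, ["coronavirus", "covid-19"]))
```

Plotting the resulting per-day counts over months of captions would give the kind of surge-and-decline curve described above.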

The newly computed metadata can help provide context and assist with fact checking efforts to combat misinformation. It can allow researchers to map the geography of television news—how certain parts of the world are covered more than others, Leetaru said. Through the collections, researchers have explored  which presidential tweets challenging election integrity got the most exposure on the news.  OCR of every frame has been used to create models of how to identify names of every “Dr.” depicted on cable TV after the outbreak of COVID-19 and calculate air time devoted to the medical doctors commenting on one of the virus variants.  Reverse image lookup of images in TV news has been used to determine the source of photos and videos.  Visual entity search tools can even reveal the increasing prevalence of bookshelves as backdrops during home interviews in the pandemic, as well as appearances of books by specific authors or titles. Open datasets of computed TV news metadata are available that include all visual entity and OCR detections, 10-minute interval captioning ngrams and second by second inventories of each broadcast cataloging whether it was “News” programming, “Advertising” programming or “Uncaptioned” (in the case of television news this is almost exclusively advertising).

From television news to digitized books and periodicals, dozens of projects rely on the collections available at archive.org for computational and bibliographic research across a large digital corpus. Data scientists or anyone with questions about the TV News Archive can contact info@archive.org.

Up Next

This webinar was the fourth in a series of six sessions highlighting how researchers in the humanities use the Internet Archive. The next will be about Analyzing Biodiversity Literature at Scale on April 27. Register here.

TV News Record: Six takeaways from adding Hillary Clinton, Barack Obama & more to Face-o-Matic facial detection

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we release new data generated by our Face-o-Matic tool, developed in collaboration with Matroid, adding to our list of public figures detected by facial recognition on major cable news stations on the TV News Archive.

In addition to President Donald Trump and the four congressional leaders, the expanded list now includes most living former presidents and recent major party presidential contenders, including Hillary Clinton and Barack Obama. (For the full list of public officials tracked, as well as methodological notes, see bottom of the post.)

Detecting faces on TV news and turning them into data provides a new quantitative path for journalists and researchers to explore how news is presented to the public and compare and contrast editorial choices that individual networks make. This new measure shows us the duration that politicians’ faces are actually shown on screen, whether it’s a clip of that person speaking, muted footage, or a still photo shown in the background to illustrate a point.

Together with the Television Explorer, fueled by closed captions, and our Third Eye chyron-reading tool, Face-o-Matic makes a wealth of information available to analyze. (See the TV News Archive home page for examples of visualizations created by journalists and researchers using TV News Archive data.)

Here are six quick takeaways using Face-o-Matic for an analysis covering roughly six months, from November 2017 through May 2018, looking at four cable TV news networks: BBC News, CNN, Fox News, and MSNBC.

Download Face-o-Matic data to explore your own research questions.

1. Trump trumps every other political figure in face-time on cable TV news, all the time, every day, in every way, on every network and program.

As we’ve seen in past analyses with Face-o-Matic data, President Donald Trump is the major political star on cable TV news as compared to other top political figures examined. To put this in perspective: over a six-month period stretching from November 2017 to May 2018, the president’s face appeared on cable TV news the equivalent of a full 13.5 days, counting every second of face-time. The next closest political figure we analyzed was House Speaker Paul Ryan, R., Wis., whose visage appeared the equivalent of one day.
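The “equivalent days” figure is cumulative on-screen seconds divided by the number of seconds in a day. A minimal sketch of that arithmetic follows; the function name `equivalent_days` and the sample durations are invented for illustration and do not come from the Face-o-Matic data set.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def equivalent_days(durations_seconds):
    """Convert a list of on-screen durations (in seconds) into equivalent full days."""
    return sum(durations_seconds) / SECONDS_PER_DAY


# Invented durations that add up to exactly one full day.
print(equivalent_days([43_200, 21_600, 21_600]))  # 1.0

# Going the other way: 13.5 days of face-time expressed in seconds.
print(13.5 * SECONDS_PER_DAY)  # 1166400.0
```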


2. After Trump, GOP leaders in Congress are the most popular faces on TV cable news.

The two GOP leaders in Congress, Ryan and Senate Majority Leader Mitch McConnell, R., Ky., are the next most popular faces on cable TV news networks. Between the two, Ryan ranks first on the cable TV news networks we examined: BBC News, CNN, Fox News, and MSNBC. McConnell is the next most shown face on these networks, with the exception of BBC News.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

3. Hillary Clinton and Barack Obama figure prominently on Fox News.

Fox News airs proportionately more images of failed 2016 presidential candidate Hillary Clinton and former president Barack Obama than other cable TV news networks. Fox News showed Clinton’s face 7.6 times more than CNN did, and Obama’s 3.6 times more. Fox News also showed Clinton 3.6 times more than MSNBC, and Obama 2.3 times more.
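Comparisons of this kind come down to dividing one network’s total face-time for a person by another network’s total. The sketch below assumes a simplified row layout of (person, network, seconds); that layout, the function names, and the placeholder values are hypothetical and do not reflect the actual Face-o-Matic file format. It compares raw totals only, so any normalization for the amount of programming examined per network would be an additional step.

```python
from collections import defaultdict


def total_seconds_by_network(rows, person):
    """Total on-screen seconds for one person, grouped by network."""
    totals = defaultdict(float)
    for name, network, seconds in rows:
        if name == person:
            totals[network] += seconds
    return dict(totals)


def face_time_ratio(rows, person, network_a, network_b):
    """How many times more face-time network_a gave a person than network_b."""
    totals = total_seconds_by_network(rows, person)
    denominator = totals.get(network_b, 0.0)
    if denominator == 0:
        raise ValueError(f"no face-time recorded for {person} on {network_b}")
    return totals.get(network_a, 0.0) / denominator


# Invented placeholder rows, not real detections.
sample_rows = [
    ("Person A", "Network X", 120.0),
    ("Person A", "Network X", 180.0),
    ("Person A", "Network Y", 100.0),
    ("Person B", "Network Y", 50.0),
]

print(face_time_ratio(sample_rows, "Person A", "Network X", "Network Y"))  # 3.0
```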


4. Hannity shows more Hillary Clinton face-time than any other top-rated Fox News show.

The Fox News program “Hannity” airs proportionately more images of Hillary Clinton than any other top-rated Fox News show. With just one exception, it is also the Fox News show that shows her face more than those of the current congressional leaders: Ryan, McConnell, Schumer or Pelosi. “Hannity” also shows more images of Obama than other top-rated Fox News shows.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

5. Ryan face-time spikes on news shows aired during morning hours.

All three U.S. cable news networks examined showed high rates of face-time for Ryan on shows airing during morning hours, ranging from 9 am to 11 am. This may be linked to his leadership role in Congress and to morning hours being prime time for large announcements. For example, on Fox News, “America’s Newsroom” and “Happening Now” show spikes of face-time for Ryan. On MSNBC, “Live with Hallie Jackson” and “Live with Velshi and Ruhle” show high rates of images for Ryan. And on CNN, “At This Hour with Kate Bolduan” shows high rates of Ryan as well.

Links to interactive charts for top-rated news shows; view can be adjusted to exclude specific politicians. Top-rated shows are those with the highest 2017 viewership according to Nielsen.

Top-rated Fox News shows.

Top-rated MSNBC news shows.

Top-rated CNN shows.

6. BBC News just isn’t that into us.

BBC News provides a window into how news is presented to a major foreign audience. Like U.S. cable news networks, BBC News features more face-time for Trump than other political figures examined. Ryan ranks a distant second. Overall, however, BBC News shows much lower rates of images of U.S. political figures than U.S. cable news shows do.

Link to interactive version of above chart, where view can be changed to exclude specific politicians.

Methodological notes

The Face-o-Matic data set, available for download on the Internet Archive, uses facial recognition to track the faces of prominent public officials as they appear on major cable TV news networks: BBC News, CNN, Fox News, and MSNBC. The list of public officials tracked, along with the date that detection began, is here:

President & current congressional leaders

President Donald Trump, 7/13/17

Speaker Paul Ryan, R., Wis., 7/13/17

House Minority Leader Nancy Pelosi, D., Calif., 7/13/17

Senate Majority Leader Mitch McConnell, R., Ky., 7/13/17

Senate Minority Leader Chuck Schumer, D., N.Y., 7/13/17

Living former presidents and recent major party presidential candidates*

George H.W. Bush, 10/5/17

George W. Bush, 11/1/17

Jimmy Carter, 10/21/17

Bill Clinton, 9/12/17

Hillary Clinton, 9/12/17

Barack Obama, 7/13/17

Mitt Romney, 10/4/17

*Note: Our data set does not include Sen. John McCain, R., Ariz., who ran for president opposite Obama in 2008. Sample testing of facial detection for the senator revealed a somewhat frequent rate of false positives: instances where the identified face was not the senator’s, but rather one of a number of lookalikes. While we make no claim that all of the detections in the Face-o-Matic data set are error-free, we did test faces to minimize these. Please be sure to notify us if you find errors in the data.

TV News Record: Recognizing Trump’s voice on TV, NYT & Axios coverage, + Ryan fact-check

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we explore cutting-edge work by Joostware that moves us closer to solving the challenge of searching vast archives of video by speaker, note the use of TV News Archive data by The New York Times and Axios, and share a fact-checked interview with outgoing House Speaker Paul Ryan about his legacy.

Joostware trained model to recognize Trump’s voice

What if you wanted to search the TV News Archive to find every instance where President Donald Trump is talking?

That’s the research question that the San Francisco-based firm Joostware concentrated on for its Who Said What project, which won a $50,000 prototype grant from the John S. and James L. Knight Foundation. Last week Joostware’s founder, Delip Rao, presented the project’s progress at a gathering in Austin, Texas. (The Internet Archive’s own Dan Schultz, in his Bad Idea Factory incarnation, also presented on Contextubot, which we recently profiled here.)

“Audio and video today is viewed as an opaque object and it’s meant for linear consumption,” Rao said in his presentation. “But truly any audio and video especially in the context of news has a lot of structure to it. There are speakers of interest, and these speakers take turns, and then within each turn something was communicated. So our goal is to identify these speakers who are of interest and also the content that was spoken in that turn and indexing that.”

Anyone can search the TV News Archive already via closed captions at the Internet Archive or via Television Explorer. Our experiments with facial detection and chyron extraction are another way to find and analyze news clips. But searching a video archive by “speaker id” – finding all the video where a person is actually talking – is a tough technical challenge. Our Trump Archive and congressional, executive branch, and administration archives are all manually curated video collections designed to demonstrate what it would be like to have automated speaker id search.

Joostware researchers have made progress toward this goal. They took material from the Trump Archive and used it to train a model that recognizes the president’s voice by using properties of the voice signal. They created prototype search software that is more than 95% accurate on a human-annotated dataset in returning video clips where Trump is actually speaking.
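The post does not say how the accuracy figure was computed. One common reading is simple accuracy: the share of clips where the model’s answer matches the human annotation. The sketch below illustrates that reading only; the function name and the sample labels are invented, and this is not Joostware’s evaluation code.

```python
def accuracy(predicted, annotated):
    """Fraction of clips where the model's label matches the human label.

    predicted: list of booleans, True if the model says the target speaker is talking.
    annotated: list of booleans from human annotators, in the same clip order.
    """
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not annotated:
        raise ValueError("need at least one annotated clip")
    matches = sum(p == a for p, a in zip(predicted, annotated))
    return matches / len(annotated)


# Invented labels for four clips; the model is wrong on the second one.
model_says = [True, True, False, True]
humans_say = [True, False, False, True]

print(accuracy(model_says, humans_say))  # 0.75
```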

What’s next? With more resources, Joostware hopes to give this technology back to the Internet Archive to improve search within the TV News Archive. And Rao and others continue to work with the larger community of researchers trying to crack the code of video to help fact-checkers and journalists hold power accountable.

No one is talking about tax law on cable TV news

Jim Tankersley and Karl Russell, reporters for The New York Times, used TV News Archive captions via GDELT’s Television Explorer to demonstrate how little coverage there is on cable TV news for the newly minted $1.5 trillion tax overhaul:

“Consider one of Mr. Trump’s preferred yardsticks: cable news coverage. Throughout the fall, as Republicans rushed their tax bill through Congress in two breakneck months, CNN, Fox News and MSNBC routinely devoted 10 percent of their daily coverage to tax issues, according to data from the Gdelt Project. Interest spiked as Mr. Trump signed the bill in late December, and then it fell precipitously.”

“Stormy Daniels wins TV war: overshadows taxes, health care”

For Axios, Caitlin Owens used TV News Archive data with GDELT’s Television Explorer to shed light on whether the TV networks are paying attention to the priorities of the political parties: “Tax cuts and the Affordable Care Act are supposed to be big issues in the midterm elections, but both have faded from the attention of the cable news networks now that they’re no longer front and center in Congress.” Owens thinks it matters because “Democrats are campaigning hard on the GOP’s unpopular attempt to repeal and replace the ACA, and Republicans are pushing the financial benefits of their tax law.”


Fact-Check: Corporate tax revenues are rising (misleading)

House Speaker Paul Ryan, R., Wis., announced last week he would not be seeking reelection, prompting television interviews that reflected on his legacy. In a “Meet the Press” interview Sunday on NBC, host Chuck Todd asked Ryan to respond to a statement by Sen. Bob Corker, R., Tenn.:

“‘This Congress and this administration likely will go down as one of the most fiscally irresponsible administrations and Congresses that we ever had.’ And he’s referring to the fact that this tax bill spiked the deficit. It’s higher than even what was projected.” Ryan responded, “That was going to happen. The baby boomers’ retiring was going to do that. These deficit trillion-dollar projections have been out there for a long, long time. Why? Because of mandatory spending, which we call entitlements. Discretionary spending under the CBO baseline is going up about $300 billion over the next 10 years. Tax revenues are still rising. Income tax revenues are still rising. Corporate income tax revenues. Corporate rate got dropped 40 percent, still rising.”

Eugene Kiely reported for FactCheck.org that “Ryan is right that $1 trillion deficit projections ‘have been out there for a long, long time’ … But corporate tax revenues are down for the first six months of the fiscal year, and they are projected to be less over the next 10 years than they otherwise would have been because of the law.”

Salvador Rizzo and Meg Kelly reported for The Washington Post’s Fact Checker, “The baby-boom generation is retiring, and Congress at best has taken only modest steps to rein in spending on old-age programs, largely because any serious effort is met with hostility and often-misleading attack ads…But the revenue side of the picture cannot be ignored.” “Congress has not been able to grapple with the spending — and  keeps taking steps to undermine the revenue flow as well.”

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.

Audio / Video player updated – to jwplayer v8.2

We updated our third-party, JS-based audio/video (and TV) player from v6.8 to v8.2 today.

We added some code so the player keeps the same feature set as before, and the update also brings these new features:

  • much nicer cosmetic/look updates
  • nice “rewind 10 seconds” button
  • controls are now in an updated control bar
  • (video) ‘Related Items’ now uses the same (better) recommendations from the bottom of an archive.org /details/ page
  • Airplay (Safari) and Chromecast basic casting controls in player
  • playback speed rate control now easier to use / set
  • playback keyboard control with SPACE and the left, right, up, and down keys
  • (video) WebVTT (captions) has a much better user interface and display
  • Flash is now only used to play audio/video if HTML5 doesn’t work (Flash does not do layout or controls now)

Here are some before / after screenshots:

TV News Record: Caption analyses, plus fact-checks on wall & immigrants

A roundup of what’s happening at the TV News Archive by Katie Dahl and Nancy Watzman.

This week we bring you analyses of cable TV news coverage and fact-checks of recent statements by President Donald Trump on immigration and his proposed wall on the border with Mexico.

Vox & Post turn TV news captions into media analysis

Vox’s Alvin Chang and The Washington Post’s Philip Bump continue to turn TV News Archive caption data, via Television Explorer, into analyses of current news. Chang analyzes cable TV network coverage of the March for Our Lives, an anti-gun violence demonstration, reporting that on Fox News, “There was a massive spike in mentions of the ‘Second Amendment’ or ‘Constitution’ during the peak of the march, and most of those mentions came from pundits and guests on the network.”

Source: Vox

Bump’s piece examines mentions of Hillary Clinton on cable TV news networks compared to those of Stormy Daniels, the adult entertainer involved in a legal dispute with the president. He finds that Fox News mentions Clinton the most, while CNN features more coverage of Daniels.

Source: The Washington Post

Fact-Check: We’ve started building the wall (Mostly False/Three Pinocchios)

During a press conference with the presidents of Estonia, Latvia, and Lithuania, President Donald Trump talked about his proposed border wall between the United States and Mexico: “We have to have strong borders. We need the wall. We’ve started building the wall, as you know, we have a $1.6 billion toward building the wall and fixing existing wall that’s falling down, it was never appropriate in the first place.”

The funding the president references comes from a spending bill recently passed by Congress. The omnibus “bill included $1.6 billion for some projects at the border, but none of that can be used toward the border wall promised during the presidential campaign.” For PolitiFact, Miriam Valverde rates the president’s claim “Mostly False.”

At The Washington Post’s Fact Checker, Glenn Kessler gives the same claim “three Pinocchios”:

The White House failed miserably to achieve its objectives on funding for a border wall, receiving relative peanuts. It sought $25 billion, but ended up with just 5 percent of that. Moreover, the money came with strings attached so that it could only be used for fencing, not the “great” and “beautiful wall” promised by Trump.

In Orwellian fashion, fences have now become walls. Even then, the president has only secured enough money to pay for one-tenth of the new fence/wall he has sought.


Fact-Check: Caravans of people are coming to cross the U.S.-Mexico border (Half True)

Just after Fox News aired a segment on a caravan of people from Central America making its way through Mexico toward the United States, the president wrote on Twitter:

“Half True,” writes W. Gardner Selby for PolitiFact: “President Trump tweeted that caravans of immigrants are coming to the Mexico-U.S. border… We confirmed that a caravan of 1,200 to 1,500 people from Central America–not caravans–was in southern Mexico, about 900 miles from the Rio Grande, when Trump tweeted. Also, accounts vary on whether all participants are bound to enter the U.S. An organizer estimated that most of the people intend to remain in Mexico.”

Reporting for FactCheck.org, Robert Farley, Eugene Kiely and Lori Robertson write, “Trump’s messages included muddled and inaccurate claims.” They summarize with the following bullet points:

  • Contrary to Trump’s assertion, there is no “liberal (Democrat)” law requiring the “Catch & Release” of people caught illegally crossing the border. There are court cases and laws that require some unaccompanied children, families and asylum-seekers to be released in the U.S., pending an immigration hearing. But it’s a stretch to blame those entirely on Democrats.

  • Trump said “big flows of people” are illegally entering the U.S. from Mexico “to take advantage of DACA.” In fact, current border-crossers are not eligible for the Deferred Action for Childhood Arrivals program.

  • Trump said that “caravans” of people were coming to the Southwest border and that Mexico “must stop them.” The caravan, a yearly demonstration, was organized by the activist group Pueblo Sin Fronteras, which says the people walking in the caravan have “a lot of intentions,” with some wanting to stay in Mexico. The caravan is now in southern Mexico, more than 800 miles from the U.S. border.

Follow us @tvnewsarchive, and subscribe to our biweekly newsletter here.