Machine intelligence empowers journalism by giving journalists the opportunity to see what they missed, panelists say

For the first time in its 21-year history, the International Symposium on Online Journalism (ISOJ) was held online only in 2020.

Computers will do only as much as they are told to do, and it takes a team of journalists to tell them, said the panelists of the session “Online investigations: How journalists are using AI (artificial intelligence) and OSINT (open source intelligence),” part of the 2020 International Symposium on Online Journalism.

ISOJ2020: Online investigations: How journalists are using AI and OSINT
John Keefe

Although journalists might hesitate to use machine learning because they don’t understand what it is, John Keefe, adjunct professor at the Newmark Graduate School of Journalism at CUNY, said it is like a group of journalism interns who know nothing other than how to look for a certain pattern and group items together.

“They are very good, they are very devoted and committed to helping, but they don’t know much about the world,” Keefe said. “That’s what sort of machine learning is like in journalism when you are looking to using it in the newsrooms for investigation.”

As an example, he cited work Quartz did with KPCC, a public radio station in Southern California that had committed to answering every COVID-19 question its listeners asked and soon received more than 1,000 questions. Keefe said they created a model that told the computer to analyze the language of the questions and group them into a dozen different buckets.

“The station was able to use this, give a bucket to a person… [and] the labor of dividing that was done by computer,” he said. The system also allowed the station to tailor its programming to the questions it was receiving, which now number more than 3,900.
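The grouping step Keefe describes can be illustrated with a minimal sketch. This is not the model Quartz and KPCC built, whose details the panel did not cover. It assumes a simpler stand-in technique: each bucket is represented by a few hand-labeled example questions, and a new question goes to the bucket whose word counts it most resembles (cosine similarity over a bag of words). The bucket names, example questions, stopword list and threshold are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical buckets, each seeded with a few hand-labeled questions.
LABELED_EXAMPLES = {
    "testing": [
        "Where can I get tested for the virus?",
        "How long do test results take?",
    ],
    "masks": [
        "Do I need to wear a mask outside?",
        "Which masks protect best?",
    ],
    "unemployment": [
        "How do I apply for unemployment benefits?",
        "When will my unemployment check arrive?",
    ],
}

# Small illustrative stopword list so common words do not drive the match.
STOPWORDS = {
    "a", "an", "the", "i", "my", "me", "to", "for", "of", "in", "on",
    "is", "are", "do", "does", "can", "will", "be", "it", "there",
    "how", "where", "when", "which", "what", "and", "or",
}


def vectorize(text):
    """Turn a question into a bag of lowercase words, minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(examples):
    """Sum the word counts of each bucket's example questions."""
    centroids = {}
    for bucket, questions in examples.items():
        total = Counter()
        for question in questions:
            total.update(vectorize(question))
        centroids[bucket] = total
    return centroids


def assign_bucket(question, centroids, threshold=0.1):
    """Return the most similar bucket, or "unsorted" for a human to review."""
    vec = vectorize(question)
    best, score = max(
        ((bucket, cosine(vec, centroid)) for bucket, centroid in centroids.items()),
        key=lambda pair: pair[1],
    )
    return best if score >= threshold else "unsorted"


centroids = build_centroids(LABELED_EXAMPLES)
for q in ["Is there a place to get tested near me?",
          "Are schools reopening in the fall?"]:
    print(q, "->", assign_bucket(q, centroids))
```

Questions that match no bucket well fall into "unsorted" rather than being forced into a category, which keeps a person in the loop, in line with the panelists' point that journalists still need to check what the computer produces.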

Emilia Díaz-Struck

Emilia Díaz-Struck, research editor and Latin America coordinator at the International Consortium of Investigative Journalists (ICIJ), said reporters’ expertise is meaningful when telling machines what to do. “It is not magic and it is not a response to all our journalistic problems,” she said. “We need to decide when we do embark ourselves on a machine learning adventure.”

There is a key human and computer component, she said. In their work for the investigative series Implant Files, the ICIJ team taught the computer to run through the data it was given and to identify patient deaths in which the events reported to the authorities were misclassified. However, they started to come across false positives, and journalists needed to refine the process. A full team was involved from the beginning and fact-checked the machine’s findings afterward.

When asked during the Q&A session by moderator María Teresa Ronderos, founder of the Latin American Center for Investigative Journalism (CLIP), what they would say to those in the journalism industry who fear that machine intelligence will leave them without a job, Díaz-Struck said that the use of these tools requires the input and knowledge of all those in the newsroom. Machine intelligence empowers journalism, she added, by giving journalists the opportunity to see what they missed.

“When you talk about investigative journalism you need to verify your findings, you need to verify what the computer shows you, what the results are,” Díaz-Struck said. “If the computer is wrong, you need to give the input to the computer, so you still need the humans there… you need humans, you need machines and you need time.”

Charlotte Godart

As for the use of OSINT in newsrooms, Charlotte Godart, open source investigator and trainer at Bellingcat, and her team started mapping the increase in cases of police violence against journalists shortly after CNN reporter Omar Jimenez was arrested while covering a Black Lives Matter protest in Minneapolis following George Floyd’s killing by police. Bellingcat has released two visualizations: first, the data plotted on an interactive map, and then a collaboration with Forensic Architecture that lets users navigate the events by time and by category within the map.

They were able to get the open source material, about 120 videos, thanks to a Twitter thread started by her colleague Nick Waters asking users to post any incidents of news crews being targeted by law enforcement. She explained the investigative team’s verification process in three steps: find the original source and verify when it was published; find the geolocation of the incident and identify features of the image so it can be plotted on a map; and finally analyze the video’s sequence of events by corroborating it with other videos to get a full understanding of what took place.

Their next project is to plot incidents of police violence against all civilians, not just journalists. Since they don’t want police to use Bellingcat’s reporting to identify protestors and subsequently arrest them, Godart said that a solution is to blur the entire image and “potentially just leave in some sound in order to still leave in some context and the gravity of the situation.”

Haley Willis

Like Bellingcat, Haley Willis, visual investigations reporter at The New York Times, and her team produce visual investigative pieces that combine traditional journalism techniques (interviews with experts and witnesses) with open-source material and digital forensics, such as verified social media content, police scanner audio and publicly available government data.

Such was the case with their video deconstructing George Floyd’s killing using footage from different angles, open-source information, and all the police scanner audio and communications between EMS, fire and police from Minneapolis that day. This visual information, she said, contradicted the information initially in the complaint: it showed former officer Derek Chauvin had his knee on Floyd’s neck for eight minutes and 15 seconds, not seven minutes and 49 seconds.

Using open-source material leads to greater transparency with audiences and increased levels of government accountability.

“We do investigations hoping that the official response will change or something will come out of it. When you have such specific digital evidence and that can be replicated by the government or the people, it makes it harder for them to say that is not true,” she said.
