Last year, when The Accountability Project investigated excessive fees charged by Connecticut state marshals, we needed to devise a process to analyze thousands of pages of eviction records.
We trained a machine learning model to sift through the documents, extracting key pieces of information, such as the names of the marshals who worked on each case, and the amounts they charged for delivering paperwork to people facing foreclosure.
It was a successful but laborious effort that required some technical ingenuity to pull off.
One year later, artificial intelligence models trained on billions of pieces of human text make the same reporting task seem trivial. DocQuery, a project that builds on those large language models, can answer the questions we posed in that investigation with a few basic prompts (written in plain English). It’s as simple as asking: “What is the name of the state marshal who filed this document?”
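To give a sense of how little code that now takes, here is a minimal sketch based on DocQuery’s documented Python interface; the file name and question are illustrative, and the pipeline downloads a pretrained document-question-answering model on first use.

```python
# Minimal sketch of querying a scanned court filing with DocQuery.
# The file name below is illustrative.
from docquery import document, pipeline

# Build a document-question-answering pipeline (the underlying
# layout-aware model is downloaded on first use).
nlp = pipeline("document-question-answering")

# Load a single eviction filing; DocQuery handles PDFs and images,
# running OCR where needed.
doc = document.load_document("eviction_filing.pdf")

# Ask the same plain-English question a reporter would.
answer = nlp(
    question="What is the name of the state marshal who filed this document?",
    **doc.context,
)
print(answer)
```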
The prospect that AI tools will transform human endeavors has piqued curiosity in newsrooms worldwide. It took center stage at this year’s Computation + Journalism Symposium, which offered a glimpse into AI experiments underway at news organizations and a window into how they’re grappling with difficult questions of ethics, authorship and transparency.
The event, held in June at the public research university ETH Zurich, brought together news editors, reporters, social scientists and others to share ideas for advancing the use of technology in journalism. It was held this year in conjunction with the European Data & Computational Journalism Conference.
The rapid rise of generative AI tools such as ChatGPT served as the backdrop. But as the conference talks showed, journalists are finding wide-ranging uses for emerging AI tools.
Tamedia, a Swiss media company based in Zurich, is testing software that allows journalists to generate headlines and other parts of a print news story using their own writing, or third-party content, such as press releases.
The company’s head of newsroom automation said journalists who tried it found about 60% to 70% of the material helpful. It wasn’t necessarily better than what human journalists produced, but it was faster, he said. They’re now considering other potential uses, such as generating newsletter copy that summarizes articles by staff.
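Tamedia has not published the internals of its tool, but the general pattern it describes, feeding a draft or press release to a large language model and asking for candidate headlines, can be sketched in a few lines. The sketch below uses the OpenAI Python client purely for illustration; the model name and prompts are assumptions, not Tamedia’s setup.

```python
# Illustrative only, not Tamedia's system: prompt a large language model
# for candidate headlines from a draft story or press release.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def suggest_headlines(article_text: str, n: int = 3) -> str:
    """Ask the model for a few candidate headlines for a draft story."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; any chat model would do
        messages=[
            {"role": "system",
             "content": "You are a copy editor. Suggest concise news headlines."},
            {"role": "user",
             "content": f"Suggest {n} headlines for this story:\n\n{article_text}"},
        ],
    )
    return response.choices[0].message.content


print(suggest_headlines("Paste a draft story or press release here..."))
```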
Among its initiatives, Bayerischer Rundfunk, a public broadcaster in Germany, is harnessing its library of archival audio recordings to train a model to parse regional dialects.
Closer to home, The Associated Press is working with five local newsrooms to pilot initiatives aimed at reducing some of the most tedious parts of gathering and reporting news.
One such tool would automate writing articles about public safety incidents for the Minnesota-based Brainerd Dispatch. Two others, being piloted with San Antonio-based television station KSAT-TV and Michigan Radio’s WUOM-FM, focus on transcribing recorded videos (such as public meetings) and summarizing the content.
Researchers from several universities recently released a project they hope will make this task more achievable: a collection of data summarizing city council meetings in six major American cities, which could be used to train existing large language models to more accurately summarize the business of similar public bodies.
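None of these pilot tools are public, but the underlying pipeline, transcribing a meeting recording and then summarizing the transcript, can be approximated with open-source components. The models named below are assumptions chosen for illustration, not what the AP newsrooms or the researchers actually used.

```python
# Illustrative sketch: transcribe a public-meeting recording, then summarize it.
# The model choices are assumptions, not the tools used in the AP pilots.
import whisper                     # openai-whisper, for speech-to-text
from transformers import pipeline  # Hugging Face, for summarization

# 1. Transcribe the recording (e.g., audio extracted from a council meeting video).
asr = whisper.load_model("base")
transcript = asr.transcribe("city_council_meeting.mp3")["text"]

# 2. Summarize the transcript in chunks that fit the model's input limit.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
chunks = [transcript[i:i + 3000] for i in range(0, len(transcript), 3000)]
summary = " ".join(
    piece["summary_text"]
    for piece in summarizer(chunks, max_length=120, min_length=30)
)
print(summary)
```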
Just wrapped at #cplusJ #datajconf at @ETH in Zurich! Thanks to the @datajconf organizers for a super fascinating set of discussions about the future of journalism and technology. Sharing 5 takeaways... pic.twitter.com/EndCNcSFhL
— Jim Haddadin (@JimHaddadin) June 24, 2023
While all this hints at the prospect of journalists being replaced by artificial intelligence systems, many news organizations seem to be coalescing around a different expectation: that human journalists will remain “in the loop,” meaning headlines, news copy and other text created by generative AI systems will undergo human editing before being published.
If you’ve spent time experimenting with some of these tools, you likely understand why. It’s common for their output to include factual errors and, in some cases, outright fabrications, a problem the industry has dubbed “hallucination.”
At the same time, it’s hard to ignore their strengths. Generative AI tools can write grammatically flawless prose, quickly summarize documents and conversations, and generate punchy headlines for news articles and social media.
And many news organizations are now testing those features. Nicholas Diakopoulos, associate professor in communication studies and computer science at Northwestern University, and colleague Hannes Cools recently described how newsrooms are developing guidelines for using generative AI. Interestingly, many leave open the possibility of incorporating computer-generated text in journalistic work, albeit with a layer of human review.
Outside of journalism, policymakers in Connecticut are grappling with related challenges that spring from AI’s swift evolution.
Among them is the need to understand which tools are already in use today. A project undertaken previously by Yale Law School's Media Freedom & Information Access Clinic found the state’s existing open records laws are inadequate to keep the public sufficiently informed about how the government uses artificial intelligence tools.
The state has since taken steps to address that deficit. In a column published this year in the Connecticut Mirror, Connecticut’s chief information officer, Mark Raymond, called for the state to adopt a deliberate approach to designing and implementing AI systems, with a focus on transparency, accountability, fairness and safety.
Connecticut lawmakers also passed one of the country’s most comprehensive pieces of legislation aimed at cataloging and shaping the use of AI. The bill requires the state to produce an inventory of all artificial intelligence technologies currently in use and to review them regularly for potential harm, according to The AP.
Connecticut state Sen. James Maroney, a Milford Democrat, told The AP he also plans to work with lawmakers in other states this fall on model legislation that would establish “broad guardrails” for private industry in areas such as product liability for AI systems.
“It’s rapidly changing and there’s a rapid adoption of people using it,” he said. “So we need to get ahead of this. We’re actually already behind it, but we can’t really wait too much longer to put in some form of accountability.”