The narratives and routines of journalistic productions based on open data

By María Florencia Haddad and Elena Brizuela

This research, based on a detailed comparative analysis of 20 online articles published in Argentina, aims to characterize the types of narratives that result from databases. Because professional routines change, mostly according to work groups, the study also included semi-structured interviews with two editorial leaders who have extensive backgrounds with data. Five narrative types based on data were identified: personalization, main trends, outliers, synchronic comparison, and diachronic comparison. Each one offers different possibilities for journalists; editorial choices rest on the type of information involved and on the questions posed by the journalistic team.

The idea of news as a factual account accessed by reading or viewing is now accompanied by news as a narrative based on the exploration of big databases. The increased complexity of journalistic work calls for a team approach and professional routines that differ from those of the past: Data as well as facts are now a starting point for journalists eager to discover unbiased truth.

In Latin America, the development of data journalism is taking place within an atmosphere that foregrounds the need for quality journalism to provide accurate and verified information. An exploration of data journalism enables analysis of its use as a research method and a tool for validating journalistic sources (Brizuela, 2015a).

It is challenging to unequivocally identify where to begin in defining data journalism. But one possibility is to examine its methodology: Data journalism results from finding databases, questioning them, and finally allowing these databases to be accessible to readers through a narrative that often includes visualizations (Crucianelli, 2013).

Three factors establish data journalism as a trend that is here to stay. These are information digitalization; perfection of free, cloud-based tools that allow databases to be refined and visualizations to be created without the need for expert programmers; and increased awareness of the possibilities in public and private databases (Brizuela, 2014, 2015b).

An increased emphasis on the value of credible information in the contemporary media environment also creates a new professional opportunity to foster trust, respect, and citizen engagement with journalists and media outlets. The result can be stronger democracy and increased social cohesion (Klitgaard, 1994). Open data development is of particular interest to journalists in this context, largely because of a desire to make better information available to citizens and thus to facilitate better civic decisions based on that high-quality information (Ramírez Alujas, 2012).

However, this trend challenges journalists’ previous habits of thought, as it requires a reconsideration of how their profession is linked to other disciplines, as well as the more practical concern of how to build narratives so that information based on data can be made accessible to audiences. Access to a vast and prolific flow of data calls for the professional mediation of a journalist to build stories, for instance by comparing data to uncover problems and discordances, analyzing data for their singular value and their relationship with other data, and generally bringing data closer to citizens in a way that affordably adds value to their daily lives. The pathway begins with the extraction of huge databases and ends with the creation of visuals or other narratives.

This study explores the way journalists think about narratives rooted in data, ranging from storytelling goals to the skills needed to create those stories. To do so, it draws on literature around narrative and the uses of data.

Narrative: Multiple Voices and Fluid Constructions

Narrative communication is based on discourse, built on a timeline, that acquires function and sense through its use and social practice. Narratives are generated in an effort to differentiate the diversity of potential uses and practices, which might be related to subjects including “institutions, social, historical, and cultural contexts” (Rodríguez Ruiz, 2009, p. 15). According to Bajtín (1982), we must think about narratives from heterogeneous perspectives involving intertextuality and interdiscursivity, rather than as structures imposed by authoritarian speakers. The idea that no speech can be considered finished has long existed, and digital hyper-textuality resides within this understanding. Moreover, every discourse has two participants, each with a distinctive voice: that of the person who produces it, and of the person who receives it (Heller-Roazen, 2008).

Bajtín (1982), who studied the novel as a narrative emblematic of modern life, emphasized that there are no words uttered for the first time. Instead, words are always inserted in a discourse communication chain. Language is chrono-topic: It relates to a specific time and place. Thus, narratives built in digital contexts are constructs located in a defined space and time, and both authors and users can access them to create new understandings. Bajtín thus enables us to connect senses and platforms, including hypertextual, interactive, and other digital platforms, and to engage a variety of stakeholders in their production and reception:

Bajtín refuses to consider time and space as pure forms of man’s consciousness. He understands that these are categories—in the sense that without them, there may not be knowledge of the world—but that constitute objective entities that exist. (De Olmos, 2006, p. 69).

The idea of a chronotope projects time as a space coordinate, with residue of the past leaking into our present expression and influencing our perceptions of the future. For journalists, this suggests the need for conscious reflection on the production of a story, its organization, its form, and especially its effect. The narrative exercise is undertaken by both the narrator, who tells the story and draws attention to it, and the reader, who receives and pays attention (Rodríguez Ruiz, 2009).

In the contemporary communications environment, a paradigm shift has made it possible to describe a new way to narrate. New paradigmatic configurations are based on “indetermination, self-organization, complexity, essenceless reality, a world as representation, impossibility to separate the subject from the object, disciplinary borders vanishing, a reality that is built, as opposed to the idea of a given reality” (Rodríguez Ruiz, 2009, p. 38). In other words, modernity has produced hybridization, which takes a variety of forms. For instance, oral works have emerged as an alternative to written works, resulting in transposition phenomena that occur when a textual genre or product changes its form.

Our contemporary culture is highly unstable, with disorder, irregularity, and asymmetry now the norm. Among the changes is the fact that the representations of privileged writers are no longer seen as either the best or the only way of viewing the world. Readers, users, and other ‘receptors’ of information also produce contemporary narratives (Rodríguez Ruiz, 2009).

An aesthetic understanding of this new postmodern way of writing suggests a world of intertextuality, where borders no longer exist between reality and fiction. In addition, a belief in narrative authority has vanished, with authors now wanting their works to be problematized and fractured instead of being simply received as a hermetic and homogenous whole. Again, the process of generating meaning is no longer enclosed or determined solely by the author; both understanding and genre are subject to mixing and hybridization (García Canclini, 1989).

In a digital context, narratives also are continually created and recreated. The use of information technology, freed from the limitations of the written word, allows the creation of new discourse structures that can integrate non-verbal expression. Digital media therefore are platforms capable of a highly efficient artistic interrelation (Rodríguez Ruiz, 2009).

Rodríguez Ruiz (2009) suggests that digital aesthetics are built on six conditions: discontinuity, or an absence of predefined routes; interactivity, which facilitates and foregrounds readers’ participation; dynamism and vitality, both in the making and in the interpretation of content; ethereal worlds, in which there is no clear matter but rather unlimited potential; ephemeral worlds, with language continually updated; and virtual community development, which involves a construction of new global awareness.

These new digital narratives thus reconfigure the roles of the “writer” and the “reader.” The former must get used to “data manipulation, multimedia application and graphic design handling, and doing collaborative work with other professionals such as the programmer, the drawer, the designer, the audiovisual technician, etc.” (Rodríguez Ruiz, 2009, p. 25). And users must develop iconicity, editability, and navigability in order to strengthen the hypertext elements, edit them, and rebuild them.

Interactivity promotes activities based on a collective construction of both artifacts and senses. “It is, it must be, a narrative that dissolves its forms and traditional functions, virtualizes them, reduces them to primary elements, to particles that must be later recomposed through connectivity operations” (Rodríguez Ruiz, 2009, p. 25).

Applied journalistically in relation to narratives based on open data, such concepts suggest a need to imagine stories of heightened social or political utility, enabling citizens to use them to participate meaningfully in society.

From Digital Informative Narratives to Data Journalism Narratives

A quarter century ago, before the rise of the Internet, Philip Meyer (1993) defined a virtuous circle, with quality content increasing both the credibility and the social influence of the media, in turn leading to an increase in circulation and therefore in profitability. Today, prestigious news organizations around the world are pursuing this strategy, including The Guardian in Britain, The New York Times in the United States, and La Nación in Argentina. Each is among a growing number of news outlets forming interdisciplinary data journalism teams in their editorial offices.

This study was guided by an understanding of four key factors in training data journalism teams, as outlined by Zanchelli and Crucianelli (2012):

  1. Physical proximity. The data journalists should be located physically near other editorial decision-makers. Zanchelli and Crucianelli (2012, p. 3) cite the editor at The Guardian, who recommends locating the data journalism team “near the editorial table” because “it is easier to recommend stories and to be part of the process when they are closer.”
  2. Collaboration. Journalists and developers, who each have specialized skill sets, should be encouraged to work together in order to generate data-based stories. Productivity results from combining the two groups’ different views of reality (Zanchelli & Crucianelli, 2012). Developers have the ability to understand how to extract numbers, see patterns and trends, and interpret their meaning. Journalists know how to ask the important and meaningful questions; to extract insights from a story; and to place it in an appropriate political, social, and economic context. They also may be adept at spotting and analyzing trends.
  3. Shared skills. Journalists and developers who bridge the skills gap should be recruited. Each should try to understand, and if possible acquire, some of the skills of the other (Zanchelli & Crucianelli, 2012).
  4. Meaningful stories. The end result of the collaboration should be stories that show the meaning of data and why they should matter to the readership. Data-based news about topics that affect readers’ lives are not only socially valuable but also have an impact on Web traffic, highlighting the need for greater investment in data journalism teams (Zanchelli & Crucianelli, 2012).

In addition to focusing on aspects of its production within the newsroom, data journalism narratives can be understood in terms of three key features: hyper-textuality, multimedia, and interactivity.

Hyper-textuality is characterized by the links among disparate pieces of content, offering navigation alternatives through nodes of non-sequential writing with links that allow the user to choose (Díaz Noci, 2003, 2016a). Hypertext is complemented by multimedia when elements such as images, audio, video, or computer graphics are introduced into the narrative, resulting in a multidimensional form that can be referred to as “hypermedia.” Finally, interactivity enables users to participate. In the context of data journalism, this participation can be more or less inclusive, with options ranging from fully inclusive open code journalism to more controlled structures that allow users to participate, “but not to the point they can interfere in the news item construction” (Díaz Noci, 2003, p. 31).

Data journalism therefore facilitates new news narratives, particularly including interactive graphics produced from structured databases. These narratives can be analyzed in various ways, including through a focus on interactivity and its implications for message construction, and through the way in which human stories are enabled to emerge from the numbers.

Data stories differ from traditional narratives in important ways. For instance, newspaper stories typically represent a set of events in a controlled progression; visual data may likewise be organized in a linear sequence, but they also may be interactive, inviting user personalization, verification, queries, and pursuit of alternative explanations (Segel & Heer, 2015).

Such interaction possibilities suggest a dichotomy between “author-guided” and “reader-guided” visualizations. There arguably is a need to strike a balance between collective participation in narrative construction and the author’s communicative intention, which runs the risk of becoming distorted.

Data visualization expert Jonathan Harris believes there is no need to choose one approach over another. He points out that human stories are, and will continue to be, powerful, which is why people should avoid changing “their sense of empathy for a fetish fascination with data, networks, patterns, and total information.” Data, he says, are only a part of the story. “Human material is the main material, and data must come to enrich it” (cited in Segel & Heer, 2015, p. 2). This perspective both emphasizes the value of journalistic sensibilities in working with data and provides a reminder that data-based stories can have a significant impact on citizens’ lives. Individuals’ life stories need to be rescued from the numbers, with the reach and interactivity of digital formats enabling global results to become local and personal.

Drawing on these ideas about digital narrative structure and data journalism, this study seeks to answer the following questions:

RQ1: How can we classify different journalistic narratives based on open databases?

RQ2: How do journalists draw on the affordances of digital data to create narrative structures and achieve journalistic goals?

RQ3: What training or skills do data team members need to produce data journalism narratives?


This study applies a qualitative perspective to understanding the narratives and production routines of journalistic stories based on open data in the Argentine media. It includes a categorization and analysis of journalistic pieces based on data, and semi-structured interviews with two key data journalism creators, one in Argentina (Florencia Coelho, from LaNacionData), and the other in the United States (Ben Welsh, from the Los Angeles Times). Input also was provided by other members of their respective data teams. This design enables the connection of outputs to producers.

Twenty data journalism pieces published in 2015 from Argentine local, state, and national media were selected. The development of data journalism in Argentina is just beginning; therefore, the selection criteria related to geographic diversity and topical variety, including the significance of the outlets and outputs analyzed. Appendix 1 provides a list of the items included in the analysis.


These examples were deconstructed in an analysis matrix, which the researchers created based on adaptation of categories and criteria proposed by other scholars (Bradshaw, 2015; Díaz Noci, 2016b; Segel & Heer, 2015). Information about the publication was related to structural features of the content. The variables analyzed were: visual narrative, which is the device that guides the user through the visualization; interactivity, which allows the audience to participate in different ways; narrative structure, such as user-guided or linear/author-guided; topics covered; and journalistic goals, or the real purposes behind professional work.

The study was exploratory. Its purpose is to document what is happening in actual newsrooms, offering conclusions that can guide both data journalists in their job and journalism scholars in their further research.


Five types of narratives based on open databases were identified in this research: personalization, main trends, outliers, synchronic comparison, and diachronic comparison. Before turning to these narrative categories, we first present contextual information provided by our interviewees.

Data Journalism Teams Continue to Change Newsrooms

Data journalism enables journalists to rethink narratives, informative production processes, and their own skills. It also substantially modifies the work place, providing an opportunity to create a cooperative team space for people with different yet complementary areas of expertise.

Although findings are preliminary, given the ongoing development of this field, they indicate that new competencies need to be developed by members of data teams, with convergence among developers and journalists to produce journalistic pieces based on data. New professional routines also emerge.

Both LN Data (F. Coelho, personal communication, October 30, 2016) and the data desk teams at the Los Angeles Times (B. Welsh, personal communication, August 29, 2015) share their physical space with the rest of the editorial office. This location enables them to be in permanent contact with their colleagues. Both teams meet periodically to get updates on their research or plans. Both also work on projects that may originate outside the team, for instance from another journalist, or that stem from access to or discovery of particular databases. A journalist with topical expertise may be asked to join the team to help create the content.

A mandatory condition for working in a data team is a willingness to learn. In this field, tools are continuously being updated; therefore, being open to new technology and constantly demonstrating a capacity to learn new things are essential. Tools used for visualization and for data extraction (scraping) change frequently, again reinforcing the need to be willing to acquire new knowledge quickly.

Los Angeles Times data desk editor Ben Welsh said that everyone on the team shares two fundamental skills: They know how to tell stories, and they know how to write code. However, each journalist also specializes in a topic, such as infrastructure, transportation, police, or photography, among others. They differentiate themselves from the rest of the editorial office based on their skills in programming, data analysis, and Web content development.

Welsh said the data team helps make journalism more ambitious, contributing to a better reputation for the organization as a whole. He added that the team’s work as Web developers also contributes to overall journalistic quality, by attracting more readers and generating more page views. The team also develops research tools for journalists.

Other Times journalists said their weekly meetings are called “Show and Tell,” and that feedback is offered in general terms rather than tied to a specific project. Team members explained that there is not a single structured work process, but rather a high level of cooperation with different people in the editorial office.

La Nación’s data team, which is funded by advertising and to a lesser extent through scholarships, is made up of journalists, lawyers, a librarian, and an engineer (F. Coelho, personal communication, October 30, 2016). These disciplines bring specific perspectives to common work, so that discussions turn into learning experiences among colleagues. When something new emerges, members of the team meet to familiarize themselves with how it works.

Data team members at La Nación also described continuing education experiences, including attendance at conferences, special events, and training days, as well as exchanges with people from other countries who visit the newspaper. As at the Times, they said that team work, openness to the new and the different, and exchanges of information are constant.

Overall, data team editors cited goals related to the journalistic objective of public service. These included fostering transparency, strengthening democracy, avoiding or revealing corruption, and exploring topics in depth. Discovery of something new also was an overarching goal, as was the conduct of in-depth research via the data.

Five Narrative Types of Data Stories

These goals were evident in the 20 Argentine media examples of data journalism that we analyzed, as described in this section. We identified five categories of data journalism, each offering interactive pathways for initiating a dialogue with readers.

  1. Personalization: The possibility to personalize content, to make it “a la carte,” is one of the main advantages of data journalism. This capability means that users control the information they are exposed to, and they can link it to their own individual reality so that the information is not about unknown others but about oneself. The ability for users to customize data is central to this narrative strategy, ideally in a way that enables connections between personal experience and wider implications.

Personalization narratives in this study tended to be about politics and police matters.

Two examples come from La Voz del Interior. One, a piece of data journalism titled “Traffic Monitor,” enabled users in the Argentinean province of Córdoba to see the most dangerous traffic intersections in their own neighborhood. Users also could search for victims by name. This narrative was created by applying filters based on geolocation of all automobile accidents. The other example offered voting results for each school where citizens voted. It displayed a map of Córdoba city, with filters allowing readers to see the 2015 city election results by winner, neighborhood, and school.

The main journalistic objectives in these personalization narratives involved data collection and systematization. Each piece sought to offer the audience access to a huge amount of information that would not be available any other way, with users then able to filter that information to make it more personally relevant. The more specific the database is regarding gender, age, geolocation, and other characteristics, the more possibilities the readers have to personalize the results, using interactivity options and filters to select variables of interest. In both these cases, the news organization provided explicit instructions to guide user exploration.
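The filtering logic behind a personalization narrative can be illustrated with a minimal Python sketch. It is not drawn from any of the analyzed pieces: the records, field names, and the helper functions `personalize` and `most_dangerous` are hypothetical, chosen only to show how a reader-selected filter narrows a large database to a personally relevant subset.

```python
# Hypothetical accident records; field names and values are invented for illustration.
accidents = [
    {"neighborhood": "Centro", "intersection": "A & B", "victims": 3},
    {"neighborhood": "Centro", "intersection": "C & D", "victims": 1},
    {"neighborhood": "Alberdi", "intersection": "E & F", "victims": 2},
]

def personalize(rows, **criteria):
    """Keep only rows whose fields match every criterion the reader chose."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

def most_dangerous(rows):
    """Return the row with the most victims, or None if no rows match."""
    return max(rows, key=lambda r: r["victims"], default=None)

# A reader selects their own neighborhood and sees only what concerns them.
mine = personalize(accidents, neighborhood="Centro")
worst = most_dangerous(mine)
```

The more fields each record carries (age, gender, location), the more criteria a reader can combine in a single call, which mirrors the observation that more specific databases allow richer personalization.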

  2. Main trends (trend, mean, average): We use this term to identify narratives that offer a summary of data. For example, Córdoba’s average retirement wage might be represented by calculating an average, but a more useful approach might be to find a trend, such as values that are repeated more frequently in the database. It is important to see the context of the data in order to avoid errors of interpretation or inappropriate data manipulation.

An example of this narrative was Clarín’s piece on data from a “complex, lonely, educated, and unequal city,” based on data from the Home Survey 2014. Among Clarín’s conclusions were that the average family income in 2014 was ARS 16,578 in the north, while a freelancer in the north area had an average income of ARS 8,222. The Traffic Monitor piece cited above also made use of this narrative structure, for instance concluding that one person in the city dies every six days in a motorcycle accident. Another piece from Córdoba, titled “Growth of Crime in the City,” indicated there was one crime fatality every two days.

In the analyzed examples, journalistic objectives are data collection and systematization, as well as discoveries that can be made by cross-tabulating data. Here again, filters were used to allow readers to search and select graphics. There also was an option to share content through social media, as well as to get a code to embed a graph.
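The difference between an average and the most frequently repeated value, noted in the definition of this narrative type, can be shown with a short Python sketch. The wage figures below are invented for illustration and do not come from any database used in the study; `summarize` is a hypothetical helper built on the standard `statistics` module.

```python
from statistics import mean, median, multimode

# Hypothetical monthly retirement wages in ARS (illustrative values only).
wages = [9000, 9000, 9000, 9500, 10000, 12000, 60000]

def summarize(values):
    """Return the mean, the median, and the most frequent value(s)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "modes": multimode(values),
    }

summary = summarize(wages)
```

In this invented sample a single very high wage pulls the mean well above what most retirees receive, while the median and the most frequent value stay close to the typical case. This is one reason a summary narrative may be better served by a trend than by a simple average.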

  3. Outliers: Outliers are values that depart from the average, or from the behavior of the majority. Generally, the results are interesting because they often represent situations that lead to breaking news.

One striking example in this narrative category was a La Nación newspaper piece about Argentina’s vice president requesting travel allowances for trips he did not make, along with the number of security guards assigned to his trips and the greater-than-average time spent traveling. This story was in line with the journalistic objective of uncovering something new by cross-checking data, which in this case revealed a number of anomalies that led to a deeper journalistic investigation. The piece also included interactive graphics using search and selection filters, as well as options to share on social media and to use a code to embed a graph. There were no explicit instructions to interact with the information in this story of political corruption.
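One common way to flag values that depart from the behavior of the majority is the interquartile-range rule (Tukey's fences). The sketch below is an illustration only: it is not the method used by La Nación, and the travel-day figures, the labels, and the `find_outliers` helper are hypothetical.

```python
from statistics import quantiles

# Hypothetical number of travel days claimed per official (invented values).
travel_days = {"A": 4, "B": 5, "C": 5, "D": 6, "E": 6, "F": 7, "G": 30}

def find_outliers(values_by_key, k=1.5):
    """Return keys whose value lies outside Tukey's fences (k times the IQR)."""
    q1, _, q3 = quantiles(values_by_key.values(), n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [key for key, v in values_by_key.items() if v < low or v > high]

flagged = find_outliers(travel_days)
```

A flagged value is a lead rather than a finding. As in the travel allowance story, the anomaly is the starting point for deeper journalistic verification.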

  4. Synchronic comparison: This narrative type appears when different types of variables are compared in the same period of time. It enables analysis of a phenomenon from different viewpoints or through different spaces where a phenomenon occurs. Synchronic comparison narratives were the most used in the analyzed cases, and they covered the widest variety of topics, including not only police and politics but also international and economic news, among others. They answered such questions as: How many assets did political candidates have when they started campaigning? How many votes did each candidate get in each school? How many immigrants arrived in Europe this year?

For example, the piece about elections by school in Córdoba from La Voz del Interior, also cited under “personalization” above, offers a comparative map of the “winning candidate” variable in the same temporal event: the city elections in 2015.

The journalistic objective of this narrative approach is to collect and systematize data and to discover new information by cross-checking data. These pieces offer a range of options for interactivity, including adding comments, sharing via social media, applying filters, and using navigation buttons, along with an opportunity to take a code to embed graphs. The interaction typology for users is exploration.

  5. Diachronic comparison: Our final narrative type proposes a comparison related to the evolution over time of the same fact, phenomenon, or circumstance in order to produce greater contextualization. For example, the interactive documentary “Lost Streets” shows the spread of drug dealing in the city of Rosario; it offers graphs that show “homicides according to gender,” to facilitate comparisons by month between January and December 2014.

This narrative type commonly complemented synchronic comparison, combining to support the journalistic objectives of collecting and systematizing data and of discovering something by cross-checking data. Topics included economics, statistics, politics, and police.

In addition to the Rosario example, other questions posed—and answerable through the data—included how the amount of meat consumed by Córdoba citizens has evolved over the years; how long-distance bus destinations are distributed around the country; and how many crimes were committed in Argentine provinces in three different years in the early 2010s. The answers are provided by comparing the same databases in different time periods.

In these narratives, filters act as interactive options; also available are options to share graphs in social media or grab a code to insert the graph on another site. Explicit and implicit instructions are present in the analyzed cases, with user exploration again being encouraged.
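The two comparison types can be contrasted as two different cuts through the same database. In the following Python sketch, the records and the functions `synchronic` and `diachronic` are hypothetical, and the crime counts are invented. A synchronic comparison holds the period fixed and varies the place, whereas a diachronic comparison holds the place fixed and follows it through time.

```python
# Hypothetical records: (province, year, reported_crimes). Values are invented.
records = [
    ("Córdoba", 2012, 120), ("Córdoba", 2013, 135), ("Córdoba", 2014, 150),
    ("Santa Fe", 2012, 200), ("Santa Fe", 2013, 190), ("Santa Fe", 2014, 210),
]

def synchronic(rows, year):
    """Compare different places within the same period."""
    return {place: n for place, y, n in rows if y == year}

def diachronic(rows, place):
    """Follow the same place across periods, ordered by year."""
    return sorted((y, n) for p, y, n in rows if p == place)

same_year = synchronic(records, 2014)
over_time = diachronic(records, "Córdoba")
```

Because both functions read the same records, a single piece can offer both views, which is consistent with the finding that diachronic narratives commonly complemented synchronic ones.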

Narrative Resources

The presentation of information in a visual format, accessible to audiences anywhere in the world, is also an aesthetic strategy worth noting. Although the data themselves are different, visual presentation allows information to be harmonized.

However, a significant aspect of the analyzed cases is that they have been presented as independent pieces or embedded in other journalistic stories, with text and graphs the most commonly used formats. Even though we found several links and hypertexts, the use of video or audio was unusual.

This implies a gap in the expressive systems that make multimedia narrative possible. Other globally accessible visual formats exist, such as the presentation of slides, the structuring of stories in the form of comics, the display of counter-arguments, and the production of videos or animated films. These sorts of options offer other semiotic and expressive possibilities that may be more inclusive and more conducive to the use of multimedia.

A sequential structure was the most frequently used form of visual communication in our analyzed cases. Other options, such as the checklist or the progress bars, were unexplored.

The way relevant data are highlighted also is an important aspect of the visual narrative, particularly when complex graphs are included. In the analyzed cases, animated resources such as close-ups, zoom, and movement are set apart from other highlighting options such as use of icons or of different text styles (bold, italics, colors, and sizes).

Our interviews indicate a shared view that data analysis enhances the reputation of the media outlet by offering better-quality products. In addition, technology can be used to cut news production costs, enabling journalists to focus their efforts on the relevant material. For example, Welsh of the Los Angeles Times described software that automatically writes posts about earthquakes as soon as the government receives the information (personal communication, August 29, 2015). However, we found no automatically updated visual material among our analyzed cases. Moreover, with the rare exception of manual updates, data journalism follow-ups were not offered; results thus remained static, referring only to the time at which they were published.

Finally, we found only three examples of links to the original database used to create a journalistic narrative, meaning that only rarely could users—or other journalists—build on the stories offered or analyze the information from a different perspective. The ability to do this would be more in line with the “open-source” philosophy underpinning much of the work around the use of data, and would encourage more collaborative and comprehensive work among different professionals.


A public information access law was intended to be partially in place in September 2017 in Argentina. Yet even in early 2018, data made available from the government still often were irrelevant or outdated.

Paradoxically, most databases of the analyzed cases in this study came from government agencies, with others stemming from data collected by journalists. However, the database of the only case that corresponds to the journalistic objective of thorough research—the narrative around political travel expenses, described above—was built and systematized with readers’ contributions. Databases from other research centers were not used in the analyzed cases, although they offer an information source that might be of value to journalists.

It is significant that the first Argentine media companies to develop journalistic pieces based on data are those whose legacy product is a newspaper. Future research might explore why this is so, as well as the potential for initiatives from broadcasters and other news outlets.

This article is dedicated to Elena, a brilliant and humble colleague.


