Addressing Gaps in Data Governance to Promote Trust in Generative AI


December 20, 2023


Generative artificial intelligence (AI) is built on large language models (LLMs), which are trained on large and diverse pools of data. While LLMs can be used to make predictions and solve complex problems, they also have the potential to undermine trust and longstanding democratic norms, and they are driving major changes in how we conduct and replicate research. Many researchers see data governance as a tool for striking a balance between mitigating these risks and encouraging the development of LLMs, but today's data governance practices do not yet strike that balance effectively.

The Digital Trade and Data Governance Hub and the GW co-led NIST-NSF Institute for Trustworthy AI in Law & Society (TRAILS) brought experts from academia, industry, and government to the George Washington University on December 7 and 8, 2023, to address this issue at the conference "Data Governance in the Age of Generative AI." Over these two days, participants worked together to identify gaps in data governance for LLMs and to propose and discuss ways to address them.

TRAILS is focused on transforming the practice of AI from one driven primarily by technological innovation to one driven by ethics, human rights, and input and feedback from communities whose voices have previously been marginalized. To achieve this, the institute has begun supporting events centered on AI, such as this conference and the workshop "Operationalizing the Measure Function of the NIST AI Risk Management Framework," held in October 2023. Another goal of this conference was to promote greater understanding of, and participation in, data governance as a tool for governing AI, a goal that speaks directly to this focus and, more specifically, to TRAILS' fourth research thrust: participatory governance and trust.

“Good data governance is a crucial aspect of any comprehensive attempt to govern AI. At a time when significant attention is focused on regulating specific models and algorithms, this conference brings a much-needed focus to the data on which these algorithms and models operate,” said David Broniatowski, associate professor of engineering management and systems engineering and GW-PI of TRAILS.

Through six panels and three keynote discussions, conference attendees examined firms' current strategies for obtaining data, whether firms choose to make their LLMs open, partially closed, or closed to outside review, and the implications of these choices for democracy, human rights, and trust. For example, content creators currently lack protection for their ideas and opinions, and there is growing evidence that they are becoming less willing to share their data as a result, which could undermine openness over time. In the first panel of day one, attendees assessed whether content creators should be granted new protections governing the reuse of their online content to train LLMs.

Determining how to govern data so that generative AI and the LLMs behind it can enhance human life without harming individuals, such as content creators, is an ongoing conversation. By supporting convenings like this one, TRAILS is helping to address that challenge in ways that promote inclusive governance strategies and build trust and accountability in AI systems.