AI Usage Cards vs Datasheets for Datasets

Datasheets document datasets, while AI Usage Cards document how researchers used AI tools during a study.

You are documenting two different things

Researchers often put dataset documentation and AI disclosure in the same mental drawer.

That causes trouble.

A Datasheet for Datasets documents a dataset. An AI Usage Card documents how researchers used AI during a project. The first tells readers what the data is, where it came from, and how others should treat it. The second tells readers how a tool such as ChatGPT, Claude, Gemini, Copilot, or another AI system shaped the work.

Timnit Gebru and coauthors introduced Datasheets for Datasets as a structured way to document the motivation, composition, collection process, recommended uses, and maintenance of datasets. Jan Philip Wahle and coauthors introduced AI Usage Cards as a way to report AI use in scientific research, with attention to transparency, integrity, and accountability.

The overlap feels tempting because both documents ask for transparency. But they answer different questions.

If you built or released a dataset, write a datasheet or a similar dataset document. If you used AI during the research process, create an AI Usage Card. If you did both, you probably need both.

If AI Usage Cards are new to you, start with What Are AI Usage Cards? and then return to this comparison.

What a datasheet does

A datasheet treats the dataset as the object that needs explanation.

Gebru and coauthors used the analogy of electronics datasheets: when engineers buy a component, they expect technical documentation. Dataset users need the same kind of record. They need to know who created the data, why they created it, how they collected it, what it contains, what it leaves out, and how the creators expect others to use it.

A datasheet answers questions like these:

  • Why did you create the dataset?
  • What does it contain?
  • How did you collect, filter, clean, or annotate it?
  • Who or what appears in the data?
  • Does the dataset include personal, sensitive, or copyrighted material?
  • What uses fit the dataset?
  • What uses would be risky or misleading?
  • Who maintains the dataset after release?

The author is usually the dataset creator, curator, or steward. The reader is anyone who may inspect, reuse, audit, or build on the data.

That reader may not know your lab, your sampling choices, your annotation instructions, or the political and social context around the data. The datasheet gives them a place to check those details before they rely on the dataset.

Hugging Face dataset cards belong to the same family of ideas, though they are not the same thing. Hugging Face explains that each dataset repository can use a README.md file as a dataset card and can include metadata such as license, language, size, and task information. The platform also recommends that dataset cards help users understand the contents and limits of the dataset. See the Hugging Face dataset card documentation for the platform format.

So the distinction matters: a datasheet is a documentation framework. A dataset card is often a repository format. Both can help users judge a dataset before they reuse it.
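As a concrete illustration, here is a minimal sketch of that repository format, written as a Python script that places the card next to the data. The metadata keys license, language, task_categories, and size_categories are documented dataset card fields; the values, the description text, and the file path are placeholders, not a real release.

# Sketch: write a Hugging Face style dataset card (README.md)
# with a YAML metadata block. All values below are placeholders.
from pathlib import Path

metadata = """---
license: cc-by-4.0
language:
- en
task_categories:
- text-classification
size_categories:
- 10K<n<100K
---
"""

body = (
    "# Example Sentiment Dataset\n\n"
    "Describe contents, collection, annotation, and known limits here,\n"
    "using the datasheet questions as a guide.\n"
)

Path("README.md").write_text(metadata + body, encoding="utf-8")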

What an AI Usage Card does

An AI Usage Card treats the research process as the object that needs explanation.

The card does not replace a methods section. It gives you a structured record of where AI entered the work and how humans checked the output. That matters because a short acknowledgment such as "we used ChatGPT for editing" leaves readers guessing.

An AI Usage Card answers questions like these:

  • Which AI tool did you use?
  • Which model or version did you use, if known?
  • When did you use it?
  • What task did you ask it to support?
  • Did it draft text, suggest code, label data, summarize sources, translate text, or screen records?
  • What prompts, instructions, or rubrics guided the tool?
  • What output did the tool produce?
  • Who reviewed, edited, tested, or rejected that output?

The author is the researcher or research team. The reader may be a journal editor, peer reviewer, supervisor, examiner, coauthor, or future reader who wants to understand the role of AI in the work.

The AI Usage Cards paper appeared at JCDL 2023. The National Research Council Canada record lists the DOI as 10.1109/JCDL57899.2023.00060 and describes AI Usage Cards as a way to report AI use in scientific research.

If you need examples, see AI Usage Cards Examples and Templates. If your main use case involves manuscript text, read How to Disclose ChatGPT Usage in Academic Papers.

You can also generate a card at ai-cards.org and copy the wording into an acknowledgment, methods note, supplement, thesis appendix, or repository file.
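If the card lives in a repository, you can also keep a small machine-readable copy beside the analysis code. This is a sketch, not an official format: the file name AI_USAGE.json and the field names are illustrative, and the values are placeholders for your own details.

# Sketch of a machine-readable AI usage record kept beside the code.
# File name and field names are illustrative, not a standard.
import json

usage_record = {
    "tool": "ChatGPT (GPT-4, specific version if known)",  # placeholder
    "task": "Edited the introduction for clarity",
    "input": "Draft paragraphs, no unpublished data",
    "output": "Revised wording and suggested transitions",
    "human_review": "Authors read every suggestion and accepted selectively",
}

with open("AI_USAGE.json", "w", encoding="utf-8") as f:
    json.dump(usage_record, f, indent=2)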

The fast comparison

Dimension | Datasheets for Datasets | AI Usage Cards
Main question | What is this dataset? | How did the researcher use AI?
Unit of documentation | One dataset | One study, paper, thesis, review, or project
Main author | Dataset creator, curator, or steward | Researcher or research team
Main reader | Dataset users and reviewers | Readers, reviewers, editors, supervisors
Core content | Motivation, composition, collection, labeling, consent, uses, limits, maintenance | Tool, task, prompts or workflow, output, verification, editing, oversight
Best fit | Dataset release, benchmark, data paper, repository | Any research workflow that used AI
Typical location | Repository, appendix, supplement, dataset paper | Methods section, acknowledgment, supplement, repository, appendix
Risk it addresses | Unclear data provenance or hidden dataset limits | Hidden AI assistance or thin AI disclosure

The short version: datasheets describe the data. AI Usage Cards describe the AI assisted work.

When you need a datasheet

Use a datasheet when you share data with others, especially when other researchers may download it, inspect it, cite it, or train models on it.

You need one if you create a new benchmark. The benchmark may include examples, labels, splits, metadata, and exclusion rules. Those choices shape every later result.

You need one if you curate raw material into a research dataset. Maybe you scraped web pages, filtered social media posts, merged institutional records, or cleaned sensor data. Those steps can remove groups, amplify errors, or create gaps that later users will not see in the final file.

You need one if your data involves people, language, images, health records, location traces, or other sensitive material. Readers need to know how you handled consent, privacy, anonymization, licenses, and risk.

A datasheet belongs close to the dataset. Put it in the repository, supplement, appendix, or data paper. If your dataset lives on a platform with its own dataset card format, you can still use the datasheet questions to make that card stronger.

When you need an AI Usage Card

Use an AI Usage Card when AI touched the workflow in a way that readers should know.

You need one if you used AI to draft or revise text. That includes rewriting paragraphs, generating summaries, suggesting titles, translating text, or editing for style.

You need one if you used AI to write code. Report the tool, the coding task, and how you tested the code. If Microsoft Copilot supported the work, How to disclose Microsoft Copilot use in academic writing gives more focused guidance.
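Here is what "how you tested the code" can mean in practice: a minimal sketch, assuming a hypothetical AI-suggested helper called normalize_scores. The test itself becomes the evidence the card points to.

# Sketch: verifying an AI-suggested function before accepting it.
# normalize_scores is a hypothetical helper drafted with AI assistance.
def normalize_scores(scores):
    # AI-suggested implementation, reviewed and tested by the authors.
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def test_normalize_scores():
    # Run with pytest; record the test suite in the card's verification entry.
    assert normalize_scores([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
    assert normalize_scores([3.0, 3.0]) == [0.0, 0.0]  # degenerate case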

You need one if you used AI for labeling, extraction, screening, coding, or synthesis support. This comes up often in qualitative research, systematic reviews, NLP annotation, peer review, and grant writing. For related cases, see AI Disclosure in Systematic Reviews and Meta-Analyses and AI Disclosure for Qualitative Research.

If you do not know whether your use needs disclosure, read Do I Need to Disclose AI Usage in My Paper? and AI Disclosure Policies by Major Journals.

Many dataset projects need both

Imagine your lab builds a sentiment dataset from public posts.

You collect posts, remove duplicates, filter by language, and define three labels. Then you ask an LLM to assign first pass labels. Human annotators review every label, resolve disagreements, and create the final release.

That project needs a datasheet because you created a dataset. The datasheet explains the source data, inclusion rules, filtering steps, annotation guide, class balance, consent or privacy choices, known gaps, and intended uses.

It also needs an AI Usage Card because AI shaped the annotation workflow. The card explains the model, prompt, labeling rubric, output, human review process, and error checks.
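A minimal sketch of that two-stage workflow in Python. The llm_label stub stands in for whatever model call the team actually made, and the human_review callable stands in for the real annotation process; both are placeholders, not a prescribed pipeline.

# Sketch of the two-stage labeling workflow described above.
LABELS = {"positive", "negative", "neutral"}

def llm_label(post: str) -> str:
    return "neutral"  # placeholder: swap in the actual API call

def build_release(posts, human_review):
    released = []
    for post in posts:
        first_pass = llm_label(post)
        # Every first-pass label goes to a human annotator; the human
        # decision, not the model output, enters the released dataset.
        final = human_review(post, first_pass)
        assert final in LABELS
        released.append({"text": post, "label": final,
                         "first_pass": first_pass})  # keep provenance
    return released

# Example: a trivial reviewer that accepts the first-pass label as-is.
data = build_release(["great product"], lambda post, label: label)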

Neither document does the other job.

The datasheet tells future users what they are downloading. The AI Usage Card tells readers how AI helped produce part of the research output.

I see the mistake most often in papers that write one sentence: "We used GPT-4 to assist annotation." That sentence gives readers almost nothing. Which labels? Which prompt? Which version? Did humans check every item? Did the team measure agreement before or after AI support? Did the authors reject any AI output?

A card makes those questions harder to skip.
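Questions like "did the team measure agreement" have cheap, concrete answers. A sketch using scikit-learn's cohen_kappa_score, assuming two annotators' labels are already aligned lists; the toy labels below are invented for illustration.

# Sketch: measuring inter-annotator agreement before and after AI support,
# so the card can report a number instead of a vague claim.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["positive", "neutral", "negative", "neutral"]  # toy data
annotator_b = ["positive", "neutral", "neutral", "neutral"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")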

A practical example with LaTeX wording

Take a radiology NLP project.

A team collects de-identified radiology reports under ethics approval. The team asks an LLM to assign preliminary labels for findings such as "pleural effusion" and "pneumothorax." Two radiologists review the labels and correct errors before the team releases the dataset.
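A claim like "the authors did not submit patient names or direct identifiers" is checkable before each API call. Here is a minimal guard sketch; the two regex patterns are illustrative only and do not amount to a real de-identification method.

# Sketch: refuse to send a report to the API if obvious identifiers remain.
# These patterns are illustrative; real de-identification needs far more.
import re

IDENTIFIER_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),      # SSN-like numbers
    re.compile(r"\bMRN[:\s]*\d+\b", re.I),     # medical record numbers
]

def safe_to_submit(report: str) -> bool:
    return not any(p.search(report) for p in IDENTIFIER_PATTERNS)

assert safe_to_submit("Chest X-ray shows small pleural effusion.")
assert not safe_to_submit("MRN: 123456. No pneumothorax.")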

The dataset needs a datasheet. The paper also needs an AI Usage Card.

A short LaTeX version could look like this:

\section*{AI Usage Card}
 
\textbf{Tool used:} GPT-4 class model via API.
 
\textbf{Purpose:} The model generated preliminary labels for
de-identified radiology reports before expert review.
 
\textbf{Prompting and input:} The authors supplied the annotation
rubric and one report at a time. The authors did not submit patient
names or direct identifiers.
 
\textbf{Human oversight:} Two radiologists reviewed all AI-generated
labels. The authors accepted no label without expert review.
 
\textbf{Verification:} The radiologists compared each label with the
annotation guide and corrected errors before dataset release.
 
\textbf{Role in the final dataset:} The AI system supported preliminary
annotation. Human experts produced the final released labels.

You can place this in a supplement, appendix, repository, or methods section. If you write in LaTeX, see LaTeX Tutorial for AI Usage Cards and How to Use AI Usage Cards in Overleaf.

The datasheet for the same project would look different. It would describe report sources, inclusion dates, clinical setting, de-identification, label definitions, annotator qualifications, class counts, release conditions, and limits on reuse.

Same project. Two records. Two audiences.

How this fits with other documentation frameworks

Datasheets and AI Usage Cards sit near other research documentation formats.

Model cards describe trained models. Mitchell and coauthors proposed Model Cards for Model Reporting to document model performance, intended use, evaluation details, and limits.

System cards describe deployed AI systems or model releases with attention to safety testing and deployment context. OpenAI, for example, publishes system cards for some model releases, such as the OpenAI o1 System Card.

Datasheets describe datasets.

AI Usage Cards describe how researchers used AI during a specific piece of scholarly work.

If you want the whole map, read AI Documentation Frameworks Compared. For adjacent comparisons, see AI Usage Cards vs Model Cards and AI Usage Cards vs System Cards.

A simple rule for manuscripts and repositories

Ask two questions before submission.

First, did we create, curate, or release a dataset that others may inspect or reuse?

If yes, write a datasheet or a dataset card that covers provenance, composition, collection, annotation, consent, recommended uses, limits, and maintenance.

Second, did we use AI for writing, coding, screening, extraction, annotation, analysis, translation, or data preparation?

If yes, create an AI Usage Card that records the tool, task, input, output, human review, and final responsibility.

If both answers are yes, do both.

That rule saves time because it separates the object from the workflow. The dataset is one object. Your AI assisted process is another.

Use the card before reviewers ask

Researchers often delay AI disclosure until the final submission checklist.

That creates weak wording. It also makes it harder to remember which model you used, what prompts you tried, and how you checked the output.

Generate an AI Usage Card at ai-cards.org while the work is still fresh. Keep it with your manuscript files, analysis repository, data release, or thesis appendix. If a journal asks for an AI statement, you can use the card itself or copy the generated text into your paper.

Use a datasheet to explain the dataset.

Use an AI Usage Card to explain the AI assisted work.

When your project has both stories, give readers both.
