AI Disclosure for NLP Research Papers

NLP has a disclosure problem that other fields can ignore

NLP researchers use the same systems they study.

That creates a problem at submission time. If a chemistry author uses ChatGPT to revise prose in a paper about catalysts, the tool and the object of study sit in different boxes. If an NLP author uses a language model to draft prompts, write evaluation code, generate synthetic examples, and then evaluate another language model, the boxes start to blur.

Reviewers do not need a confession. They need a map.

A good NLP disclosure tells readers which model interactions produced evidence and which model interactions supported the work around the evidence. Treat the first group as methods and the second group as AI assistance.

That distinction sounds small. It changes where you document the work, what details you include, and how a reviewer judges reproducibility.

If you need a structured record while you write, create an [AI Usage Card](/what-are-ai-usage-cards/) early in the project. Do not wait until the camera-ready deadline, when no one remembers which co-author used which tool.

Separate the experiment from the helper

Start with a blunt question: did the model interaction affect the research claim?

If yes, put it in the methods, experiment setup, data section, appendix, or repository. If no, put it in the AI usage statement or acknowledgments.

That rule handles most cases.

A paper that evaluates model performance on medical question answering should describe every prompt, dataset, decoding setting, judging procedure, and sampling decision that shaped the results. Those details belong to the experiment.

The same paper may also use GitHub Copilot to write boilerplate scripts, ChatGPT to spot unclear sentences, or Claude to summarize a draft for the authors. Those uses may affect the manuscript process, but they do not produce the reported measurements. They belong in an [AI disclosure](/ai-disclosure-in-systematic-reviews-and-meta-analyses/).

NLP papers often mix both. That does not make the work suspect. It means the authors need to label the roles.

| Scenario | Where to document it |
| --- | --- |
| You prompt a model and analyze its outputs as data | Methods or experiment section |
| You use a model to generate synthetic training data | Data and methods section |
| You use a model to draft evaluation prompts | Methods, plus disclosure if the prompt design came from AI assistance |
| You use Copilot to write logging code | AI usage statement or acknowledgments |
| You use ChatGPT to revise the introduction | AI usage statement or acknowledgments |
| You use an AI search tool to find related work | Disclosure if your venue asks for it, and careful citation checking |

This split also helps with journal and conference policies. For more general submission guidance, see AI Transparency Requirements for Journal Submissions and How to Disclose ChatGPT Usage in Academic Papers.

What ACL-family policies now ask authors to do

For ACL Rolling Review, generative AI tools do not qualify for authorship. ARR asks authors to disclose writing and coding assistance, including scope, in the Responsible NLP Checklist, with details in the acknowledgments section. ARR also says authors may include coding assistance details in README files. (github.com)

ARR recognizes several kinds of generative assistance: language polishing, short input suggestions, literature search, low-novelty text, new ideas, and new ideas plus new text. The policy treats simple proofreading tools differently from generative tools that produce content, ideas, or code. (github.com)

ACL's publication ethics policy says authors remain responsible for submitted content and may not list generative AI tools as authors. The policy also says authors should disclose generative AI content creation in acknowledgments, with an example such as noting that a section used inputs from ChatGPT. (aclweb.org)

ARR also requires a dedicated "Limitations" section, and it warns that papers without that section can face desk rejection. EACL 2026 follows ARR policies and also requires a Limitations section before references. (github.com)

The Responsible NLP Checklist now includes a question on AI assistants in research, coding, or writing. Checklists published alongside accepted 2025 papers show this question as part of the record. (aclanthology.org)

For authors, the practical lesson is simple: do not hide AI assistance in vague language. Name the tool, describe the task, and say where the experimental use appears in the paper.

Four NLP cases that need careful wording

Prompt engineering

If you designed prompts to test a model, the prompts are part of the experiment. Put them in the paper, appendix, or repository.

If you asked another model to suggest prompt variants, disclose that extra help. You can still use the prompts, but readers should know how you produced them.

A clear methods sentence might read:

"We evaluated the models using the prompts in Appendix A. The authors wrote the initial prompts, then used ChatGPT to suggest alternative phrasings. The authors selected and edited the final prompt set before running the experiments."

That sentence gives the reviewer enough context. It does not over-explain.
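To make the methods sentence concrete, here is a minimal LaTeX sketch of how Appendix A might record each prompt together with its provenance. The section label, subsection titles, and bracketed placeholders are hypothetical. Replace them with your own prompts, copied exactly as you sent them to the model.

```latex
% Place after \appendix so this section is lettered (Appendix A if it comes first).
\section{Evaluation Prompts}
\label{app:prompts}

% Provenance note that mirrors the methods sentence.
The authors wrote the initial prompts and used ChatGPT to suggest
alternative phrasings. The authors selected and edited the final
prompt set below before running the experiments.

% Hypothetical entry: a prompt the authors wrote themselves.
\subsection{Prompt A1 (author-written)}
\begin{verbatim}
[final prompt text, exactly as sent to the model]
\end{verbatim}

% Hypothetical entry: a prompt edited from an AI-suggested phrasing.
\subsection{Prompt A2 (author-edited from an AI-suggested phrasing)}
\begin{verbatim}
[final prompt text, exactly as sent to the model]
\end{verbatim}
```

The `verbatim` environment preserves whitespace exactly, which matters for prompts, but it does not wrap long lines. If prompts overflow the column, one option is the `listings` package with its `breaklines` option.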

Synthetic data

Synthetic data generation belongs in the research pipeline. Treat it like data collection.

Name the generator model, access route, date range, prompt pattern, filtering process, and any human review. If the model provider changes the model behind a product name, your date range and access route help future readers understand what you used.

A weak paper says, "We generated extra examples with an LLM."

A stronger paper says, "We generated 2,000 paraphrases using the API model listed in Appendix B between February 3 and February 7, 2026. We used temperature 0.7, rejected outputs with named entities that did not match the source item, and manually checked a random sample of 200 examples."

That second version lets a reviewer inspect the pipeline.
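A compact way to carry the stronger version into an appendix is a key-value table. This sketch reuses the numbers from the example sentence above: 2,000 paraphrases, February 3 to 7, 2026, temperature 0.7, and a 200-example manual check. The bracketed fields are hypothetical placeholders. Plain `\hline` rules avoid assuming any table package beyond standard LaTeX.

```latex
% Place after \appendix. Bracketed fields are hypothetical placeholders.
\section{Synthetic Data Generation}
\label{app:synthetic}

\begin{table}[ht]
\centering
\small
\begin{tabular}{p{0.28\columnwidth}p{0.58\columnwidth}}
\hline
Field & Value \\
\hline
Generator model & [API model ID] \\
Access route & [provider API or web interface] \\
Date range & February 3--7, 2026 \\
Prompt pattern & [prompt template, or a pointer to its appendix listing] \\
Decoding & Temperature 0.7 \\
Output & 2,000 paraphrases \\
Filtering & Rejected outputs with named entities that did not match the source item \\
Human review & Manual check of a random sample of 200 examples \\
\hline
\end{tabular}
\caption{Synthetic data generation record.}
\label{tab:synthetic}
\end{table}
```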

For adjacent data documentation, compare this record with AI Usage Cards vs Datasheets for Datasets. A datasheet describes the dataset. An AI Usage Card describes how your team used AI tools while creating, analyzing, or writing about it.

LLM-as-a-judge evaluation

NLP reviewers now expect detail when authors use one model to judge another. Report the judge model, prompt, rubric, number of judgments, sampling settings, aggregation rule, and any human validation.

If an AI tool helped you write the rubric, say so. If the judge model produced the actual evaluation scores, that belongs in methods, not only in the disclosure.

This case needs care because the helper can become part of the measurement instrument.
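The same key-value layout works for a judge setup. This sketch has one row for each item named above. Every bracketed entry is a hypothetical placeholder, not a recommended setting.

```latex
% Place after \appendix. Bracketed entries are hypothetical placeholders.
\section{LLM-as-a-Judge Setup}
\label{app:judge}

\begin{table}[ht]
\centering
\small
\begin{tabular}{p{0.28\columnwidth}p{0.58\columnwidth}}
\hline
Item & Report \\
\hline
Judge model & [model name, API model ID, access dates] \\
Judge prompt & [verbatim prompt, or a pointer to its appendix listing] \\
Rubric & [rubric text; state whether an AI tool helped draft it] \\
Judgments & [number of judgments per item and in total] \\
Sampling & [temperature and other decoding settings] \\
Aggregation & [rule that turns judgments into a score] \\
Human validation & [items rated by humans and how agreement was measured] \\
\hline
\end{tabular}
% The caption restates why this belongs in methods, not only the disclosure.
\caption{Judge configuration. These settings produced reported scores,
so they belong in the methods, not only in the AI usage statement.}
\label{tab:judge}
\end{table}
```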

Code generation

Coding assistance can look harmless until it touches the core experiment.

If Copilot or another assistant helped write plotting scripts, logging code, or file conversion utilities, a short disclosure may suffice. If it helped write the evaluation metric, decoding loop, data filtering script, or statistical test, give more detail.

You do not need to paste every completion into the paper, but you should keep a private log or repository note so you can answer reviewer questions. For conference work, see Do Conference Papers Need to Disclose AI Agent Use? and How to disclose AI use for NeurIPS, ICML, and ACL submissions.

A disclosure statement that works for an NLP paper

A strong statement separates research interactions from tool assistance. It also names where the research interactions appear.

You can adapt this template:

AI usage statement. We used AI tools during manuscript preparation and software development, separate from the model interactions reported as experiments in Sections 3 and 4. ChatGPT was used between November 2025 and February 2026 to suggest wording revisions for the introduction and to help debug non-core utility scripts. GitHub Copilot was used for boilerplate code in logging and plotting utilities. The authors reviewed and edited all AI-assisted text and code. Model interactions that produced experimental data, including prompts, settings, outputs, and LLM-as-judge rubrics, are reported in Section 4 and Appendix A.

That statement does three things.

It tells readers that the experimental model use appears in the methods. It tells readers that AI also helped with writing and code. It assigns responsibility to the human authors.

A poor version says:

We used AI tools in this work.

That sentence creates work for the reviewer. It forces them to guess whether the AI wrote the prose, generated the data, judged the outputs, produced code, or shaped the research idea.

For more templates, see AI Usage Cards Examples and Templates.

Add the statement to LaTeX without making a mess

Many NLP papers use ACL style files, so authors often need a compact disclosure section that fits near acknowledgments or after the main text, depending on venue instructions.

Use a short section like this if your venue allows it:

```latex
\section*{AI Usage Statement}

We used AI tools during manuscript preparation and software development,
separate from the model interactions reported in the experimental methodology.
ChatGPT was used to suggest wording revisions for Sections 1 and 2.
GitHub Copilot was used for boilerplate code in logging and plotting utilities.
The authors reviewed and edited all AI-assisted text and code.

All model interactions that produced experimental data, including prompts,
settings, outputs, and evaluation rubrics, are documented in Section 4
and Appendix A.
```

If the venue asks for the disclosure in acknowledgments, use this version:

```latex
\section*{Acknowledgments}

The authors used ChatGPT to suggest wording revisions for the introduction
and GitHub Copilot for boilerplate logging and plotting code. The authors
reviewed and edited all AI-assisted text and code. Experimental model
interactions are documented in Section 4 and Appendix A.
```

If you want to include an AI Usage Card in an appendix, add a short note:

```latex
\appendix
\section{AI Usage Card}

This appendix contains the AI Usage Card generated for this project.
It records the AI tools used for writing, coding, and research support,
and separates those uses from model interactions that formed part of
the experimental method.
```

For a longer walkthrough, use the LaTeX Tutorial for AI Usage Cards or the Overleaf guide.

AI Usage Cards and model cards answer different questions

NLP researchers know model cards. Mitchell et al. introduced model cards as a format for reporting trained machine learning models, including intended use and performance details across relevant conditions. (research.google)

An AI Usage Card answers a different question.

A model card says what a model is. An AI Usage Card says how researchers used AI in a project.

That distinction matters in NLP. You might cite a model card for the system you evaluate, include a datasheet for the dataset you create, and add an AI Usage Card for the AI assistance your team used while writing code, drafting text, or preparing analyses.

These records can sit together. They do not compete. For a fuller comparison, read AI Usage Cards vs Model Cards and AI Documentation Frameworks Compared.

Keep the log while the project happens

The easiest disclosure comes from a boring habit: record AI use when it happens.

Create a shared document with four columns: tool, date, task, and paper location. When a co-author uses an AI assistant, they add one line. If the use belongs in methods, note the section. If it belongs in disclosure, note the acknowledgment or AI usage statement.

This habit saves time. It also prevents the awkward final-week email where one author asks, "Did anyone use Copilot for the evaluation script?"

For NLP teams, version detail matters. Record the model name shown in the interface, the API model ID if you used one, the date range, and the route of access. "ChatGPT" alone may not tell readers enough. A product can change while the project runs.

You should also record negative boundaries. If AI helped with plotting scripts but not with the evaluation metric, say that. If AI helped polish the introduction but did not generate claims, say that too.
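If the team wants the log itself in the paper, for example as an appendix item, the four columns translate directly into a LaTeX table. This is a sketch. The rows reuse example uses from this article, and the bracketed fields are hypothetical placeholders. The last row shows one way to record a negative boundary. It uses `table*` so the four columns span the full page width in a two-column ACL layout.

```latex
% Place after \appendix. Rows are illustrative; bracketed fields are
% hypothetical placeholders to fill from the team's shared log.
\section{AI Use Log}
\label{app:ai-log}

\begin{table*}[t]
\centering
\small
\begin{tabular}{p{0.22\textwidth}p{0.16\textwidth}p{0.30\textwidth}p{0.20\textwidth}}
\hline
Tool (version, access route) & Dates & Task & Paper location \\
\hline
ChatGPT ([model name shown in interface], web) & Nov 2025--Feb 2026 &
  Suggested wording revisions for the introduction & AI usage statement \\
GitHub Copilot ([IDE and plugin version]) & [dates] &
  Boilerplate logging and plotting utilities & AI usage statement \\
Judge model ([API model ID], API) & [dates] &
  LLM-as-judge scoring of system outputs & Section 4, Appendix A \\
% Negative boundary: record what AI did not touch.
No AI tool & n/a &
  Evaluation metric and statistical tests written without AI assistance &
  AI usage statement \\
\hline
\end{tabular}
\caption{Project AI use log, including a negative boundary (last row).}
\label{tab:ai-log}
\end{table*}
```

Every row starts with plain text rather than a bracket. A `[` at the start of a row would be read as the optional spacing argument of the preceding `\\` and break compilation.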

Editors and reviewers do not expect perfect diaries. They do expect honest, usable disclosure.

Generate the card before submission

Before you submit an NLP paper, generate an AI Usage Card at ai-cards.org.

Use it as a project record, an appendix item, or source text for your acknowledgments. For example, an NLP team can generate a card that lists ChatGPT for prose revision, Copilot for utility code, and an LLM-as-judge setup as experimental methodology documented elsewhere in the paper.

That card gives reviewers the map they need.

Generate your AI Usage Card at ai-cards.org, then copy the statement into your paper while the details still feel fresh.