Free service by the University of Göttingen


AI Disclosure for NLP Research Papers

How NLP researchers should document AI tool usage when the research subject is also the tool, with guidance from ACL, EMNLP, and major NLP venues.

The Double Role Problem in NLP Research

NLP researchers face a disclosure challenge that does not exist in most other fields. When a materials scientist uses ChatGPT to polish a paper about carbon nanotubes, the AI tool and the research subject are clearly separate things. When an NLP researcher uses GPT-4 to help write a paper that evaluates GPT-4, the boundary between tool and subject becomes blurred.

This is not a hypothetical edge case. It describes the daily reality of most researchers working on large language models. You might use a model to generate evaluation examples, to help debug your experiment code, to summarize related work, or to assist with writing, all while that same family of models is the thing you are studying. Getting the disclosure right in this situation requires careful thinking about what counts as a tool, what counts as a research contribution, and where the two overlap.

When AI Assistance Becomes a Research Contribution

Consider a researcher who is evaluating how well GPT-4 performs on medical question answering. During the project, they interact with the model in several ways.

They prompt GPT-4 with medical questions and record the answers. This is data collection, not tool usage. It is the experiment itself and belongs in the methods section as part of the experimental protocol.

They use ChatGPT to help write a Python script that calculates inter-annotator agreement. This is ordinary tool usage, no different from what a researcher in any other field might do when using AI to help with code. It belongs in an AI usage disclosure.
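
To make the scenario concrete, here is a minimal sketch of the kind of utility script an assistant might help write or debug: Cohen's kappa for two annotators. The function name and example labels are illustrative, not taken from any particular paper.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items with matching labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under chance, from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Two annotators judging five model answers as correct/incorrect.
a = ["correct", "correct", "incorrect", "correct", "incorrect"]
b = ["correct", "incorrect", "incorrect", "correct", "incorrect"]
print(round(cohens_kappa(a, b), 3))  # → 0.615
```

Whether the assistant wrote this script or only helped debug it, the disclosure is the same: it is engineering support, not an experimental interaction with the model under study.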

They use GPT-4 to generate paraphrases of their evaluation questions to test robustness. This sits in a gray area. The model's paraphrasing ability is part of what is being studied, so this is arguably both tool usage and experimental procedure.

They use ChatGPT to help draft the abstract and introduction. This is standard writing assistance and belongs in a disclosure statement.

The key distinction is between interactions that produce research data or findings and interactions that support the writing or engineering process. The first category is methodology. The second category is tool usage. Both need to be documented, but in different ways and in different parts of the paper.

ACL and EMNLP Policies

The Association for Computational Linguistics has been at the forefront of AI disclosure policies, which makes sense given that its community is most directly affected.

Since 2023, ACL venues have required authors to include a "Limitations" section in their papers. The Responsible NLP Checklist that accompanies submissions also asks about the use of AI assistants. As of the most recent policy updates, ACL requires that any use of AI writing assistants be disclosed, and that the authors take full responsibility for the content.

EMNLP, as an ACL venue, follows the same guidelines. The key requirements from both can be summarized as follows.

Disclose all AI writing assistance. If you used a language model to help draft, edit, or polish any part of your paper, say so. This includes using models for translation.

Do not list AI as an author. ACL has been clear that AI systems cannot be listed as co-authors. The humans on the author list must take responsibility for all content.

Distinguish tool use from experimental use. When you interact with a model as part of your experiments, that goes in your methodology. When you use it as a writing or coding tool, that goes in your disclosure.

Document your prompts when relevant. If your experimental results depend on specific prompts, include them in the paper or supplementary materials. This is about reproducibility, not disclosure.

Handling Evaluation Scenarios

Many NLP papers involve interacting with models in ways that are hard to categorize. Here are practical guidelines for common scenarios.

Prompt engineering for experiments. When you develop prompts to evaluate a model, the prompting process is part of your experimental methodology. Document it in the methods section with enough detail for replication. If you used another AI model to help design your prompts, disclose that separately.

Using models to generate training or test data. If you used a language model to create synthetic data for your experiments, this is part of your research pipeline and should be described in your methods. But if you also used the same model to help you write the script that processes that data, the second usage is tool assistance and belongs in a disclosure.
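
The split described above can be sketched in code. In this hypothetical layout, the generation step (calling the model under study) is documented in the methods section, while the cleanup utility is the sort of script an assistant might help write and therefore belongs in the tool-usage disclosure. Both function names are assumptions for illustration.

```python
# generate_paraphrases() would call the model under study and is part of
# the experimental pipeline, documented in the methods section (not shown).

def clean_synthetic_data(examples, min_len=10):
    """Deduplicate and length-filter model-generated examples.

    A utility script of this kind is tool-assisted engineering work,
    disclosed separately from the experimental model interactions.
    """
    seen, kept = set(), []
    for text in examples:
        normalized = " ".join(text.split()).lower()
        if len(normalized) >= min_len and normalized not in seen:
            seen.add(normalized)
            kept.append(text)
    return kept

raw = [
    "What is aspirin used for?",
    "what is  aspirin used for?",  # near-duplicate, dropped
    "Too short",                   # below min_len, dropped
]
print(clean_synthetic_data(raw))  # → ['What is aspirin used for?']
```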

Human-AI comparison studies. When your paper compares human performance against model performance, be especially clear about which model interactions were part of the study and which were incidental tool usage. Reviewers will scrutinize this.

Red-teaming and adversarial testing. Interacting with models to find failure modes is experimental work. Document it as methodology. But if you asked ChatGPT to help you brainstorm categories of adversarial attacks to try, that is tool usage worth disclosing.

Writing a Good Disclosure for an NLP Paper

Here is an example of a well-structured disclosure for an NLP research paper.

AI Tool Usage Statement. We used the following AI tools during the preparation of this paper, separate from the model interactions reported as part of our experimental methodology in Sections 4 and 5. ChatGPT (GPT-4, OpenAI, accessed October 2025 through January 2026) was used to assist with code debugging for our evaluation pipeline and to suggest improvements to the clarity of our writing in Sections 1 and 2. GitHub Copilot assisted with boilerplate code for our experiment infrastructure. All AI-generated text was substantially revised by the authors. The experimental interactions with GPT-4, LLaMA-3, and Claude 3.5, including all prompts and model outputs, are documented in our methodology (Section 4) and supplementary materials (Appendix A).

Notice how this statement explicitly separates experimental model interactions from tool usage. This is the most important thing to get right in an NLP paper.

Here is what a poor disclosure looks like for comparison.

We used AI tools in this work.

This tells reviewers nothing and will likely trigger questions during review. It does not distinguish between the model interactions that are the subject of the paper and any incidental tool usage.

The Relationship Between AI Usage Cards and Model Cards

NLP researchers are already familiar with model cards, the documentation format introduced by Mitchell et al. (2019) for describing machine learning models. AI Usage Cards serve a complementary but distinct purpose.

A model card describes what a model is and how it was built. An AI Usage Card describes how a model was used in a specific research project. In an NLP paper, you might reference the model card for GPT-4 when describing the system you evaluated, and include an AI Usage Card to document how you used various AI tools during the research process itself.

You can generate an AI Usage Card at ai-cards.org that clearly separates your experimental model interactions from your tool usage. This structured format can be particularly helpful for NLP papers where reviewers need to quickly understand which model interactions are part of the science and which are not. For more on this distinction, see our comparison of AI Usage Cards and Model Cards.

Practical Tips for NLP Researchers

Keep a log from the start. When you begin a new NLP project, create a simple document that tracks every interaction with AI tools that is not part of your experiments. Note the tool, the date, and the task. This makes writing your disclosure at submission time much easier than trying to reconstruct months of usage from memory.
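
A log like this can be as simple as an append-only CSV. The sketch below shows one possible shape; the field names and file format are assumptions, not a standard required by any venue.

```python
import csv
import datetime
import io

FIELDS = ["date", "tool", "version", "task", "sections_affected"]

def log_usage(stream, tool, version, task, sections=""):
    """Append one AI-tool interaction to an open CSV stream."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writerow({
        "date": datetime.date.today().isoformat(),
        "tool": tool,
        "version": version,
        "task": task,
        "sections_affected": sections,
    })

# In practice the stream would be a file opened in append mode;
# an in-memory buffer is used here so the sketch is self-contained.
buf = io.StringIO()
log_usage(buf, "ChatGPT", "GPT-4, Oct 2025", "debugged eval script", "code")
print(buf.getvalue().strip())
```

At submission time, the disclosure statement is then a matter of summarizing the log rather than reconstructing months of usage.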

Be specific about model versions. The NLP community cares about model versions more than most fields. "GPT-4" is not specific enough. Include the access date, the API version if applicable, and whether you used the chat interface or the API.
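
One way to keep version details precise is to record them in a small structured record and format the citation string from it. The field names and the placeholder API version below are illustrative assumptions, not a venue requirement.

```python
# Illustrative record; field names are assumptions, not a required schema.
model_record = {
    "model": "GPT-4",
    "provider": "OpenAI",
    "access_method": "API",               # or "chat interface"
    "api_version": "(placeholder)",       # fill in if applicable
    "accessed": "October 2025 to January 2026",
}

def cite(r):
    """Format a model record as a disclosure-style citation string."""
    return (f'{r["model"]} ({r["provider"]}, via {r["access_method"]}, '
            f'accessed {r["accessed"]})')

print(cite(model_record))
# → GPT-4 (OpenAI, via API, accessed October 2025 to January 2026)
```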

Separate your scripts. Keep experiment code and utility code in different directories or repositories. This makes it easier to say "Copilot assisted with our utility scripts but not with our core experimental code," and to back that claim up if asked.

Talk to your co-authors. In a multi-author NLP paper, different authors may have used different tools. Someone needs to aggregate the information. Decide at the start of the project who will maintain the AI usage log and how disclosures from all authors will be collected before submission.
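
Aggregating per-author logs can also be automated. The sketch below assumes each author keeps a simple list of (tool, task) entries; the data and structure are hypothetical.

```python
from collections import defaultdict

def aggregate(author_logs):
    """Merge per-author (tool, task) entries, grouped by tool."""
    by_tool = defaultdict(set)
    for author, entries in author_logs.items():
        for tool, task in entries:
            by_tool[tool].add(f"{task} ({author})")
    return {tool: sorted(tasks) for tool, tasks in by_tool.items()}

# Hypothetical logs from two co-authors.
logs = {
    "Author A": [("ChatGPT", "drafted abstract")],
    "Author B": [("ChatGPT", "debugged eval script"), ("Copilot", "boilerplate code")],
}
print(aggregate(logs))
```

The merged view makes it straightforward for whoever writes the disclosure to see every tool used across the author list.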

The NLP community has a special responsibility to lead by example on AI transparency. The models we study are the same models that other researchers are learning to use and document. Getting our own disclosures right sets the standard for everyone else.

Generate Your AI Usage Report

Create a standardized AI Usage Card for your research paper in minutes. Free and open source.

Create Your AI Usage Card