OpenAI Asks Contractors to Upload Real-World Work to Train Next-Gen AI Agents

OpenAI and Handshake AI are reportedly asking contractors to submit actual files from past jobs, including code and presentations, to train AI for office work

OpenAI CEO Sam Altman
Summary
  • OpenAI is collecting "real-world" work samples from contractors to train next-gen models

  • Contractors are asked to provide actual deliverables alongside the original request to capture workplace context

  • OpenAI provided a "Superstar Scrubbing" tool to help workers anonymise proprietary or private data

OpenAI has asked third-party contractors to upload real examples of work they produced in past or current jobs to help train and evaluate its next generation of AI models, a Wired report said, raising concerns about intellectual property and privacy.

According to the report, OpenAI and training-data partner Handshake AI sought “concrete outputs” rather than abstract task descriptions, asking contractors to submit actual files such as documents, spreadsheets, presentations, code repositories and images. Contributors were instructed to pair each example with the original request (for instance, an instruction from a manager) and the final deliverable, with a preference for complex, time-intensive work that reflects real white-collar tasks.


How OpenAI Tried to Limit Risk

OpenAI told contractors to remove or anonymise personally identifiable information, proprietary material and other confidential details before uploading files, and pointed them to a specialised ChatGPT-based “scrubbing” tool to help with that process.

“We’ve hired folks across occupations to help collect real-world tasks modeled off those you’ve done in your full-time jobs, so we can measure how well AI models perform on those tasks. Take existing pieces of long-term or complex work (hours or days+) that you’ve done in your occupation and turn each into a task,” OpenAI was quoted as saying in an internal document seen by Wired.

“Remove or anonymise any: personal information, proprietary or confidential data, material nonpublic information (e.g., internal strategy, unreleased product details),” it added.
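
Neither OpenAI nor Handshake AI has published the scrubbing tool, so as a rough illustration only, a pattern-based redaction pass of the kind the instructions describe might look like the Python sketch below. Every pattern, function name and placeholder here is an assumption for illustration, not the actual tool.

```python
import re

# Hypothetical sketch only: OpenAI's ChatGPT-based scrubbing tool is not
# public. This shows the general shape of pattern-based PII redaction.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scrub(text: str) -> str:
    """Replace every match of each known pattern with a labelled placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(scrub("Contact jane.doe@acme-corp.com or +1 (555) 123-4567."))
# -> Contact [EMAIL REDACTED] or [PHONE REDACTED].
```

Even this toy version makes the trade-off visible: it removes only what its patterns anticipate, which is exactly the limitation the experts quoted below point to.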

Legal & Ethical Concerns

However, legal experts cited in the report warned that the approach places heavy reliance on individual contractors’ judgement about what is confidential. They cautioned that automated or manual scrubbing may miss embedded or contextual information, particularly in code or structured data, leaving OpenAI exposed to legal and reputational risk.
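
To make that failure mode concrete, consider an invented example (all names and figures below are made up): none of these strings matches a standard PII format, so a pattern-based scrubber like the sketch above would return the file unchanged, even though the lines together reveal confidential context.

```python
import re

# Invented snippet: no email, phone number or ID format appears, yet an
# internal hostname, an unreleased codename and a nonpublic figure leak.
snippet = '''
DB_HOST = "payroll-db.internal.acme-corp.net"   # internal hostname
PROJECT = "project_bluejay_q3"                  # unreleased codename
margin = revenue * 0.173                        # nonpublic margin figure
'''

email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
us_ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Neither pattern fires, so a scrubber built on them leaves the text as-is.
assert email.search(snippet) is None
assert us_ssn.search(snippet) is None
```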

The episode highlights the growing demand across the AI industry for high-quality, task-specific training data that mirrors real workplace activity, as companies race to make models more useful for enterprise automation. But it also raises questions for contractors and their former employers, as sharing workplace outputs, even after anonymisation, could trigger confidentiality or IP disputes. Wired noted that it remains unclear what contractual safeguards, warranties or audit processes OpenAI and Handshake AI have in place to limit liability, underscoring broader uncertainty over accountability as AI firms increasingly rely on real-world work product to train models.

The effort reflects a broader trend within the generative AI boom, which has spawned a fast-growing ecosystem of third-party contracting firms, including Handshake AI, Surge, Mercor and Scale AI, that recruit and manage large pools of workers to generate high-quality, real-world training data. These firms play a key role in helping AI companies improve model performance on practical, white-collar tasks, even as the approach raises fresh questions around data protection, intellectual property and worker accountability.
