Slide 10 · the receipts moment
Am I in there?
Paste a URL or describe what you’ve published. This page tells you which of the major AI training corpora plausibly swept it — based on what each dataset publicly says it crawled, and when.
If you put it on the public web before 2024, the honest answer is: yes, probably, in several.
Read this first
- This is a heuristic built from publicly documented crawl windows. It is not a definitive check against any specific dataset or model.
- No data leaves your browser. Nothing is uploaded, logged, or sent to a server. The check runs entirely on this page.
- A “likely” result means a corpus’s crawl window overlaps your publish date and the corpus type matches. It does not mean any model has memorized your work.
- An “unlikely” result means the dates don’t line up. It does not guarantee absence — private datasets and licensing deals are out of scope here.
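The “likely” and “unlikely” rules above can be sketched in a few lines. This is a minimal illustration, not the page’s actual code: the `Corpus` and `Work` shapes, the `classify` function, and the `ExampleWebCrawl` entry with its dates are all hypothetical. It also assumes that “overlaps” means the work was publicly online at some point during the crawl window, so something published before the window opened and never taken down still counts.

```typescript
// Hypothetical types for illustration; not taken from any dataset's documentation.
type CorpusType = "web" | "code" | "books" | "academic";

interface Corpus {
  name: string;
  type: CorpusType;
  crawlStart: string; // ISO date (YYYY-MM-DD), inclusive
  crawlEnd: string;   // ISO date (YYYY-MM-DD), inclusive
}

interface Work {
  type: CorpusType;
  publishedOn: string; // ISO date (YYYY-MM-DD)
  removedOn?: string;  // ISO date, if the work was later taken down
}

type Verdict = "likely" | "unlikely";

// Assumption: a work "overlaps" a crawl window if it was online at any
// point inside it. ISO dates in YYYY-MM-DD form sort correctly as strings.
function classify(work: Work, corpus: Corpus): Verdict {
  const typeMatches = work.type === corpus.type;
  const publishedInTime = work.publishedOn <= corpus.crawlEnd;
  const stillOnline =
    work.removedOn === undefined || work.removedOn >= corpus.crawlStart;
  return typeMatches && publishedInTime && stillOnline ? "likely" : "unlikely";
}

// Made-up corpus entry; the name and dates are placeholders.
const exampleCorpus: Corpus = {
  name: "ExampleWebCrawl",
  type: "web",
  crawlStart: "2019-01-01",
  crawlEnd: "2023-06-30",
};

const blogPost: Work = { type: "web", publishedOn: "2021-05-01" };
console.log(exampleCorpus.name, classify(blogPost, exampleCorpus));
```

Both verdicts stay heuristic: “likely” says only that the dates and corpus type line up, and “unlikely” cannot rule out private datasets or licensing deals.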
The corpora — what they swept, when
Reference list. Hand-curated from each dataset’s public documentation. Follow the links for primary sources.
Opt-out and defense
None of these undo past inclusions. They affect only future training runs, future crawls, and future datasets. The internet doesn’t forget; you can only steer what comes next.