Slide 10 · the receipts moment

Am I in there?

Paste a URL or describe what you’ve published. This page tells you which of the major AI training corpora plausibly swept it — based on what each dataset publicly says it crawled, and when.

If you put it on the public web before 2024, the honest answer is: yes, probably, in several.

Read this first

  • This is a heuristic built from publicly documented crawl windows. It is not a definitive check against any specific dataset or model.
  • No data leaves your browser. Nothing is uploaded, logged, or sent to a server. The check runs entirely on this page.
  • A “likely” result means a corpus’s crawl window overlaps your publish date and the corpus type matches. It does not mean any model has memorized your work.
  • An “unlikely” result means the dates don’t line up. It does not guarantee absence — private datasets and licensing deals are out of scope here.
What did you publish?

We parse the domain and look for a date in the path. Nothing is fetched.

Approximate is fine. Use the year it became publicly reachable, not the year you wrote it.

The corpora — what they swept, when

Reference list. Hand-curated from each dataset’s public documentation. Click out for primary sources.

Opt-out and defense

None of these recall past inclusions. They affect future training runs, future crawls, future datasets. The internet doesn’t forget; you can only steer what comes next.