Slide 10 · the confrontation

Open Source Receipt.

How much of you is in the training data? Nobody can say for sure — not even the people who scraped it. So we play the question for shape, not for truth.

Answer a few yes/no and numeric questions about what you put on the open web. The widget multiplies through illustrative numbers and prints a paper-receipt-style estimate.
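For the curious, the "multiply through and print" step can be sketched in a few lines of Python. This is a minimal sketch, not the widget's actual code: the names `LineItem`, `estimate`, and `render_receipt` are hypothetical, and every `capture_rate` is an illustrative placeholder, not a measured value.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    label: str           # what the questionnaire asked about
    count: float         # the visitor's own answer (yes/no can be 1 or 0)
    capture_rate: float  # ASSUMPTION: illustrative share that reached a corpus


def estimate(items):
    """Multiply each answer by its illustrative rate and round to a whole number."""
    return [(item.label, round(item.count * item.capture_rate)) for item in items]


def render_receipt(items, width=36):
    """Lay the estimates out as fixed-width, paper-receipt-style text."""
    label_width = width - 8  # the last 8 columns hold the right-aligned quantity
    rows = estimate(items)
    lines = ["OPEN SOURCE RECEIPT".center(width), "-" * width]
    for label, qty in rows:
        lines.append(f"{label:<{label_width}}{qty:>8,}")
    lines.append("-" * width)
    total = sum(qty for _, qty in rows)
    lines.append(f"{'TOTAL (heuristic)':<{label_width}}{total:>8,}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up answers and made-up rates, for shape only.
    answers = [
        LineItem("Public photos", 2000, 0.25),
        LineItem("Forum posts", 120, 0.5),
    ]
    print(render_receipt(answers))
```

The total depends entirely on the rates you plug in, which is the reason to read the receipt for shape and not for truth.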

Heuristic, not truth. The talk’s “1,800 of mine” figure is real, but it is specific to one person who checked one dataset search tool against 145,000 public Flickr photos. Use this widget to feel the shape of the question. Do not use it to know the answer.

The questionnaire

Answer for the years before ~2022, roughly when most public corpora stopped being refreshed. Skip what doesn’t apply. Inputs save as you type.