Community
General Tabletop Discussion
*Geek Talk & Media
Pineapple Express: Someone Is Wrong on the Internet?
<blockquote data-quote="Cadence" data-source="post: 9834749" data-attributes="member: 6701124"><p>Just saw this about Amazon's Mechanical Turk:</p><p>[URL unfurl="true"]https://link.springer.com/article/10.3758/s13428-025-02852-7[/URL]</p><p></p><p>If you haven't heard of this MTurk before, a post on using it for surveys is at:</p><p>[URL unfurl="true"]https://blog.mturk.com/getting-started-with-surveys-on-mturk-e2eea524c73[/URL]</p><p></p><p>A Bluesky thread on the "Why you shouldn't..." article is at:</p><p>[URL unfurl="false"]https://bsky.app/profile/johnholbein1.bsky.social/post/3mbwu3cuwlk2i[/URL]</p><p></p><p>Looking at footnote 5, some folks seem to have figured out that hooking LLMs up to it isn't hard:</p><p>"The data for the four studies also includes responses to a question asking participants whether they are human and a question asking participants whether they are a large language model. The responses to these questions are not directly relevant to the present project, but it is notable that, in response to the question of whether they are a large language model, 37.00% of the participants in the open MTurk sample said “yes”; 1.75% said “very large”; 3.00% provided some definition of a large language model (e.g., “A type of artificial intelligence (AI) program that can recognize and generate text, among other tasks”); 27.25% said “English” (perhaps because they interpreted the question as asking them what language they spoke); and 0.25% said “Spanish” (see prior parenthetical)."</p><p></p><p>Asking ChatGPT (for what that's worth)...</p><p></p><p>[SPOILER="Warning! 
ChatGPT Answers to two Prompts behind spoilers."]</p><p><em><span style="font-size: 26px"><strong>Can you easily be hooked up to Mechanical Turk to provide it answers?</strong></span></em></p><p></p><p>Short answer: <strong>not easily, and not legitimately</strong>—at least not in the way Mechanical Turk (MTurk) is designed to work.</p><p></p><p>Here’s the breakdown.</p><p></p><p>[HR][/HR]</p><h2>1. How Mechanical Turk is designed</h2><p>MTurk is built around <strong>human workers</strong> (“Turkers”) completing Human Intelligence Tasks (HITs). Requesters expect:</p><p></p><ul> <li data-xf-list-type="ul">A <strong>real person</strong></li> <li data-xf-list-type="ul">With human judgment, perception, or lived experience</li> <li data-xf-list-type="ul">Who can be held accountable under MTurk’s terms</li> </ul><p>Using an AI to answer HITs <strong>while presenting it as a human</strong> generally violates MTurk’s Terms of Service.</p><p></p><p>[HR][/HR]</p><h2>2. Technical feasibility vs. permission</h2><h3>Technically</h3><p>Yes, someone <em>could</em>:</p><p></p><ul> <li data-xf-list-type="ul">Write a script that pulls HITs</li> <li data-xf-list-type="ul">Sends prompts to a language model (like me)</li> <li data-xf-list-type="ul">Submits the generated answers</li> </ul><p>This is <strong>technically straightforward</strong> for text-based HITs.</p><p></p><h3>Practically / legally</h3><p>But:</p><p></p><ul> <li data-xf-list-type="ul">MTurk has <strong>bot-detection, quality checks, and attention tests</strong></li> <li data-xf-list-type="ul">Many HITs are designed to catch automation</li> <li data-xf-list-type="ul">Accounts found automating work are commonly <strong>suspended or banned</strong></li> <li data-xf-list-type="ul">Requesters may reject work if AI-generated responses are detected</li> </ul><p>So while it’s <em>possible</em>, it’s <strong>fragile, risky, and often short-lived</strong>.</p><p></p><p>[HR][/HR]</p><h2>3. 
Ethical and research issues</h2><p>Many HITs exist to:</p><p></p><ul> <li data-xf-list-type="ul">Collect <strong>human opinions</strong></li> <li data-xf-list-type="ul">Measure <strong>human behavior</strong></li> <li data-xf-list-type="ul">Train or validate models</li> </ul><p>Using an LLM in place of a human can:</p><p></p><ul> <li data-xf-list-type="ul">Corrupt datasets</li> <li data-xf-list-type="ul">Bias research results</li> <li data-xf-list-type="ul">Undermine the platform’s purpose</li> </ul><p>That’s why most academic and commercial requesters explicitly forbid AI-generated responses.</p><p></p><p>[HR][/HR]</p><h2>4. Legitimate ways to use AI</h2><p>There <em>are</em> acceptable models:</p><p></p><h3>✔ Human-in-the-loop</h3> <ul> <li data-xf-list-type="ul">AI suggests drafts</li> <li data-xf-list-type="ul">A human reviews, edits, and takes responsibility</li> <li data-xf-list-type="ul">Requester explicitly allows assistance tools</li> </ul><h3>✔ As a requester tool</h3> <ul> <li data-xf-list-type="ul">Researchers use LLMs to:<ul> <li data-xf-list-type="ul">Design HITs</li> <li data-xf-list-type="ul">Pretest instructions</li> <li data-xf-list-type="ul">Analyze MTurk results afterward</li> </ul></li> </ul><h3>✔ Separate “AI worker” platforms</h3><p>Some platforms explicitly allow or are built for:</p><p></p><ul> <li data-xf-list-type="ul">Synthetic respondents</li> <li data-xf-list-type="ul">Simulated users</li> <li data-xf-list-type="ul">AI evaluations</li> </ul><p>MTurk is <strong>not</strong> one of them.</p><p></p><p>[HR][/HR]</p><h2>5. 
Bottom line</h2> <ul> <li data-xf-list-type="ul"><strong>Technically:</strong> Yes, it’s easy to connect an LLM to MTurk</li> <li data-xf-list-type="ul"><strong>Contractually:</strong> Usually not allowed</li> <li data-xf-list-type="ul"><strong>Practically:</strong> Likely to get caught over time</li> <li data-xf-list-type="ul"><strong>Ethically:</strong> Often inappropriate unless explicitly disclosed</li> </ul><p>If you’re asking out of curiosity or for research design, I can help explore <strong>proper alternatives</strong> (e.g., synthetic panels, agent-based simulations, or human-AI hybrid workflows).</p><p></p><p></p><p><em><strong><span style="font-size: 26px">Which is the hardest to hook you up to, MTurk, Prolific, or Connect?</span></strong></em></p><p></p><p><strong>Short answer:</strong> <strong>Prolific</strong> is the hardest to hook me up to, <strong>CloudResearch Connect</strong> is next, and <strong>MTurk</strong> is the easiest (though still not safe or allowed).</p><p></p><p>Here’s why, from <em>hardest → easiest</em>.</p><p></p><p>[HR][/HR]</p><h2>1. 
Prolific — hardest</h2><p>Prolific is explicitly designed to <strong>exclude bots and AI respondents</strong>.</p><p></p><p><strong>Why it’s hard:</strong></p><p></p><ul> <li data-xf-list-type="ul"><strong>Strong identity verification</strong> (phone, email, payment, demographics consistency)</li> <li data-xf-list-type="ul"><strong>Longitudinal profiling</strong>: your answers are checked across studies over time</li> <li data-xf-list-type="ul"><strong>High-quality attention and consistency checks</strong></li> <li data-xf-list-type="ul"><strong>Researcher expectations</strong>: Prolific is heavily used for academic research, where <em>human-only</em> data is critical</li> <li data-xf-list-type="ul"><strong>Explicit policy</strong> against automated or AI-generated participation</li> </ul><p><strong>Bottom line:</strong></p><p>Even if you technically automated submissions, <strong>statistical inconsistency and profiling would expose it quickly</strong>. Prolific is built to <em>reject synthetic respondents</em>.</p><p></p><p>[HR][/HR]</p><h2>2. 
CloudResearch Connect — medium-hard</h2><p>Connect sits between Prolific and MTurk.</p><p></p><p><strong>Why it’s harder than MTurk:</strong></p><p></p><ul> <li data-xf-list-type="ul">Uses <strong>research-grade participants</strong></li> <li data-xf-list-type="ul">Stronger <strong>quality scoring and reputation systems</strong></li> <li data-xf-list-type="ul">Increasing use of <strong>AI-detection and response-pattern analysis</strong></li> <li data-xf-list-type="ul">More studies with <strong>open-ended reasoning and manipulation checks</strong></li> </ul><p><strong>Why it’s still easier than Prolific:</strong></p><p></p><ul> <li data-xf-list-type="ul">Less extensive longitudinal profiling</li> <li data-xf-list-type="ul">Smaller ecosystem → fewer cross-study consistency signals</li> </ul><p><strong>Bottom line:</strong></p><p>Automation is <strong>possible but unstable</strong>, and quality checks would likely flag AI-like behavior over time.</p><p></p><p>[HR][/HR]</p><h2>3. MTurk — easiest (but still not legitimate)</h2><p>MTurk was not originally designed with modern AI in mind.</p><p></p><p><strong>Why it’s comparatively easy:</strong></p><p></p><ul> <li data-xf-list-type="ul">Older infrastructure</li> <li data-xf-list-type="ul">Many low-effort, short, text-based HITs</li> <li data-xf-list-type="ul">Less centralized participant profiling</li> <li data-xf-list-type="ul">Some requesters still use weak validation</li> </ul><p><strong>Why it still fails long-term:</strong></p><p></p><ul> <li data-xf-list-type="ul">Bot-detection scripts</li> <li data-xf-list-type="ul">Honey-pot questions</li> <li data-xf-list-type="ul">Requester-side rejection patterns</li> <li data-xf-list-type="ul">Account bans are common once automation is suspected</li> </ul><p><strong>Bottom line:</strong></p><p>It’s the <strong>easiest to automate</strong>, but also the <strong>most polluted and surveilled</strong>, making it unreliable and risky.</p><p></p><p>[HR][/HR]</p><h2>Overall ranking 
(hardest → easiest)</h2> <ol> <li data-xf-list-type="ol"><strong>Prolific</strong> <img class="smilie smilie--emoji" alt="🥇" src="https://cdn.jsdelivr.net/joypixels/assets/8.0/png/unicode/64/1f947.png" title="1st place medal :first_place:" data-shortname=":first_place:" loading="lazy" width="64" height="64" /></li> <li data-xf-list-type="ol"><strong>CloudResearch Connect</strong> <img class="smilie smilie--emoji" alt="🥈" src="https://cdn.jsdelivr.net/joypixels/assets/8.0/png/unicode/64/1f948.png" title="2nd place medal :second_place:" data-shortname=":second_place:" loading="lazy" width="64" height="64" /></li> <li data-xf-list-type="ol"><strong>MTurk</strong> <img class="smilie smilie--emoji" alt="🥉" src="https://cdn.jsdelivr.net/joypixels/assets/8.0/png/unicode/64/1f949.png" title="3rd place medal :third_place:" data-shortname=":third_place:" loading="lazy" width="64" height="64" /></li> </ol><p>[HR][/HR]</p><h2>Important clarification</h2><p>In <strong>all three cases</strong>, using me (or any LLM) <em>as a participant while claiming to be human</em>:</p><p></p><ul> <li data-xf-list-type="ul">Violates platform rules</li> <li data-xf-list-type="ul">Undermines research validity</li> <li data-xf-list-type="ul">Risks permanent bans</li> </ul><p>If your interest is <strong>methodological</strong> (e.g., <em>“Could AI replace human panels?”</em>), that’s a legitimate research question—and I can help you design <strong>proper synthetic-subject experiments</strong> that don’t abuse these platforms.</p><p></p><p>Just say the word.</p><p>[/SPOILER]</p></blockquote><p></p>