Manual to Using

Amazon Mechanical Turk

Shaney Flores, May 2016

Table of Contents

  I. The Basics
  II. Requester Site
  III. Production Site
  IV. Requester/Production Sandbox Site
  V. Building a Study for MTurk
  VI. Command Line Tools
      mturk.properties
      .input file
      .question file
      .properties file
      run.sh
      getResults.sh
      approveWork.sh
      reviewResults.sh
      extendHITs.sh
  VII. Communicating with Workers
  VIII. Custom Made MTurk Scripts
      assignQuals.py
      grantBonus.py

I. The Basics

Amazon Mechanical Turk (MTurk) is a service offered by Amazon where Amazon users can complete small tasks for monetary rewards. Originally developed to assist Amazon in recruiting humans to identify duplicate product pages on its retail site, it has quickly become a new method for behavioral researchers to run quick, large sample, cost-effective online studies with roughly similar validity to running a study in the lab.

The tasks posted on MTurk are called human intelligence tasks (HITs). The community of users on MTurk who can accept and complete these HITs are called workers. The community of users who post HITs are called requesters. When posting HITs, requesters can create multiple copies of the same HIT (i.e., make it so that several workers can complete the same HIT). These multiple copies are called assignments. For example, if I have a task where people watch and segment a single movie, I can post that task as a single HIT and then add 29 more assignments to it so that 30 total workers can complete the same task for me.

Because of the distinction between workers and requesters, Amazon has built a separate site for each group. The site where workers can accept, complete, and submit HITs is at . The site where requesters can build and post HITs and review submitted work is at . Both of these sites require an Amazon account. The account for the DCL is . The lab manager can provide the password for the account.

HITs on MTurk typically pay small amounts ($1 to $5). As more researchers have started to use the service, the amount workers expect to be paid has gone up as well. It is recommended that we pay an amount competitive with other researchers using MTurk. The more you pay for a HIT, the faster it will be completed. However, MTurk does charge you for posting HITs. Currently, there is a 20% administration fee for posting a HIT and an additional 20% service fee if you post 10 or more assignments of a HIT. You will need to keep track of both how much you are paying workers and how much you will pay Amazon.
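To budget a study, it can help to work the fees out before posting. The sketch below is illustrative only: the two 20% rates come from the paragraph above, while the function name and the `bulk_threshold` parameter (the number of assignments at which the additional fee starts) are assumptions made for this example. Set the threshold and rates to match Amazon's current fee schedule before relying on the numbers.

```python
from decimal import Decimal


def hit_cost(reward, assignments, base_fee=Decimal("0.20"),
             bulk_fee=Decimal("0.20"), bulk_threshold=10):
    """Return (worker_pay, amazon_fee, total) in dollars for one HIT.

    bulk_threshold is an assumed default; check Amazon's pricing page.
    """
    reward = Decimal(str(reward))
    worker_pay = reward * assignments
    fee_rate = base_fee
    # The additional fee applies only once the HIT has enough assignments.
    if assignments >= bulk_threshold:
        fee_rate += bulk_fee
    amazon_fee = (worker_pay * fee_rate).quantize(Decimal("0.01"))
    return worker_pay, amazon_fee, worker_pay + amazon_fee


# The movie-segmentation example: one HIT, 30 assignments, $2.00 each.
worker_pay, amazon_fee, total = hit_cost("2.00", 30)
print(f"Workers: ${worker_pay}, Amazon: ${amazon_fee}, total: ${total}")
```

With these inputs the workers receive $60.00 in total and Amazon's fees add $24.00, so the account needs at least $84.00 available before the HIT can be posted.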

Experiments conducted on MTurk can be considered low-risk. Therefore, whenever you go through the Human Research Protection Office to get approval for an MTurk study, it is advised that you request a waiver of documented consent. Additionally, unless you are collecting protected health information (PHI) or other identifiable information from workers, you may also wish to classify the project as exempt to avoid submitting a continuing review every year.

Further reading providing an overview of the MTurk system can be found in:

Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 44(1), 1-23.

You may also want to follow Amazon’s blog for recent developments to the MTurk platform at this site.

II. Requester Site

The lab is considered a requester on MTurk because we post online studies as HITs. Therefore, the site you will be using most is the requester site. To access this site, simply go to the requester site URL shown above (in “The Basics” section) and log in.

Once you have logged in, there are multiple actions you can take. If you need to add money to the account, click on the “My Account” link in the upper right-hand corner (as seen in Figure 1). You will be directed to a page where you can add money to the account, view the current balance, change the username and password, and view a transaction history for money paid from our account to workers or MTurk.

Figure 1. “My Account” page of MTurk requester site. Here you can manage money paid for HITs, view current balance and transaction history, and change user settings.

Whenever a HIT is posted, the administrative/service fees and amount paid to workers for completing that HIT will be held for liability and will be deducted from your available balance. This is to prevent you from posting more HITs than you can afford. This money will not be paid out until the requester approves the HITs.

Another action you can take is to create HITs through MTurk’s web interface. This is mainly reserved for creating “compensation HITs,” which are HITs posted for workers who experienced a technical or other issue that prevented them from actually completing one of our studies. To post this kind of HIT, click on the “Create” tab on the requester site page (as seen in Figure 2). Then either click on “Copy” to create a new HIT or click on “Publish Batch” and enter the number of HITs you would like to create. You will then need to provide the worker with the URL to access the compensation HIT. The worker simply accepts the HIT and submits it.

Once it is submitted, the requester can then approve the HIT and pay a bonus equal to the amount the worker would’ve received for the HIT. When setting up this kind of HIT, always set the amount to be paid out to the worker as $0. This will help hide the HIT from other workers or “bots” that may wish to take advantage of you not thoroughly monitoring this HIT. Additionally, it allows us the flexibility to compensate a worker whatever amount we feel is appropriate (typically how much they would’ve received had they been able to finish the HIT).

Compensating workers in this manner offers two benefits: (1) it improves the worker’s HIT acceptance rate, making them more desirable for other requesters to recruit, and (2) it pays them for the work they would have submitted had there not been an issue. Both of these benefits favor the worker, and in the MTurk community, happy workers are more likely to recommend your HITs to other workers.

Figure 2. If you want to create a new compensation HIT, you can click “Copy” for a previously created HIT and then change all the necessary fields. If you wish to reuse an old HIT, you can simply click on “Publish Batch” and post a new batch of HITs. However, once a HIT is created, you cannot change its properties or fields, so be sure you check what those are before posting a new batch for a previously created HIT.

To manage currently running HITs (i.e., approve or reject submissions, expire HITs, add additional assignments to a HIT, etc.), you can navigate to the “Manage” section (shown in Figure 3). Here, each HIT created through the MTurk web interface is listed in ascending order by creation date.

Figure 3. The Manage HITs page. Clicking on the name of the HIT will take you to a web-management page for the HIT where you can view properties of the HIT and download results from the HIT. It is recommended that you do not use this page to manage your currently running HITs.

By clicking on the name of the HIT, you will enter a simplified web-management properties page for that specific HIT (as seen in Figure 4) where you can view properties associated with the HIT, view costs allocated to the HIT, and download worker submissions. The number of actions you can actually take on this page is somewhat limited; therefore, it is more useful to manage HITs individually, where you have more options (more on this further below).

In the “Manage” tab, you will also be able to access a webpage that lists the worker id number for every worker that has completed a HIT for our group (seen in Figure 4). Workers on this page are sorted in alphanumeric order by their worker id number, starting from the second character (every worker id is prefixed with the letter A as the first character). Clicking on someone’s worker id will take you to another page where you can assign/revoke qualifications for the worker, pay the worker a bonus, or block the worker from accepting any future HITs (it is highly encouraged that you do not block a worker unless it is absolutely necessary, as blocking a worker can have negative repercussions!).

Figure 4. A listing of worker ids for workers who have previously completed HITs for us.

For HITs that were not created through the MTurk web interface (i.e., HITs created through the command line tools), you will need to click on the “Manage HITs individually” link in the right corner of the Manage>Results page. This page will also list any HITs created through the MTurk web interface. It is on this page that you will have the greatest flexibility and widest array of actions for managing your currently running HITs. You should use this page.

Figure 5. This page allows you to individually manage each HIT posted onto MTurk from the CLTs or the web interface. You will want to be careful when performing actions on this page as most actions performed on MTurk cannot be cancelled after they’ve been performed.

Once you enter the “Manage HITs individually” page (as seen in Figure 5), you will be able to perform any of the functions necessary to manage your currently running HITs, whether they were created through the web interface or the command line. Such actions include adding more assignments to a HIT, extending the deadline for a HIT to be completed or expiring it altogether, downloading submissions for each HIT, and approving/rejecting individual submissions for a HIT. It is typically recommended that you use this page to manage all HITs that are posted on MTurk. Clicking on any of the available actions (e.g., download results, add time) will redirect you to a web-management page for the HIT (shown in Figure 6).

Figure 6. Web-management page for a HIT managed individually. This page should be used to manage your currently running HITs as it provides the greatest ease and flexibility with changing the number of assignments associated with the HIT, approving/rejecting specific submissions, or extending or expiring the time limit for the HIT to be posted on MTurk.

Further information on how to use the requester site can be found here in the API manual. Amazon also provides a basic tour of the site here.

III. Production Site

The production site is the official location where workers can go and accept HITs to work on (shown in Figure 7). Whenever anything is posted onto the production site, it is considered a paid service by Amazon and will be subject to all fees. Additionally, the system requires that every submission from a worker be approved or rejected so that payment can be doled out or withheld. The system is designed so that if a requester does not approve a submission within a specified period of time (usually determined by the requester prior to posting the HIT), Amazon will automatically approve the submission and pay the worker.

Figure 7. The MTurk Production Site. Only HITs that are ready to go live and public should ever be posted to this site. If you plan to do any tests, you should use the developer sandbox and worker sandbox sites instead.

The only thing that should go on the production site is the finished product we actually want to use; anything else should go to the developer sandbox (described in the next section).

IV. Requester/Production Sandbox

Before you attempt to post any HITs to the production site, it is best practice to test the HIT first to ensure the HIT is working properly with Amazon and to ensure that all data are received. Amazon created a few sandbox sites for you to help with this. A sandbox is simply any environment that is self-contained and typically used for development. A sandbox site exists for both the requester and production sites (the hyperlinks to these sites are provided below).

Think of the sandbox as an exact copy of the production and requester sites with one major difference: nothing you do in the sandbox impacts your reputation, finances, or posting history in the MTurk community. In the sandbox, no money is paid to either the worker or Amazon when HITs are posted, completed, or approved. The sandbox is only for development and testing of HITs and is free of charge.

You can post HITs through the sandbox requester site (or command line tools described below) to the sandbox production site. Every means of communication between the two sandbox sites is exactly the same as the actual requester and production site.

However, you should NEVER use the sandbox production site when you are ready to start collecting data. Because this site does not pay out any actual money and is entirely independent of the production site, no actual worker will complete a HIT that you post to the sandbox.

The requester sandbox can be accessed at this site. The production sandbox can be accessed at this site.

V. Building a Study for MTurk

Although MTurk has the ability to build HITs through the requester site, it does not provide enough space or capabilities to run the kinds of studies we are used to running in the lab. Because of this, it is preferable to host our studies locally on a college web server. MTurk allows you to do this through a feature called external submit, where workers are redirected from the MTurk production site to an external site to complete the study. The external site can then send the information back to MTurk to notify Amazon that the worker completed the HIT and that the requester should review the submission.
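When Amazon sends a worker to an external site, it appends a few query parameters to the study URL; Amazon's external question documentation describes `assignmentId`, `hitId`, `workerId`, and `turkSubmitTo`, with `assignmentId` set to `ASSIGNMENT_ID_NOT_AVAILABLE` while the worker is only previewing the HIT. The following is a minimal sketch of reading those values, written in Python for illustration. The study URL, the ids, and the function name are hypothetical, and the lab's own pages may handle this differently (for example in JavaScript or PHP).

```python
from urllib.parse import urlparse, parse_qs

# Value Amazon uses for assignmentId while a worker is previewing a HIT.
PREVIEW_ID = "ASSIGNMENT_ID_NOT_AVAILABLE"


def read_mturk_params(url):
    """Extract the MTurk query parameters from the URL a worker arrived with."""
    query = parse_qs(urlparse(url).query)
    names = ("assignmentId", "hitId", "workerId", "turkSubmitTo")
    params = {name: query.get(name, [None])[0] for name in names}
    # A missing or placeholder assignmentId means the HIT is not accepted yet.
    params["is_preview"] = params["assignmentId"] in (None, PREVIEW_ID)
    return params


# Hypothetical study URL and ids, for illustration only.
url = ("https://example.edu/study/index.html?assignmentId=123ABC&hitId=456DEF"
       "&workerId=A1EXAMPLE&turkSubmitTo=https%3A%2F%2Fwww.mturk.com")
params = read_mturk_params(url)
print(params["workerId"], params["is_preview"])
```

Checking `is_preview` before starting the task keeps workers from running the study before they have accepted the HIT, in which case their data could not be submitted.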

To access the college web server, you will first need to download an FTP client program (e.g., Cyberduck or FileZilla). An FTP client is simply a program that transfers files between your local machine and a remote server. You will need this to transfer the web code for your study from your local machine to the server. Additionally, you will want some kind of text editor to help you build your code. The recommended program is TextWrangler (a text editor for the Mac that recognizes the syntax of many different programming languages and highlights and organizes your code accordingly). Apple’s TextEdit or Emacs in the Unix environment would also be fine.

To access the web server, you will need to set up an FTP connection to the following address on port 21: web.artsci.wustl.edu. The username is dcl. The lab manager can provide the password for the account. Once you have connected to the web server, you will be placed in the dcl home directory. There should be a folder called “public_html”. This folder contains all the web code that is publicly accessible over the Internet. You should create a new directory for your experiment on the web server (inside “public_html”, so that it can be reached from the Internet) and upload any web code you have into that directory.
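If you would rather script the upload than use a graphical FTP client, Python's standard `ftplib` module can do the same job. The sketch below is one possible approach, not the lab's established workflow. The host, port, and username are the ones given above; the function names, the assumption that the experiment folder lives directly under `public_html`, and the choice to upload only the files at the top level of the local folder are all made up for this example.

```python
import os
from ftplib import FTP, error_perm


def upload_directory(ftp, local_dir, remote_dir):
    """Upload every regular file in local_dir to remote_dir; return their names."""
    try:
        ftp.mkd(remote_dir)
    except error_perm:
        pass  # Most likely the directory already exists.
    ftp.cwd(remote_dir)
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if os.path.isfile(path):  # Subfolders are skipped in this sketch.
            with open(path, "rb") as handle:
                ftp.storbinary(f"STOR {name}", handle)
            uploaded.append(name)
    return uploaded


def connect_and_upload(password, local_dir, experiment_name):
    """Connect to the lab web server and upload one experiment folder."""
    ftp = FTP()
    ftp.connect("web.artsci.wustl.edu", 21)
    ftp.login("dcl", password)
    try:
        # Assumed layout: each experiment gets its own folder in public_html.
        return upload_directory(ftp, local_dir, f"public_html/{experiment_name}")
    finally:
        ftp.quit()
```

Calling `connect_and_upload(password, "my_study", "my_study")` would copy the files in a local `my_study` folder to `public_html/my_study` on the server. Avoid saving the password in the script itself.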

For our experiments, we use several different web languages and techniques (depending on the task). These typically include HTML, JavaScript, PHP, AJAX, and CSS. Tutorials for each of these can be found at w3schools.com or codecademy.com.

The final submission page for your HIT should contain the following code:

You should have this code execute when the worker clicks on the final submit button for the HIT. The purpose of this code is to send all collected data through an external page controlled by Amazon to our requester account for review and approval. If this code is not included, then the data from the worker will not be sent, and it will be as if the worker never completed the HIT. It is very important to have this code.
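For orientation, here is a rough sketch of what a submission to Amazon generally involves; it is not the lab's actual submission code. Per Amazon's external question documentation, the final page posts a form to the `/mturk/externalSubmit` path on the host passed in the `turkSubmitTo` parameter, and the form must carry the `assignmentId` plus at least one other field. The Python function below only builds that form as HTML text. The function name, field names, and example values are hypothetical.

```python
from html import escape


def render_submit_form(turk_submit_to, assignment_id, data):
    """Return an HTML form that posts the collected data back to MTurk."""
    action = turk_submit_to.rstrip("/") + "/mturk/externalSubmit"
    # assignmentId must be present alongside the study's own data fields.
    fields = {"assignmentId": assignment_id, **data}
    inputs = "\n".join(
        f'  <input type="hidden" name="{escape(str(key))}" '
        f'value="{escape(str(value))}">'
        for key, value in fields.items()
    )
    return (
        f'<form method="POST" action="{escape(action)}">\n'
        f"{inputs}\n"
        '  <input type="submit" value="Submit HIT">\n'
        "</form>"
    )


# Example values only; use the turkSubmitTo and assignmentId Amazon provided.
print(render_submit_form("https://www.mturk.com", "123ABC",
                         {"response": "yes", "rt_ms": 512}))
```

Taking the host from `turkSubmitTo` rather than hard-coding it lets the same page work on both the sandbox and the production site.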

Another way to collect data online without having to learn or use the web code described above is to use Qualtrics. Qualtrics is useful for running Likert scale surveys or simple tasks in a web browser. The university has a license accessible to all university members. Inquire at the department chair’s office to learn how to gain access to this service.

The methods available through MTurk to prevent repeat workers (i.e., workers who complete more than one HIT for us) are extremely limited. Therefore, it is best to handle this process on our end. The best method we have discovered so far is two-fold. First, at the very end of your experiment (on the final submit page), you should write a function that prints the worker’s id number to a text file stored on the web server. Accessing the worker id is not very hard as Amazon provides this information when workers access our external site. Once the worker id number is recorded, you can prevent them from accessing your study again by including another page at the beginning of the study that opens the text file, searches for the worker’s id number, and, if located, prevents them from going any further. If they are not in the text file, then the worker can proceed to the study. Creating such a process does not take much beyond some simple JavaScript, PHP and AJAX code.
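The logic of this two-step check can be sketched as follows. The example is written in Python purely to show the idea; on the web server the same steps would be implemented in PHP and JavaScript as described above. The function names and the one-id-per-line file format are assumptions made for this sketch.

```python
import os


def has_participated(worker_id, log_path):
    """Return True if worker_id already appears in the log file."""
    if not os.path.exists(log_path):
        return False  # No one has finished the study yet.
    with open(log_path, encoding="utf-8") as log:
        return any(line.strip() == worker_id for line in log)


def record_worker(worker_id, log_path):
    """Append worker_id to the log; call this on the final submit page."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(worker_id + "\n")
```

The first page of the study would call `has_participated` with the worker id Amazon supplied and stop the worker if it returns `True`; the final page would call `record_worker` just before submitting. Comparing whole lines rather than searching for a substring avoids blocking a worker whose id happens to be contained in a longer one.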