ATHK1001 Analytical Thinking

Subject Code: ATHK1001
Country: AU
University: The University of Sydney

Question:

Background and Aims

Data can have characteristics that surprise us. Hill (1998) describes one such surprise: across numbers representing many types of data, the first digit has a systematic and predictable distribution. In particular, it follows a logarithmic distribution such that the first digit is 1 about 30% of the time and 2 about 18% of the time, down to about 5% for the digit 9. This was first shown by the engineer Frank Benford, so this first-digit distribution is known as Benford’s law. Since he demonstrated it, hundreds of papers have been published showing evidence that Benford’s law applies to many sorts of data (though not all).
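The percentages quoted above follow from the standard statement of Benford’s law, under which the probability of first digit d is log10(1 + 1/d). The formula is not spelled out in this handout, so the short sketch below is offered only as an illustration of where the 30%, 18% and 5% figures come from; the function name is hypothetical.

```python
import math

def benford_probability(digit: int) -> float:
    """Expected proportion of numbers whose first digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

# Expected proportion for each possible first digit
benford = {d: benford_probability(d) for d in range(1, 10)}
for d, p in benford.items():
    print(f"{d}: {p:.1%}")
```

The nine proportions sum to 1, and the values for digits 1, 2 and 9 round to the figures given in the paragraph above.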

Despite all this research into Benford’s law, there is very little research into whether it has any impact on human behaviour, even though an implication of all this evidence for Benford’s law is that people are constantly exposed to it. In this assignment we will test one idea about how it might affect human behaviour, that it might influence our memories even if we are not consciously aware that low first digits are more common.

Things that are familiar tend to be easier to remember, so if low first digits are more familiar to us (because we are exposed to them more often), perhaps we will have better memory for numbers starting with low first digits. To test this, we will set up an experiment on our ability to recognize numbers we have seen before, and examine whether we have better memory for numbers starting with low first digits.

Method

Participants

A total of 327 students from the Analytic Thinking course participated as part of a class experiment. Additional students participated but either did not complete the experiment or did not consent to having their data analysed. Of the 327 participants, 189 were female and 138 were male, and their mean age was 19.4 years.

Materials

Sets of nine questions and their answers were generated for nine different domains. For example, for the domain “areas of countries” the nine questions were:

1. The Total Area (square kilometres) of Greece ANSWER: 131940
2. The Total Area (square kilometres) of Gabon ANSWER: 267667
3. The Total Area (square kilometres) of Poland ANSWER: 312685
4. The Total Area (square kilometres) of Cameroon ANSWER: 475440
5. The Total Area (square kilometres) of Madagascar ANSWER: 587040
6. The Total Area (square kilometres) of South Sudan ANSWER: 619745
7. The Total Area (square kilometres) of Chile ANSWER: 756950
8. The Total Area (square kilometres) of Pakistan ANSWER: 803940
9. The Total Area (square kilometres) of Tanzania ANSWER: 945087

Domains used were areas of countries, electricity consumption, homicide rates, GND, road deaths, number of protestants, populations, birth rates, and square roots. These domains were selected because there is evidence that Benford’s law applies to them and we doubt participants came into the experiment knowing the answers to questions in these domains. Each answer in a domain started with a different first digit, and the order of questions was randomized for each participant.

Procedure

Participants completed the experiment individually on computers during tutorials for the class Analytic Thinking at the University of Sydney. The experiment consisted of four steps.

In Step 1 participants were given nine questions with their correct numerical answers and given 15 seconds to study each one. All nine questions came from the same, randomly chosen, domain, and each answer started with a different first digit (1-9). This step set up the recognition test in Step 3.

In Step 2 participants were presented with four or five questions from a different domain and again given 15 seconds to study each. This step set up the recall test in Step 4.

In Step 3 participants received the same nine questions as they received in Step 1, but the answers were either the same as they saw in Step 1 or different. Participants were instructed to:

“Please try to recall the answer to the following question. If you recognize the answer as correct then click ‘yes’, if you do not recognize it then click ‘no’. If don't know the answer, then give your best guess.”

We varied whether the answers were the same or different to ones given in Step 1. Randomly, either the answers starting with odd first digits were the same, or those with even first digits were the same. Incorrect answers had the same magnitude as the correct answers, but the first digit was increased by 2 (note that 9 became 1, and 8 became 2) and subsequent digits were randomly changed.
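The construction of an incorrect answer described above can be sketched as follows. This is only an illustration: the function name is hypothetical, the first-digit mapping for 8 and 9 is copied literally from the note above, and "randomly changed" is assumed here to mean that every later digit is replaced by a different random digit, which the description does not actually specify.

```python
import random

def make_foil(answer: int, rng: random.Random) -> int:
    """Build an incorrect answer with the same number of digits as `answer`."""
    digits = [int(ch) for ch in str(answer)]
    first = digits[0]
    # First digit is increased by 2; the text states that 9 became 1 and 8 became 2.
    new_first = {8: 2, 9: 1}.get(first, first + 2)
    # ASSUMPTION: each subsequent digit is swapped for a different random digit.
    new_rest = [rng.choice([d for d in range(10) if d != old]) for old in digits[1:]]
    return int("".join(str(d) for d in [new_first] + new_rest))

rng = random.Random(2020)  # fixed seed so the illustration is repeatable
foil = make_foil(131940, rng)
print(foil)  # a six-digit number starting with 3
```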

Participants were also asked how confident they were that each of their answers was correct. They were asked “How confident are you that your response is correct?” and answered with one of five options: “Not at all”, “Slightly”, “Moderately”, “A lot”, or “Extremely” arranged along a line.

In Step 4 participants were asked to recall the answers to all nine of the questions used in the domain given in Step 2. Data from this task is not included as part of this assignment, so it will not be further described.

Hypotheses

We proposed four hypotheses. First, we have to verify that participants showed some evidence of having correct memories of the items they had to recognize. This predicts that people’s performance on items that test recognition memory will be above chance, thus

Hypothesis 1: Participants’ total recognition scores will be higher than chance.

Hill (1998) described how the numbers we encounter do not have an even distribution of first digits. One way to test whether first digits of answers made a difference to memory performance would be to compare the two recognition memory conditions, thus

Hypothesis 2: Participants’ total scores will be different in the odd condition than the even condition.

If participants have better memory for numbers with low first digits, then they should have better recognition memory performance for the items with low first digits, thus

Hypothesis 3: Participants’ low totals will be different from their high totals.

If participants are aware of which items they remember then there should be an association between memory performance and confidence, thus

Hypothesis 4: Participants’ total scores will be positively correlated with their confidence.

Results

This assignment can be found there as well as an Excel file called “Assignment1_dataset.xls”. This Excel file contains all the data for the assignment and has 327 data lines, one for each participant. Each participant has values for 19 variables.

The first variable is an id number. The second is the variable “condition”, which has the value “1” if the answers with even-numbered first digits were the correct ones in the recognition questions and the value “2” if the answers with odd-numbered first digits were correct.

The variables “correct1” to “correct9” represent whether the participant gave the correct response to each of the nine recognition questions starting with the first digits 1 to 9. If the participant either responded “yes” when the answer was exactly the same as the correct one (i.e., the one they initially saw) or responded “no” when the answer was not the one they saw previously, then they scored “1”. Otherwise they scored “0”.
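The scoring rule for the “correct” variables amounts to checking whether the participant’s yes/no response matches whether the answer shown was the original one. A minimal sketch, with a hypothetical function name and argument names that do not appear in the dataset:

```python
def score_response(said_yes: bool, answer_was_same: bool) -> int:
    """Return 1 for a correct recognition response, otherwise 0."""
    # Correct means "yes" to an unchanged answer or "no" to a changed one.
    return 1 if said_yes == answer_was_same else 0

print(score_response(said_yes=True, answer_was_same=True))    # 1
print(score_response(said_yes=True, answer_was_same=False))   # 0
```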

The variables “confidence1” to “confidence9” represent participants’ confidence level in each of their nine answers to the recognition questions. Answers were coded as 1 (not at all) to 5 (extremely).

Task:

Your task is to analyse the data in order to test the four hypotheses proposed above. You will do this by addressing each of the following ten questions. Answer all questions with complete sentences, not with just numbers, notes or tables. Do not include the text of the questions in your assignment (this will trigger a plagiarism warning), but you should include the number of the question being addressed.

1) For each participant calculate the total number of questions they answered correctly. State the mean, median and standard deviation for total score.
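The calculation asked for in Question 1 can be sketched outside Excel as follows. The four rows are made-up values for illustration only, not data from the assignment file, and the sample standard deviation (dividing by n - 1) is assumed.

```python
import statistics

# Hypothetical rows: each list holds correct1..correct9 for one participant.
rows = [
    [1, 1, 0, 1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0, 1, 1, 0, 0],
]

# Total score = number of recognition questions answered correctly.
totals = [sum(row) for row in rows]

mean_total = statistics.mean(totals)
median_total = statistics.median(totals)
sd_total = statistics.stdev(totals)  # sample standard deviation (n - 1)

print(totals, mean_total, median_total, round(sd_total, 2))
```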

2) For the mean total score you calculated in Question 1 use a t-test to test Hypothesis 1 that participants scored better than chance. Chance responding would give a mean of 4.5 items correct. Report the p-level for the t-test and state clearly whether or not Hypothesis 1 was supported, and why. (Note that we will be discussing hypothesis testing in lectures in Week 4 and practicing using Excel to test hypotheses in tutorials in Week 5. So you may need to wait to answer this question until we have covered the relevant material in class.)
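For Question 2, the one-sample t statistic compares the mean total with the chance value of 4.5. The sketch below uses made-up totals and hypothetical function names. The p-level helper relies on a normal approximation, which is an assumption that is only reasonable for large samples such as the 327 participants here; Excel or a statistics package would give the exact value from the t distribution. Because Hypothesis 1 is directional, a one-tailed p-level may be the more appropriate one to report.

```python
import math
import statistics

def one_sample_t(scores, chance_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(scores)
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    t = (statistics.mean(scores) - chance_mean) / standard_error
    return t, n - 1

def approx_p_two_tailed(t):
    """Normal approximation to the two-tailed p-level (large samples only)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))

# Made-up totals for illustration only
example_totals = [7, 5, 8, 4, 6, 7, 5, 6]
t_value, df = one_sample_t(example_totals, chance_mean=4.5)
print(f"t({df}) = {t_value:.2f}")
```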

3) For participants whose correct answers had even numbered first digits, calculate and present the mean and standard deviation of total scores (i.e., when condition = 1). For participants whose correct answers had odd numbered first digits, calculate and present the mean and standard deviation of total scores (i.e., when condition = 2).

4) Use a t-test to calculate whether condition affected total scores. Report the p-level and state clearly whether or not the Hypothesis 2 was supported, and why.
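Because the even and odd conditions in Question 4 contain different participants, an independent-samples t-test is the natural reading. The sketch below computes Welch’s version, which does not assume equal variances; that choice is an assumption, since the assignment does not say which variant to use. The scores are made up and the function name is hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    # Each group's variance divided by its size
    term_a = statistics.variance(group_a) / n_a
    term_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(term_a + term_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (term_a + term_b) ** 2 / (term_a ** 2 / (n_a - 1) + term_b ** 2 / (n_b - 1))
    return t, df

# Made-up totals for illustration only
even_condition = [6, 7, 5, 8]   # condition = 1
odd_condition = [5, 6, 4, 7]    # condition = 2
t_cond, df_cond = welch_t(even_condition, odd_condition)
print(f"t = {t_cond:.2f}, df = {df_cond:.1f}")
```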

5) Calculate two new totals: the participant’s “low total”, as the total score for answers starting with low first digits (i.e., correct1 to correct4), and their “high total”, as the total score for answers starting with high first digits (i.e., correct5 to correct9). To make these totals comparable you need to convert them into proportions by dividing each by its maximum possible score (i.e., 4 or 5). For each of the two proportions report the mean and standard deviation.
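The conversion in Question 5 can be sketched for a single participant as follows. The function name and the example row are hypothetical; the row is assumed to hold correct1 to correct9 in order.

```python
def low_high_proportions(correct):
    """Return (low proportion, high proportion) for one participant."""
    if len(correct) != 9:
        raise ValueError("expected nine values, correct1 to correct9")
    low = sum(correct[:4]) / 4    # correct1..correct4, maximum score 4
    high = sum(correct[4:]) / 5   # correct5..correct9, maximum score 5
    return low, high

# Made-up row for illustration only
print(low_high_proportions([1, 1, 0, 1, 1, 0, 1, 1, 1]))  # (0.75, 0.8)
```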

6) Use a t-test to calculate whether “low scores” differed from “high scores”. Report the p-level and state clearly whether or not Hypothesis 3 was supported and why.
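In Question 6 every participant contributes both a low and a high proportion, so a paired t-test on the within-participant differences seems the appropriate variant; this is an interpretation rather than something the assignment states. The proportions below are made up and the function name is hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for a paired t-test."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Made-up proportions for illustration only
low_props = [0.75, 0.5, 1.0, 0.5, 0.75]
high_props = [0.8, 0.6, 0.8, 0.4, 0.6]
t_paired, df_paired = paired_t(low_props, high_props)
print(f"t({df_paired}) = {t_paired:.2f}")
```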

7) Calculate participants’ total confidence score across the nine questions, and report the mean and standard deviation for total confidence. Calculate the correlation between total confidence and total score, and report this correlation. For a sample of 327 participants, all correlations above .108 are statistically significant, so state whether Hypothesis 4 was supported and why. Present the scatter graph of this relationship. State what you can conclude about the relationship between confidence and recognition memory performance, and how strong it is.
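The correlation in Question 7 is a Pearson correlation between two totals per participant. A minimal sketch with made-up values and a hypothetical function name follows. Note that the .108 cut-off quoted above applies to the full sample of 327, not to a toy example of this size.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Made-up totals for illustration only
total_confidence = [20, 25, 30, 35, 40]   # sum of confidence1..confidence9
total_score = [4, 5, 5, 7, 8]             # sum of correct1..correct9
r = pearson_r(total_confidence, total_score)
print(f"r = {r:.2f}")
```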

8) Identify three different issues with the way we collected data which could limit our ability to draw conclusions from it. These issues could relate to one or more of the hypotheses. Clearly differentiate the three issues as “Issue 1”, “Issue 2” and “Issue 3” and explain how each of these issues relates to a data collection consideration raised in ATHK1001 lectures. For each issue suggest a way it might be resolved in future research or, if it can’t be resolved, explain why.

9) What do you think OUR data analysis tells us about whether Benford’s law impacts on human behaviour? Did we find evidence that first digits affected memory performance? Do our results add anything to Hill’s (1998) description of Benford’s law? Explain your answers with reference to the results of your testing of the hypotheses and possibly the issues you raised in Question 8.

10) Include a reference section which lists the full reference for any paper you have cited when addressing these questions. You must cite Hill (1998) in Question 9, and include citations wherever appropriate. You should use APA style for citations and references, but we will accept other standard journal article referencing formats.

Posted on : April 9th, 2020