Suppose University A sent out a survey in 2017 to a random sample of 300 students on Australian campuses and collected various information about them, including gender, birthday, GPA, pulse rate, child rank in the family, the number of children in the family, the number of speeding tickets, whether the student is a smoker, the number of hours of sleep per day, region, whether the student lives on campus, and whether the student is over twenty years old. The data are reported in the Survey.xlsx data file.

(1) What is the population University A is interested in?

(2) Use Microsoft Excel to conduct and present the appropriate graphical and descriptive analysis of the following variables: GPA, birthday and child rank in the family. Interpret the results and prepare a report.

(3) Suppose University A also conducted the same survey in 2016, when the proportion of rural students was 0.20. They would like to know whether this proportion has decreased.

• What is the appropriate test statistic to use here?

• What are the null and alternative hypotheses for this test?

• Produce the test results using Microsoft Excel, interpret them and make a final conclusion.

Suppose you want to rent a two-bedroom apartment in Sydney. You are interested in two main suburbs (Suburb 1 and Suburb 2) and want to find out:

• whether the average rent in Suburb 1 is significantly higher than $450 per week

• whether the average rent in Suburb 1 is higher than that in Suburb 2

To find the answers, you decide to carry out research and conduct a quantitative analysis. You are required to answer the following:

(2) What are the populations you are interested in for each of the above two questions?

(3) Conduct your analysis using the HYPOTHESIS TESTING AND ESTIMATION method and summarise your results in a report format.

This report looks at two datasets to answer different questions relating to them. The first dataset comes from a 2017 survey conducted across Australian campuses to collect information on key variables: gender, birthday, GPA, pulse rate, child rank in the family, the number of children in the family, the number of speeding tickets, whether the student is a smoker, the number of hours of sleep per day, region, whether the student lives on campus, and whether the student is over twenty years old. Using these data, we present our findings on:

- GPA
- The weekday on which a birthday falls
- The rank of the child in the family
- Changes in the proportion of students from rural backgrounds since 2016

The second dataset relates to a sample of 50 houses each in Suburb 1 (Newtown) and Suburb 2 (Hurstville) in Australia. We investigate whether the difference in average rents between these suburbs is statistically significant.

We first look at the data on GPA.

| Statistic | Value |
| --- | --- |
| Mean | 3.243051 |
| Standard Error | 0.029442 |
| Median | 3.3 |
| Mode | 3.4 |
| Standard Deviation | 0.487344 |
| Sample Variance | 0.237504 |
| Kurtosis | 2.178944 |
| Skewness | -1.0878 |
| Range | 3 |
| Minimum | 1 |
| Maximum | 4 |
| Sum | 888.596 |
| Count | 274 |
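As a quick consistency check on the descriptive statistics, the short Python sketch below recomputes the mean, standard error and variance from the reported sum, count and standard deviation. It uses only the figures in the table above; the variable names are illustrative and not part of the Excel output.

```python
from math import sqrt

# Values copied from the descriptive statistics table for GPA.
gpa_sum = 888.596
gpa_count = 274
gpa_sd = 0.487344

# Mean is the sum divided by the number of usable observations.
gpa_mean = gpa_sum / gpa_count

# Standard error of the mean is the standard deviation over sqrt(n).
gpa_se = gpa_sd / sqrt(gpa_count)

# Sample variance is the square of the sample standard deviation.
gpa_var = gpa_sd ** 2

print(f"mean = {gpa_mean:.6f}")
print(f"standard error = {gpa_se:.6f}")
print(f"variance = {gpa_var:.6f}")
```

The recomputed values should agree with the table to within rounding, which suggests the summary was produced from the same 274 observations.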

The data yield 274 usable GPA values, because observations with a missing GPA were dropped. The mean GPA is 3.24, and both the median (3.3) and the mode (3.4) are greater than the mean. Together with the skewness of -1.09 shown in the descriptive statistics table above, this indicates a negatively skewed distribution. This is confirmed by the histogram below: most of the GPA values are concentrated towards the higher end of the range. The 3.25 to 3.5 class has the highest frequency, with 68 students scoring in this bracket. The minimum GPA is 1, but only 7 students have a GPA below 2.25.

| Class interval | Frequency |
| --- | --- |
| <1.25 | 1 |
| 1.25 - 1.5 | 2 |
| 1.5 - 1.75 | 0 |
| 1.75 - 2 | 3 |
| 2 - 2.25 | 1 |
| 2.25 - 2.5 | 17 |
| 2.5 - 2.75 | 18 |
| 2.75 - 3 | 40 |
| 3 - 3.25 | 41 |
| 3.25 - 3.5 | 68 |
| 3.5 - 3.75 | 46 |
| >3.75 | 37 |
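The figures quoted from the frequency table can be reproduced with a few lines of Python. This is only a sketch that re-enters the class counts by hand; the dictionary keys are informal labels for the class intervals, not Excel bin definitions.

```python
# Class intervals and frequencies copied from the GPA frequency table.
gpa_freq = {
    "<1.25": 1,
    "1.25-1.5": 2,
    "1.5-1.75": 0,
    "1.75-2": 3,
    "2-2.25": 1,
    "2.25-2.5": 17,
    "2.5-2.75": 18,
    "2.75-3": 40,
    "3-3.25": 41,
    "3.25-3.5": 68,
    "3.5-3.75": 46,
    ">3.75": 37,
}

# Total number of usable GPA observations.
total = sum(gpa_freq.values())

# The modal class is the interval with the largest frequency.
modal_class = max(gpa_freq, key=gpa_freq.get)

# Students with a GPA below 2.25 fall in the first five classes.
low_classes = ["<1.25", "1.25-1.5", "1.5-1.75", "1.75-2", "2-2.25"]
below_2_25 = sum(gpa_freq[c] for c in low_classes)

print(f"total = {total}, modal class = {modal_class}, below 2.25 = {below_2_25}")

# A rough text histogram, one '#' per two students.
for label, count in gpa_freq.items():
    print(f"{label:>9} | {'#' * (count // 2)} {count}")
```

The long left tail and the bulk of the counts above 2.75 are visible even in the text histogram, which is consistent with the negative skewness reported for GPA.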

When we examine the weekdays on which students' birthdays fall, we see that Wednesday is the most common day. The proportion of students with a birthday on a Wednesday is 50/280 ≈ 0.179, or 17.9%. At the other extreme, Friday has the fewest birthdays: only 10% of students have a birthday on a Friday.

Moving to the rank of the students in the family, the smallest rank is 1 and the maximum rank is 7. Rank 1 dominates: 88 out of 280 students report rank 1, while only 1 student reports rank 7. Rank 6 is almost as scarce, with just 2 students reporting it. As the rank rises, fewer students report it.

Lastly, we come to the question of changes in the rural proportion in the survey. In our data, 50 students report a rural background, so the sample proportion is 50/280 ≈ 0.179. The 2016 figure was 0.2. To test whether the rural proportion has fallen in a statistically significant way, we use a hypothesis test.

H0 (null hypothesis): p = 0.2

H1 (alternative hypothesis): p < 0.2

We use a one-tailed (left-tail) z test for a proportion here. The critical z value at the 5% significance level is -1.645.

The test value = (sample proportion – hypothesized proportion) / SE

The standard error = (0.2*0.8/280)^0.5 = 0.0239

The test value = (0.1786 - 0.2)/0.0239 = -0.90

As the test value is greater than the critical value of -1.645 (that is, smaller in absolute terms), it does not fall in the rejection region and we DO NOT REJECT the null hypothesis. There is NO statistical evidence that the proportion of rural background students has decreased between 2016 and 2017.
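The proportion test can be sketched in Python using only the standard library. The function name `one_sample_proportion_z` is a hypothetical helper introduced for illustration; the inputs (50 rural students out of 280 responses, hypothesized proportion 0.2) come from the report, and the standard error uses the hypothesized proportion, as in the calculation above.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(successes, n, p0):
    """Left-tailed z test for a proportion (H0: p = p0, H1: p < p0)."""
    p_hat = successes / n
    # Standard error under H0 uses the hypothesized proportion p0.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Lower-tail p-value, because the alternative is p < p0.
    p_value = NormalDist().cdf(z)
    return p_hat, se, z, p_value


p_hat, se, z, p_value = one_sample_proportion_z(50, 280, 0.20)

# Critical value for a left-tailed test at the 5% significance level.
critical = NormalDist().inv_cdf(0.05)

print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}, z = {z:.2f}")
print(f"critical z = {critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if z < critical else "Do not reject H0")
```

With these inputs the z statistic is about -0.90, which lies above the critical value of -1.645, so the sketch reaches the same decision as the report: the null hypothesis is not rejected.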

We now use the technique of hypothesis testing to determine:

- whether the average rent in Suburb 1 is greater than $450 per week
- whether the average rent in Suburb 1 is significantly greater than that in Suburb 2

We use Excel to examine the difference in rents. Using the Data Analysis tool, we conduct a z test for the difference in means. A z test is appropriate because the sample size exceeds 30 for both suburbs.

Let us consider the first question and formulate an appropriate hypothesis.

H0 (null hypothesis): µ = 450

H1 (alternative hypothesis): µ > 450

We use a one-tailed (right-tail) z test here. The critical z value at the 5% significance level is 1.645.

Sample average = 720

The test value = (sample average – hypothesized average) / SE

The standard error = (standard deviation / 50^0.5) = 124.3309/50^0.5 = 17.583, using the Suburb 1 sample size of 50

The test value = (720 - 450)/17.583 = 15.36

As the test value exceeds the critical value of 1.645, we REJECT the null hypothesis. There is statistical evidence that the average rent in Newtown exceeds $450 per week.
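The same calculation can be sketched in Python. The helper `one_sample_mean_z` is hypothetical, and the sketch assumes a sample size of 50 (the stated number of houses sampled in each suburb) and treats the Suburb 1 variance of 15458.16 from the Excel output as known.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_mean_z(sample_mean, mu0, variance, n):
    """Right-tailed z test for a mean (H0: mu = mu0, H1: mu > mu0)."""
    # Standard error of the mean: sqrt(variance / n).
    se = sqrt(variance / n)
    z = (sample_mean - mu0) / se
    # Upper-tail p-value, because the alternative is mu > mu0.
    p_value = 1 - NormalDist().cdf(z)
    return se, z, p_value


# Suburb 1 figures from the report; n = 50 is an assumption based on
# the stated sample of 50 houses per suburb.
se, z, p_value = one_sample_mean_z(720, 450, 15458.16, 50)

# Critical value for a right-tailed test at the 5% significance level.
critical = NormalDist().inv_cdf(0.95)

print(f"SE = {se:.3f}, z = {z:.2f}, critical z = {critical:.3f}")
print("Reject H0" if z > critical else "Do not reject H0")
```

The z statistic is far above 1.645 under this assumption, so the null hypothesis that the average rent equals $450 is rejected.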

We now turn to the second question.

H0: µ1 = µ2

H1: µ1 > µ2

| | Suburb 1 (Newtown) |
| --- | --- |
| Mean | 720 |
| Known Variance | 15458.16 |
| Observations | 50 |
| Hypothesized Mean Difference | 0 |
| z | 11.43148321 |
| P(Z<=z) one-tail | 0 |
| z Critical one-tail | 1.644853627 |
| P(Z<=z) two-tail | 0 |
| z Critical two-tail | 1.959963985 |

Using a one-tailed (right-tail) test, we see that the z test value is 11.43, while the critical value at the 5% significance level is 1.645. As the test value exceeds the critical value, we REJECT the null hypothesis. There is statistical evidence that rents in Newtown are higher than in Hurstville.
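For reference, the statistic behind a two-sample z test for means can be written as a small Python function. This is a minimal sketch rather than a reproduction of the Excel output: `two_sample_z` is a hypothetical helper, and because the Suburb 2 mean and variance are not shown in the table above, the function is left uncalled and those inputs would have to be taken from the Excel sheet.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Right-tailed z test for two means (H0: mu1 - mu2 = hypothesized_diff)."""
    # Standard error of the difference between two independent sample means,
    # with both population variances treated as known.
    se = sqrt(var1 / n1 + var2 / n2)
    z = (mean1 - mean2 - hypothesized_diff) / se
    # Upper-tail p-value, because the alternative is mu1 > mu2.
    p_one_tail = 1 - NormalDist().cdf(z)
    return z, p_one_tail


# One-tailed critical value at the 5% significance level.
critical_one_tail = NormalDist().inv_cdf(0.95)

# Suburb 1 inputs would be mean1=720, var1=15458.16, n1=50.
# Suburb 2 inputs (mean2, var2, n2) are not reproduced in the table above
# and must be supplied from the data before calling two_sample_z.
print(f"z critical one-tail = {critical_one_tail:.9f}")
```

The null hypothesis is rejected when the returned z exceeds `critical_one_tail`, which is the decision rule applied to the Excel output above.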

Based on the datasets given, we can draw the following conclusions:

- GPA scores are negatively skewed with no significant outliers.
- Rank 1 is the most common child rank.
- Wednesday is the most common day for birthdays.
- There is evidence that the average rent in Newtown exceeds $450.
- There is NO statistical support that the proportion of rural background students has decreased between 2016 and 2017.
- There is statistical evidence that rents in Newtown are higher than in Hurstville.

