## Question:

Analyse the investor data of XYZ Investment Advisors. Assume this is an international company that makes investments on behalf of its customers. The lecturer roughly based the data on real-world data. Working with the investor data will give you the skills to analyse real data.

Prepare basic advice for new employees at the XYZ company based on the previous findings. If this is too difficult, give a general discussion of the issues with using samples.

We are living in the Big Data era, in which large volumes of data are collected daily from various sources and analysed to understand current trends and, possibly, to forecast future ones. For this reason, statistics remains a crucial tool in the business world. Components of statistics such as summaries, visual displays and regression models can be used to extract actionable information that supports better business decisions.

This research evaluates the investment practices of students by studying the connection between their investment levels and their income, marital status, age, number of children, etc. The data set used for the analyses in this study consists of 40,000 observations, each with fourteen (14) variables. These variables fall into two major categories, categorical variables and numerical variables, which are further subdivided by measurement level:

1. ### Categorical Variables:

These variables consist of categorical data, i.e., data which can be divided into groups.

Categorical variables may be nominal or ordinal. The categorical variables used in this analysis are all measured at the nominal level. They include:

1. ### Investment type

This variable records the type of investment held by a student. It has two levels: Low risk and High risk.

1. ### Fees too high

This variable records each student's opinion on whether the fees are too high. It has two levels: Fees too high and Fees not too high.

1. ### Gender

This variable provides information about the sex of the students participating in the survey. It has two levels: Male and Female.

1. ### Married

This variable provides data about the marital status of the students. It has two levels: Married and Not married.

1. ### Country

This variable provides information about the students' country of residence. It has two levels: Country1 and Country2; a student can be from either country.

1. ### Have children

This variable categorizes the students into: those with children and those without children.

1. ### Age group

This variable categorizes the students into two groups: “Below 50” and “Above 50”.

1. ### Numerical Variables

The data represented by this type of variable are numbers. All the numerical variables used in this analysis are treated as ratio-level measurements. They include:

1. ### Student ID

The data contained in this variable are identification numbers for the students who participated in the survey.

1. ### Income

This variable displays information about the income of the students, in dollars.

1. ### Amount Invested

This variable represents the amounts invested by the students. It is also measured in dollars.

The remaining numerical variables record the students' investment returns, the amount paid by each student, and the number of children each student has.

1. ### Review of an academic source related to investors

In their paper, Alesina & Perotti (1994) test a variety of hypotheses on a sample of 71 nations over a certain time period. Of interest to our study is the test performed on income inequality and investment. They conclude that an inverse relationship exists between the two, i.e., an increase in income inequality results in a decrease in investment. To reach this conclusion, they fitted a model using robust regression techniques.

1. ### A simple Bivariate analysis of the investor data

Based on Alesina & Perotti's study, it is expected that a relationship exists between income and investment. This section of the report tests this claim. The comparison seeks to determine a relationship between two numerical variables: Income and Amount invested. The scatterplot given by this analysis is presented below:

The above graph shows the scatterplot of Amount invested versus Income. The plot reveals a positive linear relationship between the two variables, meaning that as income increases, the amount invested also tends to increase.

A correlation analysis of the two variables gives a correlation coefficient of 0.7244, which is a relatively large value. It implies a strong positive linear relationship between Income and Amount invested, i.e., as one variable increases, the other also increases.

Correlation alone, however, does not quantify how much the amount invested changes with income, nor does it establish that one variable causes the other. A regression analysis is therefore carried out to model the relationship.
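The correlation coefficient discussed above is Pearson's r. As a minimal sketch of how such a coefficient is computed, the function below works on any two equal-length lists of numbers. The toy income and investment values are hypothetical and are not taken from the investor data, so the printed value will not equal 0.7244.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values for illustration only (NOT the investor data).
toy_income = [40_000, 55_000, 70_000, 90_000, 120_000]
toy_invested = [5_000, 30_000, 110_000, 150_000, 300_000]

r = pearson_r(toy_income, toy_invested)
print(f"r = {r:.4f}")
```

A value close to +1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little or no linear relationship.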

### Hypothesis test

A regression analysis is carried out at the 0.05 significance level to test the relationship between the two variables. The dependent variable is “Amount Invested” and the independent variable is “Income”.

• ### Hypothesis

Null hypothesis (H0): The model is insignificant

Alternate hypothesis (HA): The model is significant

• ### Results

The least squares regression line is given by

$$\hat{Y} = -161471.40 + 3.711X$$

where $\hat{Y}$ is the predicted amount invested and $X$ is income. The intercept means that a student with no income has a predicted amount invested of -\$161471.40. A negative investment is not meaningful, so the intercept should be read as an extrapolation outside the range of the data rather than as a realistic prediction. The slope means that each additional dollar of income increases the predicted amount invested by \$3.711.
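Using the intercept and slope reported above, the fitted line can be turned into a small prediction helper. This is only a sketch: the function names are hypothetical, and the income values passed in are illustrative rather than taken from the data.

```python
INTERCEPT = -161471.40  # reported intercept, in dollars
SLOPE = 3.711           # reported change in amount invested per extra dollar of income


def predict_amount_invested(income):
    """Predicted amount invested (dollars) for a given income (dollars)."""
    return INTERCEPT + SLOPE * income


def break_even_income():
    """Income at which the fitted line predicts an investment of exactly zero."""
    return -INTERCEPT / SLOPE


# Illustrative incomes only; these are not observations from the data set.
for income in (0, 50_000, 100_000):
    print(income, round(predict_amount_invested(income), 2))

print("break-even income:", round(break_even_income(), 2))
```

Below the break-even income the line predicts a negative investment, which is why predictions for low incomes should not be taken literally.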

The coefficient of determination (R-squared) for the regression is 0.5248, i.e., 52.48% of the variation in amount invested is explained by Income. Income alone therefore accounts for roughly half of the variation in the amount invested.

The p-value of the analysis is less than alpha, so the null hypothesis is rejected in favour of the alternate hypothesis. This means that the model is significant and can be used for prediction. A student's income can thus be used to predict the amount of money they are likely to invest.
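For a regression with a single predictor, R-squared equals the square of the correlation coefficient, so the two reported figures can be cross-checked against each other. A minimal sketch of that check, using only the values stated in this report:

```python
r = 0.7244            # correlation coefficient reported in the bivariate analysis
r_squared = r ** 2    # for one predictor, R-squared is the square of r

# Rounded to four decimals for comparison with the reported R-squared of 0.5248
print(round(r_squared, 4))
```

The squared correlation agrees with the reported R-squared to four decimal places, which suggests the correlation and regression outputs are consistent with each other.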

1. Investigation of the variable ‘return per \$1000’
2. Finding the 95% confidence interval

95% Confidence Interval for 'return per \$1000'

| Statistic | Value |
|---|---|
| Sample size | 40000 |
| Sample mean | 39.12975 |
| Standard Deviation | 14.59115271 |
| Confidence Coefficient | 0.95 |
| Significance Level | 0.05 |
| Margin of Error | 0.142990669 |
| Point Estimate | 39.12975 |
| Lower Limit | 38.98675933 |
| Upper Limit | 39.27274067 |

The above table from Excel contains the calculated margin of error and the point estimate (mean) of the return per \$1000 data. The upper confidence limit is obtained by adding the margin of error to the point estimate, and the lower confidence limit by subtracting the margin of error from the point estimate.

CI (Upper limit) = 39.12975 + 0.14299 = 39.27274

CI (Lower limit) = 39.12975 – 0.14299 = 38.98676

With 95% confidence, the mean return per \$1000 lies between 38.98676 and 39.27274.
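The interval can be reproduced from the summary statistics alone. The sketch below assumes a normal (z) critical value, which is reasonable for a sample of 40,000 and which reproduces the reported margin of error; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(mean, std_dev, n, confidence=0.95):
    """Return (lower, upper, margin) for a z-based confidence interval of a mean."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_dev / math.sqrt(n)
    return mean - margin, mean + margin, margin


# Summary statistics reported in the table for 'return per $1000'
lower, upper, margin = mean_confidence_interval(39.12975, 14.59115271, 40_000)
print(f"margin of error = {margin:.6f}")
print(f"95% CI = ({lower:.5f}, {upper:.5f})")
```

Because the margin of error shrinks with the square root of the sample size, a sample this large gives a very narrow interval around the sample mean.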

1. Testing the claim that the return per \$1000 is above \$30

This claim is tested using the One-sample T-Test.

• ### Hypothesis

Null hypothesis (H0): µ ≤ 30

Alternate hypothesis (HA): µ >30

• Level of significance = 0.05
• Critical region: Reject H0 if t-stat > t critical
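• Results

| Count | Mean | Std Dev | Std Err | t | df |
|---|---|---|---|---|---|
| 40000 | 39.12975 | 14.59115 | 0.072956 | 125.1409 | 39999 |

| Tail | p-value | t critical | Lower | Upper | Significant |
|---|---|---|---|---|---|
| Two Tail | 0 | 1.960023 | 38.98676 | 39.27274 | yes |

The t-stat value (125.1409) far exceeds the t-critical value (1.960023), so there is sufficient evidence to reject the null hypothesis. The reported critical value is the two-tailed one; the one-tailed critical value for this test is smaller, so the conclusion is unchanged. The p-value is also less than alpha, which leads to the same decision. The data therefore support the claim that the mean return per \$1000 is above \$30.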
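The test statistic can be recomputed from the reported summary statistics. The sketch below is not the Excel procedure used in the report: it assumes that, with 39,999 degrees of freedom, the t distribution is close enough to the standard normal for the one-tailed critical value to be taken from the normal distribution. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_t_statistic(sample_mean, std_dev, n, mu0):
    """t statistic for testing a sample mean against the hypothesised mean mu0."""
    std_err = std_dev / math.sqrt(n)
    return (sample_mean - mu0) / std_err


# Reported summary statistics; H0: mu <= 30, HA: mu > 30
t_stat = one_sample_t_statistic(39.12975, 14.59115, 40_000, 30)

# Assumption: normal approximation to the t distribution (df = 39,999)
critical = NormalDist().inv_cdf(0.95)  # one-tailed, alpha = 0.05
reject_h0 = t_stat > critical

print(f"t = {t_stat:.4f}, one-tailed critical value = {critical:.4f}")
print("reject H0:", reject_h0)
```

The statistic is so far above either the one-tailed or the two-tailed critical value that the choice between them does not affect the decision here.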
1. ### Basic advice for XYZ Company Workers

One of the most significant and prevalent issues that people face in the investment and financial industry is deciding whether or not to invest, what to invest in, and how much to invest. Previous studies suggest that women have lower investment literacy than men. This, to some extent, explains why women make up a smaller percentage of the customer database.

Based on this, my advice to the workers of the XYZ Company is to develop a model similar to the regression model in the previous section. This model could take a customer's financial characteristics as inputs and generate a rough prediction of the percentage of income that the customer ought to invest, so as to achieve maximum returns without becoming financially overstretched. The model could also generate a rough estimate of the returns the customer should expect. Putting this suggestion into action could increase the company's number of customers, since investing would become accessible even to people with little investment knowledge.

## Conclusion

The study demonstrates the application of statistics in business through hypothesis tests, summary measures, regression models, etc. From the investigation of the return per \$1000 variable, we see that the company's average return per \$1000 exceeds \$30. This indicates that the company is doing well by its investors. Among the most significant findings of the analysis is that the amount invested can be forecast from income. A further study of the other variables affecting the amount invested is recommended.

## References

Alesina, A. and Perotti, R., 1994. Income distribution, political instability, and investment.

