Analysis Of Koala Population In South East Queensland
Question 1
Researchers are concerned with the decline in the proportion of koalas in the wild who are juveniles, as this will impact the future adult koala population. Historically, the proportion of koalas in South East Queensland who are juveniles is 20%. Use the information in the dataset koalas17.sav to answer the following questions:
- Using SPSS, calculate the proportion of koalas in the study who are juveniles.
Solution
Table 1: Frequency distribution of age
| Age | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| Adult | 313 | 86.5 | 86.5 | 86.5 |
| Juvenile | 49 | 13.5 | 13.5 | 100.0 |
| Total | 362 | 100.0 | 100.0 | |
From Table 1 above, 49 of the 362 koalas in the study are juveniles, so the sample proportion of juveniles is p̂ = 49/362 ≈ 0.135 (13.5%).
- Without using SPSS, determine whether there is evidence to support the theory that the proportion of koalas who are juveniles is less than 20%. Perform a hypothesis test to statistically justify your answer by completing the following:
- State the appropriate hypotheses (define any symbols used).
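Solution
H0: p = 0.20 versus HA: p < 0.20,
where p is the population proportion of koalas in South East Queensland who are juveniles.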
- Check the conditions and assumptions for this test.
Solution
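We assume the koalas in the study were sampled randomly and independently of each other. The success/failure condition holds: under the null hypothesis (p = 0.20) the expected numbers of juveniles and adults are 362 × 0.20 = 72.4 and 362 × 0.80 = 289.6, both at least 10. With a sample this large, the central limit theorem implies that the sampling distribution of the sample proportion p̂ is approximately normal.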
- Calculate the test statistic for this test.
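Solution
z = (p̂ − p0)/√(p0(1 − p0)/n) = (0.135 − 0.20)/√(0.20 × 0.80/362) = −0.065/0.02102 ≈ −3.09,
where p̂ = 0.135 is the sample proportion from part (a), p0 = 0.20 and n = 362.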
- Calculate the P-value for this test.
Solution
The p-value associated with the computed z score value is 0.000997.
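The calculation can be reproduced with a short Python sketch that uses only the standard library (`statistics.NormalDist`). It works from the counts in Table 1 and the unrounded p̂ = 49/362, which gives z ≈ −3.07; rounding p̂ to 0.135 before dividing gives about −3.09. Either way the lower-tailed P-value is about 0.001.

```python
import math
from statistics import NormalDist

# Counts from Table 1 and the historical proportion under H0
n = 362        # koalas in the study
x = 49         # juvenile koalas
p0 = 0.20      # historical proportion of juveniles

p_hat = x / n

# Success/failure condition: both expected counts under H0 should be at least 10
expected_juveniles = n * p0
expected_adults = n * (1 - p0)
conditions_ok = expected_juveniles >= 10 and expected_adults >= 10

# The standard error is computed under H0, as in the hand calculation
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0

# Lower-tailed test because HA is p < 0.20
p_value = NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}")
print(f"expected counts: {expected_juveniles:.1f} juveniles, {expected_adults:.1f} adults")
print(f"z = {z:.3f}, P-value = {p_value:.5f}")
```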
- Interpret the P-value and write a meaningful conclusion in the context of this situation.
Solution
The P-value is less than the 5% level of significance, so we reject the null hypothesis and conclude that the proportion of koalas in South East Queensland who are juveniles is significantly less than 20%.
Question 2
Use the information in the dataset koalas17.sav to answer the following questions. You should use SPSS to calculate the sample statistics you will need to do this question, but for parts (b) and (c) you are required to do all other calculations by hand, using a calculator. Regardless of your answer to part (a), complete all parts of this question.
- (7 marks) Check the appropriate conditions and assumptions needed to calculate either a confidence interval or hypothesis test in relation to the population mean height of the trees in which juvenile koalas are sighted in South East Queensland. Comment on what these checks indicate about the appropriateness of proceeding with the analysis. Include an appropriate graph to support your answer.
Solution
Normality Test
One of the key assumptions is that the data are approximately normally distributed, so we checked whether the height of the trees in which juvenile koalas are sighted in South East Queensland is normally distributed. Results are in Table 2 below.
Table 2: Tests of Normality
| Variable | Age | Kolmogorov-Smirnov (a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|---|
| heightoftree | Adult | .074 | 313 | .000 | .960 | 313 | .000 |
| heightoftree | Juvenile | .101 | 49 | .200* | .954 | 49 | .052 |
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
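For juvenile koalas, neither the Kolmogorov-Smirnov test (p = .200) nor the Shapiro-Wilk test (p = .052) is significant at the 5% level, so there is no evidence that the height of the trees in which juvenile koalas are sighted departs from normality. The Shapiro-Wilk P-value is close to 0.05, however, so this conclusion is borderline; with a sample of 49, the central limit theorem also supports approximate normality of the sample mean.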
Test of homogeneity of variance
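Levene’s test compares the variances of tree height for adult and juvenile koalas. Equal variances are not required for a one-sample analysis of juvenile koalas, but the result is relevant to comparisons between the two groups. As can be seen in Table 3 below, p = .519 > 0.05, so there is no evidence that the two population variances differ.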
Table 3: Test of Homogeneity of Variances
| Variable | Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|---|
| heightoftree | .416 | 1 | 360 | .519 |
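Other conditions that appear to be met include:
- Each value in the sample was obtained independently of every other value.
- The sample of 49 juvenile koalas is reasonably large, so by the central limit theorem the sampling distribution of the sample mean is approximately normal.
Taken together, these checks indicate that it is reasonable to proceed with a confidence interval and hypothesis test for the population mean height of the trees in which juvenile koalas are sighted.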
- (6 marks) Estimate the population mean height of the trees in which juvenile koalas are sighted in South East Queensland, using a 95% confidence interval (show all working).
Solution
First we obtain the sample statistics.
Table 4: Descriptive Statistics
| Age | Variable | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|---|
| Adult | heightoftree | 313 | 5 | 50 | 17.43 | 5.390 |
| Adult | Valid N (listwise) | 313 | | | | |
| Juvenile | heightoftree | 49 | 6 | 23 | 15.86 | 4.578 |
| Juvenile | Valid N (listwise) | 49 | | | | |
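Confidence interval: because the population standard deviation is unknown, we use the t distribution with n − 1 = 48 degrees of freedom, for which the 95% critical value is t* = 2.0106.
Standard error: s/√n = 4.578/√49 = 4.578/7 = 0.654
Margin of error: 2.0106 × 0.654 ≈ 1.3149
Lower limit: 15.86 – 1.3149 = 14.545
Upper limit: 15.86 + 1.3149 = 17.175
Thus we are 95% confident that the population mean height of the trees in which juvenile koalas are sighted in South East Queensland is between 14.545 and 17.175 metres.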
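For a mean with unknown σ, the interval is normally based on the t distribution with n − 1 = 48 degrees of freedom. The following minimal Python sketch computes it from the Table 4 summary statistics. The standard library has no Student t distribution, so the helpers `t_cdf` and `t_quantile` (illustrative names, not a library API) integrate the t density numerically with Simpson's rule and invert it by bisection; in practice `scipy.stats.t.ppf` would give the same critical value.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    # Normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via log-gamma
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    h = t / steps  # signed step; steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 + c * total * h / 3

def t_quantile(prob: float, df: float) -> float:
    """Upper quantile (0.5 < prob < 1) of Student's t, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Juvenile summary statistics from Table 4
n, mean, sd = 49, 15.86, 4.578
conf = 0.95

se = sd / math.sqrt(n)                           # standard error of the mean
t_star = t_quantile(1 - (1 - conf) / 2, n - 1)   # critical value with 48 df
margin = t_star * se
lower, upper = mean - margin, mean + margin

print(f"t* = {t_star:.4f}, SE = {se:.3f}, margin = {margin:.4f}")
print(f"95% CI for the mean height: ({lower:.3f}, {upper:.3f}) m")
```

With t* ≈ 2.0106, the interval is about (14.545, 17.175) m. This is slightly wider than an interval built with the normal multiplier 1.96.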
- (9 marks) From historical data, it is known that the mean height of trees in which juvenile koalas have been sighted is 15 metres. Perform a hypothesis test to see if there is evidence to support a suspicion that the mean height of trees in which juvenile koalas are sighted in South East Queensland is more than this. In performing this test:
- State appropriate hypotheses (define any symbols used).
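Solution
H0: μ = 15 versus HA: μ > 15,
where μ is the population mean height (in metres) of the trees in which juvenile koalas are sighted in South East Queensland.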
- Calculate the value of a suitable test statistic for this test.
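Solution
t = (x̄ − μ0)/(s/√n) = (15.86 − 15)/(4.578/√49) = 0.86/0.654 ≈ 1.315, with n − 1 = 48 degrees of freedom.
- Calculate the P-value of this test.
Solution
Because the alternative hypothesis is upper-tailed, the P-value is P(T ≥ 1.315) for a t distribution with 48 degrees of freedom, which is approximately 0.097. (The standard normal approximation gives a similar value, 0.0943.)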
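The upper-tailed P-value for the statistic 1.315 can be checked with a self-contained Python sketch. The `t_cdf` helper is an illustrative Simpson's-rule integration of the t density (an assumption standing in for a library routine such as `scipy.stats.t.sf`), evaluated here with 48 degrees of freedom.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    h = t / steps  # signed step; steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 + c * total * h / 3

# Juvenile summary statistics (Table 4) and the historical mean under H0
n, mean, sd = 49, 15.86, 4.578
mu0 = 15.0

se = sd / math.sqrt(n)
t_stat = (mean - mu0) / se
df = n - 1
p_value = 1 - t_cdf(t_stat, df)   # upper-tailed test: HA is mu > 15

print(f"t = {t_stat:.3f}, df = {df}, one-tailed P-value = {p_value:.4f}")
```

The result is a P-value of about 0.097. This is slightly larger than the normal-based 0.0943 and still well above 0.05.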
- Write a meaningful conclusion at the 5% level of significance.
Solution
Since the P-value is greater than the 5% level of significance, we fail to reject the null hypothesis. There is not enough evidence to conclude that the mean height of the trees in which juvenile koalas are sighted in South East Queensland is more than 15 metres.
Question 3
Use the information in the dataset koalas17.sav to answer the following questions. You should use SPSS to calculate any sample statistics you will need to do this question, but for parts b(ii) and (c) you are required to do all other calculations by hand, using a calculator. Koalas of two broad age groups were sighted in South East Queensland – adult and juvenile. As a researcher you are interested to see whether there is a difference in the mean height of trees in which these two groups of koalas are sighted. Regardless of your answer to part (a), complete all parts of this question.
- (6 marks) Check the appropriate conditions and assumptions needed to perform a hypothesis test comparing the population mean heights of trees in which the two groups of koalas, based on age, are sighted. Comment on what these checks indicate about the appropriateness of proceeding with the analysis. Include an appropriate graph to support your answer.
Solution
- The first assumption concerns the scale of measurement: the variable should be measured on a continuous (interval or ratio) scale. Tree height is a continuous variable, so this assumption is met.
- The second assumption concerns randomness: the data should be a random, representative sample from the population, with the adult and juvenile koalas sampled independently of each other. We assume the sightings were made at random, so this assumption is taken to be met.
- The third assumption is that the tree heights in each group are approximately normally distributed (a bell-shaped distribution).
Results of the normality tests are in Table 2 below.
Table 2: Tests of Normality
| Variable | Age | Kolmogorov-Smirnov (a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|---|
| heightoftree | Adult | .074 | 313 | .000 | .960 | 313 | .000 |
| heightoftree | Juvenile | .101 | 49 | .200* | .954 | 49 | .052 |
*. This is a lower bound of the true significance.
a. Lilliefors Significance Correction
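For juvenile koalas, neither the Kolmogorov-Smirnov test (p = .200) nor the Shapiro-Wilk test (p = .052) is significant at the 5% level, so there is no evidence against normality. For adult koalas, however, both tests are significant (p < .001), so the heights of the trees in which adult koalas are sighted are not normally distributed. Because the adult sample is large (n = 313), the central limit theorem still makes the sampling distribution of the adult sample mean approximately normal, so the t-test remains appropriate.
- The fourth assumption is a reasonably large sample size. The total sample is 362 koalas (313 adults and 49 juveniles), and both groups are large enough for the central limit theorem to apply, so this assumption is met.
- The last assumption, needed only for the pooled (equal-variances) t-test, is homogeneity of variance: the two populations should have equal variances.
Using Levene’s test, we checked whether the variances of tree height are equal for the two groups. As can be seen in Table 3 below, p = .519 > 0.05, so there is no evidence that the two population variances differ. The Welch (unequal-variances) t-test does not require this assumption.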
Table 3: Test of Homogeneity of Variances
| Variable | Levene Statistic | df1 | df2 | Sig. |
|---|---|---|---|---|
| heightoftree | .416 | 1 | 360 | .519 |
- Using an appropriate statistical test, determine if, on average, there is a difference in the height of trees in which the two groups of koalas, based on age, are sighted in South East Queensland. In performing the test, include:
- State appropriate hypotheses, clearly defining all symbols.
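Solution
H0: μA = μJ versus HA: μA ≠ μJ,
where μA and μJ are the population mean heights (in metres) of the trees in which adult and juvenile koalas, respectively, are sighted in South East Queensland.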
- Calculate a suitable test statistic (you can use the results from part (a) in this calculation).
Solution
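Table 5: Descriptive Statistics
| Age | Variable | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|---|
| Adult | heightoftree | 313 | 5 | 50 | 17.43 | 5.390 |
| Adult | Valid N (listwise) | 313 | | | | |
| Juvenile | heightoftree | 49 | 6 | 23 | 15.86 | 4.578 |
| Juvenile | Valid N (listwise) | 49 | | | | |
Because the group sizes and standard deviations differ, we use the unpooled standard error:
SE = √(sA²/nA + sJ²/nJ) = √(5.390²/313 + 4.578²/49) = √(0.0928 + 0.4277) ≈ 0.7215
The test statistic is t = (x̄A − x̄J)/SE = (17.43 − 15.86)/0.7215 = 1.57/0.7215 ≈ 2.176.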
- Find the P-value of the test (and include the degrees of freedom).
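Solution
With the unpooled standard error, the degrees of freedom come from the Welch-Satterthwaite approximation:
df = (vA + vJ)²/(vA²/(nA − 1) + vJ²/(nJ − 1)), where vA = 5.390²/313 = 0.0928 and vJ = 4.578²/49 = 0.4277, giving df ≈ 70.58.
Because HA is two-sided, the P-value is two-tailed: P(|T| ≥ 2.176) ≈ 0.033 for a t distribution with 70.58 degrees of freedom.
- Interpret the P-value and write a meaningful conclusion in the context of the question.
Solution
The P-value (≈ 0.033) is less than the 5% level of significance, so we reject the null hypothesis. We conclude that there is a significant difference between the mean heights of the trees in which adult and juvenile koalas are sighted in South East Queensland.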
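The following Python sketch runs the unequal-variances (Welch) test from the Table 5 summary statistics. It computes the unpooled standard error, the Welch-Satterthwaite degrees of freedom and the two-tailed P-value. The `t_cdf` helper is an illustrative numerical integration of the t density (a stand-in for a library routine such as `scipy.stats.t.sf`), not part of the original analysis.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    h = t / steps  # signed step; steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 + c * total * h / 3

# Summary statistics from Table 5
n_a, mean_a, sd_a = 313, 17.43, 5.390   # adult koalas
n_j, mean_j, sd_j = 49, 15.86, 4.578    # juvenile koalas

v_a = sd_a ** 2 / n_a   # squared standard error of the adult mean
v_j = sd_j ** 2 / n_j   # squared standard error of the juvenile mean
se = math.sqrt(v_a + v_j)
t_stat = (mean_a - mean_j) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df = (v_a + v_j) ** 2 / (v_a ** 2 / (n_a - 1) + v_j ** 2 / (n_j - 1))

# Two-tailed P-value, since HA is mu_A != mu_J
p_two = 2 * (1 - t_cdf(abs(t_stat), df))

print(f"SE = {se:.4f}, t = {t_stat:.3f}, df = {df:.2f}, two-tailed P-value = {p_two:.4f}")
```

The result is t ≈ 2.176 with df ≈ 70.58 and a two-tailed P-value of about 0.033. This matches the "Equal variances not assumed" row of the SPSS output.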
- Now use SPSS to check your results for this hypothesis test. Attach or copy and paste the relevant output from SPSS for this test to your assignment.
Solution
Using SPSS we obtain the following:
Group Statistics
| Variable | Age | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| heightoftree | Adult | 313 | 17.43 | 5.390 | .305 |
| heightoftree | Juvenile | 49 | 15.86 | 4.578 | .654 |
Independent Samples Test (heightoftree)
| | Levene’s F | Levene’s Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference: Lower | Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | .416 | .519 | 1.933 | 360 | .054 | 1.571 | .813 | -.027 | 3.169 |
| Equal variances not assumed | | | 2.177 | 70.583 | .033 | 1.571 | .721 | .132 | 3.010 |
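An independent samples t-test was used to compare the mean height of the trees in which adult and juvenile koalas were sighted. Trees in which adults were sighted (M = 17.43, SD = 5.39, N = 313) were on average 1.57 m taller than those in which juveniles were sighted (M = 15.86, SD = 4.58, N = 49). Without assuming equal variances, the difference is significant, t(70.58) = 2.18, p = .033, two-tailed, 95% CI [0.13, 3.01]. The pooled test that assumes equal variances gives t(360) = 1.93, p = .054, which is not significant at the 5% level.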
- Briefly comment on how the test statistic and P-value from SPSS output are similar to or differ from your hand calculations.
Solution
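The hand-calculated test statistic (2.176) matches the SPSS "Equal variances not assumed" row (t = 2.177, df = 70.583), because both use the unpooled standard error (0.721); the small difference is due to rounding of the sample statistics. The SPSS two-tailed P-value for this row is .033. The "Equal variances assumed" row uses the pooled standard error (0.813) and therefore gives a smaller statistic, t = 1.933 with df = 360.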
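To see why the two rows of the SPSS output differ, this minimal Python sketch recomputes both standard errors from the Table 5 summary statistics. Pooling applies a variance dominated by the adults' larger spread to the small juvenile group. As a result, the pooled standard error (about 0.813) is larger than the unpooled one (about 0.721), and the pooled t is correspondingly smaller.

```python
import math

# Summary statistics from Table 5
n_a, mean_a, sd_a = 313, 17.43, 5.390   # adult koalas
n_j, mean_j, sd_j = 49, 15.86, 4.578    # juvenile koalas
diff = mean_a - mean_j

# Unpooled standard error (SPSS "Equal variances not assumed")
se_welch = math.sqrt(sd_a ** 2 / n_a + sd_j ** 2 / n_j)

# Pooled variance and standard error (SPSS "Equal variances assumed")
sp2 = ((n_a - 1) * sd_a ** 2 + (n_j - 1) * sd_j ** 2) / (n_a + n_j - 2)
se_pooled = math.sqrt(sp2 * (1 / n_a + 1 / n_j))

print(f"unpooled: SE = {se_welch:.3f}, t = {diff / se_welch:.3f}")
print(f"pooled:   SE = {se_pooled:.3f}, t = {diff / se_pooled:.3f}, df = {n_a + n_j - 2}")
```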
- (6 marks) Estimate, with 90% confidence, the population mean difference in height of trees in which the two groups of koalas, based on age, are sighted in South East Queensland. Ensure you explain the confidence interval in the context of the question.
Solution
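The calculation uses the unpooled standard error SE = √(5.390²/313 + 4.578²/49) ≈ 0.7215 and the t distribution with the Welch degrees of freedom (df ≈ 70.58). The 90% critical value is t* ≈ 1.6667, so the margin of error is 1.6667 × 0.7215 ≈ 1.2025.
Lower limit: 1.57 – 1.2025 = 0.3675
Upper limit: 1.57 + 1.2025 = 2.7725
Therefore we are 90% confident that the population mean height of the trees in which adult koalas are sighted in South East Queensland exceeds that for juvenile koalas by between about 0.37 and 2.77 metres. Because the interval does not contain 0, it supports a real difference in mean tree height between the two age groups.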
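This Python sketch builds the 90% interval for μA − μJ using the Welch degrees of freedom and a t multiplier instead of z = 1.645. The helpers `t_cdf` and `t_quantile` are illustrative numerical stand-ins for a library t distribution such as `scipy.stats.t`, not part of the original analysis.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    h = t / steps  # signed step; steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 + c * total * h / 3

def t_quantile(prob: float, df: float) -> float:
    """Upper quantile (0.5 < prob < 1) of Student's t, found by bisection on t_cdf."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Summary statistics from Table 5
n_a, mean_a, sd_a = 313, 17.43, 5.390   # adult koalas
n_j, mean_j, sd_j = 49, 15.86, 4.578    # juvenile koalas
conf = 0.90

v_a, v_j = sd_a ** 2 / n_a, sd_j ** 2 / n_j
se = math.sqrt(v_a + v_j)                                        # unpooled SE
df = (v_a + v_j) ** 2 / (v_a ** 2 / (n_a - 1) + v_j ** 2 / (n_j - 1))  # Welch df
t_star = t_quantile(1 - (1 - conf) / 2, df)

diff = mean_a - mean_j
margin = t_star * se
lower, upper = diff - margin, diff + margin

print(f"df = {df:.2f}, t* = {t_star:.4f}, margin = {margin:.4f}")
print(f"90% CI for mu_A - mu_J: ({lower:.4f}, {upper:.4f}) m")
```

The t multiplier (about 1.667) gives an interval of roughly (0.37, 2.77) m. This is slightly wider than the interval obtained with z = 1.645.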
Question 4
Use the information in the dataset koalas17.sav to answer the following questions. Researchers have theorized that juvenile koalas, when feeding, move down the tree rather than up. In order to test this theory, the height from the ground at which each juvenile koala was located sleeping within a tree was initially measured when the koala was sighted (positionintree) and then again three hours later (laterpositionintree). It now needs to be determined whether, on average, the height of the juvenile koalas in the trees has decreased after three hours.
- Use a parametric test to answer this question by completing the following (parts i. to v. are to be completed without the aid of SPSS, although summary statistics, i.e. mean and standard deviation, required for the test can be found using SPSS):
- State appropriate hypotheses (define any symbols used).
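Solution
H0: μd = 0 versus HA: μd > 0,
where μd is the population mean of the differences d = positionintree − laterpositionintree (in metres) for juvenile koalas in South East Queensland; μd > 0 means that, on average, juvenile koalas are lower in the tree after three hours.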
- State (but do not check) the assumptions for carrying out this test. Describe the assumptions in the context of this question.
Solution
- The two measurements are paired: each juvenile koala's position in the tree was measured at first sighting (positionintree) and again three hours later (laterpositionintree), so the analysis uses the difference for each koala.
- The differences are measured on a continuous scale (metres).
- The 49 juvenile koalas are a random sample from the population of juvenile koalas in South East Queensland, and the difference for each koala is independent of the differences for the other koalas.
- The population of differences is approximately normally distributed; alternatively, the sample of 49 differences is large enough for the central limit theorem to make the sampling distribution of the mean difference approximately normal.
- Calculate the value of a suitable test statistic for this test.
Solution
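Descriptive Statistics
| Age | Variable | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|---|
| Adult | Position of koala in tree (m) | 313 | 4 | 29 | 13.14 | 4.926 |
| Adult | Position of koala in tree 3h after first sighting (m) | 313 | 0 | 21 | 11.02 | 4.263 |
| Adult | Valid N (listwise) | 313 | | | | |
| Juvenile | Position of koala in tree (m) | 49 | 4 | 21 | 12.28 | 4.276 |
| Juvenile | Position of koala in tree 3h after first sighting (m) | 49 | 1 | 19 | 11.02 | 4.323 |
| Juvenile | Valid N (listwise) | 49 | | | | |
For the juvenile koalas, let d = positionintree − laterpositionintree for each koala. The mean and standard deviation of the differences, obtained in SPSS (and also shown in the paired-samples output below), are d̄ = 1.255 m and sd = 2.862 m, with n = 49. The test statistic is
t = d̄/(sd/√n) = 1.255/(2.862/√49) = 1.255/0.4089 ≈ 3.07, with n − 1 = 48 degrees of freedom.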
- Calculate the P-value of this test.
Solution
For t ≈ 3.07 with 48 degrees of freedom, the one-tailed P-value is P(T ≥ 3.07) ≈ 0.001759.
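The paired t statistic and its one-tailed P-value can be reproduced from the summary of the differences in the SPSS paired-samples output (d̄ = 1.255 m, sd = 2.862 m, n = 49). The `t_cdf` helper is an illustrative Simpson's-rule integration of the t density, used here because the Python standard library has no t distribution.

```python
import math

def t_cdf(t: float, df: float, steps: int = 2000) -> float:
    """Student t CDF: 0.5 plus a Simpson's-rule integral of the density from 0 to t."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    h = t / steps  # signed step; steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        x = i * h
        total += weight * (1 + x * x / df) ** (-(df + 1) / 2)
    return 0.5 + c * total * h / 3

# Summary of d = positionintree - laterpositionintree (SPSS paired output)
n = 49
d_bar = 1.255   # mean difference (m)
s_d = 2.862     # standard deviation of the differences (m)

se = s_d / math.sqrt(n)
t_stat = d_bar / se
df = n - 1
p_one = 1 - t_cdf(t_stat, df)   # upper-tailed: HA is mu_d > 0 (koalas move down)

print(f"t = {t_stat:.3f}, df = {df}, one-tailed P-value = {p_one:.5f}")
```

The result is t ≈ 3.07 with 48 degrees of freedom and a one-tailed P-value of about 0.0018. This is roughly half the SPSS two-tailed value of .004.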
- Interpret the P-value and describe the outcome of the test in the context of this question.
Solution
The P-value is less than the 5% level of significance, so we reject the null hypothesis and conclude that, on average, the height of the juvenile koalas in the trees decreased over the three hours (John, 2006).
- Now use SPSS to carry out the analysis. Copy and paste the relevant SPSS output to your assignment. Do these results agree with those found in part iv? (Hint: comment on the P-value.)
Solution
Using SPSS we obtained the following results:
Paired Samples Statistics
| Pair 1 | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|
| Position of koala in tree (m) | 12.28 | 49 | 4.276 | .611 |
| Position of koala in tree 3h after first sighting (m) | 11.02 | 49 | 4.323 | .618 |
Paired Samples Correlations
| Pair 1 | N | Correlation | Sig. |
|---|---|---|---|
| Position of koala in tree (m) & Position of koala in tree 3h after first sighting (m) | 49 | .779 | .000 |
Paired Samples Test
| Pair 1 | Paired Differences: Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference: Lower | Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Position of koala in tree (m) – Position of koala in tree 3h after first sighting (m) | 1.255 | 2.862 | .409 | .433 | 2.077 | 3.070 | 48 | .004 |
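Yes. SPSS gives t = 3.070 with 48 degrees of freedom. SPSS reports a two-tailed P-value of .004; because the alternative hypothesis is one-sided, this is halved to about .002, which agrees with the hand-calculated P-value of 0.001759 and leads to the same conclusion.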
- Describe an alternative statistical test that could be used to answer this question. Include in your answer:
- the name of the test,
Solution
Wilcoxon Signed Rank Test
- the conditions/assumptions required for this test (in the context of the question),
Solution
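In the context of this question, the conditions for this test are:
- The two measurements are paired, being taken on the same juvenile koala at first sighting and three hours later.
- The koalas are a random sample, and each koala's difference is independent of the others.
- The differences can be ranked, i.e. they are measured on at least an ordinal scale (heights in metres satisfy this).
- The distribution of the differences is symmetric about its median.
Unlike the paired t-test, this test does not assume that the differences are normally distributed.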
- a definition of the test statistic that would need to be calculated to perform this test,
Solution
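The difference d = positionintree − laterpositionintree is computed for each juvenile koala, zero differences are discarded, and the absolute values |d| of the remaining n differences are ranked from smallest to largest (tied values receive the average rank). The test statistic is the sum of the ranks of the positive differences,
T+ = sum of the ranks of |d| over all koalas with d > 0,
although some texts use the smaller of T+ and T− instead. For large n, T+ is compared with the standard normal distribution through
z = (T+ − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24).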
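A minimal Python sketch shows how the Wilcoxon signed-rank sums are formed. The function `signed_rank_sums` and the six pairs of heights are hypothetical, chosen only to illustrate the ranking rules (zero differences are dropped and tied absolute differences share the average rank); they are not taken from koalas17.sav. Differences are taken as initial minus later position, so a positive difference means the koala moved down the tree. This is the opposite sign convention to the SPSS output.

```python
def signed_rank_sums(before, after):
    """Return (T_plus, T_minus, n) for the differences before - after.

    Zero differences are discarded and tied absolute differences receive
    the average of the ranks they span.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute differences
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus, len(diffs)

# Hypothetical positions (m) for six koalas: first sighting and three hours later
initial = [12, 15, 9, 20, 14, 11]
later = [10, 15, 10, 17, 11, 9]

t_plus, t_minus, n = signed_rank_sums(initial, later)
print(f"T+ = {t_plus}, T- = {t_minus}, n = {n}")
```

For these six koalas, one difference is zero, leaving n = 5, with T+ = 14 and T− = 1. The two sums always add to n(n + 1)/2 = 15.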
- the relative advantages and/or disadvantages of this test compared with the test you conducted to answer part (a),
Solution
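The main advantage of the Wilcoxon signed-rank test is that it does not depend on the form of the parent distribution or on its parameters (Kerby, 2017). The differences need not be normally distributed, and because the test works with ranks, it is less affected by outliers. Its main disadvantage is that, when the differences really are normally distributed, it is somewhat less powerful than the paired t-test.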
- the circumstances under which you would use this test in preference to the one used in part (a).
Solution
This test would be preferred whenever the differences cannot be assumed to be normally distributed, for example with a small sample whose differences are clearly skewed or contain outliers.
- Now use SPSS to carry out the analysis. Copy and paste the relevant SPSS output into your assignment.
Solution
Ranks
| Position of koala in tree 3h after first sighting (m) – Position of koala in tree (m) | N | Mean Rank | Sum of Ranks |
|---|---|---|---|
| Negative Ranks | 31 (a) | 24.71 | 766.00 |
| Positive Ranks | 14 (b) | 19.21 | 269.00 |
| Ties | 4 (c) | | |
| Total | 49 | | |
a. Position of koala in tree 3h after first sighting (m) < Position of koala in tree (m)
b. Position of koala in tree 3h after first sighting (m) > Position of koala in tree (m)
c. Position of koala in tree 3h after first sighting (m) = Position of koala in tree (m)
Test Statistics (a)
| | Position of koala in tree 3h after first sighting (m) – Position of koala in tree (m) |
|---|---|
| Z | -2.821 (b) |
| Asymp. Sig. (2-tailed) | .005 |
a. Wilcoxon Signed Ranks Test
b. Based on positive ranks.
- State and interpret the P-value from the SPSS output and describe the outcome of the test in the context of this question.
Solution
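SPSS reports a two-tailed P-value of 0.005. Most of the non-zero differences are negative ranks (31 of 45 koalas were lower in the tree after three hours), so the observed direction agrees with the one-sided alternative, and the one-tailed P-value is about 0.0025. Either value is less than the 5% level of significance, so we reject the null hypothesis and conclude that the median height of the juvenile koalas in the trees decreased over the three hours.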
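The SPSS z-value can be approximately reproduced from the rank sums in the Ranks table using the large-sample normal approximation, with n = 45 non-zero differences (49 koalas minus 4 ties). This sketch omits the adjustment for tied ranks, which slightly reduces the variance. That is presumably why it gives z ≈ −2.80 rather than the −2.821 reported by SPSS; the two-tailed P-value still rounds to .005.

```python
import math
from statistics import NormalDist

# Rank sums from the SPSS Ranks table (differences = later - initial position)
n_nonzero = 49 - 4      # the 4 koalas with zero difference are dropped
t_plus = 269.0          # "Positive Ranks": koala higher after three hours
t_minus = 766.0         # "Negative Ranks": koala lower after three hours

# Sanity check: the two rank sums must add to n(n + 1)/2
assert t_plus + t_minus == n_nonzero * (n_nonzero + 1) / 2

mean_t = n_nonzero * (n_nonzero + 1) / 4
sd_t = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)  # no tie correction
z = (t_plus - mean_t) / sd_t
p_two = 2 * NormalDist().cdf(-abs(z))

print(f"E(T) = {mean_t}, SD(T) = {sd_t:.3f}, z = {z:.3f}, two-tailed P = {p_two:.4f}")
```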
References
John, R. A. (2006). Mathematical Statistics and Data Analysis. Journal of Statistical Computing, 42-53.
Kerby, D. S. (2017). Comparing Two Samples from an Individual Likert Question. Journal of Statistical Theory and Practice, 56-66.