Analysis Of Proportion, Distribution And Hypothesis Testing: A Statistical Study
Part 1: Proportion of Females and Confidence Interval
Question 1
- Calculate the point estimate and the 95% confidence interval for the proportion of females in the population of NSW 17-year-olds, using the assigned random sample of NSW 17-year-olds.
- Point estimate for the proportion of females in the population
The sample dataset contains 97 males and 98 females, so n = 195. The point estimate for the proportion (p) of females is $\hat{p} = 98/195 = 0.5026$.
- The 95% confidence interval for the proportion of females in the population
The 95% confidence interval is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}q/n}$, where $q = 1 - \hat{p}$ and $z_{\alpha/2} = 1.96$ (from the standard normal distribution). This gives
$0.4324 < p < 0.5727$
- What the confidence interval obtained in part (a) means.
From part (a), the sample proportion of females is 0.5026. The confidence interval tells us that, based on these data, we are 95% confident that the population proportion of females lies between 0.4324 and 0.5727. The point estimate lies at the centre of this interval.
- The result in part (a) is consistent with the statement “50% of 17-year-olds in NSW are female”, since the point estimate is 0.5026 and the value 0.50 lies within the 95% confidence interval.
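The arithmetic in part (a) can be checked with a few lines of R. This is a minimal sketch that uses only the counts stated in the text (98 females, 97 males) and the normal-approximation interval; the variable names are illustrative.

```r
# Counts taken from the sample described in the text
n_female <- 98
n_male   <- 97
n        <- n_female + n_male

# Point estimate of the proportion of females
p_hat <- n_female / n

# Normal-approximation (Wald) 95% confidence interval
z  <- qnorm(0.975)                      # approximately 1.96
se <- sqrt(p_hat * (1 - p_hat) / n)     # standard error of p_hat
ci <- p_hat + c(-1, 1) * z * se

print(round(p_hat, 4))
print(round(ci, 4))
```

The printed limits should agree with the interval reported in part (a) to about three decimal places.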
Question 2
- The appropriate chart to show the distribution of the self-reported hours of MVPA is the histogram. Figures 1 and 2 below are the histograms plotted in R that show the distribution. Table 1 below shows the number of hours of MVPA for each sex.
MALE | 16 | 18 | 23 | 26 | 30 | 30 | 30 | 30 | 30 | 35 |
FEMALE | 14 | 18 | 20 | 22 | 25 | 25 | 25 | 25 | 25 | 30 |
R code used to plot the histogram for males:
> MALE=c(16,18,23,26,30,30,30,30,30,35)
> hist(MALE,col="darkmagenta",border="red")
> hist(MALE,col="darkmagenta",border="white")
R code used to plot the histogram for females:
> FEMALE=c(14,18,20,22,25,25,25,25,25,30)
> hist(FEMALE,col="blue",border="red")
Description of the histograms: The histograms above show the distribution of the number of hours of MVPA per week for each sex. Based on the histograms, the most common value for males is 30 hours per week (the tallest bar, with a frequency of 6), while for females it is 25 hours per week.
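The figures quoted in the description can be checked without drawing the plots. The sketch below assumes R's default binning in `hist()` and uses `plot = FALSE` to return the bin edges and counts; the variable names `mode_male`, `bins_male` and so on are illustrative.

```r
MALE   <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
FEMALE <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# Most frequent observed value in each group
mode_male   <- as.numeric(names(which.max(table(MALE))))
mode_female <- as.numeric(names(which.max(table(FEMALE))))

# Bin edges and counts that hist() would draw, without opening a plot
bins_male   <- hist(MALE, plot = FALSE)
bins_female <- hist(FEMALE, plot = FALSE)

print(c(male = mode_male, female = mode_female))
print(rbind(upper_edge = bins_male$breaks[-1],   count = bins_male$counts))
print(rbind(upper_edge = bins_female$breaks[-1], count = bins_female$counts))
```

Note that a bar's height counts every observation in its bin, so it can exceed the number of times the single most frequent value occurs.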
- Hypothesis Testing
The appropriate non-parametric test in this case is the Wilcoxon rank sum test (also known as the Mann-Whitney U test), since we want to compare two independent samples, males and females, to assess whether their population mean ranks differ.
Step 1: Stating the null and alternative hypotheses
H0: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is equal between males and females in the population of NSW 17-year-olds.
H1: The average self-reported hours of moderate to vigorous physical activity (MVPA) per week is not equal between males and females in the population of NSW 17-year-olds.
The significance level is α = 0.05.
Step 2: Selection of an appropriate test statistic
Because the self-reported hours are compared between two independent groups, we apply the Wilcoxon rank sum test. As computed by R, the test statistic W is the sum of the ranks of the first sample (males) in the pooled data minus n1(n1 + 1)/2, where n1 is the size of the first sample.
Step 3: Components of the calculations
Male | 16 | 18 | 23 | 26 | 30 | 30 | 30 | 30 | 30 | 35 |
Female | 14 | 18 | 20 | 22 | 25 | 25 | 25 | 25 | 25 | 30 |
Calculations (R code and output): testing the hypothesis in R gives the following result in the output window.
> Male<-c(16,18,23,26,30,30,30,30,30,35)
> Female<-c(14,18,20,22,25,25,25,25,25,30)
> wilcox.test(Male,Female,alternative="two.sided")
Wilcoxon rank sum test with continuity correction
data: Male and Female
W = 73, p-value = 0.08224
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(Male, Female, alternative = "two.sided") :
cannot compute exact p-value with ties.
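The value W = 73 in the output above can be reproduced by hand. For the rank sum test, the W that R reports equals the number of (male, female) pairs in which the male value is larger, with tied pairs counted as one half. The sketch below shows that count alongside the equivalent rank-based form; the names `W_pairs` and `W_ranks` are illustrative.

```r
Male   <- c(16, 18, 23, 26, 30, 30, 30, 30, 30, 35)
Female <- c(14, 18, 20, 22, 25, 25, 25, 25, 25, 30)

# Compare every male value with every female value
greater <- outer(Male, Female, ">")
tied    <- outer(Male, Female, "==")
W_pairs <- sum(greater) + 0.5 * sum(tied)

# Equivalent form: rank sum of the first sample minus n1 * (n1 + 1) / 2
n1      <- length(Male)
ranks   <- rank(c(Male, Female))          # ties receive average ranks
W_ranks <- sum(ranks[seq_len(n1)]) - n1 * (n1 + 1) / 2

print(c(W_pairs = W_pairs, W_ranks = W_ranks))
```

Both forms give the same number, which matches the W printed by `wilcox.test`.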
Step 4: Decision on the hypothesis
Since the p-value obtained is 0.082, which is greater than α = 0.05, we fail to reject the null hypothesis at the 5% significance level.
Step 5: Conclusion
We therefore conclude that there is insufficient evidence that the average self-reported hours of moderate to vigorous physical activity (MVPA) per week differ between males and females in the population of NSW 17-year-olds; the data are consistent with equal averages.
Question 3
- This is a one-sided hypothesis test, because the researcher is interested in knowing whether the emissions from aluminum smelters have decreased since the introduction of the new laws.
- The appropriate statistical test to address this hypothesis is the Wilcoxon signed-rank test, because we want to compare two related samples (emissions before and after the new laws) to assess whether their population mean ranks differ.
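A one-sided Wilcoxon signed-rank test of this kind could be run in R as sketched below. The emission figures are hypothetical values invented purely for illustration (no data are given for this question), and the sketch assumes each smelter is measured once before and once after the new laws.

```r
# HYPOTHETICAL paired emissions for eight smelters (illustration only)
before <- c(12.1, 10.4, 15.3, 9.8, 11.7, 13.2, 14.0, 10.9)
after  <- c(11.0, 10.6, 13.1, 9.1, 10.2, 12.9, 12.2, 10.0)

# H0: no change in emissions; H1: emissions after the laws are lower
res <- wilcox.test(after, before, paired = TRUE, alternative = "less")

print(res$statistic)   # V: sum of the ranks of the positive differences
print(res$p.value)
```

Setting `paired = TRUE` is what selects the signed-rank test, and `alternative = "less"` encodes the one-sided direction (after lower than before).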
Question 4
- The following is a contingency table between gender and license status.
LICENSE STATUS
GENDER | Valid | Revoked | Suspended | Total |
Male | 49 | 32 | 16 | 97 |
Female | 50 | 33 | 15 | 98 |
Total | 99 | 65 | 31 | 195 |
Testing the hypothesis with R commands gives the output shown below:
R output
> Male<-c(49,32,16)
> Female<-c(50,33,15)
> gender.survey<-data.frame(rbind(Male,Female))
> names(gender.survey)<-c('valid','revoked','suspended')
> chisq.test(gender.survey)
Pearson's Chi-squared test
data: gender.survey
X-squared = 0.052617, df = 2, p-value = 0.974
- There is no evidence of an association between gender and license status in this sample of NSW 17-year-olds. The p-value of 0.974 is higher than 0.05, so we fail to reject the null hypothesis and conclude that license status does not differ by gender in the population of NSW 17-year-olds.
- The requirements for a Chi-Square test are met: the sample is large (n = 195) and every expected cell count is greater than 5 (the smallest, for suspended males, is about 15.4).
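Step 1: Setting up the hypotheses
H0: There is no association between gender and license status in the population of NSW 17-year-olds.
H1: There is an association between gender and license status in the population of NSW 17-year-olds.
The significance level is α = 0.05.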
Step 2: Selection of appropriate test statistics
Because both gender and license status are categorical variables, we apply the Chi-Square test of independence.
Step 3: Decision on the hypothesis
The null hypothesis will be rejected if the computed P-value is less than 0.05
Step 4: Computation of the test statistics in R
> Male<-c(49,32,16)
> Female<-c(50,33,15)
> gender.survey<-data.frame(rbind(Male,Female))
> names(gender.survey)<-c('valid','revoked','suspended')
> chisq.test(gender.survey)
Pearson's Chi-squared test
data: gender.survey
X-squared = 0.052617, df = 2, p-value = 0.974
Step 5: Conclusion
The p-value obtained in this case is 0.974, which is higher than 0.05. Hence we fail to reject the null hypothesis and conclude that there is no evidence that license status differs by gender in the population of NSW 17-year-olds; that is, there is no evidence of an association between gender and license status.
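The chi-square statistic reported by `chisq.test` can also be reproduced directly from the contingency table, which makes the expected counts visible. The sketch below uses only the observed counts from the table; the object names are illustrative.

```r
# Observed counts from the contingency table
counts <- matrix(c(49, 32, 16,
                   50, 33, 15),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Male", "Female"),
                                 status = c("Valid", "Revoked", "Suspended")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson chi-square statistic, degrees of freedom and p-value
chi_sq  <- sum((counts - expected)^2 / expected)
df      <- (nrow(counts) - 1) * (ncol(counts) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

print(round(expected, 2))
print(c(chi_sq = chi_sq, df = df, p_value = p_value))
```

Inspecting `expected` is also a quick way to check the usual rule of thumb that every expected cell count should be at least 5.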
Question 5
- Different studies require different sample sizes because each study has its own aims and objectives, and therefore its own target group.
- Using the online calculator, the following five steps are applied:
Step 1: The required margin of error is E = 0.5
Step 2: The estimated standard deviation of the difference is δ = 3.0
Step 3: To produce 95% confidence, we use z = 1.96
Step 4: Therefore the minimum required sample size is $n = (z\delta/E)^2 = (1.96 \times 3.0 / 0.5)^2 = 138.30$, which is approximately 138
Step 5: Hence the required sample size to achieve the power subject to the conditions given is 138.
- A sample size of 40 is relatively small for this study. It would lead to a larger margin of error, and therefore a wider confidence interval, making the estimates less precise and less reliable.
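The five-step calculation above can be sketched in R. This assumes the margin-of-error formula n = (z δ / E)² and a margin of error of E = 0.5, which is the value that reproduces n of about 138 with δ = 3.0 and z = 1.96; the variable names are illustrative.

```r
E       <- 0.5    # ASSUMED margin of error (the value that yields n of about 138)
sd_diff <- 3.0    # estimated standard deviation of the difference
z       <- 1.96   # critical value for 95% confidence

# Minimum sample size from the margin-of-error formula
n_raw <- (z * sd_diff / E)^2
print(n_raw)
print(ceiling(n_raw))   # rounding up is the conservative convention

# Margin of error that a sample of 40 would give instead
E_40 <- z * sd_diff / sqrt(40)
print(E_40)
```

Rounding up with `ceiling()` gives one more than the rounded figure quoted in the steps; the last line shows how much wider the margin of error becomes with only 40 observations.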