PUBH620 Biostatistics Analysis Of Dataset
Question one
- Age descriptive statistics
Descriptive Statistics
| | N | Minimum | Maximum | Mean | Std. Deviation |
|---|---|---|---|---|---|
| AGE | 38681 | 16 | 59 | 20.50 | 4.889 |
| Valid N (listwise) | 38681 | | | | |
Table 1
It can be observed from the table above that the mean age of the participants was 20.5 years, with a standard deviation of 4.889 years. The youngest participant was 16 years old and the oldest was 59 years old.
- Frequency for new age category
Statistics
| Age category | | |
|---|---|---|
| N | Valid | 38681 |
| | Missing | 0 |
Table 2
Age category
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | 18 years | 11881 | 30.7 | 30.7 | 30.7 |
| | 19 – 21 years | 11666 | 30.2 | 30.2 | 60.9 |
| | 22 – 25 years | 5494 | 14.2 | 14.2 | 75.1 |
| | 26 or more | 3755 | 9.7 | 9.7 | 84.8 |
| | system missing | 5885 | 15.2 | 15.2 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 3
The table above shows the frequency of the age groups. Participants in the 18 years category numbered 11,881 (30.7%), followed closely by those aged 19 to 21 years (11,666; 30.2%). Those aged 22 to 25 years numbered 5,494 (14.2%) and those aged 26 years or more numbered 3,755 (9.7%). The remaining 5,885 participants (15.2%) fall under the system missing category.
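The Percent and Cumulative Percent columns of Table 3 can be recomputed from the frequencies alone. The sketch below is only an illustration in Python, not the SPSS procedure used for the report; the function name `frequency_table` is hypothetical and the counts are copied from Table 3.

```python
# Frequencies copied from Table 3; keys are the SPSS category labels.
counts = {
    "18 years": 11881,
    "19 - 21 years": 11666,
    "22 - 25 years": 5494,
    "26 or more": 3755,
    "system missing": 5885,
}


def frequency_table(counts):
    """Return (label, frequency, percent, cumulative percent) rows."""
    total = sum(counts.values())
    rows = []
    running = 0
    for label, freq in counts.items():
        running += freq
        percent = round(100 * freq / total, 1)
        cumulative = round(100 * running / total, 1)
        rows.append((label, freq, percent, cumulative))
    return rows


for row in frequency_table(counts):
    print(row)
```

Each percentage is the category frequency divided by the total of 38,681, and each cumulative percentage is computed from the running sum of frequencies rather than from the rounded percentages.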
Question two
Age descriptive statistics
Descriptive Statistics
| | N | Sum | Mean |
|---|---|---|---|
| AGE | 38681 | 792845 | 20.50 |
| Valid N (listwise) | 38681 | | |
Table 4
The mean age of the participants was 20.5 while the sum total of their age was 792,845 years.
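As a quick consistency check, the mean in Table 4 should equal the sum of ages divided by the number of participants. A minimal sketch in Python, using only the two figures reported in the table (the variable names are illustrative):

```python
# Values copied from Table 4.
n_participants = 38681
age_sum = 792845

# The mean is the sum divided by the number of valid cases.
mean_age = age_sum / n_participants
print(round(mean_age, 2))
```

The result rounds to the reported mean of 20.50 years.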
State descriptive
STATE
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | NSW | 15860 | 41.0 | 41.0 | 41.0 |
| | Victoria | 13571 | 35.1 | 35.1 | 76.1 |
| | Queensland | 7528 | 19.5 | 19.5 | 95.5 |
| | ACT | 1722 | 4.5 | 4.5 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 5
From the table above, it can be observed that 41.0% of the participants come from NSW, 35.1% from Victoria and 19.5% from Queensland, while the smallest group (4.5%) comes from the ACT.
GENDER
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Male | 10449 | 27.0 | 27.0 | 27.0 |
| | Female | 28232 | 73.0 | 73.0 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 6
It can be observed that the majority of the participants were female: 28,232 participants, representing 73.0%. The remaining 10,449 participants (27.0%) were male.
LIVING_ARRANGE
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | At home | 20840 | 53.9 | 53.9 | 53.9 |
| | College/student accommodation | 6850 | 17.7 | 17.7 | 71.6 |
| | Independently | 10991 | 28.4 | 28.4 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 7
The table above shows the participants' living arrangements. It can be observed that 53.9% (20,840) lived at home, 17.7% (6,850) lived in college or student accommodation, and 28.4% (10,991) lived independently.
FACULTY
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Arts and Sciences | 9004 | 23.3 | 23.3 | 23.3 |
| | Education | 15038 | 38.9 | 38.9 | 62.2 |
| | Health Sciences | 11729 | 30.3 | 30.3 | 92.5 |
| | Theology and Philosophy | 588 | 1.5 | 1.5 | 94.0 |
| | Business | 2322 | 6.0 | 6.0 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 8
The table above shows the distribution of the student participants by faculty. The largest group came from the faculty of education (15,038; 38.9%), followed by the faculty of health sciences (11,729; 30.3%) and the faculty of arts and sciences (9,004; 23.3%). The smallest group came from the faculty of theology and philosophy, with 588 students representing 1.5%.
DEGREE_TYPE
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Single | 34620 | 89.5 | 89.5 | 89.5 |
| | Double | 4061 | 10.5 | 10.5 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 9
The table above shows distribution of participants by the type of their degrees. It can be observed that 89.5% were pursuing single degrees while 10.5% were pursuing double degrees.
METRO
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Metro | 27223 | 70.4 | 84.4 | 84.4 |
| | Non-metro | 5015 | 13.0 | 15.6 | 100.0 |
| | Total | 32238 | 83.3 | 100.0 | |
| Missing | System | 6443 | 16.7 | | |
| Total | | 38681 | 100.0 | | |
Table 10
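The table above shows whether the students came from a metropolitan or non-metropolitan background. Of all 38,681 participants, 70.4% came from metropolitan areas and 13.0% from non-metropolitan areas, while 16.7% (6,443) had missing values. Among the 32,238 participants with valid data, 84.4% were from metropolitan areas and 15.6% from non-metropolitan areas.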
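Table 10 is the only frequency table here in which Percent and Valid Percent differ, because 6,443 cases are missing. The following Python sketch shows how the two columns relate; the function name `percent_and_valid_percent` is hypothetical and the counts are copied from Table 10.

```python
# Counts copied from Table 10.
valid_counts = {"Metro": 27223, "Non-metro": 5015}
missing = 6443


def percent_and_valid_percent(freq, valid_total, missing):
    """Percent uses all cases; Valid Percent excludes missing cases."""
    overall_total = valid_total + missing
    percent = round(100 * freq / overall_total, 1)
    valid_percent = round(100 * freq / valid_total, 1)
    return percent, valid_percent


valid_total = sum(valid_counts.values())
for label, freq in valid_counts.items():
    print(label, percent_and_valid_percent(freq, valid_total, missing))
```

Which column to report depends on the question being asked: Percent describes the whole sample, while Valid Percent describes only the participants with a recorded metropolitan status.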
STUDY_MODE
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | FT | 34770 | 89.9 | 89.9 | 89.9 |
| | PT | 3911 | 10.1 | 10.1 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 11
From the table above, it can be observed that 89.9% of the students pursued full time studies while 10.1% pursued part time studies.
FEE_STATUS
| | | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|---|
| Valid | Domestic | 32238 | 83.3 | 83.3 | 83.3 |
| | International | 6443 | 16.7 | 16.7 | 100.0 |
| | Total | 38681 | 100.0 | 100.0 | |
Table 12
It can be observed that 83.3% (32,238) of the students are domestic students while 16.7% (6,443) are international students.
Question three
- Test for the difference in mean for aggression, thrill seeking and risk acceptance scores by gender
Independent Samples Test
| | | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| driver_agg | Equal variances assumed | .117 | .732 | .083 | 38679 | .934 | .004 | .050 | -.093 | .102 |
| | Equal variances not assumed | | | .083 | 18712.803 | .934 | .004 | .050 | -.093 | .102 |
| thrill | Equal variances assumed | .847 | .357 | -.370 | 38679 | .711 | -.005 | .014 | -.033 | .022 |
| | Equal variances not assumed | | | -.371 | 18783.250 | .710 | -.005 | .014 | -.033 | .022 |
| risk_accep | Equal variances assumed | .054 | .817 | 1.571 | 38679 | .116 | .078 | .050 | -.019 | .176 |
| | Equal variances not assumed | | | 1.571 | 18663.180 | .116 | .078 | .050 | -.019 | .176 |
Table 13
From the t-test results above, it can be observed that the two-tailed p-values (0.934 for aggression, 0.711 for thrill seeking and 0.116 for risk acceptance) are all greater than the level of significance (0.05). There is therefore no statistically significant evidence that the mean aggression, thrill seeking and risk acceptance scores differ by gender.
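The Sig. (2-tailed) and confidence interval columns in Table 13 follow from the t statistic, the mean difference and its standard error. The Python sketch below reproduces the risk acceptance row approximately. It assumes a standard normal approximation to the t distribution, which is reasonable only because the degrees of freedom are very large (38,679); SPSS itself uses the t distribution. The function names are illustrative.

```python
import math


def two_sided_p(t_stat):
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2))


def approx_ci(mean_diff, std_error, z=1.96):
    """Approximate 95% confidence interval for a mean difference."""
    margin = z * std_error
    return mean_diff - margin, mean_diff + margin


# risk_accep row of Table 13 (equal variances assumed).
p_value = two_sided_p(1.571)
lower, upper = approx_ci(0.078, 0.050)

print(round(p_value, 3))
print(round(lower, 3), round(upper, 3))
```

Because the interval includes zero and the p-value is above 0.05, both views lead to the same conclusion for this row. Small discrepancies from the table in the third decimal place come from working with the rounded values SPSS displays.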
- Test for the difference in mean for aggression, thrill seeking and risk acceptance scores by metropolitan background status
Independent Samples Test
| | | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| driver_agg | Equal variances assumed | 1.060 | .303 | .714 | 32236 | .475 | .048 | .067 | -.083 | .178 |
| | Equal variances not assumed | | | .719 | 7029.087 | .472 | .048 | .066 | -.082 | .177 |
| thrill | Equal variances assumed | 1.845 | .174 | .686 | 32236 | .493 | .013 | .019 | -.024 | .050 |
| | Equal variances not assumed | | | .692 | 7048.178 | .489 | .013 | .019 | -.024 | .049 |
| risk_accep | Equal variances assumed | 3.228 | .072 | -.866 | 32236 | .386 | -.058 | .067 | -.189 | .073 |
| | Equal variances not assumed | | | -.874 | 7040.476 | .382 | -.058 | .066 | -.188 | .072 |
Table 14
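From the t-test results above, Levene's test is not significant for any of the three scores (p = 0.303, 0.174 and 0.072), so equal variances can be assumed. The corresponding two-tailed p-values for the t-tests (0.475 for aggression, 0.493 for thrill seeking and 0.386 for risk acceptance) are all greater than the level of significance (0.05). There is therefore no statistically significant evidence that the mean aggression, thrill seeking and risk acceptance scores differ by metropolitan background status.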
- Test for the difference in mean for aggression, thrill seeking and risk acceptance scores by study mode.
Independent Samples Test
| | | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| driver_agg | Equal variances assumed | .323 | .570 | -.309 | 38679 | .757 | -.023 | .073 | -.166 | .121 |
| | Equal variances not assumed | | | -.310 | 4834.453 | .757 | -.023 | .073 | -.166 | .121 |
| thrill | Equal variances assumed | .222 | .637 | .132 | 38679 | .895 | .003 | .021 | -.038 | .043 |
| | Equal variances not assumed | | | .132 | 4829.635 | .895 | .003 | .021 | -.038 | .043 |
| risk_accep | Equal variances assumed | .045 | .832 | -2.269 | 38679 | .023 | -.167 | .073 | -.311 | -.023 |
| | Equal variances not assumed | | | -2.261 | 4823.706 | .024 | -.167 | .074 | -.311 | -.022 |
Table 15
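From the t-test results above, Levene's test is not significant for any of the three scores (p = 0.570, 0.637 and 0.832), so equal variances can be assumed. The two-tailed p-values for aggression (0.757) and thrill seeking (0.895) are greater than the level of significance (0.05), so these mean scores do not differ significantly by study mode. The p-value for risk acceptance (0.023) is less than 0.05, and its 95% confidence interval (-0.311 to -0.023) excludes zero, so the mean risk acceptance score differs significantly by study mode, with a mean difference of -0.167.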
- Test for the difference in mean for aggression, thrill seeking and risk acceptance scores by RTA (follow up survey)
Independent Samples Test
| | | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|---|---|---|---|
| driver_agg | Equal variances assumed | 3179.609 | .000 | -93.863 | 38679 | .000 | -5.552 | .059 | -5.668 | -5.436 |
| | Equal variances not assumed | | | -144.454 | 11183.466 | .000 | -5.552 | .038 | -5.627 | -5.476 |
| thrill | Equal variances assumed | 1715.363 | .000 | -92.063 | 38679 | .000 | -1.539 | .017 | -1.572 | -1.507 |
| | Equal variances not assumed | | | -133.493 | 10036.697 | .000 | -1.539 | .012 | -1.562 | -1.517 |
| risk_accep | Equal variances assumed | 1951.956 | .000 | -78.154 | 38679 | .000 | -4.775 | .061 | -4.895 | -4.655 |
| | Equal variances not assumed | | | -106.209 | 9076.181 | .000 | -4.775 | .045 | -4.863 | -4.687 |
Table 16
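From the t-test results above, it can be observed that the p-values are reported as .000 (that is, p < 0.001), which is less than the level of significance (0.05). Levene's test is also significant for all three scores, so the "Equal variances not assumed" rows apply. The mean aggression, thrill seeking and risk acceptance scores therefore differ significantly by RTA, with mean differences of -5.552, -1.539 and -4.775 respectively.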
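When Levene's test is significant, as in Table 16, the "Equal variances not assumed" row is based on Welch's t-test, whose fractional degrees of freedom come from the Welch-Satterthwaite formula. The group means, standard deviations and sizes are not reported in this document, so the Python sketch below uses made-up summary values purely to show the calculation; `welch_t` is a hypothetical helper name.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic, Welch-Satterthwaite df and standard error."""
    var_term1 = sd1 ** 2 / n1
    var_term2 = sd2 ** 2 / n2
    std_error = math.sqrt(var_term1 + var_term2)
    t_stat = (mean1 - mean2) / std_error
    df = (var_term1 + var_term2) ** 2 / (
        var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
    )
    return t_stat, df, std_error


# Hypothetical summary statistics, NOT taken from the dataset.
t_stat, df, std_error = welch_t(10.0, 2.0, 500, 12.0, 4.0, 100)
print(round(t_stat, 3), round(df, 1), round(std_error, 3))
```

When the two groups have unequal variances and unequal sizes, the resulting degrees of freedom fall below n1 + n2 - 2, which is why the second row for each variable in Table 16 reports a smaller, non-integer df.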
Question four
- Depression by gender
ANOVA
depression
| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | .026 | 1 | .026 | .280 | .597 |
| Within Groups | 3555.361 | 38679 | .092 | | |
| Total | 3555.387 | 38680 | | | |
Table 17
The ANOVA results show that the computed p-value (0.597) is greater than the level of significance (0.05). The null hypothesis of equal mean depression scores for males and females is therefore not rejected. It is concluded that there is no statistically significant difference in mean depression by gender at the 5% level of significance.
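The F and Sig. values in Table 17 can be recovered from the sums of squares and degrees of freedom. The Python sketch below is an illustration only. For the p-value it assumes that, with one between-groups degree of freedom and a very large within-groups df, F behaves approximately like a chi-square variable with 1 df (the square of a standard normal); SPSS uses the exact F distribution. The function names are hypothetical.

```python
import math


def f_ratio(ss_between, df_between, ss_within, df_within):
    """F = mean square between groups / mean square within groups."""
    return (ss_between / df_between) / (ss_within / df_within)


def p_value_two_groups(f_stat):
    """Approximate p-value for df_between = 1 and very large df_within."""
    return math.erfc(math.sqrt(f_stat / 2))


# Values copied from Table 17 (depression by gender).
f_from_ss = f_ratio(0.026, 1, 3555.361, 38679)
p_from_f = p_value_two_groups(0.280)

print(round(f_from_ss, 3))
print(round(p_from_f, 3))
```

The recomputed F differs slightly from the tabulated .280 because the between-groups sum of squares is displayed to only three decimals. With two groups, a one-way ANOVA is equivalent to an equal-variance t-test, since F equals the square of t.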
- Depression by metropolitan background status
ANOVA
depression
| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | .010 | 1 | .010 | .114 | .736 |
| Within Groups | 2962.189 | 32236 | .092 | | |
| Total | 2962.200 | 32237 | | | |
Table 18
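The ANOVA results show that the computed p-value (0.736) is greater than the level of significance (0.05). The null hypothesis of equal mean depression scores across metropolitan background groups is therefore not rejected. It is concluded that there is no statistically significant difference in mean depression by metropolitan background status at the 5% level of significance.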
- Depression by study mode
Results table
ANOVA
depression
| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | .282 | 1 | .282 | 3.072 | .080 |
| Within Groups | 3555.105 | 38679 | .092 | | |
| Total | 3555.387 | 38680 | | | |
Table 19
The ANOVA results show that the computed p-value (0.080) is greater than the level of significance (0.05). The null hypothesis of equal mean depression scores for full-time and part-time students is therefore not rejected. It is concluded that there is no statistically significant difference in mean depression by study mode at the 5% level of significance.
- Depression by fee status
Results table
ANOVA
depression
| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | .000 | 1 | .000 | .003 | .956 |
| Within Groups | 3555.387 | 38679 | .092 | | |
| Total | 3555.387 | 38680 | | | |
Table 20
The ANOVA results show that the computed p-value (0.956) is greater than the level of significance (0.05). The null hypothesis of equal mean depression scores for domestic and international students is therefore not rejected. It is concluded that there is no statistically significant difference in mean depression by fee status at the 5% level of significance.
Question five
- Binary logistic regression (RTA and Demographics).
Table of results
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | Age_category | -.003 | .000 | 42.072 | 1 | .000 | .997 |
| | GENDER | -.262 | .033 | 64.299 | 1 | .000 | .769 |
| | LIVING_ARRANGE | -.049 | .019 | 6.801 | 1 | .009 | .952 |
| | FEE_STATUS | .248 | .041 | 36.290 | 1 | .000 | 1.282 |
| | Constant | -1.673 | .031 | 2885.476 | 1 | .000 | .188 |
a. Variable(s) entered on step 1: Age_category, GENDER, LIVING_ARRANGE, FEE_STATUS.
Table 21
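From the results table above, it can be observed that the coefficient for living arrangement is -0.049, with an odds ratio of 0.952. Although this coefficient is close to zero, it is statistically significant (p = 0.009), so there is a small negative association between RTA and living arrangement. Age category (Exp(B) = 0.997), gender (Exp(B) = 0.769) and fee status (Exp(B) = 1.282) are also significant (p < 0.001). An odds ratio close to 1 indicates only a small change in the odds of RTA per unit increase in the predictor; of the four predictors, gender and fee status have odds ratios furthest from 1.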
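The Exp(B), Wald and Sig. columns of Table 21 are all functions of B and S.E.: Exp(B) is the exponential of B, the Wald statistic is (B / S.E.) squared, and its p-value comes from a chi-square distribution with 1 df. The Python sketch below illustrates this using the rounded values shown in the table, so the results match only approximately. Age_category is left out because its standard error is displayed as .000, which cannot be used in the division. The function names are illustrative.

```python
import math

# (B, S.E.) pairs copied from Table 21.
coefficients = {
    "GENDER": (-0.262, 0.033),
    "LIVING_ARRANGE": (-0.049, 0.019),
    "FEE_STATUS": (0.248, 0.041),
}


def odds_ratio(b):
    """Exp(B): multiplicative change in odds per unit increase."""
    return math.exp(b)


def wald_statistic(b, se):
    """Wald chi-square statistic with 1 degree of freedom."""
    return (b / se) ** 2


def wald_p_value(wald):
    """Upper-tail p-value of a chi-square(1) statistic."""
    return math.erfc(math.sqrt(wald / 2))


for name, (b, se) in coefficients.items():
    wald = wald_statistic(b, se)
    print(name, round(odds_ratio(b), 3), round(wald, 2),
          round(wald_p_value(wald), 3))
```

Significance and effect size are separate questions here: the living arrangement coefficient is small (odds ratio about 0.95) yet its Wald p-value is below 0.05, which is common in samples this large.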
- Binary logistic regression (RTA and driving distance).
Results table
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | dist_driving | -.016 | .031 | .268 | 1 | .605 | .984 |
| | Constant | -1.885 | .024 | 5937.971 | 1 | .000 | .152 |
a. Variable(s) entered on step 1: dist_driving.
Table 22
From the results table above, it can be observed that the coefficient for driving distance is -0.016, with an odds ratio of 0.984. The coefficient is not statistically significant (p = 0.605), so there is no evidence of an association between RTA and driving distance. An odds ratio this close to 1 indicates little or no influence of driving distance on the odds of RTA.
- Binary logistic regression (RTA with aggression, thrill seeking and risk acceptance).
Table of results
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | Driver aggression | .612 | .024 | 661.998 | 1 | .000 | 1.844 |
| | thrill | .516 | .078 | 43.584 | 1 | .000 | 1.675 |
| | Risk acceptance | .596 | .009 | 4017.531 | 1 | .000 | 1.815 |
| | Constant | -17.579 | .327 | 2887.675 | 1 | .000 | .000 |
a. Variable(s) entered on step 1: driver_agg, thrill, risk_accep.
Table 23
From the results table above, it can be observed that the coefficients for driver aggression, thrill seeking and risk acceptance are 0.612, 0.516 and 0.596 respectively, and all three are statistically significant (p < 0.001). This indicates significant positive associations between RTA and the predictor variables. The corresponding odds ratios (1.844, 1.675 and 1.815) are well above 1: a one-unit increase in each score is associated with roughly 84%, 68% and 82% higher odds of RTA respectively, holding the other two scores constant.
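The fitted model in Table 23 can be turned into a predicted probability of RTA through the logistic function. The Python sketch below uses the coefficients from the table, but the input scores are hypothetical: the document does not report the scales or ranges of the three scores, so the example values are chosen only to give a readable result.

```python
import math

# Coefficients copied from Table 23.
INTERCEPT = -17.579
B_AGGRESSION = 0.612
B_THRILL = 0.516
B_RISK = 0.596


def predicted_probability(aggression, thrill, risk):
    """Probability of RTA implied by the fitted logistic model."""
    logit = (INTERCEPT + B_AGGRESSION * aggression
             + B_THRILL * thrill + B_RISK * risk)
    return 1 / (1 + math.exp(-logit))


def odds(probability):
    """Convert a probability to odds."""
    return probability / (1 - probability)


# Hypothetical scores, not taken from the dataset.
p_base = predicted_probability(12, 4, 14)
p_plus_one = predicted_probability(13, 4, 14)

print(round(p_base, 3), round(p_plus_one, 3))
print(round(odds(p_plus_one) / odds(p_base), 3))
```

The last line shows that raising the aggression score by one unit multiplies the odds by Exp(B) for that predictor, whatever the starting values. The change in probability, by contrast, depends on where on the curve the participant starts.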
Question six
- Binary logistic regression (OB and Demographics).
Results table
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | Age_category | -.003 | .000 | 40.576 | 1 | .000 | .997 |
| | GENDER | -.267 | .033 | 66.548 | 1 | .000 | .766 |
| | LIVING_ARRANGE | -.012 | .017 | .447 | 1 | .504 | .988 |
| | Constant | -1.654 | .031 | 2863.657 | 1 | .000 | .191 |
a. Variable(s) entered on step 1: Age_category, GENDER, LIVING_ARRANGE.
Table 24
From the results table above, it can be observed that the coefficient for living arrangement is -0.012 (Exp(B) = 0.988) and is not statistically significant (p = 0.504). There is therefore no evidence of an association between obesity at third year follow up and living arrangement. Age category (Exp(B) = 0.997) and gender (Exp(B) = 0.766) are both statistically significant (p < 0.001).
- Binary logistic regression (OB and overweight and depression).
Table of results
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | depression | 1.787 | .037 | 2325.596 | 1 | .000 | 5.970 |
| | BL_owob | -.017 | .032 | .270 | 1 | .603 | .983 |
| | Constant | -2.186 | .027 | 6644.566 | 1 | .000 | .112 |
a. Variable(s) entered on step 1: depression, BL_owob.
Table 25
- Binary logistic regression (OB and edu_par and presence or absence of obese).
Variables in the Equation
| | | B | S.E. | Wald | df | Sig. | Exp(B) |
|---|---|---|---|---|---|---|---|
| Step 1a | owob_par | 1.881 | .169 | 124.423 | 1 | .000 | 6.561 |
| | edu_par | -2.210 | .061 | 1331.774 | 1 | .000 | .110 |
| | Constant | -2.826 | .170 | 277.272 | 1 | .000 | .059 |
a. Variable(s) entered on step 1: owob_par, edu_par.
Table 26
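The odds ratio for the predictor variable edu_par (parents' university education) is low (0.110) and statistically significant (p < 0.001). Because it is well below 1, it indicates a strong negative association: a one-unit increase in edu_par is associated with about 89% lower odds of obesity at follow up, holding owob_par constant. The odds ratio for owob_par (6.561, p < 0.001) is well above 1, indicating substantially higher odds of obesity at follow up.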
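An odds ratio is easier to interpret alongside a confidence interval and the implied percentage change in odds. Table 26 reports neither, so the Python sketch below derives approximate values from B and S.E., assuming the usual Wald interval exp(B ± 1.96 × S.E.). The function names are illustrative and the results are based on the rounded values in the table.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Odds ratio and approximate 95% Wald confidence interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)


def percent_change_in_odds(b):
    """Percentage change in odds per unit increase in the predictor."""
    return (math.exp(b) - 1) * 100


# (B, S.E.) pairs copied from Table 26.
edu_or, edu_lower, edu_upper = odds_ratio_with_ci(-2.210, 0.061)
owob_or, owob_lower, owob_upper = odds_ratio_with_ci(1.881, 0.169)

print(round(edu_or, 3), round(edu_lower, 3), round(edu_upper, 3))
print(round(owob_or, 3), round(owob_lower, 3), round(owob_upper, 3))
print(round(percent_change_in_odds(-2.210), 1))
```

An interval that lies entirely below 1 (edu_par) or entirely above 1 (owob_par) is consistent with the significant Wald tests in the table. The direction of each effect still depends on how the variables were coded, which the output does not show.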