Classifying Variables In A Data Table
Categorical and Numeric Variables
The following figure shows an excerpt of the recoded data. The age data were missing from the supplied data set, so that column was left blank.
Frequency Table: TRANSPORT in past month by GENDER
| GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport) |
|---|---|---|---|
| female | 23 | 59 | 37 |
| male | 45 | 76 | 31 |
Frequency Table: Row Percentage (TRANSPORT in past month by GENDER)
| GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport) | Total |
|---|---|---|---|---|
| female | 19.32773% | 49.57983% | 31.09244% | 100% |
| male | 29.60526% | 50% | 20.39474% | 100% |
Frequency Table: Column Percentage (TRANSPORT in past month by GENDER)
| GENDER | Driver/rider of car or motor cycle | Passenger of car | Others (excluding public transport) |
|---|---|---|---|
| female | 33.82353% | 43.7037% | 54.41176% |
| male | 66.17647% | 56.2963% | 45.58824% |
| Total | 100% | 100% | 100% |
The row percentages show that 19.33% of females drove themselves to their destinations in the past month, while the largest share, 49.58%, travelled as passengers and 31.09% reported some other means of transport. Among males, 50% travelled as passengers, 29.61% drove themselves and 20.39% travelled by some other means. Travelling as a passenger is therefore the most common mode for both genders, accounting for about half of each group.
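The row and column percentages can be reproduced from the raw counts with a short script. The following is a minimal sketch in Python rather than R Commander; the short keys `driver`, `passenger` and `other` are abbreviations introduced here for the three transport categories, not names from the data set.

```python
# Counts from the frequency table (rows: gender, columns: transport mode).
# The keys "driver", "passenger" and "other" are shorthand chosen for this sketch.
counts = {
    "female": {"driver": 23, "passenger": 59, "other": 37},
    "male": {"driver": 45, "passenger": 76, "other": 31},
}


def row_percentages(table):
    """Express each cell as a percentage of its row (gender) total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Express each cell as a percentage of its column (transport mode) total."""
    columns = list(next(iter(table.values())).keys())
    totals = {col: sum(table[row][col] for row in table) for col in columns}
    return {
        row: {col: 100 * n / totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


row_pct = row_percentages(counts)
col_pct = column_percentages(counts)

for gender in counts:
    print(gender, {mode: round(value, 2) for mode, value in row_pct[gender].items()})
```

Row percentages answer "how do females (or males) travel?", while column percentages answer "who makes up each transport mode?", so the choice between them depends on which variable is treated as the grouping variable.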
Statistical measures on the number of activities in the past month, grouped by license status
| License Status | Mean | Standard Deviation | Pearson’s Skewness |
|---|---|---|---|
| not licensed | 6.269 | 2.164326 | -0.14226 |
| learners permit | 6.405 | 2.181031 | -0.10214 |
| licensed | 8.408 | 2.110251 | 0.327959 |
The statistical measure that describes the centre of the distribution of the number of activities in the last month is the mean. The mean number of activities was 6.269 for the “not licensed” group, 6.405 for those with a “learner’s permit” and 8.408 for the “licensed” group. The measure of spread of the distribution for the respective groups is the standard deviation. It is 2.16 for the not licensed, 2.18 for those with a learner’s permit and 2.11 for the licensed. The measure that describes the shape of a distribution is the skewness.
A distribution with Pearson’s skewness greater than 0 is positively (right) skewed, one with skewness less than 0 is negatively (left) skewed, and one with skewness equal to 0 is symmetric, as the Gaussian or normal distribution is. The further the distribution is from symmetry, the larger the absolute value of the coefficient. The licensed group is positively skewed (0.328), whereas the other two groups are slightly negatively skewed (-0.142 and -0.102). The table above shows the measures as described.
The results from part (a) and part (b) imply that individuals with a license have the highest number of activities on average. The distributions for the unlicensed group and for those with a learner’s permit have slightly greater spread and are slightly negatively skewed, whereas the distribution for the licensed group is positively skewed.
b.
The number of sedentary hours spent last month and the number of activities last month were found to be negatively related. The line of best fit, depicted in blue in the figure in part (a), is described by the regression equation:
Number of activities = 11.819 - 0.434 × (sedentary hours)
The equation shows that each additional sedentary hour is associated with a decrease of 0.434 in the number of activities. With zero sedentary hours, the predicted number of activities is 11.819.
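To illustrate how the fitted line is used, the sketch below evaluates it in Python with the intercept (11.819) and slope (-0.434) quoted above. The function name `predict_activities` and the example inputs of 0, 5 and 10 hours are chosen here purely for illustration; the observed range of sedentary hours is not given in the text, so predictions far from the data should be treated with caution.

```python
INTERCEPT = 11.819  # predicted number of activities at zero sedentary hours
SLOPE = -0.434      # change in activities per additional sedentary hour


def predict_activities(sedentary_hours):
    """Predicted number of activities from the fitted regression line."""
    return INTERCEPT + SLOPE * sedentary_hours


# Illustrative inputs only; these hours are not taken from the data set.
for hours in (0, 5, 10):
    print(hours, round(predict_activities(hours), 3))
```

Each extra sedentary hour lowers the prediction by the same amount, 0.434, which is what a linear model assumes.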
- The probability that a person chosen at random from the 8 students is at least 18 years of age is given by the ratio of the number of students aged 18 years or older to the total number of students. The probability, computed using R Commander, was found to be 0.875.
- The probability that a person chosen at random from the 8 students is female and a psychology major is given by the ratio of the number of individuals who are female and have psychology as their major to the total number of students, that is, 8. The probability was found to be 0.75.
- The conditional probability that a student is aged at least 21 years, given that the student is female, is given by the ratio of the number of individuals who are female and aged at least 21 years to the total number of females. The probability was found to be 0.25.
- The probability that an Australian adult has blood type B was given as 0.1. The probability that a random sample of 250 people contains at most 25 people with blood type B is P(X ≤ 25), where X denotes the number of people in the sample who have blood type B, so that X follows a binomial distribution with size 250 and probability parameter 0.1. The value 0.0838 obtained from R Commander matches the point probability P(X = 25); the cumulative probability P(X ≤ 25) is approximately 0.553.
- It is of interest to determine the largest count x of people with blood type B such that about 12% of samples of size 250 of Australian adults contain at most that many. That is, x satisfies P(X ≤ x) ≈ 0.12, where X is binomial(250, 0.1). The value computed using R Commander was 19, so about 12% of samples of size 250 drawn from Australian adults contain at most 19 people with blood type B.
- The mean number of people with blood type B is the expectation of the binomial distribution with size 250 and probability parameter 0.1, which is 250×0.1 = 25.
- The z-score for a random variable X following a normal distribution is defined as
Z = (X - mean of X) / (standard deviation of X)
Using R Commander with a z-score of 1, a mean of 8.2 and a standard deviation of 0.6, the corresponding value of X is X = 0.6 × Z + 8.2 = 0.6 × 1 + 8.2 = 8.8.
- The probability that a variable X denoting the hours of sleep of 17-year-olds, following Normal(8.2, 0.6), takes a value between 7.5 and 8 is given by:
P(7.5 < X < 8.0) = P(X < 8.0) - P(X < 7.5) ≈ 0.248
- The mean of a sample of size n from a normal distribution with mean ‘m’ and standard deviation ‘s’ is itself normally distributed, with mean ‘m’ and standard deviation ‘s’/√n. The mean hours of sleep of a sample of 16 students therefore follows Normal(8.2, 0.6/√16) = Normal(8.2, 0.15). Let the mean statistic be denoted by Xbar.
Then the probability that the mean lies between 7.5 and 8.0 is given by P(7.5 < Xbar < 8.0) = P(Xbar < 8.0) - P(Xbar < 7.5), which is approximately 0.0912.
- The number of students expected to sleep for 7.5 to 8 hours among the 16 students is given by the expectation of a binomial distribution with size 16 and success probability P(7.5 < X < 8.0), the probability for an individual student, which is about 0.25. The expected number of students is then approximately 16 × 0.25 = 4.
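The binomial and normal quantities in the list above can be cross-checked without R Commander. The following is a minimal sketch using only the Python standard library. The helper names `binom_pmf`, `binom_cdf` and `binom_quantile` are written for this illustration. The sketch assumes the standard error of a sample mean is s/√n, and it assumes the expected count of students uses the probability for a single student.

```python
from math import comb, sqrt
from statistics import NormalDist

N, P = 250, 0.1  # sample size and probability of blood type B, as given in the text


def binom_pmf(k, n=N, p=P):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n=N, p=P):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def binom_quantile(q, n=N, p=P):
    """Smallest x with P(X <= x) >= q (lower-tail quantile)."""
    total = 0.0
    for x in range(n + 1):
        total += binom_pmf(x, n, p)
        if total >= q:
            return x
    return n


p_exactly_25 = binom_pmf(25)         # point probability P(X = 25)
p_at_most_25 = binom_cdf(25)         # cumulative probability P(X <= 25)
x_12_percent = binom_quantile(0.12)  # count with about 12% of samples at or below it
mean_count = N * P                   # expectation of the binomial

# Hours of sleep for a single student: Normal(mean 8.2, sd 0.6).
sleep = NormalDist(mu=8.2, sigma=0.6)
x_at_z1 = sleep.mean + 1 * sleep.stdev        # value of X at a z-score of 1
p_single = sleep.cdf(8.0) - sleep.cdf(7.5)    # P(7.5 < X < 8.0)

# Mean of 16 students: same mean, standard error 0.6 / sqrt(16) (assumption of this sketch).
n_students = 16
sleep_mean = NormalDist(mu=8.2, sigma=0.6 / sqrt(n_students))
p_mean = sleep_mean.cdf(8.0) - sleep_mean.cdf(7.5)  # P(7.5 < Xbar < 8.0)

# Expected number of the 16 students sleeping 7.5 to 8 hours,
# using the single-student probability (assumption of this sketch).
expected_students = n_students * p_single

for name, value in [
    ("P(X = 25)", p_exactly_25),
    ("P(X <= 25)", p_at_most_25),
    ("12% quantile", x_12_percent),
    ("binomial mean", mean_count),
    ("X at z = 1", x_at_z1),
    ("P(7.5 < X < 8)", p_single),
    ("P(7.5 < Xbar < 8)", p_mean),
    ("expected students", expected_students),
]:
    print(name, round(value, 4))
```

Keeping the point probability and the cumulative probability as separate helpers makes it easy to see which of the two a reported figure corresponds to.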