Frequency Analysis Of Smartphone Entertainment Types And Statistical Analysis Of Income Distribution
Random Number Selection Process
The student ID selected for this assignment is MIT17122. Following the guideline for choosing the random numbers, the last three digits of the student ID, 122, are used, so the selection starts from row 22, column 1 of the random number table. Because the random numbers are provided in blocks of six digits, each block yields two random numbers (Hamman et al., 2016): the first three digits and the last three digits each form a distinct three-digit number. In the corresponding Excel sheet, the first column lists the random numbers selected and the second column gives their respective values; for instance, the first selected random number is 937, and so on (Chatterjee & Hadi, 2015). The third column records whether each number is “Good” or “Not-Good”. “Good” means the number can be used as a sample number (Wilson, Bhatnagar & Townsend, 2017); “Not-Good” means it has to be rejected. Numbers from 001 to 300 are accepted; every other number, including 000, is rejected.
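The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated; the function names and the three example blocks are hypothetical and are not taken from the assignment's random number table.

```python
def split_block(block: str) -> tuple[int, int]:
    """Split a six-digit block into its first and last three-digit numbers."""
    if len(block) != 6 or not block.isdigit():
        raise ValueError(f"expected a six-digit block, got {block!r}")
    return int(block[:3]), int(block[3:])


def classify(number: int, population_size: int = 300) -> str:
    """Label a number 'Good' if it lies in 1..population_size, else 'Not-Good'."""
    return "Good" if 1 <= number <= population_size else "Not-Good"


# Hypothetical blocks, NOT the values from the assignment's random number table.
blocks = ["937122", "000250", "301060"]
for block in blocks:
    for number in split_block(block):
        print(f"{number:03d} -> {classify(number)}")
```

Under this rule 000 and anything above 300 are labelled “Not-Good”, which matches the rejection criterion stated in the text.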
The selected samples are laid out in the file named “SampleSmartPhoneData”, which contains 50 samples drawn from the provided list of 300.
As requested, a frequency column chart and a relative frequency pie chart have been constructed to show the counts and proportions of the different entertainment types (Wun et al., 2016).
According to the frequency column chart, 21 of the samples have entertainment in the form of music, videos and movies (Weaver et al., 2018).
The chart makes it evident that music, videos and movies are the most commonly downloaded form of entertainment.
eBooks account for a proportion of 0.18 of the sampled entertainment types.
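A frequency and relative frequency table of this kind can be produced as sketched below. The counts 21 and 9 follow the figures quoted above (9 assumes that 0.18 is a proportion of all 50 samples); the “Other” category is a hypothetical placeholder for the remaining samples, since the full breakdown is not reproduced in this text.

```python
from collections import Counter

# 21 and 9 follow the text (9 assumes 0.18 of 50 samples); "Other" is a
# hypothetical placeholder that lumps together the remaining 20 samples.
sample = ["Music/Videos/Movies"] * 21 + ["eBooks"] * 9 + ["Other"] * 20

frequency = Counter(sample)
total = sum(frequency.values())
relative_frequency = {kind: count / total for kind, count in frequency.items()}

# Print each entertainment type with its count and its share of the sample.
for kind, count in frequency.most_common():
    print(f"{kind:<20} {count:>3} {relative_frequency[kind]:.2f}")
```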
The table below lists the incomes in descending order. The corresponding CN numbers are included for reference.
CN | V1 | Rank
60 | $250,000 | 1.5
193 | $250,000 | 1.5
140 | $180,000 | 3.5
225 | $180,000 | 3.5
72 | $160,000 | 5
113 | $155,000 | 6
114 | $102,983 | 7
300 | $101,262 | 8
137 | $100,267 | 9
243 | $100,200 | 10
252 | $99,742 | 11
237 | $99,398 | 14.5
242 | $99,398 | 14.5
223 | $99,398 | 14.5
248 | $99,398 | 14.5
57 | $99,398 | 14.5
165 | $99,398 | 14.5
273 | $99,374 | 18
46 | $99,336 | 19
202 | $98,955 | 20
241 | $98,678 | 21
180 | $98,673 | 22.5
205 | $98,673 | 22.5
102 | $98,645 | 24
249 | $98,191 | 25
146 | $97,756 | 26
277 | $97,338 | 27.5
277 | $97,338 | 27.5
134 | $97,000 | 29
293 | $96,286 | 30
98 | $95,957 | 31
88 | $95,931 | 32
49 | $95,877 | 33
131 | $95,297 | 34
62 | $95,000 | 35
153 | $93,250 | 36
153 | $93,250 | 37
91 | $90,164 | 38
221 | $90,025 | 39
169 | $88,887 | 40
123 | $72,000 | 41
268 | $70,000 | 43
176 | $70,000 | 43
176 | $70,000 | 43
2 | $62,500 | 45.5
251 | $62,500 | 45.5
4 | $55,000 | 47
234 | $45,000 | 48
234 | $45,000 | 49
25 | $40,000 | 50
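Most tied incomes in the table above share the average of the positions they occupy (for example, the two $250,000 incomes both receive rank 1.5). A minimal sketch of that ranking rule is given below; the function name is hypothetical, and only the six highest incomes from the table are used to keep the example short.

```python
def average_ranks_descending(values):
    """Rank values from highest (rank 1) to lowest, averaging the ranks of ties."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    # Each value receives the mean of the positions occupied by its tie group.
    return [sum(positions[v]) / len(positions[v]) for v in values]


# The six highest incomes from the ranking table.
top_incomes = [250000, 250000, 180000, 180000, 160000, 155000]
ranks = average_ranks_descending(top_incomes)
print(ranks)
```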
The location of a percentile, that is, the position in the ranked data at which the value of the desired percentile is found, is given by
L_P = (n + 1) × P / 100,
where n is the total number of observations and P is the desired percentile.
Here the desired percentile is the 70th, so P = 70. Substituting P = 70 and n = 50, the location is
L_70 = (50 + 1) × 70 / 100 = 35.7
This can be written as an integer part and a fractional part: IR + FR = 35 + 0.7 = 35.7
The value with rank 35 is $95,000 and the value with rank 36 is $93,250. The value corresponding to the 70th percentile is then obtained by interpolating between the two:
0.7 × (95,000 − 93,250) + 93,250
= 0.7 × 1,750 + 93,250
= 1,225 + 93,250
= $94,475
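The two steps above, locating the percentile and then interpolating between the neighbouring ranked values, can be checked with a short script. This is a sketch that reproduces the arithmetic exactly as it is written in the text; the function names are hypothetical.

```python
def percentile_location(n, p):
    """Location of the p-th percentile among n ranked values: (n + 1) * p / 100."""
    return (n + 1) * p / 100


def interpolate(value_at_ir, value_at_next_rank, fraction):
    """Interpolation as written in the text: FR * (v_IR - v_next) + v_next."""
    return fraction * (value_at_ir - value_at_next_rank) + value_at_next_rank


location = percentile_location(50, 70)
integer_rank = int(location)          # IR, the integer part of the location
fraction = location - integer_rank    # FR, the fractional part of the location

# Rank 35 is $95,000 and rank 36 is $93,250 in the ranking table.
value = interpolate(95000, 93250, fraction)
print(round(location, 2), integer_rank, round(fraction, 2), round(value, 2))
```

The same two functions apply to the 25th and 75th percentiles by changing the percentile and the two neighbouring values.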
CN | V1 | Rank
60 | $250,000 | 1
193 | $250,000 | 1
140 | $180,000 | 2
225 | $180,000 | 2
72 | $160,000 | 3
113 | $155,000 | 4
114 | $102,983 | 5
300 | $101,262 | 6
137 | $100,267 | 7
243 | $100,200 | 8
252 | $99,742 | 9
237 | $99,398 | 10
242 | $99,398 | 10
223 | $99,398 | 10
248 | $99,398 | 10
57 | $99,398 | 10
165 | $99,398 | 10
273 | $99,374 | 11
46 | $99,336 | 12
202 | $98,955 | 13
241 | $98,678 | 14
180 | $98,673 | 15
205 | $98,673 | 15
102 | $98,645 | 16
249 | $98,191 | 17
146 | $97,756 | 18
277 | $97,338 | 19
277 | $97,338 | 19
134 | $97,000 | 20
293 | $96,286 | 21
98 | $95,957 | 22
88 | $95,931 | 23
49 | $95,877 | 24
131 | $95,297 | 25
62 | $95,000 | 26
153 | $93,250 | 27
153 | $93,250 | 28
91 | $90,164 | 29
221 | $90,025 | 30
169 | $88,887 | 31
123 | $72,000 | 32
268 | $70,000 | 33
176 | $70,000 | 33
176 | $70,000 | 33
2 | $62,500 | 34
251 | $62,500 | 34
4 | $55,000 | 35
234 | $45,000 | 36
234 | $45,000 | 36
25 | $40,000 | 37
The first and third quartiles are the 25th and 75th percentiles, and they are calculated in the same way. For the 25th percentile, the location is L_25 = (50 + 1) × 25 / 100 = 12.75
This can be expressed as IR + FR = 12 + 0.75 = 12.75
The value with rank 12 is $99,336 and the value with rank 13 is $98,955. The value corresponding to the 25th percentile is then obtained by interpolating between the two:
0.75* (99336-98955) + 98995
= 0.75*381+98995
=285.75+98995
= $ 99280.75
Proceeding in the same way for the 75th percentile, the location is L_75 = (50 + 1) × 75 / 100 = 38.25
This can be expressed as IR + FR = 38 + 0.25 = 38.25
In the ranking table with tied values given their average rank, the value with rank 38 is $90,164 and the value with rank 39 is $90,025. The value corresponding to the 75th percentile is then:
0.25 × (90,164 − 90,025) + 90,025
= 0.25 × 139 + 90,025
= 34.75 + 90,025
= $90,059.75
- c.
Before answering this question, it is important to clarify the idea of a percentile. Because the incomes here are ranked from highest to lowest, the percentile value is read as the income that the stated percentage of the sample lies above. In this case the value found for the 70th percentile is $94,475, which implies that 70 percent of the 50 selected samples have an annual income above $94,475.
- d.
The interquartile range is defined as the difference between the third quartile and the first quartile, that is, between the two quartile values found above. The interquartile range in this case is
99,280.75 − 90,059.75 = $9,221
The interquartile range is used to describe the variation within a data set: it gives the spread of the middle 50% of the values. Here the interquartile range is $9,221, which implies that the annual incomes of the middle 50% of the data are spread over a range of $9,221.
The following descriptive statistics table has been constructed in excel and then pasted here.
Column1 |
Mean | 101554.46
Standard Error | 5815.206028
Median | 97973.5
Mode | 99398
Standard Deviation | 41119.71617
Sample Variance | 1690831058
Kurtosis | 5.801565461
Skewness | 2.114964151
Range | 210000
Minimum | 40000
Maximum | 250000
Sum | 5077723
Count | 50
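Several figures in the table above can be cross-checked outside Excel. The sketch below uses Python's standard `statistics` module and assumes the 50 incomes are exactly those listed in the ranking table, entered here as (income, count) pairs; the variable names are illustrative only.

```python
import math
import statistics

# (income, number of occurrences) pairs transcribed from the ranking table.
income_counts = [
    (250000, 2), (180000, 2), (160000, 1), (155000, 1), (102983, 1),
    (101262, 1), (100267, 1), (100200, 1), (99742, 1), (99398, 6),
    (99374, 1), (99336, 1), (98955, 1), (98678, 1), (98673, 2),
    (98645, 1), (98191, 1), (97756, 1), (97338, 2), (97000, 1),
    (96286, 1), (95957, 1), (95931, 1), (95877, 1), (95297, 1),
    (95000, 1), (93250, 2), (90164, 1), (90025, 1), (88887, 1),
    (72000, 1), (70000, 3), (62500, 2), (55000, 1), (45000, 2),
    (40000, 1),
]
incomes = [value for value, count in income_counts for _ in range(count)]

n = len(incomes)
mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)
sample_sd = statistics.stdev(incomes)       # sample SD, divides by n - 1
standard_error = sample_sd / math.sqrt(n)   # SD divided by the square root of n
value_range = max(incomes) - min(incomes)

print(n, sum(incomes), round(mean, 2), median, mode)
print(round(sample_sd, 2), round(standard_error, 2), value_range)
```

Kurtosis and skewness are left out because the standard library has no direct equivalent of Excel's versions of those statistics.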
The upper and lower inner fences are calculated by the provided formulae.
Upper inner fence = 103,891.3
Lower inner fence = 85,449.25
The measure of central tendency chosen is the mean, or average. Among the measures of central tendency (mean, median, mode and others), the mean is generally regarded as the best measure, which is the primary reason for choosing it. In addition, since the interquartile range and the range show that the data are widely spread, the median and mode are not the best choice. The mean is defined as
x̄ = (x_1 + x_2 + … + x_n) / n
The measure of dispersion chosen for this data set is the standard deviation (SD). For a sample, the SD is the square root of the sum of the squared deviations from the mean divided by n − 1. Since the mean is the chosen measure of central tendency, it is convenient in practice to use the standard deviation to measure the level of dispersion. The SD is defined as
s = √( Σ (x_i − x̄)² / (n − 1) )
The V1 variable is the annual income of the sampled individuals. The mean, as mentioned above, is $101,554.46, so the average annual income of the 50 samples is this amount. It may not look like a central value because the incomes range from $40,000 to $250,000.
The median, the middle value of the data set, is $97,973.50. Half of the observed incomes lie above this value and the other half lie below it. The median also indicates that most people have an income in the vicinity of this value.
Quartiles are the values that divide the ordered data set into four groups. All of the quartile values have been calculated above.
The first quartile gives the value above which 25% of the observations lie, and the third quartile does the same for 75% of the observations. The median (the 50th percentile, or second quartile) is the middle value of the data: 50% of the observations are above it and the rest are below.
The measures of variation include the range, the sample SD and the sample variance. All of them were calculated in Excel and reported above. The values are:
Standard Deviation | 41119.71617
Sample Variance | 1690831058
Range | 210000
The SD and the variance are clearly very high. The range also indicates how widely the data set is dispersed.
Three measures help in recognising whether the data follow a normal distribution: the mean, the median and the skewness. For a normal distribution the mean, median and mode are all equal, which is not the case for this data set (Leamer, 2016). The skewness (2.11) is also high, whereas the skewness of a normal distribution is zero. Thus the data do not follow a normal distribution.
The following table is drawn up to count the observations within the requested range.
CN | V1 | Z
25 | $40,000 | -1.49696
234 | $45,000 | -1.37536
234 | $45,000 | -1.37536
4 | $55,000 | -1.13217
2 | $62,500 | -0.94977
251 | $62,500 | -0.94977
268 | $70,000 | -0.76738
176 | $70,000 | -0.76738
176 | $70,000 | -0.76738
123 | $72,000 | -0.71874
169 | $88,887 | -0.30806
221 | $90,025 | -0.28039
91 | $90,164 | -0.27701
153 | $93,250 | -0.20196
153 | $93,250 | -0.20196
62 | $95,000 | -0.1594
131 | $95,297 | -0.15218
49 | $95,877 | -0.13807
88 | $95,931 | -0.13676
98 | $95,957 | -0.13613
293 | $96,286 | -0.12812
134 | $97,000 | -0.11076
277 | $97,338 | -0.10254
277 | $97,338 | -0.10254
146 | $97,756 | -0.09238
249 | $98,191 | -0.0818
102 | $98,645 | -0.07076
180 | $98,673 | -0.07007
205 | $98,673 | -0.07007
241 | $98,678 | -0.06995
202 | $98,955 | -0.06322
46 | $99,336 | -0.05395
273 | $99,374 | -0.05303
237 | $99,398 | -0.05244
242 | $99,398 | -0.05244
223 | $99,398 | -0.05244
248 | $99,398 | -0.05244
57 | $99,398 | -0.05244
165 | $99,398 | -0.05244
252 | $99,742 | -0.04408
243 | $100,200 | -0.03294
137 | $100,267 | -0.03131
300 | $101,262 | -0.00711
114 | $102,983 | 0.034741
113 | $155,000 | 1.299755
72 | $160,000 | 1.421351
140 | $180,000 | 1.907736
225 | $180,000 | 1.907736
60 | $250,000 | 3.610082
193 | $250,000 | 3.610082
The z-scores bounding the region are 1.5 and −1.5. From the standard normal table, the area between 0 and 1.5 is 0.43319, so over both sides a total of 86.638% of the observations is expected to lie in this region (Wan et al., 2014). This means about 43 of the 50 observations would lie in the specified region.
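The table value can be confirmed without a printed standard normal table. The sketch below computes the normal area between z = −1.5 and z = 1.5 from the error function in Python's `math` module; the function name is illustrative.

```python
import math


def standard_normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area under the standard normal curve between z = -1.5 and z = 1.5.
proportion = standard_normal_cdf(1.5) - standard_normal_cdf(-1.5)
expected_count = proportion * 50   # expected number out of 50 observations

print(round(proportion, 5), round(expected_count, 1))
```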
The following table has been constructed to provide an idea of the region asked for.
CN | V1 | TRUE/FALSE
25 | $40,000 | TRUE
234 | $45,000 | TRUE
234 | $45,000 | TRUE
4 | $55,000 | TRUE
2 | $62,500 | TRUE
251 | $62,500 | TRUE
268 | $70,000 | TRUE
176 | $70,000 | TRUE
176 | $70,000 | TRUE
123 | $72,000 | TRUE
169 | $88,887 | TRUE
221 | $90,025 | TRUE
91 | $90,164 | TRUE
153 | $93,250 | TRUE
153 | $93,250 | TRUE
62 | $95,000 | TRUE
131 | $95,297 | TRUE
49 | $95,877 | TRUE
88 | $95,931 | TRUE
98 | $95,957 | TRUE
293 | $96,286 | TRUE
134 | $97,000 | TRUE
277 | $97,338 | TRUE
277 | $97,338 | TRUE
146 | $97,756 | TRUE
249 | $98,191 | TRUE
102 | $98,645 | TRUE
180 | $98,673 | TRUE
205 | $98,673 | TRUE
241 | $98,678 | TRUE
202 | $98,955 | TRUE
46 | $99,336 | TRUE
273 | $99,374 | TRUE
237 | $99,398 | TRUE
242 | $99,398 | TRUE
223 | $99,398 | TRUE
248 | $99,398 | TRUE
57 | $99,398 | TRUE
165 | $99,398 | TRUE
252 | $99,742 | TRUE
243 | $100,200 | TRUE
137 | $100,267 | TRUE
300 | $101,262 | TRUE
114 | $102,983 | TRUE
113 | $155,000 | TRUE
72 | $160,000 | TRUE
140 | $180,000 | FALSE
225 | $180,000 | FALSE
60 | $250,000 | FALSE
193 | $250,000 | FALSE
It is evident that 46 of the 50 observations fall in the given region.
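The count can be reproduced from the raw incomes. The sketch below assumes that the region in question is the band of z-scores between −1.5 and 1.5, and that the 50 incomes are those listed in the tables, entered here as (income, count) pairs; the variable names are illustrative.

```python
import statistics

# (income, number of occurrences) pairs transcribed from the tables above.
income_counts = [
    (250000, 2), (180000, 2), (160000, 1), (155000, 1), (102983, 1),
    (101262, 1), (100267, 1), (100200, 1), (99742, 1), (99398, 6),
    (99374, 1), (99336, 1), (98955, 1), (98678, 1), (98673, 2),
    (98645, 1), (98191, 1), (97756, 1), (97338, 2), (97000, 1),
    (96286, 1), (95957, 1), (95931, 1), (95877, 1), (95297, 1),
    (95000, 1), (93250, 2), (90164, 1), (90025, 1), (88887, 1),
    (72000, 1), (70000, 3), (62500, 2), (55000, 1), (45000, 2),
    (40000, 1),
]
incomes = [value for value, count in income_counts for _ in range(count)]

mean = statistics.mean(incomes)
sample_sd = statistics.stdev(incomes)   # sample SD, divides by n - 1

# z-score of each income, then a TRUE/FALSE flag for the assumed +/-1.5 band.
z_scores = [(x - mean) / sample_sd for x in incomes]
inside = [abs(z) <= 1.5 for z in z_scores]

print(sum(inside), "of", len(incomes), "observations have |z| <= 1.5")
```

Only the four incomes of $180,000 and $250,000 fall outside this band, which agrees with the FALSE rows in the table.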
The regression equation is y = β0 + β1·x + ε, where y is the percentage of phone usage for work purposes and x is age.
The primary purpose of this model is to test whether there is a linear relation between age and the percentage of phone usage for work purposes (Abdullah, Doucouliagos & Manning, 2015). If there is a linear relation between the two, then the slope β1 will not be equal to zero.
Since the estimated slope is 0.608, the variables are related in a positive linear manner.
The intercept coefficient provides the least squares estimate of β0.
The slope coefficient is the value corresponding to age in the Coefficients column (Kolaczyk & Csárdi, 2014), that is, 0.608. As described above, it represents a positive linear relation.
The coefficient of determination is the R-squared value in the table, 0.126743917. It means that about 12.7% of the variation of the dependent variable around its mean is explained by age.
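To show how the intercept, the slope and R-squared are related, a minimal least squares sketch follows. The five data points are hypothetical toy values chosen only to make the arithmetic easy to follow; they are not the assignment's data and do not reproduce the figures quoted above.

```python
def simple_linear_regression(x, y):
    """Least squares intercept, slope and R-squared for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # share of variation in y explained by x
    return intercept, slope, r_squared


# Hypothetical toy data, NOT the assignment's sample.
ages = [20, 30, 40, 50, 60]
work_usage_pct = [20, 40, 50, 40, 50]

intercept, slope, r_squared = simple_linear_regression(ages, work_usage_pct)
print(round(intercept, 3), round(slope, 3), round(r_squared, 3))
```

A positive slope indicates that the fitted line rises with age, and R-squared gives the share of the variation in the dependent variable that the line accounts for.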
References
Hamman, R. D., Kennedy III, W. C., Rump, W. J., & Irwin, K. E. (2016). U.S. Patent Application No. 15/152,009.
Wun, T., Payne, J., Huron, S., & Carpendale, S. (2016, June). Comparing bar chart authoring with Microsoft Excel and tangible tiles. In Computer Graphics Forum (Vol. 35, No. 3, pp. 111-120).
Weaver, K. F., Morales, V., Dunn, S. L., Godde, K., & Weaver, P. F. (2018). Showing Your Data. An Introduction to Statistical Analysis in Research: With Applications in the Biological and Life Sciences, First, 61-190.
Wan, X., Wang, W., Liu, J., & Tong, T. (2014). Estimating the sample mean and standard deviation from the sample size, median, range and/or interquartile range. BMC medical research methodology, 14(1), 135.
Kolaczyk, E. D., & Csárdi, G. (2014). Statistical analysis of network data with R (Vol. 65). New York: Springer.
Leamer, E. E. (2016). S-values: Conventional context-minimal measures of the sturdiness of regression coefficients. Journal of Econometrics, 193(1), 147-161.
Draper, N. R., & Smith, H. (2014). Applied regression analysis. John Wiley & Sons.
Harrell Jr, F. E. (2015). Regression modeling strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. Springer.
Marcolini, G., Bellin, A., & Chiogna, G. (2017). Performance of the Standard Normal Homogeneity Test for the homogenization of mean seasonal snow depth time series. International Journal of Climatology.
Mowery, D. C., Nelson, R. R., Sampat, B. N., & Ziedonis, A. A. (2015). Ivory tower and industrial innovation: University-industry technology transfer before and after the Bayh-Dole Act. Stanford University Press.
Chatterjee, S., & Hadi, A. S. (2015). Regression analysis by example. John Wiley & Sons.
Wilson, L., Bhatnagar, P., & Townsend, N. (2017). Comparing trends in mortality from cardiovascular disease and cancer in the United Kingdom, 1983–2013: joinpoint regression analysis. Population health metrics, 15(1), 23.
Abdullah, A., Doucouliagos, H., & Manning, E. (2015). Does education reduce income inequality? A meta-regression analysis. Journal of Economic Surveys, 29(2), 301-316.