Guidelines For Building Reliable And Valid Personality Assessments
Importance of Item Syntax, Grammar, and Wording
Assessing personality reliably and validly is essential to psychological evaluation. Considerable effort has therefore gone into building self-report measures. Researchers typically seek to capitalize on the content validity of a test, and each item’s content is carefully selected to capture the construct. However, the influence of item syntax, grammar, and wording is often underestimated or neglected (Allemand et al., 2008, p. 758). Not only the content but also the answering scales (response formats) are crucial, because they affect psychometric properties of a scale such as criterion and construct validity.
Some basic guidelines should be followed to ensure that items are properly constructed. It is crucial to define each item’s construct in detail and to take care not to combine items that assess the construct with items that assess outcomes or reactions (Ziegler & Hagemann, 2015, p. 43). Statements should also be simple and as short as possible, and the language used should be familiar to the target respondents (Kline, 2005, p. 30). Reverse-scored or negatively phrased items should be used with care, as even a few such items scattered through a scale can have a detrimental effect on its psychometric properties. For meaningful responses to be obtained, questions must be understood by respondents as the researchers intended (Kline, 2005, p. 49). Finally, some content redundancy is necessary when building multiple items, because it is the basis of internal consistency.
Content adequacy assessment is also critical to item construction. In many instances, researchers invest a lot of time and resources only to discover that their measure is flawed. Ensuring that the content is adequate before the final questionnaire is developed supports construct validity and permits the removal of items that may be conceptually inconsistent (Ziegler & Hagemann, 2015, p. 45). Similarly, researchers should pay attention to factor analysis, as it helps determine how many factors underlie the set of items. Internal consistency assessment is another important element of item construction. The most commonly recommended measure of a scale’s internal consistency is Cronbach’s alpha, which indicates how well the items measure the same construct. Construct validation and replication are also important concerns in constructing a questionnaire (Bayoglu et al., 2013, p. 331). To avoid common-source concerns, it is recommended that data from sources other than the respondents, such as regular appraisals, be collected where possible. Replication includes factor analysis, assessment of internal consistency reliability, and construct validation. Together, these steps should give the researcher confidence that the completed measure is valid and reliable and is appropriate for use in future studies.
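As a rough illustration of the internal consistency step, Cronbach’s alpha can be computed from a respondents-by-items score matrix with the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below assumes complete data with no missing answers; the function name cronbach_alpha and the example scores are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: four respondents answering two items.
example = [[0, 1], [1, 0], [2, 3], [3, 2]]
print(round(cronbach_alpha(example), 2))  # 0.75
```

Values closer to 1 indicate that the items vary together, which is what one would expect if they measure the same construct.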
Answering Scales and Psychometric Aspects
A scale is a type of composite measure made up of several items that have an empirical or logical structure (Ashton & Lee, 2007, p. 107). Scales therefore make use of differences in intensity among the indicators of a variable. For instance, when a question offers the response options “always”, “sometimes”, “rarely”, and “never”, it represents a scale because the response options differ in intensity and are rank-ordered. Many types of scale exist, but this paper examines four scales commonly used in social science research and how they are constructed (Fishman & Galguera, 2003, p. 61).
Likert scale
The Likert scale is commonly used in social science surveys. It is named after the psychologist Rensis Likert (Dittrich et al., 2007, p. 4). As one of the most widely used scales, it captures a respondent’s view on something by asking them to state the degree to which they agree or disagree. The response options typically read: strongly agree, agree, neither agree nor disagree, disagree, and strongly disagree (Dittrich et al., 2007, p. 5).
To construct the scale, each response option is assigned a score ranging from 0 to 4 (Dittrich et al., 2007, p. 6). The scores on the Likert items can then be added together to give each respondent a total score. Consider, for instance, the measurement of prejudice against women (Kline, 2005, p. 64). The scores for every statement would be summed for each respondent to produce an overall prejudice score. If there were five statements and a respondent strongly disagreed with each of them, his or her overall prejudice score would be 0, indicating a low degree of prejudice against women (Dittrich et al., 2007, p. 9).
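The summing step can be sketched in a few lines. The coding below is an assumption chosen to match the example in the text, where strongly disagreeing with all five statements yields a total of 0; the names LIKERT_SCORES and total_score are hypothetical.

```python
# Assumed coding: "strongly disagree" scores 0 and "strongly agree" scores 4.
LIKERT_SCORES = {
    "strongly disagree": 0,
    "disagree": 1,
    "neither agree nor disagree": 2,
    "agree": 3,
    "strongly agree": 4,
}


def total_score(answers):
    """Sum the numeric scores of one respondent's answers."""
    return sum(LIKERT_SCORES[answer] for answer in answers)


# A respondent who strongly disagrees with all five statements.
print(total_score(["strongly disagree"] * 5))  # 0
```

With five statements coded this way, totals range from 0 to 20, and higher totals indicate stronger agreement with the statements.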
Bogardus social distance scale
The Bogardus social distance scale is a technique for measuring people’s willingness to take part in social relations with others. Essentially, the scale asks people to state the extent to which they are accepting of another group. Each item on the scale is scored to reflect social distance, from 1.00, indicating no social distance, to 5.00, indicating maximum social distance (Bayoglu et al., 2013, p. 337). When the scores for each response are averaged, a higher value denotes a lower level of acceptance.
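The averaging described above might look like the following minimal sketch. It assumes the 1.00 to 5.00 scoring range given in the text; the function name mean_social_distance and the example ratings are hypothetical.

```python
from statistics import mean


def mean_social_distance(item_scores, low=1.0, high=5.0):
    """Average social distance; higher means less acceptance of the group."""
    for score in item_scores:
        if not low <= score <= high:
            raise ValueError(f"score {score} is outside {low}-{high}")
    return mean(item_scores)


# Hypothetical respondent answering five items.
print(mean_social_distance([1.0, 2.0, 2.0, 3.0, 2.0]))  # 2.0
```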
Semantic differential scale
The differential scale combines more than one scale. The related maximum difference (MaxDiff) scale is used in trade-off analyses (Revelle et al., 2011, p. 3). MaxDiff analysis is used in research on new product features, or in market segmentation research, to obtain an accurate ordering of the most important features. The differential scale asks respondents to answer a questionnaire by choosing between two opposite positions (Revelle et al., 2011, p. 7). For example, suppose one wanted respondents’ opinions about new video game shows. One would first decide which dimensions to measure and then find two opposite terms that represent each dimension, for instance: not relatable and relatable, not funny and funny, unenjoyable and enjoyable. One then builds a rating sheet on which respondents indicate how they feel about the video game shows on each dimension (Revelle et al., 2011, p. 19).
Thurstone scale
The Thurstone scale was developed by Louis Thurstone and is intended to provide a format for generating groups of indicators of a variable that have an empirical structure among them. For instance, in a study of discrimination, one would create a list of items (say, ten) and ask respondents to assign a score from 1 to 10 to each item (Ashton & Lee, 2007, p. 110). In essence, respondents rank the items from the weakest indicator of discrimination to the strongest. Once the respondents have scored the items, the researcher examines the scores assigned to each item to determine which items the respondents agreed upon most. If the scale items are adequately developed and scored, the economy and effectiveness of data reduction present in the Bogardus social distance scale will appear.
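One way to examine which items respondents agreed upon most is to summarize each item by the median and the spread of its assigned scores. The sketch below uses the population standard deviation as the measure of agreement, which is an assumption for illustration rather than a procedure stated in the sources; the function name and the scores are hypothetical.

```python
from statistics import median, pstdev


def summarize_judgments(scores_by_item):
    """Return (item, median score, spread) tuples, most agreed-upon first."""
    summary = [
        (item, median(scores), pstdev(scores))
        for item, scores in scores_by_item.items()
    ]
    # A smaller spread means the respondents agreed more about the item.
    return sorted(summary, key=lambda row: row[2])


# Hypothetical scores (1 to 10) given by four respondents to three items.
judgments = {
    "item_a": [2, 2, 2, 2],
    "item_b": [1, 5, 9, 10],
    "item_c": [7, 8, 8, 7],
}
for item, mid_score, spread in summarize_judgments(judgments):
    print(item, mid_score, round(spread, 2))
```

Items at the top of the list are the ones respondents scored most consistently, so they are the natural candidates to keep, with the median serving as the item’s scale value.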
Likert-type scales are essential when measuring latent constructs, that is, characteristics of individuals such as opinions, attitudes, or feelings (Dittrich et al., 2007, p. 10). Latent constructs are usually thought of as unobservable individual characteristics. Typically, the scale uses statements with 3- to 7-point response scales (Dittrich et al., 2007, p. 13). Each item should be worded so that it contains only one distinct idea, so that it is clear exactly what the respondent is replying to. One should also avoid words such as “not” or other negative terms, as it becomes confusing what it means to disagree with a negative statement. The Likert scale is also useful when one wants to measure the intensity of opinion. Likert scales can be scored in several ways. One would code each item so that higher scores indicate more of the characteristic and then take the mean of all items (Dittrich et al., 2007, p. 15). However, the resulting number has no intrinsic meaning. For instance, in the measurement of political attitudes, a score of 4.4 means nothing beyond the respondent’s average across the items (Ax & Fagan, 2007, p. 69). Finally, one will usually want to check the reliability of a Likert-type scale using internal consistency (Dittrich et al., 2007, p. 14). Mathematically, internal consistency can be thought of as the mean of all possible split-half correlations.
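The scoring rule of coding every item in the same direction and then taking the mean can be sketched as follows. The 1 to 5 response range, the function name likert_mean, and the choice of which items are reverse-keyed are assumptions made for illustration.

```python
def likert_mean(responses, reverse_keyed=(), low=1, high=5):
    """Mean item score after reverse-coding the items listed in reverse_keyed."""
    recoded = [
        (low + high - value) if index in reverse_keyed else value
        for index, value in enumerate(responses)
    ]
    return sum(recoded) / len(recoded)


# Hypothetical respondent on four 5-point items; items 1 and 3 are reverse-keyed.
print(likert_mean([4, 2, 4, 2], reverse_keyed={1, 3}))  # 4.0
```

Reverse-coding maps the lowest response to the highest and vice versa, so that a high mean always indicates more of the characteristic being measured.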
Three item constructs (NEO-PI-R extraversion): Warmth | Assertiveness | Positive emotions
According to Rauthmann (2011), items meant to capture trait content vary widely in their formats and structures, yet they can be organized along specific dimensions: point of reference, construct indicators, item format, and contextuality/conditionality (p. 119). First, trait items may use staticity statements or descriptions such as ‘I go out and talk to people.’ Second, they may use frequency descriptions of mental processes and behaviors, for instance, ‘I regularly go out and talk to people.’ Third, they may use descriptions of the valency of one’s feelings towards something, for instance, ‘I like going out and talking to people’ (Norris & Lecavalier, 2010, p. 10). These three approaches can be regarded as general item formats into which most items fit. Staticity descriptions may also draw on a frequency or valency approach; it is worth noting that frequency and valency need not be specified within the item itself but can be established in the answer scale. Beyond the general item format, items also contain various construct-relevant indicators, mostly attributes, behaviors, mental processes, and contexts or situations (Rauthmann, 2011, p. 120). Moreover, items can be unconditional or conditional. Conditional items frequently use ‘if’ or ‘when’ phrases to specify a context in which particular mental processes and behaviors occur. Similarly, the perspective of items can be differentiated: a first-person item refers to the person’s own attributes, behaviors, and mental processes, as in ‘I am/feel/act/think/do’ (Ax & Fagan, 2007, p. 71). Alternatively, an item can refer possessively to the person’s mental processes, behaviors, or attributes, as in ‘my behavior, feelings, or thoughts.’
Factor Analysis and Internal Consistency Assessment
Most items meant to capture a trait construct can therefore be described by a four-way combination of the following dimensions. Point of reference includes first person, possessive, and construct indicators (Cheung et al., 2011, p. 593). General item approach comprises staticity, frequency, valency, and valency + frequency. Construct indicators are attribute, behavior, mental, and contextual. Conditionality includes conditional and unconditional (Rauthmann, 2011, p. 122). However, not all combinations are possible, and although most items have one standing on each of the four dimensions, some items may have two standings within a dimension.
All analyses were conducted on the NEO-PI-R. The instrument was chosen because of its wide applicability and comprehensiveness. The trait of extraversion is an essential aspect of human nature (Cheung et al., 2011, p. 593). The item format analyses were carried out according to four dimensions of items: point of reference, general item approach, construct indicators, and conditionality. Several categories within each dimension were used to assess each item’s standing on that dimension, so each item received a classification on all four dimensions. For instance, an item such as ‘I am assertive and dominant’ would be classified as a staticity item in an unconditional form with attribute indicators and a first-person point of reference (Ax & Fagan, 2007, p. 80). The table below shows the results of the formal item analysis for the NEO-PI-R extraversion facets (assertiveness, positive emotions, and warmth).
                  | % Point of reference   | % General item approach | % Construct indicator  | % Conditionality
Facet             | 1     2     3     4    | S     F     V     F+V   | A     B     M     C    | NO    YES
Warmth            | 87.5  0     12.5  0    | 75    0     25    0     | 25    37.5  37.5  0    | 100   0
Assertiveness     | 87.5  0     12.5  0    | 25    50    25    0     | 12.5  87.5  0     0    | 100   0
Positive emotions | 100   0     0     0    | 50    50    0     0     | 37.5  37.5  25    0    | 100   0

Point of reference: 1 - oneself, first person; 2 - one’s own characteristics; 3 - oneself and one’s traits, behaviors, and mental processes; 4 - construct-relevant indicators.
General item approach: S - staticity; F - frequency; V - valency; F+V - frequency and valency.
Construct indicator: A - attribute; B - behavioral; M - mental; C - contextual.
Conditionality: NO - unconditional; YES - conditional.
Point of reference: the first-person perspective was found in most items. Only two items had other individuals as the point of reference, and only two had construct indicators as the point of reference. Item approach: the approach used most in the NEO-PI-R extraversion facets was the staticity approach, followed by the frequency approach and finally the valency approach. Most of the extraversion items were thus in a staticity format, which makes no reference to frequencies or valencies. Construct indicators: most indicators in the extraversion facets were behavioral, although attribute and mental indicators also appeared. Finally, conditional items were scarce in the NEO-PI-R extraversion facets.
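The percentages in the table can be obtained by coding every item on a dimension and counting how often each category occurs. The sketch below assumes a facet of eight items, which is consistent with the multiples of 12.5 in the table; the item codes are hypothetical and are not the actual NEO-PI-R item classifications.

```python
from collections import Counter


def percentage_by_category(codes, categories):
    """Percentage of items falling into each category of one dimension."""
    counts = Counter(codes)
    return {category: 100 * counts[category] / len(codes) for category in categories}


# Hypothetical general item approach codes for the eight items of one facet.
approach_codes = ["S", "S", "S", "S", "S", "S", "V", "V"]
print(percentage_by_category(approach_codes, ["S", "F", "V", "F+V"]))
# {'S': 75.0, 'F': 0.0, 'V': 25.0, 'F+V': 0.0}
```

Running the same tally once per dimension (point of reference, general item approach, construct indicator, conditionality) yields one row of the table for a facet.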
References
Allemand, M., Zimprich, D., & Hendriks, A. A. (2008). Age differences in five personality domains across the life span. Developmental Psychology, 44(3), 758.
Ashton, M. C., & Lee, K. (2007). Empirical, theoretical, and practical advantages of the HEXACO model of personality structure. Personality and Social Psychology Review, 11, 150–166.
Ax, R. K., & Fagan, T. J. (2007). Corrections, mental health, and social policy: International perspectives. Charles C Thomas Publisher, pp. 61–80.
Bayoglu, B., Unal, O., Elibol, F., Karabulut, E., & Innocenti, M. S. (2013). Turkish validation of the PICCOLO (Parenting Interactions with Children: Checklist of Observations Linked to Outcomes). Infant Mental Health Journal, 34(4), 330–338.
Cheung, F. M., van de Vijver, F. J., & Leong, F. T. (2011). Toward a new approach to the study of personality in culture. American Psychologist, 66(7), 593.
Dittrich, R., Francis, B., Hatzinger, R., & Katzenbeisser, W. (2007). A paired comparison approach for the analysis of sets of Likert-scale responses. Statistical Modelling, 7(1), 3–28.
Fishman, J. A., & Galguera, T. (2003). Introduction to test construction in the social and behavioral sciences: A practical guide. Rowman & Littlefield Publishers, pp. 53–169.
Kline, T. (2005). Psychological testing: A practical approach to design and evaluation. Sage, pp. 29–75.
Norris, M., & Lecavalier, L. (2010). Evaluating the use of exploratory factor analysis in developmental disability psychological research. Journal of Autism and Developmental Disorders, 40(1), 8–20.
Rauthmann, J. F. (2011). Not only item content but also item format is important: Taxonomizing item format approaches. Social Behavior and Personality: An International Journal, 39(1), 119–128.
Revelle, W., Wilt, J., & Condon, D. M. (2011). Individual differences and differential psychology. The Wiley-Blackwell Handbook of Individual Differences, 1–38.
Ziegler, M., & Hagemann, D. (2015). Testing the unidimensionality of items, pp. 42–49.