Summary of Chapter 7

Scale Reliability & Validity

  • Why must we test scales?
    1. To ensure the scales indeed measure the unobservable construct we intended to measure (i.e. the scales are “valid”).
    2. To ensure they measure the intended construct consistently and precisely (i.e. the scales are “reliable”).
    3. Reliability and validity are the yardsticks against which the adequacy of our measurement procedures is evaluated in scientific research.
  • A measure can be reliable but not valid if it is measuring something very consistently but is consistently measuring the wrong construct.
  • A measure can be valid but not reliable if it measures the right construct, but not in a consistent manner.

Reliability

  • Reliability is the degree to which the measure of a construct is consistent or dependable. In other words, if we use this scale to measure the same construct multiple times, do we get the same result every time, assuming the underlying phenomenon is not changing?
    1. Reliability implies consistency but not accuracy.

What are the sources of unreliable observations in social science measurements?

  • The observer’s (or researcher’s) subjectivity.
  • Asking imprecise or ambiguous questions.
  • Asking questions about issues that the respondents are not familiar with or do not care about.

What are the different ways of estimating reliability?

  • Inter-rater reliability
    1. Also called inter-observer reliability, this is a measure of consistency between two or more independent raters (observers) of the same construct.
    2. Usually, this is assessed in a pilot study.
  • Test-retest reliability
    1. Measure of consistency between two measurements (tests) of the same construct administered to the same sample at two different points of time.
    2. The time interval between the two tests is critical.
      1. Generally, the longer the time gap, the greater the chance that the two observations may change during this time (due to random error).
  • Split-half reliability
    1. Measure of consistency between two halves of a construct measure.
    2. The longer the instrument, the more likely it is that the two halves of the measure will be similar (since random errors are minimized as more items are added).
  • Internal consistency reliability
    1. Measure of consistency between different items in a construct.
    2. If a multiple-item construct measure is administered to respondents, the extent to which respondents rate those items in a similar manner is a reflection of the internal consistency.
    3. Cronbach’s Alpha – a reliability measure designed by Lee Cronbach in 1951 that factors in scale size (the number of items) when estimating reliability (see the sketch after this list).
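
A minimal sketch in Python of how some of these reliability estimates can be computed. The four-item scale, the ten respondents, and all scores below are invented for illustration, and only NumPy is assumed.

```python
import numpy as np

# Hypothetical pilot data: 10 respondents answering a 4-item Likert scale
# (all values are made up), plus a second administration for test-retest.
items_t1 = np.array([
    [4, 5, 4, 5], [2, 2, 3, 2], [5, 5, 4, 4], [3, 3, 3, 4], [1, 2, 1, 2],
    [4, 4, 5, 5], [2, 3, 2, 2], [5, 4, 5, 5], [3, 4, 3, 3], [1, 1, 2, 1],
])
rng = np.random.default_rng(0)
items_t2 = np.clip(items_t1 + rng.integers(-1, 2, items_t1.shape), 1, 5)

# Test-retest reliability: correlation between total scores at the two times.
test_retest_r = np.corrcoef(items_t1.sum(axis=1), items_t2.sum(axis=1))[0, 1]

# Split-half reliability: correlate the two halves of the scale, then apply
# the Spearman-Brown correction to estimate full-length reliability.
r_halves = np.corrcoef(items_t1[:, :2].sum(axis=1), items_t1[:, 2:].sum(axis=1))[0, 1]
split_half = 2 * r_halves / (1 + r_halves)

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
k = items_t1.shape[1]
alpha = (k / (k - 1)) * (1 - items_t1.var(axis=0, ddof=1).sum()
                         / items_t1.sum(axis=1).var(ddof=1))

print(f"test-retest r = {test_retest_r:.2f}")
print(f"split-half (Spearman-Brown) = {split_half:.2f}")
print(f"Cronbach's alpha = {alpha:.2f}")
```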

Validity

  • Validity refers to the extent to which a measure adequately represents the underlying construct that it is supposed to measure.
  • Validity can be assessed using theoretical and empirical approaches and ideally both.
      1. Theoretical assessment focuses on how well the idea of a theoretical construct is translated into or represented in an operational measure.
      1. This is called translation validity and consists of two subtypes: face and content validity.
    2. Empirical assessment examines how well a given measure relates to one or more external criteria, based on empirical observations.
      1. This is called criterion-related validity and includes four subtypes: convergent, discriminant, concurrent and predictive.

What are the different ways to measure validity?

  • Face Validity
    1. Refers to whether an indicator seems to be a reasonable measure of its underlying construct “on its face”.
  • Content Validity
    1. Assessment of how well a set of scale items matches the relevant content domain of the construct that it is trying to measure.
    2. Requires a detailed description of the entire content domain of a construct.
  • Convergent Validity
    1. Refers to the closeness with which a measure relates to (or converges on) the construct that it is purported to measure.
  • Discriminant Validity
    1. Refers to the degree to which a measure does not measure (or discriminates from) other constructs that it is not supposed to measure.
    2. Usually convergent and discriminant validity are assessed jointly.
      1. Convergent and discriminant validity can be evaluated with bivariate correlations, exploratory factor analysis or the multi-trait multi-method (MTMM) approach (a correlation-based sketch follows this list).
  • Predictive Validity
    1. Degree to which a measure successfully predicts a future outcome that it is theoretically expected to predict.
  • Concurrent Validity
    1. Examines how well one measure relates to another concrete criterion that is presumed to occur simultaneously.
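
A minimal sketch of the bivariate-correlation approach mentioned above, using simulated data; the constructs, item counts, and noise levels are invented for illustration, and only NumPy is assumed. Items written for the same construct should correlate strongly with one another (convergent validity) and weakly with items written for a different construct (discriminant validity).

```python
import numpy as np

# Simulate item-level responses (made up): three items are meant to measure
# construct A and three are meant to measure construct B.
rng = np.random.default_rng(1)
n = 200
construct_a = rng.normal(size=n)
construct_b = rng.normal(size=n)
items_a = np.column_stack([construct_a + rng.normal(scale=0.5, size=n) for _ in range(3)])
items_b = np.column_stack([construct_b + rng.normal(scale=0.5, size=n) for _ in range(3)])

# 6 x 6 correlation matrix of all items.
corr = np.corrcoef(np.hstack([items_a, items_b]), rowvar=False)

# Convergent validity: correlations among items of the same construct (should be high).
within_a = corr[:3, :3][np.triu_indices(3, k=1)].mean()
within_b = corr[3:, 3:][np.triu_indices(3, k=1)].mean()
# Discriminant validity: correlations between items of different constructs (should be low).
between = corr[:3, 3:].mean()

print(f"mean within-construct correlation: {(within_a + within_b) / 2:.2f}")
print(f"mean between-construct correlation: {between:.2f}")
```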

Theory of Measurement

  • Classic Test or True Score Theory
    1. Psychometric theory that examines how measurement works, what it measures, and what it does not measure.
    2. The theory postulates that every measurement has a true score T that could be observed accurately if there were no errors in measurement.
    3. However, the presence of measurement errors E results in a deviation of the observed score X from the true score.

X = T + E   (observed score = true score + error)

  • Measurement errors can be of two types: random and systematic.
    1. Random error is the error that can be attributed to a set of unknown and uncontrollable external factors that randomly influence some observations but not others.
      1. Random error reduces the reliability of measurement by increasing the variability in observations.
    2. Systematic error is an error introduced by factors that systematically affect all observations of a construct across an entire sample.
      1. Systematic error reduces the validity of measurement by shifting the central tendency measure (both error types are simulated in the sketch below).
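
A small simulation of the X = T + E model, just to illustrate the two claims above; the true-score distribution, the error magnitudes, and the constant bias are all assumptions made for the example, and only NumPy is assumed.

```python
import numpy as np

# Classical test theory: observed score X = true score T + error E.
rng = np.random.default_rng(2)
true_scores = rng.normal(loc=50, scale=10, size=1000)      # T

random_error = rng.normal(loc=0, scale=8, size=1000)       # differs per observation
systematic_error = 5.0                                      # same bias for every observation

observed = true_scores + random_error + systematic_error    # X

# Random error lowers reliability: in classical test theory, reliability is
# var(T) / var(X), i.e. the squared correlation between observed and true scores.
reliability = np.corrcoef(true_scores, observed)[0, 1] ** 2

# Systematic error hurts validity by shifting the central tendency: the mean
# of the observed scores no longer matches the mean of the true scores.
mean_shift = observed.mean() - true_scores.mean()

print(f"estimated reliability ~= {reliability:.2f}")   # below 1 because of random error
print(f"mean shift ~= {mean_shift:.1f}")               # roughly the systematic bias
```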

Integrated Approach to Measurement Validation

  • A complete and adequate assessment of validity must include both theoretical and empirical approaches.
  • The integrated approach starts in the theoretical realm.
    1. Conceptualize the constructs of interest
    2. Select or create items or indicators for each construct based on our conceptualization of the construct.
    3. Use a Q-sort to refine and drop items.
    4. Examine face and content validity.
  • The integrated approach then moves into the empirical realm.
    1. Collect pilot test data.
    2. Run factor analysis to assess convergent/discriminant validity (a sketch follows this list).
    3. Examine reliability and scale dimensionality.
    4. Examine predictive validity.
    5. If the construct measures satisfy most or all of the requirements of reliability and validity, the operational measures are reasonably adequate and accurate.
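
A sketch of the factor-analysis step, assuming scikit-learn is available; the six items, two latent constructs, and loading values are invented pilot data. Items should load highly on their intended factor (convergent validity) and weakly on the other factor (discriminant validity).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical pilot data: six items, the first three driven by one latent
# construct and the last three by another (all numbers are made up).
rng = np.random.default_rng(3)
n = 300
latent = rng.normal(size=(n, 2))
loadings = np.array([
    [0.8, 0.0], [0.7, 0.1], [0.9, 0.0],   # items intended for construct 1
    [0.0, 0.8], [0.1, 0.7], [0.0, 0.9],   # items intended for construct 2
])
items = latent @ loadings.T + rng.normal(scale=0.4, size=(n, 6))

# Exploratory factor analysis with varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)

# Rows = items, columns = factors; a clean pattern of high loadings on the
# intended factor and near-zero cross-loadings supports validity.
print(np.round(fa.components_.T, 2))
```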

2 thoughts on “Summary of Chapter 7”

  1. Brittney Wright

    When learning about validity and reliability in the past, I would always use a picture to help me distinguish between the two. Being able to see what it means to have low and high validity and reliability is important.

    https://conjointly.com/kb/reliability-and-validity/

    I had never heard of Cronbach’s alpha before this class, so I decided to do some extra research to make sure that I understood the concept better and when it should be used. I found that Cronbach’s alpha is the most common measure of internal reliability. It is most commonly used when there are multiple Likert questions in a survey or questionnaire that form a scale.

  2. Renee Saxton

    Chapter Seven made clear the importance of validating collected information to make sure it is reliable. The bullseye diagram was really clarifying, and it also points out potential pitfalls for a researcher. The cluster showing a tightly grouped set outside the mark denotes a corrupted data set. I’m most concerned about that potential in my research. As researchers, we can certainly corrupt findings if we approach the research with unintentional biases. The importance of selecting from among the many approaches suggested in the chapter cannot be overstated. Just reading through the validation options, I’ll probably use the internal consistency reliability metric. I’m learning to appreciate the number of choices available rather than becoming overwhelmed by them.
