
Measurement Errors

Some systematic and random errors may occur during measurement (Table 1). Of interest to clinical trials are the strategies to reduce performance bias (additional therapeutic interventions preferentially provided to one of the groups) and to limit information and detection bias (ascertainment or measurement bias) by masking (blinding) [9]. Masking is a process whereby people are kept unaware of which interventions have been used throughout the study, including when the outcome is being assessed. Patient/clinician blinding is not always practical or feasible, such as in trials comparing surgery with non-surgery, diets, and lifestyles. Finally, measurement error can occur in the statistical analysis of the data. Important elements to specify in the protocol include: the definition of the primary and secondary outcome measures; how missing data will be handled (depending on the nature of the data, different techniques apply); the subgroup (secondary) analyses of interest; consideration of multiple comparisons and the inflation of the type I error rate as the number of tests increases; the potential confounders to control for; and the possible effect modifiers (interactions). This issue has implications for modeling techniques and is discussed in subsequent chapters.
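
As a rough illustration of this inflation (a minimal sketch, not part of the source text), the following Python snippet computes the family-wise error rate for k independent tests, each run at alpha = 0.05, along with the conventional Bonferroni remedy:

    # The chance of at least one false-positive finding across k independent
    # tests, each run at significance level alpha:
    #   P(at least one type I error) = 1 - (1 - alpha)^k
    alpha = 0.05
    for k in (1, 5, 10, 20):
        fwer = 1 - (1 - alpha) ** k
        print(f"{k:2d} tests: family-wise error rate = {fwer:.3f}")
    # With 20 tests the rate is ~0.64, i.e., roughly a 64 % chance of at least
    # one spurious "significant" result. A common (conservative) remedy is the
    # Bonferroni correction: test each comparison at alpha / k.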

External and Internal Validity

The operational criteria applied in the design influence the external and internal validity of the study (Fig. 3). Both construct validity and external validity relate to generalization. However, construct validity involves generalizing from the study to the underlying concept of the study. It reflects how well the variables in the study (and their relationships) represent the phenomena of interest. For example, how well does the level of proteinuria represent the presence of kidney disease? Construct validity becomes important when a complex process, such as care for chronic kidney disease, is being described. Maintaining consistency between the idea or concept of a certain care program and the operational details of its specific components in the study may be challenging. External validity involves generalizing conclusions from the study context to other people, places, or times. External validity is reduced if study eligibility criteria are strict, or the exposure or intervention is hard to reproduce in practice. The closer the intended sample is to the target population, the more relevant the study is to this wider, but defined, group of people, and the greater is its external validity. The same applies to the chosen intervention, control, and outcome, including the study context. The internal validity of a study depends primarily on the degree to which bias is minimized. Selection, measurement, and confounding biases can all affect the internal validity.

Fig. 3 Structure of study design. The left panel represents the design phase of a study, when Patient, Intervention, Control and Outcome (PICO) are defined (conceptualization and operationalization). The right panel corresponds to the implementation phase. Different types of bias can occur during sampling, data collection, and measurement. The extent to which the results of the study can be considered true and generalizable depends on its internal and external validity. With permission, Ravani et al., Nephrol Dial Transplant

In any study there is always a balance between external and internal validity, as it is difficult and costly to maximize both. Designs that have strict inclusion and exclusion criteria tend to maximize internal validity while compromising external validity. Internal validity is especially important in efficacy trials, to understand the maximum likely benefit that might be achieved with an intervention, whereas external validity becomes more important in effectiveness studies. Involvement of multiple sites is an important way to enhance both internal validity (faster recruitment, quality control, and standardized procedures for data collection, management, and analysis) and external validity (generalizability is enhanced because the study involves patients from several regions).

Clinical Relevance vs. Statistical Significance

The concepts of clinical relevance and statistical significance are often confused. Clinical relevance refers to the amount of benefit or harm apparently resulting from an exposure or intervention that is sufficient to change clinical practice or health policy. In planning the study sample size, the researcher has to determine the minimum level of effect that would have clinical relevance. The level of statistical significance chosen is the probability that the observed results are due to chance alone. This corresponds to the probability of making a type I error, i.e., claiming an effect when in fact there is none. By convention, this probability is usually 0.05 (but can be as low as 0.01). The P value or the limits of the appropriate confidence interval (a 95 % interval is equivalent to a significance level of 0.05, for example) is examined to see if the results of the study might be explained by chance. If P < 0.05, the null hypothesis of no effect is rejected in favor of the study hypothesis, although it is still possible that the observed results are simply due to chance. However, since statistical significance depends on both the magnitude of effect and the sample size, trials with very large sample sizes can in theory detect statistically significant but very small effects that are of no clinical relevance.
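
A minimal Python sketch (with illustrative numbers; nothing here comes from the source) shows how a fixed, clinically trivial effect becomes statistically significant once the sample is large enough:

    from math import sqrt, erf

    def two_sample_z_pvalue(diff, sd, n_per_arm):
        """Two-sided p-value for a difference in means (known SD, z-test)."""
        se = sd * sqrt(2.0 / n_per_arm)          # standard error of the difference
        z = abs(diff) / se
        phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at |z|
        return 2.0 * (1.0 - phi)

    # Hypothetical trial: a 0.5 mmHg mean blood-pressure difference (SD 10 mmHg),
    # far below any minimum clinically relevant effect.
    for n in (100, 1_000, 100_000):
        print(f"n = {n:>7,} per arm: p = {two_sample_z_pvalue(0.5, 10.0, n):.2g}")
    # The p-value shrinks toward 0 as n grows, although the 0.5 mmHg effect
    # itself never changes and remains clinically irrelevant.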

Hierarchy of Evidence
Fundamental to evidence-based health care is the concept of a "hierarchy of evidence" deriving from the different study designs addressing a given research question (Fig. 4). Evidence grading is based on the idea that different designs vary in their susceptibility to bias and, therefore, in their ability to predict the true effectiveness of health care practices. For assessment of interventions, randomized controlled trials (RCTs) or systematic reviews of good-quality RCTs are at the top of the evidence pyramid, followed by longitudinal cohort, case–control, and cross-sectional studies, with case series at the bottom [10]. However, the choice of the study design depends on the question at hand and the frequency of the disease. Intervention questions are ideally addressed with experiments (RCTs), since observational data are prone to unpredictable bias and confounding that only the randomization process can control. Appropriately designed RCTs also allow stronger causal inference for disease mechanisms. Prognostic and etiologic questions are best addressed with longitudinal cohort studies, in which exposure is measured first and participants are followed forward in time. At least two (and possibly more) waves of measurements over time are undertaken. Initial assessment of an input–output relationship may derive from case–control studies, where the direction of the study is reversed.

Fig. 4 Examples of study designs. In cross-sectional studies, inputs and output are measured simultaneously and their relationship is assessed at a particular point in time. In case–control studies, participants are identified based on the presence or absence of the disease and the temporal direction of the inquiry is reversed (retrospective). Temporal sequences are better assessed in longitudinal cohort studies, where exposure levels are measured first and participants are followed forward in time. The same occurs in randomized controlled trials (RCTs), where the assignment of the exposure is under the control of the researcher. With permission, Ravani et al., Nephrol Dial Transplant [20]. P Probability (or risk)

Participants are identified by the presence or absence of disease, and exposure is assessed retrospectively. Cross-sectional studies may be appropriate for an initial evaluation of the accuracy of new diagnostic tests as compared to a gold standard. Further assessments of diagnostic programs are performed with longitudinal studies (observational and experimental). Common biases afflicting observational designs are defined in Chapter 3 and discussed in more detail in Chapter 20.
Experimental Designs for Intervention Questions
The RCT design is appropriate for assessment of the clinical effects of drugs, procedures, or care processes, and for definition of target levels in risk factor modification (e.g., blood pressure, lipid levels, and [...] them was no more than 3 mmHg. Non-inferiority trials have been criticized, as imperfections in study execution, which tend to prevent detection of a difference between treatments, actually work in favor of a conclusion of non-inferiority. Thus, in distinction to the usual superiority trial, poorly done studies may lead to the desired outcome for the study sponsor.
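
To make the non-inferiority logic concrete, here is a minimal Python sketch of the usual confidence-interval decision rule; the numbers are hypothetical, and only the 3 mmHg margin is taken from the text:

    def noninferior(mean_diff, se, margin, z_crit=1.96):
        """Conclude non-inferiority if the upper confidence limit of the mean
        difference (new minus standard, in mmHg; higher is worse) stays below
        the pre-specified margin."""
        return mean_diff + z_crit * se < margin

    print(noninferior(2.0, 0.8, 3.0))  # False: 2.0 + 1.96*0.8 = 3.57 exceeds 3
    # Sloppy execution (non-adherence, crossovers) dilutes the observed
    # difference toward zero and can flip the conclusion, which is the
    # criticism raised in the text:
    print(noninferior(1.0, 0.8, 3.0))  # True: 1.0 + 1.96*0.8 = 2.57 < 3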

Designs for Diagnostic Questions


When assessing a diagnostic test, the reference or "gold standard" tests for the suspected target disorders are often either inaccessible to clinicians or avoided for reasons of cost or risk. Therefore, the relationship between more easily measured phenomena (patient history, physical and instrumental examination, and levels of constituents of body fluids and tissues) and the final diagnosis is an important subject of clinical research. Unfortunately, even the most promising diagnostic tests are never completely accurate. Clinical implications of test results should ideally be assessed in four types of diagnostic studies. Table 4 shows examples from diagnostic studies of troponins in coronary syndromes. As a first step, one might compare test results among those known to have established disease to results from those free of disease. Cross-sectional studies can address this question (Fig. 4). However, since the direction of interpretation is from diagnosis back to the test, the results do not assess test performance. To examine test performance requires data on whether those with positive test results are more likely to have the disease than those with normal results [12]. When the test variable is not binary (i.e., when it can assume more than two values), it is possible to assess the trade-off between sensitivity and specificity at different test result cutoff points [13]. In these studies it is crucial to ensure independent, blind assessment of the results of the test being assessed and of the gold standard to which it is compared, without the completion of either being contingent on the results of the other. Longitudinal studies are required to assess diagnostic tests aimed at predicting future prognosis or development of established disease [12]. The most stringent evaluation of a diagnostic test is to determine whether those tested have more rapid and accurate diagnosis, and as a result better health outcomes, than those not tested. The RCT design is the proper tool to answer this type of question [14].
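
The sensitivity/specificity trade-off can be made concrete with a minimal Python sketch (hypothetical test values, not the troponin data of Table 4):

    # Hypothetical continuous test values (these are NOT the troponin data
    # referred to in Table 4 of the text).
    diseased = [0.8, 1.2, 2.5, 3.1, 4.0, 5.6]  # patients with the disease
    healthy  = [0.1, 0.2, 0.4, 0.6, 0.9, 1.5]  # patients without the disease

    for cutoff in (0.5, 1.0, 2.0):
        sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
        specificity = sum(x < cutoff for x in healthy) / len(healthy)
        print(f"cutoff {cutoff}: sensitivity = {sensitivity:.2f}, "
              f"specificity = {specificity:.2f}")
    # Raising the cutoff trades sensitivity for specificity: fewer true cases
    # are flagged, but fewer disease-free patients are falsely labeled positive.
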
Maximizing the Validity of Non-experimental Studies
When randomization is not feasible, knowledge of the most important sources of bias is important for increasing the validity of any study. Randomization may be infeasible for a variety of reasons: study participants cannot be assigned to intervention groups by chance, either for ethical reasons (e.g., in a study of smoking) or because of participant willingness (e.g., comparing hemodialysis to peritoneal dialysis); the exposure is fixed (e.g., gender); or the disease is rare and participants cannot be enrolled in a timely manner. When strategies are in place to prevent bias, results of non-experimental studies may approach those of rigorous RCTs.

Reporting
Adequate reporting is critical to the proper interpretation and evaluation of any study results. Guidelines for reporting primary (CONSORT, STROBE, and STARD, for example) and secondary studies (PRISMA) are in place to help both investigators and consumers of clinical research [15-18]. Scientific reports may not fully reflect how the investigators conducted their studies, but the quality of the scientific report is a reasonable marker for how the overall project was conducted. The interested reader is referred to the above referenced citations for more details of what to look for in reports from prognostic, diagnostic, and intervention studies.
