Review Paper
Moderator: Gail Furman, PhD, MSN

A Methodological Review of the Assessment of Humanism in Medical Students
Era Buck, PhD, Mark Holden, MD, and Karen Szauter, MD

Correspondence: Era Buck, PhD, Office of Educational Development, University of Texas Medical Branch, 301 University Blvd., Galveston, TX 77555-0408; telephone: (409) 772-3235.
Acad Med. 2015;90:S14–S23. doi: 10.1097/ACM.0000000000000910
Academic Medicine, Vol. 90, No. 11 / November Supplement 2015. Copyright © by the Association of American Medical Colleges. Unauthorized reproduction of this article is prohibited.

Abstract

Background
Humanism is a complex construct that defies simplistic measurement. How educators measure humanism shapes understanding and implications for learners. This systematic review sought to address the following questions: How do medical educators assess humanism in medical students, and how does the measurement impact the understanding of humanism in undergraduate medical education (UME)?

Method
Using the IECARES (integrity, excellence, compassion, altruism, respect, empathy, and service) Gold Foundation framework, a search of English literature databases from 2000 to 2013 on assessment of humanism in medical students revealed more than 900 articles, of which 155 met criteria for analysis. Using descriptive statistics, articles and assessments were analyzed for construct measured, study design, assessment method, instrument type, perspective/source of assessment, student level, validity evidence, and national context.

Results
Of 202 assessments reported in 155 articles, 162 (80%) used surveys; 164 (81%) used student self-reports. One hundred nine articles (70%) included only one humanism construct. Empathy was the most prevalent construct present in 96 (62%); 49 (51%) of those used a single instrument. One hundred fifteen (74%) used exclusively quantitative data; only 48 (31%) used a longitudinal design. Construct underrepresentation was identified as a threat to validity in half of the assessments. Articles included 34 countries; 87 (56%) were from North America.

Conclusions
Assessment of humanism in UME incorporates a limited scope of a complex construct, often relying on single quantitative measures from self-reported survey instruments. This highlights the need for multiple methods, perspectives, and longitudinal designs to strengthen the validity of humanism assessments.

Humanism provides the passion that animates authentic professionalism.
Jordan J. Cohen

The optimal practice of medicine requires much more than mastering the rapidly evolving foundational and clinical sciences. Humanism is core to the provision of excellent patient care. Fostering the development of humanistic physicians poses a great challenge to medical educators as methods to promote and assess humanism throughout the educational continuum are methodologically complex and highly contextual.

The importance of professionalism in health care has been described and studied extensively over the past two decades. Finding a common definition for professionalism has been challenging, as most agree that this construct has many facets. Several definitions of professionalism include a focus on the humanistic qualities of health care providers.1 Dr. Jordan Cohen posits a unique relationship between these two constructs, defining professionalism as a “way of acting” and humanism as “a way of being.”2 He argues that behavior grounded in deep-seated humanistic qualities is more likely to be sincere and resistant to the many challenges posed by the systems in which health care providers train and work. The Arnold P. Gold Foundation suggests integrity, excellence, compassion, altruism, respect, empathy, and service (IECARES) as the core characteristics of humanism.3

Measuring complex attitudes and behaviors poses a special challenge to medical educators. What we know about humanism (its core attributes, our ability to reinforce desirable qualities, and the variability of humanism within an individual over time) is shaped by the way we measure it. Assessment strategies must offer valid measures, be practical, and facilitate contextual interpretation.4 High-quality assessment is an essential foundation for providing feedback to learners, evaluating curricular strategies, and refining our theoretical and conceptual understanding of humanism among medical students.

Measuring a construct as multidimensional as humanism requires a variety of techniques and approaches. Closely examining the existing tools and methods in current use is imperative to inform the design of future educational programs and assessment activities. We undertook this work to better understand the current approaches to the measurement of humanism in undergraduate medical education. We specifically asked, “How do medical educators assess humanism in medical students, and how does the measurement impact what we know about humanism in undergraduate medical education?”

Method

We conducted a systematic narrative examination of the literature to better understand how constructs of humanism were represented throughout work related to medical students. The review process and data abstraction were conducted according to guidelines for systematic reviews, and the synthesis was primarily narrative rather than statistical. This approach allowed inclusion of qualitative and quantitative research studies in the review. Specifically, we sought to examine approaches to the assessment of humanism, anticipating that this might illuminate types of bias present in the literature and identify gaps in the assessment of humanism in undergraduate medical education.

Search strategy
In consultation with a reference librarian, we conducted electronic searches for English-language articles published between 2000 and 2013 in PubMed, ERIC, CINAHL, and Web of Science databases.
Search terms included undergraduate medical education combined with each of the humanism constructs included in IECARES: integrity, excellence, compassion, altruism, respect, empathy, and service, as well as the term humanism.3 We identified additional articles during the review process by carefully examining the titles in the reference lists of all included and excluded articles. In addition, we conducted forward citation searches for 76 review articles and conceptual or theoretical articles identified in the review process. This iterative process was continued throughout the study period until saturation was reached; that is, until all articles identified in reference lists as potentially relevant for review were already in the database of retrieved articles.

Inclusion and exclusion criteria
An initial review of each article determined whether it met basic inclusion or exclusion criteria. To progress to full review, an article must have met all inclusion criteria and none of the exclusion criteria.

Inclusion criteria. (1) Peer-reviewed articles; (2) assessment of medical students; (3) sufficient description of assessment to allow validity assessment; and (4) assessment of one of the following: integrity, excellence, compassion, altruism, respect, empathy, service, humanism, or identity development.

We relied on the designation within each article of the construct being assessed as well as the following definitions3:
• Integrity is congruence between values and behavior.
• Excellence is defined in our review as a commitment to clinical competence stemming from a sense of duty to do what is best for patients. We did not include articles assessing levels of competence at performing clinical skills.
• Compassion refers to an awareness and acknowledgment of suffering coupled with a desire to relieve it.
• Altruism is the willingness and ability to put others’ needs before one’s own.
• Respect is regard for autonomy and values of another including patients and colleagues.
• Empathy is defined broadly as the ability to put oneself in another’s situation.
• Service refers to giving beyond what is required.
• Humanism in health care is characterized by a respectful and compassionate relationship between physicians, as well as all other members of the health care team, and their patients. It reflects attitudes and behaviors that are sensitive to the values and the cultural and ethnic backgrounds of others.
• Identity development refers to the transformative process of becoming a humanistic physician.5

Exclusion criteria. (1) Review articles, (2) duplicate reports, or (3) insufficient description of assessment method.

Articles were reviewed independently by at least two members of the study team to determine whether they met the criteria listed above, and discrepancies were resolved by consensus.

Data abstraction
Data abstraction began with an iterative process of developing a data collection form. The coinvestigators met for an intensive training over two days. During that time, common articles were reviewed individually by each member of the team, followed by group discussion of each article. Coding criteria were explicated through consensus. As data collection proceeded, we met weekly to discuss coding issues and maintain consistency among the team as reviewers. For each article, the following information was considered:

Target(s) of assessment. We coded each article for integrity, excellence, compassion, altruism, respect, empathy, service, humanism, and/or identity development. We allowed multiple categories to be coded for a single article.

Method(s) of assessment. We coded each method as quantitative or qualitative; survey, interview, observation, or reflection; and cross-sectional or longitudinal. For qualitative studies, prompts were recorded on the data abstraction form.
We allowed for multiple methods to be coded for a single article.

Measurement and variable types. We coded assessments as dichotomous, continuous, ordinal, or descriptive and as independent or outcome variables. There were multiple categories coded for each article.

Setting or context of assessment. We coded the nation where data were collected; level of training for students assessed; and setting of assessment.

Perspective or source of assessment. We coded perspectives as self, patient, peer, investigator, faculty, standardized patient (SP), or clinical team member. Each assessment in an article was coded for perspective; for instance, an article could be coded as having a self-report survey and an observation by an SP.

Validity evidence. Sources of evidence were coded as content, response process, internal structure, relationship to other variables, or relationship to consequences.

Threats to validity. We coded for construct underrepresentation and/or construct-irrelevant variance. When considering the validity of the assessment, we used criteria based on the work of Downing and Haladyna.6,7

When authors identified specific constructs as the target of assessment, we coded them to be consistent with the authors’ designation. In articles where the author was not explicit about target constructs, we made coding decisions as a team. For example, unless otherwise designated by the authors, we included articles assessing patient orientation or patient centeredness as respect, and we coded assessments of emotional intelligence as compassion.

Analysis
From the plethora of potential analyses for these articles, we focused on those that most directly informed our research question, cataloging assessment strategies and synthesizing that information to understand the impact on the understanding we derive from the literature.
We sought to identify areas of strength, gaps, and potential bias that may impact the understanding of medical educators as consumers of the literature. We employed descriptive frequencies and proportions to analyze the data as a whole. On the basis of an initial examination of the data, we also examined subgroups of data, most notably studies assessing empathy among medical students. We assessed the strength of the validity evidence provided by authors as well as threats to the validity of the assessments provided. Because we were not synthesizing the study results, we did not evaluate the quality of study designs.

Results

Trial flow
Figure 1 summarizes the results of the search process. Through the literature search based on key words along with forward and backward citation searches, over 900 articles received at least an initial review. Initial review resulted in exclusion of over 660 articles not meeting the complement of inclusion criteria. Additional articles were excluded after full review when in-depth reading revealed that the assessment of humanism was not of medical students or the description of the assessment approach was not sufficient for the reviewer to determine what might contribute to or detract from the validity of the information in the article. After full review, 155 articles8–162 were included in the database for analysis.

Study characteristics
The literature we reviewed was published entirely in English yet was very international in origin. Studies with multiple assessments resulted in a total of 202 assessments recorded in our database. Some analyses consider data at the article level and some at the assessment level as needed to provide the most informative results. A summary of articles, organized by construct, is shown in Table 1.
Of the assessments (n = 202), 80% (n = 162) were surveys; 12% (n = 25) were observations (21 of which were ratings by SPs); and the remaining 8% (n = 16) were interviews and assessments of medical students’ reflective writing. Similarly, 81% (n = 164) were from the perspective of the students (self-report), 10% (n = 20) from the perspective of an SP, and the remainder distributed among faculty and peers. We reviewed a single paper23 using information from actual patients about students.

The total number of citations to these articles is 4,551. Six (3%) articles have been cited more than 100 times, an additional 6 were cited between 50 and 99 times, and 28 articles were cited 25 to 49 times. The remainder (n = 115) were cited less than 25 times.

Whereas integrity, altruism, and service were infrequently assessed and most often in combination with additional constructs, global humanism, compassion, respect, and empathy were assessed frequently and often independently of other constructs. The most prevalent assessment in this literature was some form of the Jefferson Scale of Empathy, a self-report survey.

The articles (n = 155) represented data from 34 countries, with 56% (n = 87) occurring in North America. Seventy percent (n = 109) of articles considered a single construct, with 18% (n = 28) of articles considering two and 12% (n = 18) three or more constructs.

Figure 1. Trial flow for review.

Table 1. Data Summary Organized by Constructs
Columns: Construct; Articles (a); Design (b); Method (c); Instrument (d); Perspective (e); QN (f); QL (g); Mixed (h); Validity (i); Citations (j) (range); Nations (k).

Integrity8–12
Articles: 5. Design: Cr (3), Lon (2). Method: Survey (5), Int (1), Multi (1). Instrument: all study specific. Perspective: Self (4), Invest (1), Peer (1), Multiple (1). QN: 3. QL: 1. Mixed: 1. Validity: Hi (3), Mod (2). Citations: 170 (3–95). Nations: 4.

Commitment to excellence8,13–17
Articles: 6. Design: Cr (5), Lon (1). Method: Survey (4), Obs (2), Reflect (1), Multi (1). Instrument: all study specific. Perspective: Self (4), Faculty (1), SP (1), Peer (4), Multi (2). QN: 5. QL: 1. Mixed: 0. Validity: Hi (2), Mod (4). Citations: 70 (3–21). Nations: 1.

Compassion9,18–49
Articles: 33. Design: Cr (21), Lon (12). Method: Survey (25), Int (2), Obs (7), Reflect (5), Multi (6). Instrument: Study specific (12), PPOS (6). Perspective: Self (25), Faculty (2), Patient (1), SP (3), Invest (8), Multi (6). QN: 22. QL: 8. Mixed: 4. Validity: Hi (12), Mod (18), Low (3). Citations: 544 (0–104). Nations: 11.

Altruism8,10,18,19,50–56
Articles: 11. Design: Cr (8), Lon (3). Method: Survey (7), Int (2), Reflect (2). Instrument: all study specific. Perspective: Self (7), Invest (3), Faculty (1), Peer (1), Multi (2). QN: 7. QL: 2. Mixed: 2. Validity: Hi (4), Mod (6), Low (2). Citations: 207 (0–95). Nations: 5.

Respect8,9,13,18–34,50,57–75
Articles: 40. Design: Cr (27), Lon (13). Method: Survey (28), Obs (14), Reflect (5), Int (2), Multi (8). Instrument: Study specific (18), PPOS (13), Doctor patient scale (3). Perspective: Self (30), Invest (8), Faculty (2), Patient (1), Peer (1), Team (1), Multi (9). QN: 27. QL: 8. Mixed: 5. Validity: Hi (17), Mod (18), Low (5). Citations: 1,003 (1–163). Nations: 12.

Empathy20–26,35–44,51,57–62,76–147
Articles: 96. Design: Cr (66), Lon (30). Method: Survey (86), Int (3), Obs (13), Reflect (6), Multi (14). Instrument: JSE (49), Study specific (17), IRI (8), BEES (4). Perspective: Self (87), Invest (9), SP (7), Faculty (6), Patient (1), Peer (2), Multi (17). QN: 80. QL: 10. Mixed: 7. Validity: Hi (27), Mod (46), Low (23). Citations: 2,147 (0–165). Nations: 23.

Service14,27,76,148
Articles: 4. Design: Cr (2), Lon (2). Method: Survey (4). Instrument: Study specific (2), PVIPS, MSATU. Perspective: all self. QN: 4. QL: 0. Mixed: 0. Validity: Mod (4). Citations: 59 (5–45). Nations: 1.

Humanism11,14,15,18,20,28,35,45,52,53,56,77,78,149–158
Articles: 23. Design: Cr (18), Lon (5). Method: Survey (8), Reflect (6), Int (2), Obs (2), Multiple (3). Instrument: Study specific (20). Perspective: Self (7), Faculty (1), Peer (1), Invest (6), Team (1), Multi (6). QN, QL, Mixed, Nations: [unclear in source]. Validity: Hi (10), Mod (13). Citations: 304 (0–38).

Identity formation9,29,30,45,54,77,159–162
Articles: 10. Design: Cr (7), Lon (3). Method: Survey (5), Reflect (2), Int (3), Obs (1), Multi (1). Instrument: Study specific (8). Perspective: Self (6), Invest (5), Multi (1). QN, QL, Mixed: [unclear in source]. Validity: Hi (3), Mod (7). Citations: 388 (7–104). Nations: 4.

a Articles column is the number of articles included in the review addressing the specified construct.
b Design identifies how many studies in the category were cross-sectional (Cr) and how many were longitudinal (Lon).
c Method indicates how many articles used surveys (Survey), interviews (Int), observation (Obs), reflection (Reflect), and combinations of these (Multi).
d Instrument identifies published assessment tools used multiple times in the category and the number of tools developed for a single study (study specific).
e Perspective indicates who completed the assessment, the student being assessed (self), a faculty evaluator (faculty), a peer, a standardized patient (SP), clinical team member(s) (team), the article author (invest), patient, and a combination of these (multi).
f QN indicates the number of articles reporting only quantitative assessment.
g QL indicates the number of articles reporting only qualitative assessment.
h Mixed indicates the number of articles reporting both quantitative and qualitative assessments.
i Validity indicates the number of categories of validity threats: 0 (Hi), 1 (Mod), 2 (Low).
j Citations provides the total number of citations for the set of articles and the range of citations per article.
k Nations indicates the number of nations represented in the set of articles for the specified construct.

Study quality
Construct underrepresentation was identified as a threat to validity for 50% (n = 101) of assessment reports; construct-irrelevant variance was found for 28% (n = 56) of measurements. Both threats to validity were seen in 12.3% (n = 25) of all assessments. No threats to validity were coded in 36.2% (n = 73), whereas 51.5% (n = 104) were coded as having one category of threat. Examples of construct underrepresentation include using single observations as the basis for scoring a student on a construct and an instrument containing questions regarding students’ beliefs about the importance of physician empathy as a measure of the construct of empathy. Examples of situations leading to a threat of construct-irrelevant variance include poor interrater agreement and poor response rates.

Synthesis
Some assessments addressed multiple constructs; for instance, a rating scale completed by an SP might address compassion and respect. Thus, some articles are reported related to more than one construct.

Integrity. An assessment of student integrity appeared in 3% (n = 5) of the total set of 155 articles. All of the 5 articles assessing integrity also assessed at least one additional construct, and 2 of the articles assessed more than three constructs. The assessments reported for integrity in these articles were all designed specifically for the study reported in the article; none used assessment tools with published psychometric data.

Excellence. Four of the six articles assessing students’ commitment to clinical excellence also contained assessments of additional constructs. The articles all (n = 6) reported study-specific assessments.

Compassion. Compassion was assessed in 22% (n = 34) of the included articles. Self-report surveys in cross-sectional designs were the modal approach of these articles. Assessments of emotional intelligence constituted 20% (n = 7) of the assessments of compassion, 6 of which also measured empathy. The most frequently used instrument for assessing compassion was the Patient–Practitioner Orientation Scale,163 a measure of patient centeredness comprising two factors (caring and sharing), reported in 6 studies.26,28,30–32,34

Altruism. Altruism was assessed in 7% of the included articles (n = 11). This was accomplished primarily using self-report surveys developed specifically for the study (see Table 1 for details). Ten of the 11 articles included assessments of additional constructs associated with humanism.

Respect. Respect was assessed in 26% (n = 40) of all included articles. Whereas 8 contained multiple assessment strategies, 17 used surveys. Most articles relied on self-report data. The most common instrument used for assessing respect was the Patient–Practitioner Orientation Scale.163 Most of the studies were cross-sectional and used quantitative measures (see Table 1 for details).

Empathy. Empathy was the most commonly assessed construct, included in 62% (n = 96) of the articles. The articles represent work conducted in 23 countries (n = 51 conducted in the United States). The approach most commonly used was self-report survey. Seventy-three articles used only surveys, and an additional 8 combined survey with another approach. Just over 50% (n = 49) of the articles used a form of the Jefferson Scale of Empathy164 including versions in eight languages in addition to English.
The next most commonly used instrument to assess empathy was the Interpersonal Reactivity Index165 in 5% (n = 8) of articles. Assessment tools developed specifically for the study reported and having no published psychometric data were used in 11% (n = 17) of the articles assessing empathy.

Service. Assessments of service were reported in 2.5% (n = 4) of the articles; three of the four articles included additional humanism constructs from IECARES. The fourth article148 used the Physician Values in Practice Scale comprising scales for prestige, service, autonomy, lifestyle, management, and scholarly pursuits. Thus, none of the articles assessed only the construct of service. All of this small set of articles contained data collected in the United States.

Humanism. Humanism was assessed broadly in 14% (n = 22) of the articles included in this review. Less than half (n = 8) of these used survey alone, and 45% (n = 10) used qualitative or mixed methods for assessment. These studies were conducted in seven countries, with 16 of the 23 conducted in the United States.

Identity formation. Identity formation as a humanistic physician was assessed in 6% (n = 10) of the articles. Half of the set used survey assessments, and half used qualitative methods. Four of the five surveys were developed specifically for the study reported in the article.

For the entire set of articles, 67.7% (n = 105) used only surveys as assessment tools, and 13.5% (n = 21) used multiple assessments. Likewise, 64.5% (n = 100) used assessments based solely on the perspective of the students (i.e., self-report), and 13.5% (n = 21) included assessments from multiple perspectives. Articles analyzing only quantitative data constituted 74.2% (n = 115), whereas 17.4% (n = 27) used a qualitative approach. Mixed methods were identified in 8.4% (n = 13). Designs were predominantly cross-sectional (69% [n = 107]), with only 31% (n = 48) employing a longitudinal approach.
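The article-level proportions in the preceding paragraph follow directly from the reported counts and the denominator of 155 articles. The short Python sketch below recomputes them as an illustration of the "descriptive frequencies and proportions" described in the Analysis section. It is not the authors' analysis code: the names `TOTAL_ARTICLES`, `article_counts`, and `percent` are hypothetical, and only the counts themselves are taken from the text.

```python
# Illustrative sketch only: recompute the article-level percentages from
# the counts reported in the Results (denominator: 155 articles).
# All names here are hypothetical and do not come from the original study.

TOTAL_ARTICLES = 155

# Counts as reported in the text, keyed by a descriptive label.
article_counts = {
    "surveys only": 105,
    "multiple assessments": 21,
    "self-report only": 100,
    "multiple perspectives": 21,
    "quantitative only": 115,
    "qualitative only": 27,
    "mixed methods": 13,
    "cross-sectional": 107,
    "longitudinal": 48,
}


def percent(count, total=TOTAL_ARTICLES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage for every reported count.
percentages = {label: percent(n) for label, n in article_counts.items()}

if __name__ == "__main__":
    for label, pct in percentages.items():
        print(f"{label}: {article_counts[label]}/{TOTAL_ARTICLES} = {pct}%")
```

Two internal checks fall out of the same counts: the quantitative, qualitative, and mixed categories sum to 155 (115 + 27 + 13), as do the cross-sectional and longitudinal designs (107 + 48).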
Discussion

Analysis at both the level of assessment and article revealed a predominance of surveys in this literature on the assessment of humanism in undergraduate medical education. Notable exceptions to this pattern were for the constructs of respect and humanism. Articles assessing these two constructs used analysis of reflective writing and a combination of multiple methods more than other constructs examined in this review. Extending across instrumentation was a preponderance of quantitative measurement and cross-sectional design.

Although surveys are an efficient method for gathering information about large groups of students, they are vulnerable to the influences of social desirability bias. As we consider the implications for the assessment of humanism, we must balance the use of surveys with other methods and check for congruence across methods to judge whether social pressure to be humanistic may be biasing the results. The work of Chen and colleagues135 comparing survey results with observed behavior found only a weak correlation. Assuming a long-term concern with student–patient interactions, medical educators must investigate whether the relationship between survey results and actual interaction is strong enough to warrant relying heavily on surveys as the current literature does.

Some of the constructs embedded within humanism have very little representation in this literature; altruism and service have been reported relatively infrequently. It is not clear why empathy, compassion, and respect have been examined more frequently than other aspects of humanism. It may be a reflection of the availability of established assessment tools (e.g., the Jefferson Scale of Empathy). It is also possible that medical educators consider these attributes to be more amenable to educational intervention.
The impact of this body of literature is reflected in geographic diversity as well as the large number of citations. It is a vibrant area of inquiry conducted in every region of the globe. Our need for human connection in health care transcends culture. This underscores the need for authors to be attentive to issues of validity. Although few articles were coded as having threats to validity in both categories (construct underrepresentation and construct-irrelevant variance), most assessments were identified as having at least one of these threats. Rarely did authors discuss validity unless the aim of the article was to provide psychometric data about an instrument.

Limitations
Our work represents an extensive review of the literature from a limited, albeit recent, time period (January 2000–December 2013). We focused our analysis on the tools and types of assessment methods employed, and we did not analyze the validity of the methodology related to the specific study questions. We included articles published only in English, which may have restricted the representation of some constructs in our review. The analysis of articles we undertook focused specifically on work that described the assessment of medical students, thereby potentially limiting some of the assessment methods used. Assessments of physician humanism from the perspective of actual patients are more often reported for physicians in practice. Finally, we limited our analysis to articles specifically describing humanism. Many of the papers that we excluded from our analysis addressed parallel or overlapping concepts including professionalism, ethics, or communication skills.
By using the IECARES framework as an organizing principle for this review, we accepted the assumptions that the constructs are indeed part of humanism and that there is benefit to examining the constructs independently in addition to a more holistic approach to humanism assessment.

Implications for practice and future work
We recommend that medical educators employ a programmatic approach166 to the assessment of humanism. As described by van der Vleuten and colleagues,167 this approach strengthens the validity of assessment by including multiple methods and assessments from multiple perspectives. This will provide opportunities to more fully capture the complexity of humanism and the constructs embedded therein. Humanism might be likened to a fine symphony, best appreciated with a full complement of instruments, melodies, and percussion. So might humanism best be understood with information from diverse perspectives with a balanced view of the full complement of IECARES constructs.

We encourage diversifying assessment by including multiple perspectives of assessment. Self-report can differ from the report of a third party,168 and both provide valuable information to students and faculty.

In addition, we recommend that increased attention be given to validity issues in publications. Authors must be expected to provide validity evidence beyond previous publication of an instrument. The use of instruments developed for a specific investigation was a common finding in this literature along with the adaptation of instruments without providing any evidence of the validity in the context used. Validity applies to the interpretation of test scores in a particular context, not to an instrument.
Even instruments with published psychometric properties cannot be assumed to be valid in a different context or if modified in some way.7,166

Another gap in this body of literature that may be shaping our understanding of humanism in medical education is the paucity of longitudinal studies. Many of the studies coded as longitudinal in this review were of relatively brief time spans. We can expect that students mature over time and are shaped by cumulative experiences. Longitudinal investigations can provide valuable insight into the nuances of the development or possible loss of humanistic characteristics.

Fostering humanism among medical students is essential to the quality of health care and foundational for the reform of medical education.169 We must maintain high standards for investigating and assessing this important and complex phenomenon to ensure that educational practice supports humanistic clinical practice.

Acknowledgments: The authors wish to thank Julie Trumble, MLIS, for her unfailing support with literature searching; Julia Buck, MS, for data manipulation and analysis; and the support staff in the Office of Educational Development for many forms of assistance provided throughout the project.

Funding/Support: This project was made possible with a Mapping the Landscape, Journeying Together grant from the Arnold P. Gold Foundation Research Institute.

Other disclosures: None reported.

Ethical approval: Reported as not applicable.