Session 3530

Validity of Student Self-Assessments

Sanjiv Sarin, Donald Headley
North Carolina A&T State University

Abstract

This paper examines the validity of self-assessment as a tool for measuring student abilities, in particular, whether self-assessments can be used as valid substitutes for instructor evaluations. Data are obtained from in-class student self-assessments and subsequent course tests that measure the same abilities. Correlations between self-assessment ratings and test scores are used to comment on the validity of self-assessments. Some observations are also made regarding the extent to which students over- or under-report their abilities on self-reports.

1. Introduction

Interest in valid methods for assessing student outcomes has grown in higher education. This is especially true in engineering education as a consequence of the new Engineering Criteria 2000 (EC 2000) requirements that became fully effective starting with the Fall 2001 visitation cycle. The new EC 2000 criteria represent a major shift in the philosophy behind accreditation of engineering programs. Instead of focusing on the resources and inputs provided to an engineering program, the new accreditation criteria emphasize student learning, i.e., outcomes. The use of outcomes assessment data to guide the improvement of educational processes is a cornerstone of EC 2000 [1].

Given the limitations of achievement tests in fulfilling outcomes assessment requirements, the assessment community has recommended several alternative approaches for assessing student outcomes. These include portfolios, capstone design project evaluations, student, alumni and employer surveys, and placement data of graduates. Yet written surveys administered to current students are the most frequently used assessment instruments, for two main reasons: they are relatively inexpensive to conduct, and a high response rate is almost guaranteed. A natural question is whether these student self-assessments are valid substitutes for test questions created and scored by an instructor.

This paper reports the results of a study undertaken to examine the validity of student self-assessment surveys as a tool for measuring the mastery of concepts. The term validity is defined here as the correlation between self-assessment ratings and test scores on the same measures. The paper reports results of data collected from three courses (two sophomore and one junior) taught over one academic year. The data represent over one hundred students in these courses. Students were asked to subjectively rate themselves on very specific abilities. The same students were then given a test that measured the same abilities. Correlations between students' self-assessments and their performance on test questions are used to comment on the general validity of engineering student self-assessments.

2. Background

Interest in student self-reports has been motivated by the recognition that achievement tests alone are not sufficiently viable for outcomes assessment. This is due to several reasons. First, it is questionable whether a test can adequately measure all outcomes of interest, especially those involving soft skills such as teamwork and lifelong learning.
Second, there is concern whether a single nationally administered test (or a small number of tests) can be used by a variety of institutions, given that each institution has a different set of priorities and educational missions. However, before students' self-assessments can be used as proxies for tests and other direct assessments, it is important to establish their validity. Several large-scale studies in this area have concluded with mixed results. Ewell et al. [7], Pike [14] and Tsang [17] provided some degree of empirical support for student self-reports as valid proxies for tests. However, the work of Astin [2] and Dumont and Troelstrup [6] reported low correlations between self-reports and test scores. LeBold et al. [11] utilized Purdue's Mathematics and Science Inventory to demonstrate the validity of student self-perceptions regarding their mathematics and chemistry development as they progress through several courses. LeBold et al. also noticed strong relationships between self-reports and final course grades, in that students who received D's or F's did not report appreciable pre-to-post test gains on their self-reports, whereas students who received A's reported the maximum gains.

Comparisons between self-assessments and peer-assessments of the same entities have been conducted in the context of team-based student work evaluation. Falchikov [8] studied self and peer assessments among psychology students and found high levels of agreement between self and peer scores, with no under-scoring or over-scoring by self-scorers. A similar result was reported by Stefani [16], who compared self (and peer) assessment scores with tutors' scores in the biological sciences and found that student assessment (both self and peer) was as reliable as tutor assessment. Further, there was no evidence that higher achievers were under-scored or that lower achievers were over-scored. In contrast, in their study of business management students, Saavedra and Kwan [15] found that high performers were more discriminating than low-performing students and were better able to assess performance. As part of the same study, the authors also concluded that below-average students rated themselves significantly higher than the peer ratings they received. As part of a larger study aimed at identifying individual effort in teamwork, Kaufman, Felder and Fuller [9] concluded that individual self-ratings were not statistically different from the ratings received from peers. The authors also reported that inflated self-ratings were less common than deflated self-ratings. Kruger and Dunning [10] studied the relationship between students' confidence and their ability and concluded that those who are incompetent often appear to be more confident about their abilities than students who are competent. On the other hand, students who were competent often underestimated their abilities. For a review of the literature relating to student-based assessment, the reader is referred to the paper by Maskell [12].

It bears mentioning here that self-assessments are also conducted in employment settings to comply with a variety of company regulations. For instance, many organizations require employees to perform self-appraisals prior to a formal annual evaluation by a supervisor, and to report employee conformance to substance use policies.
Reported work in this area also sheds light on the degree to which self-reports can be used as substitutes for direct measures. For example, Cook et al. [5] studied 1,200 employees of a steel manufacturing company to assess illicit drug use through three methods: self-reports, urinalysis and hair analysis. Results of this study indicated that the actual prevalence of drug use (as confirmed by urine and hair samples) was approximately 50% higher than the estimates given in self-reports.

In addition to the validity of self-reports, there is also a larger and more useful context in which self-assessments are being studied. This involves the impact of self-assessment on learning itself. Recent research summarized by Black and William [3] demonstrates that student self-assessment skills enhance student achievement and learning. Realizing that self-assessment skills are necessary for enhancing learning, some instructors (see, for example, Brackin and Gibson [4]) require students to demonstrate these skills as part of their course requirements.

The purpose of this paper is to extend past work in studying the correlation between self-reports and test scores. The paper reports the results of a yearlong experiment to study this correlation in an engineering program.

3. Approach

The subjects for this experiment were students enrolled in three courses offered by the Industrial Engineering department at North Carolina A&T State University during the Fall 1999 and Spring 2000 semesters. All three courses were taught by one of the authors. Catalog descriptions of these courses and prerequisite information are available from the authors upon request. Enrollment values and other pertinent details about these courses are presented in Table 1.

Table 1: Course Information

  Course Number   Course Title             Course Level   Semester      Number of Students
  INEN 270        Engineering Statistics   Sophomore      Fall 1999     32
  INEN 330        Operations Research      Junior         Fall 1999     35
  INEN 260        Engineering Economy      Sophomore      Spring 2000   39

The data collection consisted of a written survey followed by a test in each of the courses. The following procedure was used. In the last week of the semester, a written survey was administered to allow students to self-assess their proficiency in the material learned in each course. This was immediately followed by a test that measured the same proficiencies addressed in the written survey. Each survey and test consisted of five items. The survey responses carried no points towards the course grade, but the tests were worth 15% of the course grade. Copies of the written survey and corresponding test for each course are not presented here in the interest of space, but are available from the primary author. In addition to student responses on the survey and test questions, each student's overall grade in the course was also recorded. Both the surveys and the tests were encoded to preserve the anonymity of responses while still allowing the computation of correlations between corresponding survey and test responses by individual students. Students were asked to rate their ability in a very specific area on a scale of 1 (strongly disagree that I can do this) to 5 (strongly agree that I can do this). Each question on the survey corresponds to a specific question on the test.
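To make this pairing concrete, the following sketch (not part of the original paper; the anonymous codes and values are hypothetical) shows one way the encoded records could be organized so that each confidence rating is matched to the score on the corresponding test item.

    # Minimal sketch (hypothetical codes and values, not the study's actual data):
    # each anonymized code maps to five confidence ratings (1-5) and the scores
    # (0-5) earned on the five matching test questions.
    survey_ratings = {"S01": [4, 5, 3, 4, 2], "S02": [5, 5, 4, 3, 3]}
    test_scores    = {"S01": [3, 5, 2, 4, 1], "S02": [5, 4, 4, 2, 3]}

    # Pair each rating with the score on the corresponding test item; these
    # (rating, score) pairs are the unit of analysis for the correlations in Section 4.
    pairs = [(r, s)
             for code in survey_ratings
             for r, s in zip(survey_ratings[code], test_scores[code])]
    print(pairs)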
The test questions were graded on a scale from 0 (incorrect or no answer) to 5 (complete and accurate answer). This was appropriate since the test questions required students to show assumptions, steps and the final answer. Therefore, an answer to a question can be "more correct" or "more wrong." The correlation between survey ratings and test scores can then be used to comment on the validity of the self-report surveys as surrogates for the tests.

As an example, in INEN 270, one of the questions on the survey was as follows:

  Given a probability distribution function for a discrete random variable, I can compute its expected value.

Students were asked to select one of the following five choices in response to this question:

  Strongly Disagree (1)   Disagree (2)   Neutral (3)   Agree (4)   Strongly Agree (5)

The corresponding question on the test was as follows:

  The number of orders per day for a component used in an assembly plant is a random variable with the following probability distribution:

    No. of Orders   100     110     120     130     140     150
    Probability     0.156   0.200   0.250   0.196   0.143   0.055

  What is the expected (average) number of orders in a day?

Clearly, the question posed on the test was designed to measure the same ability referred to in the self-assessment survey. Furthermore, the test was administered immediately after the students had completed the survey. This was done to avoid bias due to time lag and any associated changes in student abilities over time.

4. Data Analysis and Discussion

The data were analyzed with two primary objectives: first, to explore whether student self-assessments are significantly correlated with achievement test scores and, if so, whether self-assessments can serve as useful proxies for test scores; and second, to verify whether high-performing engineering students demonstrate an ability to perform more accurate self-assessments than low-performing engineering students.

To assess the relationship between the ordinal-level self-confidence ratings and test scores, the nonparametric Spearman's correlation coefficient was used. Correlations between student confidence reported on the survey and test scores for each course, as well as for the aggregate over all courses, are reported in Table 2. Each coefficient is based on the data pairs formed by the five survey items and their corresponding test items. For example, for the INEN 270 sample, the number of pairs is 160 (5 times 32 students). The hypothesis tested is a two-tailed test (i.e., the alternative hypothesis is that there is a relationship between self-confidence and test scores) with an assumed Type I error rate (α) of 0.05. Therefore, the results are deemed significant if the p-value is less than α/2, or 0.025. The formal statement of the hypothesis is as follows:

  H0: Spearman's ρ = 0
  H1: Spearman's ρ ≠ 0

Table 2: Correlation Coefficients (Spearman's) of Self-Confidence and Test Scores

  Sample        Observed Value   Conclusion
  INEN 270      0.258            Significant
  INEN 330      0.221            Significant
  INEN 260      0.125            Not significant
  All courses   0.184            Significant

Although the correlations between self-assessments and test scores are generally statistically significant, their absolute values are low, indicating only a weak relationship between the two variables. If the purpose of student self-assessments is formative in nature, their use can be justified. However, the use of self-assessments to satisfy summative evaluation requirements is questionable.
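As an illustration of the correlation test described above (not part of the original study), the Spearman coefficient for a set of (rating, score) pairs can be computed with standard statistical software; the sketch below uses scipy with hypothetical data.

    # Minimal sketch (hypothetical data): Spearman correlation between survey
    # confidence ratings (1-5) and scores on the matching test items (0-5).
    from scipy.stats import spearmanr

    ratings = [4, 5, 3, 4, 2, 5, 5, 4, 3, 3]   # self-reported confidence
    scores  = [3, 5, 2, 4, 1, 5, 4, 4, 2, 3]   # instructor-graded scores

    rho, p_two_sided = spearmanr(ratings, scores)

    # The paper compares a one-sided p-value to alpha/2 = 0.025, which is
    # equivalent to comparing the two-sided p-value to alpha = 0.05.
    print(f"rho = {rho:.3f}, p = {p_two_sided:.3f}, significant: {p_two_sided < 0.05}")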
To distinguish between the self-assessment abilities of high-performing versus low-performing students, the data were sorted to separate the responses of high performers (those receiving course grades of A or B) from those of low performers (those receiving course grades of C or lower). Each student's self-confidence rating (CR) and corresponding test score (TS) was transformed into a deviation D = CR - TS. With this transformation, a deviation of zero represents perfect ability to perform self-assessment (assuming that the instructor's grade has no error). A value of D > 0 implies over-confidence, whereas D < 0 implies under-confidence. In order to test for sample differences in the deviation statistic D, the D values were analyzed using a two-tailed Mann-Whitney statistical test. The observed value of the standardized z statistic was 6.1, indicating significantly different populations. The average value of D for high performers is 0.1, and the same statistic for low performers is 1.2. Thus, on average, low performers appear to exhibit a higher degree of over-confidence, whereas high-performing students are more discriminating. These results provide confirmatory support for the previously reported finding that competent students can more accurately provide self-assessments.
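As an illustration of this comparison (not part of the original study), the deviations D = CR - TS for the two groups can be compared with a Mann-Whitney test; the sketch below uses scipy with hypothetical ratings and scores.

    # Minimal sketch (hypothetical data): Mann-Whitney comparison of the
    # deviation D = CR - TS for high performers (grades A/B) versus low
    # performers (grades C or lower).
    from scipy.stats import mannwhitneyu

    high = [(4, 4), (5, 5), (3, 4), (4, 3), (5, 4)]   # (rating, score) pairs, A/B students
    low  = [(4, 2), (5, 3), (3, 1), (4, 3), (5, 2)]   # (rating, score) pairs, C-or-lower students

    d_high = [cr - ts for cr, ts in high]
    d_low  = [cr - ts for cr, ts in low]

    # Two-tailed test for a difference between the two groups of deviations
    stat, p = mannwhitneyu(d_high, d_low, alternative="two-sided")
    mean_high = sum(d_high) / len(d_high)
    mean_low  = sum(d_low) / len(d_low)
    print(f"U = {stat}, p = {p:.3f}, mean D (high) = {mean_high:.2f}, mean D (low) = {mean_low:.2f}")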
5. Summary

Students in three courses were administered a written self-assessment survey followed by an instructor-graded test, both intended to measure the same set of abilities. The study finds statistically significant correlations between student self-assessments and test scores. However, the strength of the correlation is weak. Additional investigation of the data reveals that high-performing students possess a higher ability to perform self-assessments than low-performing students. Based on the results of this study and past work done in this area, it can be concluded that self-assessments can be used as surrogates for achievement tests in formative assessment. It can also be argued that self-assessments performed by high-performing students should be used instead of aggregated self-assessments of all students.

6. Acknowledgment

The research leading to this paper was supported in part by a grant from SUCCEED, an NSF coalition that aims to revitalize undergraduate engineering education. This support is gratefully acknowledged. The first author would also like to thank his students in INEN 260, 270 and 330 for voluntarily participating in this study.

References

1. ABET, Criteria for Accrediting Programs in Engineering in the United States, Accreditation Board for Engineering and Technology, Inc., 111 Market Place, Suite 1050, Baltimore, Maryland 21202-4012. http://www.abet.ba.md.us/EAC/eac2000.html
2. Astin, A.W., What Matters in College: Four Critical Years Revisited. Jossey-Bass Publishers, San Francisco, CA, 1993.
3. Black, P. and William, D., "Inside the Black Box: Raising Standards through Classroom Assessment," Phi Delta Kappan, 80(2), 139-148, 1998.
4. Brackin, P.M. and Gibson, J.D., "Techniques for Assessing Industrial Projects in Engineering Design Courses," Proceedings of the Annual Conference of the ASEE, Albuquerque, NM, June 24-27, 2001.
5. Cook, R.F., Bernstein, A., Arrington, T.A., and Marshall, G., "Methods for Assessing Drug Use Prevalence in the Workplace: A Comparison of Self-Reports, Urinalysis and Hair Analysis," The International Journal of Addictions, 30(4), 403-426, 1995.
6. Dumont, R.G. and Troelstrup, R.L., "Exploring Relationships between Objective and Subjective Measures of Instructional Outcomes," Research in Higher Education, 12, 37-51, 1980.
7. Ewell, P.T., Lovell, C.D., Dressler, P. and Jones, D.P., A Preliminary Study of the Feasibility and Utility for National Policy on Instructional Good Practice Indicators in Undergraduate Education. US Government Printing Office, Washington, D.C., NCES 94-437, 1994.
8. Falchikov, N., "Group Process Analysis: Self and Peer Assessment of Working Together as a Group," Educational and Training Technology International, 30(3), 275-284, 1991.
9. Kaufman, D.F., Felder, R.M. and Fuller, H., "Accounting for Individual Effort in Cooperative Learning Teams," Journal of Engineering Education, 89(2), 133-140, 2000.
10. Kruger, J. and Dunning, D., "Unskilled and Unaware of It: How Difficulties in Recognizing One's Own Incompetence Lead to Inflated Self-Assessments," Journal of Personality and Social Psychology, 77, 1121-1134, 1999.
11. LeBold, W.K., Budny, D.D. and Ward, S.K., "How Do Students Grade Their Learning?" Proceedings of the Frontiers in Education Conference, Atlanta, GA, 1995.
12. Maskell, D., "Student-based Assessment in a Multi-disciplinary Problem-based Learning Environment," Journal of Engineering Education, 88(2), 237-241, 1999.
13. Mehta, S. and Danielson, S., "Self-Assessment by Students: An Effective, Valid, and Simple Tool?" Proceedings of the Annual Conference of the ASEE, St. Louis, MO, June 18-21, 2000.
14. Pike, G.R., "The Relationship between Self-Reports of College Experiences and Achievement Test Scores," Research in Higher Education, 36, 1-22, 1995.
15. Saavedra, R. and Kwan, S.K., "Peer Evaluation in Self-Managing Work Groups," Journal of Applied Psychology, 78(3), 450-462, 1993.
16. Stefani, L.A.J., "Peer, Self and Tutor Assessment: Relative Reliabilities," Studies in Higher Education, 19(1), 69-75, 1994.
17. Tsang, E., "Assessing Student Learning for a Materials, Manufacturing and Design Lab," Proceedings of the Annual Conference of the ASEE, Seattle, WA, June 28-July 1, 1998.

Author Biographies

SANJIV SARIN
Sanjiv Sarin is a Professor of Industrial Engineering and Associate Dean for the College of Engineering at North Carolina A&T State University. He received a bachelor's degree in Chemical Engineering from the Indian Institute of Technology, New Delhi, and a Ph.D. in Industrial Engineering from the State University of New York, Buffalo. He is a member of ASEE and IIE, and a registered Professional Engineer in North Carolina.

DONALD B. HEADLEY
Donald Headley is a Visiting Professor at North Carolina A&T State University. He is a Human Factors Research Psychologist with the Army Research Laboratory - Human Research and Engineering Directorate, Aberdeen Proving Ground, Maryland. He received a B.S. in Research Psychology from the University of Massachusetts, Amherst, and an M.S. and Ph.D. in Research Psychology from Oklahoma State University. He has completed postdoctoral work in statistics, program evaluation, and research administration.