2000 Butzlaff PDF
University of Illinois Press is collaborating with JSTOR to digitize, preserve and extend access to Journal of
Aesthetic Education.
RON BUTZLAFF
Journal of Aesthetic Education,
Vol. 34, Nos. 3-4, Fall/Winter 2000
©2000 Board of Trustees of the University of Illinois
Literature Search
The REAP researchers searched seven electronic data bases from their in-
ception through 1998: Arts and Humanities Index (1988-1998), Dissertation
Abstracts International (1950-1998), Educational Resource Information Clear-
inghouse (1950-1998), Language Linguistics Behavioral Abstracts (1973-1998),
MedLine (1966-1998), PsychLit/PsychINFO (1984-1998), and Social Science
Index (1988-1998). The search term music was combined with the following
search strings: (instruct or train) and (educate or learn or cognition or achieve
or intelligence or IQ) and (measure or outcome or effect or evaluation) and
(read). In addition, they conducted hand-searches of 41 journals from 1950
to 1998 (listed in Table 1 of the introductory paper in this issue) that publish
articles in education, development, and the arts. They checked the bibliog-
raphies of all identified articles and sent requests to over 200 arts education
researchers for unpublished data or manuscripts not yet published (for
which they received a modest rate of return). This search produced a total
of ninety-four articles or books that were reviewed for inclusion in this
analysis.
I retained for meta-analysis only studies that met the following three cri-
teria: a standardized measure of reading ability was used as the dependent
variable; a test of reading followed music "instruction;"5 statistical informa-
tion was sufficient to allow for the calculation of an effect size. Studies that
randomly assigned children to music vs. control conditions, and that as-
sessed reading ability before and after exposure to music, were classified as
experimental; those that did not randomly assign children to conditions
and that had no pretest of reading ability were classified as correlational.
Six experimental and twenty-five correlational studies were identified and
submitted to separate meta-analyses.6 I first discuss the findings from the
correlational studies to determine whether some kind of association be-
tween music and reading exists. As will be shown below, the analysis al-
lowed the conclusion that there is indeed such an association. I then turn to
the experimental studies to determine whether the relationship between
music and reading can be said to be a causal one in which music instruc-
tion/exposure leads to enhanced reading ability. As will be shown, no such
causal conclusion can be drawn.
Correlational Studies
Table 1 lists the 24 correlational studies included, along with publication
date, sample size, effect size, type of music experience the participants re-
ceived, and the reading outcome measure used.7 In all of these studies,
reading performance by students with some music experience was com-
pared to reading performance by students without music experience. Ten
of the studies consist of data provided by the College Board comparing
Study                 Sample size  r    Z (p)             Music experience                                                 Reading outcome
College Board (1988)  648,144      .16  125.76 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1989)  587,331      .16  125.98 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1990)  549,849      .17  127.07 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1991)  551,253      .18  136.28 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1992)  545,746      .19  138.42 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1994)  546,812      .21  151.96 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1995)  561,125      .21  159.29 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1996)  568,072      .22  164.75 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1997)  581,642      .22  167.50 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
College Board (1998)  592,308      .22  167.98 (p<.0001)  At least one high school course in instrumental or vocal music   Verbal Scholastic Assessment Test
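The effect sizes in the College Board rows can be cross-checked against the other figures in each row. The sketch below assumes that the fourth figure in each row is a standard normal deviate Z (the extracted rows do not label it) and applies the usual conversion r = Z / sqrt(N). The names ROWS and r_from_z are illustrative and do not come from the article.

```python
import math

# (year, sample size N, reported r, reported Z) copied from the College Board rows.
# Treating the fourth figure as Z is an assumption; the extracted rows do not label it.
ROWS = [
    (1988, 648_144, 0.16, 125.76),
    (1989, 587_331, 0.16, 125.98),
    (1990, 549_849, 0.17, 127.07),
    (1991, 551_253, 0.18, 136.28),
    (1992, 545_746, 0.19, 138.42),
    (1994, 546_812, 0.21, 151.96),
    (1995, 561_125, 0.21, 159.29),
    (1996, 568_072, 0.22, 164.75),
    (1997, 581_642, 0.22, 167.50),
    (1998, 592_308, 0.22, 167.98),
]


def r_from_z(z, n):
    """Effect size r implied by a standard normal deviate Z and sample size N."""
    return z / math.sqrt(n)


# Implied r per year, rounded to two decimals like the reported values.
implied = {year: round(r_from_z(z, n), 2) for year, n, _, z in ROWS}

for year, n, reported_r, z in ROWS:
    print(year, "reported r =", reported_r, "implied r =", implied[year])
```

Under that assumption, the implied value agrees with the reported r to two decimals in every row, which supports reading the column as Z.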
Stem   Leaf
+.6    5
+.5
+.4    4
+.3    7
+.2    1, 1, 2, 2, 2, 6
+.1    0, 4, 6, 6, 6, 7, 8, 8, 9
+.0    2
-.0    2, 5, 6, 8
-.1    9
Z-test to the just barely significant level of p = .05. The file drawer calcula-
tion indicates that 805,587 studies averaging null results would need to be
found to render the Z-test barely significant (p = .05).
This analysis demonstrates that there is indeed a strong and reliable as-
sociation between the study of music and performance on standardized
reading/verbal tests. However, correlational studies cannot explain what
underlies this association. For example, it is possible that students who are
already strong in reading choose to study music; it is possible that students
who are interested in music are also interested in reading because they
come from families which value both music and reading; or it is possible
that a causal relationship exists, such that either music instruction transfers
to reading achievement or the reverse. For a test of the directional and
causal hypothesis that instruction in music leads to heightened achievement
in reading, an examination of experimental studies is required.
Experimental Studies
Table 3 lists the experimental studies included, along with publication date,
sample size, effect size, type of treatment condition, and dependent variable
measure. Effect sizes are shown in a stem and leaf display in Table 4.13 The
mean effect size r for the experimental studies was r = .18, with a weighted
effect size of r = .11 (weighted by sample size). The Stouffer's Z-test for
combined probabilities proved significant, Z = 2.38, p = .009. The fact that
this test was significant might lead us to believe that we should reject the
null hypothesis that there is no relationship between music and reading.
However, there are several reasons why we cannot reject this null hypothesis,
as explained below.
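For readers unfamiliar with the Stouffer procedure, the following is a minimal sketch of how per-study Z values are combined and how a combined Z maps to a one-tailed p. The per-study Z values for the six experiments are not reproduced in this section, so the list passed to stouffer_z below is purely hypothetical; only the combined value Z = 2.38 comes from the text. The function names are illustrative.

```python
import math
from statistics import NormalDist


def stouffer_z(z_scores):
    """Combine independent per-study Z values: sum(Z) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))


def one_tailed_p(z):
    """Upper-tail probability of a standard normal deviate."""
    return 1.0 - NormalDist().cdf(z)


# Hypothetical per-study Z values, used only to show the mechanics.
example_combined = stouffer_z([1.0, 1.0, 1.0, 1.0])  # 4 / sqrt(4)

# The combined Z reported for the six experimental studies.
reported_p = one_tailed_p(2.38)
print(round(reported_p, 3))
```

The one-tailed p obtained for Z = 2.38 rounds to the reported p = .009.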
First, let us examine the confidence intervals around the mean r. At the
95% level, the confidence interval runs from r = -.21 to r = +.52; at the
99% level, it runs from r = -.41 to r = +.67. Both of these intervals span
zero. In addition, only three of the six studies have positive effect sizes. The
'robustness' value is .48, suggesting that there is considerable variability
around this small effect size, and this in turn means that the mean effect
size, while statistically significant, is not robust. The wide range of effect
sizes also demonstrates this large variability, with the smallest r = -.34 and
the largest r = +.64, a median effect size of r = .03, and a standard deviation
of the effect size rs of SD = .38. The chi-square test of the heterogeneity of
the effect sizes was also significant, χ² = 17.94, df = 5, p = .003, indicating that
the effect sizes from the studies in this sample are not normally distributed,
and that there is significant heterogeneity in this sample of effect sizes.
Notes:
a. Reported as "no significant difference"; entered as r = .00, p = .50.
Table 4: Stem and Leaf Display of 6 Effect Size rs from Experimental Studies
Stem   Leaf
+.7
+.6    4
+.5    7
+.4
+.3
+.2
+.1
+.0    0, 0, 6
-.0
-.1
-.2
-.3    4
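The descriptive statistics quoted for the experimental studies can be rechecked from the six effect sizes in Table 4. This is a minimal sketch that assumes the +.0 row of the display holds two studies entered at r = .00 and one at r = .06, which matches the statement that two studies were assigned an effect size of zero. The variable names are illustrative.

```python
from statistics import median, stdev

# Effect size r for the six experimental studies, read from Table 4.
# Reading the +.0 row as .00, .00, .06 is an assumption about the display.
effect_sizes = [0.64, 0.57, 0.06, 0.00, 0.00, -0.34]

smallest = min(effect_sizes)
largest = max(effect_sizes)
median_r = median(effect_sizes)
sd_r = stdev(effect_sizes)  # sample standard deviation (n - 1 in the denominator)
n_positive = sum(1 for r in effect_sizes if r > 0)

print(smallest, largest, round(median_r, 2), round(sd_r, 2), n_positive)
```

With these values the median is .03, the sample standard deviation rounds to .38, and three of the six effect sizes are positive, in agreement with the figures in the text.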
Further evidence for the nonrobustness of the mean effect size comes
from the result of the t-test of the mean Zr. For this sample, the t-test of the
mean Zr is t = 1.06, df = 5, p = .34, which is evidence that the mean effect size
r is not significantly different from zero.
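The t-test of the mean Zr can be sketched in the same way. The example assumes the six effect sizes read from Table 4 (two studies at .00 and one at .06 in the +.0 row) and uses the Fisher transformation Zr = atanh(r). The critical value 2.571 is the standard two-tailed .05 value of t at df = 5. The variable names are illustrative.

```python
import math
from statistics import mean, stdev

# Effect size r for the six experimental studies, read from Table 4 (an assumption
# about how the +.0 row of the display should be read).
effect_sizes = [0.64, 0.57, 0.06, 0.00, 0.00, -0.34]

# Fisher transformation of each r.
fisher_z = [math.atanh(r) for r in effect_sizes]

mean_z = mean(fisher_z)
standard_error = stdev(fisher_z) / math.sqrt(len(fisher_z))
t_stat = mean_z / standard_error
df = len(fisher_z) - 1

# Back-transform the mean Zr to the r metric.
mean_r = math.tanh(mean_z)

T_CRIT_95 = 2.571  # two-tailed .05 critical value of t with 5 degrees of freedom
significant = abs(t_stat) > T_CRIT_95

print(round(mean_r, 2), round(t_stat, 2), df, significant)
```

Under these assumptions the back-transformed mean is r = .18 and t = 1.06 with df = 5, well short of the critical value, which is consistent with the conclusion that the mean effect size does not differ significantly from zero.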
A file drawer analysis further supports this interpretation. I found that
only seven studies squirreled away in researchers' file drawers and averag-
ing null results would be needed to render the results barely significant.
Given that unpublished studies in researchers' file drawers are more likely
to have nonsignificant results than are published studies, it is not improbable
that seven such unpublished, nonsignificant studies actually exist.
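The file drawer figure can be approximated from the combined Z alone. The sketch below assumes the calculation follows Rosenthal's fail-safe formula, X = (sum of Z)^2 / 1.645^2 - k, with 1.645 as the one-tailed p = .05 criterion; the text does not state the formula, so this is an assumption. The sum of the per-study Z values is recovered from the Stouffer Z as Z * sqrt(k). The function name is illustrative.

```python
import math

Z_CRIT = 1.645  # one-tailed p = .05 criterion


def fail_safe_n(stouffer_z, k, z_crit=Z_CRIT):
    """Number of additional null-result studies that would pull the combined Z
    of k studies down to z_crit (Rosenthal's file drawer formula, assumed here)."""
    sum_z = stouffer_z * math.sqrt(k)  # recover the sum of per-study Z values
    return sum_z ** 2 / z_crit ** 2 - k


# Combined Z and number of studies reported for the experimental meta-analysis.
x = fail_safe_n(2.38, 6)
studies_needed = math.ceil(x)

print(round(x, 2), studies_needed)
```

With Z = 2.38 and six studies, the formula gives a value between six and seven, so seven null studies would be enough, matching the number given above.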
In order to try to account for the heterogeneity of effect sizes found, I
performed a linear contrast analysis to examine the hypothesis that the
magnitude of effect sizes increases over time. The reasoning here is that in
more recent studies, experimenters often explicitly set out to show that mu-
sic had a positive impact on students' academic performance, whereas in
earlier years the researchers were merely trying to demonstrate that allow-
ing students to attend "pull-out" music programs in place of regular class
time would not decrease academic performance. Two different experi-
menter expectancies are suggested by these differing hypotheses, and I
wondered if the effect sizes varied in the same direction as these two ex-
pectancies. The contrast r for publication year was r = .81, and this was
significant, Z = 3.45, p = .0003. The magnitude and direction of effect sizes
significantly changed with publication year, from negative to positive.
Discussion
These two meta-analyses present two very different pictures. The meta-
analysis of the correlational studies shows that students studying music do
in fact have significantly higher scores on standardized reading tests (or on
the verbal portion of the Scholastic Assessment Test). The mean effect size
found, though small, was more robust than that found for the experimental
studies, and a considerable number of file drawer studies would be needed
to alter this finding. Correlational studies, however, allow no conclusion
about causality. While the correlational results are consistent with
the interpretation that music study enhances reading ability, the results are
equally consistent with other noncausal interpretations. For instance, stu-
dents who score well on reading tests may for some reason choose to pur-
sue music; they may be better equipped for some reason to learn music; or
they may read more and their reading experience may enhance their musi-
cal interests. Neither the existence nor the direction of causality (if there is
causality) can be established in these studies.
The experimental studies, which are designed to test the hypothesis that
music study enhances (or causes) reading improvement, yielded no reliable
effect. A very small number of file drawer studies could overturn the sig-
nificance of the result. In addition, there is considerable variation in the ef-
fect sizes, indicating that the overall finding is not stable. The correlational
studies show this same heterogeneity of effect sizes. In both populations,
there are probably other effects not accounted for by the researchers that
account for this heterogeneity.
The effect sizes varied widely in the six experimental articles. A brief dis-
cussion of the individual articles may help explain some of this variation.
The study by Douglas and Willats yielded a significant and large effect size
r = .64, but this study also found a significant and large interaction (F(1,10)
= 7, p = .02, r = .64). This interaction occurred because the control group's
scores fell by the same magnitude as the music group's scores rose. The
authors do not describe who taught the music group, but it is
likely that the study was not a double-blind experiment. The researchers
probably knew who was in each group and had contact with both the treat-
ment and control groups. Thus, experimenter expectancy effects cannot be
ruled out. In addition, the researchers chose subjects with an eye for those
students who "might benefit from extra help."14 Other research has shown
that low-achieving students are often the ones who most benefit from
teacher expectancy effects.15
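The effect size attached to the Douglas and Willats interaction can be recovered from the reported F statistic. This is a minimal sketch using the standard conversion for an F test with one numerator degree of freedom, r = sqrt(F / (F + df_error)); whether the author computed r this way is an assumption. The function name is illustrative.

```python
import math


def r_from_f(f_value, df_error):
    """Effect size r for an F test with one numerator degree of freedom."""
    return math.sqrt(f_value / (f_value + df_error))


# F(1, 10) = 7, as reported for the interaction.
r_interaction = r_from_f(7.0, 10)

print(round(r_interaction, 2))
```

The result rounds to r = .64, the value reported for the interaction.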
A similar explanation could account for the large effect found in Fetzer's
study. Here the music group was taught music by the researcher, while the
control group was instructed by another music teacher. Thus, experimenter
expectancy effects could have helped the music group.16 In addition, there
was a fairly large subject loss from the control group (eight out of twenty, or
40%) while in the treatment group, only one subject out of nineteen, or 5%,
dropped out. This suggests that something different was occurring in the
two groups. Finally, the music group was given more attention than the
control group, having been videotaped as a group on three occasions. This
extra attention alone is a confounding variable and could possibly account
for the positive effect size found in this study.
The study by Roskam had a negative effect size, based upon the scores
for music and reading comprehension. However, two other scores were
also reported, one for spelling and another for word recognition. These ef-
fect sizes were positive (r = .34 and r = .37, respectively), and one could ar-
gue that the average of the three effect sizes (r = .12) should have been used
in the meta-analytic calculations. However, this average would actually
have rendered the results of the meta-analysis less stable and less reliable:
only one study would have been needed to bring the Stouffer's Z to barely
significant, and the confidence intervals would still clearly include zero
(-.05 to +.51 at the 95% level). Because spelling and word recognition are not
equivalent to actual reading comprehension, I felt it was best to include
only the effect size from reading comprehension. It should also be noted
that the author reported that the music group demonstrated much more
"severe behavioral difficulties"17 than did the control group. Thus, although
the author states that a larger sample might have yielded a more positive
result, this is not likely.
These meta-analyses of studies assessing the relationship between music
or music education and reading test scores show that the somewhat signifi-
cant relationship between these two variables in the experimental studies is
neither large, robust, nor reliable. However, only six relevant experimental
studies were found, a very small number. In addition, two were assigned
an effect size of zero because the author reported no significant difference
between groups but did not report statistics that made possible the compu-
tation of an exact effect size. This is a conservative solution to this problem,
and it is possible that these two studies had positive, though small effect
sizes. Finally, the wide variability in effect sizes suggests that further re-
search is needed: as shown in Table 3, two studies were associated with
large positive effects; three with minimal or no effects; and one with a nega-
tive effect. The fact that two experimental studies did produce large effect
sizes suggests that further exploration of this question is merited.
The contrast on publication year was performed to examine possible ex-
pectancy effects in the data. Researchers carrying out the more recent stud-
ies are likely to be expecting a positive relationship between music and
reading, since this kind of relationship has been touted more and more by
arts advocates as a justification for music programs in schools. I found that
the effect sizes increased from generally negative or negligible sizes to
larger, positive magnitudes as the publication years go by. This suggestion
of expectancy effects calls for a more stringent research methodology by the
experimental researchers before it can be adequately addressed.
It is worth noting that there exists a large body of studies that address a
different but related question to the ones addressed here. Many researchers
have examined whether music interferes with or enhances academic
performance. In these studies, students hear music at the same time as they read.
The effects of music listening on reading have not been adequately sum-
marized. A meta-analysis of these studies would help to clarify another
possible relationship between music and reading.
NOTES
1. For these suggestions, see L. L. Kelly, "A Combined Experimental and
Descriptive Study of the Effect of Music on Reading and Language" (Ph.D. diss.,
University of Pennsylvania, 1981), and D. L. Roberts, "An Experimental Study of
the Relationship between Musical Note-Reading and Language Reading"
(Ph.D. diss., University of Missouri, 1978).
2. Researchers have argued that a structured program in music may help children
develop a "multi-sensory awareness and response to sounds," p. 99, from S.
Douglas and P. Willats, "The Relationship between Musical Ability and Literacy
Works Cited
Cohen, J., Statistical Power Analysis for the Behavioral Sciences (New York: Academic
Press, 1977).
College Bound Seniors Profile of SAT and Achievement Test Takers. College Board, 1987-
1997.
*Douglas, S., and P. Willats, "The Relationship between Musical Ability and Literacy
Skills." Journal of Research in Reading 17, no. 2 (1994): 99-107.
*Engdahl, Pherbia M., "The Effect of Pull-out Programs on the Academic Achieve-
ment of Sixth-Grade Students in South Bend, Indiana" (Doctoral diss., Andrew
University, 1994).
Fetzer, L., "Facilitating Print Awareness and Literacy Development with Familiar
Children's Songs" (Doctoral diss., East Texas University, 1994).
Friedman, Bernard, "An Evaluation of the Achievement in Reading and Arithmetic
of Pupils in Elementary School Instrumental Music Classes" (Doctoral diss., New
York University, 1959).
*Groff, F. H., "Effect on Academic Achievement of Excusing Elementary School Pu-
pils from Classes to Study Instrumental Music" (Doctoral diss., University of
Connecticut, 1963).
Hurwitz, Irving, Peter Wolff, Barrie Bortnick, and Klara Kokas, "Nonmusical Effects
of the Kodaly Music Curriculum in Primary Grade Children." Journal of Learning
Disabilities 8, no. 3 (1975): 167-74.
*Kelly, L. L., "A Combined Experimental and Descriptive Study of the Effect of
Music on Reading and Language" (Doctoral diss., University of Pennsylvania,
1981).
*Kvet, Edward J., "Excusing Elementary School Students from Regular Classroom
Activities for the Study of Instrumental Music: The Effect on Sixth-Grade Reading,
Language, and Mathematics Achievement." Journal of Research in Music Education
32 (1985): 45-54.
*Lamar, H. B., "An Examination of the Congruency of Music Aptitude Scores and
Mathematics and Reading Achievement Scores of Elementary Children" (Doctoral
diss., University of Southern Mississippi, 1989).
Madon, S., L. Jussim, and J. Eccles, "In Search of the Powerful Self-fulfilling Prophecy."
Journal of Personality and Social Psychology 72, no. 4 (1997): 791-809.
*McCarthy, Kevin J., "Music Performance Group Membership and Academic Suc-
cess: A Descriptive Study of One 4-year High School" (Paper presented at the
Colorado Music Educators Association, 1992).
*Olanoff, M. and Kirschner, L. C., "Musical Ability Utilization Project," U.S. Depart-
ment of Health, Education and Welfare, Final Report, Project No. 2600 (1969).
*Roberts, D. L., "An Experimental Study of the Relationship between Musical Note-
Reading and Language Reading" (Doctoral diss., University of Missouri, 1978).
Rosenthal, Robert, Meta-Analytic Procedures for Social Science Research (Newbury Park,
Calif.: Sage Publications, 1984).
Rosenthal, R., and L. Jacobson, Pygmalion in the Classroom: Teacher Expectations and Pupils'
Intellectual Development (New York: Holt, Rinehart and Winston, 1968).
Rosenthal, Robert, and Ralph L. Rosnow, Essentials of Behavioral Research: Methods and
Data Analysis (New York: McGraw-Hill, 1991).
Roskam, K., "Music Therapy as an Aid for Increasing Auditory Awareness and
Improving Reading Skill." Journal of Music Therapy 16, no. 1 (1979): 31-42.
*Weeden, Robert E., "A Comparison of the Academic Achievement in Reading and
Mathematics of Negro Children Whose Parents Are Interested, Not Interested, or
Involved in a Program of Suzuki Violin" (Doctoral diss., North Texas State Uni-
versity, 1971).