Psychology AS LEVEL NOTES

RESEARCH METHODS

EXPERIMENTS
An experiment is an objective, scientific procedure used to make a discovery, to test a hypothesis or to demonstrate a known fact (checking its validity).
An investigation that is conducted to establish a cause-and-effect relationship is called an experiment.

To find out the cause-and-effect relationship, the experiment is designed to manipulate, isolate and control certain variables to achieve the aim of the study. The key variables are known as the IV (independent variable) and the DV (dependent variable).

IV - the variable that researchers systematically manipulate and control; it is expected to have a causal effect on the DV.
DV - the variable which bears the effect of the IV and is therefore measured by the experimenter.

Confounding Variable - An uncontrolled variable that has an unintended effect on the DV, so it is unclear whether the IV caused the change (e.g. talent / interest / ability / personality of participant).
Extraneous Variable - A variable other than the IV that could affect the DV but that can be controlled by the experimenter (e.g. diet / sleep routine).

Demand characteristics - features of the experimental situation which give away the aims, causing participants to change their behaviour. Reduces validity.
Random Allocation - A way to reduce the effect of confounding variables such as individual differences. Participants are put into each level of the IV such that each person has an equal chance of being in any condition.
Participant Variables - Individual differences between participants that could affect their behaviour in a study. They could hide or exaggerate differences between conditions.
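Random allocation can be sketched in a few lines of Python. This is a minimal, hypothetical example (the function name randomly_allocate and the participant IDs are made up for illustration): the list of participants is shuffled, then dealt out to the conditions in turn, so each person has an equal chance of ending up in either condition (exactly equal when the groups are the same size).

```python
import random

def randomly_allocate(participants, conditions, seed=None):
    """Shuffle participants, then deal them out to the conditions in turn.

    Shuffling first means nobody's position (and so condition) is chosen
    by the researcher; dealing in turn keeps group sizes as equal as possible.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs
participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
groups = randomly_allocate(participants, ["A", "B"], seed=1)
for condition, members in groups.items():
    print(condition, members)
```

The seed is only there so a run can be repeated; leaving it out gives a different random allocation each time.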

Order Effects: Practice and Fatigue effects are the consequences of doing the same task more than once (e.g. in each condition). They cause changes in performance between conditions that are not due to the IV.
Practice Effect - A situation where participants’ performance improves because they experience the experimental task more than once and gain familiarity.
Fatigue Effect - A situation where participants’ performance declines because they have experienced an experimental task more than once, e.g. due to boredom or tiredness.

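Counterbalancing - used to overcome order effects in a repeated measures design, e.g. an ABBA design, in which participants do condition A, then B, then B again, then A again, so that both conditions are equally affected by order.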
Standardization - keeping the procedure for every participant exactly the same to ensure that any differences between participants or conditions are due to the variables under investigation.
Reliability - extent to which a procedure would produce the same results with the same people on each occasion. The consistency of a piece of research.
Validity - the extent to which the researcher is testing what they claim to be testing.
Generalise - apply the findings of a study to a wider setting / population.
Ecological validity - extent to which the findings of research in one situation would generalize
to other situations. Influenced by whether the situation represents the real world effectively and
has mundane realism.
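The ABBA design mentioned under counterbalancing can be checked with a small Python sketch (the helper names are hypothetical). Assuming practice or fatigue builds up steadily across trials, what matters is that conditions A and B are done, on average, at the same point in the sequence:

```python
def abba_order(condition_a="A", condition_b="B"):
    """Return the ABBA sequence: A, then B, then B again, then A again."""
    return [condition_a, condition_b, condition_b, condition_a]

def mean_position(order, condition):
    """Average trial position (counting from 1) at which a condition is done."""
    positions = [i + 1 for i, c in enumerate(order) if c == condition]
    return sum(positions) / len(positions)

order = abba_order()
print(order)                      # ['A', 'B', 'B', 'A']
print(mean_position(order, "A"))  # 2.5  (trials 1 and 4)
print(mean_position(order, "B"))  # 2.5  (trials 2 and 3)
```

Because both conditions have the same average position, a steady practice or fatigue effect is spread equally across A and B rather than favouring one condition.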

EXPERIMENTAL DESIGNS
How participants in the study are allocated to the different conditions (levels of the IV) in an experiment.

Repeated Measures Design:
Uses the same group of participants in every condition.
Strengths:
Participant variables are less of a problem, as each participant experiences all levels of the IV and is compared with themselves. Therefore, the effects of the IV on the DV are less likely to be misinterpreted.
Fewer participants are needed, so the experiment is quicker and has fewer logistical issues.
Weaknesses:
Order effects are likely (practice or fatigue from doing the task more than once may distort results and reduce validity).
Participants may show demand characteristics, as experiencing every condition can reveal the objective of the study.

Independent Groups Design:
Takes the group of participants and randomly divides them into separate groups, so that each participant is only in one condition and thus has limited exposure to the aim of the study.
Involves using 2 separate groups of participants; one in each condition.
Strengths:
Different participants are used in each level of the IV, so there are no order effects.
Fewer demand characteristics, as participants do not experience or witness all levels of the IV.
Differences in results between conditions can be detected quickly.
Weaknesses:
The results may be affected by participant variables, as there may be significant individual differences between the groups in each level of the IV.
More participants are needed for this design, so it may be more expensive and time-consuming.

Matched Pair Design:
Groups participants into pairs based on similar characteristics such as age, gender, ethnicity, IQ, etc. One member of each pair is then randomly assigned to the experimental group and the other to the control group. Each condition uses different participants, but they are matched in terms of similar variables.
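A matched pairs design can be sketched in Python as follows. This assumes participants are matched on a single variable (here a hypothetical IQ score with invented values); real studies often match on several characteristics, as listed above. Participants are ranked on the matching variable, neighbours in the ranking are paired, and a random "coin toss" decides which member of each pair goes to which condition:

```python
import random

def matched_pairs(scores, seed=None):
    """Pair participants with similar scores, then split each pair randomly.

    scores: dict mapping a participant ID to their score on the matching
    variable. With an odd number of participants, the last one is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)  # lowest to highest score
    experimental, control = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]    # neighbours are the closest match
        rng.shuffle(pair)                    # random "coin toss" within the pair
        experimental.append(pair[0])
        control.append(pair[1])
    return experimental, control

# Invented IQ scores for six hypothetical participants
iq = {"P1": 98, "P2": 120, "P3": 101, "P4": 118, "P5": 85, "P6": 87}
experimental, control = matched_pairs(iq, seed=3)
print("Experimental:", experimental)
print("Control:", control)
```

Here the pairs are P5/P6, P1/P3 and P4/P2, so each condition ends up with one low, one middle and one high scorer. It also shows the weakness noted below: if one member of a pair drops out, the other's data cannot be compared with anyone.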
Strengths:
Participants are exposed to only 1 level of the IV, hence there are fewer demand characteristics.
Fewer participant variables, because the experimenter has attempted to pair up the participants so that each condition has people with similar abilities and characteristics.
No order effects, as each participant is only tested in one condition.
Weaknesses:
It is risky in the sense that the loss of 1 participant means the loss of 2 participants’ data.
Very time-consuming.
Results may be distorted unless reliable and valid matching criteria are established, as finding participants who share the same characteristics is very difficult.
Samples are often small, so findings may not be generalisable to the wider population.

TYPES OF EXPERIMENTS
Laboratory
Experimental procedures are conducted in artificial, controlled settings. Controls are applied so that variables can be operationalised and measured effectively. Participants are tested under strict conditions set up by the experimenter.
Strengths:
High controls - extraneous variables can be prevented from affecting the DV. Standardized procedures - more reliable.
When variables are controlled and monitored, the cause-and-effect relationship can be found more quickly and easily.
This improves validity (accuracy and authenticity of research).
Weaknesses:
High controls mean low ecological validity.
Participants might get an idea of the aim, and the artificial setting might make the results prone to demand characteristics.
Low ecological validity also limits generalizability of the outcomes to a real-life context.
The findings may also be prone to researcher bias (a confounding variable).
Poor external validity.

Field Experiments
Take place in a natural environment (real-life setting) for the behaviour being studied. The influence of extraneous variables cannot be as strictly controlled, but the researcher still manipulates something (the IV) to see its effect on the DV.
Strengths:
High ecological validity, as it is more reflective of participants’ behaviour in real-life situations.
Participants may be unaware of the aim of the study (or even that they are being studied), so demand characteristics are less likely.
High in generalizability and representativeness.
Weaknesses:
Harder to control variables in the study, difficult to standardize and thus replicate.
This may threaten credibility.
If certain controls are not established, vital information may be missed depending on the scope of the experiment.
Human errors in researching are highly probable and difficult to account for.
May raise ethical issues of consent, as participants are unaware that they are being studied.

Natural Experiments
Often conducted in the participants’ natural, everyday setting, where they may be unaware that they are being researched, making it similar to a covert observation.
Researchers have no control over the manipulation of the IV or its levels. The IV occurs, changes and varies by itself, and so do the differences between conditions. Because the IV is naturally occurring, researchers cannot manipulate or control it.
Strengths:
Extremely high ecological validity and representativeness, as participants show their most natural behaviour patterns.
Less prone to demand characteristics, as participants are usually unaware they are being studied.
Can be used to investigate variables that are not practical or ethical to manipulate.
Can be used to study real-world issues.
Weaknesses:
Being unable to randomly allocate participants to conditions means that sample bias may be an issue. A causal IV-DV relationship is therefore hard to establish.
Ethical issues (lack of informed consent, and deception is often required).
Fewer prospects of standardizing or controlling procedures, which may add confounding variables that alter the results of the experiment.
Low reliability: because of the researcher’s lack of control, the procedure cannot be repeated, so the reliability of results cannot be checked.
Possibly more time-consuming than lab or field experiments.
There is no control over the variables.

OVERT OBSERVATION
Participants can be asked for their informed consent. However, to lessen the probability of demand characteristics (i.e. to hide the aim of the research), the participants may be deceived. It is important that after the experiment has concluded, they are debriefed.

In natural / field experiments, because the observation is covert, it is highly unlikely that the participants are aware of the situation taking place. The ethical issue arises when considering withdrawal rights in a covert scenario. Because participants don’t know the implications of the procedure, they will also not know when to withdraw in order to protect themselves from possible physical and psychological harm.
To maintain the integrity of the experiment, privacy and confidentiality are necessary in all cases. In lab experiments, confidentiality can be respected, as any interviews and questionnaires are usually pre-planned and designed carefully with this in mind. However, invasion of privacy is a risk in field or natural experiments, as they are usually covert but take place in the daily, personal spaces of the participants’ lives.

Confidentiality, however, can be respected in all experiments by keeping participants’ names and other identifying details anonymous. When designing a study, researchers must also make sure that nothing could later link participants to the study and reveal sensitive information such as their workplace, home or name, which would break confidentiality.

Informed Consent
Deception and Debriefing
Withdrawal rights
Protection from harm (physical and psychological)
Confidentiality
Privacy

SELF REPORTS
How the participants give information about themselves directly to the researcher. Involves informed consent, as participants know they are in a study.

QUESTIONNAIRES
Questions are presented to the participant in a written format, either online or on paper.
Closed ended - Have a fixed, predetermined response set such as ‘Yes/No’. Take the form of simple choices or a choice between specific categories.
Open ended - Ask for descriptive, qualitative responses that are individual to the participants themselves. These give more in-depth data, which helps in exploring the reasons behind a particular action or response. Keywords: ‘Why…’ ‘Describe…’
If more than 1 researcher interprets the responses, their interpretations may differ; the consistency between them is called inter-rater reliability.

Rating scales - psychometric measurement tools used to assess and quantify variables. The data is easier to assess statistically and to organise.
Advantages:
Quick / easy
Provides quantitative data - easier to analyse / organize data
Easier to summarize / distribute data
Privacy is respected as it's anonymous. Reduces social desirability bias - increases validity
It is replicable
Disadvantages:
Participants may respond to demand characteristics threatening validity.
Data provided is not qualitative and not in depth - vague
Gives a limited perspective of research
No guarantee they're not lying - social desirability bias
Interrater reliability - The extent to which 2 researchers interpreting the same qualitative responses will produce the same records from the same raw data.
Filler Questions - items put into a questionnaire or test to disguise the aim of the study by hiding
important questions among them.

INTERVIEWS
Face to face research method using verbal questions.
Question and answer sessions are held and responses are noted. Allow far more qualitative data to be collected.

Structured Interview - The questions asked are the same for all participants and their order is fixed. There may even be specific instructions for the researcher, e.g. body language (relaxed or strict), dress code and overall demeanour, depending on the kinds of responses they want to prompt.
Questions are all standardized.

Unstructured interview - Has no limitations and no standardisation. Questions are not in a predetermined format; they are flexible according to what the participant says, so questions may be different for each participant. However, the data may be hard to collect and categorise, and harder to compare.

Semi-structured Interview - Contains a mix of fixed questions and improvised ones.
The fixed questions mean comparisons can be made and averages can be calculated. The improvised questions allow researchers to develop ideas, explore issues and gain more clarity about a certain topic. Questions can thus be adapted, allowing researchers to explore underlying issues and possible relationships between them.

Advantages of self reports:
Participants are given the chance to express a wide range of feelings and thoughts and then explain them, so data from open questions is rich and detailed (qualitative).
Data from closed questions is numerical (quantitative), which is easier to analyse statistically.
A large sample can be dealt with quickly and efficiently as a large audience can be reached, increasing representativeness and generalisability.
Easy to replicate - reliable. They are likely to be administered in a consistent way.

Disadvantages:
Closed questions often limit the range of expression of a participant, which may miss out vital information.
Participants may provide socially desirable responses (demand characteristics may surface) if they are aware of the objectives of the research.
Validity may be low, as a limited range of response options might not reflect a participant’s actual viewpoint, so they may be forced to answer differently.
Open-ended questions can be time-consuming to analyse.
Withdrawal is common.
Researchers must be careful not to be subjective; they should aim for objectivity. Responses to open questions may be interpreted differently by different researchers.

Subjectivity - personal viewpoint which may be biased by one's feelings, beliefs and
experiences. So may differ between individuals.
Objectivity - An unbiased external viewpoint that is not affected by an individual’s feelings.

CASE STUDIES
A detailed investigation, carried out over an extended period of time, which focuses on one subject. It is not limited to one person; the subject may be an organisation, a family, etc.
Case studies often involve longitudinal research (commonly used in therapy) with no fixed time limit, meaning they can go on for months or, in some cases, even years, following that particular subject as a particular behaviour is studied. However, they are not used solely for therapeutic purposes.

Detailed and in depth data gathered via different techniques. Useful for following developmental
changes.

Advantages:
In situations where it is logistically difficult or impossible to have a large participant sample, case studies are ideal, as they allow behaviours to be studied in great detail.

Longitudinal study results in the collection of both quantitative and qualitative (rich and detailed) data, which may measure and quantify developing behaviours. The different techniques should all lead to similar conclusions.

The sample may be self-selecting (participants volunteer), which makes informed consent easier to obtain, although privacy and confidentiality still need protecting.

Ecological validity is usually quite high, as the behaviour being studied is part of everyday life.

Disadvantages:
Case studies very rarely produce enough quantitative data for statistical analysis, which brings in the argument that they are a mere collection of anecdotal evidence (evidence collected without strict controls, in a casual manner, relying heavily on personal testimony).
The level of detail may invade a person’s private life. Their identity may be hard to disguise, risking a breach of confidentiality.

These often require a quite intense and intimate relationship between the participant and the researcher, so the problem of objectivity arises. Researchers may develop opinions that directly influence the results gathered, as they might be emotionally involved.
Conclusive decisions cannot be made, as the study includes only one or very few participants.
Non-generalizable.

Because the participant is unique, researchers might proceed with invalid procedures and draw false conclusions, making assumptions on weak evidence. It is hard to draw valid, unbiased conclusions, as the findings depend heavily on the researcher’s own interpretation.
Cannot be replicated. Findings may be limited to only this one case.

OBSERVATIONS
Observations are the procedure of watching and then recording and documenting the behaviour of human or animal participants.
Can be done in 2 standard ways:

1. Naturalistic observation - conducted in the participants’ usual physical and natural environment, without any interference from the researcher(s) observing them.
2. Controlled observation - conducted in an environment that has been manipulated by the researcher (lab or field).

If the observer considers the whole spectrum of possible behaviours, the observation is non-focused; if this lack of strict controls continues, it is deemed an unstructured observation.
Unstructured - the observer records the whole range of possible behaviours. This is usually confined to a pilot study stage at the beginning of a study, to refine the behavioural categories to be observed.
Advantage:
Ensures that any important information or behaviour is recognised, but it may be very difficult to record all the activities accurately and many may be irrelevant.
A structured observation is likely to produce more reliable data.

A structured observation, however, is designed to concentrate on a specific range of behaviours, record them and then categorise them. This also helps to test and verify the study’s reliability via a technique called inter-rater reliability (the degree of agreement between 2 or more observers who record the same behaviour using the same, common methods).

Behavioural categories must be observable actions and operationalised. This helps the observers to be consistent, i.e. improves inter-observer reliability.

Inter-observer reliability - the consistency between 2 researchers watching the same event
and whether they will produce the same records.
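One simple way to express inter-observer reliability is percentage agreement: the proportion of observation intervals in which both observers recorded the same behavioural category. The sketch below is a hypothetical illustration (the function name, categories and records are invented); other measures, such as correlating the two observers' tallies, are also used.

```python
def percent_agreement(observer_1, observer_2):
    """Percentage of intervals in which two observers recorded the same category.

    Each argument is a list of behavioural categories, one entry per
    observation interval, recorded while watching the same event.
    """
    if len(observer_1) != len(observer_2):
        raise ValueError("both observers must code the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_1, observer_2))
    return 100 * agreements / len(observer_1)

# Invented records from two observers watching the same 8 intervals
obs_1 = ["hit", "push", "none", "hit", "kick", "none", "push", "hit"]
obs_2 = ["hit", "push", "none", "push", "kick", "none", "push", "hit"]
print(percent_agreement(obs_1, obs_2))  # 87.5 (they disagree on 1 of 8 intervals)
```

Clear, operationalised categories make it more likely that two observers code each interval the same way, which raises this figure.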

Observations are often also conducted in social settings, either as participant or non-participant observations.
A participant observation - A researcher who watches from the perspective of being part of the social setting. They are part of the situation / setting.
Non participant observation - A researcher who does not become involved in the situation being studied, e.g. by watching through one-way glass or by keeping apart from the social group of the participants.
Observers may also differ in their stance, as:
Overt observers - the role of the observer is obvious to the participants. Observers openly watch and document participant behaviour, with the participants knowing they are being studied.
Strengths:
Raises fewer ethical issues, as participants know they are being studied and can give informed consent.
Is practical and thus can be conducted over an extended period of time.
The researcher can make notes and record details openly without having to rely on memory, as they don't have to worry about blowing their cover.
Researchers can ask a number of questions using different methods.
Weaknesses:
A very high risk of demand characteristics, which lowers validity, as the activity recorded is less likely to reflect real-world behaviour.
High risk of socially desirable responses from the participants.
Results may not always be representative, which questions the credibility of the research.

Covert observers - the role of the observer is not obvious, as it is ‘undercover’ / hidden or disguised.
Strengths:
Increases validity - little or no exposure to the aim, so no demand characteristics.
Reduced effects of social desirability.
Researchers can dig deeper and observe participants in their natural state of behaviour.
High rate of inter-rater reliability, as 2 observers may be observing simultaneously.
Weaknesses:
Raises ethical issues such as informed consent / deception / privacy.
Participants may feel distressed at the violation of their privacy.
The legality of this is often questioned.
There is a risk of the observer’s identity being revealed, which would discredit the whole operation and leaves the researcher in constant stress. Data recorded may lack validity, as it is often based on memory.
Hard to sustain or conduct over time. Participants may interact with researchers in ways they wouldn't if they knew the real purpose.

Advantages of observations:
The observed behaviour is natural and authentic when participants are unaware, which increases ecological validity.
In structured observations with clearly defined categories, the data collected is often quantitative, which is objective and can be analysed statistically.
The chances of gathering extremely rich data are very high if the observation is unstructured.
If participants are unaware, demand characteristics are unlikely, which increases validity.

Disadvantages of observations:
Participants cannot explain or elaborate on why they behaved a certain way (asking might expose the aim of the research), so the observer’s interpretation may be subjective.
Observations may not be reliable due to practical issues such as view obstruction, missing details, relying on memory, etc.
Naturalistic observations make it hard to establish controls, which in turn makes it harder to control confounding variables and difficult to establish a cause-and-effect relationship.
Difficult to replicate.
Various ethical issues arise, e.g. deception and lack of informed consent, as people may be observed without their permission.

CORRELATIONS
Technique used to investigate a link between 2 measured variables.
Useful when it is possible only to measure variables rather than manipulate them.
If a link is found between two variables in a correlation, it cannot be assumed to be a causal relationship. We cannot assume that one variable is the reason for the change in the other variable.

In order to look for or establish a correlation between two variables, each variable must exist over a range / spectrum and it must be possible to measure them numerically. To collect data for correlations, any of the research methods mentioned above can be used (self-reports, observations, etc.).

It is important to note that before assuming that an increase in one variable has caused an increase in the other, there may be other factors that cause changes in both variables.

All that can be established is that the two variables vary together, not that there is a causal relationship between them, as the link may even be coincidental. As a result, a correlation has ‘measured variables’ or ‘co-variables’ rather than dependent and independent variables.

The nature of the relationship between the two variables in a correlation can be described in terms of its direction: positive or negative.

In a positive correlation, the two variables increase together, in the same direction, so higher values on one variable correspond with higher values on the other. For example, a positive correlation might exist between exposure to aggressive models and violent behaviour: greater exposure to aggressive models would go with more violent behaviour (a link suggested by Bandura et al.’s study).

In a negative correlation, as one variable increases the other decreases, so higher values on one variable correspond with lower values on the other. For example, a negative correlation might exist between ‘Obesity’ and ‘Income’, with higher levels of obesity being observed with lower levels of income, given that low-quality food with next to zero nutritional value, such as fast food and candy, is often cheap.

Correlation Coefficient
A number between -1 and +1 that states how strong a correlation is. If it is close to 0, there is very little connection between the 2 variables at all.
The closer it is to +1, the stronger the positive correlation, i.e. as one variable increases, so does the other.
The closer it is to -1, the stronger the negative correlation, i.e. as one variable increases, the other decreases.
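The notes do not say which correlation coefficient is used; one common choice is Pearson's r, sketched below in Python (Spearman's rho, which works on ranks, is another). The function name is hypothetical and the data are invented numbers, used only to show that a perfect positive relationship gives +1, a perfect negative one gives -1, and a strong but imperfect one falls in between:

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient r for two co-variables."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Do the scores move away from their means in the same direction?
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("a variable that never changes cannot be correlated")
    return co_variation / (spread_x * spread_y)

# Invented scores for five participants
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))  # 1.0  (perfect positive)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # -1.0 (perfect negative)
print(round(pearson_r(x, [2, 1, 4, 3, 5]), 2))   # 0.8  (strong positive)
```

Even an r of +1 only shows that the co-variables move together; it says nothing about which one, if either, causes the other.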

Evaluation:
A correlational study can only be effective/valid if the measures of both variables test real occurrences. For this, the variables must be clearly defined and relate directly to the relationship being investigated.
The reliability of the correlation depends on the consistency of the measures of the variables. For some correlations, such as those which use scientific scales, the measures will be high in reliability (as they can be tested again and results will be objective).
In other cases, where variables are measured using techniques such as self-reports or observations, there is a risk that reliability will be lower (as it will be difficult to replicate and results will be subjective).

They are useful because they allow researchers to explore problems when it is not practically or ethically possible to conduct experiments.

Advantages:
The main strength of a correlation is that it can provide precise information about the strength of the relationship between the variables.
• Study behaviour that is otherwise difficult/impossible to study.
• Collect quantitative data for statistical analysis, which helps in determining whether the data supports the hypothesis or not.
Disadvantage:
The main weakness of a correlation is that it is inconclusive, i.e. it cannot show cause and effect (which variable controls which).
• No control over other influencing factors and variables.

Correlation does not mean Causation

RESEARCH PROCESSES
AIMS:
Tells us the purpose of the investigation. Helps explain why a particular hypothesis is being investigated. Expresses what the study intends to show.
In a correlation, the aim is to investigate the link between 2 measured variables.

HYPOTHESIS:
A ‘testable’ statement that is used to make the research more precise / exact. Predicts a difference between levels of the IV, or a correlation.
Ideally provides more detail about the variables being investigated, and should be ‘falsifiable’, as it is ‘tested’.
The Alternative Hypothesis is the main hypothesis. It can be written in different ways, which differ in the nature of their prediction.

Types of Hypothesis:
Non-Directional Hypothesis (2 tailed hypothesis):
Predicts that the IV will change the DV, but not the direction of that effect, i.e. whether it will be an increase or a decrease. In a correlational study, it predicts that there will be a relationship between the 2 measured variables.
This type of hypothesis is usually chosen if the effect of a certain variable is being investigated for the first time, so there is no previous evidence to suggest what the results might be.
Directional Hypothesis (one tailed hypothesis):
When ‘previous evidence’ or ‘previous research’ suggests the direction of an effect (of the IV on the DV), a directional hypothesis is used. This is also known as the ‘one-tailed’ hypothesis.
In an experiment, it means saying which condition will be best (produce the highest score).
In a correlational study, it means saying whether there will be a positive or negative correlation.
It is important to remember that a correlational hypothesis should not say that one factor causes a change in the other.

Null Hypothesis:
In an experiment, this states that any difference in the DV between the levels of the IV is so small that it is likely to have arisen by chance. It predicts that any result is due to pure chance.
When writing it, you always have to state both levels of the IV and the DV. In a correlational study, the null hypothesis predicts no relationship between the measured variables.

DEFINING, MANIPULATING AND CONTROLLING VARIABLES
Variables are factors that change or can be changed. In experiments these are the IV and DV, as well as any extraneous variables that are or are not controlled.

Experiments look for changes in the DV between two or more levels of the IV, which are set by the experimenter/researcher.

It is essential for the IV to be clearly defined, or better, operationalised, so that manipulating the conditions produces the intended effects.

Operationalising: Defining each variable of interest in terms of the operations taken to ‘measure’ it. This allows vague concepts to be empirically measured and observed.
The DV must also be operationalised so it can be measured effectively.

To be sure about their research’s findings, researchers need to control variables. This is especially important in experiments, where extraneous variables are likely to disrupt the results and make them hard to interpret.

Confounding Variables can either work against the effect of the IV or add to it, because they act selectively on the DV in one level of the IV. Their effect therefore looks like an effect of the IV, leaving no way of knowing what caused the change. They confuse the results.
Extraneous Variables which randomly affect all levels of the IV aren’t so problematic. The difficult part is identifying which variables to control before the experiment begins. It is also important to note that if extraneous variables are not recognised and acknowledged beforehand, they become uncontrolled variables, which makes the results difficult to interpret, as it is hard to distinguish the effects of the IV from those of other variables that affect the DV.

Standardisation:
Controls ensure that the levels of the IV represent what they are designed to, i.e. that the
differences between them produce the intended conditions for testing the hypothesis, ensuring
validity and reliability. Every participant in the study is treated in the same way, so that
differences in how participants are treated cannot affect the results. This is called
standardisation.

This is achieved by using a standardised set of instructions, i.e. the same instructions given
to every participant in the study. For instance, with a 10-question questionnaire about people’s
dietary habits, all participants should be told to answer strictly about their food patterns, so
that any effect of social desirability is equal across participants.

Procedures also need to be strictly standardised: equipment, tests and designs should be
consistent, measuring the same variable in exactly the same way every time. Consider the
dietary habits questionnaire again. It should focus strictly on people’s food consumption
patterns rather than on why their food patterns are like that, which is related but not
necessary in this context.

In laboratory experiments, standardisation is easier because variables such as equipment are
more easily controlled. Examples include a stopwatch used to time intervals in the experiment,
or fMRI brain scans, which are objective measures. However, these must also be used in a
standardised way for the results to be interpreted. The controls used should be appropriate,
and researchers should be taught how to implement them.

Situational Variable - A confounding variable caused by an aspect of the environment
Control - A way to keep a potential extraneous variable constant

SAMPLING OF PARTICIPANTS
A population is a group of people (or animals) with one or more characteristics in common, such
as shared interests or a common feature. The sample is the part of that population that is
recruited for the research.
Sample - the group of people who participate in a study.
They are taken from the population and should ideally be representative of that group so that
findings are generalisable.
The target population of the study should be identified early on so that the sample the
researcher chooses is relevant and representative.

Important things to consider when sampling:

• Sample details such as age, ethnicity and gender. These are basic essentials that should
always be considered.

• Sample details such as socio-economic status, employment, education, occupation and
geographical location.

• Sample size (should be large enough to be representative).

• Small samples are usually less reliable and less representative, so findings from them are
generally less valid.

SAMPLING TECHNIQUES:
Opportunity Sample - involves the researcher approaching people who are easy to find and
readily available, such as students studying mathematics in the same university department.
They are chosen simply because they are available.
Advantages:
It is relatively quick and easy to recruit participants, and large samples can be obtained
without a lot of effort.
It is convenient for the researcher, and for participants, who in some cases are also offered
incentives.
Disadvantages:
Participants are unlikely to be representative of the target population, because the sample
may be biased by the researcher’s choice of whom to approach (and by incentives such as
payment or course credits).

Volunteer Sampling (Self-Selecting) - This usually involves researchers advertising for
participants. An advert could appear in a newspaper, on notice boards or online. The people who
reply are ‘self-selecting’, that is, they willingly volunteer themselves for the research.
Sometimes volunteers are given no incentive at all (neither course credits nor payment); often
they are given a small amount of one or both. Those who reply to the adverts become the sample.
Advantages:
It is useful when the research requires participants with characteristics specific to the needs
of the experiment. Volunteers are likely to be committed.
Recruiting participants is easier because the advert can be placed in print, social and digital
media, which have a large reach.
It is less time-consuming.
Disadvantages:
It can be expensive (adverts in the media can cost a lot), and in some cases it takes even more
effort to persuade people to take part.
People may not see the advert, may ignore it, or may not reply even after seeing it.
Volunteers may not actually meet the eligibility criteria required for the study.
There is no way to ensure the sample is representative of the target population. People who
respond may be similar to one another (e.g. they have free time).
Where an incentive is involved, demand characteristics and social desirability may affect the
findings.
Random Sampling:
Every member of the target population has an equal chance of being selected. If the target
population is ‘factory workers’ and there are 800 of them, one way to randomly select the sample
is to put all 800 names together and draw out 20 or 30 names (depending on the required sample
size of the study). Alternatively, each person may be allocated a number and numbers selected
in an unbiased way.
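The number-allocation method can be sketched in Python as a minimal illustration. The worker labels and the sample size of 20 are hypothetical; `random.sample` stands in for drawing names from a hat, selecting without replacement so nobody is picked twice and every worker has the same chance of selection.

```python
import random

def random_sample(population, n, seed=None):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)  # a seed only makes a run repeatable
    # sample() picks without replacement, like drawing names out of a hat
    return rng.sample(population, n)

# Hypothetical target population: 800 factory workers, each allocated a number
workers = [f"worker_{i:03d}" for i in range(1, 801)]
sample = random_sample(workers, 20, seed=1)
print(len(sample), len(set(sample)))  # 20 20 (twenty different workers)
```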
Advantages:
They are more likely to be representative than opportunity or self-selecting samples, because
selection is left to chance, so the researcher cannot bias it.
It works well when a specific group is to be studied and participants are selected at random
from within that group.
Disadvantages:
It is often time-consuming when the target population is large. If, for example, the names of
everyone in the potential participant pool are not available, it would be difficult to conduct a
random sample.
An equal chance of selection is often an ideal rather than reality, because not everybody
selected will be willing to take part. People may decline or leave, and the researcher then has
to recruit replacements.
This might bias the sample.

DATA AND ANALYSIS

Psychological research often requires the results to be organised numerically; the results as
first collected are called ‘raw data’. To make sense of large sets of findings, the data is often
summarised mathematically and represented visually in graphs.

The numerical results collected by psychologists are known as ‘quantitative data’; data which is
detailed and descriptive is called ‘qualitative data’. Quantitative data indicates the quantity of
a psychological measure, i.e. the strength or amount of a response, and tends to be measured
on scales, such as time, or as numerical scores on tests such as personality, IQ and T-maze
tests.

Quantitative data is associated with experiments and correlations which use numeric scales but
it is also possible to collect quantitative data from observations and interviews.
Advantages:
It usually uses objective measures and scales.
It is reliable, as the measures can be repeated (replicated).
It is quicker to analyse statistically, especially when there are large volumes of data to
compare.
Disadvantages:
It often limits responses, so the findings may be less valid and less representative. It gives no
explanation of ‘why’.
Large samples are needed for the findings to be generalisable.

Qualitative data is descriptive, in-depth data indicating the quality of the psychological
characteristic.
Advantages:
Data is often valid because it is descriptive and detailed, not limited by fixed choices.
It can make researchers aware of variables they may need to control (e.g. childhood, family),
helping them to consider possible causes and effects.
Data is more in-depth, which allows a deeper understanding of the topic being studied.
Important responses are less likely to be ignored because of averaging.
Disadvantages:
It is specific to the study and usually cannot be generalised to a wider context.
Data may be biased by both the participant and the researcher. This may make the data invalid.
It is difficult to analyse statistically and to summarise.
It is difficult to replicate without strict standardisation, so reliability tends to be low.

DATA ANALYSIS
Measures of Central Tendency:
A set of quantitative results can be summarised as a single number that represents the middle or
typical score; this is known as a measure of central tendency, or average.

3 different measures:
Mode - the most frequently occurring score in a data set. There can be more than one mode. It is
unaffected by extreme scores and is useful for identifying repeated behavioural patterns.
Its limitations are that it offers no insight into the other scores, it may not be very ‘central’,
and it can vary a lot from one sample to another.
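As a small sketch with made-up scores, the mode can be found by counting how often each score occurs and keeping every score that ties for the highest count, which is how a data set can end up with more than one mode:

```python
from collections import Counter

def modes(scores):
    """Return every score that occurs most often (there may be more than one)."""
    counts = Counter(scores)            # score -> number of times it occurs
    top = max(counts.values())          # the highest frequency
    return sorted(score for score, c in counts.items() if c == top)

print(modes([3, 5, 5, 7, 9, 9, 2]))  # [5, 9] (two modes: bimodal)
print(modes([1, 2, 2, 3]))           # [2]
```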

Median - is only used with numerical data on a linear scale. To find the median, all the scores
in the data set are put in order from smallest to largest; the middle score in the list is the
‘median’. If there is an even number of scores, there are two numbers in the middle; these are
added together and then divided by 2.

In essence, the median is the halfway point that separates the lower half of the scores from the
upper half. It is unaffected by extreme scores, so they do not distort it. However, it can be
misleading when there are only a few scores, and it does not take into account the values of
most of the scores.
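The ordering rule, including the even-numbered case, can be sketched as follows; the example scores are hypothetical:

```python
def median(scores):
    ordered = sorted(scores)          # arrange from smallest to largest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd number of scores: one middle value
        return ordered[middle]
    # even number of scores: add the two middle values and divide by 2
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 2, 9, 4, 5]))      # 5
print(median([7, 2, 9, 4, 5, 8]))   # 6.0
print(median([2, 4, 5, 7, 90]))     # 5 (the extreme score of 90 does not distort it)
```

Python's standard library also offers `statistics.median`, which gives the same results.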

Mean - The mean is the measure of central tendency that we usually call the ‘average’. It can
only be used with numerical data from linear scales. The mean is worked out by adding up all
the scores in the data set to find a total and dividing that total by the number of scores. It is
the most thorough and informative measure of central tendency as it takes into account all the
scores. However, it can give a distorted result if there are any anomalous scores.
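A minimal sketch of the mean, using two made-up sets of scores that differ only in one anomalous value, shows the distortion described above:

```python
def mean(scores):
    return sum(scores) / len(scores)   # total divided by the number of scores

typical = [4, 5, 6, 5, 5]
with_outlier = [4, 5, 6, 5, 50]        # one anomalous score

print(mean(typical))        # 5.0
print(mean(with_outlier))   # 14.0
```

The median of both lists is 5, which is why the median is often preferred when there are extreme scores.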

MEASURES OF SPREAD
This indicates how spread out, or dispersed, the data within a set is. Two data sets of the same
size with the same mean could still differ in how close most of their data points are to that
average. Such differences are described by measures of spread: the range and the standard
deviation.

Range - To calculate:
1. Find the largest and smallest value in the set of data.
2. Subtract the smallest value from the largest value and add 1.

Conventionally (e.g. in mathematics), the 1 is not added. In psychological research it is added
so that we measure the points themselves, including both the highest and lowest scores, rather
than just the gap between them.
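The two conventions for the range can be compared in a short sketch; the `add_one` switch and the example scores are hypothetical illustrations:

```python
def score_range(scores, add_one=True):
    """Largest minus smallest; add 1 to include both end points (psychology convention)."""
    gap = max(scores) - min(scores)
    return gap + 1 if add_one else gap

scores = [3, 8, 10, 4, 6]
print(score_range(scores))                 # 8  (10 - 3 + 1)
print(score_range(scores, add_one=False))  # 7  (10 - 3)
```

The function is named `score_range` so that it does not hide Python's built-in `range`.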
Standard Deviation - takes into account the difference between each data point and the mean;
this difference is known as the deviation from the mean.

Because the standard deviation describes the spread of a group, groups whose scores are more
widely dispersed have a larger standard deviation. When the standard deviations of two groups
are similar, this indicates they have a similar variation around the mean/average.
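The steps behind the standard deviation can be sketched as follows. This version divides by n - 1 (the usual formula for a sample); some textbooks divide by n instead, which gives a slightly smaller value. The two groups are invented so that they share a mean of 5 but differ in spread:

```python
import math

def standard_deviation(scores):
    """Sample standard deviation: the typical distance of scores from the mean."""
    m = sum(scores) / len(scores)
    # deviation of each score from the mean, squared so negatives do not cancel out
    squared_deviations = [(x - m) ** 2 for x in scores]
    variance = sum(squared_deviations) / (len(scores) - 1)   # n - 1 for a sample
    return math.sqrt(variance)

group_a = [4, 5, 5, 6]   # mean 5, scores close together
group_b = [1, 3, 7, 9]   # mean 5, scores widely dispersed
print(round(standard_deviation(group_a), 2))  # 0.82
print(round(standard_deviation(group_b), 2))  # 3.65
```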

Graphs - used to illustrate data visually; different types suit different purposes. Those
included in our syllabus are bar charts, histograms and scatter graphs.

Bar Charts - used when data is in separate categories rather than on a continuous scale. Bar
charts are therefore used for totals of data collected in named categories, and to display
measures of central tendency (e.g. the mean for each condition).

Histograms - useful to show the pattern in the whole data set when the data is continuous, i.e.
measured on a scale rather than in distinct categories. A histogram may be used to illustrate
the distribution of a set of scores.

Scatter Graphs - The results collected from a correlational study are presented on a scatter
graph. To construct one, a dot is marked at the point where the participant’s scores on the two
variables cross. A ‘line of best fit’ is often added: its position is calculated so that it comes as
close as possible to as many points as possible. In a strong correlation, the data points lie
close to the line, whereas in a weak correlation they are more spread out. When there is no
correlation, the points are scattered and no clear line is formed.
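The notes say the position of the line of best fit is calculated; one standard way to do this is the least-squares method, sketched below together with Pearson's correlation coefficient r, which puts a number on how close the points lie to the line. The revision-hours data are hypothetical, and the notes do not specify which calculation the syllabus expects:

```python
import math

def best_fit_and_r(xs, ys):
    """Return (slope, intercept, r) for paired scores on two variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # sums of (products of) deviations from each variable's mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx                      # least-squares line of best fit
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)         # -1 to +1; near +/-1 means points hug the line
    return slope, intercept, r

# Hypothetical data: hours of revision (x) and test score (y) for five participants
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept, r = best_fit_and_r(hours, scores)
print(round(slope, 2), round(intercept, 2), round(r, 2))  # 0.6 2.2 0.77
```

Here r is about 0.77, a fairly strong positive correlation: the points lie near, but not on, the line.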

Normal Distribution Curve - a bell-shaped curve that is symmetrical, with scores spread evenly
on either side of the mean; the mean, median and mode all fall at the centre.
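The symmetry of the curve can be checked with Python's `statistics.NormalDist` (Python 3.8 or later). The mean of 100 and standard deviation of 15 are an assumed IQ-style scale, used only for illustration:

```python
from statistics import NormalDist

# Assumed scale for illustration: mean 100, standard deviation 15
curve = NormalDist(mu=100, sigma=15)

below_mean = curve.cdf(100)                      # proportion of scores below the mean
within_one_sd = curve.cdf(115) - curve.cdf(85)   # proportion within 1 SD either side
print(below_mean)               # 0.5 (half the scores on each side)
print(round(within_one_sd, 3))  # 0.683
```

About 68% of scores in a normal distribution fall within one standard deviation of the mean.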

ETHICAL CONSIDERATIONS
Experiments and studies conducted using humans or animals have the potential to raise
concerns about the welfare of the participants; these are called ethical issues. Problems may
arise from the nature of the study, such as psychological discomfort, harm or stress caused by
the procedure, or the need to lie to hide the aim of the study. Ethical issues may also arise from
the implications of the research, for example the possibility of results having a negative impact
on society.

To regulate these concerns, professional organisations and councils produce codes of conduct.
These include rules such as obtaining approval from governing bodies (such as universities),
and guidelines that help experimenters work in ways that do not violate the ethics code, by
setting limits that protect the welfare of the individuals involved in the study.

This is important because if participants come away with a negative perception of their
participation, it will reflect badly on the whole psychological community, which in turn will lose
credibility.

ETHICAL GUIDELINES RELATED TO HUMAN PARTICIPANTS:

Informed Consent -
To reduce demand characteristics and social desirability, and so protect the validity of the
study, it is often important to hide the aim of the study. However, participants still need to know
what the study involves so that they can give their informed consent.
Ideally, informed consent should be obtained before the study begins, not by revealing the aim
of the study but by providing participants with enough information to decide whether or not to
take part. However, in naturalistic observations and field experiments it is often not possible to
obtain informed consent. This is where ‘presumptive consent’ comes in: researchers ask a
group similar to the target population (sharing similar traits) whether they would find the study
acceptable, on the assumption that if they would, the actual participants would also have
agreed.

Protection
A study may have the potential to cause psychological discomfort, stress and harm to the
participants involved (e.g. Milgram). In such situations it is imperative that participants are
protected: they should not be put at higher risk, and steps should be taken to eliminate the risk
altogether. As a further safeguard, the study should be stopped if unexpected risks arise.

Right to Withdraw
Participants are also given the right to withdraw, and this must be made clear to them at the
start of the study. Although participants can be offered incentives to take part in research,
these cannot be taken away if they wish to leave. Researchers must not abuse their position of
authority by pressuring a participant to stay if they do not want to. Participants and researchers
should both be aware of this.

Deception
Where possible, participants should not be deliberately misled. When deception is absolutely
essential, participants should be debriefed and apologised to as soon as possible. They should
also be allowed to remove their results if they wish to.

Confidentiality
All the data that is collected should be stored separately from the participants’ personal
information (age, name, gender, ethnicity, occupation). This information must not be shared
with any third party; this would be a breach of confidentiality. Ideally, to ensure confidentiality,
the personal details of the participant should be destroyed so that any breach is impossible. If
there is a need to contact the participant again, or to pair up an individual’s scores in each
condition in, say, a repeated measures design, a serial number can be allotted to each
participant to identify them.

Privacy
Research methods such as self-reports and observations which ask personal questions risk
invading privacy. This means intruding on personal space or emotional territory that
individuals do not want to share. Participants can make this clear, setting boundaries with the
researcher. In the case of a questionnaire, participants should be given personal space. In
observations, people should only be observed where they would usually expect to be
observed. Their information can be published only when participants themselves give
permission through informed consent, or in exceptional circumstances where the safety or
lives of the participant or others are at stake.

Debriefing
Debriefing involves thanking participants for taking part, apologising to them if they were
deceived, and giving them the chance to ask questions. They are also told the full aim of the
study and asked to confirm that they do not want to withdraw their data. However, debriefing
does not justify designing an unethical procedure or experiment, so researchers must still
minimise ‘collateral damage’ and distress to participants in every case.

ETHICAL GUIDELINES RELATED TO USE OF ANIMALS

Animals are frequently used in psychological research for a number of reasons suggested by
psychologists: they are convenient models, they allow procedures that would not be possible
with humans (because of ethical considerations), and because of redundancy. This is why
research is conducted on animals, but their welfare needs safeguarding.

Animals are also often protected by law, but these guidelines specifically consider the effects of
research in which animals may be caged/confined, harmed, in pain or stressed; this suffering
should be minimised. Veterinary help/advice should be sought where needed.

REPLACEMENT
Researchers should consider replacing actual animal experiments with alternatives such as
videos from previously conducted studies or computer simulations.

SPECIES AND STRAIN

The chosen species and strain should be the one that is least likely to experience distress or
pain. Other relevant factors should also be considered, such as whether the animals were bred
in captivity, whether they took part in a study prior to the current one, and the sentence period of
the studies.

NUMBER OF ANIMALS
Only the minimum number of animals needed to produce reliable and valid findings should be
used. To minimise the number, researchers can use pilot studies, reliable measures of the DV,
good experimental design and research methods, and sound data analysis.

PROCEDURES: PAIN AND DISTRESS

Research that may cause disease, injury, physiological or psychological distress, discomfort or
death should be avoided wherever possible. The experimental design should aim to reduce any
possible pain to the animals, rather than worsen it.
Alternatively, naturally occurring instances may be studied. During research, attention must be
paid to the animals’ daily care and veterinary needs, and any costs inflicted on the animals
should be justified by the scientific benefits of the work.

HOUSING
Isolation and overcrowding can cause animals to become distressed: isolation can distress
social species, while overcrowding can distress those with solitary or territorial habits. Caging
conditions should therefore be based on the social behaviour patterns of the species.
Overcrowding can cause aggression and, consequently, distress. Food and water should be
sufficient for the animals’ dietary habits. However, the artificial environment only needs to
recreate the aspects of the natural environment that are important to welfare and survival, e.g.
warmth, space for exercise or somewhere to hide. Cage cleaning should be a priority.

REWARD AND DEPRIVATION AND AVERSIVE STIMULI

When designing studies that involve the animals’ food, the study should still satisfy their dietary
needs. The use of preferred food should be considered as an alternative to deprivation, and
alternatives to aversive stimuli should also be considered. (An aversive stimulus is an
unpleasant event presented as a consequence of a behaviour in order to decrease the
probability of that behaviour, for example a punishment.) Deprivation should be avoided where
possible.

ANAESTHESIA, ANALGESIA AND EUTHANASIA

Animals should be protected from pain.
Anaesthesia: a temporary, medically induced loss of sensation or awareness (and, in general
anaesthesia, consciousness), so that procedures do not cause pain. It may be given
intravenously or by inhalation.

Analgesia: medication used to relieve pain (and, in some cases, inflammation).

Euthanasia: also known as mercy killing. It is the process of intentionally and humanely ending
the animal’s life to relieve it from pain and suffering.

EVALUATING RESEARCH METHODS

Reliability
Whenever research is conducted, data is obtained. Researchers must try to make sure that the
way in which these results are collected is the same every time. If the findings differ when the
research is repeated, these inconsistencies indicate a problem with reliability.

The reliability of the measures used to collect data depends on the ‘tool’ used. A researcher
collecting reaction times or pulse rates as data will probably have high reliability, as the
machines used are likely to produce very consistent measures of time or rates.

One way to check reliability is the test-retest procedure. This involves using a measure once,
and then using it again in the same situation. If the reliability is high, the same results will be
obtained on both occasions, meaning there will be a high correlation between the two sets of
scores.
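The test-retest check can be sketched as a correlation between the two sets of scores, assuming Pearson's r is the correlation used; the scores for six participants are hypothetical. A value close to +1 means participants scored similarly on both occasions:

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two sets of scores from the same participants."""
    n = len(first)
    m1, m2 = sum(first) / n, sum(second) / n
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical questionnaire scores for the same six participants, tested twice
test_scores = [12, 15, 9, 20, 17, 11]
retest_scores = [13, 14, 10, 19, 18, 10]
r = pearson_r(test_scores, retest_scores)
print(round(r, 2))  # 0.96, suggesting high test-retest reliability
```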
Another reliability problem arises from subjective interpretations of data. For instance, a
researcher using a questionnaire or interview with open questions may find that the same
answers could be interpreted in different ways, producing low reliability. If these differences
arose between different researchers, this would be called an inter-rater reliability problem. This
can be reduced by operationalising.

Similarly, if in an observation researchers gave different interpretations of the same actions,
this would be low inter-observer reliability. If reliability was low, the researchers in either case
would need to discuss why the differences arose and find ways to make their interpretations or
observations more alike. This can be done by agreeing on operational definitions of the
variables being measured and by looking at examples together. These steps would help to
make the research more objective.

To minimise differences in the way research is conducted, which could reduce reliability,
standardisation can be used, i.e. the procedure is kept the same. This includes instructions,
materials and apparatus, although there would usually be no reason to change many of these
anyway. The most important aspects to standardise are those factors which might differ, such
as the experimenter’s manner towards participants in different levels of the IV, an interviewer’s
body language or verbal mannerisms, or an observer’s success at concealing their presence.
Validity
Many factors affect validity, including reliability, because a test or task cannot measure what it
intends to unless the methods are consistent. Objectivity also affects validity: if a researcher is
subjective in their handling, and specifically their interpretation, of data, their findings will not
properly reflect the intended measure.

There are different types of validity. One is face validity, which concerns how the procedure
appears: a test or task must seem to test what it is actually supposed to. Consider a test of
helping behaviour that involved offering to assist people who were stuck in a bathtub full of
spiders or lizards.
It might not be a valid test of helping, because people who were frightened of spiders or lizards
would not help, even though they might otherwise be altruistic (selflessly helpful). This would be
a lack of face validity.

If participants start to think that they understand the aim of the study, their behaviour is very
likely to be affected by social desirability and demand characteristics, which lowers validity.
When designing a study, the researcher should aim to minimise demand characteristics, so
that it is not apparent to participants how they are expected to behave.
Another validity problem is whether the research’s findings are too specific to that study to be
applied to other situations. If so, the findings lack the general reach they were supposed to
have; this is a lack of ecological validity. This type of validity concerns whether findings from the
laboratory apply to real-life settings.
The task itself matters too. If participants are asked to do tasks that are similar to those in
real-life contexts, the study has mundane realism (the degree to which it is similar to real-life
events). This is significant, because a study with realistic tasks will have higher ecological
validity. For instance, an experiment on emotional responses to dangerous animals could use
bears, insects, bats or tigers.

Only a small number of people are likely to have seen bears or tigers, and a few more a bat, but
almost everybody in the participant sample is likely to have seen insects, so using insects gives
higher mundane realism and thus higher ecological validity. Ecological validity is a type of
external validity, which refers to whether or not the findings of the study can be generalised
beyond the present study.

Generalisability
Ecological validity contributes to the generalisability of results. Another factor which affects the
ability to generalise is the sample of participants.

If the sample is very small, or does not contain a wide range of the different kinds of people in
the population (such as gender, age, ethnicity, etc.), it is unlikely to be representative.

Restricted samples like this are more likely to occur when opportunity or volunteer sampling is
used rather than random sampling.

Important Things To Remember About Research Methodology And Processes:

• Are measures reliable?

• Are the tools and equipment being used collecting consistent results?
• Are the researchers using those in ways that are consistent?

• Is the interpretation of data objective?

• Is the study valid? Does it represent what the aim intends to find out?

• Consider how reliability and generalisability relate to validity.
• Are there any variables that may affect results, such as social desirability, demand
characteristics, familiarity bias, researcher bias, etc.?

• To improve the study, focus on: method, design, procedure and sampling technique.

COGNITIVE APPROACH
Main assumptions:
Behaviour and emotions can be explained in terms of the role of cognitive processes such as
attention, language, thinking and memory. Our complex mental processes can be studied
scientifically.
Humans can be seen as data processing systems: the workings of a computer and the human
mind are alike. Both encode information, store it and provide an output.

ASSUMPTIONS OF THE COGNITIVE APPROACH

- behaviour and emotions can be explained in terms of cognitive processes such as attention,
language, thinking and memory
- similarities and differences between people can be understood in terms of individual patterns
of cognition

STRENGTHS
- it is possible to infer cause and effect, as the approach typically uses the experimental method
- many psychologists would say that the mind is central to understanding human psychology,
so cognitions are of high importance

WEAKNESSES
- it is more subjective, as we can only infer how people think or process information
- it assumes that people's cognitive processes are the same, and therefore does not account
for individual differences
- the analogy that the human mind is like a computer may be too reductionist

ANDRADE DOODLING 2010

People are known to daydream frequently when doing something boring.

Background: Before this study, it was not known whether doodling impairs attention by taking
resources away from the primary task, or whether it aids concentration on the primary task by
maintaining arousal.
It is common in research on attention to give participants dual tasks, to monitor performance
and see which cognitive processes are needed to complete these tasks.

Aim:
To test whether doodling aided concentration during a boring task.
To find out whether doodling aided concentration (information processing) by enhancing
memory or by making listening more effective.

Procedure:
Research method - Laboratory Experiment
Experimental design - Independent groups design, as participants were allocated to one of the
2 conditions.
IV - doodling condition or control (no doodling) condition
DV - number of correctly recalled names (mean correct recall / false alarms / memory scores)
Sample - 40 members of the Medical Research Council (MRC) Applied Psychology Unit
participant panel at the University of Plymouth (UK) from the general population with ages
ranging from 18-55. They were paid for participating and were randomly assigned to the control
(20 - 18 females and 2 males) or doodling group (20 - 17 females and 3 males)
Sampling Technique - Recruited using opportunity sample (participants were on their way
home from another study)

The researcher recorded a mock telephone message about a boring birthday party using a
cassette recorder.
A fairly monotonous voice was used and was played at a comfortable volume for all listeners.
Average speaking rate was 227 words per minute.
Script included names of;
8 people who would be attending a party
3 people and 1 cat who would not attend
8 places (mentioned)

Participants were recruited just after finishing an unrelated experiment for another researcher
and were asked if they would mind spending another 5 minutes helping with this research.
The intention was to enhance the boredom of the task by testing people who were already
thinking about going home.
Participants were randomly assigned to the 2 conditions and were tested individually in a quiet
and visually dull room.
They were asked to write down the names of all the people attending the party and nothing else.
They were also told that they did not need to remember any of it.
Standardized Instructions - Ignore names of people not attending the party and remember
(write down) the names of those attending.

Participants in the control condition were given a piece of lined paper and a pencil.
Participants in the doodling group were given a sheet of A4 paper with alternating rows of 10
squares and circles (each 1 cm in diameter), with a 4.5 cm margin on the left-hand side where
they could write the target information.

The doodling group was asked to shade in the shapes. They were told that it “does not matter how
neatly or quickly you do this - it is just something to help relieve the boredom”.
Participants listened to the tape for 2.5 minutes and wrote down the information as directed.
As soon as the recording finished, the researcher came in and collected the sheets and talked
to the participant for a minute.
Conversation included a debriefing and an apology for misleading them about the memory test.
Participants were asked if they suspected a memory test.

Monitoring Task - Recalling names of the individuals who will go to the party
Recall Task - Recalling the names of places mentioned. (surprise test)
Counterbalancing was used to reduce order effects by switching the order of the recall and
monitoring tasks.
DV - operationalized by accepting misspelled names as correct where they reflected a plausible
mishearing, and by counting the names of individuals who were not going to the party as false alarms.
Final Score - calculated by subtracting false alarms from the number of correct names
provided.
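The counterbalancing described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the researcher's actual procedure: the function `counterbalance` and the labels "names" and "places" are hypothetical, and it assumes the two orders are simply alternated after shuffling. It would be run separately within each condition so that both groups contain both orders equally often.

```python
import random


def counterbalance(participant_ids, seed=None):
    """Give half the participants each order of the two recall tests.

    Hypothetical helper: "names" = recall party-goers' names,
    "places" = recall places mentioned in the message.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # shuffle first so the order is not tied to arrival time
    orders = {}
    for position, pid in enumerate(ids):
        # Alternate the two orders so each is used equally often
        if position % 2 == 0:
            orders[pid] = ("names", "places")
        else:
            orders[pid] = ("places", "names")
    return orders


doodlers = counterbalance(range(20), seed=1)
names_first = sum(1 for order in doodlers.values() if order[0] == "names")
print(names_first)  # 10
```

With 20 participants in a condition, 10 recall names first and 10 recall places first, so any practice or fatigue effect is spread evenly across the two tasks.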

Results:
Participants in the doodling group shaded a mean of 36.3 shapes (range 3-110). One participant
did not doodle and was replaced.
None of the participants in the control group doodled.
3 participants in the doodling group and 4 in the control group suspected a memory test. None
of them claimed that they actively tried to remember the information for the test. They were
excluded to eliminate effects due to demand characteristics.
If a response indicated a plausible mishearing, it was scored as correct.
Names not similar to the ones given, names of people who didn't attend or responses such as
‘sister’ were scored as false alarms.

MONITORING PERFORMANCE SCORE = Number of correct names - false alarms
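As a worked example of this formula, the scoring rules can be sketched in Python. This is an illustration only: the function `monitoring_score`, the `mishearings` lookup and all the names below are hypothetical (the real script's names are not given in these notes). A plausible mishearing is credited as the target name, and anything else that is not an attendee counts as a false alarm.

```python
def monitoring_score(responses, targets, mishearings=None):
    """Correct target names minus false alarms for one participant.

    responses   -- names the participant wrote down
    targets     -- names of the people attending the party
    mishearings -- hypothetical lookup from a plausible mishearing
                   to the target name it should be credited as
    """
    mishearings = mishearings or {}
    correct = set()
    false_alarms = 0
    for response in responses:
        name = mishearings.get(response, response)  # credit plausible mishearings
        if name in targets:
            correct.add(name)  # each target name counts once
        else:
            false_alarms += 1  # non-attendees, unrelated names, 'sister', etc.
    return len(correct) - false_alarms


# Hypothetical names, not the ones used in the actual script
targets = {"Jane", "Paul", "Claire", "Suzy", "Tim", "Nick", "Anna", "Matt"}
sheet = ["Jane", "Pual", "Claire", "Sarah"]
print(monitoring_score(sheet, targets, {"Pual": "Paul"}))  # 3 correct - 1 false alarm = 2
```

With 8 attendees and no false alarms, the maximum possible score is 8.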


15 participants in the doodling group and 9 in the control group achieved the maximum score.
Monitoring performance score was significantly higher in the doodling condition (mean 7.7, SD 0.6)
than in the control condition (mean 6.9, SD 1.3).

Those in the doodling group recalled a mean of 7.5 pieces of correct information, compared with
5.8 in the control group.
Monitored names were recalled more often than places.
Recall for monitored and incidental information was significantly better for those in the doodling
group.
The Doodling group provided 29% more correct information about the names of individuals and
places than the control group.
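The "29% more" figure follows from the two recall means. The short Python check below is an illustration of that arithmetic. It assumes the percentage is the difference between the means divided by the control mean; the function name `percent_more` is hypothetical.

```python
def percent_more(group_mean, baseline_mean):
    """How much larger group_mean is than baseline_mean, as a percentage."""
    return (group_mean - baseline_mean) / baseline_mean * 100


# Mean pieces of correct information recalled (from the results above)
doodling, control = 7.5, 5.8
print(round(percent_more(doodling, control)))  # 29
```

(7.5 - 5.8) / 5.8 is about 0.293, which rounds to 29%.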

Conclusions:
Participants who performed shape-shading tasks concentrated better on a mock telephone
message than those who listened with no concurrent task.
Doodling aided concentration on the primary task, since doodlers recalled more information. Either
they noticed more of the target words because their attention was enhanced, or their memory was
improved because doodling supported deeper processing of the information.

However, it is difficult to assess which of these explanations is more plausible, because no
information was collected about daydreaming. A self-report or brain scan could have been used to
check whether doodling reduced activation in the cortical areas associated with daydreaming.
Additionally, shading in shapes is not equivalent to spontaneous, real-life doodling.

Ethical Issues:
Deception about memory test - The participants were unable to give fully informed consent as
they were given an unexpected test. This could also have caused them distress, exposing them to
psychological harm.
Debriefing - Participants were debriefed and given an apology for being misled about the memory test.
Strengths:
Standard Procedure - easy to replicate for reliability.
Lab Experiment / Controls - The study was a laboratory experiment, which gave the researchers a
high degree of control. All participants listened at a comfortable volume to the same recorded
telephone message, so there was no difference in the stress placed on words.
Alongside the independent measures design, extraneous variables were controlled through
standardisation, so that all participants were equally likely to be bored and therefore to
daydream. This was achieved through the monotony of the recording, the dull and quiet room, and
testing participants when they were expecting to go home. Because all participants were equally
bored, the researchers could be more confident that differences in results between the 2
conditions were due to whether or not participants doodled, which increased validity. The
operationalization of doodling was also standardized using the doodling sheets; otherwise there
may have been individual differences in how participants doodled, and some may not have doodled
at all. This increased validity. Participants in the control group were discouraged from doodling
by being given a lined sheet (control), and the non-doodler in the experimental group was replaced.

Weaknesses:
Low generalisability - The sample came from a volunteer participant panel, so participants may
have been qualitatively different from other people and the results may not reflect the whole
population. The kinds of people who join a recruitment panel may all be similar, for example
having time to spare or an interest in psychology.
Could be a biased sample - lowering validity. The sample consisted of more females than males in
both conditions. This reduced the representativeness of the sample; equal numbers of males and
females would have been more desirable. The sample was also based on convenience, and a sample
that was more representative in terms of diversity would have increased generalisability.
Participant variables might have affected the findings - There is a risk of participant variables
confounding the results, as the number of shapes shaded varied widely (3 to 110). There is also a
risk of demand characteristics, as some participants suspected a memory test, but these were
roughly equal in each condition (3 vs 4) and were excluded.
Low mundane realism and ecological validity - The high degree of control means that the
relevance and applicability of the study to real-life situations is questionable. Few boring tasks
in the real world are purely auditory; most combine visual and auditory information. Real
doodling is also not shading in printed shapes; it may involve freely drawing shapes and figures.

Issues and Debates:


Application - useful for students while they are revising or studying / learning in class.

Individual Explanation - Participants may have used a similar strategy before or have a
personality trait that requires stimulation when processing information.
Situational Explanation - The process of doodling could have caused the improvement in
recall.
BARON-COHEN ET AL.
“Reading the Mind in the Eyes” Test Revised Version 2001.

The main idea of the eyes test was to investigate the theory of mind. This is the ability to
attribute mental states to oneself or another person and this ability is the main way in which we
make sense of or predict another person’s behaviour.
The notion is that many autistic individuals do not understand that other people have their own
plans, thoughts and points of view.
It appears they have difficulty understanding other people’s beliefs, attitudes and emotions.

Background:
In 1997, the “Reading the Mind in the Eyes” test was developed to assess Theory of Mind.
This appeared to discriminate adults with Asperger Syndrome (AS) or high-functioning autism
(HFA) from control adults.
The AS/HFA adults scored significantly worse. However, the researchers were not satisfied
with some elements of the original test and wanted to ‘update’ their measures to make it better.

Individuals who are diagnosed with autism need to meet 2 criteria:

● Impairment of social communication and social interaction skills, and
● Evidence of restricted, repetitive patterns of behavior, interests and activities.

A ‘Theory of Mind’ is a cognitive ability enabling us to realize that others have different feelings,
beliefs, knowledge and desires from our own. ‘Theory of Mind’ is often linked to empathy.
Empathy is the ability to understand the world as another person does and to appreciate their
feelings or emotional state as separate from our own.

Autism is a condition characterized by challenges in social skills, impaired verbal and non-verbal
communication and lack of imaginative abilities.

Individuals with autism, therefore, may struggle with understanding the intentions of others,
coping with change and understanding what they experience, and may lack empathy. Baron-Cohen
described this reduced cognitive ability as a lack of ‘theory of mind’ - difficulty putting
themselves in another person’s position.
Autism, which occurs in approximately 1% of the population, involves a failure to develop
particular cognitive processes linked to social interaction. Individuals with autism often have
unusually narrow interests.

Aims:
To test a group of adults with AS and HFA on the revised version of the eyes test. This was in
order to check if the deficits in this group that had been found in the original study could be
replicated. To test whether the revised version of the ‘Reading the Mind in the Eyes’ test would be
successful at differentiating participants with AS or HFA from the general population.
To test if in a sample of normal adults, an inverse (negative) correlation would be found between
performance on the revised eyes test and the autism spectrum quotient (AQ).
To test whether females scored better on the eyes test than males.

Hypotheses:
1. Participants with AS or HFA will have a significantly lower score on the revised task than
the control group, showing a lack of theory of mind in these participants.
2. Participants with AS or HFA will have significantly higher scores on the Autism Spectrum
Quotient test (AQ) measure.
3. Females in the ‘normal’ groups (2 and 3) will score higher on the ‘Reading the Mind in
the Eyes’ task than males in the same group (females are thought to have a better-developed
theory of mind).
4. Males in the student comparison group (3) will score higher on the AQ than females
(mild autistic traits are expected to be more common in males).
5. The scores on the AQ and ‘Reading the Mind in the Eyes’ task will be negatively
correlated.
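Hypothesis 5 predicts a negative correlation: as AQ scores go up, Eyes scores should go down. The notes do not say which correlation test the researchers used, so Pearson's r is assumed here purely as an illustration, and the scores below are invented, not data from the study. The function name `pearson_r` is hypothetical; Python 3.10+ also provides `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists of at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up scores for illustration only, not data from the study
aq_scores = [10, 15, 20, 25, 30]
eyes_scores = [30, 28, 26, 24, 22]
print(round(pearson_r(aq_scores, eyes_scores), 2))  # -1.0
```

r ranges from -1 to +1. A value near -1 would support hypothesis 5, and a value near 0 would indicate no relationship.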

Procedure:
Research method - Quasi experiment (a natural experiment), as the IV is naturally occurring;
conducted in a lab, together with a questionnaire.
Experimental Design - Independent groups
IV - Type of participant (4 groups, differing in diagnosis - AS/HFA or normal - and in IQ and
social background)
DV - scores on the AQ test and the Eyes task

Sample:
Group 1 AS/HFA participants - 15 male adults with AS or HFA. Recruited via volunteer
sampling through adverts in the UK National Autistic Society Magazine. They had been diagnosed
using DSM or ICD criteria, and had a mean IQ of 115 and a mean age of 30. They spanned an
equivalent range of socioeconomic classes and education levels.

Group 2 Adult comparison group - 122 normal adults drawn by an opportunity sample from the
adult community and educational classes in Exeter or from public library users in Cambridge.
They had a broad range of occupations and educational levels. Mean age 46.5 years.

Group 3 Student Comparison Group - 103 normal adult students (53 male and 50 female)
studying for undergraduate degrees in Cambridge University. Opportunity Sample. This group is
not representative of the general population and can be considered to have a high IQ. Mean
age 20.8.

Group 4 IQ matched Group - Randomly selected 14 adults from the general population who
were matched for their IQ with group 1. Mean IQ 116 and mean age 28.

The revised eyes task consisted of 36 sets of eyes (18 female and 18 male), each with 4
choices of the emotion (mental state) shown by the target. Example: Aghast / Irritated / Reflective / Impatient.

A pilot was conducted prior to the study.


For each of the 36 sets of eyes, the target and foil words were developed using groups of 8
judges (4 male and 4 female). At least 5 of the judges had to agree on which target word was
the most appropriate for the eyes and no more than 2 judges could select any of the foil words.
Participants were tested individually in a quiet room in Cambridge or Exeter.
Participants in the AS/HFA group were additionally asked to judge the gender of each eye pair.
Groups 1, 3 and 4 completed a questionnaire to measure their AQ.
Participants were asked to read through the glossary and ask if they were unsure of any word.
They were also reassured that they could refer to the glossary at any time.
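The judging rule above (at least 5 of the 8 judges agree on the target, and no foil is chosen by more than 2) is a simple accept/reject check. The Python sketch below illustrates it. The function `item_accepted` is hypothetical, the vote counts are made up, and treating "reflective" as the target is also an invented example, not taken from the real test.

```python
def item_accepted(votes, target, foils, n_judges=8):
    """Apply the judging rule for one set of eyes.

    votes  -- how many judges picked each word
    target -- the word intended as the correct answer
    foils  -- the other (incorrect) words offered
    """
    if sum(votes.values()) != n_judges:
        raise ValueError("each judge should pick exactly one word")
    target_agreed = votes.get(target, 0) >= 5  # at least 5 of 8 chose the target
    foils_rare = all(votes.get(f, 0) <= 2 for f in foils)  # no foil chosen by more than 2
    return target_agreed and foils_rare


# Hypothetical vote counts; which word is the target is also made up
foils = ["aghast", "irritated", "impatient"]
print(item_accepted({"reflective": 6, "aghast": 1, "irritated": 1}, "reflective", foils))  # True
print(item_accepted({"reflective": 5, "aghast": 3}, "reflective", foils))  # False
```

The second item fails even though 5 judges chose the target, because one foil attracted 3 judges.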

PROBLEMS & SOLUTIONS


Original Problems - A forced choice between 2 responses (which were always opposites) meant
that there was a 50% chance of guessing each item correctly. This produced a narrow range of scores.
New Design Elements - Forced choice but with 4 response options. There were 36 sets of eyes
as opposed to the prior 25. This meant that individual differences could be examined better in
terms of statistics.
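The effect of adding response options can be shown with a quick chance calculation. This Python sketch is an illustration that assumes every answer is a pure guess among equally likely options. It uses the 25-item, 2-option original test and the 36-item, 4-option revised test described above, and the function name `chance_score` is hypothetical.

```python
def chance_score(n_items, n_options):
    """Expected number correct if every answer is a random guess."""
    return n_items / n_options


print(chance_score(25, 2))  # 12.5 -> original test: half the items by guessing alone
print(chance_score(36, 4))  # 9.0  -> revised test: a quarter of the items
```

Lowering the chance level from 50% to 25% leaves more room above chance for genuine differences between individuals.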

Original Problems - Ceiling effect was created as the test was too easy (with basic and complex
mental states being included) and individuals with autism would score similarly to those without.
New Design Elements - Only complex mental states were used, to make the test more challenging
and increase the likelihood of obtaining a greater range of performance in a random sample of
adults.

Some emotions could be easily labelled by checking the direction of the gaze. These were
excluded from the revised test.

The old test had more female eye pairs than male ones, so the ratio of male to female eyes was
imbalanced. The revised test included equal numbers of male and female eyes (18 of each), so the
sex of the face was controlled.

There may have been comprehension problems with the old test that might have contributed to
an individual’s score. In the revised version, a glossary with all terms was available for the
participants to use at all times.

AQ test - This is a self-report questionnaire that measures the degree to which an adult of
normal IQ possesses traits that are linked to autism. It is scored from 0 to 50.

Results:
Participants in the 4 groups did not differ in the number of words in the glossary that they were
unsure of. No one checked more than 2 words.
The AS/HFA group performed significantly worse than the other groups in the eyes test.
In general, females scored better than males.
The AS/HFA group scored higher on the AQ than other groups.
34.4 - AQ test for HFA/AS
18.3 - AQ test for Student Comparison Group
21.9 - Eyes Test for AS/HFA
26.2 - Eyes Test adult comparison group
30.9 - Eyes test IQ Matched group
28 - Eyes Test for SCG

Conclusions:
The revised version of the eyes test could still discriminate between AS/HFA adults and controls
from different sections of society as it replicated previous findings. The new test appeared to
have overcome the initial problems.
Individuals with AS/HFA have an impaired ability to identify emotions and mental states from the
eyes, and thus have a deficit in theory of mind.
The revised version of the Reading the Mind in the Eyes test is more efficient at measuring
social intelligence than the original version, and thus allows individual differences to be
detected more accurately when assessing autistic traits.

Ethical Issues:
Protection - The tests may have caused stress to the participants, especially those with AS/HFA,
which could have altered the results.
Informed consent - This was obtained from all participants.
Confidentiality - Confidentiality was respected.

Strengths:
High validity - All participants were tested on the same scale. The experiment was standardised
so that every participant was tested in the same way. The use of standardised procedures in the
way the photographs were presented meant that the researchers could claim with some certainty
that the independent variable (having the characteristics of AS/HFA) was responsible for
differences in the dependent variable (scores on the Eyes task).
High reliability - Because the psychometric tests all have a fixed format with close-ended
questions, they can be taken again and again. This makes the study replicable, so other
researchers can confirm the consistency of the findings. As the study’s procedure and tasks are
replicable, the results are more likely to be reliable.
Controls - The researchers controlled variables like age, sex and IQ. This means that they could
be more sure that it was only the factor of Autism that was affecting the scores, as opposed to a
factor like age confounding the results.
Lab - All participants completed the task in a standardized way in an artificial setting. This
allowed many confounding variables to be controlled for as all participants saw the same set of
eyes for exactly the same amount of time. This improves internal validity and allows research to
be repeated to check the reliability of the results.
Quantitative Data - Quantitative data (i.e. scores on tests) was collected in this study. This data
is all numerical, which means it can be analysed easily while comparisons can also be made
(such as the comparison of group scores). It also means that the study’s results are objective,
which makes them less open to personal bias.

Weaknesses:
Validity - Psychometric tests do not always test what they claim to test. For example, was the
Eyes test measuring theory of mind or was it just measuring the participants’ capability of
completing an Eyes test? How do we know for sure that theory of mind was tested alone?
However, in the original paper, the researchers do attempt to justify the validity of the test.
Ecological validity - Some of the participants were tested in a lab at a university, and this
unfamiliar situation may have had an effect on performance. The eye task can also be questioned,
as it is an unusual task that is much simpler than the demands of real-life situations, where
stimuli are not static.
Quantitative - The reasons for particular behavior were not explored.
Ecological validity and mundane realism - The stimuli were just static images of eyes. In real-life
social situations, we interpret the emotions of real people who are not stuck in one expression,
so the situation is very different. This lowers the ecological validity of the study and creates an
issue of mundane realism with regard to the task. The lab setting used for some participants
also lacks ecological validity.
Experimental sample of this research (AS/HFA participants) is small, therefore when
generalizing the results from the research we must be aware that the group may not be
representative of all individuals who have been diagnosed with AS/HFA.

Issues and Debates:


Application - Plan support lessons or therapy sessions for students or people with AS/HFA
Reductionist - does not take into account the full picture of understanding emotions.

Improve:
Researchers might choose to use videos of eyes rather than still images to improve the validity of conclusions.

LANEY ET AL. (FALSE MEMORY)



Research method: Laboratory Experiment


Experimental design: Independent Measures Design - participants were randomly allocated to
either the ‘love’ or control condition.
Human memory is subject to many types of distortion - even people’s memory of the events of
their own lives can be incorrect. Researchers have been able to implant false details for actual
events and even entirely new events. These false events have even included impossible events
like meeting Bugs Bunny at Disneyland (Braun, Ellis and Loftus, 2002) or undergoing a specific
medical procedure (Mazzoni and Memon, 2003).
In a study by Bernstein, Laney, Morris and Loftus (2005), subjects were given false feedback that
suggested that they had gotten sick as a child after eating either dill pickles or hard-boiled eggs.
A substantial minority of subjects came to believe this, and those who did later reported an
unwillingness to eat pickles or eggs, and their liking for these foods decreased. In addition,
implanting false memories of being sick after eating fattening foods had been shown to deter
people from those foods.
All of these previous studies, however, implanted negative false memories (and saw negative
consequences as a result) or neutral false memories, but there had been no research into
whether positive false memories could be implanted into people and the repercussions of this.

EXPERIMENT 1
AIMS
● To investigate if giving false feedback would generate a false belief that a person likes
eating asparagus

SAMPLE
The subjects were 128 undergraduates at the University of California who received course
credit for their participation. 99 of the participants were female (77%) and 29 were male (23%).
Participants had a mean age of 20.8. They were randomly assigned to the ‘love’ group (n=63)
and the control group (n=65). The experiments were run in groups of up to 8.
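Random allocation to unequal groups like these (63 and 65) can be sketched as shuffle-then-split. The notes do not say how the researchers actually randomised, so this is a generic illustration. The function `random_allocation`, the participant numbering and the seed are hypothetical.

```python
import random


def random_allocation(participants, n_love, seed=None):
    """Shuffle participants, then split them into Love and control groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random order, so group membership depends on chance alone
    return shuffled[:n_love], shuffled[n_love:]


love, control = random_allocation(range(1, 129), n_love=63, seed=2008)
print(len(love), len(control))  # 63 65
```

Because chance alone decides who ends up in each group, individual differences such as prior liking for asparagus should balance out between the Love and control groups.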

Independent variable:
● Whether or not participants were in the love group

Dependent variable:
● The scores on the questionnaires - did they change from Week 1 to Week 2?

SESSION 1
Subjects were told that they would be completing a series of questionnaires for a study
investigating the relationship between ‘food preferences and personality’ - subjects were not told
about false memories, to limit demand characteristics. Subjects completed the following
questionnaires:
● Food History Inventory (FHI)
● Restaurant Questionnaire (RQ)
● 3 filler questionnaires (personality measure, social desirability scale and eating habits)
The filler questionnaires were interspersed with the 2 critical questionnaires and were designed
to distract away from the true aims of the study.
SESSION 2
Approximately one week later, subjects were given false feedback about their responses from
Session 1. They were told that a computer had generated a profile of their early childhood
experiences with certain foods. A portion of the profile was identical for all subjects: as a young
child, ‘you disliked spinach’, ‘you enjoyed fried foods’ and ‘you felt happy when a classmate
brought sweets to school’. The critical item - ‘you loved to eat cooked asparagus’ - was
embedded in the third position of the profile for subjects in the Love group only.
To ensure that participants processed the feedback, they all responded to brief questions
about the sweets at school, and the Love Group also responded to these questions about the
critical asparagus item. Subjects were asked:
● ‘Imagine the setting in which this experience might have happened. Where were you?
Who was with you?’
● ‘To what extent did this experience affect your adult personality?’ on a scale of 1 to 9, 1 =
not at all and 9 = very much.
Subjects then completed the following questionnaires:
● Food History Inventory (FHI)
● Restaurant Questionnaire (RQ)
● Food Preferences Questionnaire (FPQ)
● Food Costs Questionnaire (FCQ)
● Memory or Belief Questionnaire (MBQ)
When all questionnaires were completed, subjects were fully debriefed.
What determined whether participants were in the ‘love’ group or the control group?
Whether or not participants had the sentence ‘you loved to eat cooked asparagus’ in the third
position of their food personality profile.

How was ‘memory’ and ‘belief’ operationalised?
A ‘memory’ required the participant to recall a specific time/place at which the event took place.
A ‘belief’ meant that the participant believed the event happened but could not recall any
specifics about it.

The three criteria that needed to be met to be labelled a ‘believer’:


1. Gave a low rating on the FHI in Week 1 for asparagus - i.e. did not like asparagus before the manipulation
2. FHI score showed an increase in Week 2
3. Gave a positive ‘memory’ or ‘belief’ score on the MBQ
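The three criteria work as a single AND rule: a participant is a believer only if all three hold. The Python sketch below shows this. It assumes that a "low" Week 1 rating means below 5, inferred from the results, where participants scoring 5 or greater were excluded. The function `is_believer` and the answer labels are hypothetical.

```python
def is_believer(fhi_week1, fhi_week2, mbq_answer, low_cutoff=5):
    """Apply the three 'believer' criteria to one Love-group participant.

    fhi_week1, fhi_week2 -- FHI ratings (1-8) for the critical asparagus item
    mbq_answer           -- 'memory', 'belief' or 'did not happen' (hypothetical labels)
    low_cutoff           -- assumption: a Week 1 rating below 5 counts as 'low'
    """
    low_before = fhi_week1 < low_cutoff                # criterion 1
    increased = fhi_week2 > fhi_week1                  # criterion 2
    positive_mbq = mbq_answer in ("memory", "belief")  # criterion 3
    return low_before and increased and positive_mbq


print(is_believer(2, 6, "belief"))          # True
print(is_believer(2, 6, "did not happen"))  # False
```

Failing any one criterion, for example an FHI rating that did not rise, makes the participant a nonbeliever.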
FEATURES OF THE IMPORTANT QUESTIONNAIRES
Memory or Belief Questionnaire
Subjects were asked to respond to three items from the FHI, including the critical item, by
indicating whether they had a specific memory of the event, had a belief that the event occurred
(but lacked specific memory) or were positive that the event had not occurred.

Food Preferences Questionnaire
Subjects rated 62 separate food items (including the critical ‘asparagus’ item) on a 1 to 8 scale,
1 = definitely don’t like to eat and 8 = definitely like to eat.

Restaurant Questionnaire (RQ)


This addressed the subject's desire to eat a selection of 32 separate dishes, including the
critical item ‘sautéed asparagus spears’, in a restaurant setting. This questionnaire was
formatted to resemble a menu with 5 categories (appetisers, soups and salads, entrées, sides
and desserts). Subjects were asked to imagine that they were out for a special dinner, and then
decide how likely they were to order each item on the menu, regardless of price. Subjects
circled their ratings on a scale of 1 to 8, 1 = definitely no and 8 = definitely yes, for each item.

Food History Inventory Questionnaire (FHI)
The FHI was a questionnaire in which participants were given statements of events to do with
food (e.g. ‘you loved cooked asparagus’). Participants were asked to rate how sure they were that
each event had happened to them before the age of 10, on a scale of 1 to 8, 1 = definitely did
not happen and 8 = definitely did happen. It contained 24 items, including the critical item
‘loved asparagus the first time you tried it’ in the 16th position.
Food Costs Questionnaire (FCQ)
On this questionnaire, subjects indicated the most that they would be willing to pay for each of
21 different food items at a grocery store (including the critical item ‘one pound of asparagus’)
by circling a price option. Several items that had also appeared on previous questionnaires (e.g.
tortilla chips, zucchini and rice), were included. For each item, subjects were given 8 different
choices (‘would never buy it’ and 7 price options). The price options were $1.90, $2.50, $3.20,
$3.80, $4.40, $5.00 and $5.70.

RESULTS/FINDINGS FOR EXPERIMENT 1:


Results/Findings (the functional number of participants was 97, because 31 were excluded for
scoring 5 or greater on the FHI for asparagus, i.e. they already liked asparagus).
Believers vs. Non Believers: 48% of participants in the Love group met the criteria to be
labelled believers, 52% did not and were labelled nonbelievers.
FHI - mean ratings of participants in the Love group (n=46) increased by 2.6 points after the
manipulation, compared with only a 0.2 increase for the control group (n=51). The ratings of
believers increased by an average of 4.5 points, whereas nonbelievers increased by an average of just 0.9 points.
RQ: there was only a slight (around 0.1 point) increase in willingness to order asparagus in the
Love group. This was because nonbelievers’ ratings actually decreased from 5.2 to 5.0, while
believers’ ratings increased from 5.3 to 6.0. The controls’ ratings decreased from 4.3 to 4.1.
FPQ: the love group (6.14) reported liking asparagus significantly more than the control (3.84).
FCQ: Believers were willing to pay significantly more for asparagus than controls. Also, over a
quarter (n=14) of controls said they would never buy asparagus, while no believers chose that.
EXPERIMENT 2
AIMS
● To examine the underlying mechanisms of the false memory effect by seeing if once the
false memory was implanted, participants would find the sight of asparagus more
appealing
● To check the reliability of the first results

SAMPLE
The sample consisted of 103 undergraduates from the University of Washington, all
recruited via opportunity sampling. 62% of the participants were female (39 male, 64 female)
and their mean age was 19.9. They received course credit for their participation. The subjects
were randomly assigned to the two conditions: Love (n=58) and control (n=45)
PROCEDURE

Session 1

Participants were told that their data would be entered into a computer to generate a profile
based on their answers - no false aim was given. Subjects completed the following
questionnaires:

● Food History Inventory (FHI)
● Restaurant Questionnaire (RQ)
● Food Preferences Questionnaire (FPQ)
● 2 filler questionnaires (personality measure and social desirability)
Session 2

Subjects were given the food personality profile in the same way as in Experiment 1. The critical
item was slightly different from before: it now read ‘you loved asparagus the first time you ate it’.
Only subjects in the Love Group completed the elaboration exercise:

They answered questions about their memory of the event, and if they did not have a memory,
to imagine what might have happened. Specifically, they were asked for their age at the time of
the event, the location, what they were doing, who was with them and how it made them feel.

All subjects were then asked ‘What is the most important childhood, food-related event in your
life that your food profile did not report?’

Subjects then viewed a slideshow of 20 slides, each photo displayed for 30 seconds at a time,
and were asked to complete 4 ratings of the photos. Subjects then completed the following
questionnaires:

● Food History Inventory (FHI)
● Restaurant Questionnaire (RQ)
● Food Preferences Questionnaire (FPQ)
Features of the slideshow

Subjects viewed a series of 20 slides and completed 4 ratings on each slide. Each slide was
displayed for 30 seconds and showed a photograph of a common food (e.g. pizza,
spinach, and the critical item asparagus). Participants rated each photograph according to how
appetising they found the food, how disgusting they found the food, whether they thought the
photo was taken by a novice, amateur or expert photographer and the artistic quality of the
image. The first, second and fourth questions were rated on a scale of 1 to 8, 1 = not at all and 8
= very much.

RESULTS/FINDINGS FOR EXPERIMENT 2:


Results (the functional number was 73: 30 were excluded for the same reason as in Exp. 1)
Believers vs. Non Believers: 53% of the Love group were believers, 47% nonbelievers.
FHI: Love group ratings increased from 1.70 to 4.20 while control increased from 1.45 to 2.52.
Believers increased from 1.95 to 6.48, nonbelievers from 1.42 to 1.68.
FPQ: believers reported more desire to eat asparagus than the controls.
RQ: neither group’s ratings changed significantly
Photograph ratings: believers rated asparagus more appetising and less disgusting.
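The FHI findings are easier to compare as change scores (after minus before). The Python sketch below computes them from the mean ratings reported above. The function name `change` is hypothetical, and these raw differences do not by themselves show statistical significance.

```python
def change(before, after):
    """Mean rating after the manipulation minus the mean rating before it."""
    return round(after - before, 2)


# Mean FHI ratings for asparagus in Experiment 2 (from the results above)
fhi_exp2 = {
    "Love group": (1.70, 4.20),
    "Control": (1.45, 2.52),
    "Believers": (1.95, 6.48),
    "Nonbelievers": (1.42, 1.68),
}
for group, (before, after) in fhi_exp2.items():
    print(f"{group}: {change(before, after):+.2f}")
```

Running it prints +2.50 for the Love group, +1.07 for the control group, +4.53 for believers and +0.26 for nonbelievers. This shows that the Love group's rise comes mostly from the believers.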
COMPARISONS OF EXPERIMENT 1 AND EXPERIMENT 2
Similarities
- both used undergraduates (opportunity sample)
- both used FHI, RQ, FCQ, FPQ and MBQ.
- both lab experiments and used independent measures design
Differences
- number of questionnaires used
- Ex. 2 did not use the eating habits filler questionnaire
- location (Ex. 1 at Uni. of California, Ex. 2 at Uni. of Washington)
- Slideshow added in Ex. 2
- False aim was used in Ex. 1, whereas no aim was used in Ex. 2
- Different aims
- Different ratio of genders (Ex.1 - 77% were female, Ex.2 - 62% were female)
CONCLUSIONS
- It’s possible to implant false beliefs and memories for a positive childhood experience, such as
liking or loving asparagus the first time one tried it
- False beliefs and memories are associated with positive attitudinal and behavioural
consequences, such as increased self-reported preference for asparagus, willingness to spend
more on it and increased willingness to eat asparagus in a restaurant
LIMITATIONS
You cannot assess how long the apparent consequences of false beliefs will last - especially
since subjects were debriefed after the study. You do not know for certain whether these effects
will translate to genuine eating behaviours as completing paper-and-pencil tasks may not
involve the same process as choosing what to eat.
EXAMPLE 10-MARK EVALUATION
One strength of the study by Laney et al. is the experimental design used. Laney et al. used an
independent measures design, which increases internal validity as order effects are avoided.
Laney et al. also randomly allocated participants to each group, increasing internal validity
because the individual differences between participants are likely to balance out between the
two groups, and also the groups are not affected by researcher bias. If a repeated measures
design was used, order effects such as practice effects may have affected the results, and it
would not have worked because the participants would have figured out the aim of the study,
then leaving the results open to demand characteristics. Therefore, the independent measures
design was a strength because it eliminated order effects and any bias that could have been
caused by participants figuring out the real aim of the study.

On the other hand, one weakness of this study is that it used self-reports. Self-reports are often
subjective and open to bias which decreases the internal validity of the study. For example,
participants may have been aware of the aims of the experiment, and therefore their responses
would be biased by demand characteristics - they may alter their answers to fit what the
researchers are looking for. In addition, participants may be embarrassed by their usual eating
habits, and therefore may change their answers to appear healthier. This means that social
desirability could affect the results, and therefore the implantation of the positive false memory
of liking asparagus is not the only variable that will affect the responses. Therefore, self-reports
lower internal validity because they are open to bias.
Another strength of the Laney et al. study is that it is high in reliability because of the
standardised measures used. For example, all of the questionnaires were standardised, as they
used the same questions in the same order. In the Food History Inventory, participants were
asked to rate whether each statement had happened to them before the age of 10, on a scale of
1 to 8 (1 = definitely did not happen, 8 = definitely did happen). The scale used was kept the
same for each of the 24 food items, and the critical item (asparagus) was always kept in the 16th
position. These standardised procedures increase the reliability of Laney et al.’s study as the
procedure was kept the same for all of the participants.
Laney et al. even tested the reliability of the study by replicating it at a different
university, yielding similar results, and therefore showing that this study is highly reliable.

However, another weakness of this study is that it lacked ecological validity and mundane
realism due to it being performed in a lab and that the participants did not eat anything
throughout the experiment - they only recorded their feelings towards asparagus via
questionnaires. Laney et al. did attempt to combat this through the use of the Restaurant
Questionnaire, which was made to look like a menu. However, even with this questionnaire, mundane
realism and ecological validity are very low because the situation did not represent what
participants would normally experience (for example, going out and physically buying groceries).
This means that the findings may not generalise to real-life eating behaviours, as pen-and-paper
tasks do not accurately represent real life.

BIOLOGICAL APPROACH
The biological approach attempts to explain behaviour as the direct product of interactions
within the body.
Main assumptions:
1. Behaviour, cognition and emotions can be explained in terms of the working of the brain
and the effect of hormones. They are controlled by biological systems and processes
such as evolution, genes and hormones.
2. It examines thoughts, feelings and behaviours through a biological point of view.
3. Can be investigated by manipulating and measuring biological responses such as eye
movements, brain activity and pulse rate.
4. Similarities and differences between people can be explained and understood in terms of
biological factors and their interaction with other factors.
5. There is a direct correlation between brain activity and cognition.
6. Behaviour can be inherited as it is determined by genetic information.

ASSUMPTIONS OF THE BIOLOGICAL APPROACH


- emotions, behaviour and cognition are controlled by biological systems and processes, such
as evolution, genes, the nervous system, hormones and brain structure
- emotions, behaviours and cognition can be investigated by manipulating and measuring
biological responses, such as eye movements, brain activity and pulse rate.

STRENGTHS
- method is less subjective and open to bias than self-reports
- able to determine cause and effect
- physiological functioning is the same in all cultures, meaning it is generalisable

WEAKNESSES
- can often be reductionist (see notes on debates)
- findings often show a correlation, but we cannot always infer cause and effect
- qualitative data tends not to be used (see notes on types of data)

All things are ultimately controlled by our biological aspects. Even if we were physically doing
nothing, our brain would still be active, with chemical and electrical signalling continuing
between nerve cells.
Various parts of our brain are designated to perform different functions and actions, and
biological processes directly affect behaviour. Ex: a hormone called adrenaline is released during
the excitement of a race and helps you run faster.

CANLI ET AL. (2000)


Event-related activation in the human amygdala associates with later memory for individual
emotional experience.

Psychologists now study the brain using brain scans, and can therefore draw objective
conclusions about the relationship between behaviour and brain structure.

There are 2 basic types of medical scan - structural and functional scans.
● Structural scans - Take detailed pictures of the structure of the brain and the nervous system,
and help in diagnosing physical injuries such as concussions and large-scale intracranial
disease such as tumors.
● Functional scans - These are able to show different activity levels in different parts of the
brain. Functional Magnetic Resonance Imaging (fMRI) is a neuroimaging procedure
using MRI technology that measures brain activity by detecting the changes in blood flow
that are associated with it.
fMRI is a non-invasive brain scanning technique. It uses radio waves coupled with a strong
magnetic field to create a very detailed image of the brain. The scanner tracks the flow of
oxygenated blood around the brain. Areas of high activity receive more oxygenated
blood, and the change in the MRI signal that this produces is called the
blood-oxygen-level-dependent (BOLD) signal.

The scanner maps the activity as a grid of small cubes called voxels, each of which represents
thousands of neurons. The pictures are colour-coded to show the intensity of activity.

The amygdala is an almond-shaped cluster of neurons located deep in the brain’s medial temporal
lobe and has been shown to play a key role in the processing of emotions such as pleasure,
fear and anger. The amygdala is also thought to help determine which memories are kept and
where they are stored in the brain.

Background:
Imaging studies have shown that amygdala activation correlates with emotional memory in the
intact brain.

In 1998, LaBar and Phelps suggested that emotional experiences are often better recalled than
non-emotional ones and emotional arousal appears to increase the likelihood of memory
consolidation during the storage stage of memory.

These first imaging studies identified a correlation between amygdala activation and
declarative memory for emotional stimuli across different individuals.
This could be for 3 reasons:
1. Some individuals are more responsive to emotional experiences than others.
2. Some individuals, during a particular scanning session, may have been in some sort of
state that enhanced responsiveness to emotional experience.
3. The amygdala is responsive in a dynamic or phasic way to moment-to-moment individual
emotional experience, so that amygdala activation would reflect a flexible, rapidly
changing emotional response that ought to be observable within an individual.

Aim:
Canli wanted to show that emotive images will be remembered better than those that have little
emotional valence for an individual.

To investigate whether an area of the brain called the amygdala is sensitive to different levels of
emotions based on subjective emotional experiences. (Testing if the amygdala is sensitive to
varying degrees of emotional intensity)
To investigate whether the degree of emotional intensity affects the role of the amygdala in
aiding memory recall of stimuli classed as ‘emotional’. (Whether varying degrees of emotional
intensity affect the amygdala’s role in memory enhancement when an emotional stimulus is involved.)

Procedure:
Research Method - Laboratory Experiment
Experimental Design - Repeated Measures Design (participants contributed to each condition)
IV - Intensity of emotional arousal to each of the 96 scenes.
DV - Level of activation of the amygdala, measured by fMRI during the first stage of the
experiment when the participants were exposed to the 96 scenes, and memory for the scenes,
measured 3 weeks later in a recognition test.

Measurement of data was via a 4-point Likert scale that ranged from 0 to 3, 0 being not
emotionally intense at all and 3 being extremely emotionally intense.

Sample - 10 healthy, right-handed female volunteers. They were all female because women are more
likely to report intense emotional experiences and show more physiological reactivity in
concordance with valence judgments than men. All participants had given informed consent and
were aware of the nature of the experiment.

The procedure was divided into 2 parts: the behavioural procedure and fMRI.

Behavioural procedure:
During scanning, participants were shown 96 scenes through a mirror directed at the back of the
projection screen.
Each of these scenes had a normative rating for arousal and valence from the International
Affective Picture System (IAPS) stimulus set.
The scenes’ normative ratings ranged from:
1.17 (highly negative) to 5.44 (neutral) for valence
1.97 (tranquil) to 7.63 (highly arousing) for arousal.

The order of the scenes was randomised across participants and each scene was shown for a
period of 2.88 seconds. There was an interstimulus interval of 12.96 seconds during which the
participants viewed a fixation cross.
Participants were instructed to view the entire picture for the time it was shown and as soon as
the cross appeared, they were to rate the scene by pressing the relevant button with their right
hand. The rating scale for emotional arousal ranged from 0 = not emotionally intense at all, to 3
= extremely emotionally intense.

3 weeks after the scan, participants were tested in an unexpected recognition test, during which
they viewed all the previously seen scenes and 48 new ones (foils).
The foils matched the valence and arousal ratings of the original scenes.
The normative rating for valence ranged from:
1.31 (highly negative) to 5.78 (neutral)
The normative rating for arousal ranged from:
2.74 (tranquil) to 7.22 (highly arousing)
During the recognition test, participants were asked whether they had seen each scene before.
If they had, they were asked whether they remembered it with certainty, coded as
‘remembered’, or with a less certain feeling of familiarity, coded as ‘familiar’. If the answer was
no, the scene was coded as ‘forgotten’.

fMRI:
Data was acquired in a 1.5 Tesla General Electric Signa MR imager, which was used to
measure BOLD contrast.
During scanning, the scanner was operated by fully trained and competent staff, following the
safety protocols expected in a medical scan.
For structural images, 8 slices perpendicular to the axial plane of the hippocampus were
obtained.
The anterior slice was positioned 7 mm anterior to the amygdala.
Functional images were obtained using a 2 dimensional spin echo sequence with 2 interleaves.
A whole-head coil was used for all participants.
Head movement was minimized by a bite bar formed from each participant’s dental
impression.
During functional scanning, 11 frames were captured per trial. Each frame was assigned as either
an activation image or a baseline image.

Results:
Individuals’ experience of emotional intensity in the present study correlated well with the normative
ratings of emotional valence and arousal.
The average correlation coefficients between participants’ intensity ratings and the normative
ratings were -0.66 (valence) and 0.68 (arousal).
Participants’ ratings of emotional intensity reflected the valence and arousal ratings of the
scenes.
Amygdala activation showed a significant correlation with higher ratings of
‘experienced’ emotional intensity. This provides evidence that amygdala activation is related to
the subjective sense of emotional intensity, and that participants’ perceived arousal is associated
with activation of the amygdala.
Participants’ ratings of emotional intensity were similarly distributed across the 4 categories:
0 (29%), 1 (22%), 2 (24%) and 3 (25%).
Memory recall was significantly better for scenes rated as emotionally intense. Scenes
rated 0-2 had similar distributions of % forgotten, familiar or remembered. However, those rated
3 were rated familiar or remembered with a higher frequency.
For scenes rated highly emotional, the degree of the left amygdala activation predicted whether
an individual stimulus would be forgotten, familiar or remembered in a later memory test. Little
activation to a scene that was rated as highly emotional was associated with being forgotten,
intermediate activation indicated that the scene was familiar and high activation was associated
with the scene being remembered.
When the left amygdala was analysed further, there was a significant correlation between
emotional intensity and the amygdala’s activation.
Conclusions:
This study found that amygdala activation is sensitive to individually experienced emotional
intensity of discrete visual stimuli. This suggests that the more emotionally intense an image is,
the more likely it is to be remembered, which might help to explain why people tend to
remember emotionally intense experiences well.
Activity in the left amygdala during encoding is predictive of subsequent memory.
The degree to which amygdala activation at encoding can predict subsequent memory is a
function of emotional intensity. This implies that when people are exposed to an emotionally
arousing event, such as witnessing a crime, the memory trace may be more powerful.

Ethical Issues:
Participants were exposed to emotionally charged images which may have stressed them.
There is no record of participants being exposed to ‘happier’ imagery in order to alleviate any
negative mental state they were found in.
Potential harm from strong magnetic fields.

Strengths:
Lab Experiment - Participants were tested in a standardized environment and
were given the same items to rate in each condition. There was high control over extraneous
variables, improving validity. The study can be replicated to test for reliability.
Internal validity is high - all variables, such as time intervals, were
operationalized. This controls the influence of confounding variables that may distort results.
Because of these controls, researchers can be more confident that few confounding
variables affected the DV, and so can be confident in establishing a causal relationship.
All the participants were tested using fMRI, so the procedure was highly standardized.
The use of a scientific apparatus such as an fMRI machine produced highly objective,
quantitative data which is high in validity.
Collection of quantitative data - fMRI scan readings measured the DV (activation of the
amygdala), which enabled statistical analysis of the level of activation and subsequent
memory.
Lack of bias - low demand characteristics, increasing the validity of the data collected, which
was then subjected to sophisticated analysis.
Objective findings - fMRI scanners measure biological response in the brain - no need for
researchers to interpret any results.
Repeated Measures Design - reduced questioning.
Use of randomization - reduced order effects.

Weaknesses:
Insufficient knowledge of the physiological basis of the fMRI signal to interpret data confidently,
with respect to neural activity and how this relates to specific behaviors.
Small sample size - all right-handed females, which made the sample gynocentric - low
generalizability because not representative.
Participant variables (individual differences between the 10 participants) may have distorted the
outcomes of the research, decreasing validity.
Lab Experiment - Unnatural environment and stimuli - low ecological validity. Low mundane
realism.
We also need to take into account the difference between the levels of emotional intensity
experienced in a lab setting and in the real world. Some participants may already have been
emotionally aroused, so the baseline itself may have been flawed.
The researchers also need to consider that an fMRI scanner can never fully represent all the
behaviours associated with specific parts of the brain, especially where there are biological or
cerebral anomalies.
Only quantitative data collected - did not explore the participants’ reason for choosing a
particular rating.

Issues and Debates:


Application - The findings may be useful for advertising agencies.
Emotional memory of negative experiences - useful in therapy that attempts to help people with
trauma or amygdala damage.

Nature VS Nurture:
Findings of this study support the nature side of this debate, however, as experiences are not
taken into account, nurture could have caused the results.
The study provides a nature-based explanation as it correlates a person’s own amygdala
functioning with his or her experience of emotions and subsequent memory. The amygdala
functions similarly in all humans and has developed through evolution.
Strengths - Explains the impact of natural human inheritance on human emotions.
Weaknesses - Fails to account for the differences people might show in their emotional
experiences as a result of the different environments in which they were brought up.

This study explains why different memories in a person’s lifetime may be remembered with
different intensity. However, it fails to account for the fact that some memories may be
remembered as strongly as others even if they have less emotional impact.

DEMENT AND KLEITMAN (1957)


There are 2 types of sleep: REM and nREM.
In REM sleep, our eyes move rapidly under the lids.
In Aserinsky and Kleitman’s 1953 study, they observed periods of rapid, conjugate eye movements
during sleep and found a high incidence of dream recall in awakening participants during these
periods and a low incidence when awakened at other times.
REM sleep is known as paradoxical sleep. It resembles wakefulness as our eyes move, we
often experience vivid thoughts in the form of dreams and our brains are active.

Background:
Sleep and dreaming are clearly hard to investigate because the participant is necessarily asleep
and so cannot communicate with the researcher. Even once awake, participants can only give
self-report data about dream content, which on its own may be invalid as it is subjective.
The electro-encephalograph (EEG) monitors the electrical activity of the brain.
The electrooculogram (EOG) allowed the electrical recording of eye-movement patterns, their
presence or absence, their size and direction (horizontal or vertical).

The EEG and EOG recordings of brain activity and eye movements showed that we pass through
several stages during the night, alternating between REM and nREM.

The EEG detects and records tiny electrical charges associated with nerve and muscle activity.
In REM sleep, EEG is relatively low voltage / amplitude and high frequency.
In nREM sleep, the EEG shows either high voltage / amplitude, slow (low-frequency) waves or
frequent ‘sleep spindles’, which are short-lived, high-voltage, high-frequency waves.

Aim:
To investigate dreaming in an objective way by looking for the relationship between eye
movements in sleep and the dreamer’s recall.
Specific aims:
1. To test whether dream recall differs between nREM and REM sleep
2. To investigate whether there is a positive correlation between subjective estimates of
dream duration and the length of REM periods
3. To test whether eye-movement patterns are related to dream content.

Procedure:
Research Method - Lab experiment / observations / interviews
Experimental Design - Repeated Measures Design

3 approaches were used to test 3 specific aims.

Sample - 7 adult males and 2 adult females. 5 of them were studied intensively while the data
gathered from the other 4 was minimal and intended only to confirm the results. It was an
opportunity sample.
Participants studied in detail (5) spent between 6-17 nights with 50-77 awakenings.
The other 4 spent only 1 or 2 nights with a total of 4-10 awakenings.
Participants were identified by their initials to maintain confidentiality.
Participants reported to the lab a little before their usual bedtime.
They were instructed to eat normally but to abstain from alcoholic or caffeine-containing
beverages on the day of the experiment.
Participants were fitted with electrodes on their scalp and around their eyes.
Once they were in bed, in a quiet and dark room, the wires were gathered into a ‘ponytail’ to
allow freedom of movement.
The EEG ran continuously to monitor the participants’ sleep stages and to inform the
researchers when they should be woken up.
Participants were woken up by a doorbell that was loud enough to rouse them from any sleep
stage.
The doorbell rang at various times during the night and the participants indicated whether they
had been dreaming and described their dream into a voice recorder.
Analysis of the dream narrative - It was only considered a dream if there was a coherent, fairly
detailed description of the content. Vague, fragmentary impressions were not scored as
dreams.

AIM 1:
Natural Experiment in a laboratory setting
IV - REM and nREM stages
DV - Whether the participants reported a dream and, if so, a descriptive account of its
content.
Method - Participants were woken either from REM or nREM sleep, but were not told which
stage they were in. They confirmed whether they were having a dream, and if so, reported the
content into a recorder.

Aim 2:
Experimental Analysis
IV - Length of REM sleep before waking (5 or 15 minutes)
DV - Participant’s estimate of dream duration (5 or 15 minutes)
Participants were woken up following 5 or 15 minutes of REM sleep.
They were asked whether they thought they had been dreaming for 5 or 15 minutes.
Their dream narrative was also recorded, and the number of words was counted.
Correlational analysis - the 2 variables were the duration of REM sleep before waking and the
number of words in the dream narrative.

Aim 3:
Participants were woken up after exhibiting a single eye-movement pattern for longer than 1
minute.
This was measured using the electrodes and the EOG
IV - Eye-movement pattern type (mainly horizontal, mainly vertical, vertical and horizontal, very
little or no movement). This could not be manipulated.
They were then asked to report their dream.
DV - Report of dream content.
This tested whether the direction of eye movements (vertical or horizontal) reflected what the
dreamer was looking at in the dream, or whether the movements were random, resulting from
general activation of the CNS (Central Nervous System) during sleep.

Results:
Study 1
Participants described dreams often when woken in REM but rarely from nREM sleep. When
awakened in nREM, they tended to describe feelings but these did not relate to specific dream
content.
The waking pattern did not affect dream recall.
Specifically, participant WD was no less accurate despite being misled and DN was no more
accurate even though he might have guessed the pattern of awakenings. This showed that
practice effect was not a factor affecting the results of the experiment.
When woken from nREM sleep, participants returned to nREM sleep and the next REM stage
was not delayed.
When woken from REM sleep, participants generally did not dream until the next REM phase.
Recall of dreams after nREM awakenings was much vaguer than after REM awakenings, from which
visual, vivid and clear dream content was reported.
79.6% (152/191) of awakenings produced dream recall in REM, and 93% (149/160) of
awakenings from nREM did not produce dream recall.
End of nREM period - of 17 nREM awakenings soon after the end of a REM stage (within 8
minutes), 5 produced dream recall (29% of occasions). However, of 132 awakenings more than
8 minutes after a REM stage, only 6 produced dream recall.
Of the 39 REM awakenings in which no dream was recalled, 19 occurred in the first 2 hours of
sleep, 11 in the second 2 hours, 5 in the third 2 hours and 4 in the last 2 hours.
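The Study 1 figures above are simple proportions of awakenings. As a quick arithmetic check, here is a minimal Python sketch that recomputes them from the counts quoted in these notes. The helper name `percent` is an illustrative choice, and the code only re-derives the quoted numbers; it is not a reanalysis of Dement and Kleitman's original data.

```python
# Dement and Kleitman (1957), Study 1: re-derive the reported percentages
# from the raw counts quoted in these notes (no new data is introduced).

def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# REM awakenings: 152 of 191 produced dream recall.
rem_recall = percent(152, 191)

# nREM awakenings: 149 of 160 produced no dream recall.
nrem_no_recall = percent(149, 160)

# nREM awakenings within 8 minutes of a REM stage: 5 of 17 produced recall.
early_nrem_recall = percent(5, 17)

# nREM awakenings more than 8 minutes after a REM stage: 6 of 132 produced recall.
late_nrem_recall = percent(6, 132)

# Consistency checks on the counts themselves.
# REM awakenings without recall, split by 2-hour block of the night.
no_recall_by_block = [19, 11, 5, 4]
assert sum(no_recall_by_block) == 191 - 152  # all 39 REM "no dream" awakenings
# The 11 nREM awakenings that did produce recall (160 - 149) match 5 + 6.
assert 5 + 6 == 160 - 149

print(rem_recall, nrem_no_recall, early_nrem_recall, late_nrem_recall)
# prints: 79.6 93.1 29.4 4.5
```

The output agrees with the notes: 79.6% and 29% match directly, the "93%" figure is 93.1% rounded, and 6 recalls from 132 late nREM awakenings is only about 4.5%.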

Study 2
Participants’ responses were 88% accurate for 5 minute REM duration and 78% accurate for 15
minute REM duration.
Although most participants were highly accurate (with 0-3 incorrect responses), DN was not. He
often found that he could only remember the end of his dream, so it seemed shorter than it
actually was. Hence, DN chose 5 minutes more frequently and consistently (raising questions about
validity). This meant that DN was accurate on estimates of 5 minutes but not of 15.
There was a significant positive correlation between REM duration and the number of words in
dream narrative. The r values varied between 0.40 and 0.71 for different participants.
Dream narratives for very long durations were not much longer than those for 15 minutes. The
participants did report that they felt as though they had been dreaming for a long time,
suggesting that they couldn't recall the early part of the dream.

Study 3
Participants couldn’t recall the dream with such high precision.
3 of the 9 participants showed periods of vertical movements, and each was linked to a narrative
about vertical movement.
One of them dreamed of standing at the bottom of a tall cliff operating a hoist. The participant
reported looking up at the climbers at various levels and down at the hoist machinery.
Another dreamed of climbing up a series of ladders and looking up and down as he climbed.
In the third one, the dreamer was throwing basketballs in a net, first shooting them and looking
up at the net, then looking down to pick another ball off the floor.

Only one instance of pure horizontal movements was seen. In this the participant was watching
people throwing tomatoes at each other.
On 10 occasions, participants were awakened after little or no eye movement. Here, they
reported watching something in the distance or just staring fixedly at some object.
In 2 of these awakenings, participants’ pattern was just a minute of inactivity followed by a large
eye movement to the left just a second or two before awakening.
In one, the participant was driving a car and staring at the road ahead. He approached an
intersection and was startled by the sudden appearance of a car speeding at him from the left
as the bell rang.
The other participant was also driving a car and staring at the road ahead. Just before
awakening he saw a man on the left side of the road and hailed him as he drove past.

21 awakenings followed mixed eye movements. These involved the participants looking at
things close to them, objects or people. There was no recall of distant or vertical activity.

Conclusions:
Study 1 - Dreams probably occur only during REM sleep, which occurs regularly throughout the
night. Dreams reported when woken up from nREM sleep are usually from previous REM
episodes.
Study 2 - The finding that the length of a REM period and its estimation by the participants are
very similar shows that dreams are not instantaneous events but rather are experienced in ‘real
time’.
Study 3 - Eye movements during REM sleep correspond to where and at what the dreamer is
looking in the dream. This suggests that eye movements aren't random events caused by the
activation of the CNS but are directly related to dream imagery.

Further Improvement:
If recordings were not continuous or consistent, they may have failed to catch every instance
of dream sleep in every participant.
The equipment might have missed small movements that could be pivotal to drawing firm
conclusions from the results.
Greater sample size for generalization of results. The present study investigated the sleep and
dreams of only 5 adults in depth. The sample was very restricted in terms of gender and age.
Study required participants to verbally narrate their dreams. Such a task may be challenging for
participants who are hesitant in speaking or are not very fluent. A task less reliant on verbal
ability of the participants is more desirable. Fragmented dream reports were discarded from
consideration - these may have stemmed from the participants’ inability to narrate a
comprehensive account of their dreams.
Further research could look into the impact of more natural influences on sleep and dreams
such as caffeine, drugs, noisy environment, etc. which give a more realistic approach and
understanding.

Ethical Issues:
Confidentiality - Was maintained as they were identified using their initials so that dream content
cannot be related to any of them.
Protection may not have been fully provided as participants were sleeping in unnatural
situations so it may have affected their sleep or ability to concentrate the next day.
Strengths:
Lab Experiment - It was possible to control extraneous variables. If some participants had
woken more slowly they would have forgotten more of their dream. This was avoided by using a
loud doorbell that woke them immediately.
-Participants came to the lab a little before regular bedtime.
-Could not have caffeine or alcohol on days of experiment.
Dement and Kleitman conducted the additional experiment comparing 5 and 15 minute REM
sleep periods.
Control of Demand Characteristics - Participants were not told about their EEG patterns (or
whether they were in REM or nREM) or whether their eyes were moving. If they thought they
were supposed to remember more detailed dreams in REM sleep, they might have tried harder
to do so. Raised Validity.
Operationalisation - Dream clearly defined as recollection that included content, not just
remembering dreaming in general. Raises Validity as D&K could be sure they were recording
actual dreams. The Aim 2 task was limited to a choice between 5 and 15 minutes. This helped
raise validity as it reduced participant variables such as differences in ability to recall dreams.
EEG - Objective way to measure sleep and dreaming as it is a biological measure.
Reliable as it is unaffected by experimenter's personal views.
The consistent placing of the electrodes ensured recordings taken from each participant would
provide the same information.
Reliability of the findings is supported by the similarity of the results to those of the previous
studies.
Produced quantitative data (brain waves, eye movement patterns, length of REM sleep).
Easy to compare and analyze.
Able to measure REM sleep duration accurately, ensuring that comparisons to dream duration
estimates were valid.
Self Reports - Descriptions of dreams provided rich qualitative data, which the EEG could not provide.
Helped provide insights into the reasons for the eye movements detected.

Could help raise validity. Helps better understand participants.


Confidentiality - Researchers only used participants’ initials when they published the data. This
way, results could not be linked to participants.

Weaknesses:
Generalizability - 7 men and 2 women, with only 5 studied in depth. Very small sample size.
More men than women. Also no information about participants. Ethnocentrism: different cultural
ideas of sleep. Not generalizable because the study was done over a short period of time.
- If Ps were stressed/overworked/jet-lagged, results would not be generalizable.

Lowers external validity and reliability.


Low Ecological Validity - The environment of the experiment was not like real life and could have
affected sleeping behavior: participants slept with wires attached, knew they were being watched
and slept in an unfamiliar room. If participants normally drank alcohol or had caffeine, they could
have slept or dreamt differently than usual. The method of waking Ps (using a doorbell) could have
influenced Ps' ability to recall dreams.

No mention of informed consent or right to withdraw.


Self Reports - Researchers are unsure if dream content reported was accurate.
Some participants may have "filled the gaps" to make dreams seem coherent, rather than
reporting exactly what they remembered. Could reduce validity.
EEG - Gives only basic, general readings of brain activity, with no detail about Ps' thoughts or
feelings, so it offers little insight. It is reductionist, as it is based only on biological
mechanisms.
Dream reports also differed in length depending on how expressive each participant was, making
these reports more subjective.

Issues and Debates:


Application - The findings of this study could be used for treating or checking for sleep
disorders. The study also sparked new waves of sleep research.

Nature VS Nurture:
REM and nREM sleep are universal and hence due to nature. However, individual
differences could have been due to environmental factors, suggesting that these can also affect
sleeping patterns.

SCHACHTER AND SINGER

Research Method: laboratory experiment


Experimental Design: Independent groups design, as each participant took part in only one of
the seven experimental groups.

Different theories exist regarding how and why people experience emotion. These include
evolutionary theories, the James-Lange theory, the Cannon-Bard theory and Schachter and
Singer’s Two-Factor Theory. The James-Lange theory states that people experience emotion
because they perceive their body's physiological responses to external events. According to this
theory, people don’t cry because they feel sad; rather, people feel sad because they are crying.
The Cannon-Bard Theory states that the experience of emotion happens at the same time as
the physiological arousal, and neither one causes the other. Therefore, the brain gets a
signal that causes the experience of the emotion at the same time that the autonomic nervous
system gets a signal that causes physiological arousal.
Schachter and Singer then introduced the Two-Factor Theory of emotion, which combined
the two elements of physiological arousal and cognition. This experiment aimed to test the
Two-Factor theory.

AIM
To test the Two-Factor Theory of Emotion - that emotional experience is the result of both the
physiological arousal of a person and the cognitive interpretation of a situation.
HYPOTHESES
1. If a person has no explanation for their state of arousal (i.e. they are ignorant), they will
label their feelings based on the cognitions available to them.
2. If a person has an explanation for their state of arousal, they won’t take into account the
cognitions available to them to label their emotions.
3. Given the same situation (the same cognitions available), a person will only react or feel
emotional to the extent that they are physiologically aroused.
WHAT IS THE TWO-FACTOR THEORY OF EMOTIONS?
The Two-Factor Theory of Emotions states that for an emotion to be experienced, a
physiological state of arousal is necessary AND situational factors will then determine how we
interpret this arousal. In other words, an event causes physiological arousal first, and then we
must identify a reason for this by using the cognitions available to us through our surroundings.
The strength of the physiological arousal will determine the strength of the emotion experienced,
but our surroundings determine the type of emotion experienced. Previous theories of emotion did
not include cognitive labelling.

SAMPLE
The study consisted of 185 male college students who were taking a course of introductory
psychology at the University of Minnesota. 1 of the subjects refused the injection and withdrew,
leaving an effective sample size of 184. 90% of the participants were part of a voluntary pool
in which they received 2 extra points on their final exam for every hour that they served as
experimental subjects, so they received course credit for their participation. Health
records of the participants were checked to ensure that the Epinephrine injection would not
cause any harmful effects.
VARIABLES
Independent variables:
- The emotional condition (anger or euphoria)
- The information given to participants about the injection (informed, misinformed, ignorant)
Dependent variable:
- The reaction that participants gave to the actions of the stooge
- This was recorded via observation through a one-way mirror and via the results of the
self-report.
WHAT WERE THE GROUPS?
Euphoria:
- EpiInf (Epinephrine Informed)
- EpiMis (Epinephrine Misinformed)
- EpiIgn (Epinephrine Ignorant)
- Placebo
Anger:
- EpiInf (Epinephrine Informed)
- EpiIgn (Epinephrine Ignorant)
- Placebo

WHY WAS EPIMIS (EPINEPHRINE MISINFORMED) A CONTROL GROUP?


EpiMis was a control group because it was believed that telling participants in the EpiInf group
about possible symptoms could make them more introspective and slightly troubled by their
physical state, which may have affected the emotion they indicated. Since the EpiMis instructions
would also make participants more introspective, the researchers could check whether there was a
difference between EpiMis and EpiInf; if there was, they could conclude that the change in the
dependent variable was not caused by introspection or similar effects.
EpiMis was only used in the Euphoria condition; it was not used in the Anger condition.

PROCEDURE:
Instructions given to participants about the injections:
“In this experiment we would like to make various tests of your vision. We are particularly
interested in how certain vitamin compounds and vitamin supplements affect
the visual skills. In particular, we want to find out how the vitamin compound called
'Suproxin' affects your vision. What we would like to do, then, if we can get your permission, is
to give you a small injection of Suproxin. The injection itself is mild and harmless; however,
since some people do object to being injected we don't want to talk you into anything. Would
you mind receiving a Suproxin injection?”
Depending on the condition, participants were then injected with either Epinephrine or a
placebo. What they were told by the experimenter also depended on which condition
participants were in.
● Epinephrine informed were told:

“I should also tell you that some of our subjects have experienced side effects from the
Suproxin. These side effects are transitory, that is, they will only last for about 15 or 20 minutes.
What will probably happen is that your hand will start to shake, your heart will start to pound,
and your face may get warm and flushed. Again these are side effects lasting about 15 or 20
minutes.”
● Epinephrine misinformed were told:

“I should also tell you that some of our subjects have experienced side effects from the
Suproxin. These side effects are transitory, that is, they will only last for about 15 or 20 minutes.
What will probably happen is that your feet will feel numb, you will have an itching sensation
over parts of your body, and you may get a slight headache.”
● Epinephrine ignorant:

In this condition, when the subject agreed to the injection, the experimenter said nothing more
relevant to side effects and simply left the room. While giving the injection, the physician told
the subject that the injection was “mild and harmless and would have no side effects”.
● Placebo:

Participants receiving the placebo were given the same treatment as those in the EpiIgn
condition.

Emotional conditions
(The full procedure for the emotional conditions is given in the original study; below is a
condensed version of the key points that you should know.) Participants were told by the
experimenter that they had to wait for around 20 minutes for the ‘Suproxin’ to take effect. In both
conditions, the subject was then left in a room alone with a stooge, trained to act either
euphorically or angrily, where the participant could be observed through a one-way mirror without
their knowledge.

● Euphoria

In the Euphoria condition, before leaving the stooge and the participant alone in the room, the
experimenter also apologised for ‘the condition of the room’, which had deliberately been made
messy. As soon as the experimenter left, the stooge introduced himself and made some icebreaker
comments. He then started his euphoric routine:
1. He doodles a fish for 30 seconds
2. He crumples up the paper and tries to throw it into the wastebasket. He purposely
misses the first time and turns it into a game of ‘basketball’. If the participant does not
join in, the stooge will encourage them by saying ‘Here, you try it.’
3. The stooge then starts to make paper aeroplanes, flying them around the room and
eventually throwing the plane at the participant.
4. The stooge then takes paper from the paper aeroplane and shoots it using a slingshot
made from a rubber band.
5. He then builds a tower of folders and begins to shoot at the tower and cheers when the
tower falls.
6. He then goes to pick the folders up, and notices a hula hoop behind a blackboard and
starts to play with it.
7. Finally, he sits down with his feet on the table and the experimenter comes back in the
room.

All of the injection conditions were run for Euphoria and the stooge was unaware of which
condition the participant was in.
● Anger

In the Anger condition, the experimenter asked the stooge and the participant to fill out a
questionnaire while the ‘Suproxin’ was taking effect. The questionnaire is 5 pages long, contains
36 questions, and starts off neutral, but the later questions become more and more insulting,
ending in "With how many men (other than your father) has your mother had extramarital
relationships?". The stooge is trained to complete the questionnaire at the same speed as the
participant, and his routine is as follows:
1. Before looking at the questionnaire, the stooge says ‘I think it's unfair for them to give
you shots’.
2. The stooge then flicks through the questionnaire, saying “Boy, this is a long one."
3. He then makes several angry remarks at some of the questions. At question 25, which
asks whether they ‘bathe or wash regularly’, the stooge refuses to fill it in and angrily
crosses it out.
4. At question number 28 ("How many times each week do you have sexual intercourse?")
the stooge yells “To hell with it! I don’t have to tell them all this!” He then sits sullenly for
a few moments, rips up the questionnaire into pieces and storms out of the room.

EpiMis did not take part in this condition as it was a control.


Data collection - How emotional state was measured:
Observation
A controlled observation through a one-way mirror was used to assess the subject’s behaviour.
This is considered ‘semi-private’ behaviour because participants were in a room with someone,
but did not know they were being observed by the researchers. For both conditions, 2 observers
were used to test for inter-rater reliability.
For the Euphoria condition, the activities were scored as follows:

● 5 - hula hooping
● 4 - shooting with slingshot
● 3 - paper airplanes
● 2 - paper basketballs
● 1 - doodling
● 0 - does nothing
*These scores were multiplied by the time spent doing the activity.

The observers agreed 88% of the time.
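The weighting above can be illustrated with a short Python sketch that calculates an activity index for one participant. It assumes time is recorded in minutes and that the weighted times are simply added together; the function name `activity_index` and the example times are hypothetical, not taken from the study.

```python
# Hypothetical sketch: Euphoria activity index for ONE participant.
# Assumption: time is recorded in minutes and the weighted times are summed.

ACTIVITY_WEIGHTS = {
    "hula hooping": 5,
    "shooting with slingshot": 4,
    "paper airplanes": 3,
    "paper basketballs": 2,
    "doodling": 1,
    "does nothing": 0,
}


def activity_index(minutes_per_activity: dict[str, float]) -> float:
    """Multiply each activity's weight by the time spent on it and add the results."""
    total = 0.0
    for activity, minutes in minutes_per_activity.items():
        total += ACTIVITY_WEIGHTS[activity] * minutes
    return total


# Made-up observation record, not real data from the study.
example = {"paper basketballs": 2.0, "doodling": 1.5, "does nothing": 3.0}
print(activity_index(example))  # 2*2.0 + 1*1.5 + 0*3.0 = 5.5
```

A participant who spends more time on the higher-weighted activities (such as hula hooping) gets a higher index, which is why a bigger score is read as more euphoric behaviour.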

For the Anger condition, the behaviour categories were as follows:

● Category 1 - Agrees with comments made by the stooge (scored +2)
● Category 2 - Disagrees with comments made by the stooge (scored -2)
● Category 3 - Neutral response to the stooge’s comments (scored 0)
● Category 4 - Initiates agreement or disagreement (scored +2 or -2 respectively)
● Category 5 - Makes no response (scored 0)
● Category 6 - Ignores the stooge (scored -1)
The two observers agreed completely 71% of the time.
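As a rough illustration, the Python sketch below turns one participant's coded responses into a single anger score by adding up the category scores, and also shows how a percentage agreement figure like the 71% above can be calculated. Splitting Category 4 into separate agreement and disagreement codes, the simple summing rule, and all names and example codes are assumptions for illustration.

```python
# Hypothetical sketch: scoring Anger condition observations and checking how
# often two observers coded the same response the same way.
# Assumption: Category 4 is split into two codes (agreement vs disagreement).

CATEGORY_SCORES = {
    "agrees": 2,                   # Category 1
    "disagrees": -2,               # Category 2
    "neutral": 0,                  # Category 3
    "initiates_agreement": 2,      # Category 4 (agreement)
    "initiates_disagreement": -2,  # Category 4 (disagreement)
    "no_response": 0,              # Category 5
    "ignores": -1,                 # Category 6
}


def anger_score(codes: list[str]) -> int:
    """Add up the category scores for one participant's coded responses."""
    return sum(CATEGORY_SCORES[code] for code in codes)


def percent_agreement(observer_a: list[str], observer_b: list[str]) -> float:
    """Percentage of responses that both observers coded identically."""
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("observers must code the same, non-empty list of responses")
    matches = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * matches / len(observer_a)


# Made-up codes for four of the stooge's remarks, not real data.
obs_a = ["agrees", "neutral", "ignores", "initiates_agreement"]
obs_b = ["agrees", "no_response", "ignores", "initiates_agreement"]
print(anger_score(obs_a))               # 2 + 0 - 1 + 2 = 3
print(percent_agreement(obs_a, obs_b))  # 3 of 4 codes match -> 75.0
```

Percentage agreement only counts exact matches, so it is a simpler (and stricter) check than the correlation-based inter-rater reliability used in other studies.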
Questionnaire
The second type of measurement was a self-report questionnaire completed after participants had
been with the stooge. This is considered a ‘public’ indication of mood because the experimenter
would be able to read the responses. Participants were told that they needed to report their mood
because mood, like the Suproxin, was supposed to affect their vision, and were given a
questionnaire to fill out. Some questions were as follows:
● Critical Questions:

“How irritated, angry or annoyed would you say you feel at present?”
Responses were given on a 5-point scale from 0-4, 0 = I don’t feel at all irritated or angry to 4 = I
feel extremely irritated and angry. (higher scores meant more angry)
“How good or happy would you say you feel at present?”
Responses were given on a 5-point scale from 0-4, 0 = I don’t feel at all happy or good to 4 = I
feel extremely happy and good. (higher scores meant more happy)
● Other questions were also put in to measure the physiological effect of the Epinephrine
on the participant:

e.g. “Did you feel any tremor (involuntary shaking of the hands, arms or legs)?”
Responses were on a 4-point scale from 0-3, 0 = not at all to 3 = an intense amount
After the questionnaire was completed, the participant’s pulses were taken and participants
were debriefed about the true aims of the experiment.

FEATURES OF THE INJECTIONS GIVEN


Depending on the condition, participants were either given an injection of Epinephrine or a
placebo:
● Epinephrine

Received ½ cm³ of a 1:1000 saline solution of epinephrine bitartrate. Some of the effects of
the epinephrine injection were increased heart rate, slightly accelerated breathing and increased
blood sugar levels. These effects would occur within 3-5 minutes of the injection, and would
normally subside after 15-20 minutes.
*Epinephrine is adrenaline
● Placebo

Participants in the placebo condition received a saline solution, which has no physiological side
effects.
RESULTS AND FINDINGS
Before we get into the results, it is important to note that the data from 16 participants were disregarded.
5 participants experienced no physiological effects as a result of the injection. In addition, 11
participants were very suspicious about a part of the experiment which may have led them to
act in a different way or perceive their emotions differently, so their results were also
disregarded. The results are as follows:
Those who received the Epinephrine injections showed more sympathetic arousal and had
higher scores on the questions in the questionnaire about tremors and palpitations. Pulse rate in
the epinephrine condition also increased, while the heart rate of those receiving the placebo
decreased.

Euphoria
EpiMis was the most happy with their self-report happiness score being 1.90, whereas EpiInf
was the least happy with their self-report score being 0.98. EpiMis were also the most active
and engaged with the stooge with an activity index score of 22.56, compared to EpiInf which
was the least active group with an activity index score of 12.72.

Anger
EpiIgn were the least happy with a self-report score of 1.39 (most angry) and EpiInf were the
most happy (least angry) with a score of 1.91. It can also be seen that EpiIgn were the most
angry as they had an anger score of +2.28, whereas EpiInf were the least angry with an anger
score of -0.18.

Problem with self-report data from the angry condition - The participants were students who
would gain course credit for their participation, so they were less willing to record feeling angry
towards the experiment as they knew the researchers would read their responses and they
would risk losing their points.
So, to summarise, the emotional state of the stooge had little effect on EpiInf, but transferred the
most to EpiIgn (or EpiMis in the Euphoria condition).
In the Anger condition, the self-report scale did not change: it was the same one used in the
Euphoria condition. This means that a higher self-report score in the Anger condition means that
participants were MORE HAPPY, but LESS ANGRY.
CONCLUSIONS
They found that the results supported the Two-Factor Theory of Emotions - that situational
factors determine how we interpret physiological arousal. All 3 of the original
hypotheses were also arguably supported by the findings.
EXAMPLE 10 MARK EVALUATION
One strength of the study by Schachter and Singer is that it used standardised procedures
which increases the internal validity of the study. One standardised procedure used was that the
stooge in both conditions had to follow a script in order for the interactions to be the same for all
participants. For example, in the angry condition, the stooge would flick through the
questionnaire and exclaim ‘Boy, this is a long one’ before even filling it out. The stooge would
also have to complete the questionnaire at the same pace as the participant so that the
participant could relate to his remarks. The questions on the questionnaire were also
standardised, and in the anger condition, they became increasingly insulting as the questionnaire
went on. For example, the final question was ‘With how many men (other than your father) has your
mother had extramarital relationships?’ with the options being 4 and under, 5-9, 10 and over.
These standardised procedures increased internal validity as the content of the questions and
the way that the stooge acted was the same for all participants in that condition.
On the other hand, one weakness of this study is the use of a self-report. Participants knew
that their answers would be seen by the researcher, and therefore it was reported that some
were reluctant to record the true extent of their emotions, especially in the anger condition,
because they did not want to lose their course credit. This is an example of demand
characteristics because participants wanted to please the researchers by giving the answers
they thought fitted the aim of the experiment, in order to gain their course credit and avoid
offending the researchers by reporting that they were angry or frustrated. This lowers the internal validity of the study
because the results may not be a true representation of the emotional arousal of the
participants.
However, another strength of this study is that it used an independent groups design,
meaning that order effects would not affect the participants’ behaviour. This design was also
necessary, as it would have been impossible for participants to complete all conditions.
Participants were randomly assigned to one of 7 conditions (for example, EpiInf in the Euphoria
condition), and did not perform in any other condition. This is a strength because participants
could not become bored of, or used to, the experimental task; had they repeated it, they might not
have experienced emotions as intensely as they did the first time. Therefore, internal validity is
increased because all participants started from a comparable emotional baseline, as none of them
had experienced any of the conditions beforehand.
On the other hand, another weakness of this study is that it is low in ethics. Participants were
deceived as they were told that the aim of the experiment was to investigate the effects of a
vitamin compound, Suproxin, on vision, and participants were told that they were being
injected with Suproxin. In fact, they were being injected with Epinephrine (which they
had not consented to), and the real aim was to test the two-factor theory of emotion (how they
will label the reason behind their state of physiological arousal). In addition to this deception,
participants in the EpiMis conditions were also deceived about the effects of the drug that they
were being injected with. These deceptions are highly unethical and since participants were
deceived, they could not give informed consent. Therefore, this study has very low ethics.

LEARNING APPROACH
The learning approach focuses on observable behaviours rather than mental concepts, and
explains behaviour in terms of learning, for example, through social learning theory and
classical and operant conditioning.
ASSUMPTIONS OF THE LEARNING APPROACH
- all behaviour is learned (nurture) and nothing is inherited (nature)
- the subject matter of psychology should have standardised procedures, with an emphasis on
the study of observable behaviours that can be measured objectively, rather than a focus on the
mind or consciousness.

STRENGTHS
- originally focused on observable data, so it is less subjective as it is not as open to
interpretation
- can explain how many mental illnesses (such as phobias) can be acquired and treated

WEAKNESSES
- the learning approach now considers cognitive factors, which may be more inference-based
- the idea that behaviour is just based on learning is reductionist, as behaviours are
influenced by many other factors

BANDURA ET AL.

AIM:
Investigate whether children would learn aggressive behavior by observing a model and would
reproduce this behavior in the absence of the model and whether the sex of the model was
important.

4 hypotheses:
1. Observed aggressive behavior will be imitated so children seeing aggressive models will be
more aggressive than those seeing a non-aggressive model.
2. Observed non-aggressive behavior will be imitated so children seeing non-aggressive models
will be less aggressive than those seeing no model.
3. Children are more likely to copy a same-sex model.
4. Boys will be more likely to copy aggression than girls.

BACKGROUND:
Children copy adults. The immediate social setting makes the child imitate what he or she is
watching. This is ‘facilitation of behavior’. Observation of a behavior could lead the child to
acquire a new response that he or she could reproduce independently. The new behavior
should generalize to new settings and so would be produced in the absence of an adult model.

If this imitative learning occurred, it would arise in response to observing either aggressive or
non-aggressive behavior.

Imitative social learning: The learning of a new behavior which is observed in a role model and
imitated later in the absence of the model.

Children are also differentially rewarded for their copying.

Boys - rewarded for their copying of sex-appropriate behavior. Girls discouraged from
sex-inappropriate behavior.

Bandura suggested this would lead to 2 kinds of differences.


1. Boys and girls would be more likely to imitate same-sex models
2. They should differ in the readiness with which they imitate aggression - with boys doing
so more readily as this is seen as a more masculine type of behaviour.

Social Learning Theory: Social behaviour is learned primarily by observing and imitating
others. It is “learning by proxy”.
The four components to it are:
Attention: Observers must pay attention to behaviour of the model. The model must have some
feature that attracts the observer.
Retention: Observers must store the behaviour in their long-term memory so that the information
can be used again (when the observer wants to imitate the behaviour).
Reproduction: Observers must feel capable of imitating the retained, observed behaviour.
Motivation: If observers experience vicarious reinforcement, they are more likely to imitate the
behaviour. This is when the model has been rewarded for performing the observed behaviour.
Vicarious punishment can also happen: the role model is punished for the observed behaviour,
so the observer is less likely to imitate it.

METHOD:
Research Method and Design - Laboratory experiment; the environment was not where the children
normally played, and the situation was controlled.
It was an independent measures design as different children were used in each of the levels of
the IVs, although the children were matched for aggression in threes, as there were 3 conditions
of model type.

The IV’s were:


Model Type - whether the child saw an aggressive model, a non-aggressive model or no model.
Model Gender - same gender as child (boys watching a male model and girls watching a
female model) or different gender (boys watching a female model and girls watching a male
model).
Learner Gender - whether the child was a boy or girl.

The participants were divided into threes - all with very similar initial aggression levels. One of
each of these individuals was placed into each of the 3 different conditions of model type.
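The matching in threes can be sketched in Python as below. This is a hypothetical illustration: it assumes each child has one combined pre-test aggression score and that, within each triplet, the condition each child goes into is chosen at random; the notes only say that each triplet had very similar aggression levels.

```python
import random

# Hypothetical sketch of matched allocation in threes.
# Assumptions: one combined pre-test aggression score per child, and the
# condition within each triplet is chosen at random.
CONDITIONS = ("aggressive model", "non-aggressive model", "no model")


def allocate_matched_triplets(pretest: dict[str, float], seed: int = 0) -> dict[str, str]:
    """Rank children by pre-test score, cut the ranking into threes and send one
    member of each triplet to each model-type condition."""
    if len(pretest) % len(CONDITIONS) != 0:
        raise ValueError("the number of children must be a multiple of 3")
    rng = random.Random(seed)  # seeded so the allocation can be reproduced
    ranked = sorted(pretest, key=pretest.get)  # least to most aggressive
    allocation: dict[str, str] = {}
    for start in range(0, len(ranked), len(CONDITIONS)):
        triplet = ranked[start:start + len(CONDITIONS)]
        conditions = list(CONDITIONS)
        rng.shuffle(conditions)
        for child, condition in zip(triplet, conditions):
            allocation[child] = condition
    return allocation


# Made-up pre-test scores for six children, not real data.
scores = {"A": 3, "B": 10, "C": 4, "D": 11, "E": 5, "F": 12}
for child, condition in sorted(allocate_matched_triplets(scores).items()):
    print(child, condition)
```

Because the three members of each triplet have similar scores, each condition ends up with a similar spread of aggression levels, which is the point of matching.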

The DV was the learning the child displayed. This was measured through a controlled
observation of the children and measures of aggressive behaviour were recorded.

There were a total of eight experimental groups plus a control group. Of the 72 participants, 24
were assigned to a control group that would not be exposed to adult models. The remaining 48
children were then divided into two groups of 24 participants each. One of these
groups would be exposed to aggressive models, while the other 24
children would be exposed to non-aggressive models.

These groups were divided again into groups of boys and girls. Each of these
subgroups was then divided so that half of the participants would be exposed to a
same-sex adult model and the other half would be exposed to an opposite-sex adult
model.
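The group sizes follow from crossing the three IVs for the children who saw a model. The Python sketch below simply reconstructs that arithmetic, taking the 72 children from the SAMPLE details and the 24 control children from the text above; the constant and function names are made up for illustration.

```python
from itertools import product

# Sketch reconstructing the group sizes described above.
TOTAL_CHILDREN = 72
CONTROL_CHILDREN = 24  # 12 boys and 12 girls who saw no model

MODEL_TYPES = ("aggressive", "non-aggressive")
CHILD_SEXES = ("boy", "girl")
MODEL_SEXES = ("same-sex model", "opposite-sex model")


def experimental_groups() -> list[tuple[str, str, str]]:
    """Every combination of model type, child sex and model sex (2 x 2 x 2)."""
    return list(product(MODEL_TYPES, CHILD_SEXES, MODEL_SEXES))


def children_per_group() -> int:
    """Share the children who saw a model equally between the groups."""
    with_model = TOTAL_CHILDREN - CONTROL_CHILDREN  # 48 children
    groups = len(experimental_groups())
    if with_model % groups != 0:
        raise ValueError("children cannot be split equally between groups")
    return with_model // groups


for group in experimental_groups():
    print(group)
print(len(experimental_groups()), "groups of", children_per_group(), "children")
```

Running it lists the 8 combinations and ends with `8 groups of 6 children`. The control group sits outside this crossing because children who saw no model cannot have a same-sex or opposite-sex model.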

Before conducting the experiment, Bandura also assessed the children's existing levels
of aggression. Groups were then matched so that they had similar average levels of
aggression.

SAMPLE:
72 children aged 3-6 years
36 boys and 36 girls obtained from Stanford University Nursery School.
2 adults served as role models, 1 male and 1 female.
1 female experimenter conducted the study for all 72 participants.
Technique - opportunity sampling
Materials

Aggressive toys:

■ 5ft high Bobo doll
■ a mallet
■ dart guns

Non-aggressive toys:

■ a tea set
■ toy cars
■ dolls.

PROCEDURE:
Prior to the experimental part of the study, children were observed in their nursery school by the
experimenter and a teacher who knew them well. They were rated on four five-point scales
measuring physical aggression, verbal aggression, aggression to inanimate objects and
aggression inhibition (anxiety). They were then assigned to 3 groups, ensuring that the
aggression levels of the children in each group were matched. For the 51 children rated by both
observers (the rest were rated by only one observer), the ratings were generally similar.
These ratings were compared as a measure of ‘inter-rater reliability’, which showed a high
correlation between the observers of r = 0.89.

Inter-rater reliability - the extent to which 2 researchers rate the same activity that they have
observed in the same way. This is judged using a correlation (an r value) between the 2 ratings,
which will be high (close to 1) if they are reliable.
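The correlation used to judge inter-rater reliability is usually Pearson's r. The Python sketch below calculates it from scratch for two lists of ratings of the same children; the example ratings are made up, and Python 3.10+ also provides `statistics.correlation` for the same job.

```python
from math import sqrt

# Sketch of Pearson's r, a common correlation for judging inter-rater reliability.
# The ratings below are made up for illustration.


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two raters' scores for the same children."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two ratings")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one rater gives every child the same score")
    return cov / sqrt(var_x * var_y)


teacher = [3, 5, 2, 4, 4, 1]       # made-up ratings on a 1-5 scale
experimenter = [3, 4, 2, 5, 4, 1]
print(round(pearson_r(teacher, experimenter), 2))
```

An r close to 1 means the two raters ranked the children in almost the same way, which is what the reported r = 0.89 indicates.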

12 boys and 12 girls were allocated to control groups who saw no model. The remaining
children were divided equally by sex between aggressive and non-aggressive model groups, and
within those between same-sex and opposite-sex models.

Each child was brought individually to the experimental room, where the experimenter showed
the child to a table and chair in their ‘play area’ and showed them how to make potato prints and
sticker pictures, activities that had previously been identified as interesting for children. The
opposite corner of the room contained a table and chair, a Tinkertoy set, a mallet and a 5-foot
Bobo doll (an inflatable, clown-like doll which bounced back when hit). The experimenter remained
in the room so that the child would not refuse to be alone or try to leave early, but appeared to be
working quietly at their desk.

The 3 groups were then treated differently.


Non-aggressive condition - the model assembled the Tinkertoys (a wooden building kit) for 10
minutes.
Aggressive condition - the model assembled the Tinkertoys for only 1 minute, after which the model
attacked the Bobo doll. The doll was laid on its side, sat on and punched in the nose, picked up and
hit on the head with the mallet, tossed up in the air and kicked. This sequence was performed 3
times over 9 minutes, accompanied by aggressive comments such as ‘Kick him’ and 2 non-aggressive
comments: ‘He keeps coming back for more’ and ‘He sure is a tough fella’.

Of the children in the 2 model groups, half saw a same-sex model and the others saw a model of
the opposite sex.
A control group did not see any model and so saw no aggression.

Next, each child, including those in the control group, was taken to a different room with
attractive toys such as a fire engine and a baby crib, but after about 2 minutes of play, they were
told that these were the best toys and were to be kept for other children. This deliberate mild
annoyance was done for 2 reasons:
1. Because watching aggression may reduce the production of aggression by the observer
(even if it has been learned), and it was necessary to see evidence of learning.
2. To ensure that even the non-aggressive condition and control participants would be likely
to express aggression, so that any reduction in that tendency could be measured.

A test of the child’s aggression then followed, in which the child was observed for 20 minutes
through a one-way mirror. This was a test of delayed imitation. This final room contained a
3-foot Bobo doll, a mallet and peg board, 2 dart guns and a tether ball with a face painted on it
which hung from the ceiling. It also contained some non-aggressive toys like a tea set, crayons
and colouring paper, a ball, 2 dolls, 3 bears, cars and trucks and plastic farm animals. These
toys were always presented in the same order.

The children’s behaviour was observed in 5-second intervals (240 response units per child), and
there were 3 ‘response measures’ of the children’s imitation:
Imitation of physical aggression - striking the Bobo doll with the mallet, sitting on the doll and
punching it in the nose, kicking the doll and tossing it in the air.
Imitation of verbal aggression - repetition of the phrases ‘sock him’, ‘hit him down’, ‘kick him’,
‘throw him in the air’ or ‘pow’.
Imitative non-aggressive verbal responses - repetition of ‘he keeps coming back for more’ or ‘he
sure is a tough fella’.

Partially imitative aggression was scored if the child imitated these behaviours incompletely. The
2 behaviours here were:
Mallet aggression - striking objects other than the Bobo doll aggressively with the mallet.
Sits on Bobo doll - laying the Bobo doll on its side and sitting on it, without attacking it.

2 further categories were:


Aggressive gunplay - shooting darts or aiming a gun and firing imaginary shots at objects in the
room.
Non-imitative physical and verbal aggression - physically aggressive acts directed toward
objects other than the bobo doll and any hostile remarks except for those in the verbal imitation
category (‘stupid ball’, ‘cut him’, ‘shoot the Bobo’, ‘knock over people’, ‘horses fighting, biting’).

Finally, behaviour units were also counted for non-aggressive play and sitting quietly not playing
at all and records were kept of the children’s remarks about the situation.
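The time-sampling arithmetic (a 20-minute observation cut into 5-second intervals) and the tallying of behaviour units can be sketched in Python as follows. It assumes exactly one category is coded per interval, which is a simplification; the category labels and the example record are hypothetical.

```python
from collections import Counter

# Sketch of the time-sampling arithmetic: a 20-minute session cut into
# 5-second intervals gives 240 response units per child.
SESSION_SECONDS = 20 * 60
INTERVAL_SECONDS = 5
UNITS_PER_CHILD = SESSION_SECONDS // INTERVAL_SECONDS  # 1200 // 5 = 240


def tally_units(codes: list[str]) -> Counter:
    """Count how many 5-second units were coded in each behaviour category.
    Simplifying assumption: exactly one category is coded per interval."""
    if len(codes) != UNITS_PER_CHILD:
        raise ValueError(f"expected {UNITS_PER_CHILD} units, got {len(codes)}")
    return Counter(codes)


# Made-up record for one child, not real data.
record = (["imitative physical aggression"] * 12
          + ["aggressive gunplay"] * 8
          + ["non-aggressive play"] * 190
          + ["sits quietly"] * 30)
print(UNITS_PER_CHILD)  # 240
print(tally_units(record).most_common())
```

Counting units rather than whole episodes means a child who hits the Bobo doll for longer scores more, which gives a finer-grained, quantitative measure of imitation.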

One male scored all the children’s behaviours and, except for those conditions in which he was
the model, he was unaware of which condition the child had been in. To test his reliability, a
second scorer independently rated the behaviour of half of the children and the reliability was
high, around r=0.9 for different categories of behaviour.

Each child was tested individually to ensure that behavior would not be influenced by other
children. The child was first brought into a playroom where there were a number of different
activities to explore. The experimenter then invited an adult model into the playroom and
encouraged the model to sit at a table across the room from the child, which had similar activities.

Over a ten-minute period, the adult models began to play with sets of Tinkertoys. In the
non-aggressive condition, the adult model simply played with the toys and ignored the Bobo doll
for the entire period. In the aggressive model condition, however, the adult models would
violently attack the Bobo doll.

"The model laid the Bobo on its side, sat on it, and punched it repeatedly in the nose. The model
then raised the Bobo doll, picked up the mallet, and struck the doll in the head. Following the
mallet aggression, the model tossed the doll up in the air aggressively and kicked it about the
room. This sequence of physically aggressive acts was repeated three times, interspersed with
verbally aggressive responses."

In addition to physical aggression, the adult models also used verbally aggressive phrases such
as "Kick him" and "Pow." The models also added two non-aggressive phrases: "He sure is a
tough fella" and "He keeps coming back for more."

After the ten-minute exposure to the adult model, each child was then taken to another room
that contained a number of appealing toys including a doll set, fire engine, and toy airplane. The
children were permitted to play for a brief two minutes, then told they were no longer allowed to
play with any of these tempting toys. The purpose of this was to build up frustration levels
among the young participants.

Finally, each child was taken to the last experimental room. This room contained a number of
"aggressive" toys including a mallet, a tether ball with a face painted on it, dart guns, and, of
course, a Bobo doll. The room also included several "non-aggressive" toys including crayons,
paper, dolls, plastic animals, and trucks.
Each child was then allowed to play in this room for a period of 20 minutes. During this time
raters observed the child's behavior from behind a one-way mirror and judged each child's
levels of aggression.

RESULTS:
Children exposed to aggressive models imitated their exact behaviours and were significantly
more aggressive, both physically and verbally, than those children in the non-aggressive model
or control groups. These children also imitated the model’s non-aggressive verbal responses.
This effect was greater for boys than for girls; boys were more likely to imitate physical
aggression and girls more likely to imitate verbal aggression. Boys were also more likely to
imitate a same-sex model, as, to a lesser extent, were girls.

The results of the experiment supported three of the four original predictions.

Bandura and his colleagues had predicted that children in the non-aggressive group would
behave less aggressively than those in the control group. The results indicated that while
children of both genders in the non-aggressive group did tend to exhibit less aggression than
the control group, boys who had observed an opposite-sex model behave non-aggressively
were more likely than those in the control group to engage in violence.

Children exposed to the violent model tended to imitate the exact behavior they had observed
when the adult was no longer present.

Researchers were correct in their prediction that boys would behave more aggressively than
girls. Boys engaged in more than twice as many acts of physical aggression as the girls.

There were important gender differences when it came to whether a same-sex or opposite-sex
model was observed. Boys who observed adult males behaving violently were more influenced
than those who had observed female models behaving aggressively. Interestingly, the
experimenters found that in same-sex aggressive groups, boys were more likely to imitate physical
acts of violence while girls were more likely to imitate verbal aggression.
Children in the aggressive condition showed significantly more imitation of physical and
verbal aggressive behaviour and non-aggressive verbal responses than children in the
non-aggressive or control conditions.

Children in the aggressive condition showed more partial imitation and non-imitative
physical and verbal aggression than those in the non-aggressive or control conditions.
However, these results were not always significant.

Children in the non-aggressive condition showed very little aggression, although results
were not always significantly less than the control group.

Children who saw the same sex model imitated the model’s behaviour significantly more
in the following categories:

1. Boys imitated male models more than girls for physical and verbal aggression,
non-imitative aggression and gun play.
2. Girls imitated female models more than boys for verbal imitative aggression and
non-imitative aggression. However, the results were not significant.

The behaviour of the male model exerted greater influence than the female model.
Overall, the boys in Bandura et al. (1961) produced more imitative physical aggression
than girls.

With a male model, the mean score for imitative physical aggression was 25.8 for boys, much
higher than the 7.2 for girls. Boys imitated the physical aggression of a male model more than
the girls did.

However, with a female model, girls imitated even less (5.5) than with the male model. Girls
imitated the male models more than the female models.

Children seeing a non-aggressive model were much less likely than either the
aggressive model group or controls to exhibit mallet aggression. This was particularly
apparent for girls.

Non-aggressive play - girls played more with dolls, tea sets and colouring papers. Boys
engaged more in exploratory play and gun play. No gender differences in play with farm
animals, cars or the tether ball.

According to Bandura, the violent behavior of the adult models toward the dolls led children to
believe that such actions were acceptable. He also suggested that as a result, children may be
more inclined to respond to frustration with aggression in the future.

In a follow-up study conducted in 1965, Bandura found that while children were more likely to
imitate aggressive behavior if the adult model was rewarded for his or her actions, they were far
less likely to imitate if they saw the adult model being punished or reprimanded for their hostile
behavior.

CONCLUSION:

Children will imitate aggressive/non-aggressive behaviours displayed by adult models,
even if the model is not present.
Children can learn behaviour through observation and imitation.
Behaviour modelled by male adults has a greater influence on children’s behaviour than
behaviour modelled by a female adult.
Both boys and girls are more likely to learn highly masculine-typed behaviour such as
physical aggression from a male adult rather than a female.
Boys and girls are likely to learn verbal aggression from a same-sex adult.

EVALUATION:

As with any experiment, the Bobo doll study is not without criticisms:

Acting violently toward a doll is very different from displaying aggression or violence against
another human being in a real-world setting.

Because the experiment took place in a lab setting, some critics suggest that results observed
in this type of location may not be indicative of what takes place in the real world.

It has also been suggested that children were not actually motivated to display aggression when
they hit the Bobo doll; instead, they may have simply been trying to please the adults.

Because data were collected immediately after the children saw the models, it is also difficult
to know what the long-term impact might have been.

Some critics argue that the study itself was unethical. By manipulating the children into
behaving aggressively, they argue, the experimenters were essentially teaching the children to
be aggressive.

The study might suffer from selection bias. All participants were drawn from a narrow pool of
students who share the same racial and socioeconomic background. This makes it difficult to
generalize the results to a larger, more diverse population.

Ethics: the children were placed in a situation in which they witnessed aggressive
behaviour. This manipulation led some of the children to become aggressive, which could
have given them more aggressive dispositions across their lives; thus the children had not
been adequately protected from harm.

Strengths
The study has high levels of standardization and hence higher reliability.
The study has many controls, and this increases its validity.
There was high inter-observer (inter-rater) reliability.
Low risk of demand characteristics.
Using a matched pairs design reduced the effects of participant variables.
Use of quantitative data allows for easier statistical analyses.

Weaknesses
The study lacks ecological validity and mundane realism.

Issues and Debates


Application to everyday life: can be useful to advertising agencies.
Individual and Situational explanation: this study supports the situational side of the debate as
the situation that the children found themselves in caused the imitated aggressive behaviour.
Nature versus Nurture: this supports the nurture side of the debate as the environment they
found themselves in caused the imitated aggressive behaviour.
The use of children: children are less susceptible to demand characteristics; however, they
could become more aggressive after this study.

Generalisability

The samples in all three studies were large (72, 96 and 66), large enough that anomalies (e.g.
disturbed children) might be cancelled out (e.g. by particularly mild-mannered ones). The
samples were all taken from the same nursery, which was for the students and staff at one of
the world’s top universities. These children might have unusual home lives and particularly
educated parents, making them unrepresentative of normal children.

Another problem is generalising from children to adults. This might not matter if all of our
important behaviour is learned in childhood (even if we don’t act it out until adulthood – the 1965
study shows children can learn behaviours but not act them out until later). However, the studies
may not tell us much about how adults learn new behaviour because adults might be less
influenced by role models.

Reliability

Bandura’s procedure is very reliable because it can be replicated, as Bandura himself did when
he replicated the study in 1963 and 1965. This was easy to do because of the standardised
procedure (same script, same checklist categories, etc.).

Bandura also used two observers behind the one-way mirror. This creates inter-rater reliability
because a behaviour had to be noted by both observers; otherwise it did not count.

Finally, Bandura filmed the 1963 study and the films can be watched by anyone, which adds to
the inter-rater reliability.

Application

Bandura, Ross & Ross (1961) can be applied to parenting and teaching styles. It suggests
children observe and imitate adults, so if you want your children to grow up calm and
well-behaved, you need to keep your temper and keep them away from aggressive role models.
Calm role models seem to have a big effect, which might apply to “buddy” systems used in
schools or prisons to help troubled students or prisoners learn from a role model.

Bandura, Ross & Ross (1963a) has much more application to TV censorship. Bandura claims
the study was inspired by a news story about a boy in San Francisco who was seriously hurt
when his friends re-enacted a TV fight scene. The study suggests even cartoon violence (like
Tom & Jerry) might be causing children to learn aggressive behaviour. This study is used to
support censoring TV, films and video games aimed at children.

The 1963 study also counts against the “catharsis” argument which is often used to defend
violent sports like boxing or WWE (which is very popular with young boys). Defenders often say
watching wrestling helps audiences “vent” their aggression harmlessly, but Bandura suggests
the opposite is true. If Bandura is right, these sports should not be shown to children.

Bandura (1965) also applies to media censorship. Heroes in TV shows, films and video games
are often rewarded for using violence: James Bond, Spider-Man, Lara Croft and every Arnold
Schwarzenegger movie. In video games, violence is explicitly rewarded by “levelling up”; in films
the violent hero saves the day or gets the girl. More media censorship might reduce violence in
society. Alternatively, films and games should be made to show the real consequences of
violence rather than the rewards.

Validity

The main criticism of all Bandura’s studies is that they lack validity. The children were put in a
strange situation, exposed to some unusual adult behaviour and given toys to play with which
encouraged them to act unnaturally. For example, a Bobo Doll is designed to be hit and knocked
over (it bounces back upright); children would suppose the experimenters wanted them to play
with the Bobo Doll in this way. Cues like this are called demand characteristics, because they
lead participants to do what they think the researchers want of them.

Bandura did address this by creating a film in which an adult woman attacked another adult
dressed as a clown; when an actor in a clown costume entered the observation room, the
children used plastic mallets to aggress against him.

(NB. Although this "real life clown" variation is mentioned in many textbooks and websites, I
can't find a citation for it. I wonder if it's one of those Psychology 'urban myths'. Nevertheless,
it's so well-attested in textbooks it is surely appropriate for students to use in the Exam)

The other criticism of Bandura's conclusions is that there are other explanations for aggression -
biological ones. The study by Raine et al. (1997) shows that aggression is linked to certain brain
deficits, like a weak prefrontal cortex; people with these deficits might need no excuse to start
behaving aggressively and misinterpret the role model's behaviour as an invitation to do so.

Ethics

There are many ethical issues with Bandura’s studies. The major issue is harm and the
wellbeing of participants. The children may have been distressed by the aggressive behaviour
they witnessed and the aggressive behaviour they learned from the study may have stayed with
them, going on to become a behavioural problem. Participants are supposed to leave a study in
the same state they entered it, which may not have happened here. This is an example of what
the BPS Code of Ethics calls "normalising unhelpful behaviours".

Although the children could not give valid consent to take part, their nursery teachers agreed
and it is assumed the children’s parents agreed too; this is presumptive consent. Nonetheless,
the children could not withdraw from the study and no effort seems to have been made to
debrief them afterwards (by explaining that the aggressive adults were only pretending).
Bandura would argue that the benefits to society outweighed the risks to any of the children that
took part. His research has shown us the influence that role models have on aggressive
behaviour, especially role models on TV and film. This has been an important contribution to the
debate over censorship in TV, films, videos and games.

SAAVEDRA AND SILVERMAN (BUTTON PHOBIA)


Evaluative learning is a form of classical conditioning in which a person comes to perceive or
“evaluate” a previously neutral object negatively.
It does not depend on the individual expecting or being aware of the association between the
neutral object and the negative outcome.
An individual may negatively evaluate a specific object or event without anticipating the threat of
an objective contaminant.
This elicits a feeling of disgust rather than fear.

AIM:
To examine classical conditioning with regard to fear and stimulus avoidance.
To find out whether exposure therapy would reduce disgust and avoidance towards buttons in the
case of a boy with button phobia.
BACKGROUND:
Evaluative learning is a form of classical conditioning where a neutral stimulus becomes
negative as the product of complex thought processes and emotions associated with it.
Pavlov observed classical conditioning in dogs. The dogs' salivation was an unconditioned
response to food (an unconditioned stimulus). When the ringing of a bell (a neutral stimulus) was
repeatedly paired with food, the dogs learned to associate the bell with being fed. Eventually,
when the dogs heard the bell (now a conditioned stimulus) they salivated even in the absence of
food, so salivation had become a conditioned response.
Psychologists proposed that phobias are learned like other behaviours, through evaluative
learning. This implies that removing the fear and disgust attached to a stimulus would help
individuals unlearn the phobia, and so could act as a treatment.

RESEARCH METHOD:
The study was a case study in which data were collected by self-report. The boy and his
mom were interviewed beforehand about the origins of the phobia. Results from treatment were
measured using the ‘Feelings Thermometer’, a 9-point scale for disgust.
Research Method: Case Study, Observations and Questionnaires
Quantitative data: Distress ratings and Severity ratings
Qualitative data: Questions about why the boy found buttons disgusting

SAMPLE:
The participant was a 9-year-old Hispanic American boy recruited from the Child Anxiety and
Phobia Program at Florida International University. He had been experiencing button phobia
symptoms for four years.

PROCEDURE:
Both the boy and his mother gave consent for participation in the study and publishing of the
results. They were interviewed so that the experimenters could determine where the
phobia originated. It is believed his phobia started after an unpleasant experience with buttons
at the age of 5, when he dropped a bowl of buttons during a crafts lesson in front of his teacher
and classmates. His persistent avoidance of buttons worsened his quality of life and interfered
with his normal daily functioning.
Prior to the treatments, a hierarchy of the boy’s feared stimuli was constructed, consisting of 11
items in increasing severity. This Feelings Thermometer was a 9-point scale (0-8), and the
boy rated clear, small, plastic buttons an 8.
The treatment involved contingency management and imagery exposure interventions.
→ The contingency management was an in vivo technique (physically happening) involving
positive reinforcement, in which the boy was gradually exposed to the 11 stimuli on the
hierarchy and was rewarded with his mother’s affection for completing each hierarchy level. The
sessions ranged between 20 and 30 minutes.
→ The imagery exposure was an imaginal technique (using imagination) rather than an in vivo
one. It involved visualisation techniques in which the boy was asked to imagine buttons falling
on him and to consider how they looked, smelled and felt. The imagery button exposures were
also done according to the Feelings Thermometer hierarchy. Throughout the sessions, the boy
had to use cognitive self-control strategies: self-talk, in which the individual considers
positive thoughts when troubling ones occur.

Features of the Disgust/Fear Hierarchy (“Feelings Thermometer”):


A Fear/Disgust Hierarchy was devised in the sessions in order to track the boy’s progress as he
underwent treatment. This was described to the boy as being a “Feelings Thermometer”, and it
was a 9-point scale ranging from 0-8 ( 0 = not disgusting/ scary at all, 8 = extremely
disgusting/scary). The boy had to rate different stimuli on this Hierarchy.
The boy’s ratings are shown in the table to the right before he underwent any treatment. He
found the large denim jean buttons the least distressing, rating them a 2, and the small plastic
buttons (both clear and coloured) the most distressing (rating them the maximum score of 8).

RESULTS:
→ Exposure therapy: All items on the hierarchy were successfully completed. The boy could
handle larger numbers of buttons in later sessions. However, distress increased markedly
between sessions 2 and 3. By session 4, the distress ratings for some items had increased, such
as hugging his mum while she wore buttons. Although the boy could handle the items, his fear
and anxiety increased. Despite these behavioural changes, his evaluative reactions did not
improve as a consequence of positive reinforcement, which supports evaluative learning.
→ Imagery therapy: Successful in reducing the ratings of distress. Before the imagery
exposure, the most fearful scenario, ‘hundreds of buttons falling over the body’, was rated 8.
Midway through the exposure this fell to 5, and by the end it was down to 3.
→ At the 6- and 12-month follow-ups, the boy reported minimal distress towards buttons. He no
longer met the criteria for button phobia. He was able to wear small clear buttons on his school
uniform, and buttons no longer affected his daily life.
Behavioural Exposures
The child was treated with an exposure-based treatment programme that tackled cognitions and
behaviour.
The treatment involved the use of contingency management. The mother provided positive
reinforcement if the boy successfully completed the gradual exposure to buttons.
Treatment sessions lasted about 30 minutes with the boy alone and 20 minutes with the boy
and his mother.
Before the first session, the boy devised a disgust and fear hierarchy using distress ratings on a
9-point scale (from 0-8) via a feelings thermometer as shown in Table 1.
The most difficult were small, clear, plastic buttons.
He had 4 sessions of behavioural exposure to buttons using this hierarchy.

Disgust Imagery and Cognitions


After the behavioural exposure, 7 sessions were planned to look into the boy’s disgust
imagery and cognitions, with a view to helping him change these over time.
Further probing revealed that the boy found buttons disgusting upon contact with his
body.
He also expressed that buttons emitted unpleasant odours.
These seven sessions involved exploring with the boy the various things about buttons
that he found disgusting and using specific cognitive strategies.
He was prompted to imagine buttons falling on him and to express how they looked, felt,
smelled and to elaborate on how these imagery exposures made him feel.
Although the boy indicated that buttons were “disgusting and gross”, even with intense
probing it was difficult for him to describe exactly what about buttons rendered them
disgusting and gross.

Treatment 1: in vivo exposure


The boy was gradually able to handle more buttons, so in that way, he seemed to be improving.
However, in session 4, his subjective ratings on the Disgust/Fear Hierarchy (for example, his
rating of medium, coloured buttons and hugging his mum while wearing large plastic buttons)
were higher than the original ratings. Because his ratings on the Disgust/Fear Hierarchy had
increased, they started the second treatment which aimed to lower these ratings.
Treatment 2: Disgust-related imagery exposures
Disgust-related imagery exposures and cognitions appeared to be successful in reducing the
boy’s subjective ratings of distress. For example, the boy’s rating for the imagery of hundreds of
buttons falling on him fell from 8 at the beginning of the session (before imagery exposure) to 5
(midway through the exposure) to 3 (after the exposure). His ratings also decreased
dramatically about the imagery of him hugging his mum while she was wearing buttons from 7
(before imagery exposure) to 4 (midway through therapy) to 3 (after imagery exposure).

Long-term effects of the treatments

At the posttreatment (after 6 months), the boy reported minimal distress about buttons, and he
no longer met the DSM-IV criteria for a specific phobia of buttons. He was also able to wear
buttons on his school uniform. At the follow-up assessment (after 12 months), he was in
remission for the phobia diagnosis and continued to wear plastic buttons on his school uniform.

CONCLUSION:
The treatment was successful. Imagery exposure in particular can give long-term reductions in
the fear, disgust and distress that come with specific phobias by altering negative
evaluations. It may be argued that emotions and cognitions are crucial in a person’s
learning of a phobic response to a stimulus.
By session 4, the boy had successfully completed all in vivo exposure tasks up to those with the
highest distress ratings.

Even though he could handle more and more buttons, his distress rating increased dramatically
from session 2 to 3 and 3 to 4.
In session 4, the boy’s subjective ratings that had been 6 or 7 prior to the treatment were now
higher.
This phenomenon was consistent with evaluative learning.
Disgust-related imagery exposures and cognitions appeared to be successful in reducing the
boy’s subjective ratings of distress.
In the imagery sessions, he had to imagine hundreds of buttons falling on him, before the
cognitive restructuring, he rated the experience as 8. This decreased to 5 midway through the
session and ended up as 3.
In a session where he had to imagine hugging his mother while she was wearing a shirt with
many buttons, the distress ratings went from 7 to 4 to 3.
He was followed up 6 and 12 months after treatment, and he no longer met the criteria for a
specific phobia of buttons.

Disgust plays a key role in the development and maintenance of a phobia but a mixture of
behavioural exposure and cognitive restructuring helped to eliminate the feelings of disgust.
Imagery exposure therapies can provide long-term reduction of the disgust that comes
with evaluative learning of an object, which results in a phobia. This is because the imagery
exposure therapies alter the negative evaluations associated with the object.
Cognitions and emotions are crucial in a person’s learning of a phobic response to a stimulus,
and clinicians should focus on addressing associated emotions as well as presenting the child
with the stimulus.

Ethical Issues
The participant was severely distressed, but protection from harm was provided.
Informed consent was taken from the mother and the boy.
Strengths
Qualitative and quantitative data were both acquired in this study.
As a case study focused on one person only, it collected detailed data.
The study was conducted in a therapeutic setting, hence it had ecological validity.
Weaknesses
The study lacks mundane realism.
This was a case study and used only one participant, hence has a low generalizability.
The ratings are subjective and this lowers reliability.
Issues and Debates
Application to everyday life: For treating other phobias
Nature versus Nurture: The process by which the phobia was acquired relates to nurture.

One weakness of Saavedra and Silverman is that it was a case study, so its generalisability is
very low. The only participant in this study was a 9-year-old Hispanic American boy with an
unusual phobia of buttons, so the sample is not representative of the target population (anyone
with a phobia). The findings may therefore not be generalisable to people with more extreme
phobias, people who have had their phobias for longer than the boy's 4 years, or adults.

Another weakness of this study is that the data was mainly based on self-reports. Despite
having one quantitative measure of progress other than self-reports (number of buttons
manipulated in the exposure therapy), most of the data gathered was the ratings of fear and
disgust given by the boy on the ‘Feelings Thermometer’. This decreases the validity of the study
because the boy may have displayed demand characteristics as he understood what the
researchers would have been investigating. This is even more problematic because the boy
may have developed a relationship with the researchers over the course of his treatment, and
he may have therefore rated his fear/disgust on the ‘Feelings Thermometer’ as the researcher
wanted. Therefore, the study lacks validity as the results may have been manufactured by the
boy in order to support what the researchers were looking for.

On the other hand, one strength of the Saavedra and Silverman study is that it was high in
ecological validity. The 9-year-old boy in this study had to hug his mum whilst she was wearing
buttons as part of the in vivo exposure therapy. This means that the study has high ecological
validity as the boy performed tasks that were familiar to him, and that he would have completed
outside of the psychiatrist’s office.

Another strength of this study is that ethics were very high. Both the mother and the boy
provided informed consent to participate in the assessment and intervention procedures. This is
highly ethical as both the mother and the boy were made fully aware of the procedures that the
boy would need to undergo, which is especially important as the therapies would have been
very distressing for the boy. Despite the distress that the therapies caused, this study is still
highly ethical as it aimed to decrease the boy’s phobia of buttons in the long term, so even
though the boy was exposed to distressing stimuli in the short term, the therapies allow him to
overcome his fear and help him to return to school in the long term, giving him a better quality of
life overall. In addition, the boy's identity was never revealed, and therefore confidentiality is
preserved in this study. Therefore, this study is highly ethical as informed consent was given by
both the boy and the mother, the study benefited the boy long term (despite causing him some
distress) and the confidentiality of the boy and his mother was maintained.

Classical Conditioning
● First described by Ivan Pavlov, a Russian physiologist
● Focuses on involuntary, automatic behaviors
● Involves placing a neutral signal before a reflex
Operant Conditioning
● First described by B. F. Skinner, an American psychologist
● Involves applying reinforcement or punishment after a behavior
● Focuses on strengthening or weakening voluntary behaviors

One of the simplest ways to remember the differences between classical and operant
conditioning is to focus on whether the behavior is involuntary or voluntary.

Classical conditioning involves associating an involuntary response and a stimulus, while
operant conditioning is about associating a voluntary behavior and a consequence.
In operant conditioning, the learner is also rewarded with incentives, while classical
conditioning involves no such enticements. Also, remember that classical conditioning is
passive on the part of the learner, while operant conditioning requires the learner to actively
participate and perform some type of action in order to be rewarded or punished.

For operant conditioning to work, the subject must first display a behavior that can then be
either rewarded or punished. Classical conditioning, on the other hand, involves forming an
association with some sort of already naturally occurring event.

PEPPERBERG
Background
To see if non-humans can use abstract symbolic relationships when communicating
Many psychologists believe that only humans possess “true language skills” alongside the
ability to show a range of cognitive skills.
Prior to this study, Pepperberg had reported on an African Grey parrot, Alex.
He could categorize objects, count up to six and use functional phrases “Come here” “I want X”
“Wanna go Y” and “no”. However, Pepperberg stated that these do not show whether a
non-human can comprehend and use abstract symbolic relationships when communicating.
One cognitive skill that has been reported as being a concept not seen in non-humans is the
comprehension of “same” or “different”.
Premack noted that for a non-human to demonstrate comprehension of “same”, two aspects
must apply:
They must recognize that two independent objects called A1 & A2 are both blue and this single
attribute makes them “same”
They must also recognize that this “sameness” can be immediately extrapolated and
symbolically represented not only for two other blue items, but for two novel independent items
that have nothing in common with the original set of A’s
This study was designed to test these two ways of assessing the cognitive skill of “same” or
“different”.

Previous studies completed by Pepperberg have shown that Alex is able to use English
vocalisations to identify, request or refuse more than 80 different objects of various colours,
shapes and materials.

The ability to put things into groups of the same or different may be present in some animals. To
do this, the animal must first recognise the category being shared (e.g. colour) and they must
also understand what is meant by ‘same’ or ‘different’. They must also realise that this can be
applied to new situations and objects.

Most animal studies on language have focused on non-human primates (e.g. chimpanzees) as
they have high levels of cognition. Evidence has been found that many bird species are able to
recognise same/different items, an ability that may benefit their survival. This is what
Pepperberg was aiming to investigate.

Operant conditioning is a method of learning that occurs through rewards and punishments for
behavior. Through operant conditioning, an individual makes an association between a
particular behavior and a consequence.

B.F. Skinner proposed his theory of operant conditioning after conducting various experiments on
animals. He used a “Skinner Box” for his experiments on rats.
As the first step of his experiment, he placed a hungry rat inside the Skinner box. The rat
discovered a lever which, when pressed, released food into the box. The conditioning
was deemed to be complete when the hungry rat immediately pressed the lever once it was
placed in the box. Receiving food for pressing the lever is an example of positive reinforcement.
B.F. Skinner also conducted an experiment that explained negative reinforcement. Skinner
subjected a rat in the box to an unpleasant electric current. Pressing the lever immediately
stopped the flow of the current. After a few times, the rat had been conditioned to go directly to
the lever in order to escape the discomfort.
Aims
To see if an avian subject could use vocal labels to demonstrate symbolic comprehension of the
concepts of same and different.
To investigate whether a parrot is able to use vocal labels to demonstrate the symbolic
understanding of the concepts ‘same’ and ‘different’ ​
Sample:
The participant was an African Grey parrot called Alex and he had been the subject of
interspecies communication and cognitive ability testing for 10 years. This was an opportunity
sample as the parrot was owned by Pepperberg.

Prior to training, Alex had an extensive vocabulary. He could name and identify colours such as
rose, yellow, green, blue and grey, and during the course of the study he also learned orange
and purple.
He could also name several shapes (2-, 3-, 4- and 5-corner for football-shaped, triangular,
square and pentagonal forms respectively). During the course of the experiment, he was also
able to identify 6-cornered shapes.
He could name four materials (paper, wood, hide [rawhide], and cork) and various metallic items
(such as key, chain, or grate).
He had also shown a limited comprehension of abstract categories, in that he could respond to
vocal questions such as "What color?" or "What shape?" by saying, for example, “green” or “5-corner”.
His extensive vocabulary meant that in the testing conditions the researcher could put together
a number of possible scenarios.

CONDITIONS THAT ALEX WAS KEPT IN


Housing: Alex was allowed free access (based on his vocal requests) to all parts of the
laboratory for 8 hours of the day (time that the trainers were present). During sleeping hours,
Alex was confined to a wire cage (~62X62X73 cm).
Food: water and a standard seed mix (sunflower seeds, oats etc.) were available continuously,
and fresh fruits, vegetables and toys were available upon request (e.g. “I want cork”).
ETHICS OF USING ANIMALS IN RESEARCH
Although the conditions were fairly good, parrots are intelligent animals, so keeping him in
captivity raises ethical issues.
Alex was known to get bored, and he often picked his feathers out of boredom. Alex died at age
31, which came as a surprise as the average lifespan for a grey parrot in captivity is 45 years.
Though this may have been for other reasons, Alex may have been better off in the wild where
he would be more intellectually challenged and would not get as bored.
In addition, although he was allowed to roam the laboratory freely, this was only at certain times
of the day, and Alex was confined to a cage at night.

Procedure
Research Method: Laboratory Experiment, Case Study
Experimental Design: N/A
IV: Whether the object is familiar or novel
DV: Whether the parrot responds correctly to the questions
Sample: an African Grey parrot named Alex, who had been the focus of Pepperberg’s work since
June 1977. He had free access to all parts of the lab for 8 hours/day when the trainers were present.
During his “sleeping hours” he was placed in a cage; fresh water and a standard seed mix
for parrots were available at all times. The trials occurred at various locations around the
laboratory depending on where Alex was at that time. Other food such as fresh fruits,
vegetables and nuts, as well as toys, were provided when Alex asked for them.
Sampling Technique: Opportunity
Quantitative data: % Success rate on trials was measured for familiar and novel objects
Alex was presented with two objects which could be differentiated based on three categories:
colour, shape and material. He would then be asked either “What’s same?” or “What’s
different?”
A response was only recorded as correct if Alex vocalised the appropriate category label.
Four processes Alex had to go through to get a correct response:
Attend to multiple features of two different objects
From the vocal question, determine whether the response is based on sameness or difference
Work out what is same or different
Vocally produce a category response
To complete these, Alex had to perform the cognitive skill of feature analysis on the objects
He had already been learning “language” and concepts for around 9 years prior to this study
hence he could already produce vocal labels in English.
During the course of the study Alex acquired labels for orange, purple and six-cornered objects.
Training sessions occurred 2-4 times per week and lasted between 5 minutes and 1 hour.

Training (general):
MODEL/RIVAL technique - the primary technique used by Pepperberg; it is based on the principles
of social learning theory.
It demonstrates to the parrot the types of interactive responses required in the study.
One person acts as a trainer to the second human. The trainer asks questions about the object
and gives praise and rewards for the correct answer but shows disapproval for the incorrect
answer.
The second human acts as a model for Alex but also as a rival for the trainer’s attention.
The roles of model/rival (M/R) and trainer were frequently reversed and Alex was often given the
opportunity to participate in these sessions.
During any training where the purpose was to acquire a correct label, each correct response
was rewarded with the object itself.
To keep Alex’s motivation high, he could ask for any reward if he answered correctly.

Training (same/different):
A trainer would hold up two objects in front of the model/rival and ask “What's the same?” or
“What’s different?”
Both types of questions and training objects were mixed within each session.
Objects were always red, green or blue; triangular or square; rawhide or wood.
M/R would respond with the correct category label and was given a reward.
If the M/R gave an incorrect response, the trainer scolded the person.
When an error was made, the objects were removed from sight, and then presented again
with the same question asked.
The role of M/R was then reversed.
Initial training contrasted just the categories of colour and shape Alex had already learnt. He
was then trained in a third category “mah-mah” (matter).
To prevent boredom of repetitive tasks he was also being trained on number concepts, new
labels for other objects, recognition of photographs and object permanence.
Formal testing was only started after he acquired the label “mah-mah”.

Testing (general):
A secondary trainer who had never trained him before carried out the trials. This was done to
reduce any effect of cueing from the original trainers.
The questioning was incorporated into other test sessions that were being conducted on
Alex.
On the previous day, the principal trainer would list all possible objects that could be used
for testing. A student who was not involved in any training would then choose the
questions and randomly order them.
In a week, “same” or “different” questions were asked 1-4 times.
Testing took place over 26 months (2 years 2 months)
The principal trainer was present wherever the trial took place, but she sat with her
back to Alex and did not look at him during the presentation of the objects.
She never knew what was being presented and would repeat what she thought Alex had
said.
If it was correct Alex was given that object as a reward and praised.
If not, the examiner removed the object and emphatically said, “NO!” When this
happened, a correction procedure was used in which the object was presented until the
correct response was given.
Apart from correction trials, the same pairing was never presented again, so there was a
single first-trial response for each pair.
An overall test score was produced. First-trial results were also calculated.
Tests on familiar objects:
Object pairs were presented to Alex.
They were similar pairings to the ones used in the training phase but never the same.
Individual objects were obviously used in more than one trial, but the pairings were always
novel, and a specific pair would only ever be presented again if Alex gave an incorrect
label (i.e. erred).
Transfer tests using novel objects:
Alex was presented with pairs of objects that combined several attributes that had never
been used in the training phase or in previous questioning.
Some pairs included totally novel objects that Alex might not even have had a label for, or
objects that he had no experience of.
Any completely new object was placed in Alex’s environment for several days before
being used, so that Alex got used to seeing it and fear responses were reduced.
The use of probes:
One concern was that in formulating his answers, Alex might not be attending to the
questions, but merely responding to the physical characteristics of the objects.
Thus, at random intervals probes were administered in which he was asked questions
for which two category labels could be the correct response.
If he were ignoring the content of the question and answering on the basis of attributes,
he would have responded with an incorrect answer.

Training for label acquisition:


At the start of training, continuous reinforcement was used to get a close association between
the object and category or label to be learned.
If Alex got the right answer, he was awarded both objects. To motivate him to work with objects
in which he had little interest, the researcher also modified the procedure to allow him to request
alternative objects as his reward.
The training method then used was the ‘model/rival’ (M/R) approach, which uses the principles
of ‘modelling’ (Bandura). One human is the ‘trainer’. They present objects and ask questions
about the objects to another person (the ‘rival’). Rewards are given when the ‘rival’ gives a
correct response. So the second human ‘models’ the behaviour for the parrot, and the parrot
then becomes a ‘rival’ for the trainer’s attention and rewards. The roles of trainer and model are
then reversed. Alex is given the opportunity to participate in the vocal exchanges.

Training on same/different:
The training aimed to teach Alex to respond to questions (“What’s same?” or “What’s
different?”) with a category label, rather than just describing the object. Answers were ‘colour’,
‘shape’ or ‘matter’. For example, for two objects with the same shape and matter but different
colours:
“What’s same?”
Correct answer is SHAPE/MATTER
“What’s different?”
Correct answer is COLOUR

However, the object pairs were normally set up so that only one category answered the question
being asked, meaning Alex could only give one correct answer. The model/rival technique was
also used in this stage of training.
Training sessions occurred 2-4 times/week and lasted 5 min-1 hr. He took 4 months to learn
colour and shape. For matter, however, Alex had to say ‘mah-mah’ instead because he could not
vocalise ‘matter’, and this label took longer to learn than the others: 9 months.

Before testing, all of the possible objects were listed by the principal trainer. A student not
involved in testing would then choose the questions and the object pairs for same/different, and
randomly order all the questions. Questions on same/different were asked on average one to
four times per week, and neither Alex nor the principal trainer could predict which questions on
which topic would appear on a given day. Testing on same/different occurred between February
1984 and April 1986.

A secondary trainer would present the objects in a variable but previously determined order.
Alex was shown an exemplar or number of exemplars, asked "What's this?" "What color?"
"What shape?" "How many?" "What's same?" or "What's different?" and was required to
formulate a vocal English response. The additional questions (not ‘same/different’) were asked
to prevent Alex from getting bored. The principal trainer repeated out loud what she heard Alex
say without looking at the objects presented so there was no bias. If that was the correct
response (e.g., the appropriate category label), the parrot was rewarded with praise and the
object(s).

Only the first vocalisation that Alex gave was scored as his answer. If the identification
was incorrect or indistinct, the examiner removed the object(s), turned his/her head (a time-out),
and emphatically said "No!" The examiner then implemented a correction procedure in that the
misnamed object was then immediately and repeatedly presented until a correct identification
was made.

Tests on familiar objects


These trials involved object pairs that were similar to, but never the same as, those used in
training. These items combined one additional colour, shape, or material available in the
laboratory, that is, variously coloured and shaped objects of wood and rawhide (e.g., 5-corner
blue wood) and, later, variously shaped keys.

Tests on novel/unfamiliar objects


Object pairs for the second set of transfer tests were interspersed randomly within the first set.
In these tests, Alex was presented with pairs of objects that combined several attributes never
used in training or previous tests on same/different (e.g., 5-Corner white paper) or totally novel
objects incorporating colours, shapes, or materials for which he might not even have labels
(e.g., pink woolen pompom).
In these pairs, at least one and often both objects were items that were thus totally unfamiliar to
Alex. They could be made of colours or shapes that he had probably seen, such as white or
round (e.g., on clothing or foods), but could not label; they could be objects, such as toy cars,
with which he at the time had had no experience.

Probe testing
There was concern that Alex might not be attending to the questions. At random intervals Alex was
presented with probes: questions for which either of two (of the three) category labels could be
the correct response. For example, he would be shown a yellow and a blue wooden triangle and
asked "What's same?" If he were ignoring the content of the question and answering on the basis
of the attributes and his prior training, he would have responded with the one wrong answer
(colour); if he were answering the question posed, he would have two possible correct responses
(wood or triangle, i.e. matter or shape).

Results
The training for Alex to acquire “colour” and “shape” as labels took 4 months and for
“mah-mah” it took 9 months.
The length of each session was dictated by Alex’s willingness to attend.
Familiar objects:
99 out of 129 (76.7%) correct responses overall
69 out of 99 (69.7%) correct on first trials only
Based on chance (one correct label out of three), he should have scored 33.3%
His performance on pairs made of objects that were no longer novel but contained a
colour, shape or material he could not yet label was 13 out of 17 overall and 10 out of 13
on first trials.
Transfer tests with novel objects:
96 out of 113 (85%) correct overall
79 out of 96 (82.3%) on first trials
When there was a novel object in a pair his score was 86% and when both objects were
novel it was 83%
Probes:
55 out of 61 (90.2%) correct overall
49 out of 55 (89.1%) on first trials
This demonstrates he was processing the questions rather than simply the attributes of
the objects
Using novel objects tested whether Alex could generalise his understanding of ‘same’ and
‘different’ to new situations. He actually scored higher on novel objects than on familiar ones.
This may have been because Alex was said to have built a collection of objects, so perhaps
when novel objects were presented, he wanted them more. This meant that he was more
motivated to get the correct answer to have the new object for his collection.
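To see why these scores count as "above chance", the short Python sketch below recomputes the percentages from the counts above and, as an illustration of our own rather than Pepperberg's analysis, uses an exact binomial calculation to estimate how likely each first-trial score would be if Alex were simply guessing. It assumes a guess is correct one time in three on ordinary trials (one right label out of colour, shape and matter) and two times in three on probes (where two of the three labels were acceptable); `percent` and `binomial_tail` are hypothetical helper names.

```python
from math import comb

def percent(correct: int, trials: int) -> float:
    """Percentage correct, rounded to one decimal place."""
    return round(100 * correct / trials, 1)

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more correct by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# (correct, trials, assumed guessing rate) from the first-trial results above
first_trials = {
    "familiar objects": (69, 99, 1 / 3),  # one of three category labels is correct
    "novel objects": (79, 96, 1 / 3),
    "probes": (49, 55, 2 / 3),  # two of the three labels are acceptable on a probe
}

for name, (correct, trials, chance) in first_trials.items():
    p_guess = binomial_tail(correct, trials, chance)
    print(f"{name}: {percent(correct, trials)}% correct "
          f"(chance {round(100 * chance, 1)}%), P(guessing this well) = {p_guess:.1g}")
```

Treating every trial as an independent guess is a simplification, so the probabilities are best read as a sense of scale: all three are far below 1 in 100.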

Conclusions
The data indicates that at least one avian subject shows symbolic comprehension of the
concept same/different.
Alex’s scores on all tests were significantly above chance, suggesting that he
understood what the questions were asking.
It would therefore appear that symbolic representation of same/different is not exclusive
to primates.
● Parrots have the potential to demonstrate understanding of ‘same’ and ‘different’.
● Parrots may learn to respond to verbal questions to vocalise categorical labels.

Ethical Issues
Strong animal ethics - Number (only one), rewards given, no deprivation and appropriate
caging.
Strengths
Case study – focused on one subject
High standardisation – higher reliability
High validity
Weaknesses
Lacks mundane realism
Low generalizability – due to case study
Issues and Debates
Application: for training animals
Nature versus nurture: supports nurture. He was learning through both operant
conditioning and social learning.

One strength of Pepperberg’s research is that it has high validity. For example, the trainer who
tested Alex had not been working with him during training. Instead, the researcher who trained
Alex stood in the corner of the room with her back turned to the objects being tested and
interpreted what Alex’s response was as the person who tested Alex may not have understood
some of his responses. As a result, researcher bias was limited as the tester could not be
criticised for ‘cueing’ Alex to respond in a particular way. In addition, a student was asked to
choose the question order and materials used in the study which again removes any researcher
bias.
However, one weakness of Pepperberg’s study is that it lacks generalisability. For example,
the study was a case study of one African grey parrot called Alex who had undergone previous
cognitive testing and had been kept in captivity for at least 10 years before the study. Therefore,
it is difficult to generalise his behaviour to other parrots that are wild as they may display
different behaviours.
Another weakness of Pepperberg’s study is that it raises ethical concerns. Some psychologists
argue that Alex suffered from boredom as a result of repetitive testing over a 2-year period.
Furthermore, Alex was confined to a wire cage during the night (~62X62X73cm) and was
observed plucking his own feathers when he was bored. This, together with the fact that he was
in a situation foreign to his natural environment, means the study can be considered low in
ethics.
In contrast, one strength of this study is that Pepperberg used quantitative data when
collecting the ‘same/different’ question responses. This is a strength as it allows Pepperberg to
make an objective analysis of whether Alex could comprehend abstract concepts. Furthermore,
the use of objective data allowed the researchers to make comparisons between novel and
familiar objects, and therefore to establish whether Alex could use the rules of same/different
beyond the training materials.
Finally, this study has many real-life applications. Despite not necessarily
being generalisable to other parrots, we can use the training methods of operant conditioning,
continuous reinforcement and social learning to try and shape the behaviours of other animals,
in zoos for example. Zoo keepers can use observations and imitations to introduce new animals
to groups more easily by encouraging role models to show the new member what behaviour is
appropriate.

SOCIAL APPROACH
The social approach focuses on how individuals' behaviours are influenced by the social
situation that they are in and how they want to be perceived by others.
ASSUMPTIONS OF THE SOCIAL APPROACH
- individual behaviour can only be understood in relation to other people

- behaviour, cognitions and emotions can be influenced by groups or social contexts that can
frame and direct an individual's actions


STRENGTHS
- we can understand what processes influence our behaviour within society and around
different people
- it tends to be a holistic (i.e. not reductionist) approach

WEAKNESSES
- Findings from some studies may not apply to other societies/modern societies
- Social behaviours are very complex and difficult to study in terms of controlling variables
- some problems include distinguishing individual from situational factors and ensuring
ecological validity
MILGRAM:
Research Method: controlled observation (there was no manipulation of an independent
variable)
* this was actually a pilot study, but the results were so shocking that the research was
published
Experimental Design: Repeated measures design (all participants underwent the same
procedure)

Background:
During the Holocaust in World War 2, approximately 11 million people (including 6 million Jews) were killed
under the authority of Adolf Hitler. After the War, the German Officers and Guards tried for war
crimes claimed that they were ‘just following instructions’. Due to the large scale of the
genocide, it was believed at the time that the Germans were uniquely flawed in their high levels
of destructive obedience. Stanley Milgram, himself from a Jewish family, sought to investigate
whether these claims held any truth, and so set up a procedure to test whether similar levels of
obedience could be observed among Americans, when placed in the right situation.
Interestingly, when Milgram asked fellow psychologists to predict the outcome of his
experiment, very few said that any would go all the way to 450V. In fact, the largest estimate of
the proportion of people who would go to 450V was 1-3% of the participants - very different to
the actual findings.

AIM
To investigate whether or not ordinary people would be obedient to an authoritative figure, even
if the action physically harmed another person

SAMPLE
The sample was 40 males, aged between 20 and 50. Their professions ranged from unskilled
to white-collar workers. All of the participants came from the New Haven area, and were
recruited by means of volunteer sampling from a newspaper advert. They were paid $4.50 for
showing up, regardless of whether they completed the procedure or not.

The original advertisement used to recruit participants is shown on the left.



Why were a variety of professions included?
The main reason for this is generalisability. Some occupations may have a greater tendency to
be more obedient than others. A higher level of education may cause people to be more likely to
question the study. If a wide range of professions are included, the results should be
generalizable to most people, regardless of what they do for work.
This particular experiment was actually a pilot study. Milgram completed this smaller study in
preparation for a larger experiment, but the pilot study was published because of the shocking
(excuse the pun) findings.

A pilot study is a smaller, trial version of a larger study that is carried out to ensure that there are
no flaws in the procedure and to test the equipment. It saves time and money if the procedure is
found to have a flaw, as the flaw can be rectified before the full-scale version is carried out.

PROCEDURE:
The study took place in the interaction laboratory at Yale University. The role of the
experimenter was played by a 31-year-old high school biology teacher. His manner was
impassive, and he was trained to be quite stern during the experiment. He always wore a grey
lab coat. The victim was played by a 47-year-old accountant who was trained for the role. Most
people found him to be mild-mannered and likeable.
When the participant was brought into the lab, they sat down with the victim (stooge) and
both were told a cover story. The participants were told that the experiment’s aim was to
investigate ‘the effect of punishment on learning’. They were then told that either the naive
participant or the victim (stooge) would be the learner, and they were asked if ‘either of them had
a preference’. The participants then drew slips of paper from a hat to decide who would be the
teacher or learner in the experiment. This was rigged so that the stooge was always the
learner and the participant was always the teacher.

After drawing the slips of paper, the learner and the participant were taken to a room and the
learner was shown being strapped into the electric chair. The experimenter explained that the
straps were to ‘prevent excessive movement’. An electrode was then attached to the learner’s
wrist along with electrode paste in order to ‘avoid blisters and burns’. The experimenter then
told participants that ‘although the shocks can be extremely painful, they cause no permanent
tissue damage’.
Sample shock
Each participant was then given a sample electric shock of 45V. They had an electrode and
electrode paste applied to their wrist, and then the shock was administered by pressing the third
lever on the generator.
Learning task
The participants then started the paired-associate learning task. The participant would read a
series of word pairs to the learner, then read the first word of a pair along with four other words;
the learner had to indicate which of the four words had originally been paired with the first word
by pressing one of four switches. If the learner got the answer wrong (or gave no answer at all),
the participant (teacher) would shock the learner, moving one level higher for every wrong
answer. The participant was also told to read out the voltage level to the learner before
administering a shock, in order to remind them of the increasing intensity.
Learner responses
The learner gives a predetermined set of responses to the word pair test, based on a schedule
of approximately 3 wrong answers to 1 correct answer. No vocal response was heard from the
victim until 300V was reached. At this point, the learner pounds on the wall. After 300V, the
learner does not answer any more questions - the participant is told no answer is a wrong
answer. The learner’s pounding on the wall is heard again at 315V, but afterwards, he is not
heard from.

Prods
If the participant shows an unwillingness to go on, the experimenter would respond with a series
of “prods”.
Prod 1: “Please continue” or “Please go on”
Prod 2: “The experiment requires that you continue”
Prod 3: “It is absolutely essential that you continue”
Prod 4: “You have no other choice, you must go on”
The prods were given in the order above, and if the participant refused to participate after prod 4
was given, the experiment was terminated.
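The prod sequence is a fixed protocol, so it can be sketched in a few lines of Python. This is only an illustration: `run_prods` and the `still_refuses` callback are hypothetical names, and treating a participant's reaction as a simple yes/no function is an assumption made for clarity.

```python
from typing import Callable

# The four standardised prods, always given in this order.
PRODS = [
    "Please continue",  # or "Please go on"
    "The experiment requires that you continue",
    "It is absolutely essential that you continue",
    "You have no other choice, you must go on",
]

def run_prods(still_refuses: Callable[[str], bool]) -> bool:
    """Return True if the participant carries on, False if the session is ended.

    still_refuses is a hypothetical stand-in for the participant: it receives
    a prod and returns True if they still refuse after hearing it.
    """
    for prod in PRODS:
        if not still_refuses(prod):
            return True  # persuaded by this prod: the session continues
    return False  # still refusing after prod 4: the experiment is terminated

def gives_in_at_third(prod: str) -> bool:
    """A hypothetical participant who refuses after prods 1 and 2 only."""
    return PRODS.index(prod) < 2

print(run_prods(gives_in_at_third))  # True
```

Because the prods are always given in the same order with the same wording, every participant faced the same escalating pressure, which is part of what made the procedure standardised.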
Why were the participants told that the money was not dependent on their completion of
the procedure?
So that the participants had the right to withdraw (though even then this was made difficult).
Also, so that their obedience was not for any financial gain. However, the money may have still
caused a feeling of obligation to complete the experiment and please the researcher.
Why were the participants given a sample shock of 45V?
To convince them of the authenticity of the shocks that they were giving the ‘learner’. Paste was
also applied to the participant’s wrist to ‘avoid blisters and burns’ before the shock, suggesting
that even a small shock could cause harm.
Standardised Procedures:
● Predetermined victim responses
● Predetermined wording of prods
● Predetermined order of prods
● 45V sample shock

Controls:
● Incremental increase in voltage was controlled (going up in 15V increments, always starting
at 15V up to 450V)
● The experimenter always wore a grey lab coat
● The ‘learner’ was always the same person:
○ A ‘mild-mannered and likable’ 47-year-old accountant (stooge)


Features of the electric shock generator
There were 30 clearly marked voltage levels ranging from 15V to 450V, increasing in 15V
increments (i.e. 15V, 30V, 45V…). Descriptions of the shock level were placed under the voltage
levels, ranging from Slight Shock to Danger: Severe Shock, and finally the last two switches
were labelled XXX. After a switch was pressed, it remained down to show how much shock the
participant had administered.
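Because the voltages rise in equal 15V steps, the layout of the generator can be checked with a little arithmetic. The Python sketch below (the names `LEVELS`, `voltage` and `switch_number` are our own, purely for illustration) confirms that there are 30 switches, that the 45V sample shock is the third switch, and that 300V, where the learner first pounds on the wall, is the 20th switch.

```python
# Switch voltages on the generator: 30 levels from 15V to 450V in 15V steps.
LEVELS = list(range(15, 451, 15))

def voltage(switch: int) -> int:
    """Voltage of a switch, numbering switches from 1 (switch 1 = 15V)."""
    return LEVELS[switch - 1]

def switch_number(volts: int) -> int:
    """Which switch delivers a given voltage. Because the teacher moved up one
    level per wrong answer starting at 15V, this is also the number of wrong
    answers needed to reach that voltage."""
    return LEVELS.index(volts) + 1

print(len(LEVELS), voltage(3), switch_number(300), switch_number(450))  # 30 45 20 30
```

So a fully obedient participant had to press 30 switches, one per wrong answer, to reach the final 450V shock.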

RESULTS:
Most participants were convinced that the experimental situation was real and that they were
administering shocks to another person. They also believed that the highest shocks were
extremely painful. When participants were asked in a post-experimental interview “How painful
to the learner were the last few shocks that you administered to him?” on a 14-point scale
ranging from “Not at all painful” to “Extremely painful”, the modal (most common) response was
14 (Extremely painful), and the mean was 13.42.
The prediction that people would not, in general, administer shocks beyond those labelled
“Very Strong Shock” differs dramatically from the distribution of the scores shown on the right.
Each of the 40 participants went beyond the expected break-off point; no participant stopped
before administering 300V.
Out of the 40 participants, 5 refused to obey the commands of the experimenter beyond the
300V level. 4 more participants administered one further shock at 315V and then refused to go
on. Two broke off at the 330V level and one each at 345, 360 and 375 volts. Therefore, a total of
14 participants defied the experimenter out of the 40 that participated.

This therefore means that 26 of the 40 (or 65% of participants) obeyed the orders of the
experimenter until the very end by administering the 450V shock. At this point, the
experimenter ended the session. Although obedient participants continued to administer shocks,
they did so under extreme stress. Some even displayed the same reluctance to administer
shocks beyond the 300V level, similar to those who defied the experimenter, yet they gave up in
their attempts to defy and eventually obeyed when prods were given.
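The break-off figures can be checked with simple counting. The Python sketch below (variable names are our own) adds up the defiant participants according to the highest shock each group gave, then subtracts them from the 40-person sample to recover the 26 obedient participants and the 65% figure.

```python
TOTAL_PARTICIPANTS = 40

# Highest shock (in volts) given by each group of defiant participants, from the results above.
defiant_by_max_shock = {300: 5, 315: 4, 330: 2, 345: 1, 360: 1, 375: 1}

defiant = sum(defiant_by_max_shock.values())
obedient = TOTAL_PARTICIPANTS - defiant  # went all the way to 450V
percent_obedient = 100 * obedient / TOTAL_PARTICIPANTS

print(defiant, obedient, f"{percent_obedient:.0f}%")  # 14 26 65%
# Nobody stopped before 300V, so every participant gave at least 300V.
print(min(defiant_by_max_shock) >= 300)  # True
```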

What were some of the participants’ behaviours observed?


Many participants showed signs of nervousness, especially when administering the more
powerful shocks. Participants were observed to sweat, tremble, stutter, bite their lips and dig
their fingernails into their skin. Another common behaviour of the participants was nervous
laughing fits - 14 participants displayed this behaviour. In post-experimental interviews, the
participants told the experimenter that the laughter did not mean that they were enjoying
shocking the victim, but instead they were distressed about the situation. In 3 cases, full-blown
seizures were observed, and for 1 participant (a 46-year-old encyclopedia salesman) his seizure
was so violent that the experiment had to be stopped.
When the experiment ended and participants had given the maximum shock available, many
sighed in relief or shook their heads in apparent regret.
What were some of the things that participants said when they defied the experimenter?
“I think he’s trying to communicate, he’s knocking… Well it’s not fair to shock the guy… these
are terrific volts. I don’t think this is very humane… Oh, I can’t go on with this; no, this isn’t right.
It’s a hell of an experiment. The guy is suffering there. No, I don’t want to go on. This is crazy”
“He’s banging in there. I’m gonna chicken out. I’d like to continue, but I can’t do that to a
man… I’m sorry I can’t do that to a man. I’ll hurt his heart. You take your check… No really, I
couldn’t do it”
WHY DID SO MANY PARTICIPANTS OBEY?
● They were in an agentic state (they were doing as they were told) as opposed to an
autonomous state (where they take responsibility for their actions)
● They were in a prestigious location at Yale University
● The authoritative figure was close by and could see whether the participant was obeying
what they were told to do
● They could not see the consequences of their actions as the victim was in another room;
they could only hear the learner occasionally pounding on the wall
● The experimenter would give the participant ‘prods’ which made it seem like the
participant had no choice but to continue
● The participant was told that the shocks would cause no permanent damage to the learner
(despite the pounding heard)

CONCLUSIONS
The results support the theory that it is the situation that causes obedience, not a personal disposition to
obey.
● Most individuals are much more obedient than originally hypothesised
● Even when people are highly obedient, they still show strong signs of tension, and they often
find it very stressful to carry out destructive acts because they are caught between two conflicting
social norms: not to cause harm to others, and to obey authority

Therefore, we could conclude that the defence used by war criminals, that they were only acting under the
orders of a superior officer, may be valid in some instances, as the situation they were put
in could have been one in which most people would have obeyed.
EXAMPLE 10-MARKER FOR MILGRAM
One strength of Milgram’s study is that it has high internal validity due to the high level of control
that was put in place. For example, Milgram predetermined the responses of the learner (e.g.
pounding on the wall at 300V and then no response after 315V). This reduced the number of extraneous
variables, so the dependent variable (the maximum shock each participant was willing to give) was
less likely to be affected by anything other than the experimental situation.
On the other hand, Milgram’s study has low ecological validity as the participants were not in
a natural setting due to the method of a lab experiment. Most people would not normally go to
Yale University and be instructed by an experimenter to shock someone, and therefore the results
may not be applicable to real life situations.
In addition, Milgram’s experiment had low generalisability because the sample was 40
males who lived in the New Haven area. Since they were all male, the findings cannot be
generalised to females. In addition, since all the participants lived in the New Haven area,
they may have shared similar experiences and attitudes. This means that the results may not be
generalisable to other populations, whose behaviour may be different.
On the other hand, it can also be argued that generalisability is higher because Milgram carried out
further variations of the study, including one with female participants, and other researchers have
replicated it with populations around the world. Many of these studies found that participants behaved in
similar ways to those in the original study, and therefore it could be argued that generalisability is
higher because of this. In addition, Milgram used participants from a range of professions, which
increases generalisability because the data will not be distorted by one profession that is perhaps more
submissive than another.
PILIAVIN
Method: Field experiment that took place on the New York Subway
Experimental Design: Independent Measures Design

Background:
Kitty Genovese
Catherine ‘Kitty’ Genovese was a 28-year-old woman who was murdered outside her
apartment in Queens, NYC, on 13th March 1964. The attack lasted for around 30 minutes and
she was stabbed 14 times by her attacker, Winston Moseley. It was widely reported that despite
Genovese’s screams for help, none of her reported 38 neighbours came to her aid that night.
The Kitty Genovese murder prompted research into the socio-psychological phenomenon
dubbed ‘the Bystander Effect’. The New York Times reported that 38 bystanders witnessed the
murder and didn’t call the police; however, that claim is now known to be false.

Darley and Latané


Darley and Latané recruited university students to take part in a personal discussion of their
college lives, which was held over a microphone and headset so they could not see who
they were talking to. Each participant was either in a one-on-one discussion with the victim, or in a
larger group (of 3 or 6 people). The pre-recorded victim would then have a seizure during the
discussion, after already having talked about their epileptic condition. The experimenters were
measuring whether participants helped and how long they took to respond.

WHAT IS THE 'BYSTANDER EFFECT'?

The bystander effect is when an individual is less likely to help a person in need if others are
around them. One explanation is diffusion of responsibility: an individual may see themselves as
less qualified to help than someone else present (for example, a doctor), or feel that the
responsibility of aiding that person should not fall on them because so many other people could
also help. This explanation was drawn from the research done by Darley and Latané.
AIM
To investigate bystander behaviour in a natural setting, and to investigate the effect of the
following 4 variables on helping behaviour: the type of victim, the race of the victim, the presence
and type of model, and the size of the group of bystanders.
HYPOTHESES
Helping behaviour is affected by:
1. the victim’s responsibility for being in a situation in which they need help
2. the race of the victim
3. the effect of modelling helping behaviour
4. the size of the group of bystanders
SAMPLE
Around 4,450 men and women who travelled on the 8th Avenue IND line in NYC between
11am and 3pm on weekdays. They were unsolicited, and they were an opportunity sample. The
racial mix was 45% black and 55% white at the time of the study. Mean number of people was
43 per car and 8.5 in the critical area.
FEATURES OF THE TRAINS
The A and D trains of the 8th Avenue IND were used because they make no stops between 59th
Street and 125th Street. Therefore, the journey lasted 7.5 minutes, completely uninterrupted. The
trials were only run in the old subway cars, as they had two-person seats in a group arrangement
rather than extended seats. In the critical area, there are 13 seats and some standing room.
To the left is a layout of the train carriages taken from the study.
PROCEDURE - CONFEDERATES
For each trial, a team of 4 Columbia General Studies students entered the train car through
different doors. 4 different teams were used to collect the data for the 103 trials. Each team had 4
members in 3 confederate roles (one victim, one model and two observers):

● Victims: the 4 victims were all males between the ages of 26 and 35; 3 were white and 1 was black.
All were identically dressed in an Eisenhower jacket, slacks and no tie. Other than race, there
were two victim conditions:
○ Drunk: in 38 of the trials, the victim smelled of liquor and carried a liquor bottle wrapped in a brown paper bag
○ Cane: in 65 of the trials, the victim appeared sober and carried a black cane

● Model: 4 white males between the ages of 24 and 29 assumed the role of the model. All models
wore informal clothes; however, they were not identically dressed. There were 4 different model
conditions as well as the no-model condition:

○ Critical area - early: model stood in the critical area and waited until passing the 4th
station (~70 seconds after the collapse) to help the victim
○ Critical area - late: model stood in the critical area and waited until passing the 6th
station (~150 seconds after the collapse) to help the victim
○ Adjacent area - early: adjacent after 4th station
○ Adjacent area - late: adjacent after 6th station
When the model provided assistance, he raised the victim into a seated position and stayed with
him until the train arrived at the station. Each team was assigned an equal number of trials in the
no-model condition and in each of the 4 model conditions, using a random number table.

● Observers: two female confederates with differing roles:

○ Observer 1: noted the race, sex, and location of every person in the critical area
and of every helper that came to the victim’s assistance. She also
recorded the total number of individuals in the car and the number of people who came
to the victim’s assistance.
○ Observer 2: noted the race, sex and location of individuals in the adjacent area.
She also recorded the latency of the first helper’s arrival.

Both observers recorded comments made by nearby passengers and attempted to elicit
comments from riders seated next to them.
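The balanced random allocation of model conditions can be illustrated with a short sketch. This is not Piliavin et al.'s actual procedure: it assumes that giving each condition the same number of trials in an order taken from a random number table can be mimicked by listing every condition equally often and shuffling the list with Python's `random` module. The function name `balanced_schedule`, the condition labels and the number of trials per condition are all hypothetical.

```python
import random

# Hypothetical labels for the no-model condition and the 4 model conditions.
MODEL_CONDITIONS = [
    "no model",
    "critical area - early",
    "critical area - late",
    "adjacent area - early",
    "adjacent area - late",
]

def balanced_schedule(trials_per_condition, seed=None):
    """Return a shuffled list of trials in which every model condition appears equally often."""
    rng = random.Random(seed)  # a seeded generator stands in for the random number table
    schedule = [c for c in MODEL_CONDITIONS for _ in range(trials_per_condition)]
    rng.shuffle(schedule)  # the order becomes unpredictable, but the counts stay equal
    return schedule

if __name__ == "__main__":
    for trial, condition in enumerate(balanced_schedule(2, seed=1), start=1):
        print(trial, condition)
```

Shuffling a balanced list keeps the number of trials in each condition equal while preventing the team from choosing the order, which is the job the random number table did in the study.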
PROCEDURE
The four confederates would enter the train car using different doors and they varied the car that
they used for each trial. The female confederates took seats outside of the critical area, while
the male model and victim always remained standing. The victim always stood next to the pole
in the centre of the critical area. As the train passed the first station (~70 seconds after
departing) the victim staggered forward and collapsed. Until receiving help, the victim remained
supine on the floor facing the ceiling. If the victim received no help by the time the train came to
a stop, the model helped him to his feet. At the stop, the team proceeded to another platform to
board a train going in the opposite direction for the next trial. 6-8 trials were run in a day, and all
trials on a given day were in the same ‘victim condition’.
STANDARDISED PROCEDURES + VARIABLES
Standardised procedures:
1. The model would always help the victim by helping the victim into a seated position and
then staying with him until the train arrived at the station.
2. The victim would always lie supine until receiving help
3. The victim would always collapse as the train passed the first station (~70 seconds into the
journey)

- Independent variables:
1. Type of victim (cane or drunk)
2. Race of victim (black or white)
3. Type of model (e.g. critical area - late)
4. Size of bystander group (naturally occurring)

- Dependent variables:
1. Time taken for help to be offered
2. Race of the helper
3. Gender of helper
4. Frequency of helping

- Controls:
1. Clothes that the victim and model were wearing
2. Gender of the model and the victim
3. Where the victim fell
RESULTS
Table 1: The cane victim received spontaneous help on 62 of the 65 trials (95%), and the drunk
victim received help on 19 of the 38 trials (50%). This difference is not due to the number of people in
the car (the mean number of passengers was 45 on cane trials and 40 on drunk trials; the
overall range was 15-120).
Table 2: On 60% of the 81 trials in which the victim received help, he received it from more
than one person. No one left the car itself in any of the trials; however, on 21 of the 103 trials, a
total of 34 people left the critical area. People left the critical area on a higher proportion of trials
in the drunk condition than in the cane condition. They were also far more likely to leave, and more
likely to comment, on trials in which help was not given within the first 70 seconds. Far more
comments were obtained on drunk trials than on cane trials.
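The percentages in Table 1 are simple proportions of trials. The sketch below uses only the counts reported in Table 1 (the dictionary and function names are illustrative) to show the calculation, and to check that the cane and drunk trials add up to the 81 helped trials and 103 total trials mentioned in these notes.

```python
# Counts reported in Table 1: (trials with spontaneous help, total trials)
TABLE_1 = {
    "cane": (62, 65),
    "drunk": (19, 38),
}

def helping_rate(helped, total):
    """Percentage of trials on which the victim received spontaneous help."""
    return 100 * helped / total

for condition, (helped, total) in TABLE_1.items():
    print(f"{condition}: {helped}/{total} = {helping_rate(helped, total):.1f}%")

# The two victim conditions together should give the study totals.
helped_trials = sum(helped for helped, _ in TABLE_1.values())
total_trials = sum(total for _, total in TABLE_1.values())
print(f"overall: help on {helped_trials} of {total_trials} trials")
```

Running it prints `cane: 62/65 = 95.4%`, `drunk: 19/38 = 50.0%` and `overall: help on 81 of 103 trials`, matching the rounded figures above.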

Table 3: With both black and white cane victims, the proportion of helpers of each race was in
line with what would be expected from the racial mix of the passengers (45% black, 55% white).
However, for drunk victims, mainly members of the victim’s own race came to his aid.
Table 4: The area variable had no significant effect; however, the early model elicited help
significantly more often than the late model did.
Table 5: There is no evidence for diffusion of responsibility; in fact, response times for groups
of 7 or more are consistently faster than for groups of 1-3. However, Darley and Latané pointed
out that different-sized real groups cannot simply be compared with one another, since a larger
group increases the likelihood that at least one person will help.
Testing diffusion of responsibility: In the Darley and Latané experiment, it was found that as
the number of bystanders increased, an individual was less likely to help and took longer to
respond. However, Piliavin et al. found that members of real groups responded faster than expected.

More information:
Example comments that women made to the situations:
● ‘It’s for men to help him’ or ‘I wish I could help him - I’m not strong enough’
● ‘You feel so bad that you don’t know what to do’

Why there were uneven numbers of trials: On their fourth day, Team 2 violated the instructions
and ran cane trials when they should have run drunk trials; the victim "didn't like" playing the
drunk! Then the Columbia student strike occurred, the teams disbanded, and the study was
over. Teams 1 and 3 had run on only 3 days each, while Teams 2 and 4 had run on 4 days.

Characteristics of first helpers: On average, 60% of people in the critical area were males,
yet 90% of the 81 first helpers were male. Also, of the 65 trials in which spontaneous help was
offered to the white victim, 68% of the helpers were white. On the 16 trials in which spontaneous
help was offered to the black victim, half of the first helpers were white. With the drunk victim, it
was mainly members of his own race that helped.
Observing an emergency situation creates emotional arousal which can be interpreted in
different ways in different situations. Arousal increases:
1. The more we empathise with the victim
2. The closer we are to the emergency situation
3. The longer the situation goes on for

However, arousal can be reduced by:


1. Helping directly
2. Getting others to help
3. Leaving the scene
4. Rejecting the victim as undeserving of help

Whether or not we help is based on a cost-reward analysis that we all carry out before making a
decision. The table above shows this for a woman. (Please note that these thoughts/views may
have been more common in the 1960s, and may not represent women today.)

Reasons why findings do not follow the patterns obtained by Darley and Latané: The
passengers could see the victim, and they could not easily leave the train in order to get
away from the situation; therefore, to reduce their arousal, they could either help the
victim or move to another part of the carriage. Also, as group size increases, it becomes more
likely that at least one person will help within a given time frame. This study was also a field
experiment rather than a laboratory experiment.
CONCLUSIONS
1. Someone who is ill is more likely to receive help than someone who is drunk - even if the
help is of the same kind.
2. Given mixed groups of men and women with a male victim, men are more likely to help
than women
3. Given mixed racial groups, there is some tendency for same-race helping, which is
increased when the victim is drunk compared to ill.
4. There is no strong relationship between the number of bystanders and speed of helping.
5. The longer the incident continues without help being offered:
○ The less impact the model has
○ The more likely it is that individuals will leave the critical area
○ The more likely it is that bystanders will discuss the incident

EXAMPLE 10-MARKER EVALUATION FOR PILIAVIN


One strength of the Piliavin study is that it was high in internal validity as the participants
were unaware that they were a part of a field experiment. The experiment took place on the A
and D trains of the 8th avenue IND and the stooges were covert, so the participants did not
know their behaviours were being observed. This means that their behaviour would be more
natural and free of demand characteristics. This improves the internal validity, as the researchers
were measuring genuine responses rather than behaviour shaped by social desirability bias or by
participants acting in the way they thought the researchers were looking for (demand characteristics).
On the other hand, it could be argued that generalisability is low, despite the large sample
size of around 4,450 participants, because the participants were only from one specific area of
New York near Harlem, and travelled on the subway at the same time of day (11am-3pm) on
weekdays. This might mean that the participants had similar occupations or routines because
they travelled at that time of day, and they may have had similar experiences living in New York,
meaning that their altruism towards each other may differ from those living in other areas.
Therefore the findings may not be able to be generalised to other populations in the world as
their behaviours and upbringings may be different to those in the study.

However, another strength of the Piliavin et al. study is the use of both qualitative and
quantitative data. The 2 observers collected quantitative data (for example, the total number of
people in the critical area or the race and gender of the first helper) which allowed numerical
comparisons to be made between the conditions and is more objective and less open to bias.
In addition, the observers also noted down any remarks made by nearby passengers; for example,
women were recorded saying ‘It’s for men to help him’ or ‘I wish I could help him - I’m not strong
enough’. This is a strength because it allows conclusions to be drawn about why women were less
likely to help than men were, and it may have informed the cost-reward matrix that was used to
explain whether or not a person would help, depending on whether the rewards outweigh the
risks (costs). Therefore, the use of
both qualitative and quantitative data in this study is a strength because it allows for
comparisons to be made between groups, and gives reason behind those differences in results.

On the other hand, another weakness of the Piliavin et al. study is that it is low in ethics.
Participants never gave informed consent to participate in the study and neither were they
debriefed after the study was completed. This is highly unethical, especially since the procedure
may have been distressing for some people as participants are not told that the victim is a
stooge, and they may therefore end up leaving the study wondering if the victim was okay,
especially if they received no help. Without debriefing, participants may have been left
psychologically harmed, and they were not told that their data would be used in the study
or that they had the right to withdraw. The train journey was 7.5 minutes long and the victim
collapsed about 70 seconds in, meaning that he could have been lying on the floor for around 6 minutes
if he did not receive help, which may have caused many people distress, as suggested by one of the
comments recorded by the observers (‘You feel so bad that you don’t know what to do’). Therefore, this
study was highly unethical because participants did not give informed consent, were not given
the right to withdraw and were not debriefed after the study.

YAMAMOTO ET AL.
Research Method: Laboratory experiment
Experimental Design: Repeated measures design which used ABA counterbalancing. The study also
used structured, controlled observations.
Unlike humans, animals more often help at the direct request of a conspecific rather than
voluntarily. Targeted helping is based on a cognitive understanding of the need or situation of
others, and therefore to display targeted helping, the helper must have a ‘theory of mind’.
Some people believed that ‘theory of mind’, and therefore altruistic helping, was unique to
humans; however, previous studies had shown that some nonhuman primates can help or share
food with their conspecifics without any direct benefit to themselves (e.g. cotton-top tamarins,
capuchins, marmosets, bonobos and chimpanzees). Chimpanzees have been shown to offer
targeted help at direct request, but it was not yet known whether they can actually interpret the
needs of their conspecifics.
The Savage-Rumbaugh studies trained chimpanzees to give tools on request, using symbols on a
lexigram board. One symbol would represent a tool that their conspecific needed, and the
chimpanzee would be conditioned to give that tool. Though these studies improved knowledge of
symbolic communication in primates, they provided limited insight into helping behaviour and its
mechanisms. This is because the chimpanzees were trained to give the tools, which may bias the
results, as the chimpanzees may not have been displaying true altruistic behaviour: it is unclear
whether they understood their conspecific’s needs, or whether they had simply associated the
symbol with the tool.

Why did previous studies fail to examine whether chimpanzees actually understood what
others needed?
The potential helper was never confronted with a behavioural choice when given the opportunity
to help in previous studies. In this study, the helper had to choose which of seven objects to give,
and the conspecific could ‘request’ an item by poking their arm through the hole in the panel.
AIMS
To learn more about altruistic behaviour in chimpanzees by investigating whether:
● chimpanzees can understand the needs of conspecifics
● chimpanzees can respond to those needs with targeted helping

SAMPLE

The participants were 5 chimpanzees (Ai, Ayumu, Pan, Pal and Cleo) taken from 3 mother-child
pairs. The sixth chimpanzee, Chloe, did not take part as a helper; she only ever received help
because she was not cooperating with the researchers.
They were recruited from the Primate Research Institute at Kyoto University - it was an
opportunity sample. The chimpanzees had taken part in studies on helping behaviour
beforehand.

PROCEDURE (APPARATUS USED)


Features of the tray:
The tray held seven objects (a stick, a straw, a hose, a chain, a rope, a brush and a
belt). Only one of the seven objects (the stick or the straw, depending on the situation) could
serve as an effective tool for the conspecific to obtain their juice reward. The items were
randomly positioned on a tray (26 × 36 cm).
Features of the experimental booth:
Participants were tested in two adjacent experimental booths (136 × 142 cm and 155 × 142 cm,
both 200 cm high). A hole (12.5 × 35 cm) in the panel-wall divider separating the two participants
was located ~1 m above the floor.
The juice container was located behind a wall and was either out of reach (stick condition) or did
not have a straw (straw condition), so the conspecific could not drink the juice (reward) unless
the participant gave them the correct tool through the slot.

Type of cameras used:


Participant’s behaviour was recorded with three video cameras that were Panasonic NV-GS150
models.

Before testing, each chimpanzee had eight 5-minute familiarisation sessions in which they could
freely manipulate the seven objects without any tool-use situation. Participants completed one
familiarisation session a day. The chimpanzees gave a tool to their conspecific only 5% of the
time in this familiarisation phase, suggesting that they were not motivated to give their
conspecifics a tool when no tool-use task was present.
PROCEDURE
The task consisted of one chimpanzee being in the booth with the tray of 7 tools, and their
conspecific being in the adjacent booth with one of two problems: needing a stick to reach the
juice, or needing a straw to drink the juice. The chimpanzee who was being observed then had
to provide the correct tool for their conspecific’s situation.
There were 3 conditions in this study. The first was the ‘can see’ condition, then ‘cannot see’,
and then a second ‘can see’ condition to check for order effects. Each condition had 48 trials,
half of which had the correct tool as the stick and the other half as the straw. The stick or straw
order was randomised, again to control for order effects. There were around 2-4 trials conducted
per day.
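The randomised stick/straw order within each condition can be sketched as follows. This is a hypothetical illustration assuming that each condition's 48 trials were split into exactly 24 stick and 24 straw trials and then shuffled; the function name `session_order` is made up, and the exact randomisation method Yamamoto et al. used may have differed.

```python
import random

# ABA counterbalancing: the condition sequence used in the study.
CONDITIONS = ["can see", "cannot see", "can see"]

def session_order(n_trials=48, seed=None):
    """Hypothetical helper: a shuffled list with equal numbers of stick and straw trials."""
    if n_trials % 2 != 0:
        raise ValueError("n_trials must be even to split stick and straw trials equally")
    rng = random.Random(seed)
    order = ["stick"] * (n_trials // 2) + ["straw"] * (n_trials // 2)
    rng.shuffle(order)  # randomise which situation the recipient faces on each trial
    return order

if __name__ == "__main__":
    for i, condition in enumerate(CONDITIONS):
        order = session_order(seed=i)
        print(f"{condition}: first 6 trials {order[:6]}, "
              f"{order.count('stick')} stick / {order.count('straw')} straw")
```

Keeping the halves equal while shuffling means the helper cannot predict which tool is needed from the order of trials, so a correct offer has to come from assessing the partner's situation.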
All 5 participants (Ai, Ayumu, Cleo, Pan and Pal) completed the first ‘can see’ and ‘cannot
see’ conditions, but only Ai, Cleo and Pal were selected to complete the second ‘can see’
condition.
The trial finished when the recipient obtained an offered tool, or after 5 minutes of no tool
being offered. Only the first offer was counted. Qualitative and quantitative data were gathered
from video footage of the chimpanzees’ behaviour. Qualitative data included the chimpanzees’
gestures and movements, and quantitative data included the number of correct offers given and
which tool was offered first.
How was an ‘offer’ operationalised?
An ‘offer’ was operationalised as a chimpanzee holding out a tool to the other, even if the
recipient did not take it.
How was an ‘upon request offer’ operationalised?
The conspecific of the chimpanzee would stick their hands through the slot in the panel to
request an item and the chimpanzee would offer an item as a result of the request.

Features of the conditions:

‘Can see’:
● Panel that divided the booths was transparent
● Helper chimpanzee was able to see what their conspecific needed
● This condition was completed twice
‘Cannot see’:
● Panel that divided the booths was opaque
● There was still the hole in the panel
● Helper chimpanzee could not assess the needs of their conspecific
● Completed once
VARIABLES AND STANDARDISED PROCEDURES
Variables:
- Independent variable:
● Whether the chimpanzee could see their conspecific’s situation (if the panel between the
adjacent rooms was transparent or opaque)

- Dependent variable:
● The targeted helping behaviour.

This was operationalised as the items offered by the chimpanzees to their conspecifics. This
was either the correct tool (stick or straw) or an incorrect object (e.g. the rope or the brush).
- Controls:
1. Whether the stick or straw situation was used on each trial was randomised to ensure that
there were limited order effects which could affect the chimpanzees’ behaviour and
therefore the results.
2. As a further control for order effects, ABA counterbalancing (can see, cannot see, can
see) was used to reduce practice effects.

Examples of standardised procedures:


1. They used the exact same tool options available to the chimpanzees in the tray (a stick,
a straw, a hose, a chain, a rope, a brush and a belt)
2. Same reward was used (grape-flavoured juice)
3. The same predicaments of the conspecific were used (i.e. stick needed because juice is
out of reach and straw needed because juice box has no straw and the juice is therefore
undrinkable)
RESULTS
Familiarisation phase: The chimpanzees gave a tool to their conspecific only 5% of the time in
this phase, suggesting that they were not motivated to give their conspecifics a tool when no
tool-use task was present.
First ‘can see’ condition: Objects were offered mainly upon the recipient’s request; ‘upon
request offers’ accounted for 90% of all offers. This result shows that direct request is important
for the onset of targeted helping in chimpanzees. All the chimpanzees except Pan first
offered potential tools significantly more frequently than non-tool objects. Pan most frequently
offered the brush (79.5% of her first offers), but when her brush offers were excluded, her results
were similar to those of the other chimpanzees. Also, helpers offered the stick (or straw) more
frequently when their partner faced the stick-use (or straw-use) situation. Therefore,
chimpanzees demonstrated flexible targeted helping based on their partner’s predicament.
‘Cannot see’ condition: The chimpanzees continued to help in this condition, offering an object
in 95.8% of trials. Pan still showed a preference for offering the brush, however, all other
chimpanzees gave tool objects much more frequently than non-tool objects. Unlike the ‘can see’
condition however, the correct tool was offered around half of the time, suggesting that the
chimpanzees could not properly assess their partner’s situation and therefore could not give the
right tool. Ayumu was the exception to this, however: he looked through the slot, and
therefore performed as well as he did in the ‘can see’ condition.
Second ‘can see’ condition: an object was offered on 97.9% of trials (upon-request offers
accounted for 79.4%). The 3 participants offered the correct tool more frequently than other
objects. The results were very similar to those of the first ‘can see’ condition, showing that the
results were not due to order effects.

Pan’s results differ from the other chimpanzees’ because she tended to favour the brush.
However, her second offer was often the correct tool, in line with the other chimpanzees’ offers.

Why was Ayumu looking through the slot significant in terms of targeted helping?
Ayumu’s behaviour (i.e. selecting the appropriate tool after assessing his partner’s situation by
peeking through the hole) further demonstrates that the chimpanzees depended on visual
assessment of their partner’s situation to acquire the necessary information to appropriately
help their partner by providing the correct tool.

CONCLUSIONS
1. Chimpanzees are able to understand the needs of conspecifics and help them to solve
tasks at no benefit to themselves
2. Chimpanzees will offer help in most cases, but more often at the direct request of
another chimpanzee than spontaneously. Even if they can visually assess their partner’s
situation, they seldom help others unless directly requested.
3. Chimpanzees require visual confirmation to understand a conspecific’s goal and to offer
targeted help - they are unlikely to be able to offer flexible targeted helping when they
are unable to see their conspecific’s predicament.

EXAMPLE 10-MARKER EVALUATION FOR YAMAMOTO


One strength of the study by Yamamoto et al. is that reliability was high because procedures
were standardised. For example, they used the exact same 7 objects in every trial: the
straw, stick, chain, belt, rope, hose and brush. This ensures that the experiment can be
easily replicated by other psychologists who should find the same results when following the
highly detailed procedure. It can be seen that the procedure is highly detailed as precise
measurements were given, for example, the exact dimensions of the experimental booths
including the fact that the slot between the two booths was approximately 1m off the ground.

On the other hand, one weakness of this study is that generalisability was quite low because
only five chimpanzees were used, all from the same institute. This is a small sample, and since
only one species of primate was used, the results may not be generalisable to other primates. In
addition, since they all came from the same institute and were often used for this type of
behavioural testing, they are not wild chimpanzees and may have been conditioned to behave
altruistically through other behavioural experiments. Another reason why generalisability is quite
low is that they used mother-child pairs; therefore, the altruistic behaviour may have involved a
maternal/family instinct to help the other chimpanzee, so the study may not be testing pure
altruism. However, it could be argued that the findings can be generalised to the wider chimpanzee
population, as theory of mind is seen as a biological or evolutionary factor, and therefore the
small sample size is not that much of a problem.

Another weakness of the study by Yamamoto et al. is that the validity is quite low as a
repeated measures design was used which could lead to order effects affecting the results,
despite the use of counterbalancing. In addition, since it was a repeated measures design,
individual differences may have played a key role in determining the results which may not have
been the case with an independent groups design with a larger sample size. For example, Pal
gave the right tool 100% of the time in the second condition which may not be the case for most
chimpanzees. Ecological validity is also low because it was conducted in an artificial setting
which would not be the chimpanzee’s normal environment.

On the other hand, another strength of this study is that it is high in ethics because only a small
number of chimpanzees were used. Only 6 chimpanzees participated in the study (5 helpers plus
Chloe, who only received help), so it did not cause unnecessary stress to too many animals. Also,
they were housed in the institute that they were being tested in, and the handlers were very
professional, which minimised the amount of stress that the chimpanzees were under. In addition,
since the chimpanzees were used to participating in other studies, their stress levels would not be
as high as those of, for example, a chimpanzee from the wild, to which the environment would be
very unfamiliar. Also, no aversive stimuli were used, and therefore ethics is higher.
