Pearson Correlation Coefficient


The Pearson correlation coefficient measures the strength and direction of the linear
relationship between two variables.

It has a value between +1 and −1, where 1 is total positive linear correlation, 0 is no
linear correlation and −1 is total negative linear correlation. It’s often denoted by r for
sample correlation and ρ for population correlation.

Note: Pearson Correlation only measures the linear relationship between two variables,
such as y = a*x + b. There are other measurements for nonlinear correlations.
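
As a minimal sketch of the definition above, the following Python function computes the sample r from two equal-length lists using only the standard library. The name pearson_r and the sample data are illustrative assumptions, not part of the indicator. The second data set is a parabola, chosen to show that a perfect but nonlinear relationship can still give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
perfect_line = [2.0 * x + 1.0 for x in xs]   # y = a*x + b with a > 0
parabola = [(x - 3.0) ** 2 for x in xs]      # symmetric nonlinear relation

r_line = pearson_r(xs, perfect_line)         # linear data: r is 1
r_parabola = pearson_r(xs, parabola)         # nonlinear data: r is 0
```

The function fails with a division by zero when either input is constant, because the correlation is undefined in that case.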

When the "R" option is chosen for "Correlation Variants", the value is the same as
TradingView's built-in correlation() function. The "Adjusted R" option starts from the
traditional Pearson r and applies a bias correction: the sample r is a biased estimate of
the population ρ, and the adjusted r removes some of that bias, but not all of it. As the
sample size (the lookback period) increases, the adjusted r moves closer to r.
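
The adjustment can be sketched in Python as follows. The formula mirrors the adj() function in the script's source further down; the function name adjusted_r and the sample values are assumptions made for illustration.

```python
def adjusted_r(r, n):
    """Approximate bias-adjusted correlation: r * (1 + (1 - r^2) / (2n))."""
    return r * (1.0 + (1.0 - r * r) / (2.0 * n))

r = 0.5
small = adjusted_r(r, 20)     # short lookback: larger correction
large = adjusted_r(r, 2000)   # long lookback: correction nearly vanishes
```

With r = 0.5, the correction adds about 0.009 at a lookback of 20 and about 0.0001 at a lookback of 2000, which illustrates why the two variants converge as the lookback grows.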

The confidence interval estimates the population ρ from the sample r. The correlation
coefficient itself doesn’t follow a normal distribution, so the Fisher transformation is
applied to map r onto an approximately normally distributed value z. The standard error is
computed on the transformed scale, the interval limits are formed there, and an inverse
Fisher transform then converts those limits back into r terms.
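
A minimal Python sketch of these steps is shown below. It uses math.atanh and math.tanh, which are the Fisher transform and its inverse, and the standard error 1/sqrt(n - 3) that also appears in the script's source. The function name and the example inputs are assumptions.

```python
import math

def fisher_confidence_interval(r, n, multiplier=1.96):
    """Approximate confidence interval for rho from sample r and sample size n."""
    if n <= 3:
        raise ValueError("n must exceed 3 for the standard error 1/sqrt(n - 3)")
    if abs(r) >= 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                    # Fisher transform: 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)          # standard error on the z scale
    z_low = z - multiplier * se
    z_high = z + multiplier * se
    return math.tanh(z_low), math.tanh(z_high)   # inverse Fisher transform

low, high = fisher_confidence_interval(0.6, 20)
```

For r = 0.6 and n = 20 the interval is roughly 0.21 to 0.82. It is not symmetric around r, because the inverse transform compresses values as they approach +1 or -1.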

Note: the confidence interval band is an approximation for the population: it proposes a
range of plausible ρ values (instead of a single point). The confidence level is the
proportion of such intervals, across repeated sampling, that contain the true value of the
unknown population parameter. For example, if the confidence level is 95%, then in
hypothetical indefinite data collection, 95% of the samples yield an interval estimate
that contains the population parameter. The default setting is 1.96 * standard error,
which gives a 95% confidence interval.
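
The link between the multiplier and the confidence level can be checked with the standard normal quantile function. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the helper name z_multiplier is an assumption.

```python
from statistics import NormalDist

def z_multiplier(confidence_level):
    """Two-sided standard normal multiplier for a given confidence level."""
    tail = (1.0 - confidence_level) / 2.0      # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)

m95 = z_multiplier(0.95)   # about 1.96
m99 = z_multiplier(0.99)   # about 2.576
```

A user who wants a 99% band would therefore set the multiplier to about 2.58 instead of 1.96.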

The most important and distinguishing feature of this indicator is the p-value provided
along with the correlation.

The value of the correlation coefficient alone doesn’t provide any information regarding
its statistical significance. For example, two sets of independent samples have 0
correlation in theory. However, the correlation coefficient computed from such samples
will almost never be exactly 0 (it will be small but nonzero). Therefore, without a
significance test, one could be fooled by the value of r when there’s no linear
relationship at all.
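
A small simulation illustrates the point. The sketch below draws pairs of independent Gaussian samples of length 20 and records the sample r for each pair. The seed, the sample size, the number of trials and the 0.3 cut-off are arbitrary assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

rng = random.Random(42)   # fixed seed (arbitrary) so the run is repeatable
n = 20                    # same as the indicator's default lookback
sample_rs = []
for _ in range(1000):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]   # drawn independently of xs
    sample_rs.append(pearson_r(xs, ys))

exact_zeros = sum(1 for r in sample_rs if r == 0.0)
largest = max(abs(r) for r in sample_rs)
share_above_03 = sum(1 for r in sample_rs if abs(r) > 0.3) / len(sample_rs)
```

Even though the true correlation is 0 in every trial, a noticeable share of the sample values exceed 0.3 in absolute size at this lookback, which is why a significance test is needed alongside r.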

In statistical hypothesis testing, the p-value or probability value is the probability of
obtaining test results at least as extreme as the results actually observed during the test,
assuming that the null hypothesis is correct. The smaller the p-value, the stronger the
evidence that the null hypothesis should be rejected and that the alternative hypothesis
might be more credible. Since one could be deceived by r showing a nonzero value while the
true correlation is actually 0, the null hypothesis here is “the population correlation ρ
is 0” and the alternative hypothesis is “ρ is not 0”. The default setting for the p
critical value is 0.05. It means that when p is lower than 0.05, a sample correlation at
least this extreme would occur less than 5% of the time if the true correlation were 0, and
we consider that to be "significant correlation". To get the p-value, we use a t
distribution with n - 2 degrees of freedom to find the probability. The p-value adjusts
automatically when the sample size or lookback changes.
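
The textbook version of this test can be sketched in Python with the standard library alone. The statistic is t = r * sqrt((n - 2) / (1 - r^2)), and the two-sided p-value is the area of the t distribution outside the range from -|t| to +|t|. This is an illustrative sketch under those assumptions, not a port of the script's own Pvalue line; all function names are hypothetical, and the area is found by numerical integration (Simpson's rule) rather than a library call.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2.0) / (math.sqrt(df * math.pi) * math.gamma(df / 2.0))
    return coef * (1.0 + t * t / df) ** (-(df + 1) / 2.0)

def two_sided_p_from_t(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t|."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3.0           # area between 0 and |t|
    return max(0.0, 1.0 - 2.0 * half_area)

def correlation_p_value(r, n):
    """P-value for the null hypothesis rho = 0, given sample r and size n."""
    if abs(r) >= 1.0:
        return 0.0
    df = n - 2
    t_stat = r * math.sqrt(df / (1.0 - r * r))
    return two_sided_p_from_t(t_stat, df)

p_weak = correlation_p_value(0.2, 20)     # not significant at the 0.05 level
p_strong = correlation_p_value(0.6, 20)   # significant at the 0.05 level
```

Because math.gamma overflows for large arguments, this sketch is only suitable for modest lookbacks (a few hundred bars at most); a statistics library would be the usual choice in practice.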

Displays:
When p is lower than 0.05 and r > 0, the correlation coefficient shows green, the p-value
shows yellow, and the panel shows “Significant Positive Correlation”.
When p is lower than 0.05 and r < 0, the correlation coefficient shows red, the p-value
shows yellow, and the panel shows “Significant Negative Correlation”.
When p is higher than 0.05, the correlation coefficient shows white, the p-value shows
grey, and the panel shows “Insignificant Correlation”.
(The colors listed here follow the colorCC and colorP assignments in the source code below.)
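
The panel text follows a simple decision rule, sketched here in Python. The function name panel_text is an assumption; the three strings and the default threshold come from the description above.

```python
def panel_text(r, p_value, threshold=0.05):
    """Return the panel message for a correlation r and its p-value."""
    if p_value < threshold and r > 0:
        return "Significant Positive Correlation"
    if p_value < threshold and r < 0:
        return "Significant Negative Correlation"
    return "Insignificant Correlation"

strong_up = panel_text(0.6, 0.005)
strong_down = panel_text(-0.6, 0.005)
noise = panel_text(0.2, 0.40)
```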

r² (r squared), also known as the coefficient of determination, is the square of the
correlation r. r² measures how well the data fit the linear regression model used in
correlation. When two assets show significant correlation, r squared can be used to compare
which one fits the data better. r² is displayed on the panel and by default uses a
different lookback than the correlation coefficient.
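
The link between r² and the regression fit can be checked numerically. For a least-squares line with an intercept, the squared correlation equals 1 - SS_res / SS_tot, the share of the variance in y that the line explains. The Python sketch below verifies this on a small made-up data set; the function names and the data are assumptions.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def r_squared_of_fit(xs, ys):
    """1 - SS_res / SS_tot for the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up, roughly linear data

r = pearson_r(xs, ys)
from_r = r * r
from_fit = r_squared_of_fit(xs, ys)
```

Both routes give the same number up to floating-point rounding, so squaring r is enough to obtain the coefficient of determination for a simple linear fit.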

Contributors: Pig (ideas, code, math and design), Balipour (ideas),
midtownsk8rguy (applying/employing Pine etiquette).
May 9
Release Notes: minor update to add public license.
May 9
Release Notes: minor fix to adjust panel's positioning.
May 10
Release Notes: added feature to assist the visually impaired.
May 11
Release Notes: Modified the security() function to avoid lookahead bias.
May 11
Release Notes: removed the mtf function.
May 15
Release Notes: update to modify default position of the panel.

=========================

// This source code is subject to the terms of the Mozilla Public License 2.0 at https://mozilla.org/MPL/2.0/

// © balipour
//@version=4

study("Correlation with P-Value & Confidence Interval [pig]", "BA🐷 CC", false, format.price, 3)

var invisible = color(na)

bgcolor(#000000c0)

var cPI = 2.0 * asin(1.0) // 3.1415926536 Constant

//===== Functions =====//

cc(x, y, len) => // Correlation Coefficient function
    lenMinusOne = len - 1
    meanx = 0.0, meany = 0.0
    // First pass: means of both series over the lookback window
    for i = 0 to lenMinusOne
        meanx := meanx + nz(x[i])
        meany := meany + nz(y[i])
    meanx := meanx / len
    meany := meany / len
    sumxy = 0.0, sumx = 0.0, sumy = 0.0
    // Second pass: sums of cross products and squared deviations from the means
    for i = 0 to lenMinusOne
        sumxy := sumxy + (nz(x[i]) - meanx) * (nz(y[i]) - meany)
        sumx := sumx + pow(nz(x[i]) - meanx, 2)
        sumy := sumy + pow(nz(y[i]) - meany, 2)
    sumxy / sqrt(sumy * sumx)

adj(r, n) => // Unbiased Adjusted R Estimation Approximate Function
    (1 + (1 - pow(r, 2)) / (2 * n)) * r

Round(src, digits) => // Round Function
    p = pow(10, digits)
    round(abs(src) * p) / p * sign(src)

xp(offset) => // Offset
    time + round(change(time) * offset)

_label(offset, P, T, s, color_PnL) => // Info Panel
    label PnL_Label = na
    label.delete(PnL_Label[1])
    PnL_Label := label.new(xp(offset), P, text=T, color=color_PnL, textcolor=color.white, style=s, yloc=yloc.price, xloc=xloc.bar_time, size=size.normal)

//===== Inputs =====//

src = input( close, "========= Source =========", input.source )

sec1in = input( "SPX", "Comparison Security", input.symbol , confirm=true )

mode = input("Adjusted R", "Correlation Variants ", input.string , options=["R", "Adjusted R"])

len = input( 20, "Correlation Lookback Length", input.integer, minval=2)

//Stats Settings

sc = input(true, "Show Confidence Interval for Population", input.bool )

csd = input(1.96, "Confidence Interval SD Multiplier", input.float , minval=0.1, step=0.1) //Default 95%

sp = input(true, "Show P-Values", input.bool )

cp = input(0.05, "P-Value Significant Confidence Level", input.float , minval=0.0, step=0.01) //Default = 1 - 0.05 = 95%

pan = input(true, " Show Information Panel", input.bool )

rlen = input( 50, "  R Squared Length", input.integer, minval=2)

os = input( 40, "  Panel Position Offset", input.integer, minval=0)

lT = input( 1, "--- Line Thickness ---", input.integer, options=[1,2,3])


sec1 = security(sec1in, timeframe.period, close)

//===== Calculations =====//

R = cc(src, sec1, len) // Traditional Pearson

adjr = adj( R, len) // Adjusted R

float r = na

if(mode == "R")
    r := R

if(mode == "Adjusted R")
    r := adjr

R2 = pow( cc(src, sec1, rlen) , 2) // R Squared

adjR2 = pow(adj(cc(src, sec1, rlen), rlen), 2) // R Squared Based on Adjusted R

float r2 = na

if(mode == "R")
    r2 := R2

if(mode == "Adjusted R")
    r2 := adjR2

// Fisher Transform

z = 0.5 * log((r + 1.0) / (1.0 - r)) // Fisher

se = 1.0 / sqrt(len - 3) // Standard Error

zl = z - csd * se // Lower Limit for fisher z

zu = z + csd * se // Upper Limit for fisher z 95% confidence

// Inverse Fisher Transform to Transform Back to r


rl = (exp(2.0 * zl) - 1.0) / (exp(2.0 * zl) + 1.0) // Lower limit for r

ru = (exp(2.0 * zu) - 1.0) / (exp(2.0 * zu) + 1.0) // Upper limit for r

// P Test

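t = sqrt((1.0 - pow(r , 2)) / (len - 2)) // Standard error of r with len - 2 degrees of freedom; r / t is the t statistic

Pvalue = exp(-0.5 * pow(r / t, 2)) / (t * sqrt(2.0 * cPI)) // P Value, approximated here by a normal density of r with mean 0 and standard deviation t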

pro = Pvalue > 1.0 ? 1.0 : Pvalue // Limit P value overshoot 1

//===== Plotting =====//

colorCC = (pro < cp and r < 0.0) ? #FF0000ff :
  (pro < cp and r > 0.0) ? #00FF00ff :
  #FFFFFFff

colorP = (pro < cp) ? #FFFF00ff : #C0C0C040

plot( sp ? pro : na, color=colorP, title="P Value", style=plot.style_columns)

plotLower = plot( rl, color=sc ? color.new(#00C0FFff,100) : invisible, style=plot.style_linebr, title="Confidence Interval Lower" )

plotUpper = plot( ru, color=sc ? color.new(#00C0FFff,100) : invisible, style=plot.style_linebr, title="Confidence Interval Higher")

fill(plotLower, plotUpper, color=sc ? color.new(#00C0FFff, 85) : invisible)

plot( r, linewidth=lT , color=colorCC , style=plot.style_linebr, title="🐷 Correlation")

plot(sp ? cp : na, color=color.new(#C0C0C0ff,30), trackprice=true, show_last=1, title="P value Threshold", style=plot.style_linebr)

plot(sp ? na : 0, color=color.new(#C0C0C0ff,30), trackprice=true, show_last=1, title="Zero Line")

hline( 1.0, color=color.new(#00FFFFff,30))

hline( -1.0, color=color.new(#FF00FFff,30))

// Information Panel
sig() =>
    return = (pro < cp) and r > 0 ? "Significant Positive Correlation" :
      (pro < cp) and r < 0 ? "Significant Negative Correlation" :
      "Insignificant Correlation"

if(pan)
    txt = "R : " + tostring(Round(r ,3)) +
      "\n\n R Squared : " + tostring(Round(r2 ,4)) +
      "\n\n P Value : " + tostring(Round(pro,4)) +
      "\n\n" + sig()
