Pearson correlation coefficient

What is Pearson Correlation?

Correlation between sets of data is a measure of how closely they are related. The most common measure of correlation in statistics is the Pearson correlation, whose full name is the Pearson Product Moment Correlation (PPMC). It measures the strength and direction of the linear relationship between two sets of data. In simple terms, it answers the question, “Can I draw a line graph to represent the data?” Two letters are used to represent the Pearson correlation: the Greek letter rho (ρ) for a population and the letter “r” for a sample.

How to Find Pearson’s Correlation Coefficients

By Hand

Example question: Find the value of the correlation coefficient from the following table:

Subject   Age x   Glucose Level y
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99
2         21      65
3         25      79
4         42      75
5         57      87
6         59      81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257
2         21      65                1365
3         25      79                1975
4         42      75                3150
5         57      87                4959
6         59      81                4779

Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849
2         21      65                1365    441
3         25      79                1975    625
4         42      75                3150    1764
5         57      87                4959    3249
6         59      81                4779    3481

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561

Step 5: Add up all of the numbers in the columns and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation. 

Subject   Age x   Glucose Level y   xy      x²      y²
1         43      99                4257    1849    9801
2         21      65                1365    441     4225
3         25      79                1975    625     6241
4         42      75                3150    1764    5625
5         57      87                4959    3249    7569
6         59      81                4779    3481    6561
Σ         247     486               20485   11409   40022
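
Steps 1 to 5 can also be sketched in a few lines of Python. This is only an illustration: the variable names are made up here, and the six (age, glucose) pairs are assumed to be the data behind the column totals in the table.

```python
# (age x, glucose level y) for subjects 1-6. Assumed data: these pairs
# reproduce the column totals of the worked example.
pairs = [(43, 99), (21, 65), (25, 79), (42, 75), (57, 87), (59, 81)]

# Steps 2-4: build the xy, x^2 and y^2 columns row by row.
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in pairs]

# Step 5: total each column (zip(*rows) turns the rows into columns).
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 247 486 20485 11409 40022
```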

Step 6: Use the following correlation coefficient formula.

r = [n(Σxy) - (Σx)(Σy)] / √([nΣx² - (Σx)²] × [nΣy² - (Σy)²])

The answer is 2868 / 5413.27 = 0.529809. From our table:

  • Σx = 247
  • Σy = 486
  • Σxy = 20,485
  • Σx² = 11,409
  • Σy² = 40,022
  • n is the sample size, in our case = 6

The correlation coefficient =

    • [6(20,485) - (247 × 486)] / √([6(11,409) - 247²] × [6(40,022) - 486²])

= 2868 / 5413.27 = 0.5298

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation.
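
As a cross-check on the arithmetic, the same calculation can be sketched in Python from the column totals alone. The function name `pearson_from_sums` is hypothetical, chosen only for this illustration.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from column totals (computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Column totals from the table; n = 6 reproduces the stated 2868 / 5413.27.
r = pearson_from_sums(6, 247, 486, 20485, 11409, 40022)
print(round(r, 4))  # 0.5298
```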

 

Types of correlation coefficient formulas

There are several types of correlation coefficient formulas. One of the most commonly used is Pearson’s correlation coefficient formula. If you’re taking a basic stats class, this is the one you’ll probably use:

r = [n(Σxy) - (Σx)(Σy)] / √([nΣx² - (Σx)²] × [nΣy² - (Σy)²])

Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.

Sample correlation coefficient

r_xy = s_xy / (s_x × s_y)

Here s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.

Population correlation coefficient

ρ_xy = σ_xy / (σ_x × σ_y)

The population correlation coefficient uses σ_x and σ_y as the population standard deviations, and σ_xy as the population covariance.
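
To see how the covariance-based formulas relate to the worked example, here is a minimal Python sketch. It assumes the six (age, glucose) pairs behind the column totals in the example, and the helper name `correlation` is made up for this illustration. Because the same divisor (n - 1 for a sample, n for a population) appears in the covariance and in both standard deviations, it cancels, so both versions return the same number for a given data set.

```python
import math

def correlation(xs, ys, population=False):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    divisor = n if population else n - 1  # population vs. sample
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / divisor
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / divisor)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / divisor)
    return cov_xy / (sd_x * sd_y)

# Assumed data: ages and glucose levels consistent with the example's totals.
ages = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]

r_sample = correlation(ages, glucose)
r_population = correlation(ages, glucose, population=True)
print(round(r_sample, 4), round(r_population, 4))  # 0.5298 0.5298
```

The difference between the two coefficients therefore lies in what the data represent (a sample drawn from a population, or the whole population), not in the arithmetic.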