Life Expectancy Analysis

Angel Sanchez
Dec 15, 2020



There have been many studies on life expectancy that focus on variables such as country, healthcare, age, and other demographics. The “Life Expectancy (WHO)” dataset offers a look at this topic. The data comes from the Global Health Observatory (GHO) data repository, which is managed by the World Health Organization (WHO), and the dataset is available on Kaggle for data analysis. It covers 193 countries over the period 2000 to 2015, and its variables fall into four categories: Immunization, Mortality, Economics, and Social factors.

This report focuses on economic and social factors, with life expectancy as the dependent variable. Countries are responsible for improving the lives of their citizens, which raises a question: how does a country increase its citizens' quality of life? The hypothesis is that if the income of the population, the GDP of the country, and schooling increase, life expectancy will increase as well. Whether a country is developed or developing is also expected to play a role. The purpose of this report is to uncover hidden patterns in the data that support these statements.


During pre-processing of the source file in Excel, no categorical data was missing: the “Country” and “Status” columns had no missing values. Visual inspection of the table in Power Query, however, showed missing quantitative values, and closer inspection revealed many nulls and zeros. These nulls and zeros were treated as missing, and each was imputed with the mean of its column using Power Query. The processed data set has 22 columns and 2938 rows, with 20 predicting variables.
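The imputation step can be sketched in Python. This is a minimal illustration, not the author's Power Query steps: it assumes each column is a list in which `None` stands for a null and, following the description above, treats zeros as missing too. The function name `impute_column_mean` and the sample values are hypothetical.

```python
from statistics import fmean
from typing import Optional, Sequence


def impute_column_mean(values: Sequence[Optional[float]]) -> list:
    """Replace nulls (None) and zeros with the mean of the column's other values.

    Assumption: zeros count as missing, mirroring the pre-processing described above.
    """
    observed = [v for v in values if v is not None and v != 0]
    if not observed:
        raise ValueError("column has no non-missing values to average")
    fill = fmean(observed)
    return [fill if v is None or v == 0 else float(v) for v in values]


# Hypothetical column with one null and one zero.
print(impute_column_mean([70.0, None, 64.0, 0, 73.0]))
```

This prints `[70.0, 69.0, 64.0, 69.0, 73.0]`, since the mean of the three observed values is 69. Mean imputation keeps every row but pulls imputed values toward the column average, and some columns may contain genuine zeros, so which values count as missing is worth checking column by column.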

Statistical Analysis

“Life Expectancy”

This analysis examines the spread and frequency distribution shown in the histogram titled “Life Expectancy”. The mean life expectancy is 69 years, although the mean is sensitive to extreme values. The standard deviation is 9.51 years, so observations commonly lie about 9.5 years above or below the mean, which indicates considerable variability. The kurtosis of life expectancy is -0.225. Kurtosis describes the shape of a distribution in terms of peakedness or flatness (more precisely, how heavy its tails are compared with a normal distribution); a negative value, as here, indicates a distribution slightly flatter than a normal curve. The skewness of life expectancy is -0.639. Skewness describes the symmetry of a distribution; because this value is negative, the distribution is skewed to the left, with a longer tail toward lower life expectancies.
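The four summary figures can be reproduced with a short Python function. This is a sketch that assumes the sample-adjusted formulas documented for Excel's `SKEW` and `KURT` functions (the write-up does not state which formulas were used); `describe` is a hypothetical name.

```python
from statistics import fmean, stdev
from typing import Sequence


def describe(values: Sequence[float]) -> dict:
    """Sample mean, standard deviation, skewness and excess kurtosis.

    Skewness and kurtosis follow the sample-adjusted formulas used by
    Excel's SKEW and KURT (assumed, not confirmed by the report).
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for kurtosis")
    m = fmean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    z = [(x - m) / s for x in values]  # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": m, "stdev": s, "skewness": skew, "kurtosis": kurt}


print(describe([1, 2, 3, 4, 5]))
```

For the symmetric sample [1, 2, 3, 4, 5], `describe` returns a skewness of essentially 0 and a kurtosis of -1.2 (up to floating-point rounding): a symmetric, flat sample. A left-skewed sample such as [1, 8, 9, 9, 10] gives a negative skewness, as in the life expectancy histogram.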

“Status Box and Whisker”

The analysis also looks at each country's status. A box and whisker plot titled “Status Box and Whisker” shows the spread of life expectancy for developing and developed countries. The whiskers extend to the smallest and largest values that are not outliers, the horizontal line marks the median, and the X marks the mean. In both categories, the mean and median are close together. There are two outliers in the “Developing” category. Further statistical analysis was done on other variables; those features can be covered in a future report.
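Outliers in a box and whisker plot are usually identified with Tukey's rule: points more than 1.5 times the interquartile range (IQR) beyond the quartiles. The Python sketch below assumes that rule, which is the usual default for Excel's box and whisker chart, although Excel's quartile method may differ slightly from Python's `statistics.quantiles`. The function name and sample values are hypothetical.

```python
from statistics import fmean, quantiles
from typing import Sequence


def box_plot_summary(values: Sequence[float]) -> dict:
    """Quartiles, mean, whisker ends and outliers using Tukey's 1.5 x IQR fences."""
    q1, median, q3 = quantiles(values, n=4)  # Python's default "exclusive" method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": fmean(values),  # the "X" marker on the chart
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical values with one extreme point.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["outliers"], summary["median"], summary["mean"])
```

Here the single extreme value (100) is flagged as an outlier and pulls the mean (14.5) well above the median (5.5). A small gap between the X and the median line, as seen for both status groups, therefore suggests the outliers do not distort the averages much.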

Exploratory Data Analysis

Many relationships exist between variables in this data set. This report focuses on status, GDP, schooling, and income composition, and explains the hidden patterns that link them to higher life expectancy. Several visualizations show these relationships.

“Country Status”

A column chart titled “Country Status” compares average life expectancy by country status. For developed countries, the average life expectancy is 79 years; for developing countries, it is 67 years. According to the data, life expectancy in developed countries is 12 years higher on average.
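The averages behind the “Country Status” chart are group means. A minimal Python sketch, assuming rows are dictionaries with hypothetical field names (`status`, `schooling`, `life_expectancy`) rather than the dataset's actual column headers, and with made-up numbers:

```python
from collections import defaultdict
from collections.abc import Callable, Hashable, Iterable
from statistics import fmean


def average_by(rows: Iterable[dict], key: Callable[[dict], Hashable],
               value: str = "life_expectancy") -> dict:
    """Average one numeric field within each group produced by `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[value])
    return {group: fmean(vals) for group, vals in groups.items()}


# Hypothetical rows; field names and numbers are illustrative, not from the dataset.
rows = [
    {"status": "Developed", "schooling": 16.8, "life_expectancy": 80.0},
    {"status": "Developed", "schooling": 15.9, "life_expectancy": 76.0},
    {"status": "Developing", "schooling": 11.2, "life_expectancy": 66.0},
    {"status": "Developing", "schooling": 12.6, "life_expectancy": 70.0},
]
by_status = average_by(rows, key=lambda r: r["status"])
print(by_status["Developed"] - by_status["Developing"])  # gap in years between groups

# The same helper can average by whole years of schooling, as a chart axis might.
print(average_by(rows, key=lambda r: round(r["schooling"])))
```

Passing the grouping as a function keeps one helper for both the status comparison and a schooling-based chart. Rounding schooling to whole years is only an assumption about how such a chart's axis might be built.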

“Life Expectancy and Schooling”

A combo chart titled “Life Expectancy and Schooling” shows life expectancy in relation to education. An average of 17 years of schooling corresponds to an average life expectancy of 89 years. This suggests that populations with more schooling tend to live longer.

“GDP & Life Expectancy”

Gross domestic product (GDP) is the total monetary value of the final goods and services produced within a country in a given period, and it is a broad indicator of a country's economic health (Fernando). The column chart titled “GDP & Life Expectancy” relates GDP (recorded per capita in this dataset) to life expectancy. When GDP is approximately $40,000, life expectancy averages 88 years. This suggests that life expectancy tends to rise as the economy does well.

“Average of Income Composition of Resources”

Income composition describes how a society's income is distributed across higher, middle, and lower-income classes (Ranaldi). The bar chart titled “Average of Income Composition of Resources” covers this variable. An income composition rate of 89% corresponds to a life expectancy of approximately 84–85 years. This suggests that a high income composition rate goes with high life expectancy.


“Life Expectancy Heat Map”

Correlation coefficients were calculated across all 20 predicting variables to show how they relate. The table titled “Life Expectancy Heat Map” displays these correlations with conditional formatting: green represents a positive correlation and red a negative one. The closer a coefficient is to +1 or -1, the stronger the relationship between the two features. GDP, income composition, and schooling all have a positive correlation with life expectancy.
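The numbers in the heat map are correlation coefficients. Excel's correlation tools compute Pearson's r, which is assumed here. The Python sketch below computes r for every pair of columns; `pearson`, `correlation_matrix`, and the sample columns are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient r between two equal-length columns."""
    if len(x) != len(y):
        raise ValueError("columns must have the same length")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """r for every ordered pair of columns: the numbers behind a correlation heat map."""
    return {(a, b): pearson(columns[a], columns[b]) for a in columns for b in columns}


# Hypothetical columns, not values from the dataset.
cols = {
    "schooling": [8.0, 10.0, 12.0, 14.0],
    "life_expectancy": [60.0, 66.0, 70.0, 76.0],
}
for pair, r in correlation_matrix(cols).items():
    print(pair, round(r, 3))
```

The diagonal is always 1 (each column with itself), and the matrix is symmetric, which is why a heat map of it mirrors across the diagonal.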

“Income composition of Resources”

Income composition and life expectancy have a correlation coefficient of 0.83, which means that life expectancy tends to rise as income composition rises. The “Income composition of Resources” scatter plot shows the spread of this relationship, with an R-squared value of 0.69: a linear trendline on income composition explains about 69% of the variation in life expectancy. This is a strong relationship.


Schooling and life expectancy have a correlation coefficient of 0.75, so life expectancy also tends to rise with schooling. The “Schooling” scatter plot has an R-squared value of 0.56, meaning a linear trendline explains about 56% of the variation. This is not as strong as income composition, but it is still a fairly strong relationship.


GDP and life expectancy have a correlation coefficient of 0.43, so life expectancy tends to rise with GDP as well. However, the “GDP” scatter plot has an R-squared value of only 0.19: a linear trendline on GDP explains only about 19% of the variation in life expectancy, so the two variables are only moderately related. GDP seemed promising in the exploratory analysis, but on its own it is not a good linear predictor of life expectancy.
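For a straight-line trendline fitted by least squares (assumed here to be how the scatter-plot trendlines were drawn), R-squared equals the square of the correlation coefficient. That gives a quick consistency check on the figures above: 0.83² ≈ 0.69, 0.75² ≈ 0.56, and 0.43² ≈ 0.18, which matches the reported 0.19 once rounding of r is allowed for. The Python sketch below fits the line and computes R-squared directly; `linear_fit_r_squared` is a hypothetical name and the data points are illustrative.

```python
from statistics import fmean
from typing import Sequence, Tuple


def linear_fit_r_squared(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = slope * x + intercept by least squares and return (slope, intercept, R-squared)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-squared = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


slope, intercept, r2 = linear_fit_r_squared([1, 2, 3, 4], [2, 4, 5, 4])
print(round(slope, 3), round(intercept, 3), round(r2, 3))
```

For these points the fit has a slope of 0.7, an intercept of 2.0, and an R-squared of about 0.516; the Pearson r for the same points is about 0.718, and 0.718² gives the same 0.516.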


World leaders have a responsibility to improve the lives of their citizens. Government officials create the policies and laws that affect quality of life, and making effective laws requires investigation. A good place to start is understanding what prolongs the life of a population. Future reports could also investigate which indicators predict negative effects on a population.

This report supported most of the assumptions made in the hypothesis. Life expectancy in developed countries is 12 years higher on average than in developing countries. An average of 17 years of schooling was associated with an average life expectancy of 89 years. Income composition was also strongly associated with longer life expectancy, while GDP showed only a weak linear relationship. From a social perspective, life expectancy can be an indicator of the quality of life in a country. Lawmakers and leaders in every country should make evidence-based decisions to improve their societies. For a country to prosper, it must listen to data and research.


References

Fernando, J. (2020, November 13). Gross domestic product (GDP). Investopedia.

Ranaldi, M. (2020, October 28). How changes in income composition inequality challenge our thinking about socio-economic classes. Stone Center on Socio-Economic Inequality.

World Health Organization. (2017). Life expectancy (WHO). Kaggle: Your Machine Learning and Data Science Community.

Ziring, S. (2020). Life expectancy (WHO) Discussion. Kaggle: Your Machine Learning and Data Science Community.
