Finding the Line of Best Fit in Linear Regression using Different Methods

In this article we’ll derive the line of best fit for linear regression in two ways: with the least squares equations and with the normal equation from linear algebra. The two methods are closely related, but it is helpful to walk through the logic behind each one. We’ll use scikit-learn’s demo dataset ‘Boston’ for this purpose, and then compare our coefficients with those from scikit-learn’s built-in LinearRegression class.
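As a rough illustration of how the two approaches line up, here is a minimal sketch in plain Python. It assumes a single predictor and uses a small synthetic dataset rather than ‘Boston’; the function names `least_squares_fit` and `normal_equation_fit` are hypothetical and chosen only for this example. With one predictor, the normal equation reduces to a 2x2 linear system, which is solved here by Cramer’s rule.

```python
# Synthetic data (an assumption for illustration; not the Boston dataset).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]


def least_squares_fit(xs, ys):
    """Simple least squares: slope = S_xy / S_xx, intercept = y_mean - slope * x_mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def normal_equation_fit(xs, ys):
    """Solve (X^T X) beta = X^T y, where each row of X is [1, x]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_xx = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sum_x], [sum_x, sum_xx]] and X^T y = [sum_y, sum_xy].
    # Solve the 2x2 system with Cramer's rule.
    det = n * sum_xx - sum_x * sum_x
    intercept = (sum_y * sum_xx - sum_x * sum_xy) / det
    slope = (n * sum_xy - sum_x * sum_y) / det
    return intercept, slope


b0_ls, b1_ls = least_squares_fit(xs, ys)
b0_ne, b1_ne = normal_equation_fit(xs, ys)
print(f"least squares:   intercept={b0_ls:.4f}, slope={b1_ls:.4f}")
print(f"normal equation: intercept={b0_ne:.4f}, slope={b1_ne:.4f}")
```

Both functions should return the same intercept and slope up to floating-point rounding, since they minimize the same sum of squared residuals. With several predictors, the same normal equation would typically be solved with a linear algebra library instead of by hand.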


Predicting Deaths from Diseases Related to Air Pollution (and Quantifying the Effect of Air Pollution)

Data Science Problem:

My project’s first objective was to predict the number of deaths from two diseases related to air pollution: Ischaemic Heart Disease (IHD) and Chronic Obstructive Pulmonary Disease (COPD). The second objective was to isolate and quantify the effect of air pollution on those deaths. I structured the problem so that I was predicting deaths for each five-year age group (for example, 65-70 years) in each county in the continental United States.


Relationship between School Safety and Attendance Rates

This study looks at the relationship between different variables and attendance rates in NYC public high schools, and attempts to isolate the effect of school safety in particular. School safety and school attendance rates are both a critical focus for school administration entities.
