How to run Linear regression in Python
Scikit-learn is a powerful Python module for machine learning. It contains function for regression, classification, clustering, model selection and dimensionality reduction. Today, I will explore the sklearn.linear_model module which contains “methods intended for regression in which the target value is expected to be a linear combination of the input variables”.
- The first step is to import the required Python libraries into Ipython Notebook.
- import Boston data set into Ipython notebook and store it in a variable called boston.
- convert boston.data into a pandas data frame.
- add these target prices to the bos data frame.
- import linear regression from sci-kit learn module. Then I am going to drop the price column as I want only the parameters as my X values. I am going to store linear regression object in a variable called lm.
- print the intercept and number of coefficients.
- construct a data frame that contains features and estimated coefficients.
As you can see from the data frame that there is a high correlation between RM and prices. Lets plot a scatter plot between True housing prices and True RM.