Logistic Regression with R

In my first blog post, I explained what regression is and how a linear regression model is generated in R. In this post, I will explain what logistic regression is and how a logistic regression model is generated in R.

Let's first understand logistic regression. Logistic regression is a type of regression used to predict the outcome of a categorical dependent variable (that is, a variable that takes a limited number of categorical values) based on one or more independent variables. For example, predicting who will win the next T20 World Cup based on players' strength and other details is a prediction of a categorical variable. Logistic regression can be binomial or multinomial.

In binomial (binary) logistic regression, the outcome can take only two possible values (e.g., Yes or No, Success or Failure). Multinomial logistic regression refers to cases where the outcome can take three or more possible values (e.g., good vs. very good vs. best). In binary logistic regression, the outcome is generally coded as 0 and 1. We will use binary logistic regression in the rest of this post. Now, let's look at how a logistic regression model is generated in R.

To fit a logistic regression model in R, the glm() function is used. It is similar to lm(), but glm() takes additional parameters. The format is:

glm(Y~X1+X2+X3, family=binomial(link=logit), data=mydata)

Here, Y is the dependent variable and X1, X2, and X3 are the independent variables. The additional parameter family has the value binomial(link=logit), which means that the probability distribution of the regression model is binomial and the link function is logit (refer to the book R in Action for more information).

Let's generate a simple model. Suppose we want to predict whether a student will get admission based on two exam scores. For this problem we have historical data from previous applicants, which can be used as the training data set to build a model. The data set contains the following parameters: exam_1, exam_2, and admitted.
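The sample data file itself is not reproduced in this post. As a stand-in, here is a minimal sketch that simulates a data set with the same three columns. The number of rows, the score range, and the coefficients used to generate admitted are all assumptions made for illustration, not values from the original data.

```r
# Hypothetical stand-in for the sample data set (not the original file).
set.seed(42)                               # make the simulation repeatable
n <- 100                                   # assumed number of past applicants
exam_1 <- round(runif(n, 30, 100), 1)      # assumed score range 30 to 100
exam_2 <- round(runif(n, 30, 100), 1)

# Assumed relationship: higher scores raise the chance of admission.
# plogis() is the inverse of the logit link; it maps any real number
# to a probability between 0 and 1.
p_admit <- plogis(-12 + 0.1 * exam_1 + 0.1 * exam_2)
admitted <- rbinom(n, size = 1, prob = p_admit)   # 1 = admitted, 0 = not

mydata <- data.frame(exam_1, exam_2, admitted)
str(mydata)     # three columns: exam_1, exam_2, admitted
head(mydata)    # first few observations
```

Any real data set with two numeric score columns and a 0/1 outcome column could be used in place of this simulated one.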

Among the above parameters, admitted has the value 1 or 0 for each observation. Now, we will generate a model that predicts whether a student will get admission based on the two exam scores. For this problem, admitted is the dependent variable, and exam_1 and exam_2 are the independent variables. The R code for the model is given below.
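A minimal sketch of this step, using the glm() format shown earlier. The data here is simulated (an assumption, since the original data file is not included in this post), so the fitted coefficients will not match those from the original example.

```r
# Simulated training data (hypothetical; stands in for the original file).
set.seed(42)
n <- 100
mydata <- data.frame(exam_1 = runif(n, 30, 100),
                     exam_2 = runif(n, 30, 100))
mydata$admitted <- rbinom(n, size = 1,
                          prob = plogis(-12 + 0.1 * mydata$exam_1 +
                                              0.1 * mydata$exam_2))

# Fit the binary logistic regression model:
# admitted is the dependent variable, exam_1 and exam_2 are independent.
model <- glm(admitted ~ exam_1 + exam_2,
             family = binomial(link = "logit"),
             data = mydata)

summary(model)   # coefficients, standard errors, deviance
coef(model)      # intercept plus one coefficient per exam score
```

The coefficients are on the log-odds scale: a positive coefficient for an exam score means a higher score increases the estimated odds of admission.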

After generating the model, let's try to predict with it. Suppose a student scored 60 in exam_1 and 85 in exam_2. We will predict whether this student will get admission. The following is the R code for predicting the student's probability of admission.
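A minimal sketch of the prediction step, again assuming simulated training data in place of the original file. The name new_student is only an illustrative choice. Passing type = "response" to predict() returns a probability instead of a value on the log-odds scale.

```r
# Simulated training data and model (hypothetical stand-in).
set.seed(42)
n <- 100
mydata <- data.frame(exam_1 = runif(n, 30, 100),
                     exam_2 = runif(n, 30, 100))
mydata$admitted <- rbinom(n, size = 1,
                          prob = plogis(-12 + 0.1 * mydata$exam_1 +
                                              0.1 * mydata$exam_2))
model <- glm(admitted ~ exam_1 + exam_2,
             family = binomial(link = "logit"), data = mydata)

# The new observation must use the same column names as the training data.
new_student <- data.frame(exam_1 = 60, exam_2 = 85)

# type = "response" gives the predicted probability of admitted == 1.
prob <- predict(model, newdata = new_student, type = "response")
prob
```

Because the training data is simulated, the probability printed here will differ from the one obtained with the original data set.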

Here, the output is a probability score in the range 0 to 1. If the probability score is greater than 0.5, it is considered TRUE; if it is less than or equal to 0.5, it is considered FALSE. In our case, the output should be 1 or 0 to decide whether the student will get admission: if it is 1, the student will get admission; otherwise not. So I have used the round() function to convert the probability score to 0 or 1, as below.
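A minimal sketch of the rounding step, under the same assumption of simulated training data. R's round() sends exactly 0.5 to 0, which matches the rule above that a score less than or equal to 0.5 counts as FALSE. An explicit comparison against the 0.5 threshold is shown alongside as an equivalent alternative.

```r
# Simulated training data and model (hypothetical stand-in).
set.seed(42)
n <- 100
mydata <- data.frame(exam_1 = runif(n, 30, 100),
                     exam_2 = runif(n, 30, 100))
mydata$admitted <- rbinom(n, size = 1,
                          prob = plogis(-12 + 0.1 * mydata$exam_1 +
                                              0.1 * mydata$exam_2))
model <- glm(admitted ~ exam_1 + exam_2,
             family = binomial(link = "logit"), data = mydata)

# Predicted probability for a student with scores 60 and 85.
prob <- predict(model, newdata = data.frame(exam_1 = 60, exam_2 = 85),
                type = "response")

# Convert the probability to a 0/1 decision.
admit <- round(prob)

# Equivalent, with the threshold written out explicitly.
admit_explicit <- as.integer(prob > 0.5)

admit
```

Writing the threshold explicitly makes it easy to change later, for example if a stricter cutoff than 0.5 is wanted.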

An output of 1 means the student will get admission. We can predict for other observations in the same manner. We have now seen what logistic regression is and how it works in R. If you want to try the same exercise, click here for the R code and sample data set of the above example. In the next blog post, we will discuss a specific problem on Google Analytics data and see how to apply logistic regression to it.



Amar Gondaliya is a data modeling engineer at Tatvic. He focuses on building predictive models from available data using R, Hadoop, and the Google Prediction API.
