[Help needed] multinomial logistic regression in R

submitted 5 years ago by Swiss_redditor

I am currently working on my master's thesis. I am analyzing my data and having a lot of trouble getting through it.

I have been told in that thread that I should use multinomial logistic regression.

I will just explain here what I have as data (each bullet is a column in my text file):

Date and time are the day and time when the activity (on the x and y axes) was registered. The temperature is the temperature when the activity was registered.

I used recursive partitioning to determine which behavior each activity corresponds to (based on field data I gathered).

The valley is where the individual lived (2 different valleys).

The age corresponds to how old the individual is.

The month is the month in which the data was registered. Same for the year.

Kid is a Yes or No variable (whether the individual has a kid or not). And the Individual is the number associated with it (the tag on its ear).

By individual I mean alpine chamois (Rupicapra rupicapra). All the data (except what I gathered in the field) is GPS data.

My goal is to check for correlation between the Behavior (4 possible values: [R]esting, [F]eeding, [M]oving, [RUN]ning) and the temperature, month, year, age and valley (and maybe later whether it has a kid or not). I also want to account for the individual (since they might not all behave the same way all the time).

Now my big problem is that I can't do that in R. I do not have the knowledge, and despite my attempts I can't get what I want (plots of the correlation and the associated p-values).

I hope someone here can help me with that. If requested, I can pay (unfortunately not very much, since I am still a student). If so, PM me so we can talk about it.

I really hope someone here can help me with that (I can also put your name in the acknowledgments of the paper that will result from this study).

Finally, I want to apologize for the grammar/spelling of this post; English is not my main language.

3 points 5 years ago (8 children)

I googled "multinomial logistic regression in R" and my first result (not surprisingly) was from UCLA stats. Does this work?

There are other ways to do multinomial logistic regression, but this should work. Here's the help page:

Obviously I don't know the names of your variables, so you'll have to sub them in. I'm also not going to claim that this is correct, as your data may have some dependence issues, but this is the command I've used for multinomial logistic regression before.
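A minimal sketch of the kind of call this comment describes, assuming it refers to multinom() from the nnet package (the function named elsewhere in this thread). The data frame `toy` and every value in it are simulated purely for illustration; only the column names echo the ones in the post, and "Other" is a placeholder valley name.

```r
library(nnet)  # recommended package, installed with most R distributions

set.seed(1)
n <- 400

# Simulated stand-in for the real data (an assumption, not the poster's file)
toy <- data.frame(
  Temp   = rnorm(n, mean = 10, sd = 5),
  Age    = sample(2:12, n, replace = TRUE),
  Valley = factor(sample(c("Other", "Trupchun"), n, replace = TRUE))
)

# Simulated behavior whose probabilities depend weakly on temperature
eta  <- cbind(F = 0, M = 0.05 * toy$Temp, R = -0.05 * toy$Temp, RUN = -1)
prob <- exp(eta) / rowSums(exp(eta))
toy$Behavior <- factor(apply(prob, 1, function(p) {
  sample(colnames(prob), 1, prob = p)
}))

# Multinomial logit: one row of coefficients per non-reference behavior
fit <- multinom(Behavior ~ Temp + Age + Valley, data = toy, trace = FALSE)
coef(fit)
```

The first factor level (here F) is used as the reference, so coef(fit) has rows only for M, R and RUN. This sketch ignores the repeated measurements per individual, which is the dependence issue mentioned above.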

2 points 5 years ago (1 child)

The multinom() function is indeed the easiest way to fit a multinomial logistic regression.

You could also use the mlogit() function, but this requires a bit more data manipulation to work since it only accepts its own data format.

If you have any further questions, be sure to ask.

[S] 1 point 5 years ago (4 children)

Hey, so I have tried your formula and here are my problems:

    Coefficients:
        (Intercept)        Temp        Year           Age ValleyTrupchun
    M     150.23018  0.01967008 -0.07569498  0.0003738815      -0.975216
    R    -119.39509 -0.04635857  0.05956101 -0.0184744385      -1.254648
    RUN    20.71603 -0.01638416 -0.01312474  0.1369042933       3.041834
          Individual
    M    0.002078048
    R    0.003043007
    RUN -0.003739576

    Std. Errors:
         (Intercept)         Temp         Year         Age ValleyTrupchun
    M   1.310620e-07 0.0009410925 1.423581e-05 0.002565833   3.760150e-05
    R   5.184267e-08 0.0006774943 9.169682e-06 0.001635721   4.180875e-05
    RUN 3.089474e-07 0.0037869953 5.787151e-05 0.009505235   2.369950e-04
          Individual
    M   2.795408e-05
    R   1.801261e-05
    RUN 1.125990e-04

    Residual Deviance: 531084.3
    AIC: 531120.3

But the behavior F (for feeding) doesn't show up. I also don't know where the p-values are or how to plot this (when I try plot(model), it tells me that it doesn't work because there are no y values).

Could you (once again) help me? Oh, and thanks again!

2 points 5 years ago (3 children)

You're going to need to space that better; I have no idea what it says. Adding four spaces to the front of a line formats it as code; try combining that with 2 spaces at the end of the line:

[S] 1 point 5 years ago (2 children)

    Call:
    multinom(formula = Behavior ~ Temp + Year + Age + Valley + Individual,
        data = Merge)

    Coefficients:
        (Intercept)        Temp        Year           Age ValleyTrupchun
    M     150.22957  0.01967008 -0.07569467  0.0003738003     -0.9752194
    R    -119.39570 -0.04635857  0.05956132 -0.0184745126     -1.2546514
    RUN    20.71584 -0.01638386 -0.01312466  0.1369054953      3.0418335
          Individual
    M    0.002078054
    R    0.003043013
    RUN -0.003739562

    Std. Errors:
         (Intercept)        Temp         Year         Age ValleyTrupchun
    M   1.310621e-07 0.000941093 1.423581e-05 0.002565834   3.760152e-05
    R   5.184258e-08 0.000677494 9.169680e-06 0.001635720   4.180893e-05
    RUN 3.089505e-07 0.003787002 5.787172e-05 0.009505270   2.369960e-04
          Individual
    M   2.795409e-05
    R   1.801261e-05
    RUN 1.125993e-04

    Residual Deviance: 531084.3
    AIC: 531120.3

2 points 5 years ago (1 child)

There's no F because that's the reference level. The idea of multinomial logistic regression is to predict the K probabilities (in your case 4 probabilities) based on the independent variables. Whenever you're predicting a set of K probabilities, you only really need to predict K-1 of them, as the last probability is just 1 - sum(rest of the probabilities).
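To make the K-1 idea concrete, here is a small sketch in base R. The three linear predictor values in `eta` are made up for illustration; in a fitted model each would be the intercept plus coefficients times covariates for one observation, expressed as log-odds against the reference level F.

```r
# Hypothetical log-odds of M, R and RUN versus the reference level F
eta <- c(M = 0.4, R = -0.2, RUN = -1.5)

# The reference level has linear predictor 0, hence the leading 1
denom <- 1 + sum(exp(eta))

# Probabilities of the K-1 modelled levels
p_nonref <- exp(eta) / denom

# The reference probability is whatever is left over
p_ref <- 1 - sum(p_nonref)

probs <- c(F = p_ref, p_nonref)
probs
sum(probs)  # the four probabilities sum to 1
```

The leftover probability equals 1 / denom, which is why no separate row of coefficients is needed for F. If a different baseline is easier to interpret, relevel() can change which factor level comes first before fitting.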

I think you need to read a bit more on multinomial regression, because that's a pretty basic concept within it.

Also, if you're looking for some examples, see this page:

[S] 1 point 5 years ago (0 children)

OK, thanks a lot, I will check everything.

My stats knowledge is very basic, which is why I have some trouble understanding everything.

3 points 5 years ago* (5 children)

Hi, I highly recommend the glmnet package. It actually fits penalized multinomial logit models, but it's a very clean and elegant package. The nice thing about penalized models is that they deal well with collinearity, and they tend to generalize better than simple linear models. E.g.:

    library(glmnet)
    library(caret)
    data(iris)
    model <- train(Species ~ ., iris, method = "glmnet",
                   tuneGrid = expand.grid(.alpha = 0:1, …))
    plot(model)
    coef(model$finalModel, s = model$bestTune$.lambda)

/edit: This code will fit both a lasso model (alpha=1) and a ridge regression model (alpha=0). You can also pick an alpha somewhere between the 2 for a mix of lasso and ridge regression. This is called the elastic net.
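As a rough illustration of how alpha blends the two penalties, the sketch below writes out the elastic net penalty term in the form documented for glmnet. The function name `elastic_net_penalty` and the coefficient vector are made up for this example; this is only the penalty, not the fitting procedure.

```r
# Elastic net penalty: lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(|beta|))
# (hypothetical helper, written for illustration only)
elastic_net_penalty <- function(beta, lambda, alpha) {
  ridge_part <- (1 - alpha) / 2 * sum(beta^2)
  lasso_part <- alpha * sum(abs(beta))
  lambda * (ridge_part + lasso_part)
}

beta <- c(0.5, -2, 0)  # made-up coefficients

lasso <- elastic_net_penalty(beta, lambda = 1, alpha = 1)    # 2.5
ridge <- elastic_net_penalty(beta, lambda = 1, alpha = 0)    # 2.125
mix   <- elastic_net_penalty(beta, lambda = 1, alpha = 0.5)  # 2.3125
```

With alpha = 1 only the absolute values count, which is what lets the lasso shrink some coefficients exactly to zero; with alpha = 0 only the squared values count.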

[S] 1 point 5 years ago (0 children)

Thanks for your input. I will try both and see what happens.

[S] 1 point 5 years ago* (3 children)

So I have run the script you gave me. It worked well but took a long time (about 2 hours to compute). I don't really get what the graph shows or what the numbers represent for each of my behaviors (here is what I got for one of them):

    $F
    6 x 1 sparse Matrix of class "dgCMatrix"
                              1
    (Intercept)     6.535632614
    Temp            0.008613482
    Year           -0.021527588
    Age            -0.002219361
    ValleyTrupchun  0.362143012
    Individual     -0.001570790

What I don't get is why it says ValleyTrupchun here when I have 2 different valleys.

I also used the ordinal package with the clm function, which gave me the p-values that I really need. Is there a way to get p-values with what you sent me?

1 point 5 years ago (2 children)

One issue with penalized regression (such as the glmnet package) is that you do not get p-values. If you really need p-values, one of the other packages is a better bet. Personally I think p-values are overrated, but if you need them, you need them!

Anyway, you have 2 valleys, and R automatically codes them as dummy variables. When ValleyTrupchun = 1 it represents Trupchun, and when ValleyTrupchun = 0 it represents the other valley.
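The dummy coding can be inspected directly with model.matrix() in base R. In this sketch the four-row data frame is invented, and "Other" is a placeholder name for the second valley, which the thread does not name.

```r
# Made-up example: two valleys, "Other" is a placeholder name
d <- data.frame(Valley = factor(c("Trupchun", "Other", "Trupchun", "Other")))

# The design matrix R builds from a formula containing the factor
X <- model.matrix(~ Valley, data = d)
X
# The ValleyTrupchun column is 1 for rows in Trupchun and 0 for the other valley;
# the first level alphabetically ("Other") is absorbed into the intercept.
```

A factor with two levels needs only one 0/1 column, which is why a single ValleyTrupchun coefficient appears in the output.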

Finally, the train function automatically cross-validates your model, which is generally a good thing to do. All the statistics that you see after the model is fit are cross-validated. If you want to speed things up a bit, you can add the following argument to the function: trControl=trainControl(method="cv", number=10). Smaller numbers will be faster.
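For intuition about what 10-fold cross-validation does, here is a hand-rolled sketch. It is not how caret implements it internally; it uses nnet::multinom on the iris data with two arbitrarily chosen predictors, simply to show the split, fit and score loop.

```r
library(nnet)

set.seed(42)
k <- 10

# Assign every row of iris to one of k folds at random
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# For each fold: fit on the other k-1 folds, score on the held-out fold
acc <- sapply(seq_len(k), function(i) {
  held_out <- folds == i
  fit <- multinom(Species ~ Sepal.Length + Sepal.Width,
                  data = iris[!held_out, ], trace = FALSE)
  pred <- predict(fit, newdata = iris[held_out, ])
  mean(pred == iris$Species[held_out])
})

mean(acc)  # cross-validated accuracy, averaged over the k folds
```

Each fold requires a full refit, so the run time grows roughly in proportion to the number of folds (and, in train, to the number of tuning values tried).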

[S] 1 point 5 years ago (1 child)

OK, so unfortunately I really need p-values (it is for a paper, so I don't really have a choice).

This one gave me p-values, but I am not sure if it really is a multinomial logit. Otherwise, are you aware of how to get p-values with Slammaster's script?

2 points 5 years ago (0 children)

I'm not very familiar with the ordinal package, but I'm not sure clm fits a multinomial logit. Unfortunately, I'm pretty sure the multinom function in nnet also doesn't give p-values.
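Although multinom does not print p-values, approximate ones can be computed from the coefficients and standard errors that summary() returns. The sketch below uses two-sided Wald z-tests on the iris data with a single predictor; it is an illustration under the usual large-sample assumptions, not a substitute for a model that handles repeated measurements on the same individual.

```r
library(nnet)

fit <- multinom(Species ~ Sepal.Width, data = iris, trace = FALSE)
s <- summary(fit)

# Wald z statistic: estimate divided by its standard error
z <- s$coefficients / s$standard.errors

# Two-sided p-value from the standard normal distribution
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
p
```

Each p-value tests whether one coefficient differs from zero relative to the reference level, so the results change if the reference level is changed.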

2 points 5 years ago (1 child)

If you need to have p-values, I'd say the mlogit package is your best bet, but it seems to be a little difficult to use. Take a look at the package vignettes. You might also get better answers on Cross Validated. Take a look at this question.
