Before we start

Practice dataset download:

SD_banding_data

About the dataset

This is a subset of the data of the grassland birds I caught and banded in South Dakota.

Record of bird species caught, where they were caught, and their measurements.

Variables: species (using four letter code), age (binary, hatch-year/after hatch-year), mass (in grams), wing (length mm), tarsus (length mm), fat (score on scale of 1-5), location, taxon, foraging (how the bird forages), temp (temperature in F), season ( binary, spring/fall), year (binary 2018, 2019), time

Packages to download

install.packages("MuMIn") 

#Stands for Multi-Model Inference
#Calculates AICc scores
#Automated model generation

install.packages("AICcmodavg") 

#stands for AICc model averaging
#Does other things, but we will primarily use it to make pretty AIC tables

Load libraries

library(MuMIn)
library(AICcmodavg)

Part 1:

Generalized Linear Models

r code structure

glm(formula, data, family = gaussian,…)

Formula:

follows Y=B0+B1X

in R: dependent variable ~ independent variables

Data:

name of data frame

Family:

in R: family(link= “link_name”)

Options: gaussian, binomial, gamma, poisson

GLM code example

Gaussian:

#simple way to get rid of NA's
##glm will automatically drop NA's in independent variables, but not dependent 
banding_na<-na.omit(banding)

glm_example<- glm(  #glm function to run model
  mass~  #dependent variables that you want to predict
    species+fat+wing*tarsus, #independent variables that you want to use to predict
                              ## * indicates interaction
  data = banding_na,  #reference data set
  family=gaussian) #reference family name (here is where you would also put your link function)

Binomial:

#Trick: turn your binomial variable into 1's and 0's
banding$location_numeric<-ifelse( #name a new variable, and use if else function
  banding$location=="Forb",0,1) #if location equals "Forb" turn into a 0 else turn it into a 1
 
#binomial glm, same as gaussian, but change family name
Band_binomial<- glm(location_numeric~temp, data= banding, family=binomial)

Output model results:

#not very useful, gives you intercepts for each variable
glm_example
## 
## Call:  glm(formula = mass ~ species + fat + wing * tarsus, family = gaussian, 
##     data = banding_na)
## 
## Coefficients:
## (Intercept)  speciesAMRO  speciesATSP  speciesCOYE  speciesDICK  speciesHASP  
##    12.17082     38.12907      0.75373     -1.09128      9.66889     13.00497  
## speciesLCSP  speciesLISP  speciesSAVS  speciesSOSP  speciesSWSP  speciesYEWA  
##     1.33343      2.79863      2.27142      5.96914      2.84455     -3.37180  
##         fat         wing       tarsus  wing:tarsus  
##     0.88363     -0.07958     -0.53147      0.01256  
## 
## Degrees of Freedom: 609 Total (i.e. Null);  594 Residual
## Null Deviance:       32420 
## Residual Deviance: 810.7     AIC: 1939
#nicer, but this is not a t-test
summary(glm_example)
## 
## Call:
## glm(formula = mass ~ species + fat + wing * tarsus, family = gaussian, 
##     data = banding_na)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -4.5545  -0.7099  -0.0249   0.6238  10.8800  
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.170819   6.211964   1.959  0.05055 .  
## speciesAMRO 38.129072   3.506298  10.874  < 2e-16 ***
## speciesATSP  0.753735   0.459969   1.639  0.10181    
## speciesCOYE -1.091278   0.677540  -1.611  0.10779    
## speciesDICK  9.668891   0.816786  11.838  < 2e-16 ***
## speciesHASP 13.004969   0.809828  16.059  < 2e-16 ***
## speciesLCSP  1.333433   0.634567   2.101  0.03603 *  
## speciesLISP  2.798630   0.543061   5.153 3.49e-07 ***
## speciesSAVS  2.271416   0.475647   4.775 2.26e-06 ***
## speciesSOSP  5.969137   0.584056  10.220  < 2e-16 ***
## speciesSWSP  2.844550   0.640856   4.439 1.08e-05 ***
## speciesYEWA -3.371800   0.541331  -6.229 8.91e-10 ***
## fat          0.883632   0.069686  12.680  < 2e-16 ***
## wing        -0.079582   0.088304  -0.901  0.36783    
## tarsus      -0.531473   0.278193  -1.910  0.05656 .  
## wing:tarsus  0.012556   0.003893   3.225  0.00133 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.364891)
## 
##     Null deviance: 32423.82  on 609  degrees of freedom
## Residual deviance:   810.75  on 594  degrees of freedom
## AIC: 1938.6
## 
## Number of Fisher Scoring iterations: 2
#Heck yeah! P-values, but have to go to one of above summaries to get intercept estimates
anova(glm_example, test="F")
## Analysis of Deviance Table
## 
## Model: gaussian, link: identity
## 
## Response: mass
## 
## Terms added sequentially (first to last)
## 
## 
##             Df Deviance Resid. Df Resid. Dev        F    Pr(>F)    
## NULL                          609      32424                       
## species     11  31170.6       598       1253 2076.131 < 2.2e-16 ***
## fat          1    200.7       597       1052  147.076 < 2.2e-16 ***
## wing         1    188.7       596        864  138.225 < 2.2e-16 ***
## tarsus       1     38.9       595        825   28.474 1.353e-07 ***
## wing:tarsus  1     14.2       594        811   10.403  0.001327 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Part 2:

Information Criterion

Get AIC scores

In regular R, you can get AIC scores, but not AICc scores :(

Both MuMIn and AICcmodavg have an AICc function

AIC(glm_example)
## [1] 1938.647
AICc(glm_example)
## [1] 1939.681

Get ALL the model combinations

(Warning: may anger statisticians, see step command for step wise model selection)

We want to create an optimal model to predict mass in birds

#create a large model with thoughtful variables
band_model<- glm(mass~species+tarsus*wing+fat, data = banding_na, family=gaussian,
                 na.action = na.fail) #have to include line to knit, but it works better if you run code without na.action

#modeling dredging
AICc_band_models <- dredge( #construct all possible models
  band_model,    #use band model as a reference
  rank = "AICc", #use AICc scores to compare
  fixed = "species") #list variables that you want to include in every model
                      ## use NULL if you don't want to include

Get output from model selection

#Make a list of all the models
model_list <- get.models(AICc_band_models, #retrieve models from dredged data
                              subset = TRUE) #select which models to retrieve
                                              ##1:10 gives you top 10 models
                                              ## subset = TRUE gives you all the models

model_list[1] #check out info from a single model
## $`16`
## 
## Call:  glm(formula = mass ~ fat + tarsus + wing + tarsus:wing + 1 + 
##     species, family = gaussian, data = banding_na, na.action = na.fail)
## 
## Coefficients:
## (Intercept)          fat       tarsus         wing  speciesAMRO  speciesATSP  
##    12.17082      0.88363     -0.53147     -0.07958     38.12907      0.75373  
## speciesCOYE  speciesDICK  speciesHASP  speciesLCSP  speciesLISP  speciesSAVS  
##    -1.09128      9.66889     13.00497      1.33343      2.79863      2.27142  
## speciesSOSP  speciesSWSP  speciesYEWA  tarsus:wing  
##     5.96914      2.84455     -3.37180      0.01256  
## 
## Degrees of Freedom: 609 Total (i.e. Null);  594 Residual
## Null Deviance:       32420 
## Residual Deviance: 810.7     AIC: 1939
#Make a messy but informative AICc table
AICc_table_band<-model.sel(model_list)

Make a nicer AICc Table using AICcmodavg

Make a list of model names

#Trick: create a list of model names
model_name_list<-NULL #make an empty list

for (i in 1:10){
  model_name_list = c(model_name_list, as.character(model_list[[i]][['formula']]))
} #loop through model output to extract formula for each model

model_name_listb <- model_name_list[seq(3, length(model_name_list), 3)] #select every third element from list and put it in a new list

Make a nicer AICc table

modavg_table<-aictab(model_list, #make a table with models from your list
                     modnames = model_name_listb, #label the models with your names list
                     second.ord = TRUE,   #Use AICc (FALSE gives you AIC)
                     sort = TRUE) #Order based on model weight

#View table
modavg_table
## 
## Model selection based on AICc:
## 
##                                                  K    AICc Delta_AICc AICcWt
## fat + tarsus + wing + tarsus:wing + 1 + species 17 1939.68       0.00   0.99
## fat + tarsus + wing + 1 + species               16 1948.15       8.47   0.01
## fat + wing + 1 + species                        15 1974.13      34.45   0.00
## fat + tarsus + 1 + species                      15 2042.99     103.31   0.00
## tarsus + wing + tarsus:wing + 1 + species       16 2083.69     144.01   0.00
## tarsus + wing + 1 + species                     15 2089.18     149.50   0.00
## fat + 1 + species                               14 2092.53     152.85   0.00
## wing + 1 + species                              14 2112.01     172.33   0.00
## tarsus + 1 + species                            14 2154.49     214.81   0.00
## 1 + species                                     13 2196.92     257.24   0.00
##                                                 Cum.Wt       LL
## fat + tarsus + wing + tarsus:wing + 1 + species   0.99  -952.32
## fat + tarsus + wing + 1 + species                 1.00  -957.62
## fat + wing + 1 + species                          1.00  -971.66
## fat + tarsus + 1 + species                        1.00 -1006.09
## tarsus + wing + tarsus:wing + 1 + species         1.00 -1025.39
## tarsus + wing + 1 + species                       1.00 -1029.19
## fat + 1 + species                                 1.00 -1031.91
## wing + 1 + species                                1.00 -1041.65
## tarsus + 1 + species                              1.00 -1062.89
## 1 + species                                       1.00 -1085.15

Part 3:

Exercise on your own

  1. Create a glm model with at least three independent variables using the banding data set or another data set of your choice

  2. Determine which independent variables contribute to an optimal model using AICc

  3. Create a AICc table using AICctab to show your results

Resources:

Package websites

MuMIn

AICcmodavg

Info on how to use AIC and model selection

?AICtab
# gives 10 suggestions for model selection

Chapter 6 of Rethinking Statistics