We categorize linear parametric regression models into the following groups. Note that linear regression models are linear in the parameters (i.e., linear in the β's), not necessarily in the predictors.
Linear Models
- Simple/Multiple Linear Regression is used to predict a continuous response variable from one or more independent variables. For example, modeling the weight of athletes based on height, BMI, and other independent variables.
- Polynomial Regression is used when the relationship between the response and the independent variables is nonlinear, while the model remains linear in its coefficients. For example, imagine you want to predict how many likes a new social media post will receive at any point after publication. The relationship between likes and elapsed time is not linear: a post typically gathers most of its likes within the first 24 hours, after which its popularity tapers off (a short sketch follows below).
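The sketch below illustrates why polynomial regression still counts as a linear model: the fitted equation is linear in the coefficients, and only the features (t, t², t³) are nonlinear in time. The like-count data are synthetic and the decay pattern is purely illustrative.

```python
# Minimal sketch: polynomial regression is "linear" because the model is linear
# in the coefficients; only the features (t, t^2, t^3) are nonlinear.
# Data below are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
hours = rng.uniform(0, 72, size=200).reshape(-1, 1)                  # hours since publication
likes = 500 * np.exp(-hours.ravel() / 24) + rng.normal(0, 20, 200)   # decaying popularity

# Degree-3 polynomial in "hours", fit by ordinary least squares
X_poly = PolynomialFeatures(degree=3, include_bias=False).fit_transform(hours)
model = LinearRegression().fit(X_poly, likes)

print(model.intercept_, model.coef_)   # beta_0 and beta_1 ... beta_3
```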
Generalized Linear Models
- Poisson Regression is the most widely used method for modeling and predicting a count response variable based on independent variables. For example, it can be used to model the number of traffic accidents on a highway (see the first sketch after this list).
- Negative-Binomial Regression is a suitable method for modeling an over-dispersed count response variable (variance larger than the mean) based on independent variables. For example, predicting the number of days students will be absent based on their math scores.
- Logistic Regression is used to predict a binary response variable based on independent variables, e.g., whether a tumor is malignant (1) or not (0); see the second sketch after this list.
- Multinomial Logistic Regression is used when the dependent variable is categorical with three or more unordered levels; it is an extension of binary logistic regression. For example, modeling the influence of education level and father's occupation on a person's occupation choice.
- Ordinal Logistic Regression is used when there are three or more categories with a natural ordering to the levels. For example, a marketing research firm wants to investigate what factors influence the size of soda (small, medium, large, or extra large) that people order at a fast-food chain.
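To make the count-data models concrete, here is a minimal sketch using statsmodels' GLM interface with the Poisson and negative-binomial families. The data and column names (`accidents`, `traffic_volume`, `speed_limit`) are synthetic and purely illustrative.

```python
# Minimal sketch of count-data GLMs with statsmodels (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "traffic_volume": rng.uniform(1, 10, 500),
    "speed_limit": rng.choice([60, 80, 100], 500),
})
mu = np.exp(0.1 + 0.2 * df["traffic_volume"])
df["accidents"] = rng.poisson(mu)

# Poisson regression: log link, equidispersion assumed
poisson_fit = smf.glm("accidents ~ traffic_volume + speed_limit",
                      data=df, family=sm.families.Poisson()).fit()

# Negative-binomial regression: allows over-dispersed counts
nb_fit = smf.glm("accidents ~ traffic_volume + speed_limit",
                 data=df, family=sm.families.NegativeBinomial()).fit()

print(poisson_fit.summary())
print(nb_fit.summary())
```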

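A second sketch covers the logistic side of the family: a binary logistic fit with statsmodels' `logit` formula interface (statsmodels also exposes `mnlogit` for multinomial outcomes). The tumor data below are synthetic and the variable names are assumptions.

```python
# Minimal sketch of binary logistic regression with statsmodels (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"tumor_size": rng.uniform(0.5, 5.0, 300)})
p = 1 / (1 + np.exp(-(-3 + 1.2 * df["tumor_size"])))   # true probability of malignancy
df["malignant"] = rng.binomial(1, p)

logit_fit = smf.logit("malignant ~ tumor_size", data=df).fit()
print(logit_fit.summary())   # coefficients are on the log-odds scale
```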
Mixed Models
- Linear Mixed Models extend simple linear models to include both fixed and random effects. They are particularly useful when observations are not independent, for example because of a hierarchical (nested) structure, and are a common choice for longitudinal data.
- Generalized Linear Mixed Models (GLMMs) extend generalized linear models (GLMs) in the same way: they are appropriate when the linear predictor contains random effects in addition to the usual fixed effects (see the sketch after this list).
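The sketch below fits a linear mixed model with statsmodels' `mixedlm`: a random intercept per subject for repeated (longitudinal) measurements. The data are synthetic and the variable names (`subject`, `visit`, `score`) are illustrative.

```python
# Minimal sketch of a linear mixed model: random intercept per subject
# for longitudinal measurements (synthetic data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
subjects = np.repeat(np.arange(30), 5)              # 30 subjects, 5 visits each
visit = np.tile(np.arange(5), 30)
subject_effect = rng.normal(0, 2, 30)[subjects]     # random intercepts
score = 50 + 1.5 * visit + subject_effect + rng.normal(0, 1, len(visit))

df = pd.DataFrame({"subject": subjects, "visit": visit, "score": score})

# Fixed effect: visit; random intercept: subject
mixed_fit = smf.mixedlm("score ~ visit", data=df, groups=df["subject"]).fit()
print(mixed_fit.summary())
```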
Shrinkage Methods / Regularization Techniques
- Ridge and LASSO Regression are most suitable when a data set contains more predictor variables than observations, or when the predictor variables are correlated with one another. When observations are fewer than predictors, ordinary least squares breaks down and ridge regression is often the preferred technique. A typical application of these models is genomic data, where thousands of predictors are measured on relatively few samples (see the sketch below).
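A minimal sketch of ridge and LASSO fits with scikit-learn in the p > n setting described above; the data are synthetic and the penalty strengths (`alpha`) are arbitrary, not tuned.

```python
# Minimal sketch of ridge and LASSO when predictors outnumber observations (p > n).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(4)
n, p = 50, 200                       # more predictors than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 2.0                       # only 5 predictors truly matter
y = X @ beta + rng.normal(0, 0.5, n)

ridge = Ridge(alpha=1.0).fit(X, y)   # shrinks all coefficients toward zero
lasso = Lasso(alpha=0.1).fit(X, y)   # shrinks and sets many coefficients exactly to zero

print((ridge.coef_ != 0).sum(), (lasso.coef_ != 0).sum())
```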
Inflated Models
- Zero-Inflated (Poisson/NB) Regression is used to model count data with an excess of zeros, i.e., more zeros than the assumed count distribution would predict. For example, modeling the number of defects and breakdowns in a production process (a sketch follows this list).
- Zero-and-One Inflated (Poisson/NB) Regression is used to model count data with an excess of both zero and one counts. For example, modeling the number of motor vehicle insurance claims filed per year.
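A minimal sketch of a zero-inflated Poisson fit, assuming statsmodels' `ZeroInflatedPoisson` class from `statsmodels.discrete.count_model`; the data and variable names are synthetic. As far as I know, statsmodels does not ship a zero-and-one inflated model, so only the zero-inflated case is shown.

```python
# Minimal sketch of a zero-inflated Poisson regression (synthetic data).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(5)
n = 1000
machine_age = rng.uniform(0, 10, n)

# Generate counts with extra zeros: some production runs never break down at all
lam = np.exp(0.2 + 0.15 * machine_age)
structural_zero = rng.random(n) < 0.3            # 30% excess zeros
defects = np.where(structural_zero, 0, rng.poisson(lam))

exog = sm.add_constant(machine_age)              # predictors for the count part
zip_fit = ZeroInflatedPoisson(defects, exog,
                              exog_infl=np.ones((n, 1)),   # constant-only inflation part
                              inflation="logit").fit()
print(zip_fit.summary())
```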