This is Applied Biostatistics for Public Health Practitioners. The title of this lecture is Introduction to Regression Analysis. Regression analysis is used in just about every field of statistics. It is used to model the relationship between an outcome variable and an independent (predictor) variable or variables. We use the term "variables" here because we will be developing more complex models as we look at interaction terms, confounders, and multiple covariates within the model. So we will be looking at univariable models, and we will be looking at multivariable models, that is, models with more than one covariate.
Regression analysis is most often used when independent variables cannot be controlled in the study design. You know that when you are designing your study it is important to control, for example, for confounders within the design. However, this is not always possible, for example in the case of surveys. Then you can control for those extraneous variables in your analysis by using regression analysis. The popularity of this method of analysis is mainly due to the fact that biologically plausible models are easily developed, evaluated, and interpreted. Now, one might think that it is not so easy to develop, evaluate, and interpret these models, but we will be going through some examples and pretty much walking through how we develop these models, how we evaluate them, and how we interpret the results from them. That is one of the primary goals of this course.
The specification of a regression model can be divided into two major components, namely the systematic and error components. By systematic component we mean that aspect that involves an assessment of the relationship, on average, between the outcome variable and the independent variable or variables. Note that we are looking at the relationship on average, so this is our best guess of the relationship between the outcome and the independent variable or variables.
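The split into a systematic component and an error component can be sketched in a few lines of Python. This is only an illustration, not material from the lecture: the intercept, slope, error standard deviation, and the choice of normally distributed errors are all assumed values picked for the example.

```python
import random

# Hypothetical values, chosen only for illustration (not from the lecture).
INTERCEPT = 2.0  # assumed average outcome when the independent variable is 0
SLOPE = 0.5      # assumed average change in outcome per unit of the variable
ERROR_SD = 1.0   # assumed standard deviation of the error component


def systematic(x):
    """Systematic component: the outcome we expect, on average, at a given x."""
    return INTERCEPT + SLOPE * x


def simulate_outcome(x, rng):
    """One observed outcome = systematic component + random error.

    Normal errors are an assumption made for this sketch.
    """
    return systematic(x) + rng.gauss(0.0, ERROR_SD)


rng = random.Random(878)  # fixed seed so the sketch is reproducible
x_value = 10.0
outcomes = [simulate_outcome(x_value, rng) for _ in range(20000)]
average_outcome = sum(outcomes) / len(outcomes)

print(f"systematic component at x = {x_value}: {systematic(x_value):.2f}")
print(f"average of simulated outcomes:      {average_outcome:.2f}")
```

Individual simulated outcomes scatter around the systematic component, but their average settles close to it, which is the sense in which the systematic component describes the relationship on average.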
The systematic component of our regression analysis is guided by exploratory analyses and/or by past experience. This is where, if you do not have the experience as the statistician, you may want to seek out a subject matter expert in whichever area you are exploring or analyzing the data from, so that they can bring this experience to bear on your analysis. The second component is the error component, which involves specifying the statistical distribution (we will be looking at these distributions a little later in this course) of what remains to be explained after the model is fit. You may hear this being described as residual or just plain error.
Let us look at model selection. This, again, is an introduction to model selection; we will be spending a whole section on how to select your model, how to build your model, and how to test your model. Model selection, to a great extent, is based on two primary areas: one, the goal or goals of the analysis (what is the purpose of the analysis?), and two, the measurement scale of your outcome variable. For example, if you look at the figure in your text, Figure 5.1, we have age on the x-axis and systolic blood pressure on the y-axis. So we can say age is our independent variable, or exposure variable, and systolic blood pressure is our dependent variable. The question is: can we determine the relationship between these two variables? As for the measurement scale of the outcome variable, in this case our outcome variable is systolic blood pressure, and in this figure we can tell that systolic blood pressure is a continuous variable.
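To make "what remains to be explained after the model is fit" concrete, here is a minimal sketch that fits a straight line to a continuous outcome by ordinary least squares and then computes the residuals. The age and blood pressure values are made up for illustration; they are not the data behind Figure 5.1, and the helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: age in years and systolic blood pressure in mmHg.
ages = [25, 32, 41, 47, 53, 58, 64, 70]
sbp = [112, 118, 121, 130, 128, 139, 141, 150]

intercept, slope = fit_line(ages, sbp)

# Systematic component: the fitted average blood pressure at each age.
fitted = [intercept + slope * age for age in ages]

# Error component: what remains after the fit (observed minus fitted).
residuals = [observed - fit for observed, fit in zip(sbp, fitted)]

print(f"intercept = {intercept:.2f}, slope = {slope:.3f} mmHg per year")
for age, res in zip(ages, residuals):
    print(f"age {age}: residual = {res:+.2f}")
```

With an intercept in the model, least-squares residuals sum to zero (up to rounding), so the fitted line passes through the middle of the data rather than sitting above or below it.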
And so one might recommend the classic linear regression analysis for this scenario. However, let's say you are not interested in keeping blood pressure as a continuous variable. Let's say you want to dichotomize systolic blood pressure; you want to look at high or low blood pressure. We code high blood pressure as, let's say, equal to one, and low blood pressure as equal to zero. We do not have to stick to these strict codes; these are just examples. But let's just say that, as a researcher, you want to analyze your data based on whether your patients are in the category of high blood pressure or low blood pressure. In this case one might suggest a model with a systematic component that is linear in the log odds. Yes, all this might be confusing, but we are going to dig a little deeper into this and explain what it is all about. So: a systematic component that is linear in the log odds, and a binomial error distribution. Remember, we dichotomized our outcome, so we are looking at the binomial distribution of the errors in this case, and so you might suggest a logistic-type regression.
So you see, here we have basically the same goal: we want to assess the relationship between age and systolic blood pressure. In one case you can keep your outcome variable as continuous, and then you might suggest a linear regression analysis. Or you might dichotomize your outcome variable from continuous to high or low blood pressure, and then you may want to suggest, or use, a logistic model. Whichever model you select, there may be many issues on how to fit, refine, evaluate, and interpret results from these models. The good news is that the same basic modeling paradigm is employed whether you use linear regression analysis or logistic regression analysis.
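The dichotomized version of the analysis can be sketched as follows. This is a rough illustration under stated assumptions, not the method used in the course (which works in SAS): the data are made up, the 140 mmHg cutoff is assumed for the example, and the model is fit with a hand-written Newton-Raphson loop in a hypothetical helper `fit_logistic`.

```python
import math


def logistic(z):
    """Convert log odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, iterations=25):
    """Fit log odds = b0 + b1 * x by Newton-Raphson (maximum likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the binomial log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, with weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system for the update.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: age in years and systolic blood pressure in mmHg.
ages = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80]
sbp = [110, 118, 145, 120, 125, 142, 130, 150, 135, 155, 148, 160]

CUTOFF = 140  # assumed cutoff for "high" blood pressure in this sketch
high = [1 if value >= CUTOFF else 0 for value in sbp]  # 1 = high, 0 = low

# Center age so the intercept is the log odds at the average age.
mean_age = sum(ages) / len(ages)
centered = [age - mean_age for age in ages]

b0, b1 = fit_logistic(centered, high)


def predicted_probability(age):
    """Estimated probability of high blood pressure at a given age."""
    return logistic(b0 + b1 * (age - mean_age))


print(f"log odds at mean age = {b0:.3f}, change per year = {b1:.4f}")
for age in (30, 50, 70):
    print(f"age {age}: P(high) = {predicted_probability(age):.2f}")
```

The systematic component here is a straight line on the log odds scale, so each extra year of age shifts the log odds by the same amount `b1`, while the predicted probabilities themselves stay between 0 and 1.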
Our aim in this course is to cover these questions: How do we fit these models? How do we refine these models? How do we evaluate them? How do we know that we are using the correct model? How do we know when we have arrived at the parsimonious model? And, of course, how do we interpret the results from these models? One of the things you will find out very quickly, if you are not used to it, is that SAS will give you tons of output. How do you select from these many pages of output which parts you are going to use and which you are not going to use, and how do you interpret these results? We are going to get into that as we progress through this course.
Here we have a cartoon looking at both logistic and linear regression together on the same plot. I want us to take a look at the y-axis, which is our vertical axis. It represents our outcome; in our previous example that was our systolic blood pressure variable. Our x-axis, or horizontal axis, represents our exposure; again, in our previous example, Figure 5.1, that was age. On this plot, the data points are represented by the green dots. The somewhat S-shaped logistic fit to the data is represented by the red line. The linear fit, the fit from a linear regression model, is represented by the broken blue line. Just by looking at this figure, what I want us to take away is that logistic regression in this case represents a better fit to the data, because it touches more of the data points than does linear regression. It is probably easy just to look at this figure and tell which model is the better one. However, there are other methods for determining which model is your best model to use, and we will get into those as we move forward and continue to look at logistic and linear regression in this section on applied regression analysis. Before we close, let's look at some lessons learned.
Regression models are very popular for modeling relationships between outcome and independent variables. Whichever field you are in, if you have an outcome and an independent variable, you may want to explore the use of regression models to analyze, or to assess, the relationship between these two variables. How do you determine which regression model to use? This decision is based mainly on your research question (what is the question you are trying to answer?) and on the type of outcome variable that you have: is your outcome variable continuous, or is it categorical? A regression model has two major components: the systematic and error components. And, of course, we have two major categories of regression models, namely linear and logistic regression.
I just want to share this great cartoon with you before we close. The next time we meet, we will take a closer look at the linear, or straight-line, regression model. Until then, goodbye from Applied Biostatistics for Public Health Practitioners.
IntroductionRegressionAnalysisHM878_2014
From pblhlth Program in Public Health October 12th, 2015