In order to answer the question posed above, we want to run a linear regression of s1gcseptsnew against s1gender, which is a binary categorical variable with two possible values. (If you check the Values cell in the s1gender row in Variable View, you can see that the categories in this sex variable are labelled as 1 = Male and 2 = Female.) However, before we begin our linear regression, we need to recode the values of Male and Female. The codes 1 and 2 are assigned to each gender simply to identify each category within the variable s1gender. However, linear regression assumes that the numerical values of all independent, or explanatory, variables are meaningful quantities. So, if we were to enter the variable s1gender into a linear regression model, the codes of the two gender categories would be interpreted as the numerical values of each category. This would give us results that do not make sense because, for example, the sex Female does not have a value of 2. We can avoid this error in analysis by creating dummy variables.

A dummy variable is a variable created to assign usable numerical values to the levels of a categorical variable. Each dummy variable represents one category of the explanatory variable and is coded 1 if the case falls in that category and 0 if not. For example, in a dummy variable for Female, all cases in which the respondent is female are coded as 1 and all other cases, in which the respondent is male, are coded as 0. This allows us to enter the sex values into the model as numbers. (These numbers are just indicators.)

Because our sex variable only has two categories, turning it into a dummy variable could be as simple as recoding the values of Male and Female from 1 = Male and 2 = Female to 0 = Male and 1 = Female. (We will see later that creating dummy variables for categorical variables with multiple levels takes just a little more work.) However, it's good practice to create a new variable altogether when you are creating dummy variables. This way, if you make an error while building the dummy variables, you haven't altered your original variable and can always start again.
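
For readers who want to see the recoding logic outside SPSS, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the SPSS procedure itself: the four records are made-up values, and the column name `female` and the helpers `add_female_dummy` and `simple_ols` are hypothetical names chosen for this example. Only s1gender, s1gcseptsnew and the 1 = Male, 2 = Female coding come from the dataset described here.

```python
# Made-up example records; only the variable names and the 1/2 coding
# (1 = Male, 2 = Female) are taken from the dataset described in the text.
records = [
    {"s1gender": 1, "s1gcseptsnew": 40.0},
    {"s1gender": 1, "s1gcseptsnew": 50.0},
    {"s1gender": 2, "s1gcseptsnew": 60.0},
    {"s1gender": 2, "s1gcseptsnew": 70.0},
]


def add_female_dummy(rows):
    """Return new rows with a 0/1 'female' column; the input rows are not modified.

    'female' is a hypothetical name for the new dummy variable.
    Any code other than 1 or 2 (for example a missing value) becomes None.
    """
    recode = {1: 0, 2: 1}  # 1 = Male -> 0, 2 = Female -> 1
    return [{**row, "female": recode.get(row["s1gender"])} for row in rows]


def simple_ols(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


recoded = add_female_dummy(records)

# Keep only cases with a valid dummy value before fitting.
valid = [row for row in recoded if row["female"] is not None]
intercept, slope = simple_ols(
    [row["female"] for row in valid],
    [row["s1gcseptsnew"] for row in valid],
)

# With 0/1 coding, the intercept is the mean score for Male (female = 0)
# and the slope is the Female mean minus the Male mean.
print(f"intercept={intercept}, slope={slope}")
```

Because the dummy is written to a new column, s1gender keeps its original 1 and 2 codes, so a mistake in the recoding can be undone simply by rebuilding `female`.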