In this video and the next few videos, we're just going to be doing a bunch of calculations about this data set right over here. Hopefully, just going through those calculations will give you an intuitive sense of what the analysis of variance is all about.

The first thing I want to do in this video is calculate the total sum of squares. I'll call that SST, for "sum of squares total." You could view it as the numerator when you calculate variance: you take the distance between each of these data points and the mean of all of these data points, square them, and take that sum. We're not going to divide by the degrees of freedom, which you would normally do if you were calculating sample variance.

So what is this going to be? The first thing we need to do is figure out the mean of all of this stuff over here. I'm going to call that the grand mean, and I'm going to show you in a second that it's the same thing as the mean of the means of each of these data sets.

So let's calculate the grand mean. It's going to be 3 plus 2 plus 1, plus 5 plus 3 plus 4, plus 5 plus 6 plus 7, and we have 9 data points here, so we will divide by 9. What is this going to be equal to? 3 plus 2 plus 1 is 6. 5 plus 3 plus 4 is 12. And 5 plus 6 plus 7 is 18. Then 6 plus 12 is 18, plus another 18 is 36, and 36 divided by 9 is equal to 4.

Now let me show you that that's the exact same thing as the mean of the means. The mean of group 1 over here (let me do it in that same green) is 3 plus 2 plus 1, that's the 6 right over here, divided by 3 data points, so that will be equal to 2. The mean of group 2: the sum here is 12, we saw that right over here, 5 plus 3 plus 4 is 12, and 12 divided by 3 is 4, because we have three data points. And the mean of group 3: 5 plus 6 plus 7 is 18, divided by 3 is 6. So if you were to take the mean of the means, which is another way of viewing this grand mean, you have 2 plus 4 plus 6, which is 12, divided by the 3 means here, and once again you would get 4. So you could view this as the mean of all of the data in all of the groups, or as the mean of the means of each of these groups.

Either way, now that we've calculated it, we can actually figure out the total sum of squares. So let's do that. It's going to be equal to (3 minus 4) squared, where the 4 is this 4 right over here, plus (2 minus 4) squared, plus (1 minus 4) squared. Now I'll do these guys over here in purple: plus (5 minus 4) squared, plus (3 minus 4) squared, plus (4 minus 4) squared. Let me scroll over a little bit. Now we only have three left: plus (5 minus 4) squared, plus (6 minus 4) squared, plus (7 minus 4) squared.

And what does this give us? Up here, the first one: 3 minus 4 is actually negative 1, but you square it and you get 1. Plus negative 2 squared, which is 4, plus negative 3 squared, which is 9. Then here in the magenta: 5 minus 4 is 1, and squared it is still 1. 3 minus 4 is negative 1; you square it and again you get 1. And 4 minus 4 is just 0; I'll write the 0 there just to show you that we actually calculated it. And then we have these last three data points: (5 minus 4) squared is 1, (6 minus 4) squared is 4, that's 2 squared, and 7 minus 4 is 3, which squared is 9.

So what's this going to be equal to? I have 1 plus 4 plus 9 right over here; 1 plus 4 is 5, plus 9 is 14. We also have another 14 right over here, because we have another 1 plus 4 plus 9. And then we have 2 over here. So it's going to be 14 plus 14, which is 28, plus 2 is 30. So our total sum of squares is equal to 30.

If we wanted the variance here, we would divide this by the degrees of freedom, and we've learned multiple times what the degrees of freedom are. Let's say that we have m groups over here. I'm not going to prove things rigorously here; I want to show you where some of these strange formulas that show up in statistics books actually come from, more to give you the intuition. So we have m groups here, and each group has n members. How many total members do we have? We have m times n, or 3 times 3, which is 9 total members.

So what are our degrees of freedom? Remember, you have however many data points you had, minus 1, degrees of freedom. If you assume you knew the mean of means, then only 9 minus 1, only 8, of these are going to give you new information, because if you know 8 of them you could calculate the last one. And it doesn't really have to be the last one: if you have any 8 of them, you can always calculate the ninth one using the mean of means. So one way to think about it is that there are only 8 independent measurements here. Or, if we want to talk generally, there are m times n, which tells us the total number of samples, minus 1 degrees of freedom.

If we were actually calculating the variance here, we would just divide 30 by m times n, minus 1, which is another way of saying 8 degrees of freedom for this exact example. We would take 30 divided by 8, and we would have the variance for this entire group, the group of 9 when you combine them.

I'll leave you here in this video. In the next video, we're going to try to figure out how much of this total sum of squares, this total variation, comes from the variation within each of these groups versus the variation between the groups. I think you get a sense of where this whole analysis of variance is coming from. It's the sense that, look, there's a variance of this entire sample of 9, but some of that variance, if these groups are different in some way, might come from the variation from being in different groups versus the variation from being within a group. We're going to calculate those two things, and we're going to see that they add up to the total sum of squares.
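
The arithmetic in this video can be checked with a short Python sketch. This is only an illustration, not part of the original lesson: the variable names (`groups`, `grand_mean`, `sst`, and so on) are made up for this example, and the three groups are the values read aloud above. Note that the grand mean matches the mean of the group means here because every group has the same number of data points; with unequal group sizes the two would generally differ.

```python
# Data set from the video: m = 3 groups with n = 3 observations each.
# (Variable names are illustrative assumptions, not from the lesson.)
groups = [
    [3, 2, 1],  # group 1
    [5, 3, 4],  # group 2
    [5, 6, 7],  # group 3
]

# Pool every observation and take the grand mean.
all_points = [x for group in groups for x in group]
grand_mean = sum(all_points) / len(all_points)

# Mean of the group means. It equals the grand mean here only because
# all groups are the same size.
group_means = [sum(group) / len(group) for group in groups]
mean_of_means = sum(group_means) / len(group_means)

# Total sum of squares: squared distance of each point from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_points)

# Degrees of freedom for the pooled sample: (m * n) - 1.
m = len(groups)
n = len(groups[0])
degrees_of_freedom = m * n - 1

# Sample variance of all nine points: SST divided by the degrees of freedom.
sample_variance = sst / degrees_of_freedom

print("grand mean:", grand_mean)
print("mean of means:", mean_of_means)
print("SST:", sst)
print("degrees of freedom:", degrees_of_freedom)
print("sample variance:", sample_variance)
```

Running this should reproduce the numbers worked out by hand: a grand mean of 4, an SST of 30, 8 degrees of freedom, and a variance of 30 divided by 8, which is 3.75.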