So, are you ready for more? I still have these papers.
I could go on through the morning, so I don't know whether I will manage to cover them all. All right. So, let's take a look at how we fit these models in SAS. As an illustration, I take the model we derived before for the prostate data, where we have a quadratic function, and where the linear and quadratic terms, as well as the intercepts, depend on the age and on the diagnostic status of the patient. And then we have random intercepts, random linear time effects and random quadratic time effects. And, of course, we still have measurement error. For now I am only going to use measurement error; later on I will show you how to also put serial correlation in SAS. But for the moment, let's stick to purely measurement error, to keep it simple. How do we do it? Well, we have a factor group, which defines the four diagnostic groups. I have a variable time, because I will use time as a continuous covariate. And, as explained this morning, I also have a copy of time, because I will need it as a factor in the repeated statement; the variable timeclss is that copy. Age is the age at the time of diagnosis. My outcome is the logarithmic transformation of PSA. And my SAS program is then the following. What we need to keep in mind is that there are three different statements that we are using: a model statement for the mean, a random statement for the random effects, and a repeated statement for the errors. And that makes sense. This morning, we used a repeated statement in the context of the multivariate regression model, and, like I said, the multivariate regression model is just a special case of our general mixed model, in the sense that it does not contain random effects. So these residuals, we have to model them in the repeated statement, and we keep that repeated statement. But what comes on top of that is a random statement, because now we use random effects as well. So in the model statement, what do I have? All the fixed effects: a group effect and an age effect to model the intercepts, group by time and age by time to model the linear time effects, group by time squared and age by time squared to model the quadratic time effects. In the random statement, I specify an intercept, a linear time effect and a quadratic time effect; these are my three random effects. I also need to tell SAS what the subjects in my data set are: subjects are defined by the variable id. And I need to tell SAS what covariance matrix I want for my random effects; that is the D matrix in our notation, and I keep it unstructured. You can use other choices, even all the same choices as we discussed this morning for the repeated statement; all of them are feasible in SAS. But in most situations you just want an unstructured covariance matrix for these random effects, and we have discussed why. The repeated statement is used to tell SAS what covariance structure needs to be used for the residuals, for the epsilons. So I need to tell SAS what my subjects are, and also how the measurements are ordered within subjects; that is according to the variable time, which needs to be a factor here, and that is why the copy is in the class statement. Note that I have specified type equal to simple, because I am using a residual covariance structure that is just sigma squared times the identity matrix. And that is the simple covariance structure in SAS. So that is how you put the different pieces of your model together.
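As a rough sketch of what that program could look like; the variable names (lnpsa, time, time2, timeclss, age, id, group) and the noint coding are assumptions based on the description above, and time2 is a pre-computed copy of time squared created in a data step:

```sas
proc mixed data=prostate method=reml;
  class id group timeclss;
  /* mean structure: group- and age-dependent intercepts,
     linear and quadratic time effects */
  model lnpsa = group age group*time age*time group*time2 age*time2
        / noint solution;
  /* three random effects with an unstructured D matrix;
     g, gcorr, v, vcorr request the extra output discussed below */
  random intercept time time2 / type=un subject=id g gcorr v vcorr;
  /* residual covariance: measurement error only, sigma^2 times the identity */
  repeated timeclss / type=simple subject=id r rcorr;
run;
```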
Now, you can already start thinking about what would happen if you also need to add serial correlation. I now have three statements; if I want to add serial correlation, would I then need a fourth statement for that? We will discuss that tomorrow. There are also some additional options here, like g, gcorr, v, vcorr, r and rcorr. These are just output options. The g option makes sure that our D matrix is printed as a matrix, instead of just a list of estimates underneath each other. So it prints it as a matrix; here that is a 3 by 3 matrix, since we have three random effects. The gcorr option calculates the correlation matrix for the random effects: it starts from G, which is a covariance matrix, calculates the corresponding correlation matrix, and prints it. The same for r and rcorr: they give you the residual covariance matrix, sigma squared times the identity, and the resulting correlation matrix. Of course, in this example that is not very informative, because rcorr will only show you the identity matrix, since that is the correlation matrix for the errors. What are then v and vcorr? They print the covariance structure of the marginal model, so Z D Z' plus sigma squared times the identity. And based on that covariance matrix, which you obtain with the option v, you can also calculate the marginal fitted correlation matrix, which is obtained with the option vcorr. So all these options just create additional output in your output stream. Note that I have now explicitly mentioned that the method is REML. You do not have to do that, because REML is the default in SAS; you can specify it, but it is not mandatory. But if you want to do maximum likelihood, like this morning, you do have to specify it, because maximum likelihood is not the default. All right, here you see a summary of all these options. Again, you see all these covariance structures; that is not so important. Let me take a look at the estimates. I have fitted the model twice. Yes? Yes, in this example you would not need the repeated statement; it was added here just for the sake of illustration, to show how you would have to put in other covariance structures, if you want. Yes? It will print out the covariance structure for the first subject in your data set. And if you want it for subject number 13, you specify v=13, and then it prints the covariance structure for subject number 13, with the covariance values for that particular subject. That is a good point. So here we have fitted the model twice, once with maximum likelihood, once with restricted maximum likelihood, just to show you the differences. What you see in this big table is first the table for all the fixed effects, the estimates and the standard errors. And you clearly see that these estimates are different. That is what I mentioned before: beta hat depends on alpha, so if you plug in another estimate for alpha, you get another estimate for beta. So indirectly it has some impact on the estimation of beta. For example, if you want to see what the amount of curvature is, on average, in the control group, that is the parameter beta 12.
It is estimated here as minus 0.10, corrected for age of course, because you also have the age by time squared interaction in the model, which you see here. So these are the parameters that you need if you want to plot; we will do that two slides from now. If you want to plot the average evolution in the different groups, you just take these estimated beta values and plot the fitted curves. Here you see the covariance parameters, all these variance components. And you see that the log-likelihood is also reported, and you get two different values. You need to be aware of what we are doing here. Strictly speaking, that is impossible: for one model and one data set, you can only have one log-likelihood. But this value here is, strictly speaking, not the log-likelihood; it is the REML log-likelihood, the log-likelihood of the error contrasts. In the output, however, it is simply labelled as the log-likelihood. So be aware. Here you clearly see that it is a different value, so it would have implications if you were to use it in a likelihood ratio test. Can you use it for that? You will know tomorrow; we will discuss that in the next chapter on inference, where we go through the various testing procedures, like the likelihood ratio tests. Yes? Do the ML and REML estimates stay close to each other, and is there a test for that? So you are saying that the estimates are close to each other. You know from theory that that was the whole idea of using REML estimation. Now, you started by saying they are very close; I do not agree that they are so close. If you look, for example, at the variability in the intercepts: here it is about 0.40, and here it is 0.45. And if you look at how small the standard error is, if you take the standard errors into account, they are not that close. So it does make a difference. What you do see, especially here in the variances of the random effects, is that the maximum likelihood estimates are systematically lower than the REML estimates. That is of course what we would expect. Think of the linear regression model, where the maximum likelihood estimate of the residual variance is the sum of squared residuals divided by n, while the unbiased one divides by n minus p. So the maximum likelihood estimate is biased downwards; it is systematically too small. And that is what you see here in these results as well.
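To make that last remark concrete, this is the familiar linear regression comparison the argument appeals to (a reminder only, nothing specific to the mixed model):

$$
\hat\sigma^2_{\text{ML}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i'\hat\beta\bigr)^2,
\qquad
\hat\sigma^2_{\text{REML}} = \frac{1}{\,n-p\,}\sum_{i=1}^{n}\bigl(y_i - \mathbf{x}_i'\hat\beta\bigr)^2,
$$

so the ML estimate equals $\tfrac{n-p}{n}$ times the unbiased (REML) estimate and is therefore systematically too small.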
Note also that the residual variance sigma squared is pretty small compared to all the other variances that you see. That again supports what we have seen already several times: the within-subject variability is small compared to the between-subject variability, meaning that the variances of the random effects are much larger than the variability in the residuals. And here you see the fitted curves. Instead of just looking at the individual parameter estimates from two pages ago, you can put them together and plot the quadratic curves for your four different groups. That is graphically much more informative. And especially if you report to a clinician, he or she would probably not understand what all these parameters mean, but they do understand what they see here. Now there is one additional issue: age is in that model, so we need to plug in a value for age. Here the median age at diagnosis was used, and as far as I remember it is about 72. So for people who were 72 years old at the moment of diagnosis, this is what their past evolution in PSA looked like. Going back in the past, for the controls it is this evolution: decreasing, but a very moderate decrease. And of course a key question will be: is this significant? We will have to test that in the next chapter if we want to know. For the benign prostatic hyperplasia cases the decrease is more severe, meaning that historically the increase was more severe. Is it significant? We will test that. In the cancer groups you see that it is more severe still, and there is curvature: the cancer groups really increase much more quickly, with some curvature. And that is something we will want to compare to the benign prostatic hyperplasia group, because that is exactly what we wanted to know. When I introduced the data set yesterday, remember, I said that what we want to know is whether we can distinguish, based on their evolution, patients who have cancer from patients who have benign prostatic hyperplasia. So the first thing I want to do is compare these cancer groups with the benign prostatic hyperplasia group and see whether the differences that you seem to see here are really significant or not. You had a question? Yes, but that has to do with the time scale that was chosen, which is time in years before the diagnosis. You can completely mirror the picture and have your zero at the end, but then you would have to label the axis with minus 25, minus 20, minus 15, and so on. No problem. All right. The increase at the far end here, for the cancer curves? That is artificial, and that is what I was saying this morning: it is completely a consequence of the quadratic profile. That is also the reason why they wanted to further fine-tune their model and use a better model for this type of data. Now, we should not overemphasize this. If you look at the original data, you have very few observations in the interval between 20 and 25 years. So if you added confidence intervals, they would be pretty large here; at the beginning you would have relatively narrow confidence intervals, but not at the end, because very few people have been followed for over 25 years. But you are right, this is an artificial effect of the quadratic function that was assumed. You know what a quadratic function does: if you need that curvature here to describe this evolution, then if you extrapolate sufficiently far, there comes a point where the curve starts increasing again, and that is not realistic. The only thing is that here it is not completely extrapolation, because there still is some data. If there were no data, well, we all know you should never extrapolate, you should never go into a region where there is no data. But here there is still a little bit of data, and you know that this behavior is not realistic.
But it is not a major issue, because, once again, if you look at how uncertain that estimate is, it is not going to influence your conclusions. Still, it would look better if you did not have that behavior in the fit. That's right. All right. Let's now do it for the rat data. For the rat data, remember, we had the random intercepts, random slopes model. Here it is, with a fixed common intercept and three fixed slopes. And now we fit that model with SAS, PROC MIXED. It is a very simple model. Here are the results, based on REML: my fixed intercept, my three fixed slopes, then my covariance parameters, the residual variance, and the REML log-likelihood. There is only one thing which is a little bit strange, and that is this value here: an estimate of exactly zero, and no standard error reported. Of course, you could say: maybe there is very little variability between these rats. Indeed, if the slopes are all very similar and you really only need a random-intercepts model, then you would expect the variance of the slopes to be very close to zero. But very close to zero is something different from exactly zero. Of course, you could say: we only see three decimal digits. Well, I can guarantee you, you can print as many digits as you like, it is going to be zero, zero, zero all the time. It is exactly zero. And that is kind of strange. Why is that strange? Because it is only an estimate, an estimate obtained from optimizing a function. That is what REML is: you have the likelihood of these error contrasts, so it is a function, and you look for the maximum. And you then happen to end up at exactly zero. That is kind of strange. If you carefully think about this and want to understand what is happening, then what I think is happening is the following. This is my objective function, and this is my parameter d22. I need to maximize that function with respect to all the parameters, but also with respect to d22. And d22 is the variance of my random slopes, so it needs to be positive. So I am only maximizing over this region, because below zero is a forbidden region; I have a restriction in my parameter space, telling SAS that negative values are not allowed there. Now, if this is the situation, the way I have drawn it here, then of course it will end up at exactly zero, because you are maximizing this function over this region, and it happens to attain its maximum exactly at that boundary value. Now, that is just my guess, and you do not have to believe me. But I can prove it. How would I prove it? Well, there is only one way: tell SAS to allow negative estimates. I would just allow d22 to become negative. And if I allow that, and the result is a negative estimate with a higher value of the objective function, then that is the proof of what I just hypothesized. Now, you might wonder: what is he doing, has he completely lost his mind? Because you then have a negative variance for the random slopes; what on earth does that mean? Well, wait: we first need to see where it ends up, what is really happening. And you can do that: in SAS, you can add the NOBOUND option to the PROC MIXED statement. So in your first line you just add the word nobound, meaning no boundaries: remove all the boundary constraints on the parameters. So then you allow it to become negative.
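A sketch of the two fits, under the assumption that the data set has a response y, a treatment factor treat, the transformed age t and a rat identifier rat (the actual names on the slides may differ); only the NOBOUND option changes:

```sas
/* default fit: variance components restricted to be non-negative */
proc mixed data=rats method=reml;
  class treat rat;
  model y = treat*t / solution;           /* common intercept, three slopes */
  random intercept t / type=un subject=rat;
run;

/* same model, with the boundary constraints removed */
proc mixed data=rats method=reml nobound;
  class treat rat;
  model y = treat*t / solution;
  random intercept t / type=un subject=rat;
run;
```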
And the fact that SAS allows you to do that should already suggest that it is not such a stupid idea. Of course, you are then stuck with the interpretation, but that is something we worry about later. So let's first do it. Here you see the results. The first column is the column from the previous page; the second column contains the new results. And you immediately see what happens: a negative estimate, and a value for the REML log-likelihood which is less negative, so it is higher. So yes, what I have been drawing here on the blackboard is exactly what is happening; this confirms it. In the book by Brown and Prescott you find a whole section on this, where they write the following: the usual action when a negative variance component estimate is obtained for a random coefficient would be to refit the model with the random coefficient removed. Then they continue and start talking about PROC MIXED, and they conclude by writing that the recommended action is to remove the random coefficients one by one, in decreasing order of complexity, until all variance components become positive. Does that solve the problem? Yes, because you throw away what you do not understand. That is what you do. What I will try to convince you of is that you have to be careful with this. This is typically what you often see in statistics books that are written as a cookbook, where you get recipes: if this happens, do that; do not think, just do it. Now, that is not how statistics works. If you get that negative estimate, it must tell you something, either about your data or about your model; it does not happen by pure coincidence. So let's try to think about what happens here. What does that negative variance component mean? Well, to understand it, I have to look at the model that I have fitted, and the model that I have fitted is a marginal model. So let's look at the marginal model. Where does that parameter appear? It appears in the covariance function, and hence also in the variance function. And the variance function we calculated earlier today: it is a quadratic function of time, with d22 as the quadratic term. So if you plug in the estimates that you get, you get this variance function here for the rat data. It seems to suggest that, if the variance function is a quadratic function, it is one with negative curvature. And again, you do not have to believe me, but I can prove it. Because we have completely balanced data, we can very easily look at the variance function in the data. That is what you see here: this is just calculating the sample variances at the various time points, and then you get this function. Now, I am not claiming that this function is a perfect quadratic function. But it is quite obvious that, if you use a model that assumes this function to be quadratic, it should be a quadratic function with negative curvature. And that is of course what the model is telling you: that the variance function looks like this, that it first increases and then starts decreasing. What would happen if you followed the recipe from the previous page? You would have kicked out the random slopes, you would have only random intercepts in your model, and you would be fitting a model with a constant variance function. So you would assume that this whole function here is just constant, flat.
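In the notation from this morning, the two variance functions being compared are, as a sketch (with $d_{11}$, $d_{12}$, $d_{22}$ the elements of $D$ and $\sigma^2$ the measurement-error variance):

$$
\operatorname{Var}\bigl(Y_i(t)\bigr) \;=\; d_{11} + 2\,d_{12}\,t + d_{22}\,t^2 + \sigma^2
$$

for the random-intercepts-and-slopes model, versus the constant

$$
\operatorname{Var}\bigl(Y_i(t)\bigr) \;=\; d_{11} + \sigma^2
$$

once the random slopes are removed. A negative $\hat d_{22}$ simply says that the first, quadratic function bends downwards.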
But the least thing you should do is first test whether that is acceptable. Maybe it is; but you cannot just do it, you have to test it and see whether it is a realistic, acceptable assumption, supported by your data. This is one of the examples I was referring to this morning when I said that every now and then you will be confused when you see some of the results in your output. Why are you confused? Because you get a negative estimate for something you were interpreting as the variance of the random slopes. That means you were interpreting the results in the hierarchical model. But you are not fitting a hierarchical model, you are fitting a marginal model, and in that marginal model there is nothing wrong with that negative estimate. The marginal model only requires that the marginal covariance structure be positive definite; that is the only thing you need. The D matrix itself does not necessarily have to be positive definite: as long as the resulting marginal covariance structure is a valid covariance matrix, D can be anything. So, strictly speaking, if you look at this problem from a mathematical point of view, the parameter space that you can allow under the marginal model is larger than the one you have to impose under the hierarchical model. The hierarchical model puts more restrictions on your parameters, as you clearly see here. So, is it a problem that you have that negative estimate there? It depends. If you really believe the hierarchical model from this morning, then yes, it is a problem, because this shows that the hierarchical model you had in mind is not a correct model. But if you say: I am happy with my marginal model, because the only thing I care about is saying something about the average trends in the three treatment groups and comparing those average trends, then all you need is a good marginal model. Then the only thing you worry about is that your marginal model correctly describes what happens in your data, and then the marginal model is fine. And then maybe you should always just put in the NOBOUND option, to allow more values for your parameters, to let the parameters go into regions where they otherwise would not be allowed to go, and in this way further improve the model that you obtain. This will have serious consequences when we come to testing in the next chapter, because you then really need to think carefully: do you want to stick to the hierarchical interpretation, do you really believe the hierarchical model is correct? If so, you have to use the restricted parameter space. But if you say: I do not care whether the hierarchical model is correct, all I care about is the marginal model, then just put in the NOBOUND option. And that will lead to different testing results, and things like that. Any questions about this? This is not an exceptional case, what you see here; it happens every now and then. Typically, if you do not put in the NOBOUND option, you get in your log the message that the final Hessian is not positive definite, and in your output you see that some of your parameters are estimated as zero. And of course it is tempting to do what they were suggesting: the variance is so close to zero, so let me just take out the random slopes. But then you have to realize that you really reduce your model drastically, maybe too drastically,
and that this reduced model is not necessarily the realistic model that you then have to adopt. All right. That was estimation. Now it is time to start talking about inference, because we have seen all these estimates, we have seen these average curves, but of course in practice what you really want to do is compare groups, say something about the precision of your estimates, and so on. That is what we will do in this chapter. We first look at inference for the fixed effects, afterwards we talk about inference for the variance components, and at the end we say something about information criteria. Because testing hypotheses is comparing models, and we have done that already this morning with likelihood ratio tests. There are many ways to compare models, and sometimes people also use information criteria, like the Akaike information criterion, the Schwarz Bayesian criterion, and so on. That is why I will also briefly say something about those in the last section; not very extensively, but something. Let's first take a look at the fixed effects. This is my estimate for the fixed effects: generalized least squares, conditional on alpha. Conditionally on alpha, this estimate is multivariate normal. What is the mean? Well, the mean is easily calculated. This part is constant, this part is constant, so I only need to take the expectation of Y. The expectation of Y is X_i beta, and then you see that this whole first part cancels against the second part, and beta is what remains. So I have an unbiased estimate. That is good news, very good news. What is the covariance? Well, this part is constant, so it comes up front and, transposed, at the end again. Then I have the covariance of a sum over independent subjects, which is the sum of the covariances. So again I have a constant, multiplied by the transpose of that constant, and in the middle the variance of Y. The variance of Y is V_i, which is the inverse of W_i, so those two cancel, and then this term cancels against that term, and this is what I have left. So the calculations are very easy. Note that I have multivariate normality because my data were normally distributed. But even if that were not the case, then, thanks to central limit arguments, I would still have asymptotic normality, because I have a sum over independent subjects. So it all looks very straightforward. Only, all the calculations I have done were conditional on knowing alpha, and in practice you do not know alpha. So what you do is take your estimate for alpha and plug it in. So you have an unbiased estimate here for your fixed effects, and in the standard errors, in the covariance matrix, you just replace alpha by that estimate. That is what you do.
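Written out, in the notation from this morning and with $W_i = V_i^{-1}(\alpha)$, the expressions just described are (a sketch of the slide's formulas):

$$
\hat\beta(\alpha) \;=\; \Bigl(\sum_{i=1}^{N} X_i' W_i X_i\Bigr)^{-1} \sum_{i=1}^{N} X_i' W_i Y_i ,
\qquad
E\bigl[\hat\beta(\alpha)\bigr] = \beta,
\qquad
\operatorname{Var}\bigl(\hat\beta(\alpha)\bigr) \;=\; \Bigl(\sum_{i=1}^{N} X_i' W_i X_i\Bigr)^{-1},
$$

and in practice $\alpha$ is replaced everywhere by its (RE)ML estimate $\hat\alpha$.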
The first type of test you can do is a Wald test. What is a Wald test? In the univariate case it is just: you take a parameter estimate, minus its true value under the null hypothesis, and divide by the standard error of that estimate. That is a Wald test, and you compare it to a standard normal distribution. That is what you often do in a maximum likelihood context; you build these kinds of statistics. So that is the Wald test. But you can do this more generally. Sometimes you have multivariate hypotheses, or linear combinations of parameters that you want to test, and in general you might want to test a linear hypothesis about the entire vector beta. We denote that here as L times beta equal to zero, versus the alternative that it is not equal to zero. That can be a set of linear hypotheses, all tested simultaneously, and you will see examples of that later on. What would then be your test statistic? Well, this is now multidimensional, so you cannot just do this simple calculation here. What you do is essentially square it: you take your estimate squared divided by its variance, but since you work with matrices, you have your estimate twice, and in the middle you have the inverse of the covariance matrix. And where we had a normal distribution before, you now get a chi-squared distribution. So the asymptotic null distribution of that test statistic is chi-squared, and the degrees of freedom equal the rank of L. If, for example, you just want to test the difference between two slopes, the rank of L would be 1, because the contrast would be slope 1 minus slope 2; that is just one linear hypothesis about the parameters. All right.
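In symbols, the general Wald statistic just described is, in the same notation as above:

$$
\bigl(L\hat\beta\bigr)'\,\Bigl[\,L\,\widehat{\operatorname{Var}}(\hat\beta)\,L'\,\Bigr]^{-1}\bigl(L\hat\beta\bigr)
\;\;\dot\sim\;\; \chi^2_{\operatorname{rank}(L)}
\quad\text{under } H_0:\; L\beta = 0 .
$$

For a single contrast such as "slope 1 = slope 2", $L$ is one row with a $1$ and a $-1$, and the statistic reduces to the square of the familiar estimate-over-standard-error statistic.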
There is one problem with that. As I indicated before, you do all the calculations and then, at the last moment, you replace alpha by alpha hat. That means that the standard errors you use in your test statistic only reflect the uncertainty you have about beta, assuming you would know alpha. But you do not know alpha, you need to replace it by an estimate. So you can expect that the standard errors we use are too small, because they do not reflect all the uncertainty in the whole estimation process: only the uncertainty about beta, not the uncertainty about alpha. Of course, if you have huge data sets, that is just a small difference, because then you have consistency in all your estimates and your alpha hat will be very close to the true value. But in finite samples, not too large, it can make a difference. Then what do you do? Well, people have come up with a nice solution. The idea is the following: if we use this normal distribution as the reference distribution for my test statistic, and if there is more uncertainty in the whole estimation process than what is taken care of by that normal distribution, then what we should do is replace the normal by another distribution which has heavier tails. Good guess. So we stick to the same type of test statistic, a quadratic form in our estimates, or a linear combination of the estimates; the test statistic stays the same, but we replace the normal distribution by a t. In the multivariate setting, where we had a chi-squared before, it becomes an F-distribution. It is a very natural idea: if there is more variability than what the normal represents, just give the reference distribution heavier tails. So the idea is simple, but that is only the start of the problem, because you cannot prove it. In linear regression you can prove that you have these t-tests and F-tests; here you cannot. But in linear regression it is really the same idea: why do you have a t-distribution there? Because you have to estimate the variance. If you knew the variance, it would be a normal distribution. So it is exactly the same idea. And in linear regression you can mathematically prove that the correct distribution then is the t; here you cannot prove it. Why not? Because you do not have those exact distributional results to work with. Yes? So in the output there is always a t-distribution? In most of the cases, yes: by default in your output you get t-tests, F-tests and things like that. Now, precisely because you cannot prove it and derive it mathematically, you are stuck. The idea is nice that you replace the normal by a t, but as you all know there is more than one single t; there is an infinite number of t's. Which one do you choose? So the choice you have to make is: how many degrees of freedom will I use? And since you cannot derive it mathematically, the only thing you can do is come up with approximations, based on simulations, based on some analytical results, but approximations. Many methods have been proposed in the literature, some of which are mentioned here: the containment method, the Satterthwaite approximation, and the most recent one, the Kenward and Roger approximation. They all do something different, they all lead to different degrees of freedom, and that means they lead to different p-values. Which one to use? That is difficult. We are not going to spend much time on that, because you can spend a whole day discussing the theory behind these methods before you get a feeling for what is best in which situation. Nowadays Kenward and Roger is probably the best approximation; it also includes a small-sample correction. But the good news, and that is the reason we do not spend too much time on this, is that in our context, not in general, but in our context, the degrees of freedom are typically relatively large. So whether you use, say, 70 degrees of freedom, or 72, or 76, you hardly notice the difference in your p-values. There are other contexts, like ANOVA models with random factors, where all the observations in your data set are correlated with all the other observations; in cases like that the degrees of freedom you need are much lower, somewhere around 4 or 5 or 6, and then it does make a difference. But not in our context, and the reason is that we have independent replication in our data sets: we have independent subjects, and typically a reasonable number of them, so these degrees of freedom are relatively large. So we do not worry about that. If you do worry, use Kenward-Roger; it is about the best, it is available in SAS, and there is just an option to be specified to use it. Yes? I am a little confused, because I see one chi-squared variable divided by its degrees of freedom, and I do not see another chi-squared variable. There are two: the first one is this part multiplied by itself, and what is in the denominator is this part here. Actually, maybe forget what I just said, because that is going to confuse you. You are still thinking about the theory in linear regression, where you can write the statistic as a ratio of two chi-squared variables. But that is not what we prove here. You still use that same test statistic, and you say: the test statistic's
null distribution we used before has tails that are not sufficiently heavy, so we just replace it by something with heavier tails. It is purely ad hoc. Why? Just because of the bigger tails? Bigger tails, yes, that is the whole argument. The normal is replaced by a t, and so the chi-squared is replaced by an F. That is the only thing. Can we not just go for the lower p-value, by the way? Oh, if you prefer a small p-value, I would propose to always report zero; that is a small p-value. But seriously: the p-value needs to be correct. If that p-value is systematically too low, because it does not take into account all the variability in the estimation process, then you are just fooling yourself. You could say: I will not test at five percent, I will test at four and a half percent. Well, you can do that, but it comes down to the same thing: either you adjust your p-value or you adjust your significance level, and eventually, if you want to do the right test, you should get the right result. Yes? It is exactly what I said before. Think of the univariate case: you have a normal, and you look for something that resembles the normal but with heavier tails. The first thing that comes to mind is the t, so let's use the t. And you know that in simpler models, like linear regression, it really is a t. So it is not such a stupid choice, because you know that in simple cases the t is the exact solution, and here you use it as an approximation. All right, let's apply this in an example. For the prostate data, remember, this was our model. We will now test some of these linear hypotheses, and you can do that with contrast statements. In a contrast statement you can specify such a linear combination of your fixed effects, of the betas, and you can specify several linear combinations at once. We will do that here. For example, I want to test whether local cancer cases evolve differently from metastatic cancer cases. I have these two cancer groups, so it is a natural question whether there is a difference between them. How would I test that? Well, if their average evolutions were equal, they would have the same intercept, the same linear time effect, and the same quadratic time effect. So the null hypothesis I need to test is whether beta 4 equals beta 5, beta 9 equals beta 10, and beta 14 equals beta 15. And you immediately see that you can write that as L multiplied by beta equal to zero, for a particular matrix L. So it fits in our general theory: it is a linear hypothesis about the parameters, and the dimensionality is three, I have three of these linear hypotheses. So the numerator degrees of freedom in our F test would be 3.
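Spelled out, and assuming the model has exactly the fifteen fixed effects described earlier, stacked as $\beta = (\beta_1,\dots,\beta_{15})'$, the hypothesis and its L matrix are:

$$
H_0: L\beta = 0,
\qquad
L=\begin{pmatrix}
0&0&0&1&-1&0&0&0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0&1&-1&0&0&0&0&0\\
0&0&0&0&0&0&0&0&0&0&0&0&0&1&-1
\end{pmatrix}.
$$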
The denominator degrees of freedom we will have to approximate with one of the methods that have been proposed. How do we do that? Well, in the SAS program you add a contrast statement. Here you do not see the full program, because most of it is the same as before; you only see the model statement, because that is what I need to build my contrast statement on. So you add a contrast statement, and you give the hypothesis a name, such that you can recognize it in your output. That is necessary, because you can specify as many contrast statements as you want in one run, so in your output you need to be able to recognize which hypothesis you tested. Here I called it 'local/regional cancer equals metastatic cancer'. Then you specify, separated by commas, the linear hypotheses: for the group effects, it is the difference between the third and the fourth parameter; for the group by time effects, the difference between the third and the fourth parameter; and for the group by time squared effects, again the difference between the third and the fourth parameter. Each comma means that you start a new row in that L matrix. I have now also added the option chisq, to get the Wald tests as well, because by default you would only get the F tests; just to see how different they are, let's also do the chi-squared test. And here I have used the Satterthwaite approximation for the degrees of freedom; if you wanted Kenward-Roger, you would replace 'satterth' by 'kr'. This is just for illustration, Satterthwaite. So you put in ddfm, the denominator degrees of freedom method, equal to Satterthwaite; that is where you specify the method you want to use. So that is the way you do it.
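A sketch of how that contrast could be added to the program shown earlier; the coefficients assume the four group levels are ordered as control, benign prostatic hyperplasia, local cancer, metastatic cancer, and the variable names are the same illustrative ones as before:

```sas
proc mixed data=prostate method=reml;
  class id group timeclss;
  model lnpsa = group age group*time age*time group*time2 age*time2
        / noint solution ddfm=satterth;   /* or ddfm=kr for Kenward-Roger */
  random intercept time time2 / type=un subject=id;
  repeated timeclss / type=simple subject=id;
  /* H0: local cancer = metastatic cancer
     (same intercept, linear and quadratic time effect);
     each comma starts a new row of L, chisq adds the Wald test */
  contrast 'L/R cancer = Met. cancer'
           group       0 0 1 -1,
           group*time  0 0 1 -1,
           group*time2 0 0 1 -1 / chisq;
run;
```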
And this is the additional output you get: a table with the results of the contrast statements. Yes? I do both the F test and the chi-squared test: by default you get the F test, and on top of that I asked for the chi-squared; otherwise you would indeed not need a method for the degrees of freedom. So, an additional table in the results: the name of my hypothesis, three numerator degrees of freedom, because I test three hypotheses simultaneously, and the denominator degrees of freedom as approximated by Satterthwaite, here 24.4. So it is an F distribution with 3 and 24.4 degrees of freedom that is used in the calculation of the p-value. Here you see the result for the chi-squared test and here for the F test: the two test statistics and the two p-values. And you see in the p-values that, yes, there is some difference; the chi-squared test gives the smaller p-value. In the chi-squared test you effectively use an infinite number of denominator degrees of freedom, while the F test uses only 24. Why only 24? Well, this is not such a big data set; it is only 54 subjects. So contrast statements can be used to test whatever type of hypothesis you like. Now, we have done that: we started with the big model and used contrast statements to reduce the model step by step, in a backward selection. Here you see the results of all these tests. First we tested whether the age by time squared interaction was significant; it was not. Then whether the age by time interaction was significant; not significant. The next step was whether the quadratic time effect, the amount of curvature, is the same in the two cancer groups; just from looking at the plots you have seen, the amount of curvature was very similar, so let's test that, and indeed it was accepted. The quadratic time effect is not needed for the non-cancer groups, so not for the control group and not for the benign prostatic hyperplasia group. And the linear time effect is not significant for the controls, so the controls basically stay horizontal. Now, as we mentioned this morning, if you do a backward selection like this, you are never sure that the final model is acceptable. So what we want to do now is test all these hypotheses at once, because I want to make sure that my final model is an acceptable model. How do I do that? With a contrast statement, and here you see it. Here are the null hypotheses that need to be tested, all the previous hypotheses together. You can rewrite that in the form L times beta equal to zero, and that fits in the contrast statement syntax. So here you see the contrast statement, again with the Satterthwaite approximation. You have a one for the age by time interaction, because there is only one parameter assigned to that, and also a one for the age by time squared interaction. For the group by time effects, the first parameter equal to zero means that there is no linear time effect for the controls. For the group by time squared effects, the first parameter equal to zero means there is no curvature for the controls, the second parameter equal to zero means there is no curvature for the benign prostatic hyperplasia group, and the difference between the last two parameters tests whether the two cancer groups have the same curvature. And whether you look at the F test or at the chi-squared test, both are not significant at all; they do not even get into the neighborhood of significance. So the final model is an acceptable model, and from now on I can use this simplified version: I still have my four intercepts and my age correction, I have a linear time trend for the benign prostatic hyperplasia group and for the local and metastatic cancer groups, and I have only one quadratic time trend, which is the same for the local and the metastatic cancer groups. From now on, whenever we use the prostate data, we will use this model, so you will see examples illustrated with it.
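Under the same assumptions about the ordering of the group levels, the joint contrast just described would look roughly like this, added to the same PROC MIXED call:

```sas
/* all reduction hypotheses tested simultaneously */
contrast 'Final model'
         age*time    1,                 /* no age by time interaction        */
         age*time2   1,                 /* no age by time^2 interaction      */
         group*time  1 0 0  0,          /* no linear time effect, controls   */
         group*time2 1 0 0  0,          /* no curvature, controls            */
         group*time2 0 1 0  0,          /* no curvature, BPH                 */
         group*time2 0 0 1 -1 / chisq;  /* equal curvature, cancer groups    */
```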
You can also use an estimate statement instead of a contrast statement. With a contrast statement you only have a test: you get a p-value. If your L is just one-dimensional, a single row, you can also specify it in an estimate statement, and then you additionally get a point estimate for that linear combination of the fixed effects. But that only works for one-row matrices L; otherwise it works in exactly the same way as what we have done. Pardon? Yes, exactly. Now, there is one more thing that we should discuss. It is quite obvious that, in whatever we do here, we need that covariance matrix, and I think you already have a feeling that specifying a good covariance structure is not an easy task. So what we will now do is investigate how sensitive our results are with respect to possible misspecification of that covariance structure. Because, once again, in the majority of cases we are only interested in the mean structure: these are the general trends, we want to compare them, do tests, and so on. Yes, we do need that covariance structure, but we hope that our results are not too sensitive to the choices we make there. Let's take a look. What I will now do is go back to the calculations that we went through relatively quickly on the previous pages, and look at where exactly I need that covariance matrix. So, this is my estimate for the fixed effects. Of course, the covariance matrix gets in here: the W, that is the inverse covariance matrix. With respect to the distribution, we have already mentioned that this expression, no matter what covariance matrix you plug in, is always asymptotically normal, thanks to central limit arguments, so we do not have to worry too much about the distribution. What about the mean? I said before that the estimate is unbiased. But do I need a correct covariance matrix for my estimate to be unbiased? Let's take a look. I take the expectation of my estimate; since everything here is constant, all I need to do is take the expectation of Y, which I replace by X_i beta, and then this whole part cancels against the first part, and I am left with beta. What did I use in these calculations? The only thing I used is that the expectation of Y is X_i beta. So even if my covariance matrix were completely wrong, I would still have an unbiased estimate. It also means that if you would just do plain linear regression, taking the covariance matrix to be sigma squared times the identity, you would still have an unbiased estimate. Whatever matrix you plug in, what you get is unbiased; you do not need the covariance matrix used in the calculations to be correct for the estimate to be unbiased. That is nice, and hopeful, because it means that our results will not be too sensitive. What about the covariance matrix of the estimate? Well, let's see how we did that calculation.
The covariance matrix of beta hat: you have seen that expression before. There are many constant factors here, so they come up front, then you have the covariance of Y, then that constant transposed again. The variance of Y is V_i, the inverse of W_i, so this term cancels against that one, and I end up with this expression. That is what I used in the Wald test, the t tests, the F tests, all the tests we discussed. What did I use in this calculation? That the variance of Y is equal to V_i. So this is the sensitive spot: here is where I used the assumption that the covariance matrix of Y was correctly modelled, that the variance of Y is given by this expression V_i. So what does that mean? It means that if you misspecify your covariance matrix, you still have an unbiased estimate, but your standard errors are wrong. And that is not such a nice message, because if your standard errors are wrong, your test statistics are wrong, your confidence intervals are wrong, your p-values are wrong, and that is what you are interested in. Is it not possible to fix that? Well, let's take a look. I needed that assumption about the variance of Y to go from here to here. But if I do not want to make that assumption, why not just use the top equation, keep that expression for my covariance matrix, and not make the step to the next line? That is a very good idea, but as it stands it is useless. Why? Because it contains the variance of Y, and I do not know the variance of Y. Still, that is not the end of the story. What do we do in statistics when we need a variance? We use squared residuals. So instead of replacing the variance of Y by what we assumed in the model, let's replace it by squared residuals. That means that, as an estimate of the variance of Y_i, you take the residual vector, Y_i minus the fitted mean, times its own transpose. Is that a good estimate for V_i? Well, it is an unbiased estimate, provided the mean was correctly specified: for this to be unbiased for the variance of Y_i, I need X_i beta to be the correct mean. But that is something I already needed in the previous step anyway; to go from here to here I already needed the mean to be correctly specified, so it is not a new assumption. So where does that bring me? It tells me that I can allow myself to misspecify my covariance matrix. I use it in all the calculations, and it leads me to estimates. Even if that covariance matrix is wrong, I have an unbiased estimate, and I can still get valid standard errors by using the first expression here, with the variance of Y replaced by the squared residuals. That estimate of the covariance matrix is called the sandwich estimator, for obvious reasons: the two outer parts are the same, and there is something in between; call it what you like. In the literature it is called the sandwich estimator, or also the robust variance estimator. Why robust? Because it is correct even if your covariance structure is misspecified. Yes? In linear regression, wasn't that called White's estimator? Yes, that's right: White developed a whole theory of what happens to your estimates when your model is misspecified, and that is exactly what this does.
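In the notation used above, the model-based and robust (sandwich) covariance estimators being contrasted are, as a sketch:

$$
\widehat{\operatorname{Var}}_{\text{model}}(\hat\beta)
 = \Bigl(\sum_i X_i' W_i X_i\Bigr)^{-1},
\qquad
\widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta)
 = \Bigl(\sum_i X_i' W_i X_i\Bigr)^{-1}
   \Bigl(\sum_i X_i' W_i\, \widehat{\operatorname{Var}}(Y_i)\, W_i X_i\Bigr)
   \Bigl(\sum_i X_i' W_i X_i\Bigr)^{-1},
$$

with $\widehat{\operatorname{Var}}(Y_i) = (y_i - X_i\hat\beta)(y_i - X_i\hat\beta)'$, the squared residuals, plugged into the middle of the sandwich.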
Now, once you are at this point, you can still use your t-tests, your F-tests, your Wald tests as before; the only thing you do is replace the standard errors by the robust ones. The previous standard errors we call model-based, because they are only correct if the covariance model you assumed really is correct. People sometimes also call them naive, because they are only correct if you are naive enough to believe that the model is correct. Yes? Is the W matrix then also estimated from the observed variances? No. W is just a matrix that you use in all the calculations. No matter what W you use, beta hat is unbiased and its covariance matrix is given by this expression; the only thing you need is an estimate for the variance of Y, and that is the squared residuals. But you cannot afterwards replace the W that you used in all your calculations by those squared residuals. So the only thing you do is correct the standard errors: you keep the estimates, and you correct the standard errors. So W does not have to be the inverse of the true covariance matrix? It is the inverse of a covariance matrix, but that covariance matrix may be misspecified. Yes? Should you then not estimate W itself from the observed variances as well? That would be a clever idea, only it is impossible, because look at that estimate: it is a column multiplied by a row, so it has rank 1, and you cannot invert it. You just cannot use it to update your W. So the only thing you do is keep the W, knowing that it may be misspecified. And that is not important, as long as your final inferences are okay. What do you need for your final inferences to be okay? Unbiasedness you already get for free; the only thing you need to do is update the standard errors. The standard errors then still contain the possibly wrong W, but combined with a correct estimate of the variance of Y, and that is okay. It is strange when you see it for the first time, but it works. Now, this is just one example of a much more general theory, called generalized estimating equations, and you will discuss that later in the context of discrete data. In the context of generalized linear models you can apply the same principle for logistic regression, for Poisson regression: you just plug in a working correlation matrix, you know that it may be wrong, and afterwards you correct the standard errors. But here you see almost analytically how it works and where it comes from. It is a very strong theory, developed halfway through the eighties by Liang and Zeger in Baltimore, and since then it has had many applications. People use it all the time, because it is very convenient: you can forget about the covariance structure, do whatever you like, and fix your results afterwards. There are some side remarks, though. The extreme point of view would be that you can just do OLS: forget about the covariance structure completely, do plain linear regression, and fix the standard errors afterwards. And that would be correct; only,
in terms of efficiency it does matter that you specify a good covariance structure. You can have many estimators and testing procedures that are all unbiased and valid, but one can still be more powerful than another. And it turns out that the better the covariance matrix you use in the calculations, the more power you gain. So, in terms of efficiency, it is worthwhile to spend a little time on coming up with a good covariance structure. But you should not take it too far and worry too much about it, because you can fix your results afterwards. This is actually the final argument for why we do not worry too much about the residual covariance: first of all, it is only a small part of the total variability, and second, even if it is misspecified, we see here that it does not play a major role; and if you are worried about it, you can still fix your results. There is one additional side remark here: you have to be careful if you have missing data, and you will discuss that in full detail when you talk about missing data issues. If you have missing data, then this approach, GEE, generalized estimating equations, is only valid if the reason for the missingness is completely unrelated to your outcomes. As soon as there might be some relation, you are in trouble, and you have to be careful. Yes? Do I understand correctly that the estimates of the betas actually remain the same? Yes, they remain the same. It is something you do afterwards: you first calculate your estimates, and then you correct your standard errors; you just report other standard errors. So is the output you see already corrected? No. By default, what you get are the model-based standard errors; you can ask for the corrected ones, but you will have to ask for them. How do you do that? Well, I will show you. And how does that relate to the earlier slide, where we also had two sets of estimates? Ah, but there the situation was different: there I had different estimates because I had extended my parameter space. And then you get different estimates for the fixed effects as well? Yes, because the fixed-effects estimates depend on the covariance parameters. That had nothing to do with what we have described now. There, we simply allowed more values for the parameters than before: before, you restricted d22 to be positive, and then you allowed negative values. It is still the same model, but with a larger parameter space, and therefore you can find a fit that better describes your data. The covariance structure for the random effects itself remains the same. Of course, you can also start from a different model, with a different covariance structure, and then you would also get different estimates. But the discussion we are having now is: you have made a choice about the covariance structure, and you have used it all the way through your calculations.
You have estimates, and you know that the standard errors you then obtain are wrong if that covariance model is not correct. And now what you say is: let's replace those standard errors by others which no longer assume that the covariance matrix I had in mind was the correct one. Yeah, okay, good. I was just wondering about the estimates of the mean. The estimates of the mean? Not here: here we only change the standard errors. The estimates stay the same; they are fixed, and they are always unbiased. So do they coincide with the estimates you would get under another covariance model? They will not coincide. But you can have many unbiased estimators for one parameter, millions of them; they all differ, and they are all unbiased. And that is what happens here. So let's try this out, and I'll do that for the prostate data, and then we will stop for today. This is my reduced model that we had already seen before. In PROC MIXED, by default, you get the model-based standard errors, the naive ones, assuming that the model is correct. If you want the robust ones, the corrected ones, you should add the option EMPIRICAL to the PROC MIXED statement. And then you see the following table, where I have reported the two sets of standard errors next to each other: the first set are the naive ones, the others are the robust ones. Note that this is only available for the fixed effects; you do not have it for the variance components. So here you see all the estimates. You have only one set of estimates, because they remain the same; it is the standard errors that you fix afterwards. And you see that it does make some difference. For example, for the intercept in the control group the naive standard error is larger than the robust one, but for the local cancer group it is the other way around. So you cannot say in advance that the model-based standard error will always be smaller than the robust one; it can go in both directions. The model-based one is sometimes too small and sometimes too large to be a good reflection of the real uncertainty about the parameters.

Alright. Another example is when we go back to the growth data, the growth data that we analyzed extensively this morning. I fit again my model 1, the model we started from: unstructured mean, unstructured covariance, the most general model you could think of. What you see here are the estimates, the naive standard errors, and, just for fun, also the robust standard errors. And you see that there are quite some differences. Now what does that mean? If the two sets are very close, it kind of tells you that the model you assumed is okay. The further apart the two sets of standard errors are, the more it suggests that the model you used is probably not okay, because what you obtain by assuming the model is right differs from what you obtain when you do not make that assumption.
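As a minimal sketch of how the option is added (the dataset and variable names growth, idnr, sex, age, measure are placeholders, not necessarily the ones used in the course), model 1 for the growth data with the robust standard errors could look like this:

proc mixed data=growth empirical;                /* EMPIRICAL requests the robust (sandwich) standard errors */
  class idnr sex age;
  model measure = sex*age / noint solution;      /* unstructured mean: one mean per sex-by-age cell */
  repeated age / type=un subject=idnr;           /* unstructured covariance, common to boys and girls */
run;

Without EMPIRICAL you get only the model-based standard errors; with it, the fixed-effect estimates are unchanged and only their standard errors are replaced.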
And since here we have quite some differences, sometimes 10% difference in the standard error, maybe our covariance matrix was wrong. But how can that be? It is an unstructured covariance matrix. Well, there is one thing that we did assume, and that is that boys and girls have the same covariance matrix. And that is not necessarily the case. So maybe that is why you need a correction in your standard errors. Can we check whether the boys and girls have different covariance structures? Yes, we can. The only thing we need to do in our SAS procedure, the same procedure as we used yesterday, is add GROUP=GENDER. That specifies that the covariance parameters should be estimated separately for the boys and separately for the girls, so you get two covariance structures, both unstructured, but with different estimates. We call that model 0, because it is even more general than our original model 1. You can again use the likelihood ratio test to compare the models, and it turns out to be highly significant: the covariance matrix for the boys is significantly different from the covariance matrix for the girls. And the nice thing is that if you use model 0, the estimates for all the fixed effects are still the same, and the standard errors you get, assuming model 0 to be correct, are the robust standard errors that we got before. So this shows that the correction, the sandwich estimator, is doing the right thing: the standard errors we get with the correction are the standard errors you would have obtained if you had fitted the right model. The correction is going in the right direction. I'm used to thinking that robust estimators are robust against some assumptions but not quite as efficient, so does this mean that this is the best model? It does not mean this is the best model in any absolute sense, but of these two the one with two covariance structures is definitely the better one, because if you formally test it, comparing the model with two covariance structures to the one with a single common covariance structure for the two groups, it is highly significant. So in general, robust standard errors are a little less efficient? Yes, yes. If your model is right, then using that information in the calculations will lead to smaller standard errors. The risk is that the model is not right and that the standard errors are then wrong. And can you improve the model once you realize what the problem is, can you see the difference in the model? Yes, although it again depends on whether you have missing data or not, because with missing data this correction would not necessarily be correct. The comparison can help point you towards possible errors in your model, and if you can improve your model, it might still be wise to use that improved model rather than sticking to the old, wrong model and only fixing the standard errors. It is always the case that if you have a better model, you gain. But the main message here is that you should not worry too much about that covariance structure. Try to do as well as possible: explore the data, look at the covariance structure, explore the correlation structure.
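For completeness, the modification being described amounts to something like the following, again a sketch with the same placeholder names as above (the gender variable is assumed here to be called sex): the GROUP= option on the REPEATED statement asks PROC MIXED to estimate a separate set of covariance parameters per level of that variable.

proc mixed data=growth;                              /* model 0: separate covariance matrices per gender */
  class idnr sex age;
  model measure = sex*age / noint solution;
  repeated age / type=un subject=idnr group=sex;     /* one unstructured matrix for boys, one for girls */
run;

Comparing this fit with the previous one through the likelihood ratio test on -2 log likelihood is the comparison referred to above.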
But the positive thing is that, since we already realized that specifying that covariance structure is difficult in practice, it is reassuring to see that if we have some doubts about our choices, we still have the possibility to correct our inferences. And that is the main message. But did you show us how to do it? We don't have the code for that. You only have to add the option EMPIRICAL. Ah. All right. Any further questions? Yes. I was wondering how large the sample has to be for this robust approach; it seems you need quite a few subjects to estimate that covariance well. Yes, because the whole theory is asymptotic. Have people worked on figuring out how far that carries? I'm sure people have done that, because for these GEE methods, this robust inference, there are thousands of publications where it has been studied in many different contexts, and people have done many simulations to see how sensitive the results are. But off the top of my head I cannot say exactly where you would have to look. Probably there is not going to be an easy message, because it all depends on the setting you work in: if you have more repeated measurements per subject, it is more tricky than with smaller numbers, because the more measurements per subject you have, the larger that covariance structure is and the more things can go wrong in it. With fewer observations per subject you are safer. So the answer to your question is never going to be simple, because you would always have to say: for this type of outcome, say continuous outcomes, with at most so many observations per subject and so many subjects, you are safe. And then you would still need to quantify how wrong your covariance structure is, try out scenarios where it is only a little bit wrong or very wrong, and repeat all of that for binary outcomes, counts, and ordinal outcomes. It is an endless story, so there is no simple, unique answer to that. Was it a coincidence that the naive standard errors under model 0 were exactly the same as the sandwich ones? Are they analytically the same? Here they are the same because we have completely balanced data: no missingness, everyone measured at exactly the same time points. So here it is analytically the same. It is a strong theory, because it really tells you that you can mess up the covariance, assuming you do not have missing data; that is the critical issue. But suppose you do not have any missing data and you have a sufficient number of subjects, so that you do not worry too much about efficiency. Then what you can do is just ordinary least squares. Going back to the discussion of yesterday, you can basically do a two-sample t-test and correct your standard errors, and the standard errors you then get are the ones you would have obtained if you had done a paired t-test. Because your estimate of the treatment effect is the same under the paired t-test as under the two-sample t-test: the estimation is the same, and it is only the standard errors that need to be corrected.
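As a sketch of that extreme case, and this is my own illustration with hypothetical names (mydata, y, treat, id), not code shown in the course: you could fit the two-group comparison with independent, homoscedastic errors and still ask for the sandwich correction.

proc mixed data=mydata empirical;                /* OLS-type fit with robust standard errors */
  class id treat;
  model y = treat / solution;                    /* plain comparison of the group means */
  repeated / type=simple subject=id;             /* independent errors; subjects define the clusters for the sandwich */
run;

With TYPE=SIMPLE the point estimate of the treatment effect is the ordinary least squares one, and the EMPIRICAL option then replaces its naive standard error by one that accounts for the within-subject correlation.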
But if you have unequally spaced measurements, with different time points for different subjects, that is another scenario where you have to pay attention. It does make a difference: you are more sensitive if you have different time points for different subjects than if you have the same time points for everyone. The method again is asymptotic, so asymptotically it works; how large your sample should be in order to have an accurate approximation depends on many different aspects, and one of them is whether or not your data are at fixed time points. If there are many different time points, your data set will have to be larger than if you have nicely balanced data at fixed times. There are numerous papers where people have done simulations in all kinds of settings to study how sensitive the results are, but you always end up concluding that the method is pretty robust: you really have to misspecify your model quite badly before your results become completely wrong. Yes? What about outliers, can you just detect the influential observations and then do the estimation? Well, detection of influential observations is by itself not an easy issue. Influence can be at the level of an observation or at the level of a subject, and those are two different things. Because you use the residuals in the variance? Yes, you use something like the variance of Y estimated from the residuals. But of course, also in the estimation of your betas you use the normality assumption, so there too extreme values have influence. The betas are maximum likelihood estimates, with the value for alpha plugged in afterwards, but they are still maximum likelihood estimates; you can see it in the expression, it is weighted least squares, essentially an average, and you know that an average is influenced by extreme values. So as soon as you have extreme values, they will influence everything: your estimates and your standard errors. And a small question: in the formula for the sandwich estimator, where you have the variance in the middle, are you going to replace the W as well? No, you cannot replace it, because if you did, if you replaced the V inside W by the outer product of the residuals, you would put in a matrix of rank one, a column multiplied by a row, and you cannot invert that. So you could never compute that W. You are thinking of the variance of Y? Yes, but in the middle part nothing is inverted; only W involves an inverse. What you plug in there is the residual times its transpose, and that is possible precisely because it does not need to be inverted: you use it as an estimate of the variance only in the middle of the sandwich. Any other questions? All right, if not, I think we can stop here for today. It has been a long day, and tomorrow we will continue with the likelihood ratio tests.
And be aware of the fact that we will be back in the same room as yesterday, for the entire day. What time do we start, nine? Nine. Okay.