So, are you ready for more? I still have these papers from this morning, so I don't know if I need to hand them out again. All right. So, let's take a look at how we will now fit
these models in SAS. And just as an illustration, what I take is this model that we had derived
before for the prostate data, where we have a quadratic function, where the linear and quadratic
terms, as well as the intercepts, depend on the age and on the diagnostic status of the
patient. And then we have random intercepts, linear time effects, quadratic time effects. And,
of course, we still have that measurement error. And right now, I'm just going to use
measurement error. Later on, I will show you how to put also serial correlation in SAS. But
for now, let's just stick to just purely measurement error to keep it simple. How do we do it?
Well, we have a factor group, which defines the four diagnostic groups. I have a variable time, because I will use time as a continuous covariate. As explained this morning, I also have a copy of time, because I will need it as a factor in the repeated statement; the variable timeclss is that copy. Age is the age at the time of diagnosis. My outcome is the logarithmic transformation of the outcome. And my SAS program is then the following. So
what we need to keep in mind is that there's three different statements that we are using. A
model statement for the mean, a random statement for the random effects, and a repeated
statement for the errors. And that makes sense, because this morning we used a repeated statement in the context of the multivariate regression model. And like I
said, the multivariate regression model is just a simple version or a special case of our
general mixed model, in the sense that it does not contain random effects. So these
residuals, we have to model them in the repeated statement. So we will keep that repeated
statement. But now what comes additional on top of that is a random statement, because
now we will use random effects as well. So in the model statement, what do I have? All the
fixed effects. So a group effect, an age effect to model the intercepts. Group by time, age by
time to model the linear time effects. Group by time squared, age by time squared to model
the quadratic time effects. In the random statement, I'm specifying an intercept, a linear
time effect and a quadratic time effect. So these are my three random effects. And I also need
to tell SAS what are the subjects in my data set. Subjects are defined with a variable id. And I
need to tell SAS what covariance matrix I want. And that's the covariance matrix for my
random effects. It's the d matrix in our notation. And I will keep it unstructured. You can use
other choices. And you can even use all the same choices as we discussed this morning for
the repeated statement. So all the same choices are possible in SAS or feasible. But in most
situations, you just want to use an unstructured covariance matrix for these random effects.
And we've discussed that. The repeated statement is used to tell SAS what covariance structure needs to be used for these residuals, for the epsilons. So I need to tell SAS what are my subjects. And I also need to tell SAS how the measurements have been ordered within the subjects. And that's according to the variable time, which needs to be a factor; that's why its copy is in the class statement. Note that here now I have specified type=simple, because I'm using a residual covariance structure that is just sigma squared times the identity matrix, and that's the simple covariance structure in SAS. So that's how you put together the different pieces of your model.
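Putting those three statements together, a rough sketch of the program could look as follows (the data set name prostate, the outcome name lnpsa and the variable time2 for squared time are assumptions rather than the exact names on the slides, and whether the overall intercept is suppressed with noint depends on the parameterization used):

  proc mixed data=prostate method=reml;
    class id group timeclss;                               /* subject, diagnostic group, copy of time as factor */
    model lnpsa = group age group*time age*time
                  group*time2 age*time2 / noint solution;  /* fixed intercepts, linear and quadratic time effects */
    random intercept time time2 / type=un subject=id
                                  g gcorr v vcorr;         /* three random effects, unstructured D */
    repeated timeclss / type=simple subject=id r rcorr;    /* pure measurement error: sigma squared times I */
  run;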
Now, you can already start thinking of what would happen if you also need to add serial correlation. Because I now have three statements. But if I want to add serial correlation, would I then need a fourth statement for that? We will discuss that tomorrow. There are also some additional
options here, like G, GCORR, V, VCORR, R, RCORR. These are just output options. What they do is: the G option makes sure that our D matrix is also printed as a matrix, instead of just a list of estimates underneath each other. So it will print it as a matrix; that's a 3 by 3 matrix here in our case, since we have three random effects. The GCORR option will calculate the correlation matrix for the random effects: it starts from G, which is a covariance matrix, calculates the corresponding correlation matrix, and prints it out. The same for R and RCORR: they give you the residual covariance matrix, so that's Σi, and the resulting correlation matrix. Of course, in this example that is not very informative, because RCORR will only show you the identity matrix, because that's the correlation matrix for the errors. What then are V and VCORR? V will print out the covariance matrix of the marginal model, so that's Z D Z prime plus Σi. And based on that covariance matrix, which you obtain with the option V, you can also calculate the correlation matrix, the marginal fitted correlation matrix, and that's obtained by the option VCORR.
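As a reminder, in our notation this marginal covariance matrix is

$$ V_i \;=\; Z_i D Z_i^\top + \Sigma_i, \qquad \Sigma_i = \sigma^2 I_{n_i}, $$

where, with the three random effects used here, $Z_i$ has columns $(1, t_{ij}, t_{ij}^2)$ and $D$ is the unstructured 3 by 3 covariance matrix of the random effects.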
So all these options just have to do with additional output in your output stream. Note that I've now explicitly mentioned here that the method is REML. You don't have to do that, because it's the default; REML is the default in SAS. You can specify it, but it's not mandatory. But if you want to do maximum likelihood, like this morning, you have to specify it, because maximum likelihood is not the default. Alright, so
here you see a summary of all these options. Again, you see all these covariance structures.
That's not so important. Let me take a look at the estimates. And I've fitted the model twice.
Yes? Yes, so in this example you would not really need the repeated statement; it was added here just for the sake of illustration, to show you how you would have to put in other covariance structures if you want. It uses, oh, okay: by default it will print out the covariance structure for the first subject in your data set. And if you want it for subject number 13, you specify, for example, V=13, and then it will print out the covariance structure for subject number 13, with the covariance values for that particular subject.
That's a good point. So here we have fitted the model twice. Once with maximum likelihood,
once with restricted maximum likelihood. Just to show you the differences. And what you
see in this big table is first all, well, this is the table for all the fixed effects. The estimates and
the standard errors. And you clearly see that these estimates are different. And that's what I
mentioned before. Beta hat depends on alpha. So if you plug in another estimate for alpha,
you will get another estimate for beta. So indirectly it has some impact on the estimation of
beta. So for example, if you want to see what is the amount of curvature on average in the
control group, then that's the parameter beta 12. It's estimated here as minus 0.10, of course corrected for age, because you also have the age by time squared interaction in that model, which you see here. So these are the parameters that you would need if you want to plot; we will do that in two slides from now. If you want to plot the average evolution in the different groups, then you would just take these beta values and plot the fitted curves.
Here you see the covariance parameters, all these variance components. And what you see is that the log-likelihood is also reported, and you get two different values. You need to be aware of what we are doing here. Strictly speaking, that's impossible: for one model and one data set, you can only have one log-likelihood. But this value here is, strictly speaking, not the log-likelihood; it's the REML log-likelihood, the log-likelihood for the error contrasts. In the output, however, you will see it reported as the likelihood or the log-likelihood, so be aware. Here you clearly see that it's a different value, so it would have implications if you would use it in a likelihood ratio test. Can you use it there? Sometimes; we will discuss that in the next chapter on inference, where we will go through all the various testing procedures, like the likelihood ratio test. Yes? (A participant remarks that the ML and REML estimates seem very close to each other.) So you are saying that the estimates are close to each other. Well, you know from theory that that was the whole idea of using REML estimation. You started by saying they're very close; I don't agree that they're so close. I
mean if you look for example at the variability in the intercepts, 0.4 about. Here it's 0.45. But
if you look how small that standard error is, I mean if you take into account the standard
errors, they're not that close. So it does make a difference. And what you do see, especially
here in these variances of the random effects, is that the maximum likelihood estimates are systematically lower than the REML estimates. And that's of course what we would expect. Think of the linear regression model, where the maximum likelihood estimate of the residual variance is the sum of squared residuals divided by n, while the unbiased one divides by n minus p. So the maximum likelihood estimate is biased downwards; it's systematically too small.
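As a small reminder of that linear regression analogy (standard notation, not taken from the slides):

$$ \hat\sigma^2_{\mathrm{ML}} \;=\; \frac{1}{n}\sum_{i=1}^{n}\big(y_i - x_i^\top\hat\beta\big)^2, \qquad \hat\sigma^2_{\mathrm{unbiased}} \;=\; \frac{1}{n-p}\sum_{i=1}^{n}\big(y_i - x_i^\top\hat\beta\big)^2, $$

so the ML estimate is smaller by a factor $(n-p)/n$; REML makes the analogous correction in the mixed model.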
And that's what you see here in these results as well. Note also that the variance of the residuals, the residual variability sigma squared, is pretty small compared to all the other variances that you see. And that again supports what we have seen already several times, that the within-subject variability is small compared to the between-subject variability, meaning that the variances of all these random effects are much larger than the variability that you have in your residuals. And here you see the fitted curves. So instead of just looking at the
individual parameter estimates from two pages ago, you can just put them together and plot
the quadratic curves for your four different groups. That's graphically much more
informative. And especially if you want to report to a clinician, he or she would probably not
understand what all these parameters mean, but they do understand what they see here.
Now there's one additional issue: age is in that model, so we need to plug in a value for age. And here the median age at diagnosis was used, and as far as I remember, it's about 72. So for
people of 72 years old at the moment of diagnosis, if you look at what was their past
evolution in PSA, so going back in the past, for controls it was this evolution. And you see it's
similar. It's decreasing. Going back in the past, it's decreasing, but it's a very moderate
decrease. And of course, a key question will be, is this significant? In the next chapter, we
will have to test that if we want to know that. In the benign prostatic hyperplasia case, the
decrease is more severe. So it means historically the increase was more severe. Is it
significant? We will test that. In the cancer groups, you see that it is even more severe. And then
there's curvature. So the cancer groups really increase much more quickly with some
curvature. And also that's something that we will want to compare to the benign prostatic
hyperplasia group because that's exactly what we wanted to know. And when I introduced
the dataset yesterday, remember I said what we want to know is whether we can distinguish
patients who have cancer in their evolution from patients who have benign prostatic
hyperplasia. So the first thing I want to do then is compare these cancer groups with the
benign prostatic hyperplasia and see if this difference that you seem to see here, whether
they're really significant or not. You had a question? (A question about the direction of the time axis in the plot.) Yes, but that has to do with the time scale that was chosen, which is the time in years before the time of diagnosis. So you can completely mirror the picture and have your zero at the end; maybe you should then label the axis with minus 25, minus 20, minus 15, and so on. No problem. All right. (Another question, about the increase at the far end for the cancer groups.) That's artificial. And that's what I was
saying this morning. That's completely a result of the quadratic profile. So and that's also the
reason why they wanted to further fine-tune their model and that they wanted to use, you
know, a better model for this type of data. Now, we should not overemphasize this. If you
look in your original data, you have very few observations within the interval 20 and 25
years. So if you would add confidence intervals, they would be pretty wide there. Here at the beginning, you would have relatively narrow confidence intervals, but not at the end, because there are very few people who have been followed for over 25 years. But you're right, this is an artificial effect of the quadratic function that was assumed. You know how a quadratic function behaves: if you need that curvature here to describe the evolution, then if you extrapolate sufficiently far, there comes a point where it starts increasing again, and that's not realistic. The only thing here is that, I mean, it's not completely extrapolation, because there still is some data. But if there would be no
data, because then we all know you should never extrapolate. You should never go to a
region where there's no data. But here there's still a little bit of data. And you know that this
is not realistic. But it's not a major thing because if you, once again, if you look at how
uncertain that estimation is, even that, you know, it's not going to influence your
conclusions. But still, it would look, it would look better if you would not have that behavior
in that data. That's right. All right. Let's now do it for the rat data. So for the rat data,
remember we had the random intercepts, random slopes model. Here it is with a fixed
common intercept, then three fixed slopes. And now we fit that model with SAS, PROC
MIXED. It's a very simple model. And here are the results based on REML. So I have my fixed intercept, my three fixed slopes, then my covariance parameters, the residual variance, and the REML log-likelihood. There's only one thing which is a little bit strange, and that is this value here: an estimate of exactly zero, and no standard error reported. Of course, you could say maybe
there's very little variability between these rats. That can happen; in fact, if the slopes are all very similar and you would only need a random intercepts model, then you would expect the variance of the slopes to be very close to zero. But very close to zero is something different from exactly zero. Of course, you could say, we only see three decimal digits. Well, I can guarantee you, you can print as many as you like, it's going to be zero, zero, zero all the time. So it's exactly zero. And that's kind of strange. Why is that strange? Because it's only an estimate, an estimate obtained from optimizing a function, namely the REML objective function. What is REML? You have the likelihood of these error contrasts; it's a function and you look for the maximum. And you then happen to end up in exactly zero.
That's kind of strange. If you carefully think about this and you want to understand what is
happening, what I think is happening, is the following. This is my objective function. This is
my parameter d22. And I need to maximize that function with respect to all the parameters,
but also with respect to d22. And d22 is the variance of my random slopes. So it needs to be
positive. So I'm only maximizing. Over this region. Because below zero, that's a forbidden
region. So I have a restriction in my parameter space telling SAS you cannot allow negative
values there. Now, if this is the situation, the way I have drawn it here, then, of course it will
end up in exactly zero. Because you're maximizing this function in this region and it happens
to have its maximum exactly at that boundary value here. Now, that's just my guess, and you don't have to believe me. But I can prove it. How would I prove it? Well, there's only one way to prove it, and that is to tell SAS to allow negative estimates, to just allow d22 to become negative. If I allow that, and the result is that I get a negative estimate with a higher likelihood value here, then that's the proof of what I just hypothesized. Now, you might wonder, what is he doing? Has he completely lost his mind? Because you then have a negative variance for the random slopes. What on earth does that mean? Well, wait for that. Because we first need to see where it ends up, we first need to see
where it ends up, what is really happening. And you can do that: in SAS, you can add the NOBOUND option in the PROC MIXED statement. So in your first line, you just add the word NOBOUND, meaning no boundaries: remove all the boundary constraints on the parameters, so that the estimate is allowed to become negative. And the very fact that SAS allows you to do that should already suggest that it's not such a stupid idea. Of course, then you're stuck with the interpretation, but that is something we will worry about later. So let's first do it.
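A rough sketch of what that looks like for the rat data (the data set and variable names are assumptions; the model has a common fixed intercept, three group-specific fixed slopes, and a random intercept and slope per rat):

  proc mixed data=rats method=reml nobound;       /* NOBOUND removes the positivity constraints */
    class rat;
    model y = t_low t_high t_control / solution;  /* common intercept plus three group-specific slopes */
    random intercept t / type=un subject=rat;     /* random intercept and random slope, unstructured D */
  run;

Here t is the time variable and t_low, t_high, t_control are t multiplied by the three treatment-group indicators.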
Here you see the results. So the first column is the column from the previous page. The
second column are the new results. And you immediately see what happens: a negative estimate, and a value for the log-likelihood which is less negative, so it means it's higher. So yes, what I've been drawing here on the blackboard is exactly what's happening; this confirms it, the likelihood increases. In the book by Brown and Prescott, you find a whole section on this, where they write the following: the usual action when a negative variance component estimate is obtained for a random coefficient would be to refit the model with the random coefficient removed. And then they continue and they start talking about PROC MIXED. And they conclude by writing that the recommended action is then to remove the random coefficients one by one, in decreasing order of complexity, until all variance components become positive. Does that solve the problem? Yes, because you
throw away what you don't understand. That's what you do. What I'll try to convince you of
is that you have to be careful with this. I mean, this is typically what you often see in
statistics books which are written as a cookbook, where you get recipes. This happens, do
that. If you don't think, just do it. Now, that's not how statistics works. I mean, if you get that
negative estimate, it must tell you something, either about your data or about your model.
But it doesn't happen by pure coincidence. So let's try to think about what happens here. What does that negative variance component mean? Well, if I want to understand it, I have to look at the model that I have fitted, and the model that I have fitted is a marginal model. So let's look at the marginal model. Where did that parameter appear? It appears in the covariance function, and definitely also in the variance function. And the variance function, we calculated that earlier today: the variance function is a quadratic function of time with quadratic term d22.
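Written out for this random-intercepts, random-slopes model (same notation as this morning):

$$ \mathrm{var}\big(Y_i(t)\big) \;=\; d_{11} + 2\,d_{12}\,t + d_{22}\,t^2 + \sigma^2, $$

a quadratic function of time whose curvature is governed by $d_{22}$, so a negative $\hat d_{22}$ corresponds to a variance function with negative curvature.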
So, if you plug in the estimates that you get, you get this variance function here for the rat data. It seems to suggest that, if the variance function is a quadratic function, it's one with negative curvature. And you don't have to believe me, but again, I can prove it. Because we have
completely balanced data, so we can very easily look at the variance function in this data.
And that's what you see here. So, this is just calculating simple variances at the various time
points. And then you get this function. Now, I'm not claiming here that this function is a
perfect quadratic function. But it's quite obvious that, if you use a model that assumes this function to be quadratic, it should be a quadratic function with negative curvature. And that's of course what this model is telling you: that the variance function looks like this, that it first
increases and then it starts decreasing. What would happen if you follow that pattern? If you
follow that recipe from the previous page, you would have kicked out that random slope.
You would have only random intercepts in your model. And you would be fitting a model
with constant variance function. So, you would assume that this whole function here is just
constant, flat. But the least thing you should do is first test for that. Test if that's acceptable.
Maybe it is. You cannot just do it. You have to test for that. And see whether that's a realistic
assumption and an acceptable assumption. And whether that's supported by your data. This
is one of the examples I was referring to this morning when I said every now and then you
will be confused when you see some of the results in your outputs. Why are you confused?
Because you get negative estimates for something you were interpreting as the variance of
the random slopes. And that means you were interpreting these results in the hierarchical
model. You are not fitting a hierarchical model, you are fitting a marginal model. And in that
marginal model, there is nothing wrong with that negative estimate. The marginal model only requires that this marginal covariance structure here is positive definite; that's the only thing you need. And that D matrix here does not necessarily have to be positive definite, as long as that marginal covariance structure is okay. If that's a valid covariance matrix, D can be anything. So it means that, strictly speaking, if you look at this problem purely mathematically from the marginal point of view, the parameter space that you can allow in the marginal model is larger than what you have to allow under the hierarchical model. The hierarchical
model puts more restrictions on your parameters. As you clearly see here. So, is it now a
problem that you have that negative estimate there? It depends. If you really believe that
hierarchical model from this morning, then yes, it's a problem. Because this shows that the
hierarchical model you have in mind is not good. It's not a correct model. But if you say, I'm
happy with my marginal model, because the only thing I care about is to say something about the average evolution in the three different groups, and to compare these average evolutions, then all you need is a good marginal model. Then the only thing you worry about is that your marginal model correctly describes what happens in your data, and then the marginal model is fine. And then maybe you should always just put in the NOBOUND option, to allow more values for your parameters, to allow the parameters to go into regions where they otherwise would not be allowed to go. This way you can further improve the model that you obtain. This will have serious consequences when we come to testing in the next chapter, because you really need to carefully think about it: do you really want to stick
to that hierarchical interpretation? Do you really believe that the hierarchical model was
correct? If so, then you have to use the restricted parameter space. But if you say, I don't care
whether that hierarchical model was correct, all I care about is the marginal model, then just put in the NOBOUND option. And that will lead to different testing results and things like that. Any questions about this? So what you see here is not an exceptional case; it happens every now and then. And typically, if you do not put in the NOBOUND option, in your log window you get the message that the final Hessian is not positive definite, and if you look into your output, you see that some of your parameters are estimated as zero. And of course it's tempting to do what they were suggesting: to say the variance is so close to zero that I can just drop these random slopes. But then you have to realize that you really reduce your model drastically,
and maybe too drastically. And that this is not necessarily the realistic model that you then
have to adopt. All right. That's with respect to estimation. Now it's time to start talking
about inference. Because we've seen all these estimates. We've seen these average curves.
But of course in practice what you really want to do is come to comparisons of groups, say
something about precision of your estimates and things like that. So we will do that here in
this chapter. And we will first look at inference for fixed effects. Afterwards we will talk
about inference for variance components. And then at the end we will say something about
information criteria. Because testing hypothesis is comparing models. And we've done that
already this morning with likelihood ratio tests. So there are many ways to compare models. But sometimes people also use information criteria, like Akaike's information criterion, Schwarz's Bayesian criterion, and things like that. So that's why I will also briefly say something about that in the last section. Not very extensively, but something. Let's first take a look at
the fixed effects. Well, this is my estimate for the fixed effects: weighted least squares, conditional on alpha. Conditionally on alpha, this estimate is multivariate normal. And what is the mean? Well, the mean is easily calculated. This is constant, and this here is constant, so I only need to take the expectation of Y. The expectation of Y is Xi beta, and then you see that this whole first part vanishes with the second part, and beta is still there. So I have an unbiased estimate. That's good news. What is the covariance? Well, this is constant, so it comes up front and it comes at the end again; then I have the covariance of a sum over independent subjects, which is the sum of the covariances; again I have a constant, multiplied by the transpose of that constant, and in the middle I have the variance of Y. What is the variance of Y? It's V, and that's the inverse of W, so those two vanish, and then this term vanishes with that term, and this is what I have left. So it's very easy to do the calculations.
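In formulas, with $W_i = V_i^{-1}(\alpha)$ and conditionally on $\alpha$:

$$ \hat\beta \;=\; \Big(\sum_i X_i^\top W_i X_i\Big)^{-1} \sum_i X_i^\top W_i Y_i, \qquad E\big(\hat\beta\big) = \beta, \qquad \mathrm{cov}\big(\hat\beta\big) \;=\; \Big(\sum_i X_i^\top W_i X_i\Big)^{-1}, $$

where the covariance expression uses $\mathrm{var}(Y_i) = V_i = W_i^{-1}$, i.e., that the covariance model is correct.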
Note that I have multivariate normality due to the fact that my data were normally distributed. But again, if that would not be the case, then due to the central limit theorem, I can still have asymptotic normality.
Because I have here a summation over independent subjects. So it looks all very
straightforward. Only, all the calculations I've done were conditional on knowing alpha. And
in practice, you don't know alpha. So what you do, you just take your estimate for alpha. You
plug it in. So you will have an unbiased estimate here for your parameters. And then in the
standard errors, in the covariance matrix, you just replace alpha by that estimate. That's
what you do. The first type of test you can do then is a Wald test. What is a Wald test? Well, in the univariate context, the Wald test is just that you take a parameter estimate, minus its true value under the null, and you divide by the standard error of that parameter estimate. That's a Wald test, and you compare it to a standard normal distribution. That's what you often do in a maximum likelihood context, where you create what are called Wald statistics. Okay? So that's the Wald test. But
you can do that more generally. Sometimes you have multivariate hypotheses, or linear combinations of parameters that you want to test. And in general, you might want to test a
linear hypothesis about the entire vector beta. So we denote that here as L times beta equals
to zero versus the alternative that it's not equal to zero. And that can be a set of linear
hypotheses. All simultaneously. That you can test. And you will see examples of that later on.
What would then be your test statistic? Well, this is now multi-dimensional, so you cannot just do this simple calculation here. What you do is essentially square it: you take your estimate squared, divided by its variance. But of course, if you work with matrices, that becomes your estimate twice, with the inverse of the covariance matrix in the middle. And where we had a normal distribution before, you now get a chi-square distribution. So the asymptotic null distribution of that test statistic is a chi-square distribution, and the degrees of freedom are the rank of L. So if, for example, you just want to test the difference between two slopes, the rank of L would be 1, because the contrast would be slope 1 minus slope 2. So that's just one linear hypothesis of the parameters.
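Written out, the Wald statistic for $H_0: L\beta = 0$ is

$$ W \;=\; \big(L\hat\beta\big)^\top \Big[\, L\,\widehat{\mathrm{cov}}\big(\hat\beta\big)\, L^\top \Big]^{-1} \big(L\hat\beta\big), $$

which asymptotically follows a chi-squared distribution with $\mathrm{rank}(L)$ degrees of freedom under the null hypothesis.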
All right. There's one problem with that. And that is that, as I indicated
before, what you do is you do all the calculations. And then at last instance, you replace
alpha by alpha hat. Now, that means that the standard errors that you use here in your test
statistic, they reflect the uncertainty that you have about beta. Assuming that you would
know alpha. But you don't know alpha. And you need to replace it by an estimate. So it
basically implies that... You can expect that the standard errors that we use, that they are too
small. Because they do not reflect all the uncertainty in the whole estimation process. It only
reflects the uncertainty about beta. Not the uncertainty about alpha. Of course, if you have
huge data sets, that's just a small difference. Because then you have consistency in all your
estimates. And your alpha hat will be very close to the true value. But in finite samples, not
too large, that can make a difference. Then what do you do? Well, people have come up with
a nice solution to that. The idea is the following. If we now use, let's say, here this normal
distribution. As a reference distribution. As a reference distribution for my test statistic. If
now there is more uncertainty in the whole estimation process. Than what is taken care of
by that normal distribution. What you should do is replace that normal by another
distribution which has bigger tails. Good guess. So what we will do is we will just stick to
that same type of test statistic. A quadratic form of our estimates, let's say. Or linear
hypothesis, linear combination of the estimates. And we will now, so it's still the same kind
of situation. The test statistic will still be the same, but we will replace that normal distribution by a t-distribution; in the multivariate setting, where we had a chi-squared before, it becomes an F-distribution. So it's a very natural idea: if there is more variability than what is represented by the normal, just take a reference distribution with bigger tails. The idea is natural and easy, but it's only the start of the whole problem, and you should realize it's purely ad hoc. I mean, in linear regression you can prove that you have t-tests and F-tests. Not
here, you cannot do that. But in linear regression the situation is the same: why do you have a t-distribution there? Because you have to estimate the variance; if you would know the variance, it would be a normal distribution. So it's exactly the same idea. And in linear regression you can mathematically prove that the correct distribution then is the t. Here you cannot prove it. Why not? Because you don't have exact distributions to work with. Yes? (Question: do you always get a t-distribution?) In most of the cases, yes. By default in your output you will get t-distributions: t-tests, F-tests, and things like that. Now, especially due to the fact that you cannot prove it and derive it
mathematically. You are stuck. I mean the idea is nice that you will replace the normal by a t.
But as you all know there is more than just one single t. There is many t's. There is an
infinite number of t's. Which one will you then choose? So the choice you will have to make
is how many degrees of freedom will I use? And since you cannot mathematically derive it.
The only thing you can do is come up with like approximations. Based on simulations. Based
on some analytical results. But approximations. And there are many methods that have been
proposed in the literature, some of which are mentioned here: the containment method, the Satterthwaite approximation, and the most recent one, the Kenward-Roger approximation. They all do something different, they all lead to different degrees of freedom, and that means they lead to different p-values. Which one to use? That's difficult.
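To make that explicit (with $\nu$ denoting whatever denominator degrees of freedom the chosen method produces):

$$ F \;=\; \frac{\big(L\hat\beta\big)^\top \Big[\, L\,\widehat{\mathrm{cov}}\big(\hat\beta\big)\, L^\top \Big]^{-1} \big(L\hat\beta\big)}{\mathrm{rank}(L)} \;\approx\; F_{\mathrm{rank}(L),\,\nu}, $$

so the test statistic itself stays the same and only the reference distribution changes; the methods differ only in how they approximate $\nu$.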
Now we are not going to spend much time on that. Because you can spend the whole day
discussing all the theory behind these methods. And then you get some feeling what is best
in what situation. Nowadays, Kenward and Roger is probably the best approximation; it also gives you a small-sample correction. But the good news is, and that's the reason why we don't spend too much time on this, that in our context (it's not the case in general, but in our context) the degrees of freedom are typically relatively large. So whether you use, let's say, 70 degrees of freedom, or 72, or 76, you hardly notice the difference in your p-values. But there are other contexts, where you have ANOVA models
with random factors, where all the observations in your data set are correlated with all other observations. In cases like that, the degrees of freedom that you would need are much lower, somewhere like 4 or 5 or 6, and then it does make a difference. But not in our context. And the reason is that we have independent replication in our data sets: we have independent subjects, and we typically have a reasonable number of subjects, so these degrees of freedom are relatively large. So we don't worry about that. If you do worry, use Kenward-Roger; it's the best, and it's available in SAS.
So there's just an option to be specified to do it. (A participant is confused: in an F statistic you see one chi-squared variable divided by its degrees of freedom, but where is the other chi-squared variable?) There are two: the first one is this quadratic form here, and what is in the denominator is this part here. But maybe forget what I've said, because that's going to confuse you. You're still thinking about the theory in linear regression, where the F statistic can be written as a ratio of chi-squared variables. That's not what we prove here. You still use that same test statistic, and you say: the null distribution we used before has tails which are not sufficiently heavy, so we just replace it by something with bigger tails. It's purely ad hoc. The normal is replaced by a t, so the chi-squared is replaced by an F. That's the only thing.
(A participant asks, half jokingly, whether one cannot simply keep the smaller p-value.) Oh, if you prefer a small p-value, I would propose to always report zero; that's the smallest p-value. No, but it needs to be correct. I mean, if that p-value is systematically too low, because it does not take into account all the variability in the estimation process, then you know that you're just fooling yourself: you think you're testing at five percent, but you're effectively testing at, say, four and a half percent. Well, you can do that, but it's the same issue: either you adjust your p-value or you adjust your critical value, but eventually, if you want to do the right test, you should get the right result. Yes? (Another question, about where the estimated denominator comes from.) But it's exactly what I said
before. Think of the univariate case: you have a normal, and you want to look for something that looks like the normal, but with bigger tails. The first thing that comes to mind is the t, so let's take the t. That's what happens. And you know that in simpler models, like in linear regression, it really is a t. So it's not such a stupid choice, because you know that in simple cases you have the t as the exact solution, and you can use it here as an approximation. That's what happens. All right. Let's apply this
in an example. For the prostate data, remember this was our model. And we will now test some hypotheses, some of these linear hypotheses. And you can do that in contrast statements. In a contrast statement, you can specify such a linear combination of your fixed effects, and you can specify
several linear combinations. And we will do that here. So for example, I want to test whether
local cancer cases evolve differently from metastatic cancer cases. So that I have these two
cancer groups. It's a natural question to wonder whether there's a difference between these
two cancer groups. So how would I test that? Well, if they would be equal in their evolution
on average, they would have the same intercept. They would have the same linear time
effect. And they would have the same quadratic time effect. So the null hypothesis I need to
test is whether beta 4 is equal to beta 5, beta 9 is equal to beta 10, and beta 14 is equal to
beta 15. And you immediately see that you can write that as L multiplied by beta is equal to
0 for a particular matrix L. So it fits in our general theory: it's a linear hypothesis of the parameters, and the dimensionality is three; I have three of these linear hypotheses. So the numerator degrees of freedom in our F test
would be 3. The denominator degrees of freedom, we would have to approximate that with
any of these methods that have been proposed. How would we do that? Well, in the SAS
program, you would add. So here you don't see the full program, because most is just the
same as what you've seen before. You only see the model statement, because that's what I
need to build my contrast statement. So you add a contrast statement, and you give the hypothesis a name such that you can recognize it in your output. That's necessary because you can specify as many contrast statements as you want in one run; in your output, you should be able to recognize which hypothesis you've tested. So here I called it 'local/regional cancer is equal to metastatic cancer'. And then you specify, divided by commas, these linear hypotheses. For the group effects, it's the difference between the third and the fourth parameter. For the group by time effects, it's the difference between the third and the fourth parameter. For the group by time squared effects, it's again the difference between the third and the fourth parameter. Each time separated by commas, meaning you start a new row in that L matrix. And I've now also added the option chisq to get also the Wald tests, because by default you would only get the F test; just to see how different it would be, let's also do the chi-square test. Here I've used the Satterthwaite approximation for the degrees of freedom. If you would want to do Kenward-Roger, you would replace satterth by kr. So this is just for the illustration, Satterthwaite: you put in ddfm=satterth on the model statement, and that's where you specify the method that you want to use for the denominator degrees of freedom.
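A sketch of the corresponding statements (to be added to the program sketched earlier; that the group levels are ordered as control, benign, local, metastatic is an assumption):

  model lnpsa = group age group*time age*time group*time2 age*time2
        / noint solution ddfm=satterth;          /* Satterthwaite denominator degrees of freedom */
  contrast 'L/R cancer = Met cancer'
           group       0 0 1 -1,
           group*time  0 0 1 -1,
           group*time2 0 0 1 -1 / chisq;         /* also request the Wald (chi-squared) test */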
So that's the way you do it. SAS then does the calculation of the degrees of freedom, and this is the additional output you would get: a table with the contrast statement results. Yes? (Question about whether the F test is still produced when you ask for the chi-square test.) By default you get your F test; on top of that, I also get the chi-square test, because I added the chisq option. And yes, the F test is always there; otherwise you would indeed not need a method for the degrees of freedom. So an
additional table in the results section: the name of my hypothesis, three numerator degrees of freedom, because I have three hypotheses that I test simultaneously, and the denominator degrees of freedom, which are approximated with the Satterthwaite method to be 24.4. So it's an F-distribution with 3 and 24.4 degrees of freedom that we use in the
calculation of the p-values. Here you see the results for the chi-square test, here you see the
results for the f-test. So, sorry, these are the two test statistics, these are the two p-values.
And you see here in the p-values that yes, there is some difference: with the chi-square test your p-value is smaller. In the chi-square test you essentially use an infinite number of degrees of freedom, while the F-test uses only 24 degrees of freedom. Why only 24? Well, this is not such a big data set, it's only 54 subjects. So it's not a huge data set. So contrast statements can be used to test whatever type of
hypothesis. Now, we've done that and we have started with the big model and we have used
contrast statements to like, in a backward selection, reduce the model step by step. And here
you see the results of all these tests that were performed. First of all, we tested whether the
age by time squared interaction was significant. It was not significant. Whether the age by
time interaction was significant. Not significant. Next step was whether the quadratic time
effect, the amount of curvature in the two cancer groups is the same. I mean, just from
looking at these plots that you've seen, the amount of curvature was very similar. So, let's
test that. And indeed that was accepted. The quadratic time effect is not needed for the non-cancer groups: not needed for the control group, not needed for the benign prostatic hyperplasia group. The linear time effect is not significant for the controls. So it means that
the controls basically stay horizontal. Now, as we mentioned, I think, this morning, if you do
it in a backwards selection, you're never sure that the final model is acceptable. So what we
want to do now is test all these hypotheses at once. Because I want to make sure that my
final model is an acceptable model. How will I do that? With a contrast statement. And here you see it. So you see here the null hypotheses that need to be tested, all the previous hypotheses together. You can rewrite that in the format of L times beta equal to zero, and that fits in the contrast statement. So here you see your contrast statement, again still with the Satterthwaite approximation. And what you see is that you have here, first for the age by
time interaction, a one. Because there is only one parameter assigned to that. Age by time
squared interaction also a one. For the group by time, the first parameter equal to zero
means that there's no linear time effect for the controls. For the group by time squared
interaction, the first parameter equal to zero means there's no curvature in the control group, and the second parameter equal to zero means there's no curvature in the benign prostatic hyperplasia group. And then, also in the group by time squared interaction, the difference between the last two groups means that you're testing
here whether the two cancer groups have the same curvature. And you will see here
whether you now look at your F-test or your chi-square test, both are far from significant; they don't even get into the neighborhood of significance. So I'm happy to conclude that the final model is an acceptable model. And it means that from now on, I can use this model. It's a simplified version of the model, where I still have my four intercepts, my age correction, a linear time trend for the benign prostatic hyperplasia group and for the local and metastatic cancer groups, and only one quadratic time trend, which is the same for the local and the metastatic cancer groups. So it's a simplified version. And from now on,
whenever we will use the prostate data, we will use this model. So you will see examples
illustrated with this model. You can also use an estimate statement. With a contrast statement, you only have a test: you get a p-value. If your L is just one-dimensional, just one row, you can specify it also in an estimate statement, and then you get a point estimate for that linear combination of the parameters. But that's only for one-dimensional matrices L, so for single rows. It works in exactly the same way as the contrast statement.
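For instance, to get a point estimate with a confidence interval for a single such linear combination, a sketch would be (hypothetical label, same assumed ordering of the group levels as above):

  estimate 'Curvature, L/R minus Met cancer'
           group*time2 0 0 1 -1 / cl;            /* one-row L: point estimate plus confidence limits */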
(Question: so in an estimate statement you cannot use commas to specify several rows?) Indeed; then you use a separate estimate statement for each row. Exactly. Now, there's one more thing that we should discuss. It is quite obvious that in whatever we do here, you need that covariance matrix. And I think you already have the feeling that specifying a good covariance structure is not an easy task. So what we will now do is try to investigate how sensitive our results are with
respect to possible misspecifications of our covariance structure. Because, once again, in the majority of the cases we're only interested in the mean, because these are the general trends, and we want to compare them, we want to do tests, and things like that. And yes, we do need that covariance structure, but we hope that our results are not too sensitive with respect to how we model that covariance structure. But let's take a look at
that. So what I'll now do is I will look again at the calculations that we've done relatively
quickly on the previous page, and I will look where exactly I need that covariance matrix. So, this is my estimate for my fixed effects. Of course, the covariance matrix gets in here through the W, which is the inverse covariance matrix. With respect to the distribution, we've already mentioned that: no matter what covariance matrix you would plug in, this expression is always asymptotically normal, thanks to the central limit theorem. So we don't have to worry too much about the distribution of that thing. What about the mean? I
said before it's unbiased. But do I require a good covariance matrix in order for my estimate
to be unbiased? Well, let's take a look at that. So I take the expectation of my estimate. Since everything here is constant, all I need to do is take the expectation of Y; I replace that by Xi beta, and then this whole part vanishes with the first part, and I have beta left. What did I use in these
calculations? The only thing I used was that the expectation of Y is Xi beta. So it means that if
my covariance matrix would be completely wrong, I still have an unbiased estimate. It also
means that if you would just do plain linear regression, you would still have a good estimate.
Just make that covariance matrix, sigma squared times the identity matrix, and you will have
an unbiased estimate. Because whatever matrix you plug in, it's unbiased, what you get. You
don't need that covariance matrix that you use in the calculations to be correct, in order for
the results to be unbiased. That's nice. Okay? Because that's hopeful, because it really means
that we will not be too sensitive in our results. What about the covariance matrix? Well, let's
see how we did the calculations. The covariance matrix of beta hat, was, you've seen that
expression before, there are many things constant here, so they come up front, then you
have here the covariance of Y, then that constant is transposed again, then the variance of Y
is V, it's the inverse of W, then this term vanishes with that, and I have this expression. And
that's what I used in the Wald test, and the t-test, and the F-test, and all these tests that we discussed. What did I use in my calculations? That the variance of Y is equal to V. So this is the sensitive place. Here is the place where I used in my calculations the assumption that the covariance matrix of Y was correctly modeled, the assumption that the variance of Y is given by this expression, Vi. So what does it mean? It means that if you misspecify your covariance matrix, you still have an unbiased estimate, but your standard errors are wrong. And that's not such a nice message, because if your standard errors are wrong, test statistics would be wrong, confidence intervals would be wrong, p-values would be wrong, and that's what
you're interested in. Is it not possible to solve that? Well, let's take a look. I need that assumption about the variance-covariance matrix to go from here to here. But if I don't want to make that assumption, then why not use that top equation? Keep that expression for my covariance matrix, and don't make the step to the next equation. That's a very good idea, but as it stands it's useless. Why? Because it contains the variance of Y, and I don't know the variance of Y. But that's not the whole story. What do we do in statistics when we want to estimate the variance of Y? We use squared residuals. So instead of replacing the variance of Y here by what we assumed in the model, let's replace it by squared residuals. And that means that, as an estimate of the variance
of Y, you take your Y vector minus the fitted average, squared. Is that a good estimate for Vi? Well, it's going to be an unbiased estimate, provided that the mean was correctly specified. I mean, in order for this to be unbiased for Vi, I need that Xi beta hat is a good estimate for the mean. But that is something that, by the way, I already needed in the previous step: to go from here to here, I already needed that the mean was correctly specified. So that's not a new assumption. So where does it bring me? It
tells me that I can allow myself to misspecify my covariance matrix. You use it in all the calculations, and it will lead you to estimates. But even if that covariance matrix was wrong, I would have an unbiased estimate, and I can still have good standard errors by using the first expression here, but then with the variance of Y replaced by the squared residuals.
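In formulas, with $r_i = y_i - X_i\hat\beta$ the residual vector of subject $i$:

$$ \widehat{\mathrm{cov}}_{\text{robust}}\big(\hat\beta\big) \;=\; \Big(\sum_i X_i^\top W_i X_i\Big)^{-1} \Big(\sum_i X_i^\top W_i\, r_i r_i^\top\, W_i X_i\Big) \Big(\sum_i X_i^\top W_i X_i\Big)^{-1}, $$

where $r_i r_i^\top$ takes the place of the assumed $\mathrm{var}(Y_i) = V_i$ in the middle part.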
Well, that estimate for the covariance matrix is called the sandwich estimator, for obvious reasons: the outer parts are the two slices of bread, and you have something in between; you can call it whatever you like. In the literature, people call it the sandwich estimator. It's also called the robust variance estimator. Why robust? Because it's correct even if your covariance matrix was misspecified. Yes? (Question: in linear regression, isn't this called the White estimator?) Well, yeah, that's right. White developed a whole theory of what happens to your estimates if your model was misspecified. Exactly, that's what this does. Now, once you get to this place here, you can
still use your t-tests, your F-tests, your Wald tests as before, but then the only thing you do is replace the standard errors by the robust ones. The previous standard errors we call model-based, because they're only correct if the covariance model you assumed really is correct. People sometimes also call them naive, because they're only correct if you're naive enough to believe that the model was correct.
Okay? Yes? (Question: is the W matrix also estimated from the data?) No, W is just a matrix that you use in all the calculations. No matter what W you have, beta hat is unbiased and its covariance matrix is given by this expression; the only thing you still need is an estimate for the variance of Y, and that's where the squared residuals come in. But you cannot afterwards replace the W that you used in all your calculations by something based on those squared residuals. So the only thing you do is correct the standard errors: you keep the estimates, and you correct the standard errors. (So W does not need to be the inverse of V?) It is the inverse of the V you assumed, but that V may be misspecified. (So should you not re-estimate W from the observed variability as well?) Well, that would be a clever idea, only it's impossible to do, because look at that estimate here: it's a column multiplied by a row. You cannot invert it; it has rank 1. So you just cannot use that to update your W. The only thing you do is keep the W, knowing that it may be misspecified. But that's not important, as long as your final inferences are okay. And what do you need to get your final inferences okay? Unbiasedness you already get for free; the only thing you need to do is update the standard errors. The standard errors then still contain the possibly wrong W, but combined with a correct estimate of the covariance of Y, and that's okay. But it is strange when you see it for the first time, I agree.
Now, this is just one example of a much more general theory, which is called generalized estimating equations. And you will discuss that in part two, in the context of discrete data: in the context of generalized linear models, you can apply the same principle for logistic regression, for Poisson regression. You just plug in a working correlation matrix, you know that it may be wrong, and afterwards you correct the standard errors. But here you see almost analytically how it works and where it comes from. This is a very strong theory, developed halfway through the 80s by Liang and Zeger in Baltimore. Since then, it has had many applications; people use it all the time, because it's very powerful: you can almost forget about the covariance structure, just do whatever you like, and fix your results afterwards. There is one side remark, though.
Because the extreme point of view would be that you can just do OLS. If you can forget about the covariance structure anyway, you could say: well, let's just do plain linear regression, OLS, and then fix our standard errors afterwards. And that would be correct. Only, in terms of efficiency, it is still important to specify a reasonably good covariance structure. I mean, you can have many testing procedures and parameter estimates that are all unbiased and correct, but one can still be more powerful than another. And it turns out that if the covariance matrix that you use in the calculations is better, then you gain power. So, in terms of efficiency, it's worthwhile spending a little bit of time to come up with a good covariance structure. But
you should not take it too far and worry too much about it, because you can fix your results
afterwards. This is actually the final argument why we don't worry too much about the serial
correlation, because, first of all, it's only a small part of the variability structure, and
second, even if it were misspecified, here we see that it would not play a major role. And if
you would worry about it, you can still fix your results. There's one additional side remark
here. You have to be careful if you have missing data. You will discuss that in full detail
when you talk about missing data issues, but if you have missing data, then this approach,
GEE, generalized estimating equations, is only valid if the reason for the missingness is
completely unrelated to your outcomes. As soon as there might be some relation, you're
screwed, and you have to be careful. Yes? Do I understand correctly that the estimates of the
betas actually remain the same? Yes. They remain the same. It's something you do afterwards. So
you first calculate your estimates and then you correct your standard errors. You just use
other standard errors. So the output you see is already corrected? Pardon? The output you
see, is it already corrected? No. By default, what you get are the model-based ones. You can
ask for the correct ones. Oh, okay. You will have to ask for them. And how do you do it? Well,
I will show you. Ah, I see. On slide 5 you had... that's a long way back, I can feel it. Yes.
There we had two versions of the estimates. Ah, but the structure is the same, yet I have
different estimates, because I have extended my parameter space? Yeah, okay. And then you
have different estimates for
the fixed effects as well? Yes, because the fixed-effect estimates depend on the covariance
parameters. But not here? How is that? That has nothing to do with what we have now
described. That was just because there we allowed more values for the
parameters than what we had before. Before you restricted this d22 to be positive. You
really said it has to be positive. And now you allow negative values. It's still the same
model, but with a more extensive parameter space. So the parameter space is larger, and
therefore you can find a model that
better fits your data. So you can... But the model structure for the random effects? Well,
then it would be different. I mean, the structure for the random effects, the covariance
structure, here remains the same. Of course, you can start with a different model, with a
different covariance structure, and then also you would get different
estimates. But the discussion that we now have is that you have made a choice about the
covariance structure. You've used that all the way through your calculations. You have
estimates. And you know that the standard errors that you then obtain are wrong if the
covariance model is not correct. And now what you say is: let's replace these standard errors
by others which are no longer assuming that the covariance matrix I had in mind was the
correct one. Yeah, okay, good. I was just wondering about the mean and the estimates of the
mean. The estimates of the mean would change if you change the covariance model. But not
here, because here we only change the standard errors; the estimates themselves stay the
same. They are fixed. They are always unbiased. So they coincide with the other estimates?
They will not coincide. But you can have many unbiased estimates. I mean, you can have
millions of unbiased estimates for one parameter, and they can all be different and all be
unbiased. And that's what happens.
So let's try this. And I'll do that for the prostate data. And then we will stop for today. So this
is my reduced model we had already seen before. And in PROC MIXED, by default, you will
get the model-based standard errors, so the naive ones, assuming that the model was correct.
If you want the robust ones, if you want the corrected ones, you should add the option
empirical to the PROC MIXED statement. And then what you see is the following. You see the
following table. And now here I've reported the two sets of standard errors next to each
other. The first set are the naive ones, the others are the robust ones. And note that this is something
that only is valid for the fixed effects. You don't have that for the variance components. This
is for the fixed effects. So here you see all the estimates. You have only one set of estimates
because they remain the same. And it's the standard errors that you need to fix afterwards.
So here you see the results. And then you see that it does make some difference. So, for
example, here for the intercept in the control group, the naive standard error was larger
than the robust one. But, for example, for the local cancer cases group, it's the other way
around. So you cannot say in advance that the model-based one will always be smaller than
the other one. It can go in either direction: the model-based one is sometimes too small,
sometimes too large, to be a good reflection of what the uncertainty about the parameters really is.
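To make that concrete, a minimal sketch of such a call is given below. The data set and variable names (prostate, lnpsa, group, age, time, time2, timeclss, id) and the exact mean structure are assumptions for illustration; the essential part is only the EMPIRICAL option on the PROC MIXED statement:

   proc mixed data=prostate empirical;
      class id group timeclss;
      model lnpsa = group age group*time age*time
                    group*time2 age*time2 / noint solution;
      random intercept time time2 / type=un subject=id;
      repeated timeclss / type=simple subject=id;
   run;

Nothing else changes: the fixed-effect estimates are identical with and without EMPIRICAL; only the reported standard errors, and the tests based on them, differ.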
Alright. Another example is when we go back to the growth data, the growth data that we
analyzed quite extensively this morning. Now I fit again my model 1, the model we started
from: unstructured mean, unstructured covariance. The most general model you could think
of. And what you see here are the estimates, then the naive standard errors, and, just for
fun, we also calculate the robust standard errors. And you see that there are quite some
differences. Now what does that mean? If the two sets are very close, it kind of tells you
that the model you assumed is OK. The further apart the two sets of standard errors are, the
more it suggests that the model you used is not OK. Because what you obtain by assuming
that the model is right, is different from what you
obtain if you do not make that assumption. And since here we have quite some differences,
like sometimes 10% difference in the standard error, maybe our covariance matrix was
wrong. But how can that be? It's an unstructured covariance matrix. But there's one thing
that we did assume. And that is that boys and girls have the same covariance matrix. And
that's not necessarily the case. So maybe that's why you need a correction in your standard
error. But can we check that? Whether the boys and girls have different covariance
structures? Yes, we can do that. The only thing we need to do in our SAS procedure, so this is
the same procedure as we used yesterday, the only thing is that I've added now group is
equal to gender. And there you specify that the variance components that are estimated
should be estimated separately for the boys and separately for the girls. And you will get two
covariance structures. Both unstructured, but with different estimates. We call that now
model 0, because it's even more general than what we had as original model 1. And you can
again use the likelihood ratio test to compare the models. And what turns out is that it's
highly significant. So the covariance matrix for the boys is significantly different from the
covariance matrix for the girls. And the nice thing is that, if you use model 0, the
estimates for all the fixed effects are still the same, and the standard errors that you get,
assuming model 0 to be correct, are exactly the robust standard errors that we got before.
So this shows that this correction, this sandwich estimator, is doing the right thing: the
standard errors that we get here with the correction are the standard errors that you would
have obtained if you had fitted the right model. So the correction is going in the right direction.
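For reference, a sketch of what such a model 0 fit could look like; the data set and variable names (growth, measure, gender, ageclss, idnr) and the saturated mean structure are assumptions for illustration. The essential part is the GROUP= option on the REPEATED statement:

   proc mixed data=growth;
      class idnr gender ageclss;
      model measure = gender*ageclss / noint solution;
      repeated ageclss / type=un subject=idnr group=gender;
   run;

GROUP=GENDER requests a separate unstructured covariance matrix for each gender, which is exactly the extra generality that the likelihood ratio test above weighs against the common-covariance model.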
I'm used to thinking that the robust estimates are robust to some assumptions, but that they
are not quite efficient. So doesn't that just mean that this is a correct model, rather than
the best model? The best model here is the one with the two covariance structures,
definitely. Because if you formally test it, so
you compare the model with two covariance structures, with the one with one common
covariance structure for the two groups, it's highly significant. So, in general, robust
standard errors are a little less efficient than...? Yes, yes. I mean, if your model is right,
then using that information in the calculations will lead to smaller standard errors. The
risk is that the model is not right, and that the standard errors are then wrong. Yes. And
can you then improve the model once you realize what the problem is, or can you... can you
see from this where the problem is in the model? Yes. It again depends on whether you have
missing data or not. Because if you
would have missing data, then this correction would not necessarily be correct. It can help
you in pointing you towards possible errors in your model. And then if you can improve
your model, then it might still be wise to use that improved model, rather than sticking to
the old, wrong model and only fixing the standard errors. So it's always the case: if you have a better
model, you gain. But the only message here is, you should not worry too much about that
covariance structure. Try to do as good as possible. Explore the data. Look at the covariance
structure. Explore the correlation structure. But the positive thing is that, since we already
realized that specifying that covariance structure is difficult in practice, it's reassuring to
see that, if we do have some doubts about our choices, we still have the possibility to
correct our inferences. And that is the main message. But you didn't show us how to do it,
did you? We don't have the code for that. You only have to add the EMPIRICAL option. Ah.
All right. Any further questions? Yes. I
was wondering whether this robust approach requires some sort of minimum sample size. It
would seem that with small samples you would end up with, let's say, estimates of your
covariance that are quite unstable. Yes, because the whole theory is also asymptotic. Yeah.
Have people worked on trying to figure out, OK, how far you can go? Oh, I'm sure people
have done that, because
this type of, these GEE methods, this robust inference, there's thousands of publications
where that has been studied and investigated in many different contexts, and people have
done many simulations to see how sensitive results are and things like that. But off the top
of my head, I cannot say exactly where you would have to look. Now, probably there's not going
to be an easy message, because it all depends on the actual setting in which you work,
because if you have more repeated measurements per subject, it's more tricky than if you
have smaller numbers of repeated measurements per subject, because the more
measurements per subject you have, the larger that covariance structure is, and the more
things can go wrong in that covariance structure. If you have fewer observations, then you're
more safe about it. So, the answer to your question is never going to be an easy answer,
because you always have to say: for this type of outcome, like continuous outcomes, with at
most so many observations per subject and so many subjects, you're safe. And then you still
need to, like, quantify how wrong your covariance structure would be, and then you would
have to try out different scenarios, if it's only a little bit wrong, or if it's very wrong. And you
would have to repeat that for binary outcomes and for counts and for ordinal outcomes, so
it's an endless story. So, there's no... I don't believe... I'm sure there is no easy answer to that,
or not a simple answer. There's not a unique answer to that. Was it a coincidence that the
naive standard errors under model 0 were exactly the same as the sandwich ones? Are they
analytically the same? Here they are the same, because we have completely balanced data.
No missingness. Everyone measured at exactly the same time points. So, here it's
analytical. It's analytically the same. It's a strong theory, because it really tells you you can
mess up the covariance. I mean, assuming you don't have missing data. I mean, that's the
critical issue. But let's assume you don't have any missing data, and you have a sufficient
number of subjects so that you don't worry too much about efficiency. What you can do is
just do ordinary least squares. So, going back to the discussion of yesterday, you can
basically do a two-sample t-test and correct your standard errors. And what you would get
as standard errors are the standard errors you would have obtained if you had done a paired
t-test. Because the treatment effect would be the same: your estimate for the treatment
effect is the same under the paired t-test as under the two-sample t-test. So, there you see
that the estimation is the same; it's the standard errors that need to be corrected.
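As a small worked version of that argument, with notation introduced here and assuming two measurements per subject, $y_{i1}$ and $y_{i2}$ for $i = 1, \dots, n$: the OLS, two-sample estimate of the treatment effect is

\hat{\delta} = \bar{y}_2 - \bar{y}_1 = \frac{1}{n} \sum_i (y_{i2} - y_{i1}),

which is exactly the paired estimate, so the point estimate cannot differ. Its true variance is

\mathrm{Var}(\hat{\delta}) = \big[ \mathrm{Var}(y_{i1}) + \mathrm{Var}(y_{i2}) - 2\,\mathrm{Cov}(y_{i1}, y_{i2}) \big] / n,

and the independence working assumption simply drops the covariance term; the robust correction puts it back, which is why the corrected standard errors match those of the paired t-test.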
But if you have unequally spaced measurements at different time points for different
subjects, that's another scenario where you would have to pay attention. So it does make a
difference. I mean, that you would be more sensitive if you have different time points for
different subjects than if you have the same time points for all the subjects. But the method
again is asymptotic. So asymptotically it works. How large your sample should be in order to
have an accurate approximation depends on so many different aspects. And one of these
aspects is whether or not your data are at fixed time points or not. And if there's many
different time points, your data set will have to be larger than if you have nicely balanced
data at a fixed time. Thank you. So there's numerous papers where people have done
simulations in all different kinds of settings to study how sensitive results are. But always
you end up with saying that you're pretty robust if you use that method. You really have to
mis-specify your model very badly for your results to be completely wrong. Yes? What about
outliers and influential observations? You just detect them and then you redo the
estimation? How would you... Detection of influential observations: but then you're talking
about influential subjects. What about influential observations? So that by itself is not an
easy issue. Influence can be on the level of an observation or on the level of a subject.
And that's two different things. Because you use the residuals in the variance? You would use
something more like a variance of Y hat instead of the residuals? Yeah. But of course, also
in the estimation of your betas you use the normality assumption, so also there you would
have influence of your extreme values. All of those. Because the beta is a maximum
likelihood estimate. Afterwards you plug in the value for alpha, but they are maximum
likelihood estimates, so they are also affected. I mean, you can see it in the expression:
it's basically weighted least squares. So let's say it's an average, and you know an average
is influenced by extreme values. So as soon as you have extreme values, they will influence
everything: your estimates and your standard errors. Yeah.
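As a side note on the tooling, and only as a hedged sketch: PROC MIXED has an INFLUENCE option on the MODEL statement that computes deletion diagnostics, either per observation or, with EFFECT=, per set of observations such as a subject. The data set and variable names below are hypothetical:

   proc mixed data=prostate;
      class id group;
      model lnpsa = group age group*time age*time / solution
                    influence(effect=id iter=5);
      random intercept time / type=un subject=id;
   run;

With EFFECT=ID, whole subjects are deleted in turn, so you obtain subject-level influence measures; without it, the diagnostics are observation-level, which is the distinction being made here.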
And a small question. In the formula for the standard errors, you have V. Are you going to
replace that V as well? No, you cannot replace it. Because if you would do that, if you would
replace the V by the squared residuals, you would put in a matrix of rank one. It's a column
multiplied by a row, and you cannot invert it. So you can never use it to calculate that W.
You're thinking of the estimated variance of Y? Yes, but I don't invert it. The W is
the inverse. Yes. And what you would then plug in for the variance of Y is the residual
cross-product; yes, it is possible to do that. So you're using the residuals as an estimate of
the variance and using that only in the correction? Yeah. Any other questions? All right.
Well, if not, I think we can stop here for today. It's been a long day, and tomorrow we will
continue with the likelihood ratio tests. And be aware of the fact that we will be back in the same
room as yesterday, for the entire day. What time do we start? Um, nine. Nine? Okay. Okay.
