# A Coin with Random Bias

Hi. In this problem, we’re going to
be dealing with a variation of the usual coin-flipping
problem. But in this case, the bias
itself of the coin is going to be random. So you could think of it as, you
don’t even know what the probability of heads
for the coin is. So as usual, we’re still taking
one coin and we’re flipping it n times. But the difference here is that
the bias is itself a random variable, Q. And
we’re told that the expectation of this bias is some
mu and that the variance of the bias is some sigma
squared, which we’re told is positive. And what we’re going to be
asked is to find a bunch of different expectations,
covariances, and variances. And we’ll see that this problem
gives us some good exercise in a few concepts: the
law of iterated expectations, which, again, tells you that
when you take the expectation of a conditional expectation,
it’s just the expectation of the inner random variable. The covariance of two random
variables is just the expectation of the product
minus the product of the expectations. The law of total variance says that a variance is the
expectation of a conditional variance plus the
variance of a conditional expectation. And the last thing, of course, is that
we’re dealing with a bunch of Bernoulli random variables,
coin flips. So as a reminder, for a
Bernoulli random variable, if you know what the bias is, it’s
some known quantity p, then the expectation of the
Bernoulli is just p, and the variance of the Bernoulli
is p times 1 minus p.

So let’s get started. The problem tells us that we’re
going to define some random variables. So xi is going to be a Bernoulli
random variable for the ith coin flip. So xi is going to be 1 if the ith
coin flip was heads and 0 if it was tails. And one very important thing
that the problem states is that conditional on Q, the
random bias, so if we know what the random bias is, then
all the coin flips are independent. And that’s going to be important
for us when we calculate all these values.

OK, so the first thing that we
need to calculate is the expectation of each of these
individual Bernoulli random variables, xi. So how do we go about
calculating what this is? Well, the problem
gives us a hint. It tells us to try using the law
of iterated expectations. But in order to use it, you need
to figure out what you need to condition on. What is this Y? What takes the place of Y? And in this case, a good
candidate for what you condition on would be
the bias, the Q that we’re unsure about. So let’s try doing that
and see what we get. So we write out the law of
iterated expectations with Q. So now hopefully, we can
simplify what this inner conditional
expectation is. Well, what is it really? It’s saying, given what Q is,
what is the expectation of this Bernoulli random
variable xi? Well, we know that if we knew
what the bias was, then the expectation is just
the bias itself. But in this case, the
bias is random. But remember a conditional
expectation is still a random variable. And so in this case, this
actually just simplifies into Q. So whatever the bias is, the
expectation is just equal to the bias. And so that’s what
it tells us. And this part is easy because
we’re given that the expectation of Q is mu. And then the problem also
defines the random variable x. X is the total number of heads
within the n tosses. Or you can think of it as a sum
of all these individual xi Bernoulli random variables. And now, what can
we do with this? Well we can remember that
linearity of expectations allows us to split
up this sum. Expectation of a sum, we could
split up into a sum of expectations. So this is actually just
expectation of x1 plus dot dot dot plus all the way to
expectation of xn. All right. And now, remember that we’re
flipping the same coin. We don’t know what the bias is,
but for all the n flips, it’s the same coin. And so each of these
expectations of xi should be the same, no matter
what i is. And each one of them is mu. We already calculated
that earlier. And there are n of them, so the
answer would be n times mu. So let’s move on to part B.
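Before part B, the part A computation just described can be written out compactly. This is only a restatement of the spoken steps in symbols, writing $X_i$ for xi, $X$ for the total number of heads, and $\mu$ for the expectation of $Q$:

```latex
\begin{aligned}
E[X_i] &= E\big[\,E[X_i \mid Q]\,\big] = E[Q] = \mu, \\
E[X]   &= E\Big[\sum_{i=1}^{n} X_i\Big] = \sum_{i=1}^{n} E[X_i] = n\mu .
\end{aligned}
```

The first line is the law of iterated expectations with the Bernoulli fact $E[X_i \mid Q] = Q$, and the second is linearity of expectations.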
Part B now asks us to find what the covariance is
between xi and xj. And we have to be a little bit
careful here because there are two different scenarios, one
where i and j are different indices, different tosses,
and another where i and j are the same. So we have to consider both
of these cases separately. Let’s first do the case where
i and j are different. So i does not equal j. In this case, we can just apply
the formula that we talked about in the beginning. So this covariance is just equal
to the expectation of xi times xj minus the expectation
of xi times expectation of xj. All right, so we actually know
what these two are, right? Expectation of xi is mu. Expectation of xj is also mu. So this part is just
mu squared. But we need to figure out
what this expectation of xi times xj is. Well, the expectation of xi
times xj, we can again use the law of iterated expectations. So let’s try conditioning
on Q again. And remember we said
that this second part is just mu squared. All right, well, how can
we simplify this inner conditional expectation? Well, we can use the fact that
the problem tells us that, conditioned on Q, the tosses
are independent. So that means that, conditioned
on Q, xi and xj are independent. And remember, when random
variables are independent, the expectation of product, you
could simplify that to be the product of the expectations. And because we’re in the
conditional world, given Q, you have to remember that it’s going
to be a product of two conditional expectations. So this will be expectation of
xi given Q times expectation of xj given Q minus
mu squared still. All right, now what is this? Well the expectation of xi given
Q, we already argued earlier here that it should just
be Q. And then the same thing for xj. That should also be Q. So this
is just expectation of Q squared minus mu squared. All right, now if we look at
this, what is the expectation of Q squared minus mu squared? Well, remember mu is just,
we’re told that mu is the expectation of Q. So what we
have is the expectation of Q squared minus the
expectation of Q, quantity squared. And what is that, exactly? That is just the formula or
the definition of what the variance of Q should be. So this is, in fact, exactly
equal to the variance of Q, which we’re told is
sigma squared. All right, so what we found is
that for i not equal to j, the covariance of xi and
xj is exactly equal to sigma squared. And remember, we’re told that
sigma squared is positive. So what does that tell us? That tells us that xi and xj, for
i not equal to j, these two random variables
are correlated. And so, because they’re
correlated, they can’t be independent. Remember, if two random variables are
independent, that means they’re uncorrelated. But the converse isn’t true. But if we do know that two
random variables are correlated, that means that
they can’t be independent.

And now let’s finish this by
considering the second case. The second case is when i
actually does equal j. And in that case, well, the
covariance of xi and xi is just another way of writing
the variance of xi. So covariance, xi, xi, it’s
just the variance of xi. And what is that? That is just the expectation
of xi squared minus expectation of xi quantity
squared. And again, we know what
the second term is. The second term is expectation
of xi quantity squared. Expectation of xi we know from
part A is just mu, right? So that second term
is just mu squared. But what is the expectation
of xi squared? Well, we can think about
this a little bit more. And you can realize that xi
squared is actually exactly the same thing as just xi. And this is just a special case
because xi is a Bernoulli random variable. Because Bernoulli is
either 0 or 1. And if it’s 0 and you square
it, it’s still 0. And if it’s 1 and you square
it, it’s still 1. So squaring it doesn’t
actually change anything. It’s exactly the same thing as
the original random variable. And so, because this is a
Bernoulli random variable, this is exactly just the
expectation of xi. And we said this part
is just mu squared. So this is just expectation of
xi, which we said was mu. So the answer is just
mu minus mu squared. OK, so this completes part B.
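The part A and part B results can be checked exactly on a small example. The sketch below is not part of the original problem: it assumes a hypothetical two-point bias, with Q equal to 1/4 or 3/4 with probability 1/2 each, and builds the joint distribution of two different tosses from conditional independence given Q. The names `Q_PMF` and `joint_pmf` are made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical bias distribution (an assumption for illustration only):
# Q is 1/4 or 3/4, each with probability 1/2.
Q_PMF = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}


def joint_pmf(a, b):
    """P(X_i = a, X_j = b) for i != j, using conditional independence given Q."""
    total = Fraction(0)
    for q, w in Q_PMF.items():
        p_a = q if a == 1 else 1 - q  # P(X_i = a | Q = q)
        p_b = q if b == 1 else 1 - q  # P(X_j = b | Q = q)
        total += w * p_a * p_b
    return total


mu = sum(w * q for q, w in Q_PMF.items())                   # E[Q]
sigma2 = sum(w * q * q for q, w in Q_PMF.items()) - mu**2   # Var(Q)

pairs = list(product((0, 1), repeat=2))
e_xi = sum(a * joint_pmf(a, b) for a, b in pairs)        # E[X_i]
e_xixj = sum(a * b * joint_pmf(a, b) for a, b in pairs)  # E[X_i X_j]
cov_ij = e_xixj - e_xi * e_xi  # E[X_j] equals E[X_i] by symmetry
var_xi = e_xi - e_xi**2        # X_i squared equals X_i for a 0/1 variable

# Each comparison mirrors one of the facts derived in parts A and B.
print(e_xi == mu, cov_ij == sigma2, var_xi == mu - mu**2)
# Independence would require these two numbers to be equal.
print(joint_pmf(1, 1), e_xi * e_xi)
```

With this choice of Q, the probability of two heads is 5/16 while the product of the marginals is 1/4, so the two tosses are dependent even though they are independent once Q is known.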
And the answer that we wanted was that xi and xj are
in fact not independent. Right. So let’s write down some facts
that we’ll want to remember. One of them is that expectation
of xi is mu. And we also want to remember
what this covariance is. The covariance of xi and xj is
equal to sigma squared when i does not equal j. So we’ll be using these
facts again later. And the variance of xi is equal
to mu minus mu squared.

So now let’s move on to the last
part, part C, which asks us to calculate the variance
of x in two different ways. So the first way we’ll
do it is using the law of total variance. So the law of total variance
will tell us that we can write the variance of x as a sum
of two different parts. So the variance of x is the
expectation of the variance of x conditioned on something
plus the variance of the conditional expectation of x
conditioned on something. And as you might have guessed,
what we’re going to condition on is Q. Let’s calculate what these
two things are. So let’s do the two
terms separately. What is the expectation
of the conditional variance of x given Q? Well, for this, we can write out x. Because x, remember, is just
the sum of a bunch of these Bernoulli random variables. And now what we’ll do is,
again, use the important fact that the xi’s, we’re told,
are conditionally independent, conditional on Q. And
remember, the variance of a sum is not in general the
sum of the variances. It’s only the sum of the
variances if the terms in the sum are independent. In this case, they are
conditionally independent given Q. So we can in fact split
this up and write it as the variance of x1 given Q
plus all the way to the variance of xn given Q. And in fact, all these
are the same, right? So we just have n copies of the
variance of, say, x1 given Q. Now, what is the variance
of x1 given Q? Well, x1 is just a Bernoulli
random variable. But the difference is that for
x1, we don’t know what the bias is, because it’s some
random bias Q. But just like we said earlier
in part A, when we talked about the expectation of x1
given Q, this is actually just Q times 1 minus Q. Because if
you knew what the bias was, it would be p times 1 minus p. So the bias times 1
minus the bias. But you don’t know what it is. But if you did, it
would just be Q. So what we do is we just plug
in Q, and you get Q times 1 minus Q. All right, and now this
is the expectation of n times that. I can pull out the n. So it’s n times the expectation
of Q minus Q squared. We can use linearity of
expectations again. The
expectation of Q is mu. And the expectation of Q
squared is, well, we can do that on the side. Expectation of Q squared is
the variance of Q plus expectation of Q quantity
squared. So that’s just sigma squared
plus mu squared. And so this is just going to
be n times the quantity mu minus sigma squared minus mu squared. All right, so that’s
the first term. Now let’s do the second term. The variance of the conditional
expectation of x given Q. And again, what we can do is we can
write x as the sum of all these xi’s. And now we can apply linearity
of expectations. So we would get n times one
of these expectations. And remember, we said earlier
the expectation of x1 given Q is just Q. So it’s the variance
of n times Q. And remember now, n is just– it’s not random. It’s just some number. So when you pull it out of a
variance, you square it. So this is n squared times
the variance of Q. And the variance of Q we’re
given is sigma squared. So this is n squared times
sigma squared. So the final answer is
just a combination of these two terms. This one and this one. So let’s write it out. The variance of x, then,
is equal to– we can combine terms
a little bit. So the first one, let’s
take the mus and we’ll put them together. So it’s n times the quantity mu minus mu squared. And then we have n squared times
sigma squared from this term and minus n times sigma
squared from this term. So it would be the quantity n squared minus
n, times sigma squared, or n times n minus 1 times
sigma squared. So that is the final answer
that we get for the variance of x.

And now, let’s try doing
it another way. So that’s one way of doing it. That’s using the law of total
variance and conditioning on Q. Another way of finding
the variance of x is to use the formula involving
covariances, right? And we can use that because x is
actually a sum of multiple random variables
x1 through xn. And the formula for this is, you
have n variance terms plus all these other ones. Where i is not equal to j, you
have the covariance terms. And really, it’s just, you can
think of it as a double sum of all pairs of xi and xj where if
i and j happen to be the same, it simplifies
to be just the variance. Now, we pulled these n terms
out because they are different from the others;
they have a different value. And now fortunately, we’ve
already calculated what these values are in part B. So we
can just plug them in. All the variances
are the same. And there’s n of them,
so we get n times the variance of each one. The variance of each one we
calculated already was mu minus mu squared. And then, we have all
the terms where i is not equal to j. Well, there are actually n
squared minus n of them. That’s because you can take any one
of the n indices to be i, and any one of
the n to be j. So that gives you
n squared pairs. But then you have to subtract
out all the ones where i and j are the same. And there are n of them. So that leaves you with n
squared minus n of these pairs where i is not equal to j. And the covariance for this case
where i is not equal to j, we also calculated in part B.
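As a sanity check on the pair counting and on the two routes to the variance, here is a small exact computation. It is only a sketch under stated assumptions: n = 5 tosses and a hypothetical two-point bias (Q is 1/4 or 3/4 with probability 1/2 each), neither of which comes from the problem. It evaluates the law of total variance expression, the double sum over all pairs, and the variance computed directly from the distribution of the number of heads.

```python
from fractions import Fraction
from math import comb

# Hypothetical setup (assumptions for illustration only): n = 5 tosses,
# and Q is 1/4 or 3/4 with probability 1/2 each.
N = 5
Q_PMF = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}

mu = sum(w * q for q, w in Q_PMF.items())                   # E[Q]
sigma2 = sum(w * q * q for q, w in Q_PMF.items()) - mu**2   # Var(Q)


def var_by_total_variance(n):
    """E[Var(X | Q)] + Var(E[X | Q]) = n(mu - sigma2 - mu^2) + n^2 sigma2."""
    return n * (mu - sigma2 - mu**2) + n**2 * sigma2


def var_by_covariance_sum(n):
    """Sum Cov(X_i, X_j) over all n^2 ordered pairs (i, j)."""
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            # Diagonal pairs give Var(X_i); off-diagonal pairs give Cov(X_i, X_j).
            total += (mu - mu**2) if i == j else sigma2
    return total


def var_exact(n):
    """Var(X) straight from the pmf of X, a mixture of Binomial(n, q)."""
    def moment(k):
        return sum(
            w * sum(x**k * comb(n, x) * q**x * (1 - q) ** (n - x)
                    for x in range(n + 1))
            for q, w in Q_PMF.items()
        )
    return moment(2) - moment(1) ** 2


# Count the ordered pairs with i != j by brute force.
off_diagonal = sum(1 for i in range(N) for j in range(N) if i != j)
print(off_diagonal == N * N - N)
print(var_by_total_variance(N), var_by_covariance_sum(N), var_exact(N))
```

All three values agree for this example. In the double sum, each of the n squared minus n off-diagonal pairs contributes the same covariance, the one for i not equal to j.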
That’s just sigma squared. All right, and now if we compare
these two, we’ll see that they are
exactly the same. So we’ve used two different
methods to calculate the variance, one using this
summation and one using the law of total variance.

So what do we learn
from this problem? Well, we saw that first of all,
in order to find some expectations, it’s very useful
to use the law of iterated expectations. But the trick is to figure out
what you should condition on. And that’s kind of an
art that you learn through more practice. But one good rule of thumb applies
when you have kind of a hierarchy or layers of
randomness, where one layer of randomness depends
on the randomness of the layer above. So in this case, whether
you get heads or tails is random, but
that depends on the randomness on the level above, which
was the random bias of the coin itself. So the rule of thumb is, when
you want to calculate the expectations for the layer where