David Mazières: “The Stellar Consensus Protocol” | Talks at Google


MALE SPEAKER: Hello, everybody. Welcome to this Tech Talk
on “The Stellar Consensus Protocol.” My name is [INAUDIBLE]. And I’m a director of
engineering here at Google. I want you to welcome our
speaker today, David Mazieres. David is a professor of computer
science at Stanford University and the chief scientist
at The Stellar Foundation. David also wears a
lot of cool T-shirts. And so give a warm
welcome to David. Thank you. [APPLAUSE] DAVID MAZIERES: Thank you. So before we get
started, Stanford requires me to tell you that
this work is paid consulting. It wasn’t part of my regular
Stanford University duties and responsibilities. So if you think about banking,
the core job of a bank is really keeping track
of customer assets and liabilities. If you walk into a Wells Fargo
branch here in California and you deposit
a bunch of money, they’re going to put your bills
in the same drawer as everybody else’s bills. And so they have to
keep track of the fact that they owe you so much money
and you owe them so much money and so on. And typically the way
they would do this is by replicating a ledger. So the ledger is the most
important thing the bank has. So they might keep
at least three copies of this thing around. So even if there’s
a data center that experiences some big
disaster, they’ll still know how much
money is in your account. But this begs the question,
what about interbank transfers? So suppose now that
somebody over here in China wants to send me
$150 in California. Well, now the bills aren’t
sitting in the same drawer. In fact, they’re not
even the same currency. And so what happens
is this Chinese bank and the bank in California
need to find some third bank, a correspondent bank that maybe
the Chinese bank has an account with so that they can adjust
the ledger at that bank. And then that bank
can do whatever it needs to do to send
the money to Wells Fargo. This is all very complicated. And the net result is that
it adds a lot of delay. And it adds a lot of cost. And so you end up seeing
headlines like this where people end up paying sometimes
huge commissions– up to, like, 22%– just to send money
around between countries. And a lot of the fees that
are paid to send money around are actually shouldered
by the poorest people. So it kind of brings this
kind of ironic comment that it turns out it’s
expensive to be poor because these poor people are
paying more in banking fees. It’s not all bad
news, of course. If you’re Western
Union, you managed to make $5.6 billion
moving just $85 billion around in 2014. But, of course, those
fees are coming out of the pockets of people
who can least afford it. Now there’s actually
something worse than having to pay excessive
fees to Western Union. And that is you might not
be able to send money around at all. So imagine that you
want to send money home to your family in
Somalia, and it just turns out there’s no way to do that. And in fact, that happened. So you see headlines like this
where people are literally unable to send money back to
family members who need it. So what’s going on here? Well, if you think about
today’s financial networks, they actually look a lot like
email did before the internet. So before we had
internet mail, we had things like UUCP and BITNET. Email addresses used
to look like this. There were these things
called bang paths where you had host1, bang,
host2, bang, host3, bang user. And these addresses, they’re
not globally meaningful. And they add a lot of
delay because the email had to go through a bunch of
hops with store and forward. Nowadays it's pretty easy. It doesn't matter where
you have an email account, if it’s your Gmail or Yahoo
mail or wherever, you can just send email to anybody else. So wouldn’t it be nice
if we could send money around the internet
in the same way? And that’s really
been the core focus of what we’ve been working
on at the Stellar Development Foundation. And the first observation
here is that it actually would be possible to move
money around like this if we had some kind of
global ledger keeping track of who had what money. Because now, instead of having
to find some correspondent bank in order to send the
money from China to California, the two banks could
just make the adjustment in the global ledger
that $150 had moved here. And then the money could just
pop out in my account here. It would be much easier. The problem, of
course, is if you’re going to have this
kind of global ledger, there’s no single
party that would be trusted enough to manage
this ledger that keeps track of how much money
all of the banks have in this international
financial network. And even kind of trying
to split the trust across some kind
of consortium is problematic because
who gets more seats? The US? Or China because China
has a bigger population? Then if Cuba has a
seat at the table, can US banks participate? You can imagine all these thorny
questions that would come up. So this is problematic. On the other hand, we do
have this existence proof of a global
decentralized system that works really well without any
kind of centralized control. And that is inter-domain
routing on the internet. The internet is this
giant global network. It works pretty well. But it’s built out of these
pairwise peering and transit relationships between
pairs of ISPs. So what I’m going
to talk about today is how we can use the same
kind of pairwise trust to achieve this secure
global consensus. And once we have this kind
of internet-level consensus, we’ll use that to realize
the global ledger. Now why consensus? Well, turns out consensus
is the key to replication. The most common way to
build a replicated system is to use what’s called the
replicated state machine approach where
every replica starts in the same initial state. And then you agree on the
exact sequence of operations that you’re going to apply. And therefore all the replicas
maintain the same state. And the second one, making sure
that all the replicas agree on the exact sequence
of operations, turns out to be
the hard problem. So in this case, maybe we want
to agree that operation 7 here is this transfer of $150 into
my account in California. And so essentially,
the whole talk is going to be concentrated
on this question of, how does the whole
world agree that, yes, this $150 transfer
actually happened and was part of operation 7?
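To make the replicated-state-machine idea concrete, here is a minimal sketch (illustrative Python with made-up names, not anything from the talk or the Stellar codebase): every replica starts from the same state and applies the same agreed-upon sequence of operations, so as long as consensus fixes what sits in each slot of the log, the replicas stay identical.

```python
# Minimal replicated-state-machine sketch (illustrative only).
# Each replica applies the same agreed log of operations in order.

class Replica:
    def __init__(self):
        self.balances = {}   # same initial state on every replica
        self.applied = 0     # index of the next log slot to apply

    def apply(self, log):
        """Apply agreed operations in log order; consensus must have
        already fixed the operation in each slot."""
        while self.applied < len(log):
            src, dst, amount = log[self.applied]
            self.balances[src] = self.balances.get(src, 0) - amount
            self.balances[dst] = self.balances.get(dst, 0) + amount
            self.applied += 1

# If every replica agrees that slot 7 is ("china_bank", "wells_fargo", 150),
# then every replica ends up with identical balances after applying it.
```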
So here's an outline of what I'll talk about. First, I'm going to
give some background and cover Byzantine Agreement. But I’m going to kind of present
it– there’s a novel twist. I’m going to present it
through this new lens, which is viewing everything
in terms of voting on what I’ll call irrefutable
and neutralizable statements. And we’ll see what that means. And then I’ll discuss
Federated Byzantine Agreement, which is a generalization
of Byzantine Agreement, to a model that could
actually accommodate this kind of
internet-level consensus. Then I’ll talk about
failure resistance and how well we can do in
terms of tolerating failures. And then I’ll present
my construction, which is the Stellar
Consensus Protocol. So let’s get into the
consensus problem. The goal of the
consensus problem is for multiple replicas to
agree on an output value. And so you have a bunch of
agents in the system, each of which is storing a replica. Each agent starts
with an input value. And the input value
would typically be the candidate for
the nth operation in some replicated
state machine log. And then the agents are going
to exchange a bunch of messages and use those messages to
agree on one of their inputs. And then they’re going
to output the value. In this case, v2’s input was 9. Maybe the three agents pick 9. And so then they output 9. And the important point here is
that the output is write-once. Once you output the
value 9, it could cause ATM machines to spit
out $20 bills or something. Irreversible actions may happen. So you can never
change your mind. You better be sure everybody
agrees on the value before you output it. So there’s some properties you
might want of your consensus protocol. So one is that you’d like
all the outputs produced to have the same value. Otherwise, it’s not much
of a consensus protocol. So that's called agreement. You'd also like the output
value to actually equal one of the agent’s input values. Otherwise, there’s
a trivial solution. Everybody just outputs
0, and it’s not a very interesting protocol. So together, this
agreement and validity, which is what you call
when the output actually equals one of the inputs,
I’ll call this safety. Those two properties
constitute safety. You might also
want your protocol to guarantee that eventually
all the non-faulty nodes will actually output a value. So that’s called liveness. And finally, you might want
your consensus protocol to provide fault
tolerance, meaning that you can recover from
the failure of an agent at any point in the
execution of a protocol. And when it comes
to fault tolerance, we actually care about
two different kinds of fault tolerance. In a fail-stop model, you assume
that agents fail by crashing, and then they just never
send any more messages. And in the Byzantine-fault
tolerant model, you assume that agents can
fail by behaving arbitrarily. So you can imagine an
attacker takes over an agent and sends the worst possible
messages with the goal of trying to mess everybody up. Now there’s this seminal
result in distributed systems due to Fischer,
Lynch, and Paterson called the FLP
impossibility result, which states that no deterministic
consensus protocol can guarantee all three of safety,
liveness, and fault tolerance in an asynchronous system. So that’s a bummer since
that's exactly what we wanted. But let's try to build up an
intuition for why this is. So if you remember
a couple slides ago, we had these three agents. And they chose v2’s
input, which was 9. And they output the value 9. Now suppose we turn
back the clock, and we restart this protocol. But right as the agents
start to execute, there’s some kind of
transient network outage. And v2 is cut off from
the rest of the network. So now v1 and v3 can
talk, but not v2. So at this point, in
an asynchronous system, v1 and v3 may start
to think, oh, it looks like v2 may have failed. And so if the protocol
was fault tolerant, then v1 and v3 need to be able
to complete the protocol and terminate without
talking to v2. And since they don’t
know what v2’s input is, they’re going to pick another
input, so either 3 or 7. Let’s say that in this
case, they pick 7. And then they output 7. And at some point, of course,
the network may come back up. And then v2 can talk
to the other nodes. And if the protocol is safe
and provides agreement, then v2 also better output
7 so that it can match with the other nodes’ output. So what you can see here
is that the output value chosen by the nodes
actually depended on the network behavior. Depending on whose messages
got delivered first, they could pick the value 9 or
they could pick the value 7. So when a consensus protocol
is in this kind of state where the network can affect
which of several values can be output, we’ll
say that the protocol is in a bivalent state. Or you would maybe
think multivalent, but I’m using bivalent to be
consistent with the literature here. So conversely, you
could reach a point where there’s only one
output value possible. The fate of the
system’s been sealed. 7 is the only
possible output value. And in that case, we’ll say
the system is univalent. Or if the only
output value is 7, we’ll say the system
is 7-valent, say. In addition, you hope
this won’t happen, but if you don’t have a very
good consensus protocol, you could also end up in a
situation where you get stuck. And there are one
or more non-faulty nodes that can never output
a value, no matter what the– even if the network
decides to start cooperating. So in that case, we’ll say
that the protocol is stuck. So if you remember
from a few slides ago, we said the output
is write-once. You can never change your mind. And, of course, all the
output values need to agree. And so what that means
is that it’s never possible to output
a value if you’re in a bivalent state because
once a particular value has been output, well,
then that should seal the state of the system. Any other output values
should be the same. So that means that if you
have one of these systems and you have an
execution that starts in a bivalent state and then
it eventually terminates, then it must at some point have
reached a univalent state. So now here’s an intuition
behind this FLP impossibility result. Let’s imagine that you
have a terminating execution of a bivalent system. You start bivalent. Then eventually everybody
outputs some value. And let’s let m be
the last message that was received by any node
in a bivalent state. So receiving m is what flipped
the system from a bivalent to a univalent state. So in that case, we’ll
call m the deciding message because somehow
delivering m is what decided the fate of the system. Now let’s turn back
the clock and say that instead of delivering
m, the network had delayed m, and a bunch of other messages
had gotten delivered first. And those messages, of
course, might have altered the state of the system. And they might have
altered the state of the system in such a way
that when m is eventually delivered– much later
than it was originally– it’s no longer a
deciding message. And just delivering m is
not enough to flip over to a univalent state. So in that case, we’ll
say that the message m has been neutralized. Neutralizing a
message means taking what would have been
a deciding message and now making it no
longer interesting or no longer able to decide
the fate of the system. So here’s the overview
of the FLP proof. You can show that there
are bivalent starting configurations. And then if your system
is fault tolerant, it turns out that the
network can neutralize any deciding message. And the reason is
that whatever nodes sent that deciding
message might have failed. So you can’t always
count on being able to receive that message. So there has to be
some way to recover and say, OK, let’s stop
waiting for that message and do something else so that
we can make some progress and pick a value. And so taken together,
what these two things mean is that the system can
essentially remain bivalent in perpetuity if
every time there’s about to be a deciding message
the network pathologically delays just that one message
until it’s been neutralized. And so you’ll never
end up transitioning to a univalent state. So how do we cope given that
we want to achieve consensus and a lot of systems
depend on it? Well, what you want to
do is design protocols that terminate in practice
even if they’re not guaranteed to terminate. And the most important property
they need to achieve for this is that you have to
avoid stuck states. You want to make sure that
whatever weird stuff is happening on the network,
if eventually you fix the network and
things start behaving in a more friendly way,
that you can recover and everybody can agree on a
value and output that value. Good. So now let’s look at a straw man
attempt to implement consensus. So let’s say that you have
a system with N nodes. And for now, to
keep things simple, we’ll assume fail-stop behavior. So they might crash, but
they won’t act arbitrarily. And now we’re going to pick
some quorum size T where T is greater than N over 2. So T includes a
majority of nodes. And we say that if
T nodes– namely, in other words, a quorum– ever
all vote for the same value, then it becomes safe
to output that value. In this example up here, you
can notice that quorum A here has all voted for the value 9. So at that point it becomes
safe to output the value 9. And this guarantees agreement
because the quorum size includes a majority of nodes. So any two quorums will
intersect in at least one node. And nodes aren’t allowed
to change their votes. So there can be at most one
value that will be output.
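Here is a rough sketch of that straw-man rule (illustrative Python, hypothetical names): with a quorum size T > N/2, any two quorums overlap in at least one node, and since votes never change, at most one value can ever gather a quorum– but a split vote simply never produces a quorum, which is the stuck state discussed next.

```python
# Straw-man voting: output a value once T > N/2 nodes have voted for it.
from collections import Counter

N = 4
T = N // 2 + 1   # quorum size: any two quorums share at least one node

def outcome(votes):
    """votes: dict node -> value (each node votes once and never changes).
    Returns the value with a quorum of votes, or None if no quorum yet."""
    tally = Counter(votes.values())
    for value, count in tally.items():
        if count >= T:
            return value
    return None   # split vote or missing votes: possibly stuck forever

print(outcome({"v1": 9, "v2": 9, "v3": 9}))           # 9 -- quorum of 3 out of 4
print(outcome({"v1": 9, "v2": 7, "v3": 3, "v4": 5}))  # None -- split vote, stuck
```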
Now what's the problem? Well, the reason this isn't a very good consensus protocol is because you can
have stuck states. So, in particular, imagine
that one of these nodes that was voting 9 actually failed. Now you no longer have T
nodes voting for the value 9. Some people might have
heard all T votes. But some people
might not have heard the vote of this node that
failed because it won’t be around to retransmit its vote. And so there will be nodes that
cannot actually output a value. Another problem you could have
is you could have a split vote. So everybody votes
for a different value. And you can’t change your vote. Then you’ll never be
able to output any value because no value
will ever actually have a quorum voting for it. Make sense? So this is what voting
actually gives us. If we try to use voting as a consensus protocol, we'll find that we start
in some bivalent state. And then if we get enough votes, we might end up in an a-valent state
for some value a that we're voting for. Or we could end up in an a-bar-valent state– where a bar,
we'll let that stand for any value that's not equal to a and contradicts a. But if enough people
vote, then maybe the whole system would end up agreeing on a. Otherwise, you can end up in this
stuck state over on the right. So voting might give us what we wanted, but it might put us in the
stuck state that we can never recover from. So what this means is
you can never directly vote on the actual consensus question you care about, never take an up
or down vote on whether operation 7 is a credit of $150 to my account at this bank in California.
You can't vote on that because if that gets stuck, you'll never be able to figure out how much
money is in my account. There'll be no more
well-defined notion of that. So what kind of statements
is it safe to vote on? Well, first of all, it would
be OK to vote on a statement if that statement
never gets stuck because that means you could
eventually reach agreement. And second of all, it’s
OK to vote on a statement if you can somehow break the
hold that that statement has on the actual consensus
question that we care about. So if a statement
gets stuck, but that doesn’t prevent me
from figuring out how much money is in my
account, well, then that’s OK. So let’s look at this
in a bit more detail. So first observation is that
the reason that you end up in these stuck states
is that nodes aren’t allowed to change their vote. If everybody votes
for a different value, and they can’t
change their vote, then they can’t turn around and
all vote for the same value, or at least have a quorum
vote for the same value. But suppose you have a statement
where, for some reason, you can reason about the fact
that no correct node would ever vote against it or
vote for anything that contradicts that statement. Well, then you won’t get stuck. And so I’m going to call
that kind of statement an irrefutable statement. So irrefutable statements
are fine to vote on because nobody is prevented
from voting on it by past votes, so eventually, as long
as there’s a quorum, that quorum can vote for
the irrefutable statement. The second kind
of statements are statements whose hold on
the consensus question can actually be broken. And if you recall from when
we were talking about the FLP proof that we said that for
a fault-tolerant system, you need to be
able to neutralize any deciding message. And so, we’ll say that a
statement is neutralizable if you can transition
the system to a state where you no longer care about
the value of that statement. As it turns out, most of
the trickiness involved in designing a
consensus protocol is in the question of how do
you formulate statements that are useful towards
consensus, but can also be neutralized if you need to. So there are two popular
ways of designing these neutralizable statements. One is the
ballot-based approach. And the other is what I call
the view-based approach. The view-based approach
may actually be simpler, and you should probably use it
if you can get away with it. But in this
particular context, I couldn’t figure out
how to make it work. So we’ll talk about
ballot-based neutralization. So ballot-based
neutralization is something that was introduced
by Leslie Lamport for Paxos. And the idea is that you vote to
commit or abort ballots, where the two are contradictory. You can only vote one way. And each ballot
is a pair– n, x– where n is just a
counter that can keep incrementing so you have
an unlimited supply of ballots. And x is a candidate
output value for the consensus protocol. And if you actually
have a quorum that votes to commit n, x, then we’ll
say that it’s safe to output the value x. Now in order to make
this work, we’re going to have to
preserve an invariant. And the invariant is that
all committed and all stuck ballots are going to have
to have the same value x. So how are we going to
preserve this invariant? Well, the way we preserve this invariant is that you can't just vote to
commit a ballot arbitrarily. Before you vote to commit a ballot, you have to prepare that ballot.
And what does it mean to prepare the ballot? Well, you prepare a ballot n, x by aborting every
ballot n-prime, x-prime with a lesser or equal counter n-prime and a different value x-prime not
equal to x. And because of the way it's been designed, even if some ballot n, x gets stuck,
there's nothing that prevents you from preparing n plus 1, x. So that means you can neutralize
a stuck ballot and move on to the next ballot.
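As a rough sketch of the ballot mechanics just described (illustrative Python, not the speaker's actual formulation): a ballot is a pair (n, x), preparing (n, x) means voting to abort every ballot with a counter no greater than n and a different value, and a stuck ballot is neutralized simply by bumping the counter and preparing (n + 1, x).

```python
# Ballots are pairs (n, x): a counter n and a candidate output value x.
# Illustrative sketch of the prepare/neutralize bookkeeping on one node.

def conflicting_ballots(ballot, seen_values):
    """Ballots that preparing `ballot` votes to abort: every ballot with a
    lesser or equal counter whose value differs from ballot's value."""
    n, x = ballot
    return [(m, y) for m in range(1, n + 1) for y in seen_values if y != x]

def next_ballot(stuck_ballot):
    """Neutralize a stuck ballot by moving to the next counter with the
    same value; nothing prevents preparing (n + 1, x)."""
    n, x = stuck_ballot
    return (n + 1, x)

# Example mirroring the talk: (2, 'f') gets stuck, so move on to (3, 'f').
print(next_ballot((2, "f")))                      # (3, 'f')
print(conflicting_ballots((2, "f"), {"f", "g"}))  # [(1, 'g'), (2, 'g')]
```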
So let's maybe illustrate this with an example. So here, let's say we have eight candidate
values, a through h. And initially, all the ballots in the system are bivalent. And then let's
say the system agrees that they've prepared 1, g. In other words, they've aborted all the other
ballots. At that point, you might vote to commit 1, g. But you could lose that vote. It could
turn out that 1, g was aborted after all. And so then you'll move on to ballot round two. And now
maybe you'll try to prepare 2, f. So let's say you manage to prepare 2, f. And now you vote to
commit it. And it might be that, well, you know what? That vote gets stuck. It turns out we'll
never manage to commit 2, f. Oh, well. Just bump the counter again. And now let's move to 3, f.
And now you'll prepare 3,
f and vote to commit 3, f. And maybe this time,
the vote succeeds. So you’ve managed to
commit ballot 3, f. And now we’re done. We’ve chosen the
value of the system. We can output the value f. And notice that at
this point, nobody cares about the status of 2, f. anymore. That’s been totally–
it’s been neutralized. Whether 2, f is aborted, or
committed, or we’ll never know, it just doesn’t matter because
the question we actually care about– which is, should
we output the value f– that one has been decided. And if for some
reason not everybody can see the quorum
that voted for 3, f because maybe some nodes failed
or something, that’s fine. Just bump the counter again. And now we can prepare
and commit 4, f. So this is Paxos. This may be an unusual
way to present Paxos. But we’ll see that it’s a way
that generalizes to the model that we need for this kind
of global financial network. OK. So, so far, we’ve been
in the fail-stop case. We’ve been imagining that
the nodes have failed. They just crashed. And they never output
another message. What happens when you
have Byzantine failure? So the big difference
here is that when you have a Byzantine
failure, the failed node, it can actually change its vote. It can give inconsistent
messages to different people. It can tell one person,
oh, I vote for this value, and somebody else, oh,
I vote for that value. So in the fail-stop
case, in order to guarantee safety
and, in particular, in order to guarantee
agreement, we needed this guarantee that
any two quorums shared a node. And we said, well, that
node won’t change its vote. So therefore any two quorums
that vote for a value will vote for the same value. Now we need to kind of
strengthen that assumption. We need to say that
any quorums need to share a non-faulty node
because a faulty node can just lie and give
contradictory votes. So if you have N nodes
in your quorum size is T, then the minimum overlap
between any two quorums is going to be to
2T minus N. So what that means, if you want
an honest node in there, then the greatest number
of Byzantine failures you can tolerate is going
to be 2T minus N minus 1. And I’m going to
call that value f sub s because it’s the
failure tolerance with respect to safety, basically. Now what about liveliness? Well, for liveness you
actually want to make progress. You actually need
there to be a quorum. And so if you have N nodes
and a quorum size of T, that means you can tolerate
up to N minus T failures and still have a
quorum left over with which to make progress. So I’ll call that value f sub
l because it's the failure tolerance for liveness. And there exist long-standing solutions to the
Byzantine agreement problem in this model. And typically what people do is that they set the
number of nodes, N, to be 3f plus 1 for some integer f, and the quorum size
T to be 2f plus 1. And the reason they do that is that it gives you the equilibrium point where
fs equals fl, which equals f. In other words, the failure tolerances for both safety and liveness
are the same. But for the purposes of this talk, it's going to be important to keep those numbers
different. So I'll just keep referring to them as fs and fl so you know which one I mean when I
talk about this.
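To keep the two thresholds straight, here is the arithmetic as a tiny sketch (Python, just restating the formulas from the talk): with N nodes and quorum size T, safety survives up to fs = 2T − N − 1 Byzantine nodes and liveness survives up to fl = N − T failed nodes; the usual choice N = 3f + 1, T = 2f + 1 makes the two equal.

```python
# Failure-tolerance arithmetic for a threshold quorum system with N nodes
# and quorum size T (restating the formulas from the talk).

def failure_tolerance(N, T):
    fs = 2 * T - N - 1   # Byzantine failures tolerable while keeping safety
    fl = N - T           # failures tolerable while keeping liveness
    return fs, fl

# Classic Byzantine agreement setting: N = 3f + 1, T = 2f + 1 balances the two.
f = 1
N, T = 3 * f + 1, 2 * f + 1
print(failure_tolerance(N, T))   # (1, 1): fs == fl == f
```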
So here again is the picture of what we get from a vote. And so a question that we need to answer
at this point is how do we know when we're done? So if you're in the system, and you see T votes
for some value a, then you know the system is at least a-valent. You know the system's not going
to agree on some different value. But you don't know that all the non-faulty nodes will
eventually agree on this value a or even know that the system reached an a-valent state. So what
do we do about this? Well, if you remember
from the previous slide, we said that f sub s was
the greatest number of nodes we could survive
failing and still be able to guarantee safety. So that means if fs plus
1 nodes actually fail, you’ve lost all
safety guarantees because now the
intersection of two quorums can be entirely malicious. So suppose you have
fs plus 1 nodes that all say the same thing. They all say, we saw a
quorum vote for the value a. Well, at this point,
you can say, well, either they’re all lying
to me, in which case they’re all malicious so
I’ve lost all safety anyway, or one of them is
telling the truth so I might as well assume
that a quorum has voted for a without actually losing safety. Now suppose you hear actually
fl plus fs plus 1 nodes all say the same thing. They all say, we saw a quorum vote for a. Well,
it could be that after saying this, fl or more than fl of those nodes die, in which case we can't
guarantee liveness anyway. But if fl or fewer of these nodes have failed, then that means that
there will continue to be fs plus 1 of them around. And those fs plus 1, if they're not faulty,
they'll continue to repeat the fact that they saw a quorum vote for a. So that will convince
everybody
that a quorum voted for a. So at that point, all the non-faulty nodes will come to accept the
fact that the system reached an a-valent state. And so they'll know that none of them have gotten
stuck. And so at that point, we can say that the system has actually agreed on the statement a.
All right. So this is all good.
But so far we've been in this T of N system where you had to know what the N nodes were to begin
with. So now I'm going to talk about this new model called Federated Byzantine Agreement, which
is a generalization of standard Byzantine Agreement to a setting where you don't have universal
agreement on who all the nodes are or how trustworthy all the nodes are. And the key idea in
Federated Byzantine Agreement is that we're going to pick our quorums in a decentralized way. So
each node v in the system is going to pick one or more of what I call quorum slices. So quorum
slices are sets of
is going to pick one or more of what I call quorum slices. So quorum slices are sets of
nodes that include v itself. And a quorum slice
is a set of nodes that v believes is important
enough that if the whole quorum slice agrees on something,
then that thing is true. And it’s not going
to be rolled back. And you can think of
deciding to put somebody in your quorum slice. And one of your quorum slices
is vaguely analogous to in the inter-domain
routing setting of two ISPs deciding to peer. They decide there’s
some value in exchanging internet traffic or, in
this case, some value in waiting for somebody to agree
to something before considering that statement settled. So a Federated Byzantine
Agreement system is basically a set of nodes V and a quorum function Q, where Q of v is the set
of slices that were chosen by node v. And so now we can define a quorum. We can say that a quorum
U is just a set of nodes that contains at least one slice belonging to each one of its members.
And this is like the key definition in the whole talk. So I want to zoom in on this
with a couple of examples to make this much more concrete. So let's start with a simple example,
just a system with four nodes. And, to make this even simpler, each node has only a single quorum
slice. And I'm representing the quorum slices by drawing arrows from a node to all the other
nodes in its quorum slice. So in this case, you
can see v1's quorum slice is v1, v2, v3. And the other three nodes, v2, v3, v4, are a quorum
slice for each other. So you can see that if you look at just nodes v2, v3, and v4, that is
actually a quorum because it contains a slice belonging to each one of its members. If you look
at the set v1, v2, v3, that is a quorum slice, but it's not actually a quorum because, in a
sense, v1 is saying, well, I'll agree to something as long as v2 and v3 agree to it. But then v2
is saying, well, I'll agree to something only if v3 and v4 agree to it. So you can see v2 is
never going to be able to accept any statement unless v4 agrees to it because v2 is not going to
agree to it unless v4 is somehow involved here. So in fact, the smallest quorum that contains v1
is the set of all nodes in
the system, v1, v2, v3, v4. So this is a simple example.
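Here is a small sketch of that key definition (illustrative Python): given each node's quorum slices, a set of nodes is a quorum if it contains at least one slice of every member. Running it on the four-node example shows that {v2, v3, v4} is a quorum while {v1, v2, v3} is not.

```python
# A quorum is a set of nodes that contains at least one quorum slice
# belonging to each of its members (sketch of the definition in the talk).

slices = {
    "v1": [{"v1", "v2", "v3"}],
    "v2": [{"v2", "v3", "v4"}],
    "v3": [{"v2", "v3", "v4"}],
    "v4": [{"v2", "v3", "v4"}],
}

def is_quorum(nodes, slices):
    nodes = set(nodes)
    return all(any(s <= nodes for s in slices[v]) for v in nodes)

print(is_quorum({"v2", "v3", "v4"}, slices))        # True
print(is_quorum({"v1", "v2", "v3"}, slices))        # False: v2's slice needs v4
print(is_quorum({"v1", "v2", "v3", "v4"}, slices))  # True: smallest quorum with v1
```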
Let's turn to maybe a more realistic example, which would be tiered quorum slices. So maybe
there's a top tier of very important nodes. And they only know about each other, but they're
configured in a three out of four setting, where each of v1, v2, v3, v4 believes that three of
those nodes constitute a quorum slice. Then you have maybe a middle tier that depends not on each
other, but on the top tier. So v5 here is saying, well, I'll consider as a quorum slice myself
plus two of the top tier nodes. Then maybe you have a leaf here that also depends on the middle
tier. So this is a lot like the internet where we have tier one ISPs. But the nice thing about
the internet is nobody had to magically designate these tier one ISPs. The tier one status arose
out of market forces. And you don't even need universal agreement over who's a tier one ISP.
Occasionally you have, is Cogent a tier one ISP or not? And Level 3 and Cogent will play chicken
or something. And then eventually market forces prevail. And they have to reach some kind of
agreement. So in this case, let's say that
the market decides that these are the top four banks in the US. These are really the top tier of
financial institutions. So nothing happens unless three out of these four banks agree to it. But
the nice thing is you don't need to have unanimous agreement on that. So let's say v7 and v8
are extremely paranoid. And they say, you know what? I realize I can't get through life without
interacting with these big banks. But I don't trust them at all. I think they're going to try to
rip me off the minute they have the opportunity to do so. Well, no problem. You can have some
other people come along. Let's say we have a couple of nonprofits here, Stellar, the EFF, or the
ACLU. They can also participate in this consensus protocol. And maybe v7 and v8 say, not only am
I going to wait for two of the big banks, I also want one of these nonprofits to sign off on a
transaction before I consider it settled. And the nonprofits are going to depend on two out of
three of each other in addition to depending on the main banks. So now let's look at an example.
Suppose Citibank decides
to pay $1 billion to v7. And v7 hands over whatever, $1 billion worth of goods. So why would v7
do this? Well, v7 would do this because it hears two out of the four main banks doing this. And
also because it hears one of the nonprofits, let's say, Stellar. But Stellar is not going to
agree to something unless one of the other ones does. So let's say the EFF had to sign off on
this, too. So basically, if you look at the nodes with green check marks, that constitutes a
quorum. And those nodes can agree that this transaction happened and permanently place it in a
log. So now let's say that
v7's paranoia was actually justified. So these big three banks collude. They're evil. And they
try to reverse this $1 billion transaction after they've walked away with $1 billion worth of
goods. And now what they want to do is they want to go to v8 and give v8 $1 billion and also walk
away with $1 billion worth of v8's goods. Well, what's going to happen here? v8's going to say,
OK, I need two out of the big banks to agree. That's OK because these evil banks, they'll agree
to anything. But then he also needs one of the nonprofits. So v8's going, well, Stellar and the
EFF aren't going to agree to this because they agreed to a different transaction, in which the
$1 billion went somewhere else. So that leaves the ACLU. So v8 goes to the ACLU. But the ACLU
says, well, my quorum slice also has to include one of Stellar or the EFF. And since those
organizations won't agree to this, the ACLU won't agree to it either. And so v8 is never going
to accept this $1 billion and is never going to turn over $1 billion worth of goods. So that's
the model. And now the question is,
how secure can we make this? So the first big difference
between Federated Byzantine Agreement and the standard
centralized Byzantine Agreement is that in the
Federated model, failure becomes a per node thing. In the centralized case,
either the whole system fails or at least all the
non-faulty nodes agree. Now you can have some nodes
failing and others not. And so particularly you
can divide the nodes into the ill-behaved nodes
and the well-behaved nodes. The ill-behaved nodes are
going to deviate from protocol, do all kinds of nasty things. They've obviously failed, but who
cares because they're not implementing the protocol. The well-behaved nodes, unfortunately, some
of them might fail, too, if there are too many ill-behaved nodes. And two things could happen.
What would be bad is if well-behaved nodes stop being able to make progress because they can no
longer assemble a quorum or they can no longer agree to a new transaction. That's like all your
ATM machines shut down. What's worse is if all the well-behaved nodes somehow get driven into
divergent states. So that's now like each of your ATM machines pays out the entire value of
somebody's account. So you've lost many times the value of the deposits you were storing for
someone. And, obviously, if a well-behaved node is not in either of these two cases, then we'll
say that it is correct. So let's say that we
want to guarantee safety. Well, what’s a
necessary property for guaranteeing safety? Suppose your system
is configured such that you have two quorums
that are entirely disjoint. Well, at that point,
obviously, the guys on the left can vote for something. The guys on the right
can vote for something completely different. In fact, both sides
of the diagram can make progress without
even exchanging any messages across the two halves
of the diagram. So clearly, there’s no way
to guarantee safety here. So like traditional consensus,
in order to guarantee safety, we need what’s called
quorum intersection. Quorum intersection means that
any two quorums are actually going to share at
least one node. So in this case, you say,
OK, well, let's fix this. Let's throw in some node v7. And let's make everybody depend on v7. And
great, now all our quorums intersect at v7. But what happens if v7 fails? What happens if the
intersection of two quorums consists entirely of malicious nodes? Well, remember these Byzantine
malicious nodes, they can give inconsistent messages to different people. So v7, it can say one
thing to the nodes on the left, something else to the nodes on the right. And again, it can drive
the system into divergent states. So in fact, the property that's necessary for safety in the
presence of Byzantine failures is that we need quorum intersection despite the behavior of any
ill-behaved nodes. What that means is that we have to basically delete all the ill-behaved nodes
from the system and from everybody's quorum slices. And we want to make sure that after we've
deleted all those bad nodes that we still have quorum intersection. But in this example,
if we delete v7, it just reverts to the diagram on the previous page. And, obviously, there's no
way to guarantee safety.
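The safety condition just described can be checked mechanically, at least for small configurations. Here is a brute-force sketch (illustrative Python, same slice representation as the earlier sketch; the enumeration is exponential, so it is only meant to make the definition precise): delete the ill-behaved nodes from every slice, enumerate the remaining quorums, and verify that every pair of them intersects.

```python
# Check "quorum intersection despite ill-behaved nodes": delete the bad
# nodes from every slice, then require every pair of remaining quorums
# to share at least one node. Brute force, exponential -- sketch only.
from itertools import combinations

def delete_nodes(slices, bad):
    return {v: [s - bad for s in vs] for v, vs in slices.items() if v not in bad}

def is_quorum(nodes, slices):
    nodes = set(nodes)
    return all(any(s <= nodes for s in slices[v]) for v in nodes)

def quorum_intersection_despite(slices, bad):
    pruned = delete_nodes(slices, set(bad))
    nodes = list(pruned)
    quorums = [set(c) for r in range(1, len(nodes) + 1)
               for c in combinations(nodes, r) if is_quorum(c, pruned)]
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))

# Two disjoint halves, as in the slide: no way to guarantee safety.
split = {v: [{"v1", "v2", "v3"}] for v in ("v1", "v2", "v3")}
split.update({v: [{"v4", "v5", "v6"}] for v in ("v4", "v5", "v6")})
print(quorum_intersection_despite(split, bad=set()))   # False
```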
So what about liveness? So suppose you have some three out of four system, and two out of the
four nodes fail. So in this case, v1 has three quorum slices– v1, v2, v3; v1, v3, v4; and v1, v2,
v4. Every single one of its quorum slices actually contains a failed node. So if there's a set of
nodes that intersects every single one of some node v's slices, we'll say that– we'll call that a
v-blocking set. So in this case, we can say that the set of failed nodes is v1-blocking because
it denies v1 the ability to have a quorum slice of non-faulty nodes. So you can see that
failed nodes can't be v-blocking for any correct node v because if they were, then v could never
make progress because the failed nodes could just crash and never output a message. And v would
never be able to hear unanimously from one of its quorums because each of its quorums would
include one of its quorum slices. And that quorum slice would include a failed node. And it turns
out that saying the failed nodes aren't v-blocking for any correct node is equivalent to saying
that the correct nodes have to form a quorum. So the necessary property for liveness in the
system is that the correct nodes actually have to form a quorum.
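A v-blocking set is easy to state in code as well. Here is a sketch (illustrative Python, same slice representation as above): a set B blocks v if it intersects every one of v's slices, and the liveness condition is exactly that the failed nodes block no correct node– equivalently, that the correct nodes form a quorum.

```python
# A set is v-blocking if it intersects every one of v's quorum slices,
# which would leave v with no slice of non-faulty nodes (sketch only).

def is_v_blocking(v, blocking_set, slices):
    b = set(blocking_set)
    return all(s & b for s in slices[v])

# The three-out-of-four example from the talk: if v2 and v3 fail, every one
# of v1's slices contains a failed node, so {v2, v3} is v1-blocking.
slices = {"v1": [{"v1", "v2", "v3"}, {"v1", "v3", "v4"}, {"v1", "v2", "v4"}]}
print(is_v_blocking("v1", {"v2", "v3"}, slices))   # True
print(is_v_blocking("v1", {"v4"}, slices))         # False: slice {v1,v2,v3} survives
```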
So to summarize, let's say you have a set U of well-behaved nodes in a Federated Byzantine
Agreement system. And let's say that U bar are the nodes that are not in U. And we don't know.
Those nodes might actually be ill-behaved. We can't assume otherwise. So in order to guarantee
safety for U, we need to know that U enjoys quorum intersection despite U bar, that even if you
delete U bar from everybody's slices, you'll still have quorum intersection. And if you want to
guarantee both safety and liveness, then you need not only quorum intersection despite U bar, but
you also need U itself to be a quorum, meaning U bar's not v-blocking for any of the nodes in U.
And if both of these properties hold for set U, then we'll say that the nodes in set U are
actually intact. And, in fact, you can show
that if your system has quorum intersection, then
there’s actually one maximal set
of intact nodes U. So that’s convenient
because it means we can just talk about the
intact nodes in the system. So now the question is,
can we actually achieve this kind of optimal failure
resistance in a Federated Byzantine Agreement system? So what I’ll talk about
now is the protocol that’s at the core
of the Stellar system that actually satisfies this
optimal failure resilience. So the Stellar
Consensus Protocol, it’s obviously the first
general FBA protocol. And it guarantees safety
for well-behaved nodes that enjoy quorum intersection
despite the ill-behaved nodes. So what that means is
if you use my protocol, and then you end up
in a situation where your nodes diverge
and you lost safety, then you can at least
take comfort in the fact that no other protocol
could have done any better, could have guaranteed you
safety under the circumstance. You just had too many failures. Or your quorum slices were too
small or were poorly chosen. Also, SCP is going to guarantee
that intact nodes never get stuck. So you can always end
up making progress. And the core idea
in the protocol is this technique
Federated voting. So the idea is that
nodes are going to exchange vote messages,
but piggybacked on the votes are going to be a specification
of every node’s quorum slices so that as you’re
assembling votes, you learn people’s
quorum slices. And out of that, you
can discover quorums and decide when a quorum has
actually voted for something. So as before, Federated
voting, nodes v are issuing these vote messages. And if a node votes for
some statement a, of course, that has to be consistent with
all of node v’s past votes and any statements
that v has accepted. And if you reach a point
where there’s a whole quorum U in which every node has
voted for the same statement, then we say that the quorum
has ratified the statement. So ratifying a statement means
an entire quorum voting for it. And you can show, and
it’s not very surprising, that if the well-behaved
nodes in the system enjoy a quorum intersection
despite the ill-behaved nodes, that they won’t ratify
contradictory statements. Makes sense intuitively
because when you delete all the
ill-behaved nodes that are issuing multiple votes, you
still have quorum intersection. Well, the good nodes at the
intersection of any two quorums are not going to
change their vote. So you’re not going to
have two quorums voting for contradictory statements. So that’s all well and good. We still have two problems,
though, which is you could have an intact node v that's unable to ratify some statement a, even
after other nodes have ratified the statement a. So some nodes might get stuck while other nodes
think the system's agreed on something. And that could happen either because v voted against a
value, and it's not allowed to change its vote. Or because some node failed after voting and v
didn't hear the votes. So effectively, the
outcome of federated voting looks like this. Now this should be
a familiar diagram. This is exactly the
same set of outputs that we could have in
the centralized case. And so the first
thing you might think is, well, maybe we could
apply the same reasoning as in centralized voting. So if you remember, in
centralized voting, we said, well, let’s say you have the
intersection of two quorums. And that intersection
unanimously says, hey, we saw a quorum vote for a. Well, then, maybe
other people could decide that a quorum
was voting for a even if they themselves
didn’t vote for a, or if they didn’t
hear the votes. So does that work here? Well, unfortunately not. So what’s the problem? The problem is before
we had this premise that either the whole system failed or it was OK and all the non-faulty nodes
were going to have safety, right? And now, failure is this per-node property. It's not a
system-wide property. And so, in particular, if you look at node vn minus 1 over on the
right-hand side here, you might say, oh, well, the intersection of my quorum and quorum A is
unanimously telling me something. But I don't really care about quorum A. For all I know, quorum
A on the left is some weird Sybil attack that was constructed just to try to get me to accept
some statement that I shouldn't accept. So, tough. I can't conclude anything from the fact that
these people are telling me a quorum voted for a. In fact, in a federated system,
you only care about the quorums that you belong to yourself. And so what that means is that the
only way to know that the system has reached an a-valent state is to ratify the statement a
firsthand. You can't believe it just because somebody else tells you that. So what are we going
to do about this? Well, here's an idea. Suppose that there's a v-blocking set that unanimously
tells node v, hey, we saw– we've accepted the statement a. So either that whole set is lying, in
which case node v is no longer intact, so we can't guarantee it's correct. Or, in fact, the
system did reach some a-valent state, and these guys correctly accepted the statement a. So based
on [INAUDIBLE]
we define this notion of accepting something. We'll say that a node v accepts a statement a
that's consistent with v's history as long as one of two things holds. Either there's a quorum
that includes v, in which each member either voted for a or accepted a. So that includes the case
where v is firsthand ratifying the statement a. Or there's some v-blocking set that unanimously
claims that they have accepted the statement a. So the statement
number two here is what allows a node to accept a statement even after it's voted against that
statement. And, in fact, you can show– I show it in the white paper that corresponds to this
talk– that intact nodes will never accept contradictory statements.
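Putting the two acceptance criteria into a sketch (illustrative Python, repeating the quorum and v-blocking helpers from the earlier sketches, with hypothetical message bookkeeping): node v accepts statement a once some quorum containing v has every member voting for or accepting a, or once a v-blocking set unanimously claims to have accepted a.

```python
# Federated-voting "accept" rule for node v and statement a (sketch only;
# the vote/accept dicts stand in for messages v has received).
from itertools import combinations

def is_v_blocking(v, blocking_set, slices):
    return all(s & set(blocking_set) for s in slices[v])

def quorums_containing(v, slices):
    nodes = list(slices)
    for r in range(1, len(nodes) + 1):
        for c in combinations(nodes, r):
            group = set(c)
            if v in group and all(any(s <= group for s in slices[u]) for u in group):
                yield group

def accepts(v, a, slices, voted_or_accepted, accepted):
    # Criterion 1: a quorum containing v in which every member voted for
    # or accepted a -- i.e., v ratifies a firsthand.
    for q in quorums_containing(v, slices):
        if all(a in voted_or_accepted.get(u, set()) for u in q):
            return True
    # Criterion 2: a v-blocking set unanimously claims to have accepted a,
    # which is what lets v accept a even after voting against it.
    claimants = {u for u, stmts in accepted.items() if a in stmts}
    return bool(claimants) and is_v_blocking(v, claimants, slices)

slices = {"v1": [{"v1", "v2"}], "v2": [{"v1", "v2"}], "v3": [{"v2", "v3"}]}
votes = {"v1": {"a"}, "v2": {"a"}}
print(accepts("v1", "a", slices, voted_or_accepted=votes, accepted={}))  # True
```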
So that sounds great except we still have two problems. First of all, there is the standard
liveness and stuck problem. There's still no guarantee that because one intact node accepted
something, all the intact nodes will accept it. And second of all, there's this problem that
we've actually weakened safety by adding the second property here. You could end up in a
situation where you have non-intact nodes that ought to enjoy safety even if they don't enjoy
liveness. And now we have no way of guaranteeing that they won't accept contradictory statements.
One way to look at
this, by analogy with the centralized system, is that what we really wanted was a set of size fs
plus 1 to tell us something. And what we have, a v-blocking set, is more like fl plus 1 instead
of fs plus 1. So how do we fix this? Well, the solution is actually to hold a second vote on the
fact that the first vote succeeded. So we'll say that a quorum confirms some statement a when
that quorum ratifies the statement "we accepted a." And so a node that's part of that quorum has
confirmed the statement. So this immediately
solves problem two, the suboptimal safety problem, because confirmation is a firsthand
ratification. There's no weird second criterion here that involves some v-blocking set. You need
a quorum to confirm something. Turns out this also solves the first problem where you have intact
nodes that are unable to accept something because if you think about it, you can have intact
nodes that vote against a statement that's later accepted. But they won't vote against the fact
that the statement was accepted. The fact that the statement was accepted is actually
irrefutable. And if you remember,
that was one of the two kinds of statements that it's OK to vote on, is irrefutable statements.
And, in fact, I show that once you have a single intact node in your system that's confirmed a
statement, you're guaranteed that all intact nodes will eventually confirm that statement once
enough messages have been delivered. So here's a summary of this
federated voting process. You start in some uncommitted state. You have some statement a that you
think is valid. You might vote for a. And then if you see a quorum vote for a, you'll accept a.
You'll hold another vote. And then you'll confirm a. Or it could be that you voted against a.
That's OK. You might see a v-blocking set say that they've all accepted a. And so at that point,
you'll accept a anyway. And then you'll again vote to confirm it and get to the right-hand side.
And once you reach the confirm stage, you know that all the other intact nodes, if you're intact,
will also confirm a. And so you can rely on a for both purposes of safety and liveness.
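That whole progression for a single statement can be summarized as a tiny state machine (an illustrative sketch, assuming the accept-rule and quorum-ratification checks are supplied as events by the surrounding code): a node votes, moves to accepted when the accept rule fires, and moves to confirmed once a quorum it belongs to ratifies "we accepted a."

```python
# Per-statement lifecycle on one node: VOTED -> ACCEPTED -> CONFIRMED.
# The events below would be driven by incoming vote/accept messages (sketch).

VOTED, ACCEPTED, CONFIRMED = "voted", "accepted", "confirmed"

class StatementState:
    def __init__(self):
        self.phase = None   # uncommitted

    def on_vote(self):
        if self.phase is None:
            self.phase = VOTED

    def on_accept_rule_fires(self):
        # Quorum of vote/accept messages, or a v-blocking set that accepted.
        if self.phase in (None, VOTED):
            self.phase = ACCEPTED

    def on_quorum_ratifies_accept(self):
        # The second vote: a quorum we belong to ratifies "we accepted a."
        if self.phase == ACCEPTED:
            self.phase = CONFIRMED
```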
So how do you get from voting to consensus? Well, it turns out to be exactly the same thing as
Paxos. So you're going to vote to commit and abort ballots. We're going to have the same
invariant as before, namely all the committed and stuck ballots are going to have to have the
same x. And you're going to have to prepare a ballot before you vote to commit it. There's just,
for convenience, a slightly different definition of prepare where we just totally order all the
ballots rather than the way [INAUDIBLE] Paxos. But it effectively works
out to be the same thing. Now there’s one
interesting twist, which is how do you
know what values to vote for in the first place? So far this has all
been very abstract. And what you’d
like is, actually, for the system to
converge on some value. And then everybody
votes for the same value with reasonable likelihood. And then you can just
finish your protocol quickly versus everybody
trying to nominate different incompatible ballots
and you never make progress. So the other thing,
the other component of this protocol that I
don’t have time to get into is a decentralized
nomination protocol that basically lets you
select a set of values that you can then combine
deterministically so everybody ends up with the same value. And the key insight in
the nomination protocol is the fact that you can
vote to nominate a value, but you can’t vote against
nominating a value. So nominating values is
essentially irrefutable. And that’s what
allows the system to converge on some set
of nominated values. So this protocol SCP has a
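Here is a sketch of the deterministic combining step in nomination (illustrative Python; the hash-based tie-break is just an assumption for the example, not necessarily what SCP actually uses): each node reduces the set of nominated values it has converged on to a single composite candidate in a way that depends only on the set, so nodes with the same set produce the same ballot value.

```python
# Deterministically combine a set of nominated values into one candidate.
# Illustrative only: picking the value with the highest hash is one simple
# deterministic choice; the real combining function is value-specific.
import hashlib

def combine(nominated_values):
    def digest(v):
        return hashlib.sha256(v.encode()).hexdigest()
    return max(nominated_values, key=digest)

# Nodes that converge on the same nominated set produce the same ballot value.
print(combine({"tx-set-a", "tx-set-b", "tx-set-c"}))
```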
So this protocol SCP has a bunch of interesting properties. First of all, it's decentralized. And
then it can actually realistically accommodate this internet-level consensus. It's also low
latency, by which I don't mean microseconds, but, I mean, compared to the amount of time you wait
for a credit card swipe, yeah, we can do this in five seconds, no problem. It's got flexible
trust. You can have a small nonprofit, like Stellar or the EFF, helping to keep the big banks
honest, even though they have many fewer resources. And it just depends on digital signatures and
hash functions. So you've got standard asymptotic security. You tell me how powerful your
adversary is, and I'll tell you whether to choose a 256- or a 512-bit hash function, how big your
signature keys should be, et cetera. And basically you can have
effectively zero chance of an adversary breaking it. Now, obviously, there's a lot of related
work. Standard Byzantine Agreement provides the latter three properties, but it's not
decentralized. That's why it didn't solve this problem. What did solve this is proof of work. And
we've seen how powerful that is because of how much bitcoin has taken off. But bitcoin doesn't
have the other three properties. In bitcoin, you can't decide you want to trust the EFF. You have
to trust whoever has the most hashing power. And then there's also proof of stake, which is like
a cross between proof of work and Byzantine Agreement. But still, there you're trusting the
people who have the most money, which is not necessarily the ACLU or whatever. So it's important
to point out that consensus is not the same thing as a cryptocurrency. So SCP doesn't offer
some rate-limited way to mint new coins. It doesn't provide intrinsic incentives for good
behavior. We expect people are participating in our system because they're financial institutions
that see value in having this global financial network. But they don't get paid to do it. They
get paid because they're making money as banks. It also doesn't tell you whom to trust. So you
could obviously configure your quorum slices in a bad way and end up with an insecure system. But
that said, I think that SCP has a bunch of applications beyond financial networks. And one of
them that I want to
point out is CA accountability. So one problem is
that all our browsers, they ship with dozens and dozens
of root CAs, any one of which can sign certificates. True story, Turktrust went
and signed a bogus certificate for Google.com allowing
man-in-the-middle attacks. So people are trying to
address this through efforts like certificate transparency. But certificate
transparency requires you to trust a set
of logging agents who are going to output these
logs of certificates. And so you can actually increase
confidence in the system by having this be
more like a global log that the entire
internet signs off on. And that would make it
virtually impossible to roll back a certificate
that had been issued. So SCP could basically
increase confidence in the auditing process. Now Stellar is this financial network that I've been
mentioning. And, in fact, it is a system that we've deployed. Our initial focus is in Nigeria
because the banking system there is a big pain point. There are many fewer people with bank
accounts there, and financial services are much less smooth. A few months ago, we integrated with
Oradian, who's a vendor of software to microfinance institutions. And so that provided access to
about 200 or 300 microfinance institutions who can now send money over Stellar, or will be able
to as soon as we get regulatory approval for that. And finally, since I know this is also going
to be on YouTube where some non-Googlers will see it, I want to mention that we're hiring. So if
you like the idea of working on cutting-edge distributed systems, please contact us at Stellar.
We've got some cool– everything
we’re doing is open source. And it’s pretty
interesting stuff. [APPLAUSE] FEMALE SPEAKER: Thank you
so much for joining us today and taking time out
of your busy schedule. I think we might have
just a few minutes. Does anybody in the
audience have any questions? AUDIENCE: Am I to
understand correctly that the slices are immutable? And if not, how do
they get reconfigured? DAVID MAZIERES: So
that’s a good question. So in traditional Byzantine
Agreement systems, it turns out the
reconfiguration is a big pain. And sometimes you have to feed
it back to the system itself. Turns out that that’s the one
area that we get that for free. So you can unilaterally
change your quorum slice whenever you want. Now it’s better if you do it
from one log entry to the next. In other words, if
you’re in the middle of deciding on something,
you could change it before you move to
the next question that you’re deciding on. But you can do it midstream. It’s just in terms
of safety, it’s a little bit weaker
because you have to treat it like the union
of all the configurations that you had. A question over there? Oh, sorry. AUDIENCE: [INAUDIBLE]
said the messages themselves contain the
definition of all your slices, essentially? DAVID MAZIERES:
Yeah, so they contain a cryptographic hash of it. And you have a big cache. And then if you don’t have– if
that hash is not in your cache, then you just– there’s a
separate side protocol where you say, please give me the
pre-image of this hash value. AUDIENCE: I see. And you guys– who runs
the service that provides these pre-images, basically? DAVID MAZIERES: Oh, each node. So if I hear a message from
you, and it includes a hash, and I don’t have that hash,
I just ask you for it. So I mean, effectively,
it’s just an optimization. Conceptually, it
includes the whole thing. It’s just it would
be of more bandwidth than we want to consume. AUDIENCE: Sort of
like public keys and [INAUDIBLE] in a sense. DAVID MAZIERES: Maybe. AUDIENCE: Thanks. This is a bit open
ended, but if you’ve given this talk to a
group of graph theorists, I was curious if you run
across some good analogies or other ways of describing
[INAUDIBLE] problems, about creating
[INAUDIBLE] or colorings or paths between subsets. Certainly following
along your explanation has been very helpful. DAVID MAZIERES: No, I haven’t. And it would be interesting. I mean, one thing is we have–
the stellar core daemon, there’s a command
line switch that lets you look at the health
of your quorum slices. And it can give you–
and it spits out a few examples of
failures that might undermine the system, just
to kind of give you a sense. But that mode is
actually– I think it’s like exponential time. So maybe there’s actually
some better way of doing this. So it could be interesting
future work there. FEMALE SPEAKER: [INAUDIBLE]
concludes the talk for today. As mentioned earlier, it’s going
to be on the YouTube channel. So feel free to check it out
and share with your friends and coworkers. All right. Great. Thank you. [APPLAUSE]
