Saturday, May 28, 2011

The Bitcoin protocol - a high-level explanation

Bitcoin is a peer-to-peer financial protocol. There is no central authority that guarantees that the operations are valid, so how can this work? How can you trust strangers with your money? Let's say you are selling something for bitcoins and you receive the amount quoted - how can you verify that the transaction was valid? You need to check three things:

  1. That it was indeed the owner of the sender account that created the transaction.
  2. That the account has the BTCs that are now being sent - i.e. that it previously received them.
  3. That the owner has not already transferred these BTCs somewhere else (double spending).

The first check is easy - all transactions are signed. For the second you need to know another transaction that transfers the money to the sender's account. Of course now you'll need to verify that other transaction as well - and then, in a recursive fashion, all the others leading back to the one that generated the bitcoins (more about generating later). The sender could send you this chain of signed transactions - but in fact he does not need to do that because all the transactions are public.
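The recursive second check can be sketched in a few lines of Python. This is only a toy model, not the real bitcoin data format: the `Tx` class, its fields and the `provenance` function are names made up for this illustration, every transaction spends exactly one earlier transaction, and signature checking is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tx:
    """Toy transaction (hypothetical structure, not bitcoin's real format)."""
    tx_id: str
    source: Optional[str]   # id of the transaction being spent, None if it generates coins
    sender: Optional[str]   # None for a generating transaction
    receiver: str


def provenance(tx_id: str, txs: Dict[str, Tx]) -> List[str]:
    """Walk back from tx_id to the transaction that generated the coins.

    Raises ValueError if a link is missing or if the sender of a
    transaction never received the coins it is trying to spend.
    """
    chain: List[str] = []
    seen = set()
    current: Optional[str] = tx_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        tx = txs.get(current)
        if tx is None:
            raise ValueError(f"unknown transaction {current}")
        if tx.source is not None:
            parent = txs.get(tx.source)
            if parent is None:
                raise ValueError(f"unknown source {tx.source}")
            # Check 2: the sender must be the one who previously received the coins.
            if parent.receiver != tx.sender:
                raise ValueError(f"{tx.sender} never received {tx.source}")
        chain.append(current)
        current = tx.source
    return chain
```

With a log containing `gen` (generates coins for alice), `t1` (alice to bob) and `t2` (bob to carol), `provenance("t2", txs)` returns `["t2", "t1", "gen"]`.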

The third check is trickier - for this you always need to know which transaction came first and which came second, and then reject the second one if it is double spending. This would be easy with a central authority that would timestamp all transactions and log them somehow - but it is very hard to do in a p2p system. In bitcoin there is also a global log of all transactions - it is called the block chain (for its technical representation). The trick is that to add a valid record to that log you need to solve a hard computational problem. You have practically no chance of doing that single-handedly, so you need the other nodes in the network to help you. The sender broadcasts his intent to transfer money to the whole network and then waits until someone in the network solves the problem and correctly saves the transaction to the global log. This is like going to a notary to make sure that all formalities are met. After the transaction is saved to the log it is very hard to undo - because you'd need to solve this hard problem again - and without a conspiracy between a significant part of the peers this would be practically impossible.
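To give a feeling for what this "hard computational problem" looks like, here is a toy sketch in Python. It is a simplification: the function names and the `difficulty_bits` parameter are my own, and the real protocol hashes a fixed-size block header with double SHA-256 against a network-wide target. The idea is the same though: keep trying numbers (nonces) until the hash of the data falls below a threshold. Finding the number is expensive, checking it takes a single hash.

```python
import hashlib


def _hash_value(block_data: bytes, nonce: int) -> int:
    """SHA-256 of the data plus an 8-byte nonce, read as a big integer."""
    digest = hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def is_valid(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap check: the hash must start with `difficulty_bits` zero bits."""
    target = 1 << (256 - difficulty_bits)
    return _hash_value(block_data, nonce) < target


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Expensive search: try nonces one by one until the hash is small enough."""
    nonce = 0
    while not is_valid(block_data, nonce, difficulty_bits):
        nonce += 1
    return nonce
```

Each additional bit of difficulty doubles the expected number of attempts in `mine`, while `is_valid` stays equally cheap. For example, `mine(b"some transactions", 12)` needs about 4096 attempts on average.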

To be more precise, transactions are not logged individually but in batches called blocks. Such a set of transactions is written to the log in one sweep. This is important for efficiency - but that is not the only reason.

All of this happens in a distributed, asynchronous manner. This means that it is possible that two nodes will write two new blocks to the transaction log in parallel and we'll get two different and valid versions of the log. What happens then is that when the next block is written to one version of the log, it takes over all of the transactions from the block in the alternative log version, so they are not lost. But it is possible that the alternative logs contain two conflicting transactions (double spending) - then only one of them will eventually be saved. This showcases one important thing - having the transaction written to the log does not yet guarantee that it will stay there, because there may exist an alternative, valid log version, and eventually this other one may be the one that is continued by the system. But if you wait until another block is written to the log after the one that you are checking - then you are much safer. This is the most complex part of the protocol - but the general rule is that the more blocks are written to the log after yours, the safer you are, and the safety grows exponentially.
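The "grows exponentially" part can be made concrete with the simplified catch-up formula from the original bitcoin paper: if an attacker controls a fraction q of the network's computing power and the honest nodes control p = 1 - q, the probability that the attacker ever catches up from z blocks behind is (q/p)^z when q < p, and 1 otherwise. The paper's full calculation adds one more step on top of this, so treat the sketch below as an illustration of the trend only. The function name is my own.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Chance that an attacker with hash-power share q ever catches up
    from z blocks behind (simplified gambler's-ruin formula)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if z < 0:
        raise ValueError("z must not be negative")
    p = 1.0 - q
    if q >= p:
        # With half or more of the computing power the attacker always catches up.
        return 1.0
    return (q / p) ** z


if __name__ == "__main__":
    # Print how the risk shrinks with each block written after yours.
    for z in range(7):
        print(z, catch_up_probability(0.1, z))
```

For an attacker with 10% of the computing power, each block written after yours divides the probability by 9.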

Now - why would so many nodes in the network spend their time trying to save a new transaction block to the log? Because they are paid if they succeed. In each new block saved to the log there can be one transaction that generates a specific amount of BTCs (now it is 50 - but it will be less in the future). This is the way new bitcoins are generated.
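The "less in the future" part follows a fixed schedule. This detail is not covered in the text above, so take it as an added fact: the reward starts at 50 BTC and is halved every 210,000 blocks. A small sketch of that schedule (the function and constant names are my own, amounts are in the smallest unit, 1 BTC = 100,000,000 units):

```python
COIN = 100_000_000          # smallest units in one BTC
HALVING_INTERVAL = 210_000  # blocks between reward halvings


def block_subsidy(height: int) -> int:
    """Newly generated amount (in smallest units) for the block at `height`."""
    if height < 0:
        raise ValueError("height must not be negative")
    halvings = height // HALVING_INTERVAL
    if halvings >= 64:
        # After this many halvings nothing is left to generate.
        return 0
    # Integer halving: shifting right by one bit divides by two.
    return (50 * COIN) >> halvings
```

So `block_subsidy(0)` is 50 BTC and `block_subsidy(210_000)` is 25 BTC. Because the reward keeps halving, the total number of bitcoins ever generated is bounded.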

Another question is how can the bitcoin project maintain that the transactions are anonymous when they are all public? The answer is that what is public are the addresses of the sender and the receiver, but not who owns these addresses. It is easy to create new addresses. You can use a new one for every incoming transaction and you can also make transfers between your own accounts to hide the tracks. Maybe it is not total privacy - but it is still better than with current money transfers where the banks know everything.

This introduction leaves out lots of technical details - but I hope that it explains the protocol enough to convince readers that it can work :) For more details have a look at the original bitcoin paper and the bitcoin wiki (just remember that what I call transaction log here is called block chain there).

By the way - I've heard bitcoin is bringing internet tipping back into fashion - so 113uhu2LrJp8gXGDrDLLDqmFGxnC2e3suB is one of my addresses. The interesting twist is that the donations are public - you can view the current balance at blockexplorer. I've already got some tips - thanks :) Consider this an experiment in micropayments for blog financing.

Update: A critical analysis of the decentralization of the bitcoin protocol: http://www.links.org/?p=1164

Sunday, May 15, 2011

Synthetic attributes

chromatic writes about synthetic attributes. I wonder why this is not more widely used - it's such a simple technique. I've been using it since I discovered Moose and all the time in the back of my head I had this thought: Why don't people use this more widely? It is such an obvious solution to the testing problems. Maybe they know something that I don't? But maybe it is just a matter of some guru (like chromatic :) writing about this technique?

PS. Sorry, I won't explain what synthetic attributes are - this post is only a comment. I currently cannot comment on chromatic's blog.