How do hash functions ensure transaction integrity

Blockchain: Basics of Blockchains: Getting started with transaction hash chains

August 2018

Volume 33, number 8

By Jonathan Waldman | August 2018

In the first article in this series (msdn.com/magazine/mt845650), I presented the foundational concepts needed to broadly understand modern blockchains, using high-level examples to illustrate the basics. In this article, I'll revisit some of the topics from the previous article in more detail: transaction hash chains, the role of the transaction pool, and how the longest blockchain always prevails. This article is best read as a supplement to the previous article and provides introductory material for developers who are new to blockchain technologies.

As an aside, while the articles in this series focus on the Bitcoin blockchain, I'm not advocating the adoption of any particular blockchain product or technology. My goal is to examine the foundation on which popular blockchain technologies are built, and to equip you with knowledge you can apply should you decide to use existing blockchains or develop your own. As you study blockchains, you'll soon find that implementation details differ significantly from one to another. If you want to specialize in a particular blockchain implementation, you'll need to keep up with its fixes and updates to maintain your expertise. But be warned: because these emerging technologies change so quickly, the available books, videos, blogs, forums and other documentation resources often fall behind, and it's sometimes necessary to consult the latest deployed source code as a particular blockchain implementation's definitive reference.

The Transaction Hash Chain Revisited

My previous article explained the transaction hash chain data structure that tracks digital ownership. In this article, I'll dig a little deeper into how this hash chain works.

To pay homage to blockchain's roots, I'll start by focusing on Satoshi Nakamoto's seminal white paper on Bitcoin (bitcoin.org/bitcoin.pdf), published on October 31, 2008, months before Bitcoin launched on January 3, 2009. Although Bitcoin's implementation details have changed quite a bit since then, the white paper remains a useful reference, especially the diagram on p. 2, which expresses the original transaction hash chain design concept.

The purpose of that diagram is to convey how a transaction hash chain is constructed and how digital signatures authorize the transfer-of-ownership sequence. However, it's highly abstracted and therefore a little confusing. To add clarity, I've created a more detailed version that shows how current transaction hash chains actually work (see Figure 1).


Figure 1 Updated Version of Satoshi Nakamoto's Original Transaction Hash Chain Diagram

My revised diagram shows three transactions (0-based, as in the original document): Transaction0 for Alice, Transaction1 for Bob and Transaction2 for Charlie. The first transaction makes Alice the original owner of the digital asset; the second transaction transfers ownership to Bob; and the third transaction transfers ownership to Charlie. Each transaction consists of these fields (shown with a solid outline): transaction hash, digital asset ID, optional data, public key and signature. Other fields are used by, but not stored in, the transaction (shown with a dashed outline): private key and new transaction hash. The diagram expresses field values as subscripted, mixed-case names. For example, the transaction hash value for Transaction0 is TransactionHash0 and the public key value for Transaction2 is PublicKey2.

Figure 1 is a simplified transaction hash chain because it tracks only a single digital asset (DigitalAssetID0) as it changes ownership (in contrast, cryptocurrency transaction hash chains usually have multiple digital asset inputs and outputs). Also, don't confuse the transaction hash chain with the blockchain, which aggregates verified transactions into blocks. Finally, the transaction hash chain isn't typically stored as the single linked-list data structure depicted. Instead, it can be constructed (quickly, with the aid of indexes) from transaction data stored on the blockchain.

As I described in my previous article, the sequence of transactions is preserved because each new owner's transaction contains a hash value that links back to the previous owner's transaction. In Figure 1, back-links are formed when the transaction hash of the previous transaction is stored in the current transaction. For example, Bob's transaction contains a transaction hash field that holds Alice's TransactionHash0 value; likewise, Charlie's transaction contains a transaction hash field that holds Bob's TransactionHash1 value, and so on.

Back-links are just one of several data-integrity components of the transaction hash chain. The chain also enforces transfer-of-ownership authorization. To follow an example, imagine that Alice is a purveyor of the world's finest wines and wants to keep a ledger that tracks every bottle she owns. One day Alice goes to her wine cellar and decides to register herself on her company's blockchain as the original owner of each bottle of wine stocked there, effectively seeding transaction hash chains for each of her prized bottles. To begin, she casually picks up a bottle of Cheval Blanc 1947 St.-Emilion and tags it with a QR code that contains a unique ID. She then scans the QR label into her blockchain client software, which runs as a node on the network. The software translates the scanned code into a digital asset ID (DigitalAssetID0), then adds optional data (OptionalData0) together with Alice's public key (PublicKey0). As you can see in Figure 1, these fields are in their own outlined rectangle, which represents an unsigned transaction. Each transaction also contains a transaction hash back-link and a signature, but because this is the first transaction in the hash chain, these fields are blank (represented by the shaded fields for Transaction0).

Shown atop each transaction is a unique transaction hash value that the client software calculates by SHA-256-hashing all of the transaction fields together (transaction hash, digital asset ID, optional data, owner's public key and signature). Again, it's this transaction hash value that the next transaction for DigitalAssetID0 will use as its back-link.
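The hashing and back-linking just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Transaction class, its field names, the JSON serialization and the placeholder strings for keys and signatures are all hypothetical, and real blockchains define their own binary transaction formats.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Transaction:
    prev_transaction_hash: str  # back-link; empty for the first transaction
    digital_asset_id: str
    optional_data: str
    public_key: str             # owner's public key (placeholder string here)
    signature: str              # empty for the first transaction

def transaction_hash(tx):
    """SHA-256 over all five stored fields, serialized in a fixed order."""
    payload = json.dumps(asdict(tx), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def back_links_intact(chain):
    """True if the first back-link is blank and every later transaction
    stores the hash of the transaction before it."""
    if chain and chain[0].prev_transaction_hash != "":
        return False
    return all(
        current.prev_transaction_hash == transaction_hash(previous)
        for previous, current in zip(chain, chain[1:])
    )

# Placeholder values mirror the names used in Figure 1.
tx0 = Transaction("", "DigitalAssetID0", "OptionalData0", "PublicKey0", "")
tx1 = Transaction(transaction_hash(tx0), "DigitalAssetID0",
                  "OptionalData1", "PublicKey1", "Signature1")
tx2 = Transaction(transaction_hash(tx1), "DigitalAssetID0",
                  "OptionalData2", "PublicKey2", "Signature2")

print(back_links_intact([tx0, tx1, tx2]))  # True
```

Because each hash covers every stored field, changing any field of an earlier transaction changes its hash and breaks the back-link held by the transaction that follows it.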

When Bob, the manager of Alice's Manhattan restaurant, wants to acquire Alice's bottle of Cheval Blanc, he uses his client software to generate a new public-private key pair for the transaction. Bob could skip this step and keep all of his digital assets under a single, previously used public key, but that would expose him to unnecessary risk. Instead, he generates a new key pair and gives Alice a public key he has never used before. That way, if he ever loses the private key of that pair, he loses access to only a single digital asset.

In response to Bob's request, Alice launches her client software and browses her digital assets. She selects the transaction ID associated with the Cheval Blanc bottle Bob wants and then initiates the transfer request by providing Bob's public key, which doubles as a kind of destination address. The node then creates a new transaction (Transaction1) containing the back-link value that holds the hash of the previous transaction (TransactionHash0), the value of the digital asset ID (DigitalAssetID0) for the Cheval Blanc bottle (this is the same value as the digital asset ID for Transaction0), the value for any custom fields related to the transaction (OptionalData1), and the value of Bob's public key (PublicKey1), because Bob is the owner of this transaction.

So far, the node has created an unsigned new transaction, Transaction1, for Bob. The next step is to sign the transaction with Alice's private key. This is a critical step: Alice currently owns the digital asset in question, so only she can authorize the transfer of that digital asset to Bob.

Elliptic Curve Cryptography

In Figure 1, markers 1 and 2 indicate where the transaction is signed and where it's verified, respectively. In its current version, the Bitcoin blockchain uses an implementation of public key cryptography (PKC) called elliptic curve cryptography (ECC). ECC offers stronger cryptographic results and shorter keys than the popular RSA/Diffie-Hellman alternative. Blockchain nodes use ECC to generate asymmetric key pairs using a formula that involves randomly selected points on a two-dimensional graph. This scheme allows a lost public key to be regenerated from the private key (but, of course, it doesn't allow a lost private key to be regenerated from a public key).

Blockchains modeled after Bitcoin also use ECC when it comes to digital signatures. In contrast to the simplified PKC Rivest-Shamir-Adleman (RSA) examples I presented in my previous article, Bitcoin now uses an Elliptic Curve Digital Signature Algorithm (ECDSA) (specifically, SHA256withECDSA) to sign transactions. This algorithm works a little differently from other signing technologies: in ECDSA, you pass the signer's private key, along with the message to be signed, to a function that uses an ECDSA signature-generation algorithm to create a signature (this step is indicated by marker 1 in Figure 1). To verify this signature later, you pass the signer's public key, the message and the signature to a function that uses an ECDSA verification algorithm to produce a true or false value indicating whether the signature is valid (this step is indicated by marker 2 in Figure 1). Figure 2 summarizes signing and verifying with ECDSA.


Figure 2 Elliptic Curve Digital Signature Algorithm Signature Generation (top) and Verification Algorithm (bottom)

When you create a digital signature using PKC RSA, you verify the signature by comparing hash values, as shown in my previous article. For the curious-minded, that signature verification strategy isn't possible with ECDSA. RSA PKC, as presented in that article, is a deterministic digital signature algorithm because the same signature is generated every time a given message is signed with a given private key. ECDSA in its standard form, on the other hand, is non-deterministic: every time you pass a message and a private key to the ECDSA signing function, you obtain a different signature. To see this in action, go to bit.ly/2MCTuwI.
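To make the sign and verify steps concrete, here is a toy ECDSA sketch in pure Python (3.8 or later, for modular inverses via pow). It assumes the secp256k1 curve, which is the curve Bitcoin uses, and hashes the message once with SHA-256; the function names are hypothetical. It is not constant-time and must not be used to protect real keys; a vetted cryptography library is the right tool for that.

```python
import hashlib
import secrets

# secp256k1 domain parameters (assumed): y^2 = x^3 + 7 over the field of size P
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def point_add(a, b):
    """Add two curve points; None stands for the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        slope = 3 * x1 * x1 * pow(2 * y1, -1, P) % P
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (slope * slope - x1 - x2) % P
    return (x3, (slope * (x1 - x3) - y1) % P)

def point_mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result

def message_digest(message):
    return int.from_bytes(hashlib.sha256(message).digest(), "big")

def sign(private_key, message):
    """Marker 1: private key + message in, signature (r, s) out."""
    z = message_digest(message)
    while True:
        k = secrets.randbelow(N - 1) + 1        # fresh random nonce per signature
        r = point_mul(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * private_key) % N
        if r and s:
            return (r, s)

def verify(public_key, message, signature):
    """Marker 2: public key + message + signature in, True or False out."""
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    z = message_digest(message)
    point = point_add(point_mul(z * w % N, G), point_mul(r * w % N, public_key))
    return point is not None and point[0] % N == r

private_key = secrets.randbelow(N - 1) + 1
public_key = point_mul(private_key, G)          # public key derived from private key
message = b"NewTransactionHash1"
sig_a, sig_b = sign(private_key, message), sign(private_key, message)
print(sig_a != sig_b)                           # True: same inputs, different signatures
print(verify(public_key, message, sig_a), verify(public_key, message, sig_b))  # True True
print(verify(public_key, b"tampered", sig_a))   # False
```

The two signatures differ because each call draws a new random nonce k, yet both verify against the same public key. That is why verification has to run the ECDSA verification algorithm rather than compare a recomputed signature or hash.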

In the example, Alice is about to sign the transaction that transfers ownership of DigitalAsset0 to Bob. The node software passes her private key (PrivateKey0) and a message (NewTransactionHash1) to the ECDSA signature-generation algorithm function and receives a signature as output (Signature1). The node adds this signature value to the signature field of the new transaction. Finally, the node calculates the transaction hash (TransactionHash1) value, which is a SHA-256 hash of all transaction fields, including the signature. At this point the node has successfully created a signed transaction that can be sent to the transaction pool.

A signed transaction is considered unverified until it has been verified by a miner node. When a miner node tries to verify Bob's transaction, it uses the transaction hash back-link to reach the previous transaction, which leads it to Alice's Transaction0, and reads that transaction's public key. Once the node has access to the previous transaction, it passes that transaction's public key (PublicKey0), together with the new transaction hash (NewTransactionHash1) and the signature in Bob's transaction (Signature1), to the ECDSA verification algorithm, which returns a true or false value indicating whether the signature is valid.

By the way, Alice's private key (PrivateKey0) and the new transaction hash (NewTransactionHash1) are not stored in the transaction. Private key values should never be stored on a blockchain, and there is no need to store the new transaction hash because it can be recalculated whenever it's needed.
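The order of operations in the last few paragraphs (what gets signed, what gets hashed, and whose public key checks the signature) can be sketched as follows. Everything here is an assumption made for illustration: the dictionary layout, the function names, and above all the stand-in signer, which uses a keyed hash with one shared key string in place of a real key pair. It is not a digital signature scheme and only exercises the control flow; a real node would plug ECDSA signing and verification into the same two slots.

```python
import hashlib
import hmac

def sha256_hex(*fields):
    """SHA-256 over the fields joined with a separator (toy serialization)."""
    return hashlib.sha256("|".join(fields).encode("utf-8")).hexdigest()

def create_transfer(prev_tx, new_owner_public_key, optional_data,
                    owner_private_key, sign):
    unsigned_fields = (prev_tx["transaction_hash"], prev_tx["digital_asset_id"],
                       optional_data, new_owner_public_key)
    new_transaction_hash = sha256_hex(*unsigned_fields)  # the message; never stored
    signature = sign(owner_private_key, new_transaction_hash)
    return {
        "prev_transaction_hash": prev_tx["transaction_hash"],
        "digital_asset_id": prev_tx["digital_asset_id"],
        "optional_data": optional_data,
        "public_key": new_owner_public_key,
        "signature": signature,
        # Covers every stored field, signature included; the next back-link target.
        "transaction_hash": sha256_hex(*unsigned_fields, signature),
    }

def verify_transfer(tx, transactions_by_hash, verify):
    prev_tx = transactions_by_hash[tx["prev_transaction_hash"]]  # follow the back-link
    new_transaction_hash = sha256_hex(tx["prev_transaction_hash"],
                                      tx["digital_asset_id"],
                                      tx["optional_data"], tx["public_key"])
    # The PREVIOUS owner's public key checks the signature on the new transaction.
    return verify(prev_tx["public_key"], new_transaction_hash, tx["signature"])

# Stand-in only: symmetric, so the "public" and "private" key are the same string.
def stand_in_sign(key, message):
    return hmac.new(key.encode(), message.encode(), hashlib.sha256).hexdigest()

def stand_in_verify(key, message, signature):
    return hmac.compare_digest(stand_in_sign(key, message), signature)

tx0 = {"prev_transaction_hash": "", "digital_asset_id": "DigitalAssetID0",
       "optional_data": "OptionalData0", "public_key": "alice-key-0",
       "signature": ""}
tx0["transaction_hash"] = sha256_hex("", "DigitalAssetID0", "OptionalData0",
                                     "alice-key-0", "")
index = {tx0["transaction_hash"]: tx0}

tx1 = create_transfer(tx0, "bob-key-1", "OptionalData1", "alice-key-0", stand_in_sign)
print(verify_transfer(tx1, index, stand_in_verify))  # True
```

Note that the new transaction hash is recomputed during verification from the stored fields, which is why it never needs to be saved.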

Bob grabs his corkscrew and thinks he's about to enjoy the Cheval Blanc when he receives a Skype call from Charlie, the manager of one of Alice's other restaurants. Charlie would like to offer a special bottle of wine to welcome a newly hired sommelier. Bob regretfully agrees to transfer the Cheval Blanc to Charlie. He asks for Charlie's public key, and the same process is carried out again to transfer ownership of DigitalAsset0 from Bob to Charlie.

There are now three transactions for DigitalAsset0: one for Alice, one for Bob and one for Charlie. Every transaction was verified and incorporated into the blockchain. After a certain number of additional blocks have been mined on top of the block that contains a particular transaction, that transaction is deemed confirmed (this "certain number" is implementation-specific). Therefore, the official owner of a particular digital asset is always the person who holds the private key for the most recently confirmed transaction in that digital asset's transaction hash chain.

The Need for Consensus

As you have seen, a transaction hash chain is a data structure that aims to enforce ownership of a digital asset. But remember that these transactions are stored on a distributed, decentralized, asynchronous, public network that is vulnerable to attack and exposed to nodes that don't necessarily adhere to blockchain protocol rules (so-called "bad actors"). The result is that bad-actor nodes could verify transactions that aren't actually valid, or could collude on the network to undermine the integrity of the blockchain.

The Transaction Pool: To avoid these transaction-integrity issues, all transactions go through a verification and confirmation process. Each transaction is created by a single node somewhere on the network. For example, let's say Alice is in Albuquerque and Bob is in Boston. When Alice transfers ownership of her digital asset to Bob, Transaction1 is created by a node in Albuquerque and then broadcast to other nodes on the network. At the same time, other nodes are actively broadcasting the transactions they have just created. These broadcasts spread to other nodes across a global network, and it takes some time for the transactions to propagate because of network latency. Regardless of where on the global network a transaction originates, the blockchain protocol places all new transactions in a transaction pool of unverified transactions.

Proof of Work and Proof of Stake: In a blockchain that issues a reward for Proof of Work, miner nodes aggressively select transactions from the transaction pool. It behooves a miner node to verify every transaction while constructing a candidate block, because a block that contains any invalid transactions is immediately rejected by other nodes, and that would mean the work the node did was for naught.

Recall from my previous article that every node is in a race to find a nonce for the candidate block it has constructed so that it can earn a financial reward and recover the energy costs incurred while demonstrating Proof of Work. As of this writing, the current financial reward on the Bitcoin blockchain is 12.5 Bitcoin (BTC), which amounts to approximately $100,000 USD. Sometimes the financial reward is a transaction fee, and sometimes it's a block reward plus a transaction fee. What's important to understand about Proof of Work is that nodes must expend energy and incur equipment and infrastructure costs to continue mining blocks profitably; for a node to be sustainable, those costs must be offset by revenue.

Unsurprisingly, as soon as a miner finds a nonce it immediately broadcasts its block to every other node on the network in the hope that the just-mined block is added to the end of the blockchain. The Bitcoin blockchain calibrates the nonce difficulty so that new nonces are found approximately every 10 minutes, so a delay of even a few seconds can mean that another miner also finds a nonce and broadcasts its own candidate block.
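The nonce race can be illustrated with a simplified search loop. This sketch makes several assumptions that differ from Bitcoin: it hashes a text string once with SHA-256 rather than double-hashing a binary block header, and it expresses difficulty as a number of leading zero bits (the hypothetical difficulty_bits parameter).

```python
import hashlib

def block_hash(previous_block_hash, transactions, nonce):
    """Hash a toy block header: back-link, transaction list and nonce."""
    header = f"{previous_block_hash}|{'|'.join(transactions)}|{nonce}"
    return hashlib.sha256(header.encode("utf-8")).hexdigest()

def mine(previous_block_hash, transactions, difficulty_bits):
    """Try nonces until the block hash, read as a number, is below the target."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = block_hash(previous_block_hash, transactions, nonce)
        if int(digest, 16) < target:
            return nonce, digest
        nonce += 1

previous = "00" * 32                      # placeholder hash of the previous block
candidate_transactions = ["Transaction1", "Transaction2"]
nonce, digest = mine(previous, candidate_transactions, difficulty_bits=12)
print(nonce, digest)
```

Finding the nonce takes thousands of attempts even at this low difficulty, yet any other node can check the result with a single call to block_hash. Each extra bit of difficulty doubles the expected number of attempts.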

To appreciate the implications of losing the mining race, consider the miner nodes that failed to find a nonce in time: all the energy they expended was wasted. The miners that didn't find a nonce must stop processing their current block and start all over again by grabbing and verifying transactions from the transaction pool. The reason they must stop mining as soon as they learn that another miner has found a nonce is that a candidate block has a back-link to the hash of the previous block on the blockchain. When another miner mines a verified block that links to the previous block, the losing miners must create a new block that references the hash of the newly mined block. The losing miners must also discard the transactions they previously selected and choose a new set from the transaction pool, because other nodes will reject any new block that contains transactions already included in a previous block.

A node must bear all the costs required to support its mining equipment. As the Bitcoin blockchain has grown, this has led to a different kind of race: a race for the most powerful mining equipment. The more computing power a mining node can access, the more likely it is to win each 10-minute race to solve the cryptographic puzzle required to find a nonce.

A common criticism of Proof of Work is that it encourages the construction of ever-more-powerful computing centers and the use of increasing amounts of electrical power. The owner of the most powerful computing equipment on a Proof-of-Work-based blockchain network gets a competitive advantage. For example, multimillion-dollar datacenters now work exclusively toward mining Bitcoin. According to digiconomist.net, the Bitcoin blockchain's annual energy consumption as of June 2018 is 71.12 TWh, which is similar to the annual energy consumption of Chile (bit.ly/2vAdzdl).

Another frequently discussed consensus algorithm is Proof of Stake, which rewards nodes that demonstrate an economic stake in the network. Arguably, Proof of Stake's greatest appeal is that it is more energy-efficient. In addition, it doesn't issue a cryptocurrency reward for mining a block, although it does issue transaction fees as a reward. It also doesn't require a race to find the nonce that solves a cryptographic puzzle. Instead, the network randomly selects a node that has registered as a "forger" (analogous to Bitcoin's "miner") based on the total value and age of its cryptocurrency units. Various implementation details strive to ensure fairness and randomness in choosing among forgers. For example, once a forger has been selected, it often can't take part in another round of forging for at least 30 days. In effect, high-value forger nodes holding the oldest cryptocurrency coins have an edge over other forger nodes.
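One possible reading of the forger selection described above is a weighted random draw. The sketch below is an assumption throughout, not the rule of any particular blockchain: the weight (stake multiplied by coin age), the field names and the 30-day cooldown parameter are hypothetical choices that simply mirror the paragraph.

```python
import random

def select_forger(candidates, today, rng, cooldown_days=30):
    """Pick one eligible candidate at random, weighted by stake times coin age."""
    eligible = [
        c for c in candidates
        if c["last_forged_day"] is None
        or today - c["last_forged_day"] >= cooldown_days
    ]
    if not eligible:
        return None
    weights = [c["stake"] * c["coin_age_days"] for c in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0]

candidates = [
    {"name": "whale", "stake": 1000, "coin_age_days": 100, "last_forged_day": None},
    {"name": "minnow", "stake": 10, "coin_age_days": 10, "last_forged_day": None},
    {"name": "resting", "stake": 5000, "coin_age_days": 400, "last_forged_day": 90},
]
rng = random.Random(2018)
picks = [select_forger(candidates, today=100, rng=rng)["name"] for _ in range(1000)]
print(picks.count("resting"))                        # 0: forged 10 days ago, cooling down
print(picks.count("whale") > picks.count("minnow"))  # True
```

The draw shows both effects from the text: a recently selected forger sits out regardless of its holdings, and among the eligible nodes the one with more and older coins wins far more often.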

Proof-of-Stake supporters make the good point that the cost of running a node is much lower, which encourages more participation and a higher level of decentralization. Ironically, however, Proof-of-Stake systems discourage the use of the very cryptocurrency the blockchain is designed to transact, because spending reduces the total value held by the node and lowers its likelihood of being selected as a forger.

Something to ponder is the point made by blockchain expert Andreas Antonopoulos: "Proof of Work is also a Proof of Stake, but Proof of Stake is not also a Proof of Work." He explains that Proof of Work offers a combination of both consensus algorithms: although miners in a Proof-of-Work-based network aren't selected based on the number or age of their cryptocurrency units, miner nodes effectively demonstrate their economic stake in the network by funding the energy required to participate. Therefore, he argues, the "stake" component of the Proof-of-Work scheme is the cost of electricity a node is willing to incur in an effort to successfully mine a block (see Antonopoulos speak at a Silicon Valley Bitcoin meetup on September 13, 2016: bit.ly/2MDfkA1).

Longest Chain: The blockchain network is constantly extending, branching and pruning itself. The complete view of the blockchain is called the block tree. Each miner node actively mines against the block that terminates the tree's longest chain. You might think that the longest chain is the chain with the greatest number of blocks, but it's actually defined as the sequence of blocks from the genesis block that represents the greatest amount of work. You can derive the total work by adding up the "difficulty" of each block, a measure of how unlikely it is to find a nonce for a candidate block. The network protocol maintains this value, which the Bitcoin blockchain adjusts every 2,016 blocks so that a block takes about 10 minutes of processing time to mine. The difficulty value is stored in each block so that nodes trying to identify the longest chain can compute the work.
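Both ideas in this paragraph reduce to small calculations. In this sketch the block dictionaries and function names are hypothetical, and difficulty is treated as a plain number; Bitcoin itself stores a compact target in each block and derives work from that. The retarget function follows Bitcoin's general approach of scaling difficulty by expected time over actual time, with the change limited to a factor of four per adjustment.

```python
def chain_work(chain):
    """Total work of a chain: the sum of its blocks' difficulty values."""
    return sum(block["difficulty"] for block in chain)

def best_chain(chains):
    """The 'longest' chain is the one with the most work, not the most blocks."""
    return max(chains, key=chain_work)

def retarget(old_difficulty, actual_seconds, blocks=2016, target_spacing=600):
    """Scale difficulty so the next 2,016 blocks take about 10 minutes each."""
    expected_seconds = blocks * target_spacing
    # Limit each adjustment to a factor of 4 in either direction.
    clamped = min(max(actual_seconds, expected_seconds // 4), expected_seconds * 4)
    return old_difficulty * expected_seconds / clamped

short_but_hard = [{"difficulty": 10}, {"difficulty": 50}]
long_but_easy = [{"difficulty": 10}, {"difficulty": 10}, {"difficulty": 10}]
print(best_chain([short_but_hard, long_but_easy]) is short_but_hard)  # True

# The last 2,016 blocks arrived every 5 minutes on average, so difficulty doubles.
print(retarget(100.0, actual_seconds=2016 * 300))  # 200.0
```

In the first example the two-block chain wins because its blocks represent 60 units of work against 30 for the three-block chain.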

Occasionally, it's inevitable that two nodes, A and B, will demonstrate Proof of Work by mining a new block within seconds or even milliseconds of each other. Because each node adds its new block to the end of what it sees as the longest chain before broadcasting the block to the network, a fork (branch) appears in the block tree. Depending on where these nodes are located, the bandwidth of the connected nodes on the network and other latency considerations, one portion of the network will see Block A first as the new block and add it to the end of the chain. The other portion of the network will see Block B as the new block and add it to the end of the chain. This leaves some nodes with Block A as the terminating block and others with Block B (see Figure 3).


Figure 3 (Top) A Block Tree Showing a Block Tree Fork and Two Equal-Length Chains; (Bottom) A Block Tree Showing a Block Tree Fork and a Longest Chain

When a fork occurs as shown at the top of Figure 3, two chains are on the block tree; they are of equal length, and both are valid. The problem this presents is clear when you consider that miner nodes look for the longest chain before they start mining, because they need the hash of that chain's terminating block.

If a miner that was working on Chain A successfully mines Block C, it adds Block C to the end of the chain that has Block A as its terminating block (see the lower block tree in Figure 3). As soon as it does so, it broadcasts Block C to the network, and other nodes will see that Chain A is now the longest chain. Nodes working on Chain B will see that Chain A is longer than Chain B and will stop mining their current block so they can begin to mine a new block that extends Block C on Chain A. When this happens, the network releases all of the transactions in Block B back into the transaction pool so that they can be picked up in a new round of mining.
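A minimal sketch of this fork resolution follows, assuming hypothetical block dictionaries with id, difficulty and transactions fields. The function name and the tie-breaking rule (keep the first chain when work is equal) are illustrative choices, not part of any protocol.

```python
def resolve_fork(chain_a, chain_b, transaction_pool):
    """Keep the chain with more work and return the losing branch's
    transactions to the pool (updated in place)."""
    def work(chain):
        return sum(block["difficulty"] for block in chain)

    if work(chain_a) >= work(chain_b):
        winner, loser = chain_a, chain_b
    else:
        winner, loser = chain_b, chain_a

    winner_ids = {block["id"] for block in winner}
    confirmed = {tx for block in winner for tx in block["transactions"]}

    for block in loser:
        if block["id"] in winner_ids:
            continue                      # shared history before the fork
        for tx in block["transactions"]:
            if tx not in confirmed and tx not in transaction_pool:
                transaction_pool.append(tx)

    # Anything the winning chain already contains must leave the pool.
    transaction_pool[:] = [tx for tx in transaction_pool if tx not in confirmed]
    return winner

genesis = {"id": "genesis", "difficulty": 1, "transactions": []}
block_a = {"id": "A", "difficulty": 1, "transactions": ["a1", "shared"]}
block_b = {"id": "B", "difficulty": 1, "transactions": ["shared", "b1"]}
block_c = {"id": "C", "difficulty": 1, "transactions": ["c1"]}

pool = ["p1", "c1"]
winner = resolve_fork([genesis, block_a, block_c], [genesis, block_b], pool)
print([block["id"] for block in winner])  # ['genesis', 'A', 'C']
print(pool)                               # ['p1', 'b1']
```

Only b1 goes back to the pool: the transaction named shared is already included in Block A on the winning chain, so re-mining it would produce a block that other nodes reject.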

You may be wondering what happens to the Bitcoin earned by the miner that created Block B: its transaction commissions and block rewards are never issued. On the Bitcoin network, these rewards aren't given to a miner until 100 blocks have been successfully mined on top of the block in question.

In this article, I have explored in greater detail some of the topics introduced in my previous article. Together, the two articles cover most of the fundamental concepts you need to truly grasp in order to understand how blockchains work. After reading both, you should understand blockchain's decentralized, distributed network architecture; SHA-256 hashing; the basics of PKC and ECDSA; how nodes construct transactions on hash chains and how digital signatures authorize the transfer of digital asset ownership; how transactions in the transaction pool await selection and verification before being confirmed in a block; how specialized nodes use a particular consensus algorithm (for example, "miners" using Proof of Work or "forgers" using Proof of Stake) to generate a block; and how nodes on the network add generated blocks to the longest chain. If you want to dig deeper into blockchains, I recommend the books and videos available at Safari Books Online (safaribooksonline.com) and those published by Andreas Antonopoulos (antonopoulos.com).


Jonathan Waldman is a Microsoft Certified Professional software engineer, a solutions architect with deep technical exposure to a variety of industries and a specialist in software ergonomics. Waldman is a member of the Pluralsight technical team and currently leads institutional and private-sector software development projects. He can be reached at [email protected] and followed on Twitter: @jwpulse.

Thanks to the following Microsoft technical expert for reviewing this article: James McCaffrey

