Efficient MDP Analysis for Selfish-Mining in Blockchains

by Roi Bar Zur, et al.

A proof-of-work (PoW) blockchain protocol distributes rewards to its participants, called miners, according to their share of the total computational power. Sufficiently large miners can perform selfish mining, that is, deviate from the protocol to gain more than their fair share. Such systems are thus secure if all miners are smaller than a threshold size, so that their best response is to follow the protocol. To find the threshold, one has to identify the optimal strategy for miners of different sizes, i.e., solve a Markov Decision Process (MDP). However, because of the PoW difficulty adjustment mechanism, the miners' utility is a non-linear ratio function. We therefore call this an Average Reward Ratio (ARR) MDP. Sapirshtein et al. were the first to solve ARR MDPs, by solving a series of standard MDPs that converge to the ARR MDP solution. In this work, we present a novel technique for solving an ARR MDP by solving a single standard MDP. The crux of our approach is to augment the MDP such that it terminates randomly, within an expected number of rounds. We call this Probabilistic Termination Optimization (PTO), and the technique applies to any MDP whose utility is a ratio function. We bound the approximation error of PTO: it is inversely proportional to the expected number of rounds before termination, a parameter that we control. Empirically, PTO's complexity is an order of magnitude lower than the state of the art. PTO can be easily applied to different blockchains. We use it to tighten the bound on the threshold for selfish mining in Ethereum.
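To make the idea of probabilistic termination concrete, here is a minimal Python sketch on a toy problem. Everything in it is an assumption for illustration: the two-state MDP, the names `P`, `R`, `D`, `solve_pto` and `exact_ratio`, and in particular the termination rule, which lets the process survive a step with probability `(1 - 1/H) ** d`, where `d` is that step's denominator reward and `H` is the chosen expected horizon. The abstract does not spell out the construction, so this should be read as one plausible realization rather than the authors' algorithm. The sketch solves the single terminating MDP by value iteration and compares the result with the best ratio found by brute force over stationary policies.

```python
from itertools import product

# Hypothetical two-state MDP (not taken from the paper).
# P[s][a] is a list of (next_state, probability) pairs.
# R[s][a] is the numerator reward, D[s][a] the denominator reward
# (for mining, think of D as the step's contribution to difficulty).
P = {
    0: {"wait": [(0, 0.5), (1, 0.5)], "go": [(1, 1.0)]},
    1: {"wait": [(1, 0.5), (0, 0.5)], "go": [(0, 1.0)]},
}
R = {0: {"wait": 0.0, "go": 1.0}, 1: {"wait": 3.0, "go": 1.0}}
D = {0: {"wait": 1.0, "go": 2.0}, 1: {"wait": 4.0, "go": 1.0}}


def solve_pto(H, tol=1e-9):
    """Value iteration on the randomly terminating MDP.

    Assumed rule: after a step with denominator reward d, the process
    continues with probability (1 - 1/H) ** d and terminates otherwise.
    Returns the value function and a greedy policy.
    """
    gamma = 1.0 - 1.0 / H
    V = {s: 0.0 for s in P}

    def q(s, a):
        survive = gamma ** D[s][a]
        return R[s][a] + survive * sum(p * V[t] for t, p in P[s][a])

    while True:
        new_V = {s: max(q(s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V.update(new_V)
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q(s, a)) for s in P}
    return V, policy


def exact_ratio(policy):
    """Long-run numerator/denominator ratio of a stationary policy,
    using the closed-form stationary distribution of a two-state chain."""
    p01 = dict(P[0][policy[0]]).get(1, 0.0)
    p10 = dict(P[1][policy[1]]).get(0, 0.0)
    pi0 = p10 / (p01 + p10)
    pi = {0: pi0, 1: 1.0 - pi0}
    num = sum(pi[s] * R[s][policy[s]] for s in P)
    den = sum(pi[s] * D[s][policy[s]] for s in P)
    return num / den


# Reference answer: enumerate all deterministic stationary policies.
best_exact = max(
    exact_ratio(dict(zip(P, acts))) for acts in product(*(P[s] for s in P))
)

H = 1000  # expected horizon, the parameter that controls the error
V, pto_policy = solve_pto(H)
pto_ratio = V[0] / H  # total numerator reward per unit of horizon

print("brute-force optimal ratio:", best_exact)
print("PTO estimate:", pto_ratio, "with policy", pto_policy)
```

In this toy instance the PTO estimate and the brute-force ratio should agree closely, and raising `H` should shrink the gap at the cost of slower convergence of value iteration, which mirrors the trade-off the abstract describes.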

