VAT tax gap prediction: a 2-step Gradient Boosting approach
Tax evasion is the illegal non-payment of taxes by individuals, corporations, and trusts. It results in a loss of state revenue that can undermine the effectiveness of government policies. One measure of tax evasion is the so-called tax gap: the difference between the income that should be reported to the tax authorities and the amount actually reported. However, economists lack a robust method for estimating the tax gap through a bottom-up approach based on fiscal audits. This is difficult because the declared tax base is available for the whole population, whereas the tax base that should have been reported is generally observed only for a small, non-random sample of audited units. This induces a selection bias that invalidates standard statistical methods. Here, we use a two-step Gradient Boosting model to correct for the selection bias without requiring strong distributional assumptions. We use our method to estimate the Italian VAT gap for individual firms, based on information gathered from administrative sources. Our algorithm estimates the potential VAT turnover of Italian individual firms for the fiscal year 2011 and suggests that the tax gap is about 30% of the potential tax base. Comparisons with other methods show that our technique offers a significant improvement in predictive performance.
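The abstract does not spell out what the two steps are. The sketch below is one plausible reading and is not necessarily the authors' specification. It uses a control-function heuristic loosely analogous to Heckman's two-step correction. Step 1 fits a boosted classifier for the probability that a firm is audited, using features observed for the whole population. Step 2 fits a boosted regressor for the audited (potential) tax base on audited firms only, with the estimated audit probability appended as an extra feature, and then predicts the potential base for every firm. The gap share is then (total potential minus total declared) divided by total potential. To stay dependency-free, gradient boosting is hand-rolled with depth-1 trees (stumps). All function names, the data layout, and the synthetic data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, r):
    """Least-squares depth-1 tree: the single threshold split that best fits residuals r."""
    n, total = len(r), sum(r)
    best_score, best = -math.inf, None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        s_left = 0.0
        for k in range(n - 1):
            s_left += r[order[k]]
            a, b = X[order[k]][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal values
            n_l, n_r = k + 1, n - k - 1
            # Maximizing this is equivalent to minimizing the split's squared error.
            score = s_left ** 2 / n_l + (total - s_left) ** 2 / n_r
            if score > best_score:
                best_score = score
                best = (j, (a + b) / 2, s_left / n_l, (total - s_left) / n_r)
    if best is None:  # all features constant: fall back to the mean residual
        return (0, math.inf, total / n, total / n)
    return best


def stump_predict(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value


def boost(X, y, loss, rounds=50, lr=0.1):
    """Gradient boosting with stumps; loss is 'squared' (regression) or 'logistic' (0/1 labels)."""
    mean_y = sum(y) / len(y)
    if loss == "logistic":
        p = min(max(mean_y, 1e-6), 1 - 1e-6)
        f0 = math.log(p / (1 - p))  # log-odds of the base rate
    else:
        f0 = mean_y
    F = [f0] * len(y)
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the loss with respect to the current raw score F.
        if loss == "logistic":
            r = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        else:
            r = [yi - fi for yi, fi in zip(y, F)]
        stump = fit_stump(X, r)
        stumps.append(stump)
        F = [fi + lr * stump_predict(stump, x) for fi, x in zip(F, X)]

    def predict(x):
        raw = f0 + lr * sum(stump_predict(s, x) for s in stumps)
        return sigmoid(raw) if loss == "logistic" else raw

    return predict


def two_step_gap(X, declared, audited, true_base, rounds=50, lr=0.1):
    """Hypothetical two-step estimate of the gap as a share of the potential base."""
    # Step 1: audit propensity P(audited | x), fitted on the whole population.
    propensity = boost(X, audited, "logistic", rounds, lr)
    Z = [list(x) + [propensity(x)] for x in X]
    # Step 2: potential base, fitted on audited units only, propensity as extra feature.
    idx = [i for i, a in enumerate(audited) if a == 1]
    potential_model = boost([Z[i] for i in idx], [true_base[i] for i in idx], "squared", rounds, lr)
    potential = [potential_model(z) for z in Z]
    # Tax gap share: (potential - declared) / potential, aggregated over all units.
    return (sum(potential) - sum(declared)) / sum(potential)


if __name__ == "__main__":
    # Synthetic data: potential base = 1.5 x declared, so the true share is 1/3.
    # Audit selection depends on a risk score, so audited units are not a random sample.
    rng = random.Random(0)
    X, declared, audited, true_base = [], [], [], []
    for _ in range(300):
        d, risk = rng.uniform(10, 100), rng.random()
        a = 1 if rng.random() < risk else 0
        X.append([d, risk])
        declared.append(d)
        audited.append(a)
        true_base.append(1.5 * d if a else None)  # observed only after an audit
    print(round(two_step_gap(X, declared, audited, true_base), 3))
```

On this synthetic data the estimate should land near the true share of 1/3. The synthetic setup is deliberately easy, because audit selection depends on a risk score that is independent of the tax base. Whether appending the estimated propensity actually removes selection bias on real audit data is an empirical question. A production version would more likely use a library implementation such as scikit-learn's gradient boosting estimators.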