Journal of Tax Research

Original language: Persian (fa)
Title (translated from Persian): Developing a Model to Identify Taxpayers Who Under-Report Value Added Tax Using Data Mining Approaches
Title (English): Developing a Model to Identify the Unreal Returns on Value Added Tax Using Data Mining Approach
Subject: Economic
Article type: Research

Abstract (translated from Persian):
The failure of Value Added Tax (VAT) taxpayers to file accurate tax returns is one of the problems facing the country's tax administration. The large number of returns, limited resources, and the fact that examining every return is not cost-effective make it necessary to develop an intelligent method for identifying taxpayers at high risk of under-reporting. In this paper, based on the opinions of tax auditors, data on eighteen variables that potentially affect the identification of VAT under-reporting were collected in one of the districts of Tehran, together with the corresponding audit results. Filter methods and a genetic algorithm identified ten and seven influential variables, respectively. Two base classification methods, decision tree and k-nearest neighbors, were developed on each of the two sets of influential variables (filter methods and genetic algorithm) to identify under-reporting, and two ensemble methods, bagging and boosting, were used to balance the data. A comparison of prediction accuracy across twelve prediction models (decision tree and k-nearest neighbors, each with the two groups of independent variables and in three settings: plain, bagging, and boosting) shows that the bagging and boosting ensemble methods have no effect on prediction, and that a simple decision tree using the ten variables selected by filter methods achieves the highest prediction accuracy, 82.14%, in identifying under-reporting taxpayers. The extraction of suitable rules for identifying under-reporting taxpayers, based on the ten variables that influence their prediction, is another result of this paper.

Abstract (English):
Tax evasion is a constant concern for tax administrations, especially in developing countries. Because of the large number of Value Added Tax (VAT) returns, resource constraints, and the prohibitive cost of investigating every return, a mechanism is needed to identify dishonest taxpayers on the basis of historical data held in large databases in this area.
In this research, using a survey approach, eighteen variables that potentially affect the identification of unreal returns are identified, and their impact on the detection of tax fraud is investigated using data drawn from VAT returns and performance records. After the data are preprocessed with filtering techniques, ten influential factors for predicting the status of tax records are selected. A genetic algorithm reduces the potential independent variables to seven influential variables. A variable describing the status of each tax record in terms of fraud is defined, and a prediction model based on the decision tree approach, a data mining method, is developed to predict that status. Implementations of the decision tree and of the Bagging and Boosting ensemble methods on the observations indicate that, using the ten predictive factors, these models predict the status of the records with an accuracy of 82.14 percent. A set of rules for screening records is also identified, which can flag potential fraud before a record is reviewed by the tax auditors.

Keywords: Data Mining, Taxpayers, Tax Understatement, VAT
Pages: 103-139
URL: http://taxjournal.ir/browse.php?a_code=A-10-425-5&slc_lang=fa&sid=1

Authors:
Vahid Baradaran (corresponding author), V_baradaran@iau-tnb.ac.ir, author ID 10031947532846004117, Islamic Azad University, Tehran Branch
Shima Mohammadhasani, sh.inta@chmail.ir, author ID 10031947532846004118, Iranian National Tax Administration
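
The twelve-model comparison mentioned in the abstract follows from crossing two base classifiers, two feature-selection results, and three training settings. The short sketch below only enumerates that experimental grid; it does not train any model, and every identifier in it (`base_classifiers`, `feature_sets`, `ensemble_modes`, `model_grid`) is an illustrative assumption rather than a name used by the authors.

```python
from itertools import product

# Labels follow the abstract; the identifiers themselves are hypothetical.
base_classifiers = ["decision_tree", "k_nearest_neighbors"]

# Number of variables kept by each feature-selection approach, as reported
# in the abstract (ten for filter methods, seven for the genetic algorithm).
feature_sets = {"filter": 10, "genetic_algorithm": 7}

# Each base classifier is evaluated on its own and inside two ensembles.
ensemble_modes = ["plain", "bagging", "boosting"]

# Cross the three dimensions to list every model configuration.
model_grid = [
    {
        "classifier": clf,
        "feature_selection": fs,
        "n_features": feature_sets[fs],
        "mode": mode,
    }
    for clf, fs, mode in product(base_classifiers, feature_sets, ensemble_modes)
]

for config in model_grid:
    print(config)

print("number of models:", len(model_grid))
```

Under these assumptions the grid contains 2 x 2 x 3 = 12 configurations; the one the abstract reports as most accurate corresponds to the plain decision tree on the ten filter-selected variables.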