AI in Auditing: Big-Data Opportunities and Challenges

by Melissa A. Dardani, CPA, CFE, MAcc, MD Advisory Services – April 23, 2024

The world is generating an astonishing amount of data. As the total volume of data increases, so do its predictive capabilities. The discussion of artificial intelligence (AI) in accounting should therefore focus on using this predictive data to enhance reporting, performance and risk management.

Machine learning algorithms can move the accounting profession beyond its role of double-entry recordkeeping and compliance toward a critical role in strategic business decision-making. This evolution requires accountants to have at least some technical understanding of big data and AI.

Fears of the Unknown

Perhaps unsurprisingly, the accounting profession has shown hesitancy towards adopting AI due to its “black box” nature, particularly in unsupervised learning algorithms. These algorithms, capable of complex decision-making, often lack transparency in their logic and reasoning. This lack of clarity can be at odds with the accounting profession’s focus on accuracy, traceability and verifiability.

Supervised learning methodologies rely on pre-labeled data to train predictive models, using metrics like accuracy, precision and recall to refine their predictions. In contrast, unsupervised learning algorithms analyze unlabeled data, where no desired classifications are known in advance. They autonomously identify patterns and structures by finding similarities and differences among dataset attributes. Unsupervised learning is useful in exploratory data analysis, anomaly detection and pattern discovery, where specific labels are absent or unknown, or where the goal is to uncover hidden trends.
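To make the supervised metrics named above concrete, here is a minimal sketch in Python. The labels are hypothetical (1 marks a transaction that should be flagged for review, 0 marks a normal one), and the function name is chosen only for illustration.

```python
def classification_metrics(actual, predicted, positive=1):
    """Return (accuracy, precision, recall) for one 'positive' class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)                # share of all calls that were right
    precision = tp / (tp + fp) if tp + fp else 0.0   # share of flags that were correct
    recall = tp / (tp + fn) if tp + fn else 0.0      # share of true cases that were flagged
    return accuracy, precision, recall


# Hypothetical data: known labels versus a model's predictions
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 1, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(actual, predicted)
print(accuracy, precision, recall)
```

In an audit setting, recall is often the metric to watch, since a missed problem transaction (a false negative) usually costs more than an extra item sent for review.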

Demystifying AI Methodologies in Accounting

Effective implementation of AI depends on several critical factors, including the choice of methodology. The selection and tuning of an appropriate AI methodology — be it a supervised or unsupervised method — is a critical step that shapes the model’s ability to return relevant and accurate insights. Method selection should include an evaluation of the objectives of the financial task, the nature and quality of the available data, the complexity of the problem at hand, and the desired outcome’s interpretability and transparency. Consider the following AI methodologies:


Classification algorithms are designed to categorize data into predefined classes. These algorithms, typically supervised in nature, learn from labeled datasets where each instance is tagged with a correct output label. By analyzing these datasets, the algorithm discerns patterns and rules that can be applied to new, unseen data to classify it accurately. The success of classification algorithms is often evaluated using metrics that assess how well the algorithm learned to classify the data, such as accuracy, precision, recall and the F1 score. One common example of a classification algorithm is Random Forest, which functions by constructing numerous decision trees, each trained on different portions of the data. When it comes to making a prediction, each tree in this ‘forest’ casts a ‘vote’ for a particular class. The Random Forest then assigns the class that receives the most votes from these trees. This approach effectively combines the insights of multiple trees to reach a more accurate and reliable decision. Common hyperparameters in these algorithms include the number of decision trees in a Random Forest, the depth of each tree and the number of features considered for splitting at each node.
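The voting step described above can be sketched in a few lines of Python. This illustrates only the majority vote, not a full Random Forest: the three "trees" are hand-written, hypothetical rules standing in for trained decision trees, and the transaction fields are invented for the example.

```python
from collections import Counter


# Hypothetical stand-ins for trained decision trees. A real Random Forest
# would learn these rules from different samples of labeled data.
def tree_amount(txn):
    return "review" if txn["amount"] > 10_000 else "normal"


def tree_weekend(txn):
    return "review" if txn["posted_on_weekend"] else "normal"


def tree_manual(txn):
    return "review" if txn["manual_entry"] and txn["amount"] > 5_000 else "normal"


FOREST = [tree_amount, tree_weekend, tree_manual]


def forest_predict(txn, trees=FOREST):
    """Collect one vote per tree and return the most-voted class."""
    votes = Counter(tree(txn) for tree in trees)
    winner = votes.most_common(1)[0][0]
    return winner, dict(votes)


txn = {"amount": 12_500, "posted_on_weekend": False, "manual_entry": True}
label, votes = forest_predict(txn)
print(label, votes)
```

Using an odd number of trees with two classes avoids tied votes. In practice, the number of trees is one of the hyperparameters mentioned above.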


Clustering algorithms, an unsupervised methodology, are designed to group data points into clusters based on their similarity without relying on predefined labels. This method is useful for uncovering underlying structures in datasets where classification categories are not previously known. A common example of a clustering algorithm is K-Means, which segments data into distinct clusters by assigning each data point to the nearest cluster centroid and iteratively optimizing these centroids’ positions. The success of clustering algorithms can be measured by the sum of squared errors (SSE), which quantifies the compactness of the clusters by calculating the squared distance between each data point and its assigned cluster centroid and then summing these distances across all data points. The goal is to minimize the SSE, indicating that data points are closely grouped around their centroids. Key hyperparameters in clustering algorithms like K-Means include the number of clusters (K) and the initial placement of the centroids.
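As a rough illustration of the assign-and-update loop and the SSE measure described above, the following Python sketch runs K-Means on six made-up two-dimensional points. The data, the starting centroids and the function names are all assumptions for the example. A production analysis would normally use an established library rather than this hand-rolled version.

```python
import math


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    moved = []
    for j, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            moved.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            moved.append(old)  # leave an empty cluster's centroid where it was
    return moved


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def k_means(points, initial_centroids, max_iter=100):
    centroids = list(initial_centroids)
    for _ in range(max_iter):
        moved = update(points, assign(points, centroids), centroids)
        if moved == centroids:  # centroids stopped moving: converged
            break
        centroids = moved
    labels = assign(points, centroids)
    return labels, centroids, sse(points, labels, centroids)


# Hypothetical data: two visibly separate groups; K = 2 with chosen starting centroids
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids, total_sse = k_means(points, [(1.0, 1.0), (9.0, 8.0)])
print(labels, centroids, round(total_sse, 3))
```

Because the result depends on K and on where the centroids start, it is common to run the algorithm several times with different settings and compare the resulting SSE values.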


Association algorithms are used to uncover relationships or associations between variables in large datasets, typically through unsupervised methods. These algorithms are adept at discovering rules that highlight the likelihood of relationships between data items. This approach is especially powerful in scenarios where the interconnectivity or co-occurrence of items needs to be understood, without any prior assumptions. A prominent example of an association algorithm is the Apriori algorithm, which is widely used for market-basket analysis. This algorithm works by identifying frequent item sets, or groups of items that often occur together in a dataset, and then deriving association rules that predict the likelihood of an item’s presence based on the presence of other items. For instance, in transactional data, Apriori can identify that if customers buy item A, they are likely to buy item B as well. The effectiveness of association algorithms is often evaluated based on the support, confidence and lift of the derived rules. Support measures how frequently the items appear together, confidence measures how often the rule holds when its first item is present, and lift measures how much more likely the second item is when the first is present than it is overall.
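The three rule measures can be computed directly from transaction counts. The Python sketch below scores a single rule, "A implies B," on six hypothetical baskets. It does not implement the full Apriori search for frequent item sets, and the function name is an assumption made for the example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)           # baskets containing A
    count_c = sum(1 for t in transactions if c <= t)           # baskets containing B
    count_both = sum(1 for t in transactions if (a | c) <= t)  # baskets containing both

    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Hypothetical baskets, each a set of purchased items
transactions = [
    {"A", "B"},
    {"A", "B"},
    {"A", "B", "C"},
    {"C"},
    {"B", "C"},
    {"C"},
]

support, confidence, lift = rule_metrics(transactions, {"A"}, {"B"})
print(support, confidence, round(lift, 3))
```

A lift above 1 suggests the two items occur together more often than chance alone would predict, while a lift near 1 suggests the rule adds little information.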

Outlier Detection

Outlier detection is a critical process in data analysis that involves identifying data points that differ significantly from the majority of the data. These outliers can be indicative of errors, anomalies or fraud. Various techniques are used for outlier detection, including statistical tests, proximity-based methods and deviation-based approaches. The effectiveness of these methods is often evaluated based on their ability to accurately identify true outliers while minimizing false positives.
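As one example of a statistical approach, the Python sketch below flags amounts using a modified z-score built on the median and the median absolute deviation (MAD). The choice of this particular test, the conventional cutoff of 3.5 and the invoice amounts are all assumptions made for illustration. The article does not prescribe a specific method.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the score is undefined
    flagged = []
    for i, v in enumerate(values):
        score = 0.6745 * (v - median) / mad  # 0.6745 rescales MAD to match a normal standard deviation
        if abs(score) > threshold:
            flagged.append((i, v))
    return flagged


# Hypothetical invoice amounts with one unusually large entry
amounts = [120, 135, 128, 131, 125, 122, 138, 127, 133, 950]
flagged = flag_outliers(amounts)
print(flagged)
```

The median-based score is used here because a single extreme value inflates the ordinary mean and standard deviation, which can hide the very outlier being sought in a small sample. Lowering the threshold catches more true outliers at the cost of more false positives.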

Outlier detection can intersect with other algorithm methodologies like classification, clustering and even neural networks. For instance, a classification algorithm might be trained to distinguish between regular transactions and outliers, effectively categorizing data points as normal or anomalous. Similarly, clustering algorithms can group data, where points not fitting into any cluster may be considered outliers.

Integrating AI and predictive data in accounting will propel the profession towards a new era of business reporting. The benefits include enhanced risk management functions, strategic decision-making, and the ability to drive firm value, auditing and financial reporting. The future of accounting lies in embracing these advancements, not as replacements for human expertise, but as powerful tools that will fundamentally change the role of the accountant.

Melissa A. Dardani, CPA, CFE, MAcc, is the founder of MD Advisory Services. She is a member of the NJCPA Emerging Technologies Interest Group and several other interest groups. She can be reached at

This article appeared in the Spring 2024 issue of New Jersey CPA magazine.