Use of Data Clustering and its Simple Implementation – Data Analytics in Auditing

DOI: 10.53122/ISSN.0452-5027/2026.1.08

„Kontrola Państwowa” 2/2026


Wiesław Karliński

Data clustering is the process of grouping a set of data objects into multiple groups (clusters) so that objects within a cluster are very similar to one another but differ from objects in other clusters. Differences and similarities are identified based on the values of the variables describing the objects, and distance measures are frequently used for this purpose. The method belongs to the category of unsupervised machine-learning methods and, in the context of audit, to audit data analytics (ADA). Data clustering has found practical application in various decision-making areas (e.g. marketing, banking, insurance, social research, medicine, biology), while its application in auditing is still relatively limited. This is mostly because the audit community perceives it as overly complex in terms of both mathematics and tools. The author discusses the successive stages of data clustering, presents selected measures and methods with a view to their application in auditing, and proposes a simple implementation of selected methods that allows the analysis to be automated. He also draws attention to the broad potential that data clustering may have at different stages of an audit, beyond its typical use for detecting anomalies in financial audits. Such analysis may be particularly useful for Supreme Audit Institutions because of the wide thematic scope and coverage of the audits they conduct. This also applies to audits carried out by NIK (Najwyższa Izba Kontroli, the Supreme Audit Office of Poland), especially performance audits. The issue is presented from the combined perspective of an auditor and an engineer rather than from a purely academic one, and the proposed implementation is based on tools that are widely known to the community of auditors.
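To make the idea of distance-based grouping concrete, the following is a minimal sketch of k-means clustering with Euclidean distance in plain Python. It is not the implementation proposed in the article, which is not shown on this page; the choice of k-means, the function names `euclidean` and `k_means`, the naive initialisation and the sample data are all illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k, max_iter=100):
    """Basic k-means: returns (labels, centroids).

    Assumption for this sketch: the first k points serve as the initial
    centroids. Real analyses usually use a more careful initialisation.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Made-up two-variable observations (hypothetical, already on a common scale).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.9, 1.1), (7.8, 8.1)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

In practice the variables would normally be standardised before distances are computed, since otherwise the variable with the largest numeric range dominates the result.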

Keywords: data clustering, use of data clustering in auditing, data segmentation methods in auditing, anomaly identification, identification of risk areas, audit data analytics

Article information

Published by:
Najwyższa Izba Kontroli
Created:
6 May 2026, 13:04
Published:
6 May 2026, 13:04
Last modified:
11 May 2026, 13:17
