The Emergence of Large Language Models in Financial Report Auditing: Opportunities, Benchmarks, Risks, and The Road Ahead

Authors

  • Kadek Nita Sumiari, Politeknik Negeri Bali, Bali, Indonesia
  • I Ketut Parnata, Politeknik Negeri Bali, Bali, Indonesia
  • I Gusti Ayu Astri Pramitari, Politeknik Negeri Bali, Bali, Indonesia
  • I Made Agus Putrayasa, Politeknik Negeri Bali, Bali, Indonesia

DOI:

https://doi.org/10.38035/gijea.v4i2.901

Keywords:

Large Language Models, Financial Auditing, Cognitive Augmentation, AI Explainability, Audit Risk, Regulatory Compliance

Abstract

The intersection of Large Language Models (LLMs) and financial report auditing has rapidly become a substantive area of peer-reviewed academic inquiry and industry experimentation. Grounded in Agency Theory, Audit Risk Theory, and Socio-Technical Systems Theory, this review synthesizes 20 peer-reviewed sources (2023–2026) across five thematic streams: automated auditing pipelines, regulatory compliance verification, fraud detection, LLM benchmarking, and practitioner perceptions. The central conceptual contribution of this review is to position LLMs not as autonomous auditing agents but as probabilistic cognitive augmentation tools: systems that extend human auditors' analytical reach while operating under mandatory human accountability. We find that current LLMs demonstrate meaningful capability in error detection, compliance matching, and fraud screening, but consistently fall short in domain-specific accounting reasoning, explainability, and regulatory standard citation. Persistent hallucination, the absence of auditable reasoning chains, and the structural equity gap between large and small audit firms are the primary barriers to professional-grade LLM deployment. Future research priorities include domain-adapted models, PRISMA-calibrated benchmarking, and harmonized international AI governance for auditing.

Published

2026-06-25

How to Cite

Sumiari, K. N., Parnata, I. K., Pramitari, I. G. A. A., & Putrayasa, I. M. A. (2026). The Emergence of Large Language Models in Financial Report Auditing: Opportunities, Benchmarks, Risks, and The Road Ahead. Greenation International Journal of Economics and Accounting, 4(2), 364–373. https://doi.org/10.38035/gijea.v4i2.901