Elastic
To many people, data are just numbers listed in neat tables. But our audits also use a lot of unstructured data, such as texts and numbers in interview reports, policy documents, Excel sheets and so on. And our auditors need to be able to search these documents quickly, both before and during their audits.
That’s why the Court of Audit has developed its own in-house tool, Elastic, which enables auditors to search various public and non-public sources quickly and accurately. Data analysts can then systematically analyse large numbers of these processed documents.
Sources
A lot of the information used in our audits comes from public sources, including the parliamentary papers (in Dutch) on the House of Representatives’ website, as well as our own reports and letters. Consulting all these sources individually is very time-consuming. Thanks to Elastic, auditors can now search across these sources directly. The public sources are updated daily to ensure that the information is up to date.
As well as publicly available documents, the Court of Audit also uses non-public documents in its audits, such as confidential documents that we receive from ministerial departments and implementing agencies, and our own interview reports. Elastic can also search these documents, while the information they contain continues to be stored safely in-house. A detailed set of authorisations ensures that this information is only accessible to those entitled to view it.
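How such an authorisation check might work in principle is sketched below. This is an illustration only, not the Court of Audit's implementation: the `Document` class, the `allowed_groups` field, the group names and the convention that an empty group set means "public" are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    """A searchable document plus the groups allowed to read it (hypothetical model)."""
    doc_id: str
    text: str
    # An empty set is used here to mean "public"; this convention is an assumption.
    allowed_groups: frozenset = field(default_factory=frozenset)


def visible_to(doc: Document, user_groups: set) -> bool:
    """Return True if the document is public or shares a group with the user."""
    return not doc.allowed_groups or bool(doc.allowed_groups & user_groups)


def search(docs, query: str, user_groups: set) -> list:
    """Naive substring search that only returns documents the user may see."""
    needle = query.lower()
    return [
        d.doc_id
        for d in docs
        if visible_to(d, user_groups) and needle in d.text.lower()
    ]


# Hypothetical sample data: one public report and one restricted interview report.
DOCS = [
    Document("report-1", "Published report on education spending"),
    Document("interview-7", "Interview notes on education policy",
             frozenset({"audit-team-a"})),
]
```

In this sketch the permission check is applied before the text match, so a restricted document never appears in the results of a user outside its groups, not even as a bare hit.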
Text analysis in audit
The government is increasingly digitalising its activities, with the result that more and more information is becoming available digitally. Some of this information is in the form of unstructured data in, for example, PDF, Word or PowerPoint files. As part of its preparatory processing, Elastic converts these documents into plain text, enriched with metadata such as the file creation date. Many photos and scans that would not otherwise be searchable can also be converted into plain text in this way.
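The result of this kind of preparatory processing can be pictured as one record per file, combining the extracted text with file metadata. The sketch below assumes the text has already been extracted by some other step (the extraction of PDF, Word or scanned files is not shown). The function name `build_record` and the record fields are hypothetical, and the file's last-modified time stands in for the date metadata because a true creation date is not available portably.

```python
from datetime import datetime, timezone
from pathlib import Path


def build_record(path: Path, extracted_text: str) -> dict:
    """Combine already-extracted plain text with basic file metadata."""
    stat = path.stat()
    return {
        "file_name": path.name,
        "file_type": path.suffix.lower().lstrip("."),
        "size_bytes": stat.st_size,
        # st_mtime is the last-modified time; it stands in for the date
        # metadata here because creation time is platform-dependent.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        # Collapse runs of whitespace left over from text extraction.
        "text": " ".join(extracted_text.split()),
    }
```

A record in this shape can be stored and searched in the same way regardless of whether the original was a PDF, a slide deck or a scan.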
Text analysis then turns this plain text into new insights for our audits. Auditors can, for example, find information on frequently recurring topics and see relationships between documents.
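Two very simple forms of text analysis illustrate the idea: counting the most frequent terms across documents, and measuring how many terms two documents share. The sketch below uses only the Python standard library. The stop-word list, the function names and the choice of Jaccard similarity are assumptions made for the example, not a description of the methods used in Elastic.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "on", "a", "to", "for"}


def tokens(text: str) -> list:
    """Lowercase the text and keep ASCII words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def top_terms(texts, n: int = 3) -> list:
    """Return the n most frequent terms across all texts."""
    counts = Counter(word for text in texts for word in tokens(text))
    return [term for term, _ in counts.most_common(n)]


def jaccard(text_a: str, text_b: str) -> float:
    """Share of distinct terms that two texts have in common (0.0 to 1.0)."""
    a, b = set(tokens(text_a)), set(tokens(text_b))
    return len(a & b) / len(a | b) if a | b else 0.0
```

For example, `top_terms(["Housing budget and housing policy", "Budget for housing"], 1)` picks out `housing` as the most frequent term, and a high `jaccard` score between two documents suggests they cover related subject matter.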
Technology
The application uses Elasticsearch as the basis for its search algorithm. The other components used are also based on open-source software. The source code is primarily written in Python and TypeScript, and the application runs in a Kubernetes environment. This means we can benefit from recent developments in search algorithms, while the Court of Audit can continue managing its information in-house.
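To give an impression of what a full-text search in Elasticsearch looks like, the sketch below builds a request body in Elasticsearch's JSON query DSL. It is a minimal illustration under stated assumptions: the field names `title` and `text` and the function `build_search_body` are hypothetical, and the actual index layout and queries used by the application are not described here.

```python
import json


def build_search_body(query: str, size: int = 10) -> dict:
    """Build a full-text search request body in Elasticsearch's query DSL."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # Field names are hypothetical; "^2" weights title matches higher.
                "fields": ["title^2", "text"],
            }
        },
        # Ask Elasticsearch to return matching snippets from the body text.
        "highlight": {"fields": {"text": {}}},
    }


# Print the body as JSON, the form in which it would be sent to a search endpoint.
print(json.dumps(build_search_body("energy transition"), indent=2))
```

Building the body as a plain dictionary keeps the example independent of any particular client library or cluster.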