Building better AI models for the railways starts with a solid data foundation


The railway industry is rapidly digitalising and adopting new technologies such as machine learning and artificial intelligence (AI). These new technologies allow for increased overall efficiency in the sector, which is instrumental in reaching sustainability goals in Europe and mitigating the existing strain on the labour force. Dr Igor Dakic, Senior Consultant in Strategic Asset Management at AFRY, a supplier of engineering, design, and advisory services, discusses the challenges facing companies that wish to utilise their data, for instance to train AI models.

“AI is very useful as it can extract valuable insight from data, especially large datasets such as those of railway companies”, says Igor Dakic. “It can help identify trends and patterns that may be difficult to identify using traditional methods, and it can also be used to detect anomalies and pattern violations, which would otherwise require heavy manual analysis or corrections. It can supplement the skill gap that we have in data analytics, and thus allow employees to focus more on the strategic aspects of data management.”

When it comes to implementing AI, AFRY is “trying to shape the path towards being able to implement it, determine what data can it be utilised for, and what the possible implications are”, shares Dakic. “There is a consensus that AI and data applications can substantially reduce costs and improve quality and processes. For each use case, we first confirm what the benefits in each data environment are in a pilot phase before we decide to implement on a wider basis,” he adds.

“More specifically, we are trying to understand the costs and benefits of implementation as compared to doing things manually. The goal is to increase efficiency and accuracy as compared to manual effort, which is on the one hand prone to human error, but on the other hand more robust. AI can also offer more substantial scalability.” Indeed, there is much potential when it comes to using AI models to get the most out of railway data. However, it all starts with having a solid data foundation, defined by Dakic as “the integration of all the necessary components for establishing the right quality of information, especially when it comes to asset management systems.” There are, however, challenges to even this first step, particularly when it comes to legacy data.

Challenges of legacy data

“The ability to implement AI really depends on the initial state of the system in place and the data available”, posits Dakic. “I think many railway organisations have vast amounts of data spread across different departments and systems, and integrating information from diverse sources can be quite technically complex and challenging. Also, the available data can be inaccurate, incomplete, inconsistent… In short, contrary to what some may say, we are an innovative industry, but that also means that for many pitfalls, we were the first to fall in. And that all can affect the ability to essentially manage assets effectively and efficiently, as well as operate efficiently. What we are also seeing in our work is that this often stems from a lack of data governance, policies, and ownership, which all hinder comprehensive data management and the maintenance of high-quality data foundations.”

These issues are most common when it comes to legacy data, according to Dakic: “It is not easy to apply AI to legacy data cobbled together from various sources because in order to apply AI we need reliable data.” Indeed, “When it comes to legacy data, it is important to understand that it may be stored in outdated or obsolete systems, may be incomplete or may not follow common conventions, or may as well be disparate in terms of different assets, location hierarchies or structuring systems”, he continues. Consequently, legacy data can often suffer from data quality issues such as inaccurate, incomplete, or inconsistent information. “Ensuring the accuracy and availability of legacy data can be time-consuming and require substantial resources,” highlights Dakic.

“Basically, we need transitioning platforms that can be very costly. It can also be very complex to transfer from the old systems to more modern technology, which allows for the implementation of more AI-based processes,” he elaborates. Otherwise, an expert needs to come in to make the data usable again, which is costly. “First, we need to understand if all the processes are in accordance with what is defined by technical regulation. Afterwards, we can define what the issues are. For example, if the problem is that the data sources are incompatible, then they can be brought together into a single database that can be utilised for other purposes,” he concludes.

A lot of legacy data is stored on outdated servers, making it difficult to access.

Building a strong data foundation

As Dakic stated before, a strong data foundation is key. What are the first steps to building one? “Some organisations have a better quality of data than others, and for them, AI can be applied, the question is to which extent,” states Dakic. Indeed, he explains, “It comes down to how the records are being stored, how the records have been maintained because railway organisations have very large sets of data and a wide range of different types of assets. All this data might be in various places, and proper maintenance must be applied for it to be usable reliably.”

In terms of best practices observed by AFRY, “an established, robust data governance framework” is key. Such a framework “defines the rules, responsibilities, processes, for management and maintenance of data quality, and standardised formats like product codes.” This serves to “enhance compatibility across different systems, which are all important in shaping the path towards AI,” emphasises Dakic. “What we often do is to establish comprehensive structure and conventions, provide clients with the means to automatically monitor and report data quality, and set up data governance processes supporting this. It is important to synchronise all the components and make it an integrated process,” states Dakic. According to him, that is the first requirement to establish a solid data quality foundation.

He also cites “change management that engages all relevant entities in discussions about data strategy integration” as instrumental. Indeed, “Many are accustomed to traditional methods and implementing new technologies can be difficult.” Another important practice for companies, according to Dakic, is “providing employees with the support and training needed to acquire the necessary knowledge to be able to cope with advancements in technology, either through hiring people with the appropriate background or training established technicians. In the end, what we need is a data quality culture and mindset.”

To learn more on this topic, attend Igor Dakic’s presentation entitled “Data Quality 360°: Guiding an Organisation to Data-Driven Success” on Tuesday 6 November 2023, at the Intelligent Rail Summit 2023 in Warsaw, Poland.

Author: Emma Dailey
