Accelerating insight: analytics, meet data management
Every organisation wants to be data-driven these days. But what does "data-driven" really mean? Translated, it means organisations want increased visibility of their inner workings, a better understanding of the dynamics of the market in which they operate, and insight into their customers' demographic profiles and behaviours and into the performance of their products. And that isn't an exhaustive list.
With this heightened level of visibility and understanding, also known as insight, organisations can answer questions and make better decisions with no more guessing. Only once every staff member has access to these insights, and actively and instinctively leverages them to plan and improve their performance in their role, can an organisation consider itself truly data-driven.
To achieve this state of commercial enlightenment, organisations hire teams of data scientists and analysts and put them to work. The new teams analyse data, lots of it. Some do answer the questions asked and provide insight to support a strategic decision; however, the stakeholders who originally asked the questions are still dissatisfied.
This is a common situation. Indeed, according to Forbes, most data science teams do not deliver results that can be measured in terms of ROI by executives. Referencing Gartner’s Hype Cycle, Forbes’ research suggests data science has just passed the “Peak of Inflated Expectations” leading to a coming “Trough of Disillusionment”.
Why does data science fall short?
So why is this? The largest contributing factor is the speed at which a question can be answered. To be truly valuable to a business, insights need to be delivered in a timely fashion. Most questions that a business asks are both market and time sensitive; the answers are needed immediately. A business person identifies a possible opportunity, forms a question to validate the opportunity and poses the question to their elite team of data scientists, at which point they're told that they'll have an answer in a few weeks.
However, the business person needs the answer that day, becomes frustrated and decides to act on intuition and gut feel, which introduces risk and moves the organisation away from being data-driven. The sense of frustration is also heightened because business people are much more data literate these days. They read the success stories published on professional networks, they know what could be possible, and their expectations are set high but not too high. In principle, what they want should be eminently achievable.
It’s because of situations like this that organisations are losing faith in the ability of data scientists to deliver meaningful insights at the speed that the business requires. The bubble has burst, the honeymoon period is over, and the value of maintaining expensive data science and analytics teams is in question. Data science needs to start delivering and delivering quickly.
The root of frustration
So why is it that most analytical teams lack speed? Let's explore that question further. Data analytics is a field of two halves: data science and business intelligence. The former predicts the future whilst the latter focuses on understanding the past. In order to understand the inefficiencies, we'll look at a common process scenario for each.
Let’s start with data science as we’ve already seen a common process from the perspective of a frustrated business person. When we looked at that example, I mentioned that the business person will likely be told that their question will take a few weeks to answer. It’s worth spending a bit of time here understanding why that is by exploring how a data science team typically approaches a data science problem.
The process is kicked off when a business person puts a question to the team. The team then forms a theory around the question based on previous lessons learnt and informed assumptions. It’s this theory that the subsequent steps will aim to prove or disprove.
The next step is to pass the theory through a data science process such as the OSEMN framework (obtaining, scrubbing, exploring, modelling and interpreting data). It’s safe to say that most business questions within a single organisation or sector will be exploring related dimensions. It’s likely then that each question will be answered by interrogating some common data sets, particularly around core business entities like customers and products. The common data sets will be obtained from the same sources and transformed, cleansed and shaped in the same way when scrubbed.
Despite this, it is common practice for data science teams to complete these steps from scratch for every project, for every question asked. That's a problem from a duplication-of-effort point of view, but a bigger one when you understand the effort involved. The data preparation steps, obtaining and scrubbing the data, account for around 80% of the effort a data scientist invests in any given project and can take days or weeks to complete. The remaining 20% of their time is spent exploring and interpreting the data, applying the science itself and eliciting insight. Here we have our first root of business frustration: by the time the data scientist has prepared the data for analysis, the market has moved on, the opportunity has been missed and competitive advantage has been lost.
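To make the obtain-and-scrub duplication concrete, here is a minimal sketch of those first two OSEMN steps as reusable functions. All names, columns and the CSV format are illustrative assumptions, not a prescribed implementation; the point is that cleansing logic written once like this could be shared across projects rather than rebuilt for each question.

```python
import pandas as pd

def obtain_customers(path: str) -> pd.DataFrame:
    """Obtain: load the raw customer extract (CSV format assumed for illustration)."""
    return pd.read_csv(path)

def scrub_customers(raw: pd.DataFrame) -> pd.DataFrame:
    """Scrub: the cleansing every project repeats - dedupe, standardise, parse, drop gaps."""
    cleaned = raw.drop_duplicates(subset=["customer_id"]).copy()
    # Standardise free-text country codes.
    cleaned["country"] = cleaned["country"].str.strip().str.upper()
    # Coerce unparseable dates to NaT rather than failing the whole run.
    cleaned["signup_date"] = pd.to_datetime(cleaned["signup_date"], errors="coerce")
    # Records without an identifier cannot be joined to anything downstream.
    return cleaned.dropna(subset=["customer_id"])
```

Packaged this way, the 80% of effort is paid once per data set rather than once per question.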
The scientific process is followed for good reason – to ensure accurate, repeatable and unbiased results from which to draw the best possible conclusion. The preparation steps can’t be skipped, or we risk making misinformed decisions, perhaps the wrong decision. Therefore we need to consider a different approach. Let’s now examine how business intelligence teams work.
The business intelligence process
Traditionally, business intelligence teams, being made up of engineers as opposed to scientists, are a little better when it comes to removing duplication in their processes. A BI team will take time to build automated, production-ready, data pipelines with reusable components. It’s these practices around reuse and good data engineering that their data scientist cousins can benefit greatly from. However, BI teams are not without their own inefficiencies.
Like a data science team, a BI team exists to answer business questions, but it tackles these questions from a different angle. A BI team typically provides the tools and technology for the business person to answer the question themselves, known as self-service BI.
The process starts with sourcing data directly from the source transactional systems. The BI team will build complicated ETL/ELT pipelines that are expensive to build, maintain, debug and fix. As data passes through the pipeline it's cleansed, standardised and formatted. Typically, a complex and expensive pipeline is built for each source system that's integrated and, despite the source systems often containing similar data sets that require similar treatment, little is shared between them.
So, at a pipeline level duplication of effort exists like it did for data scientists. The pipelines deliver data into an enterprise data warehouse and sometimes from the warehouse to departmental data marts – small data stores that contain only the data required to answer questions that a specific department might ask. The business user interacts with the data in the warehouse or mart via one of many data visualisation tools available on the market today.
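The reuse the BI discipline aspires to can be sketched as composable cleansing steps shared across per-source pipelines. The step and system names here are hypothetical; the design point is that each source system gets its own pipeline, but the transformation logic is written once.

```python
from functools import reduce
import pandas as pd

# Reusable cleansing steps - written once, shared by every source pipeline.
def trim_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Strip stray whitespace from every text column."""
    out = df.copy()
    text_cols = out.select_dtypes(include="object").columns
    out[text_cols] = out[text_cols].apply(lambda s: s.str.strip())
    return out

def standardise_country(df: pd.DataFrame) -> pd.DataFrame:
    """Force country codes to a single canonical casing."""
    out = df.copy()
    out["country"] = out["country"].str.upper()
    return out

def pipeline(*steps):
    """Compose cleansing steps into a single callable pipeline."""
    return lambda df: reduce(lambda acc, step: step(acc), steps, df)

# Hypothetical source systems: separate pipelines, shared steps.
crm_pipeline = pipeline(trim_strings, standardise_country)
billing_pipeline = pipeline(trim_strings, standardise_country)
```

Because the steps are plain functions, a new source system costs one pipeline definition, not a rewrite of the cleansing logic.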
Once a BI solution has been built and automated there is little further human involvement. This means that any new data quality issues introduced upstream in the transactional systems that aren’t already catered for in the pipeline will flow straight through to the business user.
When a business user sees these inaccuracies, the integrity of the report, and indeed of the whole BI solution, is questioned. The seed of doubt is planted and it's very difficult to earn back the trust of the business users. They will find another source, which may not be any more correct, or revert to intuition. In a worst-case scenario, a business department might purchase their own BI tool and point it straight at the transactional systems, making decisions based on potentially unclean and incomplete data and affecting the operational stability of the organisation's production systems in the process.
In neither of the scenarios we’ve described does the effort invested by analytics teams in the data preparation stages, whether data science or business intelligence, actually address the ultimate source of frustration. So, what can organisations do to address this problem?
Solving the cause of the problem
In both scenarios discussed, the identified inefficient activity is ultimately the result of poor data quality at source. Poor data quality in terms of individual data sets being incomplete. Poor data quality in terms of duplicate sets being maintained in different source systems. Poor data quality in terms of the data not being available for consumption in a timely fashion.
The inefficient data preparation steps of the data science process and expensive BI data pipelines exist to remediate the data quality issues at process time, every time. If data quality at source was good enough to answer at least part of a given question the inefficient steps could be drastically simplified to the point of relative insignificance. The 80% of the effort required to complete these steps would then be diverted to the science of understanding the data, interpreting it, gleaning insight and answering the business’s toughest questions.
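What "good enough quality at source" might look like can be sketched as simple validation rules enforced where the data is produced, rather than remediated in every downstream pipeline. The rules and column names below are assumptions for illustration only.

```python
import pandas as pd

# Illustrative source-side quality rules; the columns and checks are assumptions.
RULES = {
    "customer_id": lambda s: s.notna(),                    # every record must be identifiable
    "email": lambda s: s.str.contains("@", na=False),      # crude completeness check
}

def validate_at_source(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only records that pass every rule, before downstream consumers see them."""
    mask = pd.Series(True, index=df.index)
    for column, rule in RULES.items():
        mask &= rule(df[column])
    return df[mask]
```

Guards like this move the cleansing cost to a single point at source, so data science and BI teams can spend their time on analysis rather than remediation.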
Good data quality comes from good data management. Implementing good data management practices requires participation and cooperation from all parties: the analytics teams, technology teams and the business people themselves.
Understanding what's required for good data management starts with recognising that there is a gap between the vast volume and variety of data an organisation has and the information its business people need, and that BI and analytics solutions alone will not bridge that gap. When businesses attempt to bridge it with those solutions alone, they fail, leaving frustrated business people in their wake.
VIQTOR DAVIS call the gap between data and the business the “data delta” and we’ve developed a proven, agile and practical data management methodology of the same name to cross it.
If you are interested in learning more about our tried and tested method, you can read about it by downloading a free copy of our book Crossing the Data Delta. Additionally, please get in contact and talk to us today about how we can help you.