ClarifiedBy… Data Quality

Written by Diligencia | Mar 30, 2019 12:00:00 AM

Whether or not you believe The Economist’s now familiar assertion that data is more valuable than oil, there is widespread acceptance that data is a highly valuable resource, particularly for businesses. Data has become a key driver for business decision-making at the strategic, operational and tactical levels.1

However, it is refreshing that, in the time since the Economist article was published in 2017, the conversation has moved beyond ‘big data’ as an end in itself. It is of course relatively easy for companies to accumulate huge amounts of data, but if they then have to employ a large team of people to clean it, analyse it and extract insights from it, the value of the exercise is questionable. Instead, a competitive edge can be achieved by companies that draw on sophisticated analytics and high-quality datasets, from which insights and conclusions can be drawn more readily.

Data quality has always been central to Diligencia’s mission – not just in the authenticity of the data and how it is sourced (from official sources only) but also in the way we then structure and curate our information. One approach is to scrape and compile data from multiple sources, with varying degrees of freshness and accuracy, and then allow users to draw their own conclusions. We believe instead that clearly sourced, reliable data that is consistent, clean and connected is ultimately more valuable to our clients.

For example, we have around 40 tests built into our platform, which all company and individual profiles must pass before being published on ClarifiedBy. These rules have been designed to ensure:

Completeness – we ensure that key fields such as directors, shareholders, and company identifiers are fully populated prior to publishing each profile

Integrity – these tests range from the simple (e.g. shareholdings cannot exceed 100%) through to the more technical (e.g. sole proprietorships cannot have more than one shareholder), and together ensure our information satisfies the demands of our discerning clients

De-duplication – not always easy with Arabic-English transliteration, but we dream of a world without false positives, and do our utmost to ensure that companies and individuals are not recorded twice in our database. De-duplication is also the key to producing our network diagrams
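
To make the three kinds of rule above more concrete, here is a minimal Python sketch of what such pre-publication checks might look like. It is purely illustrative and is not the ClarifiedBy platform's actual implementation: the `Profile` fields, the function names and the `"sole proprietorship"` label are all hypothetical. The de-duplication key in particular is only a crude name normalisation, and genuine Arabic-English transliteration matching would need considerably more than this.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical profile shape, invented for this illustration.
    name: str
    company_id: str
    legal_form: str
    directors: list[str] = field(default_factory=list)
    # Maps shareholder name -> percentage held.
    shareholdings: dict[str, float] = field(default_factory=dict)


def check_completeness(profile: Profile) -> list[str]:
    """Return a list of problems with key fields that are not populated."""
    errors = []
    if not profile.company_id.strip():
        errors.append("missing company identifier")
    if not profile.directors:
        errors.append("no directors recorded")
    if not profile.shareholdings:
        errors.append("no shareholders recorded")
    return errors


def check_integrity(profile: Profile) -> list[str]:
    """Return a list of internal-consistency problems."""
    errors = []
    # Small tolerance so floating-point rounding does not trigger the rule.
    if sum(profile.shareholdings.values()) > 100.0 + 1e-9:
        errors.append("shareholdings exceed 100%")
    if profile.legal_form == "sole proprietorship" and len(profile.shareholdings) > 1:
        errors.append("sole proprietorship has more than one shareholder")
    return errors


def dedup_key(name: str) -> str:
    """Crude matching key: strip accents, case, spaces and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())


def ready_to_publish(profile: Profile) -> bool:
    """A profile is publishable only if every check comes back empty."""
    return not (check_completeness(profile) + check_integrity(profile))
```

In this sketch each check returns a list of human-readable problems rather than a simple pass or fail, so a reviewer could see why a profile was held back. Two records whose names produce the same `dedup_key` would be candidates for manual review rather than being merged automatically.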

As our database continues to expand, data quality becomes ever more important for us at Diligencia – particularly as we look to build tools and additional datasets that bring the relevant information and insights to the surface. To extend the oil analogy, why accept crude when you can have the refined product?