There is no such thing as a data quality problem…

Over the last couple of days I have presented at a couple of conferences on the subject of Asset Information Management. While discussing the nature of data quality problems with a delegate on Tuesday, I was asked:

Are data quality problems often user problems?

My immediate response was that all data quality problems are user/people problems. If we recognise this fact, then solving data quality problems involves solving people problems.

This may be a slightly controversial view. However, activities such as data profiling, data matching and data cleansing only correct the symptoms of the problem. If we recognise this, then it follows that, used in isolation, these activities will tend to provide only short-term, unsustainable improvements in quality.

So what are the root causes of data quality problems? Some examples from my work in transport and utilities include:

  • Users not providing data updates because they don’t understand the need
  • Staff finding field data problems, but assuming that someone else will correct them
  • Productivity targets that leave staff no time to provide data updates without losing pay
  • Staff keeping ‘local’ copies of data sets in spreadsheets and databases
  • Admin staff corrupting data in bulk through poor process controls
  • Data being migrated between systems without assessing and correcting data quality issues
  • Not assessing the accuracy of data (those DQ ostriches again!)
  • User input forms with the entry fields in the wrong order
  • Drop-down categorisation lists left at the default setting by users
  • Lists of fault codes so long that staff have problems finding the correct code, so they ask for a new one to be created…
  • Staff storing data in password-protected spreadsheets, which are in turn stored in a Notes database

I could go on, but you probably get the idea…

So if the above list represents some of the root causes of problems, these are the areas that will need to be corrected in order to provide sustainable improvements in the quality of outputs, i.e. improvements in data quality.

Am I being radical? What do you think?

14 Responses to There is no such thing as a data quality problem…

  1. Jill Wanless says:

    I couldn’t agree more. I will be sending this post to many colleagues. Great post!

  2. Jim Harris says:

    Julian,

    There are, without question, some data quality problems attributable to people problems.

    As your list exemplifies, a lack of data ownership, and the assumption that data quality is someone else’s responsibility, is the fundamental root cause of many data quality problems.

    One of the primary goals of a data quality initiative must be to define the roles and responsibilities for data ownership and data quality.

    However, as with so many things, I cannot help but quote Shakespeare:

    “There is nothing either good or bad, but thinking makes it so.”

    Once you have eliminated the obviously poor practices that people participate in, which cause and perpetuate data quality problems, you will be faced with the difficult task of choosing who among the good people are the really good people whose behavior (and business rules, tactical priorities, strategic vision, etc.) should be used as the gold standard for data quality within the enterprise.

    To paraphrase George Orwell (who also meant it sarcastically):

    “All people are equal, but some people are more equal than others.”

    So, let’s not blame it all on the people – except for those damn ostriches!

    Best Regards,

    Jim

  3. Dylan Jones says:

    Great post, I think you’re spot on here Julian.

    It is no surprise that change management and improving the culture surrounding data can have such dramatic impacts: they are people-driven initiatives.

    We have become a little tool-obsessed in the industry after witnessing the feats technology can deliver, but most of this is typically downstream, reactive data quality improvement.

    To truly create a difference we need to educate and communicate far more effectively.

    So, yes, I agree with you, data quality problems stem ultimately from people problems.

  4. Abhishek says:

    The first data quality issues start at data entry, where users do not understand the implications of certain fields and either leave them blank or fill them with random data.

    Enterprises always look for immediate gains rather than long, sustained gains, and this leads to data quality being compromised at every point of the data lifecycle.

  5. Henrik Liliendahl Sørensen says:

    I agree – and then I don’t agree.

    With the same logic you could state:

    • There is no such thing as crime – only people who don’t know about social responsibility
    • There is no such thing as traffic jams – only people who don’t know how to drive
    • There is no such thing as cold – only people who don’t know how to dress

    Blaming it all on people is stating the bloody obvious.

    Right now I am engaged in a data management initiative at a public transportation authority. We have bad data, and this has been going on for years. A lot of training and guidance has been directed at the bus drivers in order to solve the matter at the root. This will continue, but we will also implement automated correction processes because we can – and because the main task for a bus driver is not operating the onboard computer but bringing passengers safely from point A to B.

    In other words, in the real world there are such things as data quality problems. Working on root causes is the first, but not ultimately the only, option.

    • Henrik,

      Thanks for your input.

      Yes, I was stating the obvious, and yes, an analogy such as your crime example could also be relevant, but I was deliberately being a bit controversial to promote debate and thought.

      What is obvious to some is not obvious to others. Some vendors would try to make people believe that a technology solution will cure all data quality ills. The more expensive the technology, the better the result?

      In your bus data collection example it could be argued that another root cause was a false assumption about the role and capabilities of bus drivers to do data entry whilst transporting passengers (i.e. a human error again). What is good about that example is that someone has tried to look wider than the data and the system to develop a better solution.

  6. So, I agree with Henrik – there are some real world things that can’t be helped. But the fact that some of these exist doesn’t mean we have to give up on every option. I think a cultural shift towards long-term thinking – as Abhishek mentions – would do organizations a world of good when it comes to data quality.

    The question is, how much is the extra effort worth? Perhaps everyone involved in data entry would need to spend 10% more of their time to achieve a noticeable increase in data quality. Would the resulting reduction in errors, and in data quality staff, outweigh that cost?

    If it does, what cause will we all find to champion next?

  7. Great discussion. Users create the business process and own the business process, but they are the first to dirty the data and foul up the business process.

    This is why I love to automate a good business process: to keep the data clean and keep the process clean. A couple of key tools have helped to ensure that source data is clean. These include bar codes, which eliminate data entry and re-entry, and Electronic Data Interchange (EDI), which has reduced data entry and errors in the exchange of electronic documents between businesses.

    A new tool to assist with keeping source data clean is RFID tag technology. It can store a lot of source data about an item with the item itself, where multiple systems can access it on demand: real-time, quality source data.

    • Of course with EDI & RFID you assume the originator of the data got it right. Roughly 30% of all data for many industries originates outside the organization. How often does your business audit the data quality practices of companies that provide that data?

      The retail industry introduced the Global Data Synchronization Network to address product data quality problems but early on there were still companies that paid multiple full time consultants to review and clean data. Many suppliers had, amongst other problems, difficulty putting the right dimensional information into the right standardized field.

      We will all have very long careers, if we want them.

  8. Laurie Reynolds says:

    I believe the problem here is not the people, good or bad; it’s the lack of strong asset data models that capture the relationships and behaviour relevant to the business problem being analysed.
    The reference to bus drivers reminded me of a lecture I attended 20 years ago by Stafford Beer, one of the gurus of systems thinking. He had been working in Chile, helping Salvador Allende’s government measure the impact of alternative economic policies. They needed a fast-acting indicator of economic activity and chose to measure the number of bus tickets sold each day, and it was the bus conductor, not the driver, who owned the data. Shouldn’t the data owner be the individual who has the greatest stake in ensuring the data is correct, rather than the senior manager of the department? Maybe we need more bus conductors again?

  9. Ark Wingrove says:

    Rather than trying to define the scenario that eventually leads to bad data, it may be better (and easier) to define the criteria for success. My starter for 10 would be:

    - A software package that is easy and straightforward to use (for a USER and not a techie!)

    - A software package that is easy to update (same caveat as above)

    BUT

    - A change control procedure that prevents excessive or adverse customisation

    and

    - Constant organisational interest in data integrity, followed by corrective action when necessary (retraining, changes to code lists, etc.)

    facilitated by

    - A strong user group that “owns” the system and feels empowered to tune the system to achieve its goals

    goals which..

    - Will (of course) be clear, concise and communicated

    One day we will get there!

  10. [...] security exploits rely on the fallibility of humans, as stated  in a previous post data quality issues typically have root causes of human [...]

  11. [...] users, data administrators and external parties. I have referred to this in a previous post “There is no such thing as a data quality problem…” which was deliberately being a little provocative in order to make a point. In this related [...]

  12. Glad the post was of use.
    It has certainly prompted lots of comments and debate. Some readers agree; others seem to view it as an overgeneralisation. As mentioned before, this post was deliberately meant to be a little provocative, to counter the views of some who believe that the application of technology (MDM, data profiling, etc.) is ‘the answer’. These techniques and tools do have value, but only as part of an overall solution to data quality management.
