
AI – Ozempic for Unhealthy Data Diet

Keeping Data Decisioning Fit in Tackling Fraud

Counter fraud has become the poster child for artificial intelligence. The discipline has led the way in predictive analytics for well over a decade. Over the past year, the appetite for implementation has surged, with use cases now spanning advanced predictive analytics, voice and image analysis, and the integration of large language models into operations. The adoption of fully agentic AI now feels just a small step away.

But with this great power comes great responsibility: to understand how AI functions and, crucially, to ensure the machine is being fed clean, untainted data to protect the integrity of insights, decisions, and actions.

In a recent webinar, I explored what a healthy Data Diet looks like for effective AI in fraud detection. It’s a hot topic and one close to my heart. I’ve been using data to investigate corporate and insurance fraud for 30 years, and I know all too well the pain of “garbage in, garbage out.”

So, what makes for an unhealthy data diet for our AI superpowers? Consider:

  • An overly restricted diet: Valuable data exists but isn’t captured, stored, or made accessible — a common trait in the “minimum viable purgatory” of poorly executed data transformation.

  • A wantonly unrestricted diet: Data is captured but not understood or usable, leading to data swamps plagued by ROT (redundant, obsolete, trivial data) that block value.

  • An overly processed diet: Data is stripped of its value, distorted, and rendered opaque, so that no one can explain its meaning or origins.

Mitigating the Impact of an Unhealthy Data Diet

In counter fraud, accuracy is essential. Poor decisions based on flawed data can harm genuine customers, increase fraud leakage, and inflate operational costs. Confidence in models requires trust in the data — it must be accurate, complete, and reliable.
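As a rough illustration of what checking for completeness and redundancy can look like in practice, here is a minimal Python sketch. The function name profile_records, the field names, and the sample claims are all hypothetical, invented for this example; real checks would depend on your own data model and on what "complete" means for each field.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Report per-field completeness and the share of exact duplicate records.

    Hypothetical helper for illustration only. A field counts as "filled"
    when it is present and is neither None nor an empty string.
    """
    total = len(records)
    if total == 0:
        return {
            "completeness": {field: 0.0 for field in required_fields},
            "duplicate_share": 0.0,
        }

    # Restricted diet: how often is each required field actually populated?
    completeness = {}
    for field in required_fields:
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total

    # Unrestricted diet: how many records are exact repeats of an earlier one?
    seen = Counter(tuple(r.get(f) for f in required_fields) for r in records)
    duplicates = sum(count - 1 for count in seen.values())

    return {"completeness": completeness, "duplicate_share": duplicates / total}


# Made-up sample data: one missing policy, one missing amount, one repeat.
claims = [
    {"claim_id": "C1", "policy_id": "P1", "amount": 1200.0},
    {"claim_id": "C2", "policy_id": "", "amount": 450.0},
    {"claim_id": "C1", "policy_id": "P1", "amount": 1200.0},
    {"claim_id": "C3", "policy_id": "P2", "amount": None},
]

report = profile_records(claims, ["claim_id", "policy_id", "amount"])
print(report)
```

A profile like this says nothing about accuracy or provenance, which usually need reference data and lineage records, but it gives an early, inexpensive signal before any model is trained on the data.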

Yes, AI may one day compensate for poor-quality data, but that comes at a cost. In April, OpenAI revealed that users adding pleasantries like “please” and “thank you” to their prompts had cost the company millions in energy bills. If a few extra words carry that kind of price, cleansing entire datasets via AI might be possible, but it will be resource-heavy.

If we can’t rely on AI to fix poor data affordably, we must commit to feeding it a healthy data diet.