37% of reps admit to fabricating CRM data

There's a stat from a Validity report that came out earlier this year. 37% of staff admit to fabricating CRM data when they're staring down too many required fields. Not entering a best guess. Not skipping a field and coming back later. Making something up because the system won't let them do their job until every box has something in it.

I build and fix sales systems for small teams that are growing faster than their operations can keep up with. I've watched people fill in fields I know they don't have answers to because the alternative is sitting there for ten minutes trying to find a phone number that may not exist publicly while their next call window closes. I've never asked anyone directly if they fabricate data because I already know what the answer would be and I don't think the answer is the interesting part. The interesting part is that the system was designed in a way where fabricating is the rational move. You're asking someone whose job is selling to stop selling and perform data entry that benefits someone else's report. They will do it fast. They will do it badly. And then the report will look fine.

I started calling this the Lie Layer about a year ago because I couldn't find a term for it anywhere and I kept needing one. It's the distance between what your sales system reports and what you'd find if you actually clicked into the records underneath the report. Not data that went stale over time, although that happens too; something like 22.5% of B2B contact records decay every year just from people changing jobs. The Lie Layer is specifically the part where the data was wrong when it went in, everyone involved kind of knows it, and the system treats it as ground truth anyway.

25% of stale pipeline probably isn't real
100 : 3.6 activities to quality conversations

01 Where the lies actually are

They don't all look the same, which is why they're hard to talk about as one problem.

Stage lies are probably the most expensive. A deal enters Negotiation because a rep moved it there during a pipeline review while their manager was watching. Or because a required field only becomes available at that stage and the rep needed access to it for something unrelated to the deal actually being in negotiation. Or because the deal has been sitting in Discovery for nine weeks and that's starting to look bad.

There was a company written up recently, I think it was in a CRM Masters case study, where their reps were using three completely different interpretations of what "Demo Completed" meant as a stage. Same CRM. Same pipeline. Same stage name. Three definitions. The pipeline total rolled up into one number that looked like pipeline and was actually three incompatible measurements of different things added together. About 25% of pipeline that hasn't moved in over a month probably isn't real according to most of the benchmarking I've read, and that tracks with what I see when I open client systems for the first time. The number is always higher than people expect.

Activity lies are the ones that got noticeably worse this year. I've seen data showing the average rep generates around 100 outreach activities per day. Those activities produce roughly 3.6 quality conversations. I don't know how you look at that ratio and then open a dashboard sorted by activity count and feel like you're learning something about your team. A voicemail gets logged as a call. A sequence enrollment gets counted as outreach. The dashboard shows activity and it technically is activity in the same way that putting on running shoes is technically part of running a marathon.

SDRs spend about 30% of their week in actual selling motions. The other 70% is everything else. The dashboard doesn't separate the two. It counts them together and calls the total "activities."

Data lies are the ones everyone writes about so I'll keep it short. 76% of organizations say less than half their CRM data is accurate and complete. Duplicate rates in most enterprise CRMs run somewhere between 20 and 30 percent. One healthcare organization found 22% of their entire database was duplicates before they started cleaning it. Every pipeline report built on that data is summing up numbers that include the same opportunity multiple times and presenting a total that people are making hiring decisions against.

Timing lies are quieter than the rest. Deals that have been sitting open for six months with nothing happening. Close dates that have been pushed forward four times without anyone asking whether the deal is still alive. The distance between the original close date and the current one is maybe the most honest field in the whole CRM and I've never seen it on a default dashboard anywhere.

5.5 hrs per week on manual CRM entry

02 Why nobody catches it

Two things, and they feed each other.

The first is that telling the truth in most CRMs is more work than lying. Every CRM adoption initiative ever designed is fundamentally asking reps to choose between logging and selling. Reps spend roughly 5.5 hours per week on manual CRM entry, which is almost 14% of a 40-hour week burned on overhead that doesn't close deals. The rational response to that is to minimize it. Enter the minimum the system requires, in whatever quality gets the form to submit, and get back to work. 37% of the time that means making something up. I'd honestly guess the real number is higher because that 37% is self-reported and people tend to undercount the things they know they shouldn't be doing.

The second is that the summary hides everything. One deal in the wrong stage doesn't move a pipeline number. A hundred deals in slightly wrong stages move it by maybe 15 or 20 percent, but 15 or 20 percent off still looks like a pipeline number. It still has a dollar sign and commas in the right places. Leadership sees $10M in pipeline and makes plans. They don't click into the records where $2M of it is duplicated opportunities and another $1.5M hasn't had a real human conversation attached to it since February.

The summary is also just more comfortable to look at. Opening individual records means finding specific deals that are probably dead and specific reps who are probably not performing and specific forecasts that are probably wrong. The summary lets everyone agree things are roughly fine and go to the next meeting. I've watched people choose the summary over the records in real time, not because they're lazy, because the records are confrontational and the summary isn't.

45% of CRM data isn't AI-ready

03 AI made it structural

Everything above existed before AI. Reps were fabricating data when I started in this industry. Managers were ignoring stale pipeline when CRMs were still being sold on CD-ROMs, probably. The Lie Layer is older than most of the tools people are currently buying to try and fix it.

What happened recently is that AI turned the Lie Layer from a mess you could work around into load-bearing infrastructure.

When you build a lead scoring model on a database where more than a third of entries include fabricated data, the model trains on the fabrications. When you build a forecast on pipeline where a quarter of it hasn't moved in a month, the forecast inherits the inflation. When you route leads based on activity metrics where the ratio of activities to real conversations is 100 to 3.6, the routing optimizes for the 100 and ignores the 3.6 entirely because it can't tell the difference. That's the thing. It can't tell the difference. It consumes whatever data exists and produces outputs with confidence scores attached, and the confidence scores are the part that actually makes this dangerous because they make fabricated inputs look like considered analysis.

About 45% of CRM data isn't considered AI-ready according to several analyses I've pulled this year. Almost half the information being fed into the systems that are supposed to make sales smarter is broken before any model runs.

Here's the number that I keep staring at. Win rates in 2025 trended downward. The largest group of sales teams fell into the 21 to 25% win rate bracket, which is down from 31 to 40% the year before. In that same period, 45% of teams adopted hybrid AI-SDR models. Teams adopted more automation, generated more pipeline on paper, ran more sequences. Win rates went down. I'm not going to claim the AI caused the decline because there are a dozen other variables in play, but I will say that the combination of "more tools, more activity, more pipeline" alongside "fewer deals actually closing" should be making someone uncomfortable. If the data underneath is wrong, the tools are just processing the wrong data faster and calling it insight.

Most of the CRM data quality articles I read researching this piece frame the problem like it's weather. Data decays. Records go stale. Things degrade. And that's true but it's not the whole thing. The Lie Layer includes the part where people are actively making it worse because the system they work in makes fabrication easier than accuracy, and it includes the part where AI turned a manageable human mess into a structural dependency. You could work around bad data when humans were making decisions by looking at spreadsheets and using judgment. You can't work around it when automated systems are routing, scoring, forecasting, and escalating based on that data at machine speed with no judgment layer between the input and the output.

04 Finding yours

I don't have a framework for this. There's no acronym. It's closer to just being willing to open the records instead of reading the report.

Filter the deals in your most advanced pipeline stage by how long they've been there. For most B2B teams, 60 days in Negotiation or Proposal without anything happening is a reasonable flag. Look at what percentage exceeds that. That percentage is roughly how inflated your late-stage pipeline is. It won't be zero. I've never seen it be zero.
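That filter is a one-liner once the data is out of the CRM. A minimal sketch, assuming the export has landed in Python dicts; the deal records and field names here are made up for illustration, not any specific CRM's schema:

```python
from datetime import date

# Hypothetical deal records, e.g. from a CRM export.
deals = [
    {"name": "Acme renewal",     "stage": "Negotiation", "stage_entered": date(2025, 3, 1)},
    {"name": "Globex expansion", "stage": "Negotiation", "stage_entered": date(2025, 8, 20)},
    {"name": "Initech pilot",    "stage": "Proposal",    "stage_entered": date(2025, 2, 10)},
]

STALE_DAYS = 60  # the flag threshold from the text
today = date(2025, 9, 1)

# Late-stage deals, then the subset that has sat past the threshold.
late_stage = [d for d in deals if d["stage"] in ("Negotiation", "Proposal")]
stale = [d for d in late_stage if (today - d["stage_entered"]).days > STALE_DAYS]

inflation_pct = 100 * len(stale) / len(late_stage)
print(f"{len(stale)} of {len(late_stage)} late-stage deals "
      f"exceed {STALE_DAYS} days ({inflation_pct:.0f}%)")
```

The only judgment call is the threshold; 60 days is a reasonable default for most B2B teams, but tune it to your sales cycle.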

Then open five of them. Not the five you already suspect are dead. Five random ones. Look at the last activity on the record, and I mean the last thing that represents a human being in contact with the buyer, not the last automated sequence step or system-generated task. If that real human activity happened more than three weeks ago and the deal is still sitting in Negotiation, the stage is a lie and someone is going to plan a quarter around it.

Required fields are the other diagnostic. Every required field you add increases the odds of fabricated data. The fields that are required but that nobody references in conversation or pipeline reviews are almost certainly full of garbage. They exist to get past the form. Run a mental test: if you deleted this field tomorrow, would anyone on the team notice it was gone? If the answer is no and the field is required, you've found a fabrication point.

Check your duplicate rate if you can. Most CRMs have dedup tooling built in or available as a plugin. Above 5% and your pipeline numbers carry a meaningful margin of error. Above 15% and I wouldn't trust any pipeline total without manually verifying the deals underneath it.
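If your CRM's dedup tooling isn't available, you can get a rough estimate from an export. This sketch assumes duplicates share a normalized email and company name; real dedup tools match on much fuzzier criteria, so treat this as a floor, not the true rate:

```python
import re
from collections import Counter

# Hypothetical contact rows from an export.
contacts = [
    {"company": "Acme Corp",  "email": "jane.doe@acme.com"},
    {"company": "ACME Corp.", "email": "Jane.Doe@acme.com"},
    {"company": "Globex",     "email": "sam@globex.io"},
    {"company": "Initech",    "email": "peter@initech.com"},
]

def norm_key(c):
    # Lowercase the email; strip case and punctuation from the company name.
    company = re.sub(r"[^a-z0-9]", "", c["company"].lower())
    return (company, c["email"].lower())

counts = Counter(norm_key(c) for c in contacts)
dupes = sum(n - 1 for n in counts.values())  # records beyond the first copy
dup_rate = 100 * dupes / len(contacts)
print(f"duplicate rate: {dup_rate:.0f}%")
```

Even this crude key catches the casing-and-punctuation duplicates that inflate most pipeline totals.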

28% to 3% duplicate rate after one validation rule

05 Fixing it

This is the boring part. That's the whole problem with doing this work.

The fix is almost always some version of: reduce the number of places where the system asks humans to lie, and more truth gets in. Fewer required fields, not more. Dropdowns instead of free text. Call recordings that auto-log so reps don't have to manually summarize every conversation. Email sync that captures activity without someone copy-pasting a note. Stage transitions tied to things that observably happened rather than someone dragging a card during a meeting because their manager is sharing their screen.

Where you can't eliminate manual input, add a check at the front door. One financial services company added validation rules that caught duplicates at the point of record creation. Duplicate rate went from 28% to 3% in six months. Pipeline accuracy improved by 40%. The whole fix was a validation rule. It probably took less time to implement than most teams spend evaluating their next AI tool.
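The shape of that front-door check, sketched in Python. An actual CRM expresses this as a validation rule or trigger rather than application code, and the function names here are invented for illustration:

```python
# Normalized keys of records already in the CRM (stand-in for a real lookup).
existing = set()

def normalize(email: str) -> str:
    return email.strip().lower()

def create_contact(email: str) -> bool:
    """Refuse the insert if a duplicate already exists.

    The point is where the check runs: at record creation,
    before the duplicate ever enters the database.
    """
    key = normalize(email)
    if key in existing:
        return False  # caught at the front door
    existing.add(key)
    return True

print(create_contact("Jane.Doe@acme.com"))    # first copy goes in
print(create_contact(" jane.doe@ACME.com "))  # duplicate blocked
```

The check is trivial; the leverage comes from running it before the bad record exists instead of cleaning up after it does.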

Meanwhile, 57% of organizations say manual data cleaning is their primary approach to data quality, and those same organizations are cutting investment in dedicated data quality personnel. More cleanup work assigned to fewer people. That math doesn't resolve into anything good.

I keep expecting this to have a clean ending but it doesn't. The incentives haven't changed. Reps are still choosing between selling and logging, and they're still right to choose selling. Dashboards still show summaries and almost nobody clicks through. AI is still consuming whatever's in the database and attaching confidence scores to the output. And leadership is still building quarters around pipeline numbers that include a meaningful percentage of things that aren't real.

57% of organizations are trying to clean the data by hand with fewer people than last year. The Lie Layer is getting faster, not thinner.

When most teams I talk to realize the pipeline number isn't real, their first instinct is to go shopping. New forecasting tool. New AI layer. New enrichment vendor. The equivalent of replacing the engine because the check engine light came on. Nobody ran the codes first. Nobody checked whether the actual problem was a $40 hose that takes fifteen minutes to replace. The fix for a 28% duplicate rate was a validation rule, not a platform migration. Fabricated stage data gets fixed by removing required fields and writing clearer exit criteria, which is boring work that nobody's built a landing page for.

A hose is still cheaper than an engine. But nobody's selling hoses at conferences, so the engine keeps winning.