Self-reported skill data is systematically biased. Engineers overestimate comfortable domains and underestimate gaps they haven't yet encountered. Here's what your incident log is actually telling you.
The survey problem isn't the tool — it's the epistemology
When you ask an engineer to rate themselves on Kubernetes, you're asking them to evaluate what they don't know they don't know. An engineer who has never encountered a specific Kubernetes failure mode will rate themselves as competent in Kubernetes because all the Kubernetes work they've done has gone fine.
The skill gap isn't visible to them. It's not that they're lying — it's that they're reporting from their own experience, which is necessarily incomplete.
What your workflow data is actually showing you
Pull request review patterns tell a different story. When a senior engineer leaves the same category of review comments — "this doesn't handle the edge case where..." — across multiple PRs from the same engineer, that's signal. It's not self-reported. It's observed.
When incidents repeatedly escalate to the same team members because the on-call engineer lacks exposure to a specific failure domain, that's signal. When Jira tickets for a particular service sit unassigned, or bounce between assignees before landing on the same two people, that's signal.
None of this data requires a survey. Your engineering org is producing it continuously, as a byproduct of normal work.
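To make the review-comment signal concrete, here is a minimal sketch of how recurring feedback could be counted. It assumes review comments have already been exported and tagged with a category; the `ReviewComment` record, the category labels, and the `min_prs` threshold are all hypothetical, and this is an illustration rather than a description of any particular tool's method.

```python
from collections import defaultdict
from typing import NamedTuple


class ReviewComment(NamedTuple):
    pr_id: int       # pull request the comment was left on
    pr_author: str   # engineer who opened the PR
    category: str    # hypothetical label, e.g. "missing-edge-case"


def recurring_review_themes(comments, min_prs=3):
    """Return {(author, category): pr_count} for feedback themes that
    appear on at least `min_prs` distinct PRs from the same author."""
    prs_by_theme = defaultdict(set)
    for c in comments:
        # A set of PR ids means several comments on one PR count once.
        prs_by_theme[(c.pr_author, c.category)].add(c.pr_id)
    return {
        theme: len(prs)
        for theme, prs in prs_by_theme.items()
        if len(prs) >= min_prs
    }


# Made-up sample data for illustration only.
comments = [
    ReviewComment(101, "dana", "missing-edge-case"),
    ReviewComment(101, "dana", "missing-edge-case"),  # same PR, counted once
    ReviewComment(107, "dana", "missing-edge-case"),
    ReviewComment(112, "dana", "missing-edge-case"),
    ReviewComment(112, "dana", "naming"),
    ReviewComment(115, "lee", "missing-edge-case"),
]

print(recurring_review_themes(comments))
# {('dana', 'missing-edge-case'): 3}
```

Counting distinct PRs rather than raw comments keeps one heavily annotated PR from looking like a pattern. The same shape would work for escalations or ticket reassignments, with the grouping key swapped for on-call engineer and failure domain, or for service and assignee.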
The confidence calibration problem
There's a documented phenomenon in skill assessment research, often called the Dunning-Kruger effect: people with genuine expertise tend to underrate themselves relative to their actual performance, while people with less experience tend to overestimate. Applied to engineering skill surveys, this creates a specific distortion: your most capable engineers may flag themselves as needing improvement in areas where they're actually strongest, while engineers with dangerous blind spots rate themselves as confident.
This isn't a problem you can solve by improving survey design. It's a fundamental limitation of self-report methodology for skills that require encountering failure to recognize.
What to read instead
The data your engineering organization is already producing has several properties that make it more reliable than surveys:
- It's behavioral, not self-reported — it reflects what engineers actually do under real conditions, not what they believe they'd do
- It's continuous — it updates as new incidents, PRs, and tickets arrive, not on a quarterly survey cadence
- It's specific to your codebase — it maps to the actual domains and failure patterns your infrastructure produces, not a generic skill taxonomy
- It surfaces unknown unknowns — engineers can't report gaps they haven't encountered; the codebase records when they encounter them
The gap between what surveys find and what the codebase reveals
In every engineering organization we've worked with, the competency graph built from workflow data surfaces at least two or three skill gaps that had never appeared in survey results. These aren't minor gaps — they're the ones correlated with actual incidents. They're the skills that are making your infrastructure fragile right now, and they're invisible to your current L&D planning because no engineer knew to report them.
Your incident log knows. Your PR review history knows. Your Jira ticket escalation patterns know. The question is whether you're reading them.
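One way to picture that comparison is as a set difference between the domains your survey flagged and the domains that keep showing up in incidents. The sketch below assumes each incident has already been tagged with a failure domain; the function name, the domain labels, and the `min_incidents` threshold are hypothetical, and the sample data is invented for illustration.

```python
from collections import Counter


def unreported_gaps(survey_gaps, incident_domains, min_incidents=2):
    """Return domains that recur in incidents but never appeared
    as a self-reported gap in the survey."""
    counts = Counter(incident_domains)
    # Treat a domain as an observed gap once it recurs often enough.
    observed = {d for d, n in counts.items() if n >= min_incidents}
    return sorted(observed - set(survey_gaps))


# Made-up sample data for illustration only.
survey_gaps = {"kubernetes-networking", "terraform"}
incident_domains = [
    "dns", "dns",
    "kubernetes-networking", "kubernetes-networking",
    "postgres-failover", "postgres-failover", "postgres-failover",
    "terraform",
]

print(unreported_gaps(survey_gaps, incident_domains))
# ['dns', 'postgres-failover']
```

In this toy data the survey and the incident log agree on Kubernetes networking, but DNS and Postgres failover only appear in the log. Those are the gaps no engineer knew to report.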
"We ran surveys for three years and built training programs from them. Tunlai showed us in the first week that two of our five training priorities were solving problems we didn't actually have — and missing the three we did."
The survey isn't wrong because engineers are dishonest. It's wrong because it's asking people to report the shape of their own blind spots. The alternative is to read the data that records what happens when those blind spots encounter reality.
That's what incident logs, PR reviews, and ticket patterns are for.