Kirkpatrick Model: how to evaluate whether training worked
Most organizations stop at smile sheets. The Kirkpatrick Model has four levels of training evaluation, from learner reaction to business results. Here is how to use each one, with examples for compliance, onboarding, and skills programs.
Most organizations evaluate training the same way: they send a survey after the course, collect some scores, and file the results. That is Level 1 of the Kirkpatrick Model. It tells you whether people enjoyed the experience. It tells you almost nothing about whether the training actually worked.
The Kirkpatrick Model gives you a structured way to answer the question that matters: did this training change anything? Did people learn something new, apply it on the job, and produce measurable results for the business?
This guide covers all four levels of the model, how to apply each one in practice (with specific examples for compliance training, onboarding, and skills programs), where most organizations get stuck, and how the Kirkpatrick Model compares to other evaluation frameworks.
Discover:
- What is the Kirkpatrick Model?
- Why most organizations get stuck at Level 1
- The four levels of the Kirkpatrick Model
- How to apply the Kirkpatrick Model in practice
- Kirkpatrick vs. Phillips ROI vs. LTEM
- Common mistakes when using the Kirkpatrick Model
- FAQ
What is the Kirkpatrick Model?
The Kirkpatrick Model, also known as Kirkpatrick’s Four Levels of Training Evaluation, is a framework for measuring whether training programs produce real results. It moves evaluation beyond learner satisfaction to examine knowledge gain, behavioral change, and business outcomes.
The model has four levels: Reaction, Learning, Behavior, and Results. Each level builds on the one before it, and each gets progressively harder to measure, but also progressively more useful.
Donald Kirkpatrick first published the model in 1959. It has been revised three times since then. In 2016, the New World Kirkpatrick Model added a stronger emphasis on making training relevant to people’s actual jobs, not just checking a box.
The model works for any type of training: formal courses, on-the-job training, blended programs, or compliance certifications. It is industry-agnostic. But how much value you get from it depends entirely on how many levels you actually measure.
Why most organizations get stuck at Level 1
Here is the uncomfortable reality. The majority of organizations stop at Level 1. They collect smile sheets, average the scores, and report back that “learners rated the training 4.2 out of 5.” That number feels reassuring. It is also nearly useless.
A high satisfaction score does not mean people learned anything. It definitely does not mean they will do anything differently at work. And it says nothing about whether the business is better off for having invested in the program.
Why do organizations stop here? Three reasons:
- Level 1 is easy to measure. You send a survey. You get numbers. Done. Levels 2 through 4 require planning, coordination with managers, and longer timelines.
- Nobody asks for more. If leadership only asks “did people like the training?” then that is all L&D will measure. The performance management cycle rarely connects back to training evaluation at deeper levels.
- The tools are not set up for it. Tracking behavioral change and business results requires learning data infrastructure that many organizations do not have. When your LMS only records completions, deeper evaluation feels impossible.
The Kirkpatrick Model is most useful when you flip the order. Start by defining the business results you want (Level 4), then work backward to design training that targets specific behaviors (Level 3), teaches the right knowledge and skills (Level 2), and creates a positive learning experience (Level 1). This is the approach the New World Kirkpatrick Model recommends, and it changes the entire conversation about training ROI.
The four levels of the Kirkpatrick Model

Level 1: Reaction
Level 1 measures the learner’s immediate response to the training. Was it relevant to their role? Was it engaging? Do they believe they can use what they learned?
There are three dimensions to track:
- Satisfaction: Did the learner find the training worthwhile?
- Engagement: Were they actively involved, or did they click through passively?
- Relevance: Can they connect what they learned to their actual work?
Reaction data is typically collected through a post-training survey, sometimes called a “smile sheet.” These surveys are quick, which is both their strength and their limitation. They capture a snapshot of how people felt, not what they retained.
A reaction survey might cover areas such as program objectives, content relevance, course materials, and facilitator knowledge.
How to measure Level 1 effectively
- Use an online questionnaire distributed immediately after the training ends.
- Include open-ended questions alongside rating scales. Written answers reveal patterns that scores cannot.
- Ask questions focused on application: “What is one thing you will do differently this week based on this training?”
- Tell learners at the beginning of the session that they will be providing feedback. This gives them time to form specific observations rather than generic reactions.
- Ask for honesty explicitly. Polite scores are not useful scores.
- Use data from previous surveys to refine your questions. If the same issues come up repeatedly, that is a signal worth investigating at Level 3.
Level 1 data is useful, but it is a starting point. A training that scores poorly on relevance probably has a design problem worth fixing. A training that scores high on satisfaction but shows no results at Level 3 or 4 might be entertaining but ineffective. Both signals matter.
Level 2: Learning
Level 2 measures whether learners actually acquired the knowledge, skills, and attitudes that the training program was designed to build. It answers a simple question: did they learn what we intended?
This level evaluates five elements: knowledge, skills, attitude, confidence, and commitment. These can be measured formally (through tests or assessments) or informally (through observation and discussion).
For accurate results, use pre-training and post-training assessments. Without a baseline, you cannot attribute any learning gain to the training itself. The learner might have already known the material.
Level 2 measurement connects directly to how you define learning outcomes. If your outcomes are vague (“understand compliance requirements”), your assessments will be vague too. If your outcomes are specific (“correctly identify three reporting obligations under GDPR”), you can build an assessment that actually measures something.
How to measure Level 2 effectively
- Run assessments before and after the training. The difference between scores is your learning gain.
- Vary your assessment formats. Exams work for knowledge. Simulations or role-plays work for skills. Self-assessments work for confidence and attitude, though they should be combined with other data.
- Define scoring criteria in advance to reduce inconsistency across evaluators.
- When possible, use a control group that did not receive the training. This isolates the training’s effect from other factors.
- Collect qualitative feedback from both instructors and learners. Instructors often notice patterns that assessments miss.
- Align assessments with Bloom’s Taxonomy levels. Testing recall is different from testing application or analysis.
Level 2 tells you whether the training delivered knowledge. It does not tell you whether that knowledge will be used. That is what Level 3 is for.
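The pre/post comparison above is simple arithmetic, but it is worth making explicit. A minimal sketch in Python, assuming percentage scores out of 100; the normalized-gain formula (gain divided by the room the learner had to improve) is one common way to compare learners who start from different baselines:

```python
def learning_gain(pre: float, post: float) -> float:
    """Raw learning gain: post-training score minus pre-training score."""
    return post - pre

def normalized_gain(pre: float, post: float) -> float:
    """Gain as a fraction of the room the learner had to improve.

    A learner moving 60 -> 80 and one moving 90 -> 95 both improved,
    and normalized gain makes them comparable (0.5 in both cases).
    """
    if pre >= 100:
        return 0.0  # already at ceiling, no room to improve
    return (post - pre) / (100 - pre)

# Hypothetical cohort: (pre-score, post-score) pairs in percent
cohort = [(55, 80), (70, 85), (90, 95)]
avg_gain = sum(learning_gain(pre, post) for pre, post in cohort) / len(cohort)
print(round(avg_gain, 1))  # average raw gain across the cohort
```

If you also ran a control group, subtract its average gain from the trained group's average gain; the remainder is the portion you can more credibly attribute to the training itself.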
Level 3: Behavior
This is the level that separates good training evaluation from surface-level reporting.
Level 3 measures whether learners are applying what they learned when they are back on the job. Knowledge that stays in the classroom is knowledge wasted.
Behavioral change is harder to measure than knowledge gain because it depends on more than the training itself. The learner’s manager, their team culture, the tools they have access to, and the processes they work within all affect whether new behaviors take hold.
The New World Kirkpatrick Model introduced the concept of required drivers: the processes, systems, and support structures that reinforce and reward the right behaviors after training. Without these drivers, even well-designed training programs will fail to produce lasting change.
This is an important point. If an employee completes a training program but does not change their behavior, it does not automatically mean the training failed. It might mean the workplace environment does not support the new behavior. Maybe the manager does not reinforce it. Maybe the tools are not available. Maybe the old process is still the path of least resistance.
Level 3 evaluation reveals these barriers. That makes it one of the most valuable, and most underused, forms of training evaluation.
How to measure Level 3 effectively
- Wait 3 to 6 months after training before measuring behavioral change. Anything earlier is too soon to see lasting patterns.
- Use a mix of direct observation and structured interviews. Managers are often the best source of behavioral data because they see the work daily.
- Minimize opinion-based assessments. Instead of asking “do you think this person improved?” ask “how frequently does this person perform X behavior?” Specificity reduces bias.
- Start with subtle observation, then move to more formal evaluation methods like interviews or surveys once patterns become visible.
- Define the target behaviors precisely before you start evaluating. What does success look like? How would someone demonstrate mastery? A competency model can help here.
- Track consistency. A behavior performed once is not a changed behavior. You need sustained practice.
- Integrate evaluation into existing management workflows. If you add a separate evaluation process on top of everything else, it will get deprioritized. Build it into performance management and regular check-ins.
Level 3 data also feeds back into training design. If most learners understood the material (Level 2) but few changed their behavior (Level 3), the gap might be in how the training translates to the workplace, not in the training content itself. That is a design problem, not a knowledge problem. It often points to a need for more on-the-job training or collaborative learning to bridge the gap.
Level 4: Results
Level 4 measures whether the training program produced the business outcomes it was designed to achieve. This is where training evaluation connects to organizational strategy.
Results are tracked through key performance indicators (KPIs) that are defined before the training begins. The specific KPIs depend on the program. For compliance training, it might be reduced incidents or audit findings. For onboarding, it might be time to productivity. For sales training, it might be revenue per rep.
Level 4 also looks at what the New World Model calls leading indicators: short-term observations that suggest critical behaviors are on track. Leading indicators are useful because business results often take months to materialize. You need earlier signals to know if the training is working.
For example, if you are evaluating a safety training program in manufacturing, the ultimate result (Level 4) is fewer workplace incidents. But a leading indicator might be the percentage of employees who consistently follow the new safety protocol during observed inspections. If that number is climbing, you have evidence that the training is on the right track, even before the incident data reflects it.
How to measure Level 4 effectively
- Define your target KPIs before the training starts, not after. If you do not know what result you are aiming for, you cannot design training that achieves it.
- Share those KPIs with all participants. People perform better when they know what success looks like.
- Use a control group when possible. This helps isolate the training’s contribution from other factors that affect results (market changes, new tools, seasonal patterns).
- Give participants enough time to apply their new skills before measuring results. Rushing this step produces unreliable data.
- Combine quantitative data (KPIs) with qualitative data (observations, manager feedback). Numbers tell you what happened. Qualitative data tells you why.
- For senior employees and leadership programs, use annual evaluations with consistent benchmarks. Behavioral change at senior levels takes longer and has a wider impact radius.
When you evaluate training across all four levels, you build a complete picture. You know how people reacted (Level 1), what they learned (Level 2), whether they applied it (Level 3), and what impact it had on the business (Level 4). That picture is what turns L&D from a cost center into a strategic function. It is also what lets you measure training effectiveness in a way that leadership will actually take seriously.
How to apply the Kirkpatrick Model in practice
The model works for any training program. But the way you apply it shifts depending on what you are evaluating. Here are three common scenarios.
Compliance training
Compliance training is non-negotiable. Regulations require it. Auditors check for it. The consequences of getting it wrong are legal and financial.
For compliance programs, Level 4 results are usually well-defined: zero violations, audit-ready documentation, certified employees. Work backward from there.
- Level 4: Track audit results, incident rates, and certification completion rates. Compare year-over-year to see whether training is moving the numbers.
- Level 3: Observe whether employees follow required procedures on the floor or in the field. Manager spot-checks and compliance audits double as behavioral evaluation.
- Level 2: Use certification exams with passing scores. Pre- and post-assessments show knowledge gain. Consider scenario-based questions that test application, not just recall.
- Level 1: Keep it brief. Compliance learners do not need a 20-question satisfaction survey. Focus on relevance and clarity. Did they understand what was expected?
A learning platform with built-in certification tracking, recertification workflows, and audit trail reporting makes Level 4 measurement significantly easier. When every completion, score, and certification is recorded automatically, you spend less time pulling data and more time acting on it.
Employee onboarding
Onboarding is where the Kirkpatrick Model has some of the clearest measurable impact. The business result you care about is time to competence: how quickly a new hire becomes productive in their role.
- Level 4: Track time to full productivity, 90-day retention rates, and manager satisfaction with new hire readiness. These are concrete numbers most HR teams already have access to.
- Level 3: Check in at 30, 60, and 90 days. Are new hires performing the key tasks independently? Are managers still filling gaps? Use your onboarding checklist as a behavioral benchmark.
- Level 2: Assess knowledge of company processes, tools, and role-specific skills at the end of onboarding. Use practical assessments, not just quizzes.
- Level 1: Ask new hires about the onboarding experience within the first week and again at 30 days. Early feedback catches confusion before it becomes disengagement.
Skills development programs
Skills programs are where evaluation gets tricky because the business outcomes are often indirect and take time to materialize. But that does not mean you skip Levels 3 and 4.
- Level 4: Define what “closing a skill gap” looks like for the business. It might be fewer project delays, higher client satisfaction scores, or the ability to staff projects without external contractors. Use your skills matrix to track capability changes over time.
- Level 3: Are employees using the new skills in their work? Managers and project leads can provide behavioral data. Look at project assignments, task performance, and whether employees are taking on work they could not handle before.
- Level 2: Pre- and post-assessments aligned with specific skill gap areas.
- Level 1: Ask whether the training felt relevant to their actual skill needs, not just whether they enjoyed it.
Kirkpatrick vs. Phillips ROI vs. LTEM
The Kirkpatrick Model is the most widely used training evaluation framework, but it is not the only one. Here is how it compares to two other common approaches.
| Framework | Core approach | Levels | Best for |
|---|---|---|---|
| Kirkpatrick Model | Four-level evaluation from reaction to business results | 4 levels | General-purpose training evaluation across all program types |
| Phillips ROI Model | Extends Kirkpatrick with a fifth level: cost-benefit analysis | 5 levels | Programs where leadership requires a financial return figure |
| LTEM (Learning Transfer Evaluation Model) | Focuses on how well learning transfers to workplace performance | 8 tiers | Programs where the gap between learning and doing is the main concern |
The Phillips ROI Model builds directly on Kirkpatrick. Its first four levels are similar, with one key difference at Level 3: Phillips expands “behavior” to “application and implementation,” which includes looking at whether failures come from training design or from workplace conditions. The fifth level applies financial analysis to calculate whether the training’s benefits outweigh its costs. This is useful when executives want a hard ROI number, but it requires solid data at all previous levels. If your Level 3 data is weak, your ROI calculation will be unreliable. Read more in our full guide on how to measure training effectiveness.
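The fifth-level calculation itself is straightforward; the hard part is isolating the monetary benefits attributable to the training. A minimal sketch of the standard ROI formula, assuming the benefit figure has already been isolated (for example, via a control group); the dollar amounts are hypothetical:

```python
def training_roi_percent(monetary_benefits: float, program_costs: float) -> float:
    """Phillips-style ROI: net program benefits as a percentage of costs.

    ROI (%) = (benefits - costs) / costs * 100
    """
    return (monetary_benefits - program_costs) / program_costs * 100

# Hypothetical program: $120,000 in isolated benefits, $80,000 in total costs
roi = training_roi_percent(120_000, 80_000)
print(f"{roi:.0f}%")  # prints 50%: $1.50 returned for every $1.00 invested
```

Note how fragile the input is: if your Level 3 data cannot support the benefits estimate, the precise-looking percentage this formula produces is precision without accuracy.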
LTEM is a newer framework from Will Thalheimer that breaks evaluation into eight tiers, with a strong emphasis on whether learners can actually perform tasks in realistic conditions. It is particularly useful for training programs where the gap between “knowing” and “doing” is large, such as technical skills training or on-the-job training programs.
You do not have to choose one framework. Many organizations use the Kirkpatrick Model as their overall structure and borrow elements from Phillips (for ROI reporting) or LTEM (for transfer-focused evaluation) where they add value.
Common mistakes when using the Kirkpatrick Model
1. Stopping at Level 1
This is the most common problem, and we have already covered why. Satisfaction data is easy to collect. It is also the least useful level for proving training impact. If your evaluation strategy ends with a survey, you are leaving the most valuable insights on the table.
2. Measuring in the wrong order
The New World Kirkpatrick Model recommends planning your evaluation from Level 4 down, not from Level 1 up. Start with the business result you want to achieve. Then identify the behaviors that produce that result. Then design training that builds the required skills. Then create a learning experience that engages the audience. This approach aligns training with business strategy from the start.
3. Evaluating too soon
Behavioral change takes time. If you measure Level 3 two weeks after the training, you will get unreliable data. Three to six months is the window where you can see whether new behaviors have actually taken hold.
4. Ignoring the environment
Training does not happen in a vacuum. If an employee returns from a training program to a workplace where their manager does not support the new behaviors, or the systems do not allow them, behavior will not change. That is not a training failure. It is an environment failure. Level 3 evaluation should look at both the learner and the context they work in.
5. Treating it as a one-time event
Evaluation is not something you do after the program ends and then forget about. It works best as an ongoing practice integrated into your performance management cycle and employee development methods. When evaluation is built into how the organization operates, the insights compound over time.
6. Using the wrong data infrastructure
If your learning technology only tracks course completions, you cannot do meaningful evaluation beyond Level 2. Deeper evaluation requires data on behavior (manager observations, task performance) and results (business KPIs). A Learning Record Store that captures xAPI data from multiple sources, not just courses, makes Levels 3 and 4 evaluation practical rather than theoretical.
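xAPI statements follow an actor-verb-object structure, which is what lets a Learning Record Store hold behavioral evidence (a manager observation, a task performed) alongside course completions. A minimal sketch of such a statement as a Python dict; the verb and activity IRIs here are illustrative placeholders, not entries from an official xAPI vocabulary:

```python
import json

# Minimal xAPI-style statement recording a Level 3 observation.
# All names and IRIs below are hypothetical examples.
statement = {
    "actor": {
        "objectType": "Agent",
        "name": "Jane Doe",
        "mbox": "mailto:jane.doe@example.com",
    },
    "verb": {
        "id": "https://example.com/verbs/demonstrated",
        "display": {"en-US": "demonstrated"},
    },
    "object": {
        "id": "https://example.com/activities/safety-protocol-check",
        "definition": {
            "name": {"en-US": "Safety protocol spot-check"},
            "description": {"en-US": "Manager-observed use of the new protocol"},
        },
    },
}

# An LRS would receive this as JSON via its statements endpoint.
payload = json.dumps(statement)
print(payload[:60])
```

Because the object can be any activity, not just a course, the same record format covers Level 2 assessment results and Level 3 on-the-job observations in one store.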
Frequently asked questions
What are the 4 levels of the Kirkpatrick Model?
The four levels are Reaction (did learners find the training relevant and engaging), Learning (did they acquire the intended knowledge and skills), Behavior (are they applying what they learned on the job), and Results (did the training produce measurable business outcomes). Each level requires different measurement methods and timelines.
Why is the Kirkpatrick Model still relevant?
The model has been in use since the 1950s because the core question it answers has not changed: did this training work? The 2016 New World update made it more applicable to modern workplaces by emphasizing business results and required drivers. No other framework has matched its combination of simplicity and depth.
What is the difference between the Kirkpatrick Model and Phillips ROI?
The Phillips ROI Model extends Kirkpatrick by adding a fifth level that calculates the financial return on training investment. Phillips also expands Level 3 to include application and implementation factors. Use Phillips when leadership specifically requires a cost-benefit analysis. Use Kirkpatrick when you need a general evaluation framework.
How long should you wait before evaluating at Level 3?
Three to six months after training is the recommended window for evaluating behavioral change. Evaluating too soon produces unreliable data because learners have not had enough time to practice and integrate new behaviors into their daily work.
Can you use the Kirkpatrick Model for online training?
Yes. The model works for any training format, including e-learning, blended learning, classroom training, and on-the-job training. The measurement methods may differ (online assessments instead of in-person observation, for example), but the four-level structure applies regardless of format.
What is the New World Kirkpatrick Model?
The New World Kirkpatrick Model is the 2016 update that added two important concepts. First, it recommends planning evaluation from Level 4 down (start with business results, then work backward). Second, it introduced “required drivers,” which are the workplace processes, systems, and management support that reinforce the behaviors training is designed to build.
How does the Kirkpatrick Model connect to learning analytics?
Learning analytics tools provide the data infrastructure for Kirkpatrick evaluation at all four levels. Level 1 data comes from post-training surveys. Level 2 data comes from assessments. Level 3 data can be gathered through manager feedback tools and performance tracking. Level 4 connects training data to business KPIs. A learning platform with built-in analytics and xAPI support makes it possible to track all four levels in one place.
Training evaluation form
Get a handy printable form for evaluating training and course experiences.
Download now