# Cohen's Kappa Calculator

Calculate Cohen's kappa for inter-rater reliability from a 2x2 agreement matrix. Get kappa, standard error, and agreement strength interpretation.

## What this calculates

Enter a 2x2 agreement matrix for two raters to calculate Cohen's kappa, which measures agreement beyond what would be expected by chance. The calculator also reports standard error and the Landis & Koch agreement strength.

## Inputs

- **Both Raters Positive (a)** — min 0 — Both raters classify as positive / category 1.
- **Rater 1 Positive, Rater 2 Negative (b)** — min 0 — Rater 1 says positive, Rater 2 says negative.
- **Rater 1 Negative, Rater 2 Positive (c)** — min 0 — Rater 1 says negative, Rater 2 says positive.
- **Both Raters Negative (d)** — min 0 — Both raters classify as negative / category 2.

## Outputs

- **Cohen's Kappa (κ)** — The kappa coefficient (-1 to 1).
- **Observed Agreement (Po)** — formatted as percentage — Proportion of cases where both raters agree.
- **Expected Agreement (Pe)** — formatted as percentage — Agreement expected by chance alone.
- **Agreement Strength** — formatted as text — Qualitative interpretation of kappa (Landis & Koch scale).
- **Standard Error** — Approximate standard error of kappa.
- **Calculation** — formatted as text — Step-by-step computation.

## Details

**What is Cohen's Kappa?**

Cohen's kappa (κ) quantifies the level of agreement between two raters who each classify items into two categories. Unlike simple percent agreement, kappa accounts for the possibility that raters agree by chance.

**Formula:**

**κ = (Po - Pe) / (1 - Pe)**

- **Po** = observed proportion of agreement = (a + d) / N
- **Pe** = expected proportion of agreement by chance = [(a + b)(a + c) + (c + d)(b + d)] / N²

where N = a + b + c + d is the total number of cases.
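The formula can be sketched in a few lines of Python. This is a minimal illustration of the computation, not the calculator's exact implementation; the standard-error line uses one common large-sample approximation (Cohen, 1960), which may differ from the formula the calculator reports.

```python
def cohens_kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 agreement table.

    a: both raters positive, b: rater 1 positive / rater 2 negative,
    c: rater 1 negative / rater 2 positive, d: both raters negative.
    """
    n = a + b + c + d
    po = (a + d) / n                     # observed agreement
    p1_pos = (a + b) / n                 # rater 1's positive rate
    p2_pos = (a + c) / n                 # rater 2's positive rate
    # chance agreement: both positive by chance + both negative by chance
    pe = p1_pos * p2_pos + (1 - p1_pos) * (1 - p2_pos)
    kappa = (po - pe) / (1 - pe)
    # approximate large-sample standard error (one common formulation)
    se = ((po * (1 - po)) / (n * (1 - pe) ** 2)) ** 0.5
    return kappa, po, pe, se

k, po, pe, se = cohens_kappa(20, 5, 10, 15)
print(round(k, 3), round(po, 3), round(pe, 3))  # 0.4 0.7 0.5
```

With a = 20, b = 5, c = 10, d = 15 (N = 50): Po = 35/50 = 0.70, Pe = 0.50, so κ = (0.70 − 0.50) / (1 − 0.50) = 0.40, a "fair" agreement on the Landis & Koch scale.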

**Interpreting Kappa (Landis & Koch, 1977):**

| Kappa | Strength of Agreement |
|-------|-----------------------|
| < 0.00 | Less than chance |
| 0.00 - 0.20 | Slight |
| 0.21 - 0.40 | Fair |
| 0.41 - 0.60 | Moderate |
| 0.61 - 0.80 | Substantial |
| 0.81 - 1.00 | Almost perfect |
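The table above can be expressed as a simple lookup. This hypothetical helper treats the bands as contiguous (so a value such as 0.605 falls into the next band up), which is the usual reading of the scale:

```python
def landis_koch(kappa):
    """Map a kappa value to its Landis & Koch (1977) strength label."""
    if kappa < 0.0:
        return "Less than chance"
    if kappa <= 0.20:
        return "Slight"
    if kappa <= 0.40:
        return "Fair"
    if kappa <= 0.60:
        return "Moderate"
    if kappa <= 0.80:
        return "Substantial"
    return "Almost perfect"
```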

**When to use Cohen's Kappa:**

- Two raters classifying the same subjects into two categories
- Assessing diagnostic agreement between clinicians
- Evaluating coding reliability in content analysis
- Quality control where two inspectors rate pass/fail

For more than two raters, use Fleiss' kappa. For ordinal categories, consider weighted kappa.

## Frequently Asked Questions

**Q: What is the difference between kappa and percent agreement?**

A: Percent agreement simply divides the number of agreements by the total number of cases. Kappa goes further by subtracting out the agreement expected by chance. Two raters flipping coins would reach about 50% agreement, yet their kappa would be near 0. Kappa is positive only when agreement exceeds what chance predicts, and it reaches 1.0 only when agreement is perfect.
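The gap between the two measures shows up clearly with hypothetical counts. Suppose both raters independently say "positive" 90% of the time: raw agreement looks impressive, but kappa exposes that it is all chance.

```python
# Hypothetical 2x2 counts for two raters who each independently
# say "positive" 90% of the time (N = 100 cases).
a, b, c, d = 81, 9, 9, 1
n = a + b + c + d

po = (a + d) / n                                      # observed agreement
pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
kappa = (po - pe) / (1 - pe)

print(po, pe, kappa)  # 0.82 0.82 0.0
```

Percent agreement is 82%, yet kappa is exactly 0: every bit of that agreement is what chance alone predicts.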

**Q: Can Cohen's kappa be negative?**

A: Yes. A negative kappa means the two raters agree less often than random chance would predict. This can happen when raters systematically disagree or when there is a consistent bias (one rater always picks the opposite of the other).

**Q: What is a good kappa value?**

A: Most researchers aim for kappa above 0.60 (substantial agreement) as a minimum for reliable measurement. In clinical settings, kappa above 0.80 (almost perfect agreement) is preferred. The acceptable threshold depends on your field and the consequences of disagreement.

---

Source: https://vastcalc.com/calculators/statistics/cohens-kappa
Category: Statistics
Last updated: 2026-04-08
