Boosting UX Evaluation Credibility with Rater Reliability in QMD 3.0
This article explains how 58 Tongcheng’s design team upgraded the QMD evaluation framework to QMD 3.0, using reliability testing, ICC analysis, and systematic process controls to make subjective UX scores more trustworthy and actionable across product lines.
Improving QMD Credibility through Rater Reliability
Following the core value of "users first," 58 Tongcheng’s design team built an evaluation mechanism (QMD) that uses basic experience metrics to assess key product scenarios. After identifying two pain points—difficulty implementing improvements and low confidence in evaluation conclusions—the team launched QMD 3.0.
The upgrade focuses on three pillars: indicator model, evaluation mechanism, and organizational practice, with a special emphasis on enhancing the credibility of subjective assessments by applying reliability testing.
What Is Reliability?
Reliability measures the consistency of measurement results. For example, a scale that reads the same weight every time you put the same 10‑kg watermelon on it is reliable; whether that reading is actually 10 kg is a separate question of validity.
What Is Rater Reliability?
Rater reliability evaluates the agreement among multiple experts rating the same set of test results. Because QMD relies on expert judgments, it is a subjective measurement and requires rater reliability analysis.
The chosen statistical method is the Intraclass Correlation Coefficient (ICC), computed from a two‑way analysis of variance. Using SPSS (Analyze → Scale → Reliability Analysis), an example with six experts yielded ICC = 0.9704 and p < 0.05; since an ICC above 0.75 is conventionally taken to indicate good reliability, the expert ratings were judged highly consistent.
How to Analyze Rater Reliability
After selecting the measurement method, the team runs the analysis in SPSS and interprets the ICC and p‑value to confirm the trustworthiness of expert scores.
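As a cross-check outside SPSS, the ICC can be computed directly from the two-way ANOVA mean squares. The sketch below implements ICC(2,k) (two-way random effects, absolute agreement, average measures of k raters, per Shrout & Fleiss), which is one of the models SPSS offers; the 5-scenario × 6-expert rating matrix in the usage example is hypothetical, not the team's actual data.

```python
import numpy as np

def icc2k(scores: np.ndarray) -> float:
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    scores: (n_subjects, k_raters) matrix, one row per rated scenario,
    one column per expert.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-scenario means
    col_means = scores.mean(axis=0)   # per-rater means

    # Partition the total sum of squares into subjects, raters, and residual.
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_total = ((scores - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)               # between-subjects mean square
    msc = ss_cols / (k - 1)               # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square

    # Shrout & Fleiss ICC(2,k), average measures, absolute agreement.
    return (msr - mse) / (msr + (msc - mse) / n)

# Hypothetical ratings: 5 scenarios (rows) scored 1-5 by 6 experts (columns).
ratings = np.array([
    [4, 4, 5, 4, 4, 5],
    [2, 2, 2, 3, 2, 2],
    [5, 5, 5, 5, 4, 5],
    [3, 3, 3, 3, 3, 3],
    [1, 1, 2, 1, 1, 1],
], dtype=float)

print(f"ICC(2,k) = {icc2k(ratings):.4f}")  # well above the 0.75 threshold
```

A dedicated library such as pingouin's `intraclass_corr` reports all six ICC variants with confidence intervals and the F-test p-value; the manual version above is useful mainly for seeing how the coefficient falls out of the ANOVA table.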
Factors Affecting Rater Reliability
Three main factors influence reliability:
The evaluation instrument itself (clear, unambiguous scoring criteria)
Rater selection (screening qualified experts, aligning their understanding of the criteria, and ensuring they are in a stable condition when scoring)
A standardized implementation process
Making Reliability Perceptible
Beyond process control, users must perceive the results as reliable. This is achieved through four steps:
1. Clarify business expectations: co‑create metrics with stakeholders.
2. Full‑team participation: involve designers and product managers in evaluation.
3. Result review and error prevention: provide a feedback channel for business owners to revisit outcomes.
4. Validate implementation effects: track how evaluation conclusions improve business goals.
In summary, the score itself matters less than how it is used to manage the process. By combining rater reliability analysis, workflow management, and mechanism upgrades, the QMD evaluation becomes both statistically sound and perceptibly trustworthy.
58UXD
58.com User Experience Design Center