
Boosting UX Evaluation Credibility with Rater Reliability in QMD 3.0

This article explains how 58 Tongcheng’s design team upgraded the QMD evaluation framework to QMD 3.0, using reliability testing, ICC analysis, and systematic process controls to make subjective UX scores more trustworthy and actionable across product lines.


Following the core value of "users first," 58 Tongcheng’s design team built an evaluation mechanism (QMD) that uses basic experience metrics to assess key product scenarios. After identifying two pain points—difficulty implementing improvements and low confidence in evaluation conclusions—the team launched QMD 3.0.

The upgrade focuses on three pillars: indicator model, evaluation mechanism, and organizational practice, with a special emphasis on enhancing the credibility of subjective assessments by applying reliability testing.

What Is Reliability?

Reliability measures the consistency of measurement results. For example, a scale that consistently reads 10 kg for a 10‑kg watermelon is reliable.
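The idea of "consistency of measurement results" can be made concrete with a quick numeric sketch: a reliable instrument shows little spread over repeated measurements of the same object. The readings below (`scale_a`, `scale_b`) are invented purely for illustration.

```python
import statistics

# Hypothetical repeated readings of the same 10-kg watermelon on two scales.
scale_a = [10.0, 10.0, 10.1, 9.9, 10.0]   # tight cluster -> reliable
scale_b = [9.2, 10.8, 10.3, 9.5, 11.1]    # scattered     -> unreliable

# A reliable instrument has low spread across repeated measurements.
print("scale A spread:", round(statistics.stdev(scale_a), 3))  # small
print("scale B spread:", round(statistics.stdev(scale_b), 3))  # much larger
```

Both scales average close to 10 kg, so the difference is not accuracy but consistency, which is exactly what reliability captures.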

What Is Rater Reliability?

Rater reliability evaluates the agreement among multiple experts rating the same set of test results. Because QMD relies on expert judgments, it is a subjective measurement and requires rater reliability analysis.

The chosen statistical method is the Intraclass Correlation Coefficient (ICC), computed via analysis of variance (ANOVA). Running the test in SPSS (Analyze → Scale → Reliability Analysis) on an example with six experts yielded an ICC of 0.9704, above the commonly used 0.75 threshold for high consistency, with p < 0.05, indicating that the experts' ratings agreed strongly.
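SPSS produces the ICC through its GUI, but the same statistic can be reproduced in code. The sketch below implements ICC(2,k), the two-way random effects, average-measures, absolute-agreement form that matches one common SPSS configuration, on made-up ratings; the article's actual six-expert data set is not published, so the numbers and the `icc2k` helper are illustrative only.

```python
# Hypothetical ratings: rows = evaluated scenarios, columns = experts.
ratings = [
    [8, 9, 8],
    [6, 7, 6],
    [9, 9, 10],
    [4, 5, 4],
    [7, 8, 8],
]

def icc2k(data):
    """ICC(2,k): two-way random effects, average of k raters,
    absolute agreement, via an ANOVA decomposition of variance."""
    n, k = len(data), len(data[0])            # n targets, k raters
    grand = sum(map(sum, data)) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between targets
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)

print(f"ICC = {icc2k(ratings):.4f}")
```

On these toy numbers the ICC comes out around 0.97, comfortably above the 0.75 bar the team uses; in practice a statistical package would also report the accompanying significance test.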

How to Analyze Rater Reliability

After selecting the measurement method, the team runs the analysis in SPSS and interprets the ICC and p‑value to confirm the trustworthiness of expert scores.

Factors Affecting Rater Reliability

Three main factors influence reliability:

Evaluation tool performance

Rater selection (expert screening, cognitive alignment, and a stable rater condition)

Standardized implementation process

Making Reliability Perceptible

Beyond process control, the consumers of the evaluation, the business stakeholders, must also perceive the results as reliable. This is achieved through four steps:

Clarify business expectations: co-create metrics with stakeholders.

Full-team participation: involve designers and product managers in evaluation.

Result review and error prevention: provide a feedback channel for business owners to revisit outcomes.

Validate implementation effects: track how evaluation conclusions improve business goals.

In summary, the score itself is less important than using the score to manage the process; by combining rater reliability, workflow management, and mechanism upgrades, the QMD evaluation becomes both statistically sound and perceptually trustworthy.

Tags: Reliability, SPSS, UX evaluation, ICC, QMD, subjective measurement
Written by

58UXD

58.com User Experience Design Center
