Tag

model assessment


Xiaohongshu Tech REDtech
Mar 13, 2025 · Artificial Intelligence

UniCBE: A Unified Multi‑Objective Optimization Framework for Contrastive Based Evaluation

UniCBE introduces a unified multi‑objective optimization framework for contrastive‑based evaluation that mitigates sampling bias, unbalanced uncertainty reduction, and inefficient resource allocation. It combines three decoupled probability matrices via a Hadamard product and samples greedily from the result, achieving Pearson correlations above 0.995 with only 83% of the annotation budget and cutting evaluation costs by more than 50% across diverse LLM evaluators.

Contrastive Evaluation · Efficiency · Large Language Models
10 min read
DataFunSummit
Dec 23, 2024 · Artificial Intelligence

Huolala's Large Model Evaluation Framework (LaLaEval) and Application Practices

This article presents Huolala's comprehensive LaLaEval framework for evaluating large language models, detailing the challenges of model deployment, the five‑step assessment process, two real‑world case studies in freight and driver invitation, and future directions toward more automated, product‑driven evaluation.

AI · Logistics · RAG
24 min read
DataFunSummit
Aug 31, 2024 · Artificial Intelligence

Evaluating Large Language Models: Defining Capabilities, Data Integration, Scientific Evaluation, and Statistical Analysis

The article outlines how to effectively deploy large language models by defining their capabilities, integrating high‑quality data, establishing scientific evaluation and statistical analysis methods, and illustrates these practices with a logistics industry case study.

AI · Large Language Models · Data Integration
6 min read
Efficient Ops
Dec 27, 2021 · Artificial Intelligence

Inside China Merchants Bank’s AI Model Risk Governance: Interview and Assessment Insights

China Merchants Bank shares how its intelligent customer‑service semantic‑matching and multimodal service analysis models passed the AI model risk‑governance maturity assessment, detailing the governance measures, the challenges faced, and future plans. The article also explains the CAICT framework that underpins the assessment.

AI ethics · AI risk governance · China
13 min read