Tagged articles
6 articles
Fun with Large Models
May 22, 2026 · Artificial Intelligence

How to Rigorously Evaluate Large Models: Methods and Key Benchmark Datasets

This guide explains why systematic evaluation is essential for large models, outlines three core evaluation approaches (human assessment, benchmark‑dataset testing, and automated judge models), introduces the most widely used benchmark suites, and shows how to use the open‑source EvalScope framework and prompt‑design techniques to conduct reliable model assessments.

EvalScope · automated judge · benchmark datasets
0 likes · 17 min read
Fun with Large Models
Jun 5, 2025 · Artificial Intelligence

EvalScope: The Ultimate Large‑Model Evaluation Framework You Control

This article introduces EvalScope, an open‑source framework for evaluating large language models, detailing its architecture, built‑in benchmarks, installation steps, and step‑by‑step guides for both performance stress testing and dataset‑based capability assessment, enabling users to independently verify model quality without relying on official documentation.

EvalScope · benchmark datasets · large language models
0 likes · 12 min read
21CTO
May 21, 2024 · Artificial Intelligence

How Google’s ScreenAI Could Redefine UI Understanding and UX Design

Google’s new ScreenAI visual‑language model, built on the PaLI architecture, can interpret user interfaces and infographics, answer UI‑related questions, generate summaries and navigate screens, and sets new benchmarks that may reshape future user‑experience research and applications.

Google AI · ScreenAI · UI Understanding
0 likes · 9 min read
NewBeeNLP
Mar 13, 2024 · Artificial Intelligence

Can LLMs Clean Noisy Graphs? Introducing GraphEdit for Robust Structure Learning

GraphEdit leverages large language models and a lightweight edge predictor to remove noisy connections and uncover hidden node dependencies, achieving state‑of‑the‑art performance on benchmark graph datasets such as Cora, Citeseer, and PubMed, while demonstrating strong robustness to noise and limited supervision.

Edge Prediction · Graph Structure Learning · benchmark datasets
0 likes · 17 min read
DataFunTalk
Mar 23, 2021 · Artificial Intelligence

Explainability in Graph Neural Networks: A Taxonomic Survey

This article surveys recent advances in graph neural network explainability, systematically categorizing instance‑level and model‑level methods, reviewing datasets and evaluation metrics, proposing new benchmark graph datasets for interpretable GNN research, and highlighting future research directions.

GNN · Interpretability · Machine Learning
0 likes · 40 min read