Data Party THU
Author

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.

368 articles · 0 likes · 242 views · 0 comments
Recent Articles

Data Party THU
May 24, 2026 · Artificial Intelligence

How Graphify Builds Codebase Knowledge Graphs and Replaces Vector Search with Graph Traversal

Graphify is a Python tool and Claude Code skill that creates a persistent, queryable knowledge graph of code, documentation, and media, cutting token usage by up to 71.5× compared with raw file reads. It does so through a three‑pass pipeline that combines deterministic AST extraction, optional local audio transcription, and AI‑driven semantic extraction.

Claude Code · LLM · Python
0 likes · 13 min read
Data Party THU
May 24, 2026 · Artificial Intelligence

ICLR 2026 by the Numbers: Tsinghua Leads Overall, US Dominates Oral Papers

The ICLR 2026 analysis shows Chinese institutions, led by Tsinghua with 331 papers, dominate total submissions, while U.S. institutions retain the lead in high‑impact Oral papers, highlighting a shift in volume but not in top‑tier research influence.

AI research trends · ICLR 2026 · Oral papers
0 likes · 7 min read
Data Party THU
May 23, 2026 · Artificial Intelligence

ProteinOPD: Tsinghua’s Efficient Multi‑Objective Preference Alignment Framework for Protein Design

ProteinOPD introduces a multi‑teacher, on‑policy preference‑distillation framework that aligns protein language models with multiple design objectives—foldability, solubility and thermostability—while preserving generation quality, achieving up to 54% stability gains and an eight‑fold training speedup.

Language Models · ProteinOPD · deep learning
0 likes · 9 min read
Data Party THU
May 22, 2026 · Artificial Intelligence

First Survey of Agent Harnesses: What Powers Agents Beyond the Model?

The article surveys recent research on Agent Harness engineering, showing that real‑world agent instability stems from system‑level factors beyond model capability, introduces the seven‑layer ETCLOVG architecture, presents benchmark gains from harness tweaks, maps open‑source projects to the framework, and outlines five key open research directions.

AI · Agent Harness · ETCLOVG
0 likes · 12 min read
Data Party THU
May 21, 2026 · Artificial Intelligence

ICML 2026: MedScope Introduces a New Paradigm for Long Medical Video Reasoning—From Watching to Verifying

MedScope proposes a "Think with Videos" paradigm that lets AI models actively locate and verify evidence in long clinical videos, using coarse‑to‑fine tool calling, evidence‑centric training data (ClinVideoSuite) and a grounding‑aware reinforcement learning objective, achieving superior performance on multiple video‑understanding benchmarks.

Evidence-based QA · Long Video Reasoning · Medical Video AI
0 likes · 10 min read
Data Party THU
May 20, 2026 · Industry Insights

What Invisible Changes Has AI Brought to Work, Education, and Organizations?

The article examines how AI is silently reshaping everyday life by enabling one‑person companies, redefining job roles, disrupting traditional education, and driving a shift toward networked, AI‑augmented organizations, supported by concrete industry examples and recent policy moves.

AI-driven startups · Artificial Intelligence · Organizational Change
0 likes · 12 min read
Data Party THU
May 20, 2026 · Artificial Intelligence

How Introspection Adapters Enable LLMs to Self‑Report Hidden Behaviors

Anthropic's new paper introduces lightweight LoRA‑based introspection adapters that let large language models translate their internal activations into natural‑language reports of learned behaviors, achieving a 59% success rate on the AuditBench benchmark and exposing previously undetectable encrypted fine‑tuning attacks.

AI safety · AuditBench · Encrypted Fine‑Tuning
0 likes · 20 min read
Data Party THU
May 19, 2026 · Artificial Intelligence

Model Performance Lagging? Master Feature Engineering with a Complete Step‑by‑Step Guide

This article walks through the entire feature‑engineering pipeline—data cleaning, missing‑value imputation, encoding, outlier handling, scaling, feature construction, and selection—using Pandas and Scikit‑learn, and shows how to wrap the steps into a reproducible Scikit‑learn Pipeline.

Machine Learning · data preprocessing · feature engineering
0 likes · 9 min read
Data Party THU
May 19, 2026 · Artificial Intelligence

Anthropic Code w/ Claude Conference: How AI Cut a 10‑Week Project to 4 Days

Anthropic’s Code w/ Claude developer conference revealed three major upgrades: a stronger foundation model, the Claude Platform’s multi‑agent orchestration, and the Claude Code desktop client. It showcased real‑world cases in which 50k lines of Scala were rewritten in four days and a 20‑day approval process was halved, while API usage jumped 17‑fold and weekly developer time on Claude rose to 20 hours.

AI productivity · Anthropic · Claude
0 likes · 35 min read