Ant Group’s Selected Papers at KDD2024: Abstracts and Highlights
The article presents a curated collection of Ant Group's research papers accepted at KDD2024, summarizing each paper's title, type, link, source, relevant fields, and abstract, covering topics such as graph mining, large language models, fraud detection, recommendation systems, and multimodal medical AI.
From August 25‑29, 2024, the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2024) was held in Barcelona, Spain. The conference received 2,046 submissions and accepted 409 papers (a 20% acceptance rate). Ant Group contributed 19 papers, including 7 research papers spanning graph representation learning, data mining, graph neural networks, large language models, natural language processing, and retrieval‑augmented methods.
1. Efficient and Effective Anchored Densest Subgraph Search: A Convex‑programming based Approach Category: Research Paper Link: PDF Source: Ant Group research interns Fields: Data mining, graph theory, community detection, graph clustering Abstract: The paper proposes a new NR‑subgraph density metric and formulates the anchored densest subgraph problem as a linear program, solving it with two algorithms (FDP and FDPE) that achieve 3.6‑14.1× speedup over SOTA while improving subgraph quality.
2. Optimizing Long‑tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement Category: Research Paper Link: OpenReview Source: Independent work Fields: Graph neural networks, link prediction, long‑tail problem Abstract: The work defines the long‑tail issue for link prediction, proposes a subgraph structure‑enhancement framework that adds high‑confidence edges to tail samples, improving both tail and overall performance beyond SOTA.
3. On Finding Bi‑objective Pareto‑optimal Fraud Prevention Rule Sets for Fintech Applications Category: ADS Paper Link: arXiv Source: Independent work Fields: Multi‑objective optimization, rule learning, subset selection Abstract: Introduces SpectralRules for generating a compact, diverse rule pool and a Pareto‑front‑based selection stage (PORS) to obtain non‑dominated rule subsets, demonstrating superior performance on public and proprietary datasets.
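The core of the selection stage is keeping only rule subsets that are not dominated on the two objectives. The sketch below is a minimal illustration of bi-objective Pareto-front filtering, assuming candidate subsets scored by precision and recall; the rule names and scores are hypothetical placeholders, not the paper's SpectralRules or PORS code.

```python
def pareto_front(candidates):
    """Return the candidates not dominated on (precision, recall).

    A candidate is dominated if another candidate is at least as good
    on both objectives and strictly better on at least one.
    """
    front = []
    for c in candidates:
        dominated = any(
            o["precision"] >= c["precision"] and o["recall"] >= c["recall"]
            and (o["precision"] > c["precision"] or o["recall"] > c["recall"])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front

# Hypothetical rule subsets with their evaluated objectives.
subsets = [
    {"rules": ["r1"],       "precision": 0.95, "recall": 0.40},
    {"rules": ["r1", "r2"], "precision": 0.90, "recall": 0.55},
    {"rules": ["r2"],       "precision": 0.85, "recall": 0.50},  # dominated
    {"rules": ["r1", "r3"], "precision": 0.80, "recall": 0.70},
]
front = pareto_front(subsets)  # three non-dominated subsets remain
```

The non-dominated set lets a risk team trade precision against recall explicitly rather than collapsing both into a single weighted score.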
4. FoRAG: Factuality‑optimized Retrieval Augmented Generation for Web‑enhanced Long‑form Question Answering Category: Research Paper Link: arXiv Source: Independent work Fields: Large language models, NLP, retrieval‑augmented generation Abstract: Proposes a factuality‑optimized RAG framework (FoRAG) with an outline‑enhanced generator and a dual‑granularity RLHF method, achieving better coherence, helpfulness, and factuality than WebGPT‑175B while using far fewer parameters.
5. Self‑Supervised Learning for Graph Dataset Condensation Category: Research Paper Link: ACM DL Source: CCF‑Ant Research Fund Fields: Graph learning, dataset compression, GNNs Abstract: Introduces SGDC, a self‑supervised graph dataset condensation method that aligns condensed‑graph representations with a pretrained SSL model, using a graph attention kernel and adjacency‑reuse strategy to improve accuracy and efficiency.
6. Cost‑Efficient Fraud Risk Optimization with Submodularity in Insurance Claim Category: Research Paper Link: OpenReview PDF Source: CCF‑Ant Research Fund Fields: Machine learning, operations research, online decision making Abstract: Presents CEROS, a submodular‑based optimization framework combining a submodular set classification model (SSCM) and a primal‑dual algorithm (PDA‑SP) to balance verification accuracy and investigation cost, achieving 66.9% speedup and 18.8% cost reduction in production.
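Submodularity here means diminishing returns: each additional claim investigated adds less new evidence than the last, which is what makes efficient near-optimal selection possible. The toy below illustrates this with a coverage objective and greedy selection; the claim and signal names are invented for illustration and are not CEROS or SSCM internals.

```python
def coverage(selected, signals_by_claim):
    """Number of distinct suspicious signals covered by the selected claims.
    Set coverage is a classic monotone submodular function."""
    covered = set()
    for claim in selected:
        covered |= signals_by_claim[claim]
    return len(covered)

def greedy_select(signals_by_claim, budget):
    """Greedily pick the claim with the largest marginal gain each round.
    For monotone submodular objectives this achieves a (1 - 1/e)
    approximation of the optimal budget-constrained selection."""
    selected = []
    remaining = set(signals_by_claim)
    for _ in range(budget):
        best = max(remaining,
                   key=lambda c: coverage(selected + [c], signals_by_claim))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical claims mapped to the fraud signals they would confirm.
claims = {
    "A": {"s1", "s2"},
    "B": {"s2", "s3", "s4"},
    "C": {"s4"},
}
picked = greedy_select(claims, budget=2)  # picks B first, then A
```

The diminishing-returns structure is what lets a primal-dual style method trade off verification accuracy against investigation cost with provable guarantees.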
7. Enhancing Pre‑Ranking Performance: Tackling Intermediary Challenges in Multi‑Stage Cascading Recommendation Systems Category: ADS Paper Link: DOI Source: Independent work Fields: Recommendation systems Abstract: Proposes a framework addressing sample‑selection bias, domain adaptation, and unbiased distillation in the pre‑ranking stage, introducing new evaluation metrics and demonstrating effectiveness in industrial deployment.
8. DDCDR: A Disentangle‑based Distillation Framework for Cross‑Domain Recommendation Category: ADS Paper Link: PDF Source: Independent work Fields: Cross‑domain recommendation, knowledge distillation Abstract: Builds a cross‑domain teacher model with adversarial domain discriminator, then distills shared representations to a target‑domain student, achieving new SOTA results on public and industrial datasets.
9. RJUA‑MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning Category: ADS Paper Link: arXiv PDF Source: Independent work Fields: Multimodal large models, document understanding, clinical reasoning Abstract: Introduces a benchmark with diverse medical report layouts and expert annotations, evaluating multimodal LLMs on VQA and clinical reasoning tasks, revealing current models’ limitations and encouraging future research.
10. LASCA: A Large‑Scale Stable Customer Segmentation Approach to Credit Risk Assessment Category: ADS Paper Link: PDF Source: CCF‑Ant Research Fund Fields: Black‑box optimization, machine learning, AI Abstract: Defines Stability Regret for customer segmentation, proposes a two‑stage framework (HDC + RDO) with evolutionary search and surrogate modeling, delivering a 50% stability improvement and 25× speedup on credit data at the 800‑million scale.
11. Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs Category: ADS Paper Link: arXiv Source: Independent work Fields: Large language models Abstract: Describes a system that converts natural‑language marketing queries into structured audience expressions using LLM‑driven analogical reasoning, improving usability, interpretability, and accuracy in Alipay’s marketing platform.
12. Integrating System State into Spatio‑Temporal Graph Neural Network for Microservice Workload Prediction Category: ADS Paper Link: OpenReview Source: CCF‑Ant Research Fund Fields: Spatio‑temporal GNN, workload prediction, data mining Abstract: Proposes STAMP, a GNN that models microservice interactions, temporal dynamics, and system state, achieving 5.72% accuracy gain and 33.10% CPU savings in real‑world Alipay cloud deployments.
13. Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy Category: Research Paper Link: arXiv Source: Independent work Fields: Large language models, inference optimization Abstract: Introduces a trie‑based multi‑branch inference method that emits multiple tokens per step without accuracy loss, delivering 2.66‑6.26× speedup in production finance scenarios.
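The trick behind trie-based multi-branch decoding is to cache previously seen token continuations in a trie, retrieve a multi-token draft for the current context in one lookup, and then verify the draft against the model so accuracy is unchanged. The sketch below shows only the trie-and-draft half under that assumption, with toy string tokens and the model-verification step stubbed out; it is not the Lookahead implementation.

```python
class TrieNode:
    def __init__(self):
        self.children = {}

class DraftTrie:
    """Caches observed token sequences so that a prefix lookup can
    propose several draft tokens per decoding step."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def draft(self, prefix, max_len=4):
        """Walk the trie along `prefix`, then follow the first available
        branch to propose up to max_len draft tokens. The caller would
        verify these against the model and keep only the matching ones."""
        node = self.root
        for t in prefix:
            if t not in node.children:
                return []  # unseen context: fall back to normal decoding
            node = node.children[t]
        out = []
        while node.children and len(out) < max_len:
            t = next(iter(node.children))
            out.append(t)
            node = node.children[t]
        return out

trie = DraftTrie()
trie.insert(["the", "quick", "brown", "fox", "jumps"])
proposal = trie.draft(["the", "quick"])  # multi-token draft in one step
```

Because verification accepts exactly the tokens the model would have produced, the generation is lossless; the speedup comes from emitting several verified tokens per forward pass instead of one.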
14. Intelligent Agents with LLM‑based Process Automation Category: ADS Paper Link: arXiv Source: Independent work Fields: Intelligent assistants, agents, LLMs, process automation Abstract: Presents a virtual assistant architecture (LLMPA) that parses natural‑language commands, reasons about goals, and executes multi‑step actions on mobile apps, demonstrated at scale in Alipay.
15. Towards Automatic Evaluation for LLMs’ Clinical Capabilities: Metric, Data, and Algorithm Category: ADS Paper Link: arXiv Source: Independent work Fields: Large models Abstract: Proposes an automated evaluation paradigm (LCP, SPs, RAE) for clinical LLMs, building a multimodal benchmark in urology and showing its effectiveness for safe deployment.
16. MFTCoder: Boosting Code LLMs with Multitask Fine‑Tuning Category: ADS Paper Link: arXiv Source: Independent work Fields: Artificial intelligence, large model technology Abstract: Introduces a multitask fine‑tuning framework that jointly optimizes several code‑related tasks, achieving superior performance and training efficiency, and powering the top‑ranked CodeFuse‑DeepSeek‑33B model.
AntTech
Technology is the core driver of Ant's future.