deep learning | BestHub

DaTaobao Tech

Jun 19, 2024 · Product Management

Multi‑Interest Vector Recall and PDN Models for Large‑Asset Recommendation in Alibaba Auction

Alibaba Auction improves large‑asset recommendation by deploying the multi‑interest vector recall model MIND and the two‑hop PDN model. For unique, high‑value items, the team adapts features and time weighting, uses hard‑negative sampling, and combines rule‑based with vector similarity; the changes boost conversion metrics but also raise filter‑bubble concerns.

PDN · deep learning · e-commerce

0 likes · 13 min read


DaTaobao Tech

Apr 26, 2023 · Artificial Intelligence

MD-VQA: Multi-Dimensional No-Reference Video Quality Assessment for CVPR NTIRE 2023

Alibaba’s Taobao VQA team won the CVPR NTIRE 2023 Video Enhancement Challenge with MD‑VQA, a multi‑dimensional no‑reference video quality model that combines a Swin‑Transformer‑V2 spatial backbone, a pre‑trained SlowFast motion encoder, and a convolutional fusion module. The model is pre‑trained on LSVQ, fine‑tuned on NTIRE data with spatio‑temporal augmentation, achieves state‑of‑the‑art SROCC and PLCC scores, and now powers quality monitoring on Alibaba’s live‑streaming and short‑video services.

No-Reference · Swin Transformer · computer vision

0 likes · 15 min read


Alimama Tech

Feb 1, 2023 · Artificial Intelligence

CapOnImage: Context-driven Dense Captioning on Images

The paper presents CapOnImage, a novel captioning‑on‑image task that generates location‑specific decorative text for product images. It introduces the 2.1‑million‑image CapOnImage2M dataset and proposes a mixed‑modality transformer with position‑aware pre‑training and progressive training, which achieves superior accuracy and diversity and is already deployed in Alibaba’s advertising platforms with measurable business impact.

Multimodal · advertising · context-aware

0 likes · 9 min read


Alimama Tech

Dec 21, 2022 · Artificial Intelligence

Adaptive Parameter Generation Network for Click-Through Rate Prediction

Adaptive Parameter Generation Network (APG) dynamically creates sample‑specific model parameters for click‑through‑rate prediction using low‑rank factorization, parameter sharing, and over‑parameterization, achieving up to 0.2% AUC improvement, 3% CTR lift, and up to 96.6% storage reduction with faster inference.

CTR prediction · Model Efficiency · adaptive parameter generation

0 likes · 14 min read


Alimama Tech

Dec 14, 2022 · Artificial Intelligence

Contrastive Image Representation Learning with Debiasing for CTR Prediction

The article proposes a three-stage contrastive learning framework—pre‑training, fine‑tuning, and debiasing—to generate unbiased, fine‑grained image embeddings for mobile Taobao CTR prediction, achieving higher accuracy, fairness, and a 4‑5% CTR lift in large‑scale offline and online evaluations.

Bias Mitigation · CTR prediction · contrastive learning

0 likes · 14 min read


Alimama Tech

Dec 7, 2022 · Artificial Intelligence

Adaptive Domain Interest Network for Multi-domain Recommendation

The Adaptive Domain Interest Network (ADIN) pairs a shared backbone with scenario‑specific subnetworks, domain‑specific batch normalization, and SE‑Block attention to capture both commonalities and divergences across recommendation scenarios. Combined with self‑supervised training, it consistently outperforms baselines, delivered a 1.8% revenue lift in Alibaba’s display‑ad platform, and now runs in production.

advertising · deep learning · domain adaptation

0 likes · 12 min read


Alimama Tech

Nov 16, 2022 · Artificial Intelligence

STARDOM: Semantic-Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction

STARDOM is an end‑to‑end deep hierarchical forecasting model that jointly learns hierarchical constraints, query semantics via pretrained BERT, and a calibration matrix within an encoder‑decoder architecture, using a distilled reconciliation loss and hierarchical sampling to accurately predict large‑scale search traffic and outperform state‑of‑the‑art baselines.

Search Advertising · deep learning · hierarchical modeling

0 likes · 22 min read


Alimama Tech

Oct 19, 2022 · Artificial Intelligence

Understanding the One-Epoch Overfitting Phenomenon in Deep Click-Through Rate Models

The study reveals that industrial deep click‑through‑rate models often overfit dramatically after the first training epoch—a “one‑epoch phenomenon” caused by the embedding‑plus‑MLP architecture, fast optimizers, and highly sparse features, with performance dropping sharply unless sparsity is reduced or training is limited to a single pass.

MLP · ctr · deep learning

0 likes · 15 min read


Tencent Cloud Developer

Apr 27, 2022 · Artificial Intelligence

Alignment-Uniformity Representation Learning for Zero-shot Video Classification (AURL)

The AURL framework, presented by Pu Shi, introduces alignment‑uniformity aware representation learning for zero‑shot video classification. It achieves top‑1 accuracy gains of up to 28% on UCF101 and HMDB51 and has already boosted business metrics in Tencent’s advertising, search, and video‑channel recommendation systems.

alignment · computer vision · deep learning

0 likes · 19 min read


Alimama Tech

Nov 17, 2021 · Artificial Intelligence

Adaptive Masked Twins-based Layer for Efficient Embedding Dimension Selection in Deep Recommendation Models

AMTL inserts an adaptively‑learned twin‑network mask after each representation layer to prune unnecessary embedding dimensions per feature value, automatically assigning larger sizes to high‑frequency features. It achieves higher CTR accuracy, about 60% storage reduction, and seamless warm‑starting across recommendation models.

CTR prediction · Recommendation systems · adaptive masking

0 likes · 15 min read


iQIYI Technical Product Team

Nov 5, 2021 · Artificial Intelligence

iQIYI’s QAV1 Encoder Achieves High Compression and Bandwidth Savings Using AV1 and Deep Learning

iQIYI’s QAV1 encoder, which combines the next‑generation AV1 codec with deep‑learning techniques, delivers 20‑42% bandwidth savings and up to 36% higher compression efficiency than x265 while maintaining ultrafast 60 fps encoding speeds, enabling high‑quality 4K/8K streaming and live broadcast across devices.

AV1 · QAV1 · Streaming

0 likes · 6 min read


Alimama Tech

Sep 8, 2021 · Artificial Intelligence

Deep Uncertainty-Aware Learning (DUAL) for Click‑Through Rate Prediction and Exploration Strategies

The paper presents Deep Uncertainty‑Aware Learning (DUAL), a scalable Bayesian deep‑learning framework that combines a neural feature extractor with a Gaussian‑process prior to model CTR prediction uncertainty, mitigates feedback‑loop bias, and enables confidence‑driven exploration (UCB and Thompson sampling) that improves long‑term utility while preserving accuracy.

CTR prediction · Contextual Bandits · Gaussian Process

0 likes · 15 min read


Alimama Tech

Sep 8, 2021 · Artificial Intelligence

Engineering Optimizations for Large‑Scale Advertising Recall Models: Full‑Cache Scoring and Index Flattening

Alimama’s advertising platform modernized its Tree‑based Deep Model along two lines: a dual‑tower full‑library DNN with aggressive pre‑filtering and custom GPU TopK kernels, and a flattened‑tree model that retains beam search with multi‑head attention. Memory‑aware tricks such as attention swapping, softmax approximation, tiled‑matmul splitting, TensorCore batching, INT8 quantization, and cache‑resident ad vectors enable multi‑fold latency reductions with minimal recall loss.

GPU Acceleration · Recommendation systems · TopK

0 likes · 15 min read


Didi Tech

Nov 3, 2020 · Artificial Intelligence

Advances in Single‑Channel Speech Separation and Target Speaker Extraction with Iterative Refined Adaptation

The article surveys recent advances in single‑channel speech separation and target‑speaker extraction, explains the encoder‑separator‑decoder framework, compares frequency‑ and time‑domain methods, and highlights models such as SpEx+ and DPRNN‑Spe. It also introduces Iterative Refined Adaptation, which iteratively improves speaker embeddings to boost SI‑SDR performance and enables effective speaker suppression for applications like in‑vehicle voice interaction.

AI · Audio Signal Processing · deep learning

0 likes · 13 min read


Didi Tech

Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · deep learning · edge AI

0 likes · 16 min read


Didi Tech

Oct 16, 2020 · Artificial Intelligence

Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system is built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier that achieves over 99.5% accuracy. It has been deployed in vehicles and open‑sourced, and is complemented by top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

AI · computer vision · deep learning

0 likes · 22 min read


iQIYI Technical Product Team

Jul 24, 2020 · Artificial Intelligence

Fine‑grained Character Sentiment Analysis in Scripts: Models, Challenges, and Future Directions

The article surveys fine‑grained character sentiment analysis for script evaluation, detailing traditional, target‑dependent, and aspect‑level methods and describing iQIYI’s BERT‑TD‑LSTM and CNN architectures. It addresses challenges such as character name recognition and long‑range context, walks through a Parasite case study, and outlines future improvements.

BERT · LSTM · NLP

0 likes · 19 min read


iQIYI Technical Product Team

Jul 10, 2020 · Artificial Intelligence

Video Highlight Analysis Technology Framework

iQIYI’s video highlight analysis framework combines a large supervised dataset, deep label distribution learning, multi‑task training with a canonical‑correlated autoencoder, and a weakly supervised ranking model enhanced by confidence weighting and graph convolution, then fuses these signals to improve highlight detection accuracy.

Weak Supervision · deep learning · graph convolutional networks

0 likes · 17 min read


iQIYI Technical Product Team

Dec 6, 2019 · Artificial Intelligence

Multimodal Person Identification: Techniques, Datasets, and Applications by iQIYI

In a CSDN Tech Open Class Plus talk, iQIYI’s Dr. Lu Xiangju detailed multimodal person‑identification techniques that combine face, voice, pose and clothing cues, introduced the massive iQIYI‑VID dataset for real and cartoon subjects, described semi‑supervised training with Unknown Identity Rejection loss, and explained how these advances power iQIYI video services.

AI · Multimodal Recognition · dataset

0 likes · 14 min read


iQIYI Technical Product Team

Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

deep learning · feature fusion · iQIYI challenge

0 likes · 7 min read
