Tag

deep learning


DaTaobao Tech
Jun 19, 2024 · Product Management

Multi‑Interest Vector Recall and PDN Models for Large‑Asset Recommendation in Alibaba Auction

Alibaba Auction improves large‑asset recommendation by deploying the multi‑interest vector recall model MIND and the two‑hop PDN model. Features and time weighting are adapted for unique, high‑value items, training uses hard‑negative sampling, and rule‑based similarity is combined with vector similarity. The changes boost conversion metrics but also raise filter‑bubble concerns.

PDN · deep learning · e-commerce
13 min read
DaTaobao Tech
Apr 26, 2023 · Artificial Intelligence

MD-VQA: Multi-Dimensional No-Reference Video Quality Assessment for CVPR NTIRE 2023

Alibaba’s Taobao VQA team won the CVPR NTIRE 2023 Video Enhancement Challenge with MD‑VQA, a multi‑dimensional no‑reference video quality model that combines a Swin‑Transformer‑V2 spatial backbone, a pre‑trained SlowFast motion encoder, and a convolutional fusion module. The model is pre‑trained on LSVQ, fine‑tuned on NTIRE data with spatio‑temporal augmentation, achieves state‑of‑the‑art SROCC and PLCC scores, and now powers quality monitoring on Alibaba’s live‑streaming and short‑video services.

No-Reference · Swin Transformer · computer vision
15 min read
Alimama Tech
Feb 1, 2023 · Artificial Intelligence

CapOnImage: Context-driven Dense Captioning on Images

The paper presents CapOnImage, a novel image‑on‑image captioning task that generates location‑specific decorative text for product images, introduces the 2.1‑million‑image CapOnImage2M dataset, and proposes a mixed‑modality transformer with position‑aware pre‑training and progressive training, achieving superior accuracy and diversity and already deployed in Alibaba’s advertising platforms for measurable business impact.

Multimodal · advertising · context-aware
9 min read
Alimama Tech
Dec 21, 2022 · Artificial Intelligence

Adaptive Parameter Generation Network for Click-Through Rate Prediction

Adaptive Parameter Generation Network (APG) dynamically creates sample‑specific model parameters for click‑through‑rate prediction using low‑rank factorization, parameter sharing, and over‑parameterization, achieving up to 0.2% AUC improvement, 3% CTR lift, and up to 96.6% storage reduction with faster inference.

CTR prediction · Model Efficiency · adaptive parameter generation
14 min read
Alimama Tech
Dec 14, 2022 · Artificial Intelligence

Contrastive Image Representation Learning with Debiasing for CTR Prediction

The article proposes a three-stage contrastive learning framework—pre‑training, fine‑tuning, and debiasing—to generate unbiased, fine‑grained image embeddings for mobile Taobao CTR prediction, achieving higher accuracy, fairness, and a 4‑5% CTR lift in large‑scale offline and online evaluations.

Bias Mitigation · CTR prediction · contrastive learning
14 min read
Alimama Tech
Dec 7, 2022 · Artificial Intelligence

Adaptive Domain Interest Network for Multi-domain Recommendation

The Adaptive Domain Interest Network (ADIN) uses a shared backbone with scenario‑specific subnetworks, domain‑specific batch normalization, and SE‑Block attention to capture both commonalities and differences across recommendation scenarios. Combined with self‑supervised training, it consistently outperforms baselines, delivers a 1.8% revenue lift in Alibaba’s display‑ad platform, and now runs in production.

advertising · deep learning · domain adaptation
12 min read
Alimama Tech
Nov 16, 2022 · Artificial Intelligence

STARDOM: Semantic-Aware Deep Hierarchical Forecasting Model for Search Traffic Prediction

STARDOM is an end‑to‑end deep hierarchical forecasting model that jointly learns hierarchical constraints, query semantics via pretrained BERT, and a calibration matrix within an encoder‑decoder architecture, using a distilled reconciliation loss and hierarchical sampling to accurately predict large‑scale search traffic and outperform state‑of‑the‑art baselines.

Search Advertising · deep learning · hierarchical modeling
22 min read
Alimama Tech
Oct 19, 2022 · Artificial Intelligence

Understanding the One-Epoch Overfitting Phenomenon in Deep Click-Through Rate Models

The study reveals that industrial deep click‑through‑rate models often overfit dramatically after the first training epoch—a “one‑epoch phenomenon” caused by the embedding‑plus‑MLP architecture, fast optimizers, and highly sparse features, with performance dropping sharply unless sparsity is reduced or training is limited to a single pass.

MLP · ctr · deep learning
15 min read
Tencent Cloud Developer
Apr 27, 2022 · Artificial Intelligence

Alignment-Uniformity Representation Learning for Zero-shot Video Classification (AURL)

The AURL framework, presented by Pu Shi, introduces alignment‑uniformity‑aware representation learning for zero‑shot video classification, achieving top‑1 accuracy gains of up to 28% on UCF101 and HMDB51, and has already improved business metrics in Tencent’s advertising, search, and video‑channel recommendation systems.

alignment · computer vision · deep learning
19 min read
Alimama Tech
Nov 17, 2021 · Artificial Intelligence

Adaptive Masked Twins-based Layer for Efficient Embedding Dimension Selection in Deep Recommendation Models

AMTL inserts an adaptively‑learned twin‑network mask after each representation layer to prune unnecessary embedding dimensions per feature value, automatically assigning larger sizes to high‑frequency features, achieving higher CTR accuracy, about 60% storage reduction, and seamless hot‑starting across recommendation models.

CTR prediction · Recommendation systems · adaptive masking
15 min read
iQIYI Technical Product Team
Nov 5, 2021 · Artificial Intelligence

iQIYI’s QAV1 Encoder Achieves High Compression and Bandwidth Savings Using AV1 and Deep Learning

iQIYI’s QAV1 encoder, which combines the next‑generation AV1 codec with deep‑learning techniques, delivers 20‑42% bandwidth savings and up to 36% higher compression efficiency than x265 while maintaining ultrafast 60 fps encoding speeds, enabling high‑quality 4K/8K streaming and live broadcast across devices.

AV1 · QAV1 · Streaming
6 min read
Alimama Tech
Sep 8, 2021 · Artificial Intelligence

Deep Uncertainty-Aware Learning (DUAL) for Click‑Through Rate Prediction and Exploration Strategies

The paper presents Deep Uncertainty‑Aware Learning (DUAL), a scalable Bayesian deep‑learning framework that combines a neural feature extractor with a Gaussian‑process prior to model CTR prediction uncertainty, mitigates feedback‑loop bias, and enables confidence‑driven exploration (UCB and Thompson sampling) that improves long‑term utility while preserving accuracy.

CTR prediction · Contextual Bandits · Gaussian Process
15 min read
Alimama Tech
Sep 8, 2021 · Artificial Intelligence

Engineering Optimizations for Large‑Scale Advertising Recall Models: Full‑Cache Scoring and Index Flattening

Alimama’s advertising platform modernized its Tree‑based Deep Model along two lines: a dual‑tower, full‑library DNN with aggressive pre‑filtering and custom GPU TopK kernels, and a flattened‑tree model that retains beam search with multi‑head attention. Memory‑aware optimizations, including attention swapping, softmax approximation, tiled‑matmul splitting, TensorCore batching, INT8 quantization, and cache‑resident ad vectors, enable multi‑fold latency reductions with minimal recall loss.

GPU Acceleration · Recommendation systems · TopK
15 min read
Didi Tech
Nov 3, 2020 · Artificial Intelligence

Advances in Single‑Channel Speech Separation and Target Speaker Extraction with Iterative Refined Adaptation

The article surveys recent advances in single‑channel speech separation and target‑speaker extraction. It explains the encoder‑separator‑decoder framework, compares frequency‑domain and time‑domain methods, and highlights models such as SpEx+ and DPRNN‑Spe. It then introduces Iterative Refined Adaptation, which iteratively refines speaker embeddings to improve SI‑SDR and enables effective speaker suppression for applications such as in‑vehicle voice interaction.

AI · Audio Signal Processing · deep learning
13 min read
Didi Tech
Oct 21, 2020 · Artificial Intelligence

Deep Model Compression Techniques for Intelligent Automotive Cockpits

The article reviews deep‑model compression methods—ADMM‑based structured pruning, low‑bit quantization, and teacher‑student knowledge distillation—and their automated AutoCompress workflow, demonstrating how these techniques shrink neural networks enough to run real‑time driver‑monitoring and other intelligent cockpit functions on resource‑limited automotive hardware while preserving accuracy.

ADMM · deep learning · edge AI
16 min read
Didi Tech
Oct 16, 2020 · Artificial Intelligence

Mask Detection System and Visual AI Competition Achievements

Didi’s COVID‑19 mask‑detection system is built on a DFS‑based face detector and an attention‑enhanced ResNet‑50 mask classifier that achieves over 99.5% accuracy. It has been deployed in vehicles and open‑sourced. The team has also achieved top‑ranked results in international visual AI contests, including first place in driver‑gaze prediction and podium finishes in emotion recognition and model‑compression challenges.

AI · computer vision · deep learning
22 min read
iQIYI Technical Product Team
Jul 24, 2020 · Artificial Intelligence

Fine‑grained Character Sentiment Analysis in Scripts: Models, Challenges, and Future Directions

The article surveys fine‑grained character sentiment analysis for script evaluation. It covers traditional, target‑dependent, and aspect‑level methods, describes iQIYI’s BERT‑TD‑LSTM and CNN architectures, discusses challenges such as character‑name recognition and long‑range context, and closes with a case study on Parasite and an outline of future improvements.

BERT · LSTM · NLP
19 min read
iQIYI Technical Product Team
Jul 10, 2020 · Artificial Intelligence

Video Highlight Analysis Technology Framework

iQIYI’s video highlight analysis framework combines a large supervised dataset, deep label distribution learning, multi‑task training with a canonical‑correlated autoencoder, and a weakly supervised ranking model enhanced by confidence weighting and graph convolution, then fuses these signals to improve highlight detection accuracy.

Weak Supervision · deep learning · graph convolutional networks
17 min read
iQIYI Technical Product Team
Dec 6, 2019 · Artificial Intelligence

Multimodal Person Identification: Techniques, Datasets, and Applications by iQIYI

In a CSDN Tech Open Class Plus talk, iQIYI’s Dr. Lu Xiangju detailed multimodal person‑identification techniques that combine face, voice, pose and clothing cues, introduced the massive iQIYI‑VID dataset for real and cartoon subjects, described semi‑supervised training with Unknown Identity Rejection loss, and explained how these advances power iQIYI video services.

AI · Multimodal Recognition · dataset
14 min read
iQIYI Technical Product Team
Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

deep learning · feature fusion · iQIYI challenge
7 min read