Tag

cross-lingual retrieval

1 views collected around this technical thread.

Tencent Technical Engineering
Tencent Technical Engineering
Apr 22, 2025 · Artificial Intelligence

Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB

Conan‑Embedding‑V2, a newly trained 1.4 B‑parameter LLM with a custom tokenizer, 32 k token context, SoftMask, cross‑lingual retrieval data and dynamic hard‑negative mining, delivers state‑of‑the‑art multilingual embeddings that surpass larger models on both English and Chinese MTEB benchmarks while remaining compact and fast.

EmbeddingLarge Language ModelMTEB
0 likes · 14 min read
Conan-Embedding-V2: A 1.4B LLM‑Based Multilingual Embedding Model Achieving SOTA on MTEB