Tencent Technical Engineering
Apr 22, 2025 · Artificial Intelligence
Conan-Embedding-V2: A 1.4B LLM-Based Multilingual Embedding Model Achieving SOTA on MTEB
Conan-Embedding-V2 is a newly trained 1.4B-parameter LLM with a custom tokenizer and a 32k-token context window. Using SoftMask, cross-lingual retrieval data, and dynamic hard-negative mining, it delivers state-of-the-art multilingual embeddings that surpass larger models on both the English and Chinese MTEB benchmarks while remaining compact and fast.
Embedding · Large Language Model · MTEB