Why External Retrieval in RAG Is Redundant: Insights from NVIDIA’s INTRA Paper
The INTRA paper shows that a decoder's cross-attention can itself act as an internal retrieval mechanism, eliminating the need for a separate retriever. Working over shared pre-encoded representations, the approach achieves state-of-the-art multihop QA performance with only 164K trainable parameters.
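To make the idea of "cross-attention as retrieval" concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the function name `rank_passages_by_attention`, the single-head scaled dot-product scoring, and the choice to average attention over query positions and sum it per passage are all hypothetical simplifications. The sketch assumes passages have already been encoded once into token-level vectors (`passage_states`) that can be reused across queries.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def rank_passages_by_attention(query_states, passage_states, passage_ids, top_k=2):
    """Rank passages by the attention mass their tokens receive.

    query_states:   (q_len, d) decoder-side query vectors (assumed given).
    passage_states: (n_tokens, d) pre-encoded token vectors for all passages.
    passage_ids:    (n_tokens,) non-negative int index of the passage each token belongs to.
    Returns (indices of the top_k passages, per-passage attention mass).
    """
    d = query_states.shape[-1]
    # Scaled dot-product scores between every query position and every passage token.
    scores = query_states @ passage_states.T / np.sqrt(d)   # (q_len, n_tokens)
    attn = softmax(scores, axis=-1)                          # each row sums to 1
    # Average over query positions, then sum the token weights within each passage.
    token_mass = attn.mean(axis=0)                           # (n_tokens,)
    n_passages = int(passage_ids.max()) + 1
    passage_mass = np.bincount(passage_ids, weights=token_mass, minlength=n_passages)
    top = np.argsort(-passage_mass)[:top_k]
    return top.tolist(), passage_mass


# Toy data (made up for illustration): three passages, five tokens, d = 4.
query = np.array([[5.0, 0.0, 0.0, 0.0]])
passages = np.array([
    [0.0, 1.0, 0.0, 0.0],   # passage 0
    [0.0, 0.0, 1.0, 0.0],   # passage 0
    [1.0, 0.0, 0.0, 0.0],   # passage 1 (aligned with the query)
    [0.0, 0.0, 0.0, 1.0],   # passage 1
    [-1.0, 0.0, 0.0, 0.0],  # passage 2 (opposed to the query)
])
ids = np.array([0, 0, 1, 1, 2])

top, mass = rank_passages_by_attention(query, passages, ids, top_k=2)
print(top, mass.round(3))
```

In this toy setup the passage whose token aligns with the query receives the most attention mass, so it is ranked first without any separate retriever model or index. A real system would use learned projections and multiple heads; the sketch only shows how attention weights can double as relevance scores.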
