Tagged articles
3 articles
Tencent Cloud Developer
Apr 10, 2025 · Artificial Intelligence

The Magic of GPT‑4o: Technical Overview and Speculated Architecture

GPT‑4o combines extremely long‑form text generation, high‑quality image creation, and interactive editing in a single model. It likely uses an autoregressive multimodal transformer that tokenizes images through VQ‑VAE/GAN pipelines, is trained on massive data, and is refined through fine‑tuning and RLHF, yielding a unified model for generation, editing, and understanding.

GPT-4o · VQ-VAE · autoregressive generation
0 likes · 17 min read
AIWalker
Feb 28, 2025 · Artificial Intelligence

FlexTok: Reconstruct Images with as Few as 8 Tokens – Variable‑Length Tokenizer Beats TiTok

FlexTok is a flexible‑length 1‑D image tokenizer that resamples an image into anywhere from 1 to 256 discrete tokens. It achieves better reconstruction quality (FID) and autoregressive generation quality than TiTok by combining nested random dropout, causal masks, and a flow‑based decoder, and it is evaluated on ImageNet and DFN.

FlexTok · Vision Transformer · autoregressive generation
0 likes · 21 min read