Tag

BPE


Code Mala Tang
Mar 27, 2025 · Artificial Intelligence

How Do BPE, WordPiece, and SentencePiece Shape Modern NLP Tokenization?

This article explains the fundamentals, workflows, examples, and trade‑offs of three major subword tokenization algorithms—Byte Pair Encoding, WordPiece, and SentencePiece—helping practitioners choose the right method for their large language model pipelines.

BPE · NLP · SentencePiece
12 min read
Nightwalker Tech
Jul 18, 2023 · Artificial Intelligence

Implementing the Input Processing Layer of a Transformer Model: Tokenization, Embedding, and Positional Encoding

This article explains how to build the input processing stage of a Transformer—including tokenization with Hugging Face tokenizers, token‑to‑embedding conversion using BERT models, custom BPE tokenizers, and positional encoding—providing complete Python code examples and test results.

BPE · Positional Encoding · PyTorch
14 min read