Tag

data deduplication


php中文网 Courses
Jul 29, 2024 · Backend Development

Using PHP’s array_unique() Function to Extract Unique Values

This article explains PHP’s built-in array_unique() function, detailing its syntax, behavior, and flags, and demonstrates how to remove duplicate values from arrays with practical code examples and real-world use cases such as data cleaning, input validation, and aggregation.

PHP · array_unique · arrays
0 likes · 6 min read
NetEase Yanxuan Technology Product Team
Jul 25, 2022 · Big Data

Probability Algorithms in Big Data: BloomFilter and Count-min Sketch Applications

The article explains how space‑efficient probabilistic structures such as BloomFilter and Count‑min Sketch enable large‑scale data deduplication, join pruning, real‑time idempotent filtering, and approximate top‑K analytics by trading modest accuracy loss for dramatically reduced storage and faster computation.

Big Data · BloomFilter · Count-Min Sketch
0 likes · 12 min read
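
The deduplication use of a Bloom filter mentioned in the summary above can be sketched as follows. This is a minimal illustration, not the article's implementation: the class name `BloomFilter`, the helper `deduplicate`, the double-hashing scheme over SHA-256, and the default 1% false-positive rate are all assumptions made for the example.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter (hypothetical sketch) using double hashing."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        # Standard sizing formulas: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2
        self.num_bits = max(
            1,
            math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)),
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def deduplicate(stream, expected_items=1000):
    """Yield items not seen before; a false positive may occasionally drop a new item."""
    seen = BloomFilter(expected_items)
    for item in stream:
        if item not in seen:
            seen.add(item)
            yield item
```

The filter never reports a stored item as absent, so duplicates are always caught; the accuracy cost is that a small fraction of genuinely new items may be treated as duplicates, which is the trade-off the summary describes.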
Top Architect
Nov 11, 2021 · Databases

How to Remove Duplicate Data in MySQL Tables Efficiently

This article explains why duplicate rows appear in MySQL tables, demonstrates how to identify them with SELECT queries, and provides step‑by‑step SQL solutions—including safe deletion of all duplicates or retaining a single record per group—using subqueries and temporary tables for efficient cleanup.

Database Cleanup · Duplicate Data · MySQL
0 likes · 5 min read
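
The "identify, then keep one record per group" approach from the summary above might look roughly like this. The sketch runs against an in-memory SQLite database purely so that it is self-contained; the `users` table and its `id` and `email` columns are hypothetical and do not come from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",), ("a@example.com",)],
)

# Step 1: identify values that occur more than once.
dupes = conn.execute(
    "SELECT email, COUNT(*) FROM users GROUP BY email HAVING COUNT(*) > 1"
).fetchall()

# Step 2: delete every row except the lowest id in each group.
conn.execute(
    "DELETE FROM users WHERE id NOT IN (SELECT MIN(id) FROM users GROUP BY email)"
)

remaining = conn.execute("SELECT id, email FROM users ORDER BY id").fetchall()
```

MySQL generally rejects a `DELETE` whose subquery selects from the table being modified, so there the inner query is usually wrapped in a derived table or staged in a temporary table first.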
Laravel Tech Community
Sep 10, 2021 · Databases

How to Remove Duplicate Records in MySQL Tables

This article explains why duplicate rows appeared in production MySQL tables, demonstrates how to identify them with SELECT queries, and provides two SQL solutions—one to delete all duplicates and another to keep a single record per duplicated key—while preserving data integrity.

Database Cleanup · Duplicate Removal · MySQL
0 likes · 5 min read
Aikesheng Open Source Community
Feb 10, 2020 · Databases

Handling Duplicate Data in MySQL: Techniques and Examples

This article explains how to identify and remove various kinds of duplicate data in MySQL—including fully duplicated rows, records with duplicate non‑key columns, and unwanted whitespace inside fields—by using SQL statements, table cloning, OS utilities, and regular‑expression updates, with performance measurements for each method.

MySQL · SQL · data cleaning
0 likes · 13 min read
Test Development Learning Exchange
Apr 16, 2019 · Fundamentals

Python Script for Merging and Deduplicating CSV Files

This article presents a Python script that merges multiple CSV files from a specified directory and removes duplicate rows using pandas, providing a practical solution for test case management.

CSV processing · Python · data deduplication
0 likes · 3 min read
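
A merge-and-deduplicate step like the one summarized above could be sketched as follows. The article uses pandas; this version deliberately swaps in the standard-library `csv` module so it runs without extra dependencies. The function name `merge_and_deduplicate` and the rule that all files must share one header are assumptions for the example.

```python
import csv
from pathlib import Path


def merge_and_deduplicate(input_dir, output_file):
    """Merge all *.csv files in input_dir, dropping rows that repeat exactly.

    Returns the number of unique data rows written.
    """
    seen = set()
    header = None
    rows = []
    for path in sorted(Path(input_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip empty files
            if header is None:
                header = file_header
            elif file_header != header:
                raise ValueError(f"{path.name} has a different header")
            for row in reader:
                key = tuple(row)  # lists are unhashable, so use a tuple as the set key
                if key not in seen:
                    seen.add(key)
                    rows.append(row)
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)
```

With pandas, the same idea is usually expressed by concatenating the per-file frames with `pandas.concat` and calling `DataFrame.drop_duplicates()` on the result.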
Qunar Tech Salon
Apr 21, 2017 · Big Data

Ensuring Exact‑Once Semantics in Spark Streaming with Kafka: Offline Repair and Data Deduplication Strategies

This article explains why Spark Streaming combined with Kafka guarantees only at-least-once delivery by default, outlines the challenges of delayed and out-of-order events, and presents practical offline-repair, deduplication, and output-format techniques, with code examples, for achieving exactly-once semantics in big-data pipelines.

Big Data · Exact-Once · HBase
0 likes · 11 min read
Architects' Tech Alliance
Apr 11, 2017 · Fundamentals

Technical Overview of Huawei Dorado V3 All‑Flash Storage: GRIP and FAST Features

This article provides a detailed technical analysis of Huawei's Dorado V3 all‑flash storage system, explaining its classification, the GRIP (Granular management, ROW, Inline deduplication & compression, Parity RAID) and FAST (FlashLink, Active‑active, Zero‑loss snapshot, RAID‑TP) technologies, and why these features are essential for modern flash‑oriented solutions.

Dorado V3 · Flash Optimization · Huawei
0 likes · 12 min read
Architects' Tech Alliance
Feb 24, 2017 · Information Security

Understanding SHA-1 Hash Collisions and Their Impact on Data Deduplication

The public SHA-1 collision recently demonstrated by Google and Dutch researchers highlights the insecurity of SHA-1, is prompting a shift toward stronger hashes such as SHA-256 and SHA-3, and underscores the importance of robust hash functions in data deduplication, storage compression, and information security in general.

Hash Collision · SHA-1 · cryptography
0 likes · 7 min read
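
To see why the hash function matters for deduplication, consider a content-addressed store in which blocks with equal digests are assumed to hold equal data. The sketch below is illustrative only: the `ChunkStore` class is hypothetical, and SHA-256 is used as one of the stronger hashes the summary names.

```python
import hashlib


class ChunkStore:
    """Hypothetical content-addressed store: one stored copy per distinct digest."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Equal digests are treated as equal content, so only the first copy is kept.
        self.chunks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.chunks[digest]


store = ChunkStore()
first = store.put(b"same block")
second = store.put(b"same block")  # deduplicated: no new entry is stored
other = store.put(b"different block")
```

Because the store keys on the digest alone, two different blocks that collide would silently share one stored copy, which is why a practical collision attack on the hash undermines this kind of deduplication.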
Architects' Tech Alliance
May 11, 2016 · Fundamentals

Comprehensive Overview of Flash Storage Architecture, Technologies, and Future Trends

This article provides an in‑depth, systematic overview of flash storage, covering architecture, metadata management, deduplication, wear‑leveling, power‑loss protection, NAND flash cell types, reliability techniques, emerging 3D‑Flash and memristor technologies, as well as PCIe/NVMe interface standards.

3D NAND · Flash Storage · NAND Flash
0 likes · 20 min read