Tag

text similarity

1 view collected around this technical thread.

Test Development Learning Exchange
Apr 20, 2024 · Artificial Intelligence

Implementing a Simple University Paper Plagiarism Detection System in Python

This article outlines the design and implementation of a basic university paper plagiarism detection system using Python, covering text preprocessing with NLTK, TF‑IDF weighting, cosine similarity calculation, and a sample in‑memory paper database, while also discussing scalability, UI, and legal considerations.

NLP · Python · TF-IDF
10 min read
NetEase LeiHuo Testing Center
Jun 2, 2023 · Artificial Intelligence

AI Techniques for a Global Search Platform: Word Segmentation, Text Similarity, Image Retrieval, and Multimodal Models

This article shares the development of a global search platform that leverages AI technologies such as Chinese word segmentation, part‑of‑speech tagging, text similarity via Simhash and Synonyms, image similarity using histogram, Hamming distance and ResNet‑50, and multimodal CLIP‑based models to improve search efficiency and accuracy.

AI · NLP · image retrieval
12 min read
KooFE Frontend Team
May 21, 2023 · Artificial Intelligence

How Embeddings Power AI Knowledge Bases: From Theory to Practice

This article explains what embeddings are, how they capture semantic similarity, and how to use OpenAI's embedding API to transform a custom knowledge base into vector representations for efficient search, retrieval, and question‑answering.

AI · OpenAI · embeddings
11 min read
Qunar Tech Salon
Jun 3, 2020 · Fundamentals

Optimizing International Hotel Data Aggregation Algorithms at Qunar

The article outlines Qunar’s challenges in aggregating international hotel data, analyzes issues such as localized address formats and limited text similarity parsing, and presents a pattern‑matching and weighted scoring approach that improves aggregation accuracy across multiple countries.

Pattern Matching · algorithm optimization · data integration
7 min read
JD Tech Talk
Nov 15, 2019 · Artificial Intelligence

Legal Case Similarity Competition at CCL 2019: Dataset, Task Transformation, and Model Solutions

The article reviews the CCL “Chinese Law Research Cup” similarity competition, describing the legal text dataset, converting the triple‑sample task to a binary similarity problem, outlining challenges such as long documents, and summarizing the BERT‑based Siamese, InferSent, and triplet‑loss models that achieved top‑10 results.

BERT · CCL 2019 · Siamese network
8 min read
Tongcheng Travel Technology Center
Nov 1, 2019 · Artificial Intelligence

Improving International Hotel Room‑Type Merging with Text Similarity and Machine‑Learning Models

This article describes how a large‑scale international hotel platform reduced room‑type merging errors and user complaints by applying rule‑based methods, text‑similarity algorithms (Jaccard, LCS, N‑Gram) and supervised machine‑learning classifiers such as fastText to standardize and merge heterogeneous room‑type data.

N-gram · fastText · hotel
9 min read
Xianyu Technology
Dec 12, 2018 · Big Data

Community Data Normalization Using Prefix Matching and Text Similarity

The study presents a four‑step pipeline that normalizes community data for rental platforms by clustering records using longest‑common‑prefix patterns, geographic filtering, Levenshtein similarity, and pattern‑based parent‑child assignment, achieving under 8% false positives and 5% false negatives.

Clustering · Real Estate · data normalization
10 min read