Implementing File Upload and Text Extraction with Elasticsearch Ingest Attachment Plugin in Spring Boot
This tutorial explains how to let users upload PDF, Word, or TXT files, install the Elasticsearch Ingest Attachment Processor Plugin, create an ingest pipeline and index mapping, convert files to Base64, and perform fuzzy searches with highlighted results using Spring Boot and Java code examples.
Hello everyone, I'm Chen, the author.
The product requires a feature that lets users upload PDF, Word, or TXT files, run a fuzzy search by file name or file content, and view the content online.
Environment
Project development environment:
Backend management system: Spring Boot + MyBatis-Plus + MySQL + Elasticsearch
Search engine: Elasticsearch 7.9.3 with Kibana UI
Implementation Steps
1. Set up environment
Elasticsearch and Kibana installation is omitted; ensure the Java Elasticsearch client version matches the ES version.
2. File content recognition
Install the Ingest Attachment Processor Plugin to extract text from attachments.
elasticsearch-plugin install ingest-attachment
When using Docker, install the plugin inside the container:
[root@... ]# docker exec -it es bash
...
elasticsearch-plugin install ingest-attachment
...
After installation, restart Elasticsearch.
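After the restart it is worth confirming that the plugin was actually loaded. Elasticsearch lists installed plugins through GET _cat/plugins, one line per node and plugin in the form "node component version". The sketch below only parses such a response; the class name PluginCheck and the sample text are hypothetical and not part of the original project.

```java
import java.util.Arrays;

public class PluginCheck {

    /**
     * Returns true if a _cat/plugins response lists the given plugin.
     * Each non-empty line is expected to look like "<node> <component> <version>".
     */
    static boolean hasPlugin(String catPluginsOutput, String pluginName) {
        return Arrays.stream(catPluginsOutput.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                .anyMatch(cols -> cols.length >= 2 && cols[1].equals(pluginName));
    }

    public static void main(String[] args) {
        // Hypothetical response text, shaped like the output of GET _cat/plugins
        String sample = "es-node-1 ingest-attachment 7.9.3\n";
        System.out.println(hasPlugin(sample, "ingest-attachment"));
    }
}
```

The response text itself can be fetched with curl or from Kibana Dev Tools; this helper assumes node names contain no spaces.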
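3. Create an ingest pipeline
Register the following definition as a pipeline named attachment, for example with PUT _ingest/pipeline/attachment in Kibana Dev Tools. The attachment processor extracts text from the Base64 content field, and the remove processor then drops the raw Base64 so it is not stored in the index.
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "content"
      }
    }
  ]
}
4. Define the index mapping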
{
  "mappings": {
    "properties": {
      "id": {"type": "keyword"},
      "fileName": {"type": "text", "analyzer": "my_ana"},
      "contentType": {"type": "text", "analyzer": "my_ana"},
      "fileUrl": {"type": "text"},
      "attachment": {
        "properties": {
          "content": {"type": "text", "analyzer": "my_ana"}
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "jieba_stop": {"type": "stop", "stopwords_path": "stopword/stopwords.txt"},
        "jieba_synonym": {"type": "synonym", "synonyms_path": "synonym/synonyms.txt"}
      },
      "analyzer": {
        "my_ana": {
          "tokenizer": "jieba_index",
          "filter": ["lowercase", "jieba_stop", "jieba_synonym"]
        }
      }
    }
  }
}
Note: The searchable field is attachment.content, so it must be mapped as an analyzed text field. The my_ana analyzer uses the jieba_index tokenizer, which comes from the separately installed jieba analysis plugin, and the stopword and synonym files must exist at the given paths relative to the Elasticsearch config directory.
5. Test indexing
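{
  "id": "1",
  "name": "Imported Red Wine",
  "filetype": "pdf",
  "contenttype": "article",
  "content": "Article content"
}
Convert the file to Base64 before sending: the attachment processor reads the content field as Base64-encoded bytes, so the "Article content" value above stands in for that encoded string. An online converter such as https://www.zhangxinxu.com/sp/base64.html is enough for a manual test.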
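In application code the same conversion can be done with java.util.Base64. The following is a minimal sketch, not the author's omitted implementation: the class AttachmentDocBuilder and its methods are hypothetical, the field names follow the index mapping, and the request path assumes the fileinfo index and a pipeline named attachment.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class AttachmentDocBuilder {

    /** Encodes raw file bytes as Base64, the form the attachment processor expects. */
    static String toBase64(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body for an index request that goes through the ingest pipeline. */
    static String buildDocument(String id, String fileName, String fileType,
                                String contentType, byte[] fileBytes) {
        return "{"
                + "\"id\":\"" + escapeJson(id) + "\","
                + "\"fileName\":\"" + escapeJson(fileName) + "\","
                + "\"fileType\":\"" + escapeJson(fileType) + "\","
                + "\"contentType\":\"" + escapeJson(contentType) + "\","
                + "\"content\":\"" + toBase64(fileBytes) + "\""
                + "}";
    }

    public static void main(String[] args) {
        byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
        // Assumed request: PUT /fileinfo/_doc/1?pipeline=attachment with this body
        System.out.println(buildDocument("1", "demo.txt", "txt", "article", bytes));
    }
}
```

Base64 output contains only letters, digits, +, / and =, so it needs no JSON escaping. A real project would more likely serialize the document with a JSON library than by string concatenation.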
Code
Key configuration and implementation files are shown below.
application.yml
# Data source configuration
spring:
  devtools:
    restart:
      enabled: true
  elasticsearch:
    rest:
      url: 127.0.0.1
      uris: 127.0.0.1:9200
      connection-timeout: 1000
      read-timeout: 3000
      username: elastic
      password: 123456
ElasticsearchConfig.java
package com.yj.rselasticsearch.domain.config;

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.time.Duration;

@Configuration
public class ElasticsearchConfig {

    // Host name only; the port (9200) is set in the HttpHost below
    @Value("${spring.elasticsearch.rest.url}")
    private String esUrl;

    @Value("${spring.elasticsearch.rest.username}")
    private String userName;

    @Value("${spring.elasticsearch.rest.password}")
    private String password;

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        // Basic-auth credentials for the cluster
        final BasicCredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(userName, password));
        return new RestHighLevelClient(
                RestClient.builder(new HttpHost(esUrl, 9200, "http"))
                        .setHttpClientConfigCallback(httpClientBuilder -> {
                            httpClientBuilder.disableAuthCaching();
                            // Keep idle connections alive for five minutes
                            httpClientBuilder.setKeepAliveStrategy(
                                    (response, context) -> Duration.ofMinutes(5).toMillis());
                            return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
                        }));
    }
}
FileInfo entity
package com.yj.common.core.domain.entity;

import com.baomidou.mybatisplus.annotation.TableField;
import lombok.Getter;
import lombok.Setter;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import java.util.Date;

@Setter
@Getter
@Document(indexName = "fileinfo", createIndex = false)
public class FileInfo {

    @Field(name = "id", type = FieldType.Integer)
    private Integer id;

    @Field(name = "fileName", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
    private String fileName;

    @Field(name = "fileType", type = FieldType.Keyword)
    private String fileType;

    @Field(name = "contentType", type = FieldType.Text)
    private String contentType;

    // Exists only in Elasticsearch (filled by the ingest pipeline), not in the MySQL table
    @Field(name = "attachment.content", type = FieldType.Text, analyzer = "jieba_index", searchAnalyzer = "jieba_index")
    @TableField(exist = false)
    private String content;

    @Field(name = "fileUrl", type = FieldType.Text)
    private String fileUrl;

    private Date createTime;

    private Date updateTime;
}
FileInfoController.java
package com.yj.rselasticsearch.controller;

import com.yj.common.core.controller.BaseController;
import com.yj.common.core.domain.AjaxResult;
import com.yj.rselasticsearch.service.FileInfoService;
import org.springframework.web.bind.annotation.*;
import org.springframework.web.multipart.MultipartFile;
import javax.annotation.Resource;

@RestController
@RequestMapping("/fileInfo")
public class FileInfoController extends BaseController {

    @Resource
    private FileInfoService fileInfoService;

    @PutMapping("uploadFile")
    public AjaxResult uploadFile(String contentType, MultipartFile file) {
        return fileInfoService.uploadFileInfo(contentType, file);
    }
}
FileInfoServiceImpl.java (excerpt)
package com.yj.rselasticsearch.service.impl;

import com.alibaba.fastjson.JSON;
import com.yj.common.core.domain.AjaxResult;
import com.yj.rselasticsearch.mapper.FileInfoMapper;
import com.yj.rselasticsearch.service.FileInfoService;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.stereotype.Service;
import org.springframework.web.multipart.MultipartFile;
import javax.annotation.Resource;
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Base64;

@Service
public class FileInfoServiceImpl implements FileInfoService {

    @Resource
    private FileInfoMapper fileInfoMapper;

    @Resource
    private RestHighLevelClient client;

    @Override
    public AjaxResult uploadFileInfo(String contentType, MultipartFile file) {
        // Upload file, convert to Base64, index into ES with pipeline "attachment"
        // (implementation omitted for brevity)
        return AjaxResult.success();
    }

    // Reads the whole file into memory so it can be Base64-encoded
    private byte[] getContent(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }
}
ElasticsearchServiceImpl.java (highlight search excerpt)
// Methods getAssociationalWordOther and queryHighLightWordOther implement fuzzy search
// with highlighted results using NativeSearchQueryBuilder and ElasticsearchRestTemplate.
The test request and response JSON demonstrate fuzzy search with highlighted keywords.
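Since the search methods are only summarized here, the sketch below shows what an equivalent raw search body might look like. It is not the author's NativeSearchQueryBuilder code: the class HighlightQueryBuilder is hypothetical, and a multi_match query over fileName and attachment.content is an assumed stand-in for the article's fuzzy search. The highlight section asks Elasticsearch to wrap matched terms in em tags.

```java
public class HighlightQueryBuilder {

    /** Minimal JSON string escaping for quotes, backslashes and control characters. */
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Builds a search body that matches the keyword against the file name and the
     * extracted attachment text, and requests highlighted fragments for both fields.
     */
    static String buildSearchBody(String keyword) {
        String k = escapeJson(keyword);
        return "{"
                + "\"query\":{\"multi_match\":{"
                + "\"query\":\"" + k + "\","
                + "\"fields\":[\"fileName\",\"attachment.content\"]}},"
                + "\"highlight\":{"
                + "\"pre_tags\":[\"<em>\"],\"post_tags\":[\"</em>\"],"
                + "\"fields\":{\"fileName\":{},\"attachment.content\":{}}}"
                + "}";
    }

    public static void main(String[] args) {
        // Assumed request: POST /fileinfo/_search with this body
        System.out.println(buildSearchBody("red wine"));
    }
}
```

In the response, each hit carries a highlight object keyed by field name, and those fragments are what the front end would render in place of the plain text.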
Conclusion
If this tutorial helped you, please like, share, and follow the author. Additional resources and a paid knowledge community are advertised.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn