
Elasticsearch and Logstash Tutorial: Installation, Configuration, and Flight Data Import

This tutorial explains how to install and configure Elasticsearch and Kibana, demonstrates CRUD operations and bulk data import, and shows how to use Logstash to ingest, transform, and index flight JSON data, covering both batch and near‑real‑time processing techniques.


This article provides a comprehensive guide to building a data reduction system for flight data using Elasticsearch, Kibana, and Logstash.

It begins with an overview of the four V's of big data—volume, velocity, variety, and veracity—and introduces common tools for data processing, transmission, and storage such as Storm, Kafka, and Elasticsearch.

Installation steps for Elasticsearch are detailed, including directory structure, starting the service, and verifying cluster health. Key configuration files like config/elasticsearch.yml are mentioned, and the REST API endpoints for CRUD operations (PUT, GET, POST, DELETE) are illustrated with example commands and responses.

curl -X GET "localhost:9200/flight/_doc/1?pretty"
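The full CRUD round trip can be sketched with curl against the same index, assuming a local single-node Elasticsearch on port 9200 (the document body here is an illustrative sample, not the article's full flight record):

```shell
# Create (or overwrite) document 1 in the flight index
curl -X PUT "localhost:9200/flight/_doc/1?pretty" \
     -H 'Content-Type: application/json' \
     -d '{ "Icao": "A0835D", "Alt": 2400 }'

# Partially update the document via the _update endpoint
curl -X POST "localhost:9200/flight/_update/1?pretty" \
     -H 'Content-Type: application/json' \
     -d '{ "doc": { "Alt": 2600 } }'

# Delete the document
curl -X DELETE "localhost:9200/flight/_doc/1?pretty"
```

Each response includes a `result` field (`created`, `updated`, or `deleted`) that confirms the operation.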

Bulk data import is covered, showing how to format JSON files for the _bulk API, the required application/x-ndjson content type, and example bulk request syntax.

POST /flight/_bulk
{ "index": { "_id": 4800770 } }
{ "Icao": "A0835D", "Alt": 2400, "Lat": 39.984322, "Long": -82.925616 }
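Since `_bulk` requires newline-delimited JSON, each document becomes two lines: an action line carrying the `_id`, then the source line. A minimal sketch of building one such pair in Java (the class and method names here are ours, not from the article):

```java
public class BulkBodyBuilder {
    // Build one NDJSON pair for the _bulk API: an index action line with the
    // document _id, then the source document, each terminated by a newline.
    // (The _bulk body as a whole must also end with a final newline.)
    static String bulkLine(long id, String sourceJson) {
        return "{ \"index\": { \"_id\": " + id + " } }\n" + sourceJson + "\n";
    }

    public static void main(String[] args) {
        System.out.print(bulkLine(4800770L,
            "{ \"Icao\": \"A0835D\", \"Alt\": 2400, \"Lat\": 39.984322, \"Long\": -82.925616 }"));
    }
}
```

The concatenated pairs are then sent in a single request with the `application/x-ndjson` content type.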

Java utilities for converting raw flight JSON files into the bulk format are provided, with code snippets that read files, insert index lines, and remove trailing commas.

package com.jgc;
public class JsonFlightFileConverter {
    // Reads the raw flight JSON file, prepends an index action line to each
    // record, and strips trailing commas to produce a valid _bulk body.
    public static void main(String[] args) { /* conversion logic */ }
}
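The core of that conversion can be sketched as a pure function over the record lines, assuming each raw record carries an `Id` field used as the document `_id` (the class and method names are ours):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulkConverter {
    private static final Pattern ID = Pattern.compile("\"Id\"\\s*:\\s*(\\d+)");

    // For each flight record line: strip the trailing comma left over from the
    // source JSON array, extract the record's Id, and prepend the _bulk action
    // line so the output is valid NDJSON for the _bulk API.
    static List<String> toBulk(List<String> recordLines) {
        List<String> out = new ArrayList<>();
        for (String line : recordLines) {
            String record = line.trim();
            if (record.endsWith(",")) {
                record = record.substring(0, record.length() - 1);
            }
            Matcher m = ID.matcher(record);
            String id = m.find() ? m.group(1) : "";
            out.add("{ \"index\": { \"_id\": " + id + " } }");
            out.add(record);
        }
        return out;
    }
}
```

Writing the result back to a file line by line yields a body ready for `POST /flight/_bulk`.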

The guide then shifts to Logstash, describing its role as a data collection engine, installation options, and basic test command.

bin/logstash -e 'input { stdin { } } output { stdout {} }'

A full Logstash pipeline configuration is presented, including a file input that reads test.json, a JSON filter to parse the message field, a mutate filter to remove unwanted fields, and outputs to Elasticsearch, a file, and the console.

input {
  file { path => "/path/to/test.json" codec => "json" }
}
filter {
  json { source => "message" }
  mutate { remove_field => ["path", "@version", "@timestamp", "host", "message"] }
}
output {
  elasticsearch { hosts => ["http://localhost:9200"] index => "flight-logstash" document_id => "%{[@metadata][Id]}" }
  file { path => "/path/to/output.json" }
  stdout { codec => rubydebug }
}

Additional Logstash filter examples (gsub for cleaning timestamps and date for converting UNIX_MS to ISO8601) are shown to transform the FSeen field into a readable date.

mutate { gsub => ["FSeen", "\/Date\(", "", "FSeen", "\)\/", ""] }
date { timezone => "UTC" match => ["FSeen", "UNIX_MS"] target => "FSeen" }
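The same transformation can be checked outside Logstash. A minimal Java sketch of the two steps — strip the `/Date( ... )/` wrapper, then render the UNIX_MS value as ISO8601 UTC — where the class name and sample timestamp are ours:

```java
import java.time.Instant;

public class FSeenConverter {
    // Mirrors the gsub + date filter pair: remove the "/Date(" prefix and ")/"
    // suffix, then convert the remaining epoch-milliseconds value to ISO8601 UTC.
    static String toIso8601(String fseen) {
        String millis = fseen.replace("/Date(", "").replace(")/", "");
        return Instant.ofEpochMilli(Long.parseLong(millis)).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIso8601("/Date(1467378028852)/"));
        // prints 2016-07-01T13:00:28.852Z
    }
}
```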

Finally, the article discusses when Logstash is appropriate for batch versus real‑time processing and points to further resources for scaling Logstash deployments.

Tags: Java, real-time processing, Elasticsearch, JSON, Logstash, data ingestion, Bulk API
Written by Architect