Elasticsearch From the Bottom Up
Elasticsearch is built on top of the Apache Lucene search library, and it inherits much of Lucene’s architecture. Here is a brief overview of the architecture of Elasticsearch and Lucene:
Lucene Architecture:
Lucene is a Java-based, open-source search library that provides full-text search capabilities. It is a low-level library that does not provide a user interface or distributed architecture. Lucene is designed to work with various data sources and provides a robust API for indexing and searching data.
Lucene consists of several key components, including:
- Index: Lucene stores indexed data in an inverted index, which maps terms to the documents that contain them.
- Analyzer: Lucene uses an analyzer to break the text of a document into individual terms (for example, by tokenizing and lowercasing it); those terms are what gets indexed.
- Query: Lucene provides a rich query API for constructing complex search queries.
- Scoring: Lucene ranks search results by relevance using a scoring algorithm. BM25 is the default similarity in current versions; older versions defaulted to a TF-IDF variant.
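To make the index and analyzer components concrete, here is a minimal sketch of an inverted index in Python. It is not how Lucene is implemented; the names `analyze`, `build_inverted_index`, and `search_all` are hypothetical, and the analyzer is assumed to do nothing more than lowercase the text and split it on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return the IDs of documents that contain every query term (an AND query)."""
    terms = analyze(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        # Intersect posting lists; a missing term yields an empty result.
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch builds on Lucene",
    3: "A distributed search engine",
}
index = build_inverted_index(docs)
```

Note that the query goes through the same `analyze` function as the documents, so `search_all(index, "Distributed SEARCH")` matches document 3 even though the casing differs.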
Elasticsearch Architecture:
Elasticsearch builds on top of Lucene to provide a distributed, scalable search and analytics engine. Elasticsearch consists of several key components, including:
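- Node: An Elasticsearch node is a single running instance of Elasticsearch that participates in the cluster. Most nodes store data, although a node can also take on other roles, such as coordinating requests or managing the cluster.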
- Cluster: An Elasticsearch cluster is a group of nodes that work together to provide a distributed search and analytics engine.
- Index: Elasticsearch stores data in an index, a collection of documents that share a similar structure.
- Shard: Elasticsearch uses sharding to split an index into smaller parts that can be distributed across multiple nodes. Each shard is itself a complete Lucene index.
- Query: Elasticsearch provides a query API similar to Lucene’s, but it also offers additional capabilities such as aggregations (which replaced the older facets) and geospatial search.
- REST API: Elasticsearch provides a RESTful API for interacting with the cluster.
- Plugins: Elasticsearch supports plugins that can extend its functionality, including plugins for monitoring, security, and machine learning.
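As an illustration of how sharding spreads an index across nodes, the sketch below assigns each document to a primary shard by hashing a routing key and taking the result modulo the number of primary shards. The general shape (hash of the routing value, modulo the shard count) follows the documented routing rule, but the details here are assumptions: `zlib.crc32` stands in for the hash function Elasticsearch actually uses, and `pick_shard` and `distribute` are hypothetical names.

```python
import zlib


def pick_shard(routing_key, num_primary_shards):
    """Choose a primary shard number for a document from its routing key.

    Assumption: crc32 is only a stand-in hash for this sketch; a real
    cluster uses a different hash function.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


def distribute(doc_ids, num_primary_shards):
    """Group document IDs by the shard each one is routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards


# Hypothetical document IDs spread over three primary shards.
doc_ids = ["doc-%d" % n for n in range(20)]
layout = distribute(doc_ids, 3)
```

Because the shard number depends on the shard count, the same document would land on a different shard if the count changed, which is one reason the number of primary shards is normally fixed when an index is created.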
Overall, Elasticsearch’s architecture is designed to provide a distributed, scalable search and analytics engine that can handle large volumes of data in near real time. By building on top of Lucene, Elasticsearch inherits its powerful search capabilities while adding further features and a distributed architecture.
This talk will teach you about Elasticsearch and Lucene’s architecture.
The essential data structure in search is the inverted index, which is robust yet simple to understand. We start there and ascend through the abstraction layers to get an overview of how a distributed search cluster processes searches and changes.
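As a rough picture of how a cluster might process a search, the sketch below scatters a query to every shard, lets each shard return its own top hits, and then merges those partial results into a global top-k. It is a simplification under stated assumptions: the score is just the number of times the term occurs, shards are plain dictionaries, and `search_shard` and `search_cluster` are hypothetical names rather than Elasticsearch APIs.

```python
import heapq


def search_shard(shard_docs, term, k):
    """Score the documents on one shard and return its local top-k hits.

    Assumption: the score is a raw count of term occurrences, a toy
    replacement for a real relevance function.
    """
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term.lower())
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(k, hits)


def search_cluster(shards, term, k):
    """Scatter the query to every shard, then gather and merge the results."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, k)]
    return [doc_id for score, doc_id in heapq.nlargest(k, partial)]


# Two hypothetical shards, each holding part of the same index.
shards = [
    {"a": "search search engine", "b": "inverted index"},
    {"c": "search library", "d": "search the search index search"},
]
top = search_cluster(shards, "search", 2)
```

Each shard only needs to return its own best `k` hits, because no document outside a shard's local top-k can appear in the global top-k.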