Chapter 12 Elasticsearch - Distributed Search Engine
Foreword
2 Huawei Confidential
Contents
1. Elasticsearch Overview
Elasticsearch Overview
Elasticsearch is a high-performance Lucene-based full-text search service. It is a
distributed RESTful search and data analysis engine and can also be used as a
NoSQL database.
Lucene extension
Seamless switchover between the prototype environment and production
environment
Horizontal scaling
Support for structured and unstructured data
Elasticsearch Features
High performance: Search results can be obtained immediately, and the inverted index for
full-text search is implemented.
Scalability: Horizontal scaling is supported. Elasticsearch can run on hundreds or thousands
of servers. The prototype environment and production environment can be seamlessly switched.
Relevance: Search results are sorted based on factors ranging from term frequency and recency
to popularity.
Reliability: Faults such as hardware failures and network partitions are automatically
detected, ensuring the security and availability of your cluster (and data).
Elasticsearch Application Scenarios
Elasticsearch is used for log search and analysis, spatiotemporal search, time
series search, and intelligent search.
Complex data types: Structured, semi-structured, and unstructured data all need to be
queried. Elasticsearch can perform a series of operations such as cleansing, word
segmentation, and inverted index creation on these data types, and then provide the
full-text search capability.
Diversified search criteria: Full-text search criteria contain words or phrases.
Write and read: Written data can be searched in near real time.
Elasticsearch Ecosystem
ELK/ELKB provides a complete set of solutions.
[Figure: Elasticsearch ecosystem, divided into a user access layer and a data access layer.]
Elasticsearch System Architecture
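[Figure: Elasticsearch system architecture. Components: the client, the ZooKeeper cluster, and
the Elasticsearch cluster (EsMaster, EsNode1 ... EsNode9). Labeled interactions: obtain cluster
information, update cluster information, and perform file indexing and search operations.
Each instance holds shards and replicas, for example Shard 0, Shard 1, Replica 0, and Replica 1.]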
Elasticsearch Internal Architecture
Elasticsearch provides RESTful APIs or APIs related to other languages (such as Java).
[Figure: Elasticsearch internal architecture. Recoverable labels: RESTful-style API, Java,
Transport, JMX, Thrift, HTTP, and cluster discovery.]
Basic Concepts of Elasticsearch (1)
Basic Concepts of Elasticsearch (2)
Cluster: Each cluster contains multiple nodes, one of which is the master node (the rest
are slave nodes). The master node can be elected.
Shard: Index shard. Elasticsearch splits a complete index into multiple shards and distributes
them on different nodes.
Replica: Index replica. Elasticsearch allows you to set multiple replicas for an index. Replicas
can improve the fault tolerance of the system. When a shard on a node is damaged or lost,
the data can be recovered from the replica. In addition, replicas can improve the search
efficiency of Elasticsearch by automatically balancing the load of search requests.
Basic Concepts of Elasticsearch (3)
Recovery: Data recovery or redistribution. When a node is added to or deleted from the cluster,
Elasticsearch redistributes shards based on the load of the node. When a failed node is
restarted, data will be recovered.
Gateway: Mode for storing Elasticsearch index snapshots. By default, Elasticsearch stores
indexes in memory and persists them to the local hard disk only when the memory is full. The
gateway stores index snapshots. When an Elasticsearch cluster is shut down and restarted, the
cluster reads the index backup data from the gateway. Elasticsearch supports multiple types
of gateways, including the default local file system, distributed file systems, Hadoop HDFS,
and Amazon S3.
Transport: Interaction mode between Elasticsearch internal nodes or clusters and the client.
By default, internal nodes interact over TCP. Other transmission protocols, such as HTTP
(JSON format), Thrift, Servlet, Memcached, and ZeroMQ, are also supported (integrated using
plugins).
ElasticSearch Inverted Index
Forward index: Values are searched for based on keys.
[Figure: example index table with entries such as Jerry and Doc1 ...]
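The contrast between a forward index and an inverted index can be sketched in a few lines of Python. This is a minimal illustration, not Lucene's actual data structure. Apart from Jerry and Doc1, which appear on the slide, the document IDs and texts are invented, and tokenization is assumed to be a plain lowercase whitespace split.

```python
from collections import defaultdict

# Forward index: document ID (key) -> content (value).
# Only "Jerry" and "Doc1" come from the slide; the rest is invented for illustration.
forward_index = {
    "Doc1": "Jerry likes cheese",
    "Doc2": "Tom chases Jerry",
    "Doc3": "Tom likes milk",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


inverted_index = build_inverted_index(forward_index)


def search(term):
    """Look up a single term; returns an empty list when the term is absent."""
    return inverted_index.get(term.lower(), [])
```

A full-text query then becomes a dictionary lookup on the term instead of a scan over every document, which is why the inverted direction suits search.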
Elasticsearch Access APIs
Clients operate on Elasticsearch data by sending RESTful requests. The request methods include GET, POST, PUT,
DELETE, and HEAD, which allow you to add, delete, modify, and query documents and indexes.
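As a rough sketch of how the request methods map to operations, the Python snippet below builds (but does not send) one request per method using only the standard library. The address localhost:9200, the index name books, and the sample document are assumptions made for illustration. The paths follow the typeless document API of Elasticsearch 7.x and may differ in other versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed address of a test cluster


def build_request(method, path, body=None):
    """Build an HTTP request object; nothing is sent until urlopen() is called."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


# Hypothetical index "books" with a single sample document.
requests_by_operation = {
    "create index": build_request("PUT", "/books"),
    "add document": build_request("PUT", "/books/_doc/1", {"title": "Elasticsearch basics"}),
    "query document": build_request("GET", "/books/_doc/1"),
    "search": build_request("POST", "/books/_search", {"query": {"match": {"title": "basics"}}}),
    "check existence": build_request("HEAD", "/books/_doc/1"),
    "delete document": build_request("DELETE", "/books/_doc/1"),
}

for operation, req in requests_by_operation.items():
    print(f"{operation}: {req.get_method()} {req.full_url}")
```

To run a request against a live cluster, pass the object to `urllib.request.urlopen(req)`. A secured cluster would additionally need authentication, which this sketch leaves out.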
Elasticsearch Routing Algorithm
Elasticsearch provides two routing algorithms:
Default route: shard = hash(routing) % number_of_primary_shards. Under this routing
policy, the target shard depends on the number of primary shards, so that number cannot
be changed arbitrarily. In Elasticsearch 6.x, capacity expansion must multiply the number
of shards, and the capacity to be expanded in the future must be specified when the
index is created. Elasticsearch 5.x does not support capacity expansion. Elasticsearch 7.x
supports free expansion.
Custom route: In this routing mode, a routing value can be specified to determine the
shard to which a document is written or the shard on which a search is run.
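The default routing formula can be tried out with a small Python sketch. The hash used here (CRC32 from the standard library) is only a deterministic stand-in and is not the hash function Elasticsearch itself uses. The document IDs, the routing value tenant-42, and the shard counts are made up for illustration.

```python
import zlib


def route_to_shard(routing, number_of_primary_shards):
    """shard = hash(routing) % number_of_primary_shards, with CRC32 as a stand-in hash."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Default route: the routing value defaults to the document ID.
doc_ids = ["user-1", "user-2", "user-3", "user-4"]
placement_5 = {doc_id: route_to_shard(doc_id, 5) for doc_id in doc_ids}

# Custom route: documents that share a routing value land on the same shard.
custom_shard = route_to_shard("tenant-42", 5)

# With a different shard count the modulo generally yields different shards,
# which is why the primary shard count cannot simply be changed on an existing index.
placement_6 = {doc_id: route_to_shard(doc_id, 6) for doc_id in doc_ids}

print(placement_5, custom_shard, placement_6)
```

A search that supplies the same routing value can be sent to `custom_shard` alone instead of to every shard.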
Elasticsearch Balancing Algorithm
Elasticsearch provides the automatic balancing function.
Application scenarios: capacity expansion, capacity reduction, and data import
The algorithms are as follows:
weight_index(node, index) = indexBalance * (node.numShards(index) - avgShardsPerNode(index))
weight_node(node, index) = shardBalance * (node.numShards() - avgShardsPerNode)
weight(node, index) = weight_index(node, index) + weight_node(node, index)
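The three formulas above can be evaluated on a toy cluster as follows. This is a sketch of the formulas as written on the slide, not Elasticsearch's allocator code. The balance factors 0.55 and 0.45, the node names, and the shard counts are illustrative assumptions.

```python
def weight(node, index, shards, index_balance=0.55, shard_balance=0.45):
    """Evaluate weight(node, index) from the slide's formulas.

    shards maps node name -> {index name -> number of shards on that node}.
    The default balance factors are illustrative values, not taken from the slide.
    """
    node_count = len(shards)
    avg_shards_per_node_for_index = (
        sum(per_index.get(index, 0) for per_index in shards.values()) / node_count
    )
    avg_shards_per_node = (
        sum(sum(per_index.values()) for per_index in shards.values()) / node_count
    )
    weight_index = index_balance * (shards[node].get(index, 0) - avg_shards_per_node_for_index)
    weight_node = shard_balance * (sum(shards[node].values()) - avg_shards_per_node)
    return weight_index + weight_node


# Toy cluster: EsNode1 holds more shards of idx_a than EsNode2.
cluster = {
    "EsNode1": {"idx_a": 3, "idx_b": 1},
    "EsNode2": {"idx_a": 1, "idx_b": 1},
}

heavy = weight("EsNode1", "idx_a", cluster)  # 0.55 * (3 - 2) + 0.45 * (4 - 3)
light = weight("EsNode2", "idx_a", cluster)  # 0.55 * (1 - 2) + 0.45 * (2 - 3)
print(heavy, light)
```

A positive weight means the node holds more than its average share, so a balancer would tend to move a shard of idx_a from EsNode1 to EsNode2.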
Elasticsearch Capacity Expansion
Scenarios:
High physical resource consumption such as high CPU and memory usage of
Elasticsearch service nodes, and insufficient disk space
Excessive index data volume for one Elasticsearch instance, such as 1 billion data
records or 1 TB data
Capacity expansion mode:
Add EsNode instances.
Add nodes with EsNode instances.
After capacity expansion, use the automatic balancing policy.
Elasticsearch Capacity Reduction
Scenarios:
OS reinstallation required on nodes
Reduced amount of cluster data
Nodes taken out of service
Precautions:
Ensure that replicas of the shards on the instance to be deleted exist on another instance.
Ensure that data in the shards on the instance to be deleted has been migrated to another
node.
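The precautions amount to checking that no shard would lose its last copy. A minimal Python sketch of such a check follows. The function name, the shard layout, and the instance names are hypothetical, and in practice the layout would be read from the cluster rather than written by hand.

```python
def shards_at_risk(shard_locations, instance_to_remove):
    """Return the shard IDs whose only copy lives on the instance to be removed.

    shard_locations maps shard ID -> list of instances holding a copy of it.
    """
    return sorted(
        shard_id
        for shard_id, instances in shard_locations.items()
        if instance_to_remove in instances
        and not (set(instances) - {instance_to_remove})
    )


# Hypothetical layout: shard2 has no copy outside EsNode3.
shard_locations = {
    "shard0": ["EsNode1", "EsNode2"],
    "shard1": ["EsNode2", "EsNode3"],
    "shard2": ["EsNode3"],
}

print(shards_at_risk(shard_locations, "EsNode3"))
print(shards_at_risk(shard_locations, "EsNode1"))
```

An empty result suggests the instance can be removed without losing data. A non-empty result lists the shards that must be migrated or replicated first.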
Elasticsearch Indexing HBase Data
When Elasticsearch indexes the HBase data, the HBase data is written to HDFS and
Elasticsearch creates the corresponding HBase index data. The index ID is mapped to
the rowkey of the HBase data, which ensures the unique mapping between each index data
record and HBase data and implements full-text search of the HBase data.
Batch indexing: For data that already exists in HBase, an MR task is submitted to read all
data in HBase, and then indexes are created in Elasticsearch.
[Figure: HBase2ES indexing flow. The HBase2ES client submits a job to YARN (Resource Manager,
Node Manager). Numbered steps: 1. read, 2. scan, 3. write. Other components: HBase (HMaster,
Region Server), Elasticsearch (EsNode 1 ... EsNode N), and HDFS (NameNode, DataNode).]
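The rowkey-to-ID mapping can be illustrated by turning a few rows into a request body in the newline-delimited format of the Elasticsearch bulk API. This is not the HBase2ES tool's implementation. The helper name, the index name hbase_users, and the sample rows are assumptions, and the rows are hard-coded instead of being scanned from HBase.

```python
import json


def rows_to_bulk_body(index_name, hbase_rows):
    """Build a bulk request body in which each document's _id is the HBase rowkey."""
    lines = []
    for rowkey, columns in hbase_rows:
        # Action line: index the next document under the rowkey as its ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": rowkey}}))
        # Source line: the column values to be made searchable.
        lines.append(json.dumps(columns))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical rows as (rowkey, {column: value}) pairs.
sample_rows = [
    ("row-001", {"name": "Jerry", "city": "Shenzhen"}),
    ("row-002", {"name": "Tom", "city": "Chengdu"}),
]

bulk_body = rows_to_bulk_body("hbase_users", sample_rows)
print(bulk_body)
```

Because the ID equals the rowkey, re-indexing the same row overwrites the same index record rather than creating a duplicate, and a search hit can be traced back to its HBase row directly.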
Elasticsearch Multi-instance Deployment on a Node
Multiple Elasticsearch instances can be deployed on one node, and differentiated from
each other based on the IP address and port number. This method increases the usage
of the single-node CPU, memory, and disk, and improves the indexing and search
capability of Elasticsearch.
Elasticsearch Cross-node Replica Allocation Policy
When multiple instances are deployed on a single node and an index has multiple replicas,
allocating replicas only across instances is not enough: copies of the same shard may land on
instances of the same node, which then becomes a single point of failure. To solve this
problem, set the cluster.routing.allocation.same_shard.host parameter to true.
[Figure: placement of coll_shard_replica1 and coll_shard_replica2 on the EsNode1 and EsNode2
instances of each node.]
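The effect of the parameter can be modeled with a small Python function. This is a simplified model of the allocation rule, not Elasticsearch's allocator. The function name, the instance names, and the host names are invented for illustration.

```python
def can_allocate(target_instance, existing_copy_instances, instance_host, same_shard_host):
    """Decide whether another copy of a shard may be placed on target_instance.

    existing_copy_instances: instances that already hold a copy of the shard.
    instance_host: maps instance name -> physical host (node) it runs on.
    same_shard_host: models cluster.routing.allocation.same_shard.host.
    """
    if target_instance in existing_copy_instances:
        return False  # two copies of one shard never share an instance
    if same_shard_host:
        used_hosts = {instance_host[i] for i in existing_copy_instances}
        return instance_host[target_instance] not in used_hosts
    return True


# Two hosts; hostA runs two instances, hostB runs one (all names are invented).
instance_host = {
    "EsNode1@hostA": "hostA",
    "EsNode2@hostA": "hostA",
    "EsNode1@hostB": "hostB",
}
primary_on = ["EsNode1@hostA"]

# Without the setting, the replica may end up on the same host as the primary.
print(can_allocate("EsNode2@hostA", primary_on, instance_host, same_shard_host=False))
# With the setting, only an instance on another host is acceptable.
print(can_allocate("EsNode2@hostA", primary_on, instance_host, same_shard_host=True))
print(can_allocate("EsNode1@hostB", primary_on, instance_host, same_shard_host=True))
```

With the setting enabled, losing hostA removes at most one copy of the shard, so the replica on hostB can still serve the data.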
New Features of Elasticsearch
HBase full-text indexing
After the HBase table and Elasticsearch indexes are mapped, indexes and raw data
can be stored in Elasticsearch and HBase, respectively. The HBase2ES tool is used for
offline indexing.
Encryption and authentication
Encryption and authentication are supported when users access Elasticsearch
in a secure cluster.
Quiz
B. Unstructured data
C. Semi-structured data
Quiz
B. MongoDB
C. Memcached
D. Lucene
Summary
Recommendations
Thank you.
Bring digital to every person, home, and organization for a fully connected,
intelligent world.