ElasticSearch Interview Questions and Answers

In this tutorial, we discuss Elasticsearch interview questions and answers for beginners as well as experienced candidates.

Elasticsearch is a real-time, distributed search and analytics engine that exposes a RESTful interface and is built on the Apache Lucene full-text search library. It provides distributed, real-time storage and analytics in which every field can be indexed and searched. Elasticsearch is widely used together with Logstash and Kibana. Competition for Elasticsearch roles has grown over the last few years, so it is important to know the most common Elasticsearch interview questions if you want to build a career in this area. Elasticsearch is used by many major platforms, including Wikipedia, Netflix, IFTTT, Accenture, HipChat, Fujitsu, Stack Overflow, and Medium.

What is ElasticSearch?

Elasticsearch is an open-source, distributed, RESTful search and analytics engine designed for handling large amounts of data. It is based on Apache Lucene and provides a powerful search and analytics engine that can be used for a wide range of use cases, including full-text search, analytics, and logging.

What is an index in ElasticSearch?

An index is a collection of documents that have somewhat similar characteristics. For example, you can have an index for customer data, another index for a product catalog, and yet another index for order data. An index is identified by a name (that must be all lowercase) and this name is used to refer to the index when performing indexing, search, update, and delete operations against the documents in it.

In a single cluster, you can define as many indexes as you want.

What is Type (Mapping Type) in Index of ElasticSearch?

A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index, e.g. one type for users, another type for blog posts.

Mapping types are deprecated. Indices created in Elasticsearch 6.x can contain only a single type, the 7.x APIs treat types as deprecated, and the whole concept of types is removed in 8.x.

What is the use of the field-level attributes index and store?

The index attribute controls whether a field is searchable. Indexed fields are transformed during analysis, so the original value cannot be recovered from the index alone.

The store attribute controls whether Lucene stores the original field value separately so that it can be returned when needed. Storing a field does not by itself make it searchable.
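As a rough illustration, the mapping below (built as a plain Python dictionary) shows how these two attributes might be set per field. The field names are hypothetical, and the typeless mapping layout assumes Elasticsearch 7.x or later; index and store are the mapping parameters discussed above.

```python
import json

# Hypothetical mapping for an articles index; all field names are made up.
mapping = {
    "mappings": {
        "properties": {
            # Analyzed and searchable (index defaults to true), not stored separately.
            "title": {"type": "text"},
            # Searchable and also stored as its own field, so it can be
            # returned without loading the whole _source document.
            "summary": {"type": "text", "store": True},
            # Not indexed, so this field cannot be searched.
            "internal_note": {"type": "keyword", "index": False},
        }
    }
}

# This JSON would be sent as the body of an index-creation request.
print(json.dumps(mapping, indent=2))
```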

What is an Analyzer in ElasticSearch?

While data is being indexed, it is transformed internally by the analyzer defined for the index.

An analyzer consists of exactly one tokenizer, optionally preceded by zero or more character filters and followed by zero or more token filters. The analysis module lets you register analyzers under logical names, which can then be referenced in mapping definitions or in certain APIs.

Elasticsearch ships with prebuilt analyzers that are ready to use. You can also combine the built-in character filters, tokenizers, and token filters to create custom analyzers.

What is Character Filter in Elasticsearch Analyzer?

A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For example, a character filter could be used to convert Hindu-Arabic numerals (٠١٢٣٤٥٦٧٨٩) into their Arabic-Latin equivalents (0123456789), or to strip HTML elements from the stream.

What are Token Filters in Elasticsearch Analyzer?

A token filter receives the token stream and may add, remove, or change tokens. For instance, a lowercase token filter converts all tokens to lowercase, a stop token filter removes stop words, and a synonym token filter adds synonyms to the token stream.

Token filters are not allowed to change the position or character offsets of a token.

What is a Tokenizer?

A tokenizer breaks a string down into a stream of tokens. A simple tokenizer might split the string into terms whenever it encounters whitespace or punctuation. Elasticsearch has a number of built-in tokenizers that can be used to build custom analyzers.
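The three analyzer stages can be pictured with a small toy pipeline. This is only a sketch in plain Python, not Elasticsearch code: the function names and the stop-word list are assumptions chosen for illustration.

```python
import re

def strip_html(text):
    """Character filter: remove anything that looks like an HTML tag."""
    return re.sub(r"<[^>]+>", "", text)

def simple_tokenizer(text):
    """Tokenizer: split on any run of characters that are not letters or digits."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

STOP_WORDS = {"the", "is", "a", "an", "for"}  # illustrative list only

def stop_filter(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    """Run character filters, then the tokenizer, then token filters in order."""
    text = strip_html(text)
    tokens = simple_tokenizer(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("<b>Elasticsearch</b> is a Search Engine!"))
```

The order matters: the stop filter runs after the lowercase filter so that "Is" and "is" are treated the same way.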

 

What are the advantages of Elasticsearch?

  • Scalability: Elasticsearch is designed to scale horizontally and can handle large amounts of data and high search traffic.
  • Real-time search: Elasticsearch provides real-time search and analytics capabilities, making it ideal for use cases that require real-time data access.
  • Advanced search features: Elasticsearch supports advanced search features, such as full-text search, faceted search, and geospatial search.
  • Easy to use: Elasticsearch provides a simple and easy-to-use RESTful API, making it accessible to developers of all skill levels.

What is Elasticsearch REST API and use of it?

Elasticsearch provides a comprehensive and powerful REST API that you can use to interact with your cluster. Among other things, the API lets you:

  • Check your cluster, node, and index health, status, and statistics
  • Administer your cluster, node, and index data and metadata
  • Perform CRUD (Create, Read, Update, and Delete) and search operations against your indexes
  • Execute advanced search operations such as aggregations, filtering, paging, scripting, and sorting, among many others

Does ElasticSearch have a schema?

Yes, Elasticsearch can have a schema. A schema is a description of one or more fields that describes the document type and how to handle the different fields of a document. In Elasticsearch, the schema describes the fields of the JSON documents and their data types, as well as how they should be indexed in the Lucene indexes under the hood. Because of this, in Elasticsearch terms, we usually call this schema a “mapping”.

What is a cluster in ElasticSearch?

A cluster is a collection of one or more nodes that together hold your data and provide indexing and search capabilities across all nodes. A cluster is identified by a unique name, which defaults to “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.

What is a node in ElasticSearch?

A node is a single server that is part of the cluster. It stores data and participates in the cluster's indexing and search capabilities.

What is Ingest Node in Elasticsearch?

Ingest nodes can execute pre-processing pipelines, called ingest pipelines, which transform and enrich a document before it is indexed. To create a dedicated ingest node, set node.master and node.data to false and leave node.ingest set to true.

What is Elasticsearch Data Node?

Data nodes hold the shards that contain the documents you have indexed. They handle data-related operations such as CRUD, search, and aggregations. Set node.data=true to make a node a data node.

Data node operations are I/O-, memory-, and CPU-intensive. The main benefit of having dedicated data nodes is the separation of the master and data roles.

What is Master Node and Master Eligible Node in Elasticsearch?

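The master node controls cluster-wide operations such as creating or deleting an index, tracking which nodes are part of the cluster, and deciding which shards to allocate to which nodes. A stable master node is important for cluster health. A node is master-eligible when node.master is set to true (the default), and the master is elected from among the master-eligible nodes.

In versions before 7.0, the number of master-eligible nodes that must be visible in order to elect a master is controlled by the following setting:

discovery.zen.minimum_master_nodes: number (default 1)

To avoid a split brain, this number should be set to a quorum of the master-eligible nodes: (master_eligible_nodes / 2) + 1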
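The quorum formula (master_eligible_nodes / 2) + 1 uses integer division. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: (N / 2) + 1, using integer division."""
    if master_eligible_nodes < 1:
        raise ValueError("a cluster needs at least one master-eligible node")
    return master_eligible_nodes // 2 + 1

for n in (1, 2, 3, 5):
    print(n, "master-eligible nodes -> quorum of", minimum_master_nodes(n))
```

With three master-eligible nodes the quorum is two, so the cluster can lose one of them and still elect a master.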

What is Tribe Node and Coordinating Node in Elasticsearch?

A tribe node connects to multiple clusters and performs search and other operations across all connected clusters. It is configured through the tribe.* settings. Tribe nodes were deprecated in favor of cross-cluster search and removed in Elasticsearch 7.0.

If you take away a node's ability to handle master duties, to hold data, and to pre-process documents, you are left with a coordinating node that can only route requests, handle the search reduce phase, and distribute bulk indexing. Essentially, a coordinating-only node behaves like a smart load balancer.

Every node is implicitly a coordinating node. A node that has node.master, node.data, and node.ingest all set to false acts only as a coordinating node, and this role cannot be disabled. As a result, such a node needs enough memory and CPU to deal with the gather phase.

 

What is inverted index in Elasticsearch?

The inverted index is the backbone of Elasticsearch and is what makes full-text search fast. It consists of a list of all the unique words that occur in the documents and, for each word, a list of the documents and positions in which it appears.

For example, suppose there are two documents with the following content:

1: FacingIssuesOnIT is for ELK.

2: If ELK check FacingIssuesOnIT.

To build the inverted index, each document is split into words (also called terms or tokens), and the following index of unique terms is created.

 
Term              Doc_1  Doc_2
FacingIssuesOnIT  X      X
Is                X      -
For               X      -
ELK               X      X
If                -      X
Check             -      X

When we run a full-text search for a string, Elasticsearch looks up each search term in the inverted index and ranks the documents based on which terms they contain and how often the terms match.

Books usually have a similar index on their last pages: given a word, we can find the pages on which it appears.
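The idea can be sketched in a few lines of Python using the two example documents. This is a toy model under simple assumptions (lowercasing, splitting on word characters, ranking by the number of matching terms); the scoring Elasticsearch actually performs is more involved.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        terms = re.findall(r"\w+", text.lower())
        for position, term in enumerate(terms):
            index[term].setdefault(doc_id, []).append(position)
    return index

def search(index, query):
    """Rank documents by how many of the query terms they contain."""
    scores = defaultdict(int)
    for term in re.findall(r"\w+", query.lower()):
        for doc_id in index.get(term, {}):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

docs = {
    1: "FacingIssuesOnIT is for ELK.",
    2: "If ELK check FacingIssuesOnIT.",
}
index = build_inverted_index(docs)

for term in sorted(index):
    print(term, index[term])

print(search(index, "check ELK"))
```

A search for "check ELK" ranks document 2 first because it contains both terms, while document 1 contains only one of them.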

What is a shard?

An index can potentially store a large amount of data that can exceed the hardware limits of a single node. For example, a single index of a billion documents taking up 1TB of disk space may not fit on the disk of a single node or may be too slow to serve search requests from a single node alone.

To solve this problem, Elasticsearch provides the ability to subdivide your index into multiple pieces called shards. When you create an index, you can simply define the number of shards that you want. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node in the cluster.

Sharding is important for two primary reasons:

  • It allows you to horizontally split/scale your content volume
  • It allows you to distribute and parallelize operations across shards (potentially on multiple nodes) thus increasing performance/throughput

What is a replica?

Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.

Replication is important for two primary reasons:

  • It provides high availability in case a shard/node fails. For this reason, it is important to note that a replica shard is never allocated on the same node as the original/primary shard that it was copied from.
  • It allows you to scale out your search volume/throughput since searches can be executed on all replicas in parallel.
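The number of primary shards and replicas is chosen in the index settings. The sketch below only builds the request body and counts the resulting shard copies; the helper names are hypothetical, while number_of_shards and number_of_replicas are the index settings themselves.

```python
import json

def index_settings(number_of_shards, number_of_replicas):
    """Build the body for an index-creation request."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }

def total_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard exists once, plus one copy per configured replica."""
    return number_of_shards * (1 + number_of_replicas)

body = index_settings(number_of_shards=3, number_of_replicas=1)
print(json.dumps(body))
print("shard copies in the cluster:", total_shard_copies(3, 1))
```

With 3 primary shards and 1 replica there are 6 shard copies in total. Because a replica is never placed on the same node as its primary, at least two nodes are needed before every copy can be allocated.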

What is a document in ElasticSearch?

A document is a basic unit of information that can be indexed. For example, you can have a document for a single customer, another document for a single product, and yet another for a single order. This document is expressed in JSON (JavaScript Object Notation) which is a ubiquitous internet data interchange format.

Within an index, you can store as many documents as you want. In older versions that still use mapping types, a document must also be assigned to a type inside its index, even though it physically resides in the index.

What are the basic operations you can perform on a document?

The following operations can be performed on documents:

  • Indexing a document
  • Fetching a document
  • Updating a document
  • Deleting a document
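These four operations map onto REST calls. The sketch below only builds the method, path, and body for each call rather than sending anything; the index name, document id, and field values are hypothetical, and the typeless _doc and _update endpoints assume Elasticsearch 7.x or later.

```python
import json

def document_requests(index, doc_id, document, changes):
    """Return (method, path, body) tuples for the four basic operations."""
    return [
        ("PUT", f"/{index}/_doc/{doc_id}", document),              # index
        ("GET", f"/{index}/_doc/{doc_id}", None),                   # fetch
        ("POST", f"/{index}/_update/{doc_id}", {"doc": changes}),   # partial update
        ("DELETE", f"/{index}/_doc/{doc_id}", None),                # delete
    ]

requests_to_send = document_requests(
    "customer", 1, {"name": "John Doe"}, {"name": "Jane Doe"}
)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body) if body is not None else "")
```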

What is the difference between a query and a filter in Elasticsearch?

A query in Elasticsearch retrieves documents that match a set of criteria and, in query context, also calculates a relevance score that says how well each document matches. A filter, on the other hand, only answers whether a document matches or not. Filters do not affect scoring, can be cached, and are typically used to narrow down results on exact conditions such as a status value or a date range.
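Both can be combined in a single bool query. The search body below is a sketch with hypothetical field names and values: the must clause runs as a scoring query, while the filter clauses only include or exclude documents.

```python
import json

search_body = {
    "query": {
        "bool": {
            # Scoring clause: contributes to the relevance score.
            "must": [{"match": {"title": "elasticsearch interview"}}],
            # Filter clauses: yes/no matching only, no effect on the score.
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2020-01-01"}}},
            ],
        }
    }
}

# This JSON would be sent as the body of a search request.
print(json.dumps(search_body, indent=2))
```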

Can you explain how updates and deletes work in Elasticsearch?

Documents in Elasticsearch are immutable, so an update is performed by reindexing the document. When a document is updated, Elasticsearch retrieves the existing document, applies the change, marks the old version as deleted, and indexes the new version. When a document is deleted, it is not removed from disk immediately; it is only marked as deleted and is physically removed later, when segments are merged.
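One way to picture this is an append-only store in which an update writes a new version and a delete only sets a marker. The class below is a toy mental model with hypothetical names, not how Elasticsearch or Lucene is implemented.

```python
class ToyShard:
    """Toy model: entries are append-only and deletes only set a marker."""

    def __init__(self):
        self.entries = []   # each entry: [doc_id, version, source, deleted]
        self.versions = {}  # doc_id -> latest version number

    def _mark_deleted(self, doc_id):
        for entry in self.entries:
            if entry[0] == doc_id:
                entry[3] = True

    def index(self, doc_id, source):
        """Index or update: mark any old copy as deleted, append a new version."""
        self._mark_deleted(doc_id)
        version = self.versions.get(doc_id, 0) + 1
        self.versions[doc_id] = version
        self.entries.append([doc_id, version, source, False])
        return version

    def delete(self, doc_id):
        """Delete: only mark the document; nothing is removed yet."""
        self._mark_deleted(doc_id)

    def get(self, doc_id):
        """Return the live copy of a document, or None."""
        for entry in self.entries:
            if entry[0] == doc_id and not entry[3]:
                return entry[2]
        return None

    def merge(self):
        """Physically drop entries that were marked as deleted."""
        self.entries = [e for e in self.entries if not e[3]]

shard = ToyShard()
shard.index(1, {"name": "John"})
shard.index(1, {"name": "Jane"})      # update: old copy is only marked deleted
print(shard.get(1), len(shard.entries))
shard.merge()                         # marked copies are dropped here
print(len(shard.entries))
```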

 
