Working with databases can be challenging, especially when you need to change data that already exists.
For example, if you want to change the type of a specific field, it might require you to take the service down, which can have grave repercussions, especially in services that process large amounts of data.
Fortunately, Elasticsearch features such as reindexing, ingest nodes, pipelines, and processors make such tasks straightforward.
This tutorial shows how to change the type of a field in an index using Elasticsearch ingest nodes. This approach avoids the downtime that would affect services while still performing the field type change.
Introduction to Ingest Nodes
Elasticsearch’s ingest node allows you to pre-process documents before they are indexed.
A node is a single running instance of Elasticsearch; one or more connected nodes form a cluster.
You can view the nodes available in the running cluster with the request:
GET /_nodes
The cURL command for this is:
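A minimal sketch, assuming the cluster is reachable at http://localhost:9200 without authentication (the `ES_URL` variable and the `list_nodes` function name are illustrative, not part of Elasticsearch):

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch information about every node in the cluster (GET /_nodes).
list_nodes() {
  curl -s -X GET "${ES_URL}/_nodes?pretty"
}
```

Call `list_nodes` to send the request; set `ES_URL` first if your cluster lives elsewhere.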
Executing this command returns detailed information about the nodes, as shown below (truncated output):
{
"_nodes" : {
"total" : 3,
"successful" : 3,
"failed" : 0
},
"cluster_name" : "22e0bee6ef91461d82d9b0f1b4b13b4a",
"nodes" : {
"gSlMjTKyTemoOX-EO7Em4w" : {
"name" : "instance-0000000003",
"transport_address" : "172.28.86.133:19925",
"host" : "172.28.86.133",
"ip" : "172.28.86.133",
"version" : "7.10.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
"total_indexing_buffer" : 214748364,
"roles" : [
"data",
"data_cold",
"data_content",
"data_hot",
"data_warm",
"ingest",
"master",
"remote_cluster_client",
"transform"
],
"attributes" : {
"logical_availability_zone" : "zone-0",
"server_name" : "instance-0000000003.22e0bee6ef91461d82d9b0f1b4b13b4a",
"availability_zone" : "us-west-1c",
"xpack.installed" : "true",
"instance_configuration" : "aws.data.highio.i3",
"transform.node" : "true",
"region" : "us-west-1"
},
"settings" : {
"s3" : {
"client" : {
"elastic-internal-22e0be" : {
"endpoint" : "s3-us-west-1.amazonaws.com"
}
}
},
--------------------------------output truncated---------------------
By default, every Elasticsearch node has the ingest role enabled and can handle ingest operations. For heavy ingest workloads, however, you can dedicate a node to ingest only.
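To check which nodes currently carry the ingest role, one option is the cat nodes API. The sketch below assumes an unauthenticated cluster at http://localhost:9200; `ES_URL` and `list_node_roles` are illustrative names:

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print each node's name and its roles as a compact table.
# In the node.role column, the letter "i" marks an ingest node.
list_node_roles() {
  curl -s -X GET "${ES_URL}/_cat/nodes?v&h=name,node.role"
}
```

Call `list_node_roles` and look for `i` in the role column of each row.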
To pre-process documents before they are indexed, we define a pipeline that specifies a series of processors.
Processors are instructions wrapped in a pipeline; they are executed one at a time, in the order listed.
The following is the general syntax for defining a pipeline:
{
"description" : "Convert me",
"processors" : [
{
"convert" : {
"field" : "id",
"type" : "integer"
}
}
]
}
The description property says what the pipeline should achieve. The processors parameter is a list of processors, given in the order of their execution.
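Before storing a pipeline, it can be tried out against sample documents with the simulate pipeline API. The sketch below assumes an unauthenticated cluster at http://localhost:9200; the sample document with `"id": "42"` is invented for illustration, and `ES_URL`, `SIMULATE_BODY`, and `simulate_pipeline` are hypothetical names:

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# The pipeline definition from above plus one made-up test document
# whose "id" is a string that the convert processor should turn into an integer.
SIMULATE_BODY='{
  "pipeline": {
    "description": "Convert me",
    "processors": [
      { "convert": { "field": "id", "type": "integer" } }
    ]
  },
  "docs": [
    { "_source": { "id": "42" } }
  ]
}'

# Run the pipeline against the sample documents without indexing anything.
simulate_pipeline() {
  curl -s -X POST "${ES_URL}/_ingest/pipeline/_simulate?pretty" \
    -H 'Content-Type: application/json' \
    -d "${SIMULATE_BODY}"
}
```

Calling `simulate_pipeline` should return the transformed document, with `id` as a number rather than a string, so mistakes in a processor definition surface before any real data is touched.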
Create a Convert Pipeline
To create the pipeline that will convert the field type, send a PUT request to the _ingest API endpoint:
PUT _ingest/pipeline/convert_pipeline
{
"description": "Converts the dayOfWeek field from integer to long",
"processors" : [
{
"convert" : {
"field" : "dayOfWeek",
"type": "long"
}
}
]
}
For cURL, use the command:
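A minimal sketch, assuming an unauthenticated cluster at http://localhost:9200 and the pipeline name convert_pipeline used in the reindex request later in this tutorial (`ES_URL`, `PIPELINE_BODY`, and `create_convert_pipeline` are illustrative names):

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Pipeline with a single convert processor for the dayOfWeek field.
PIPELINE_BODY='{
  "description": "Converts the dayOfWeek field from integer to long",
  "processors": [
    { "convert": { "field": "dayOfWeek", "type": "long" } }
  ]
}'

# Create (or overwrite) the pipeline named convert_pipeline.
create_convert_pipeline() {
  curl -s -X PUT "${ES_URL}/_ingest/pipeline/convert_pipeline?pretty" \
    -H 'Content-Type: application/json' \
    -d "${PIPELINE_BODY}"
}
```

Call `create_convert_pipeline`; a successful request should be acknowledged by the cluster.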
Reindex and Convert Type
Once the pipeline exists on the ingest node, all we need to do is call the reindex API and pass the pipeline name in the dest section of the request body:
POST _reindex
{
"source": {
"index": "kibana_sample_data_flights"
},
"dest": {
"index": "kibana_sample_type_diff",
"pipeline": "convert_pipeline"
}
}
For cURL:
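A minimal sketch, assuming an unauthenticated cluster at http://localhost:9200 (`ES_URL`, `REINDEX_BODY`, and `reindex_with_pipeline` are illustrative names):

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Copy documents from the source index into the new index,
# passing each one through convert_pipeline on the way.
REINDEX_BODY='{
  "source": { "index": "kibana_sample_data_flights" },
  "dest": {
    "index": "kibana_sample_type_diff",
    "pipeline": "convert_pipeline"
  }
}'

reindex_with_pipeline() {
  curl -s -X POST "${ES_URL}/_reindex?pretty" \
    -H 'Content-Type: application/json' \
    -d "${REINDEX_BODY}"
}
```

Call `reindex_with_pipeline` after the pipeline has been created. The source index is only read, never modified, so it stays available while the copy runs.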
Verify Conversion
To verify that the pipeline was applied correctly, fetch the mapping of that specific field with a GET request:
GET /kibana_sample_type_diff/_mapping/field/dayOfWeek
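For comparison, the output below shows the result of the same request against the source index kibana_sample_data_flights, followed by the result for the reindexed index: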
{
"kibana_sample_data_flights" : {
"mappings" : {
"dayOfWeek" : {
"full_name" : "dayOfWeek",
"mapping" : {
"dayOfWeek" : {
"type" : "integer"
}
}
}
}
}
}
-------------------------REINDEXED DATA-------------------------------
{
"kibana_sample_type_diff" : {
"mappings" : {
"dayOfWeek" : {
"full_name" : "dayOfWeek",
"mapping" : {
"dayOfWeek" : {
"type" : "long"
}
}
}
}
}
}
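The two lookups above can also be run with cURL. This is a sketch assuming an unauthenticated cluster at http://localhost:9200; `ES_URL` and `get_field_mapping` are illustrative names:

```shell
# Base URL of the cluster; localhost:9200 is an assumed default.
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch the mapping of one field in one index.
# $1 = index name, $2 = field name
get_field_mapping() {
  curl -s -X GET "${ES_URL}/$1/_mapping/field/$2?pretty"
}
```

For example, `get_field_mapping kibana_sample_data_flights dayOfWeek` followed by `get_field_mapping kibana_sample_type_diff dayOfWeek` lets you compare the field type before and after reindexing.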
Conclusion
In this guide, we have looked at how to work with Elasticsearch Ingest nodes to pre-process documents before indexing, thus converting a field from one type to another.
See the documentation to learn more:
https://www.elastic.co/guide/en/elasticsearch/reference/master/ingest.html