How Product Data is Pushed to Elasticsearch from Magento

To serve data from Elasticsearch, we first need to push the data to Elasticsearch. This is done by the catalog search fulltext indexer (catalogsearch_fulltext).

Let's explore this in more detail in the sections below.

Identify the Relevant Products to Push to Elasticsearch

Products that are enabled and visible in search (visibility "Search" or "Catalog, Search") are pushed to Elasticsearch. Each product is stored in Elasticsearch as a document.
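The selection rule can be sketched in Python using Magento's numeric codes (status 1 = Enabled; visibility 3 = Search, 4 = Catalog, Search). The dict-based product records and the helper name are hypothetical; Magento performs this selection with a database query when it fetches products for the index.

```python
# Magento's numeric codes (Status::STATUS_ENABLED and Visibility::VISIBILITY_*)
STATUS_ENABLED = 1
VISIBILITY_NOT_VISIBLE = 1
VISIBILITY_IN_CATALOG = 2
VISIBILITY_IN_SEARCH = 3
VISIBILITY_BOTH = 4

SEARCHABLE_VISIBILITY = {VISIBILITY_IN_SEARCH, VISIBILITY_BOTH}


def searchable_products(products):
    """Hypothetical helper: keep enabled products that are visible in search."""
    return [
        p for p in products
        if p["status"] == STATUS_ENABLED and p["visibility"] in SEARCHABLE_VISIBILITY
    ]


products = [
    {"sku": "shirt", "status": 1, "visibility": 4},
    {"sku": "shirt-red", "status": 1, "visibility": 1},  # child, not visible individually
    {"sku": "mug", "status": 2, "visibility": 4},        # disabled
    {"sku": "hat", "status": 1, "visibility": 3},
]
print([p["sku"] for p in searchable_products(products)])  # ['shirt', 'hat']
```

A child such as shirt-red is not indexed as its own document here; as described next, its data reaches Elasticsearch through its configurable parent.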

A sample product's data would look like the following.

For a configurable product, the data from its child products is merged into the parent's data before it is pushed to Elasticsearch. For example, consider the configurable product below, which has 3 children, each with its own barcode.

Configurable product:
  • Name: Shirt
  • Barcode: <null>

Child products:
  • Name: Red Shirt, Barcode: 10001
  • Name: Blue Shirt, Barcode: 10002
  • Name: Green Shirt, Barcode: 10003

Elasticsearch data:
{
  "Name": ["Shirt", "Red Shirt", "Blue Shirt", "Green Shirt"],
  "Barcode": [10001, 10002, 10003]
}
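The merge can be sketched in a few lines of Python. This is not Magento's code (the real logic lives in its PHP indexer classes); the function name, the dict-based products, and the choice to keep duplicate values are assumptions made for illustration.

```python
def merge_configurable(parent, children, attributes):
    """Hypothetical sketch: build one document from a parent and its children.

    For each attribute, the parent's value comes first, followed by each
    child's value; missing (None) values are skipped. Duplicates are kept.
    """
    document = {}
    for attribute in attributes:
        values = []
        for product in [parent, *children]:
            value = product.get(attribute)
            if value is not None:
                values.append(value)
        document[attribute] = values
    return document


parent = {"Name": "Shirt", "Barcode": None}
children = [
    {"Name": "Red Shirt", "Barcode": 10001},
    {"Name": "Blue Shirt", "Barcode": 10002},
    {"Name": "Green Shirt", "Barcode": 10003},
]
print(merge_configurable(parent, children, ["Name", "Barcode"]))
# {'Name': ['Shirt', 'Red Shirt', 'Blue Shirt', 'Green Shirt'], 'Barcode': [10001, 10002, 10003]}
```

Because the parent's Barcode is empty, only the children's barcodes appear, while Name collects all four values.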

Code Reference:

  • \Magento\CatalogSearch\Model\Indexer\Fulltext\Action\Full::rebuildStoreIndex
  • \Magento\CatalogSearch\Model\Indexer\Fulltext\Action\DataProvider::getSearchableProducts
    • Fetches all the products that need to be sent to Elasticsearch

Generate Elasticsearch Index Name

The first step in creating the Elasticsearch index is to generate an index name. Magento creates a separate Elasticsearch index for each store view. For example, if we have 4 store views, then 4 indices are created, as shown below.

store_id | code   | website_id | name                 | ES Index Name
1        | kwt_en | 1          | Kuwait English Store | commerce_product_1_v15, commerce_product_1_v16
2        | kwt_ar | 1          | Kuwait Arabic Store  | commerce_product_2_v11
3        | ind_en | 2          | India English Store  | commerce_product_3_v11
4        | gbr_en | 3          | UK Eng Store View    | commerce_product_4_v11

At times there are multiple indices for a store view. In the table above, the Kuwait English Store has two Elasticsearch indices; we will discuss why later.

The index name “commerce_product_1_v15” consists of the following parts:

  • commerce: the index prefix, which is configurable
  • product: denotes the product index
  • [1|2|3..]: the store ID
  • [v15 | v11]: the index version number, which is incremented each time a new index is created for the store (see the steps below)

The following steps determine the index name for a store view.

For example, suppose we are reindexing the Kuwait English Store:

  • Identify the primary index of the Kuwait English Store. The primary index of a store is identified through its Elasticsearch alias, which is covered in the alias section below.
  • Remove all unwanted or orphaned indices of the Kuwait English Store other than the primary index.
    • In our case, let's assume we have two indices, commerce_product_1_v15 and commerce_product_1_v16, where commerce_product_1_v16 is the primary index, so commerce_product_1_v15 is deleted.
  • Increment the version of the current primary index by one, so the new index name is commerce_product_1_v17.
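The naming steps above can be sketched in Python. The function name and its inputs are hypothetical: in Magento the primary index comes from the alias lookup and the existing indices come from the cluster, whereas here both are passed in as plain values. The sketch also assumes that a store with no primary index starts at _v1.

```python
def plan_index_rotation(existing, primary, prefix, store_id):
    """Hypothetical sketch: decide which indices to delete and the next name.

    existing -- every index name currently in the cluster
    primary  -- the index the store's alias points to, or None on first run
    Returns (indices_to_delete, new_index_name).
    """
    base = f"{prefix}_product_{store_id}"
    # Orphans: this store's versioned indices other than the primary one.
    to_delete = sorted(
        name for name in existing
        if name.startswith(base + "_v") and name != primary
    )
    # The version is the number after the last "_v" in the primary's name.
    version = int(primary.rsplit("_v", 1)[1]) if primary else 0
    return to_delete, f"{base}_v{version + 1}"


existing = ["commerce_product_1_v15", "commerce_product_1_v16", "commerce_product_2_v11"]
print(plan_index_rotation(existing, "commerce_product_1_v16", "commerce", 1))
# (['commerce_product_1_v15'], 'commerce_product_1_v17')
```

The "_v" suffix in the prefix check keeps store 1 from matching indices of store 10 or 11.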

Code Reference

  • \Magento\CatalogSearch\Model\Indexer\Fulltext::executeByDimensions
    • Indexing Process Starts Here
  • \Magento\Elasticsearch\Model\Indexer\IndexerHandler::cleanIndex
    • Deletes the old, unwanted indices and creates the new index
  • \Magento\Elasticsearch7\Model\Client\Elasticsearch::existsAlias
    • Checks whether the alias exists

Create the Index

Once the index name is identified, we can now create the new index.

Code Reference:

  • \Magento\Elasticsearch\Model\Indexer\IndexerHandler::saveIndex

Index Settings

While the index is being created, index-level settings such as filters, analyzers, and the mapping field count limit are also set.

Request: PUT http://localhost:9200/commerce_product_1_v17

Body:

Response:
{
"acknowledged":true,
"shards_acknowledged":true,
"index":"commerce_product_1_v17"
}

The response indicates that we have created the new index commerce_product_1_v17 along with the settings.
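To show the shape of this request, the sketch below uses Python's standard library to build (without sending) a PUT request that creates an index with settings. The settings are placeholders chosen for the example, not the values Magento sends; index.mapping.total_fields.limit and a custom analyzer under analysis are real Elasticsearch settings, while the analyzer name example_analyzer and the limit of 2000 are hypothetical.

```python
import json
import urllib.request


def build_create_index_request(host, index_name, settings, mappings=None):
    """Build (but do not send) the PUT request that creates an index."""
    body = {"settings": settings}
    if mappings is not None:
        body["mappings"] = mappings
    return urllib.request.Request(
        url=f"{host}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Illustrative settings only; the values Magento actually sends differ.
settings = {
    "index": {"mapping": {"total_fields": {"limit": 2000}}},
    "analysis": {
        "analyzer": {
            "example_analyzer": {
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        }
    },
}

request = build_create_index_request("http://localhost:9200", "commerce_product_1_v17", settings)
print(request.get_method(), request.full_url)  # PUT http://localhost:9200/commerce_product_1_v17
# To actually send it to a running cluster: urllib.request.urlopen(request)
```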

Code Reference:

  • \Magento\Elasticsearch\Model\Adapter\Elasticsearch::prepareIndex
    • Creates the new index with its mapping.

Analyzer

The analyzer parameter specifies the analyzer used for text analysis when indexing or searching a text field.

An analyzer, whether built-in or custom, is just a package that contains three lower-level building blocks: character filters, tokenizers, and token filters.

Tokenizer

A tokenizer receives a stream of characters, breaks it up into individual tokens (usually individual words), and outputs a stream of tokens. For instance, a whitespace tokenizer breaks text into tokens whenever it sees any whitespace. It would convert the text "Quick brown fox!" into the terms [Quick, brown, fox!].

Character filters

A character filter receives the original text as a stream of characters and can transform the stream by adding, removing, or changing characters. For example, the html_strip character filter removes HTML elements such as <b>.

Token filter

A token filter receives the token stream and may add, remove, or change tokens. For example, a lowercase token filter converts all tokens to lowercase, a stop token filter removes common words (stop words) such as "the" from the token stream, and a synonym token filter introduces synonyms into the token stream.
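To make the three building blocks concrete, here is a toy Python pipeline that applies them in the order Elasticsearch does: character filters, then the tokenizer, then token filters. It only imitates Elasticsearch's html_strip, whitespace, lowercase, stop, and synonym components; the stop-word list and the synonym map are made up for the example.

```python
import re

STOP_WORDS = {"the", "a", "an"}      # made-up stop-word list
SYNONYMS = {"tee": ["t-shirt"]}      # made-up synonym map


def html_strip(text):
    # Character filter: drop anything that looks like an HTML tag.
    return re.sub(r"<[^>]+>", "", text)


def whitespace_tokenize(text):
    # Tokenizer: split on runs of whitespace; punctuation stays attached.
    return text.split()


def lowercase(tokens):
    return [t.lower() for t in tokens]


def stop(tokens):
    return [t for t in tokens if t not in STOP_WORDS]


def synonym(tokens):
    # Keep each token and append its synonyms right after it.
    out = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


def analyze(text):
    tokens = whitespace_tokenize(html_strip(text))
    for token_filter in (lowercase, stop, synonym):
        tokens = token_filter(tokens)
    return tokens


print(whitespace_tokenize("Quick brown fox!"))  # ['Quick', 'brown', 'fox!']
print(analyze("<b>The</b> Quick brown tee"))    # ['quick', 'brown', 'tee', 't-shirt']
```

Filter order matters: the stop filter runs after lowercase here, so "The" is lowercased first and then removed.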

Code Reference

  • \Magento\Elasticsearch\Model\Adapter\Index\Builder::build
    • Set the analyzer setting

Create Index Mapping

Before we start pushing the data to the index, we need to create the index mapping.

A mapping is created for every product attribute available in the store.

The mapping type is set based on the product attribute type. Below are a few sample mappings.

Searchable Product Attributes

All product attributes with "Use in Search" enabled are pushed to Elasticsearch.

The relevant mappings are then set based on the attribute's input type.

For example, let's look at the product data below.

Name | Sku | Url Key | Description

The copy_to parameter allows you to copy the values of multiple fields into a group field, which can then be queried as a single field.

So when searching for products, whether on a listing page, through GraphQL, or elsewhere, Magento automatically adds the following match query.


So instead of searching each field individually, we search the _search field, since all the values are copied there.

For the name attribute we see fields.sort_[attributecode]. This is because the name attribute has sorting enabled ("Used for Sorting in Product Listing").

The description attribute has no keyword mapping, which means we cannot use the eq operator on it.

We can, however, use the match operator.

In other words, the eq operator cannot be used to search any attribute that lacks a keyword mapping.
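A small Python imitation shows what copy_to achieves. The mapping below is a hypothetical simplification (Magento's real mapping has more fields and options), and the search query is an illustrative match on _search rather than the exact query Magento builds. Note that real Elasticsearch indexes the copied values but does not add them to the stored _source; the sketch returns them as a list only so they are visible.

```python
# Hypothetical, simplified mapping: each searchable field copies into _search.
MAPPING = {
    "name": {"type": "text", "copy_to": ["_search"]},
    "sku": {"type": "text", "copy_to": ["_search"]},
    "description": {"type": "text", "copy_to": ["_search"]},
    "_search": {"type": "text"},
}


def apply_copy_to(document, mapping):
    """Imitate copy_to: gather each source value into its target fields."""
    result = dict(document)
    for field, value in document.items():
        for target in mapping.get(field, {}).get("copy_to", []):
            result.setdefault(target, []).append(value)
    return result


def search_query(text):
    # One match query against the group field instead of one per attribute.
    return {"query": {"match": {"_search": text}}}


doc = {"name": "Red Shirt", "sku": "SHIRT-RED", "description": "Cotton shirt"}
print(apply_copy_to(doc, MAPPING)["_search"])  # ['Red Shirt', 'SHIRT-RED', 'Cotton shirt']
print(search_query("shirt"))
```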
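The rule can be expressed as a small Python sketch that translates a GraphQL-style filter into an Elasticsearch clause. The function name and the mapping (an sku field with a keyword sub-field, a text-only description) are assumptions for illustration; Magento's own filter-to-query translation is more involved.

```python
def build_clause(field, operator, value, mapping):
    """Hypothetical sketch: turn a GraphQL-style filter into an ES clause.

    eq needs an exact (keyword) field; match works on analyzed text.
    """
    props = mapping[field]
    if operator == "eq":
        if props.get("type") == "keyword":
            return {"term": {field: value}}
        # Fall back to a keyword sub-field, if the mapping defines one.
        for sub_name, sub in props.get("fields", {}).items():
            if sub.get("type") == "keyword":
                return {"term": {f"{field}.{sub_name}": value}}
        raise ValueError(f"{field} has no keyword mapping; use match instead of eq")
    if operator == "match":
        return {"match": {field: value}}
    raise ValueError(f"unsupported operator: {operator}")


MAPPING = {
    "sku": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "description": {"type": "text"},
}
print(build_clause("sku", "eq", "SHIRT-RED", MAPPING))
# {'term': {'sku.keyword': 'SHIRT-RED'}}
print(build_clause("description", "match", "cotton", MAPPING))
# {'match': {'description': 'cotton'}}
```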

Code Reference:

  • \Magento\Elasticsearch\Model\Adapter\FieldMapper\Product\CompositeFieldProvider::getFields
    • Sets the mappings, static or dynamic, based on the product attribute properties
  • \Magento\Elasticsearch\Model\Adapter\FieldMapper\Product\FieldProvider\StaticField::getField
  • \Magento\Elasticsearch7\Model\Client\Elasticsearch::addFieldsMapping
    • Sets the mapping for each attribute
  • \Magento\Elasticsearch7\Model\Client\Elasticsearch::applyFieldsMappingPreprocessors
    • Applies the mapping preprocessors

If we need to customize the mapping, we can register our own preprocessor here (di.xml):

<type name="Magento\Elasticsearch7\Model\Client\Elasticsearch">
    <arguments>
        <argument name="fieldsMappingPreprocessors" xsi:type="array">
            <item name="elasticsearch7_nested_type_field_mapping" xsi:type="object">YourModule\NestedFieldMapping</item>
        </argument>
    </arguments>
</type>

  • \Elasticsearch\Namespaces\IndicesNamespace::putMapping
    • Saves the mapping

Filterable Product Attributes

All filterable attributes have both a text and a keyword mapping, represented in two fields: attrcode and attrcode_value.
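A rough sketch of the two-field layout for a select attribute such as color follows. The assumption here, not stated above, is that attrcode holds the stored option ID used for exact filtering and attrcode_value holds the option label used for text search; the attribute code, option ID, and label are hypothetical.

```python
def filterable_fields(attribute_code, option_id, option_label):
    """Hypothetical sketch: one select attribute becomes two document fields.

    attrcode holds the raw stored value (assumed: the option ID) for exact
    filtering; attrcode_value holds the readable label for text search.
    """
    return {attribute_code: option_id, f"{attribute_code}_value": option_label}


print(filterable_fields("color", 49, "Red"))  # {'color': 49, 'color_value': 'Red'}
```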

Non-Indexed Attributes
The values of these attributes are not pushed to Elasticsearch, but their fields are still added to the mapping, with indexing disabled ("index": false).

Sorting Fields

Certain fields are used for sorting, such as Position, Product Name, and Price.

For position sorting, a position_category_[category_id] field is created in the mapping. This field is created for every category available in the store.


While fetching products from Elasticsearch, we issue the sorting query as follows.
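As an illustration of how one position field per category pairs with a sort clause, here is a short Python sketch. The category IDs and positions are hypothetical, and Magento's real query contains more than this single sort clause; the {"field": {"order": "asc"}} form is standard Elasticsearch sort syntax.

```python
def position_fields(category_positions):
    """Hypothetical sketch: one position_category_<id> field per category."""
    return {f"position_category_{cid}": pos for cid, pos in category_positions.items()}


def position_sort(category_id, direction="asc"):
    # Sort clause for a category listing page ordered by position.
    return {"sort": [{f"position_category_{category_id}": {"order": direction}}]}


print(position_fields({3: 1, 7: 12}))  # {'position_category_3': 1, 'position_category_7': 12}
print(position_sort(3))                # {'sort': [{'position_category_3': {'order': 'asc'}}]}
```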

Elasticsearch Alias

An alias is a secondary name for one or more indices. Magento creates an alias for each store view, and the versioned index is assigned to that alias.

In our case, exactly one index (the primary index) is always assigned to the alias, as shown below.

Data is always fetched from Elasticsearch through the alias, for example:

http://localhost:9200/commerce_product_1/document/_search?

During a full reindex, Magento does not touch the current primary index assigned to the alias. Instead, it creates a new index and pushes the data to it. Once all the data has been pushed, the alias is switched to the new index and the old index is deleted.
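The alias switch maps onto Elasticsearch's _aliases endpoint, which applies a list of add/remove actions atomically, so searches never hit a moment with no index behind the alias. The sketch below builds (but does not send) that request and the follow-up index deletion with Python's standard library; whether Magento sends exactly these bodies is an assumption.

```python
import json
import urllib.request


def build_alias_switch(host, alias, old_index, new_index):
    """Build one _aliases request that moves `alias` from old_index to new_index."""
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return urllib.request.Request(
        url=f"{host}/_aliases",
        data=json.dumps({"actions": actions}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_index(host, index_name):
    """Build the request that deletes the previous primary index."""
    return urllib.request.Request(url=f"{host}/{index_name}", method="DELETE")


switch = build_alias_switch(
    "http://localhost:9200", "commerce_product_1",
    "commerce_product_1_v16", "commerce_product_1_v17",
)
cleanup = build_delete_index("http://localhost:9200", "commerce_product_1_v16")
print(switch.get_method(), switch.full_url, switch.data.decode("utf-8"))
print(cleanup.get_method(), cleanup.full_url)
# Against a real cluster, send them in this order with urllib.request.urlopen(...).
```

Putting both actions in one request is the design point: a separate remove followed by a separate add would leave a short window in which the alias resolves to nothing.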

Index Processing Summary

Creating the Elasticsearch index for a store involves the following steps. For example, suppose we are reindexing the Kuwait English Store:

  • First, remove all unwanted indices of the store other than its primary index.
    • In our case, let's assume we have two indices, commerce_product_1_v15 and commerce_product_1_v16, where commerce_product_1_v16 is the primary index, so commerce_product_1_v15 is deleted.
  • Create a new Elasticsearch index (commerce_product_1_v17) with the index settings and mapping.
  • Push the data to the new index.
  • Once all the data has been pushed to the new index, it is set as the primary index and the old one, commerce_product_1_v16, is removed.
  • So during a full reindex, the frontend continues to be served without interruption from the current primary index. When the reindex completes, the new index becomes the primary index and the old one is deleted.
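The whole cycle can be simulated in memory to show why the frontend is never interrupted: reads go through the alias, and the alias only moves after the new index is fully populated. FakeCluster and full_reindex are hypothetical stand-ins, not Magento or Elasticsearch APIs, and the sketch assumes the alias name is the index name without its version suffix.

```python
class FakeCluster:
    """In-memory stand-in for an Elasticsearch cluster (hypothetical)."""

    def __init__(self):
        self.indices = {}  # index name -> list of documents
        self.aliases = {}  # alias name -> index name

    def search(self, alias):
        # Reads always go through the alias, never a versioned name.
        return self.indices[self.aliases[alias]]


def full_reindex(cluster, alias, documents):
    primary = cluster.aliases.get(alias)
    # 1. Drop orphaned versions of this store's index, keeping the primary.
    for name in list(cluster.indices):
        if name.startswith(alias + "_v") and name != primary:
            del cluster.indices[name]
    # 2. Create the next version (v1 when there is no primary yet).
    version = int(primary.rsplit("_v", 1)[1]) + 1 if primary else 1
    new_index = f"{alias}_v{version}"
    cluster.indices[new_index] = []
    # 3. Push data; searches still hit the old primary meanwhile.
    for doc in documents:
        cluster.indices[new_index].append(doc)
    # 4. Switch the alias, then remove the old primary.
    cluster.aliases[alias] = new_index
    if primary:
        del cluster.indices[primary]
    return new_index


cluster = FakeCluster()
cluster.indices = {"commerce_product_1_v15": [], "commerce_product_1_v16": [{"sku": "old"}]}
cluster.aliases = {"commerce_product_1": "commerce_product_1_v16"}
new = full_reindex(cluster, "commerce_product_1", [{"sku": "shirt"}])
print(new, sorted(cluster.indices), cluster.search("commerce_product_1"))
# commerce_product_1_v17 ['commerce_product_1_v17'] [{'sku': 'shirt'}]
```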
