01. Search API Overview

Published: 2024/2/28


Contents

- 1. Search API introduction
  - 1. Routing
  - 2. How Elasticsearch selects a replica
  - 3. Stats Groups
  - 3. Global Search Timeout
  - 4. Search Cancellation
  - 5. Search concurrency and parallelism
  - 6. Searching multiple indices with the Search API

1. Search API introduction

Most search APIs accept multiple indices; the Explain API endpoints are the exception.

1. Routing

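When executing a search, Elasticsearch picks the "best" copy of the data using the adaptive replica selection formula. You can also control which shards are searched by providing a routing parameter. For example, when indexing tweets, the routing value can be the user name: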

POST /twitter/_doc?routing=kimchy
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}

This fits the case where documents are normally looked up by user name: the search request is routed only to the relevant shards, which speeds up the query.

POST /twitter/_search?routing=kimchy
{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "some query string here"
        }
      },
      "filter": {
        "term": { "user": "kimchy" }
      }
    }
  }
}

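The routing parameter can be multi-valued, written as a comma-separated string. The search then hits every shard that one of the routing values maps to.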
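As a rough illustration of the multi-valued form, the Python sketch below builds the path and query string for a routed search. The helper name `routed_search_path` and the second routing value `elastic` are made up for this example; only the `routing` query parameter and the `twitter` index come from the text above. The code constructs a URL and sends nothing to a cluster.

```python
from urllib.parse import urlencode


def routed_search_path(index, routing_values):
    """Build a _search path restricted to the shards that the given
    routing values map to (hypothetical helper, not a client API)."""
    # Multiple routing values travel as one comma-separated string.
    query = urlencode({"routing": ",".join(routing_values)}, safe=",")
    return f"/{index}/_search?{query}"


# "elastic" is an invented second routing value for illustration.
print(routed_search_path("twitter", ["kimchy", "elastic"]))
```

Keeping the comma unescaped (`safe=","`) keeps the URL readable; a percent-encoded comma should be interpreted the same way by the server.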

2. How Elasticsearch selects a replica

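By default, Elasticsearch uses adaptive replica selection. The coordinating node forwards the request to the shard copy on the target node it considers best, based on the following criteria:

- The response time of past requests between the coordinating node and the target node.
- The time past search requests took to execute on the target node (excluding the time spent passing the request between the coordinating node and the target node).
- The number of requests queued in the search thread pool on the target node.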

This strategy can be turned off as follows:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.use_adaptive_replica_selection": false
  }
}

If adaptive replica selection is turned off, searches are sent to the shards of the index or indices in round-robin fashion across all copies of the data (primaries and replicas).
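The round-robin fallback can be pictured with a small Python sketch. This is a toy model under simplifying assumptions, not Elasticsearch's implementation: the copy names are invented, and every copy is assumed to be available at all times.

```python
from itertools import cycle


def round_robin(copies):
    """Return an endless iterator that rotates through the shard copies."""
    return cycle(copies)


# Hypothetical copies of one shard: a primary and two replicas.
picker = round_robin(["primary", "replica-1", "replica-2"])

# Each incoming search takes the next copy in the rotation.
targets = [next(picker) for _ in range(5)]
print(targets)
```

Unlike adaptive replica selection, this rotation ignores response times and queue sizes, so a slow node receives the same share of searches as a fast one.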

3. Stats Groups

A search can be associated with stats groups, each of which maintains its own statistics aggregation. These statistics can later be retrieved with the indices stats API. For example, here is a search request body that associates the request with two different groups:

POST /_search
{
  "query": {
    "match_all": {}
  },
  "stats": ["group1", "group2"]
}
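The statistics recorded for these groups can be read back through the indices stats API, which accepts a `groups` parameter on its search metric. The Python sketch below only builds the search body and the stats path; the helper names are illustrative and nothing is sent to a cluster.

```python
import json


def search_with_groups(groups):
    """Request body for a match_all search tagged with stats groups."""
    return {"query": {"match_all": {}}, "stats": list(groups)}


def stats_path(groups):
    """Path for reading the search statistics of the given groups."""
    return "/_stats/search?groups=" + ",".join(groups)


groups = ["group1", "group2"]
print("POST /_search", json.dumps(search_with_groups(groups)))
print("GET", stats_path(groups))
```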

3. Global Search Timeout

An individual search can set a timeout in its request body. Because search requests can originate from many sources, Elasticsearch also has a dynamic cluster-level setting for a global search timeout, which applies to every search request that does not set a timeout in its request body. Such requests are cancelled after the specified time using the mechanism described in the Search Cancellation section below, so the same caveats about timeout responsiveness apply.

The setting key is search.default_search_timeout, and it can be set through the Cluster Update Settings API. By default there is no global timeout. Setting the value to -1 resets the global search timeout to no timeout.
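A minimal sketch of the settings body, written in Python so it can be checked without a cluster. The helper name and the `30s` value are assumptions chosen for illustration; only the setting key, the `-1` reset value, and the Cluster Update Settings endpoint come from the text above.

```python
import json


def default_search_timeout_settings(timeout, persistent=False):
    """Body for PUT /_cluster/settings that sets the global search timeout.

    Pass "-1" to go back to having no global timeout.
    """
    scope = "persistent" if persistent else "transient"
    return {scope: {"search.default_search_timeout": timeout}}


# "30s" is an example value, not a recommendation.
print("PUT /_cluster/settings")
print(json.dumps(default_search_timeout_settings("30s"), indent=2))
```

A request that sets its own timeout in the request body is not affected by this setting.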

4. Search Cancellation

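Searches can be cancelled using the standard task cancellation mechanism. By default, a running search checks whether it has been cancelled only on segment boundaries, so the finest granularity of the check is one segment and a large segment can delay cancellation. Responsiveness can be improved by setting the dynamic cluster-level setting search.low_level_cancellation to true. This, however, adds the overhead of more frequent cancellation checks, which can be noticeable on large, fast-running search queries.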
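The two pieces involved can be sketched in Python: the task management path used to cancel a running search, and the settings body that turns on low-level cancellation. The helper names are made up for this example and the task ID is a placeholder; nothing is sent to a cluster.

```python
import json


def cancel_task_path(task_id):
    """Path for cancelling a task through the task management API."""
    return f"/_tasks/{task_id}/_cancel"


def low_level_cancellation_settings(enabled):
    """Body for PUT /_cluster/settings toggling low-level cancellation."""
    return {"transient": {"search.low_level_cancellation": enabled}}


# "node_id:task_number" is a placeholder; real IDs come from GET /_tasks.
print("POST", cancel_task_path("node_id:task_number"))
print("PUT /_cluster/settings",
      json.dumps(low_level_cancellation_settings(True)))
```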

5. Search concurrency and parallelism

By default, Elasticsearch does not reject any search request based on the number of shards it hits. Although Elasticsearch optimizes search execution on the coordinating node, a large number of shards can have a significant impact on CPU and memory. It is usually better to organize data into fewer, larger shards. To configure a soft limit, update the action.search.shard_count.limit cluster setting so that search requests hitting too many shards are rejected.

The request parameter max_concurrent_shard_requests controls the maximum number of concurrent shard requests the search API will execute per node for the request. Use it to keep a single request from overloading a cluster (for example, a default request hits all indices in a cluster, which could cause shard request rejections if the number of shards per node is high). The default value is 5.
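Both knobs can be sketched in Python. The helper names and the values `1000` and `3` are assumptions picked for illustration; the setting key and the request parameter are the ones named above. The code only builds a settings body and a search path.

```python
import json
from urllib.parse import urlencode


def shard_limit_settings(limit):
    """Body for PUT /_cluster/settings that caps shards hit per search."""
    return {"transient": {"action.search.shard_count.limit": limit}}


def throttled_search_path(index, max_concurrent):
    """Search path limiting concurrent shard requests per node."""
    query = urlencode({"max_concurrent_shard_requests": max_concurrent})
    return f"/{index}/_search?{query}"


# Example values only: reject searches over 1000 shards, and let this
# one request run at most 3 shard requests per node at a time.
print("PUT /_cluster/settings", json.dumps(shard_limit_settings(1000)))
print("GET", throttled_search_path("_all", 3))
```

The cluster setting is a cluster-wide guard, while the request parameter is chosen per search, so a broad search can lower its own concurrency without changing the behavior of other requests.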

6. Searching multiple indices with the Search API

GET /twitter/_search?q=user:kimchy
GET /kimchy,elasticsearch/_search?q=tag:wow
GET /_all/_search?q=tag:wow
