Sorting Sphinx search results by weight in PHP: Sphinx Ranking Mode (translation)
My English is weak, so please bear with me.
Search results ranking
Ranking overview
Ranking (also known as weighting) of search results is the process of
computing a relevance value (also called a weight) for every matched
document with respect to the query that matched it. Relevance is, in
the end, just a number attached to each document that estimates how
relevant that document is to the query. Search results can then be
sorted by this number, by additional parameters, or both, so that the
most sought-after results appear higher on the results page.
There is no single, one-size-fits-all way to rank any document in any
scenario. Moreover, there can never be such a way, because relevance
is subjective: what seems relevant to you might not seem relevant to
me. Hence, in the general case, relevance is not just hard to compute;
it is theoretically impossible to compute.
So ranking in Sphinx is configurable. It has a notion of a so-called
ranker. A ranker can formally be defined as a function that takes a
document and a query as its input and produces a relevance value as
output. In layman's terms, a ranker controls exactly how (using which
specific algorithm) Sphinx assigns weights to the documents.
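The definition of a ranker as a function from (document, query) to a relevance value can be illustrated with a minimal Python sketch. The names keyword_count_ranker, constant_ranker, and rank are hypothetical and are not part of Sphinx; the scoring rules are toy stand-ins chosen only to show that swapping the function changes the ordering.

```python
from typing import Callable, List, Tuple

# A ranker maps (document, query) to a numeric relevance value.
Ranker = Callable[[str, str], float]


def keyword_count_ranker(document: str, query: str) -> float:
    """Toy ranker (hypothetical): count occurrences of query keywords."""
    words = document.lower().split()
    keywords = set(query.lower().split())
    return float(sum(words.count(kw) for kw in keywords))


def constant_ranker(document: str, query: str) -> float:
    """Toy ranker (hypothetical): every match gets the same weight."""
    return 1.0


def rank(documents: List[str], query: str, ranker: Ranker) -> List[Tuple[float, str]]:
    """Weigh each document with the chosen ranker, then sort by weight, highest first."""
    weighted = [(ranker(doc, query), doc) for doc in documents]
    return sorted(weighted, key=lambda pair: pair[0], reverse=True)


docs = ["hyde park cafe", "hyde park", "central park park"]
results = rank(docs, "park", keyword_count_ranker)
unranked = rank(docs, "park", constant_ranker)
```

Passing a different ranker to rank() changes only how weights are assigned; the matching set and the sorting step stay the same, which mirrors the separation the text describes.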
Previously, this ranking function was rigidly bound to the matching
mode. So in the legacy matching modes (that is, SPH_MATCH_ALL,
SPH_MATCH_ANY, SPH_MATCH_PHRASE, and SPH_MATCH_BOOLEAN) you cannot
choose the ranker. You can only do that in the SPH_MATCH_EXTENDED
mode. (Which is the only mode in SphinxQL and the suggested mode in
SphinxAPI anyway.) To choose a non-default ranker, you can either use
SetRankingMode() with SphinxAPI, or the OPTION ranker clause in a
SELECT statement when using SphinxQL.
As a side note, legacy matching modes are internally implemented via
the unified syntax anyway. When you use one of those modes, Sphinx
just internally adjusts the query and sets the associated ranker, then
executes the query using the very same unified code path.
Available built-in rankers
Sphinx ships with a number of built-in rankers suited for different
purposes. A number of them use two factors, phrase proximity (aka
LCS) and BM25. Phrase proximity works on the keyword positions, while
BM25 works on the keyword frequencies. Basically, the better the
degree of the phrase match between the document body and the query,
the higher the phrase proximity (it maxes out when the document
contains the entire query as a verbatim quote). And BM25 is higher
when the document contains more rare words. We'll save the detailed
discussion for later.
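To make the two factors more concrete, here is a rough Python sketch. It is a simplification under stated assumptions, not Sphinx's actual implementation: phrase_proximity is taken to be the longest run of query keywords that appears contiguously and in the same order in the document, and bm25_like is a textbook-style score with no document-length normalization. All function names and the k1 parameter are hypothetical.

```python
import math
from typing import List


def phrase_proximity(doc_words: List[str], query_words: List[str]) -> int:
    """Longest run of query keywords found contiguously, in order, in the document."""
    best = 0
    prev = [0] * (len(query_words) + 1)
    for d in doc_words:
        cur = [0] * (len(query_words) + 1)
        for j, q in enumerate(query_words, start=1):
            if d == q:
                # Extend the run that ended at the previous word of both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def idf(term: str, corpus: List[List[str]]) -> float:
    """Inverse document frequency: rarer terms get a larger value."""
    n = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return math.log((n - df + 0.5) / (df + 0.5) + 1.0)


def bm25_like(doc_words: List[str], query_words: List[str],
              corpus: List[List[str]], k1: float = 1.2) -> float:
    """Simplified BM25-style score (assumption: no length normalization)."""
    score = 0.0
    for term in set(query_words):
        tf = doc_words.count(term)
        score += idf(term, corpus) * tf * (k1 + 1) / (tf + k1)
    return score


corpus = [["hyde", "park"], ["central", "park"], ["park", "cafe"]]
verbatim = phrase_proximity(["the", "hyde", "park", "cafe"], ["hyde", "park"])
scattered = phrase_proximity(["park", "near", "hyde"], ["hyde", "park"])
```

In this sketch the verbatim quote reaches the maximum proximity (the query length), while the scattered document does not, and a rare keyword such as "hyde" contributes more to the BM25-style score than a common one such as "park".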
Currently implemented rankers are:
1. SPH_RANK_PROXIMITY_BM25, the default ranking mode that uses and
combines both phrase proximity and BM25 ranking.
2. SPH_RANK_BM25, statistical ranking mode which uses BM25 ranking only
(similar to most other full-text engines). This mode is faster but may
result in worse quality on queries which contain more than 1 keyword.
(Translator's note: the more keywords a document matches, the higher it
scores, whereas what we usually want most is a single exact hit.)
3. SPH_RANK_NONE, no ranking mode. This mode is obviously the fastest. A
weight of 1 is assigned to all matches. This is sometimes called
boolean searching: it just matches the documents but does not rank
them.
4. SPH_RANK_WORDCOUNT, ranking by the keyword occurrences count. This
ranker computes the per-field keyword occurrence counts, then
multiplies them by field weights, and sums the resulting values.
5. SPH_RANK_PROXIMITY, added in version 0.9.9-rc1, returns the raw phrase
proximity value as a result. This mode is internally used to emulate
SPH_MATCH_ALL queries.
6. SPH_RANK_MATCHANY, added in version 0.9.9-rc1, returns the rank as it
was computed in SPH_MATCH_ANY mode earlier, and is internally used to
emulate SPH_MATCH_ANY queries.
7. SPH_RANK_FIELDMASK, added in version 0.9.9-rc2, returns a 32-bit mask
with the N-th bit corresponding to the N-th full-text field, numbering
from 0. The bit will only be set when the respective field has any
keyword occurrences satisfying the query.
8. SPH_RANK_SPH04, added in version 1.10-beta, is generally based on the
default SPH_RANK_PROXIMITY_BM25 ranker, but additionally boosts the
matches when they occur in the very beginning or the very end of a
text field. Thus, if a field equals the exact query, SPH04 should rank
it higher than a field that contains the exact query but is not equal
to it. (For instance, when the query is "Hyde Park", a document
entitled "Hyde Park" should be ranked higher than one entitled "Hyde
Park, London" or "The Hyde Park Cafe".)
9. SPH_RANK_EXPR, added in version 2.0.2-beta, lets you specify the
ranking formula at run time. It exposes a number of internal text
factors and lets you define how the final weight should be computed
from those factors.
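Three of the rankers in the list above are described precisely enough to sketch directly: SPH_RANK_NONE, SPH_RANK_WORDCOUNT, and SPH_RANK_FIELDMASK. The Python below is an illustration only. The FieldFactors class and the function names are hypothetical, and the per-field hit counts are assumed to have been computed already by the matching step.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FieldFactors:
    """Hypothetical per-field factors for one matched document."""
    hit_count: int        # keyword occurrences found in this field
    user_weight: int = 1  # field weight assigned by the user


def rank_none(fields: List[FieldFactors]) -> int:
    """SPH_RANK_NONE as described: every match gets a weight of 1."""
    return 1


def rank_wordcount(fields: List[FieldFactors]) -> int:
    """SPH_RANK_WORDCOUNT as described: per-field counts times field weights, summed."""
    return sum(f.hit_count * f.user_weight for f in fields)


def rank_fieldmask(fields: List[FieldFactors]) -> int:
    """SPH_RANK_FIELDMASK as described: bit N is set when field N has any keyword hit."""
    mask = 0
    for n, f in enumerate(fields):  # fields are numbered from 0
        if f.hit_count > 0:
            mask |= 1 << n
    return mask


# Example: a title field weighted 10, an empty summary field, and a body field.
fields = [
    FieldFactors(hit_count=2, user_weight=10),
    FieldFactors(hit_count=0),
    FieldFactors(hit_count=3),
]
```

With these example factors, the word-count ranker yields 2*10 + 0 + 3 = 23, and the field mask is binary 101 (decimal 5) because fields 0 and 2 contain hits.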
When calling SetRankingMode() from SphinxAPI, you should specify the
SPH_RANK_ prefix and use capital letters only. The API ports expose
these as global constants. In SphinxQL syntax, the prefix should be
omitted, and the ranker name is case insensitive.
Example:
// SphinxAPI
$client->SetRankingMode ( SPH_RANK_SPH04 );
// SphinxQL
mysql_query ( "SELECT ... OPTION ranker=sph04" );
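The naming rule (drop the SPH_RANK_ prefix for SphinxQL, where case does not matter) can be captured in a small helper. This is a sketch with hypothetical function names, not part of any Sphinx client library, and it covers only rankers that take no argument; a ranker that needs a formula, such as the expression ranker, is outside its scope.

```python
PREFIX = "SPH_RANK_"


def sphinxql_ranker_name(api_constant: str) -> str:
    """Convert a SphinxAPI constant name to the SphinxQL ranker name (hypothetical helper)."""
    if not api_constant.startswith(PREFIX):
        raise ValueError(f"expected a name starting with {PREFIX!r}: {api_constant!r}")
    # SphinxQL ranker names are case insensitive; lowercase is used here for readability.
    return api_constant[len(PREFIX):].lower()


def ranker_option(api_constant: str) -> str:
    """Build the OPTION clause that selects the given ranker in a SphinxQL SELECT."""
    return f"OPTION ranker={sphinxql_ranker_name(api_constant)}"


clause = ranker_option("SPH_RANK_SPH04")
```

Here clause holds the same "OPTION ranker=sph04" text that appears in the SphinxQL example above.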
