elasticsearch-analysis-ik-8.9.0.tar.gz


Uploaded 2024-04-11 00:38:14 · 3.11MB · GZ

48 files in total

java: 25

dic: 11

xml: 3


elasticsearch-analysis-ik-8.9.0.tar.gz (48 files)

elasticsearch-analysis-ik-8.9.0

.travis.yml 187B

pom.xml 11KB

.github

FUNDING.yml 62B

LICENSE.txt 11KB

src

main

resources

plugin-descriptor.properties 2KB

plugin-security.policy 125B

assemblies

plugin.xml 1KB

java

org

wltea

analyzer

core

CharacterUtil.java 3KB

CN_QuantifierSegmenter.java 7KB

IKSegmenter.java 4KB

IKArbitrator.java 5KB

LetterSegmenter.java 9KB

QuickSortSet.java 6KB

LexemePath.java 6KB

ISegmenter.java 1KB

CJKSegmenter.java 4KB

AnalyzeContext.java 12KB

Lexeme.java 6KB

dic

Hit.java 3KB

Monitor.java 3KB

DictSegment.java 9KB

Dictionary.java 18KB

lucene

IKTokenizer.java 4KB

IKAnalyzer.java 2KB

cfg

Configuration.java 2KB

help

PrefixPluginLogger.java 2KB

CharacterHelper.java 2KB

Sleep.java 1019B

ESPluginLoggerFactory.java 875B

elasticsearch

index

analysis

IkAnalyzerProvider.java 1KB

IkTokenizerFactory.java 1KB

plugin

analysis

AnalysisIkPlugin.java 1KB

.gitignore 81B

README.md 8KB

licenses

lucene-LICENSE.txt 24KB

lucene-NOTICE.txt 9KB

config

main.dic 2.92MB

stopword.dic 164B

IKAnalyzer.cfg.xml 625B

extra_single_word.dic 62KB

quantifier.dic 2KB

suffix.dic 192B

extra_main.dic 4.98MB

extra_single_word_full.dic 62KB

extra_single_word_low_freq.dic 11KB

surname.dic 752B

extra_stopword.dic 156B

preposition.dic 123B

IK Analysis for Elasticsearch
=============================

The IK Analysis plugin integrates the Lucene IK analyzer (http://code.google.com/p/ik-analyzer/) into Elasticsearch and supports custom dictionaries.

Analyzer: `ik_smart` , `ik_max_word` , Tokenizer: `ik_smart` , `ik_max_word`

Versions
--------

IK version | ES version
-----------|-----------
master | 7.x -> master
6.x | 6.x
5.x | 5.x
1.10.6 | 2.4.6
1.9.5 | 2.3.5
1.8.1 | 2.2.1
1.7.0 | 2.1.1
1.5.0 | 2.0.0
1.2.6 | 1.0.0
1.2.5 | 0.90.x
1.1.3 | 0.20.x
1.0.0 | 0.16.2 -> 0.19.0

Install
-------

1.download or compile

* option 1 - download a pre-built package from here: https://github.com/medcl/elasticsearch-analysis-ik/releases

    create the plugin folder `cd your-es-root/plugins/ && mkdir ik`

    unzip the plugin to the folder `your-es-root/plugins/ik`

* option 2 - use elasticsearch-plugin to install ( supported from version v5.5.1 ):

    ```
    ./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
    ```

   NOTE: replace `6.3.0` with your own Elasticsearch version

2.restart elasticsearch

#### Quick Example

1.create an index

```bash
curl -XPUT http://localhost:9200/index
```

2.create a mapping

```bash
curl -XPOST http://localhost:9200/index/_mapping -H 'Content-Type:application/json' -d'
{
        "properties": {
            "content": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart"
            }
        }
}'
```

3.index some docs

```bash
curl -XPOST http://localhost:9200/index/_create/1 -H 'Content-Type:application/json' -d'
{"content":"美国留给伊拉克的是个烂摊子吗"}
'
```

```bash
curl -XPOST http://localhost:9200/index/_create/2 -H 'Content-Type:application/json' -d'
{"content":"公安部：各地校车将享最高路权"}
'
```

```bash
curl -XPOST http://localhost:9200/index/_create/3 -H 'Content-Type:application/json' -d'
{"content":"中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"}
'
```

```bash
curl -XPOST http://localhost:9200/index/_create/4 -H 'Content-Type:application/json' -d'
{"content":"中国驻洛杉矶领事馆遭亚裔男子枪击嫌犯已自首"}
'
```

4.query with highlighting

```bash
curl -XPOST http://localhost:9200/index/_search -H 'Content-Type:application/json' -d'
{
    "query" : { "match" : { "content" : "中国" }},
    "highlight" : {
        "pre_tags" : ["<tag1>", "<tag2>"],
        "post_tags" : ["</tag1>", "</tag2>"],
        "fields" : {
            "content" : {}
        }
    }
}
'
```

Result

```json
{
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 2,
        "hits": [
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "4",
                "_score": 2,
                "_source": {
                    "content": "中国驻洛杉矶领事馆遭亚裔男子枪击嫌犯已自首"
                },
                "highlight": {
                    "content": [
                        "<tag1>中国</tag1>驻洛杉矶领事馆遭亚裔男子枪击嫌犯已自首 "
                    ]
                }
            },
            {
                "_index": "index",
                "_type": "fulltext",
                "_id": "3",
                "_score": 2,
                "_source": {
                    "content": "中韩渔警冲突调查：韩警平均每天扣1艘中国渔船"
                },
                "highlight": {
                    "content": [
                        "均每天扣1艘<tag1>中国</tag1>渔船 "
                    ]
                }
            }
        ]
    }
}
```

### Dictionary Configuration

`IKAnalyzer.cfg.xml` can be located at `{conf}/analysis-ik/config/IKAnalyzer.cfg.xml`
or `{plugins}/elasticsearch-analysis-ik-*/config/IKAnalyzer.cfg.xml`

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer extension configuration</comment>
	<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic</entry>
	<entry key="ext_stopwords">custom/ext_stopword.dic</entry>
	<entry key="remote_ext_dict">location</entry>
	<entry key="remote_ext_stopwords">http://xxx.com/xxx.dic</entry>
</properties>
```

### Hot-updating the IK dictionary

The plugin supports hot updates of the IK dictionary through the following entries in the IK configuration file mentioned above:

```xml
	<entry key="remote_ext_dict">location</entry>
	<entry key="remote_ext_stopwords">location</entry>
```

Here `location` is a URL, for example `http://yoursite.com/getCustomDict`. The request only has to meet the following two requirements for hot updates to work.

1. The HTTP response must include two headers, `Last-Modified` and `ETag`. Both are strings; when either one changes, the plugin fetches the new word list and updates its dictionary.

2. The response body must contain one word per line, with `\n` as the line separator.

Once both requirements are met, the dictionary is hot-updated without restarting the ES instance.

You can put the hot words that need automatic updating in a UTF-8 encoded .txt file and serve it from nginx or another simple HTTP server. When the .txt file is modified, the HTTP server automatically returns the corresponding Last-Modified and ETag when a client requests the file. You can also build a separate tool that extracts relevant terms from your business system and updates this .txt file.

have fun.

FAQ
-------

1.Why does my custom dictionary not take effect?

Make sure the text of your extension dictionary is UTF-8 encoded.

2.How do I install the plugin manually?

```bash
git clone https://github.com/medcl/elasticsearch-analysis-ik
cd elasticsearch-analysis-ik
git checkout tags/{version}
mvn clean
mvn compile
mvn package
```

Copy and unzip the release file `#{project_path}/elasticsearch-analysis-ik/target/releases/elasticsearch-analysis-ik-*.zip` into your Elasticsearch plugins directory, e.g. `plugins/ik`, then restart Elasticsearch.

3.The tokenization test fails

Call the analyze API under a specific index instead of calling it directly, for example:

```bash
curl -XGET "http://localhost:9200/your_index/_analyze" -H 'Content-Type: application/json' -d'
{
   "text":"中华人民共和国MN","tokenizer": "my_ik"
}'
```

4.What is the difference between ik_max_word and ik_smart?

ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting all possible combinations. Suitable for Term Query.

ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”. Suitable for Phrase queries.

Changes
------

*Since v5.0.0*

- The analyzer and tokenizer named `ik` were removed; use `ik_smart` and `ik_max_word` instead.

Thanks
------

YourKit supports the IK Analysis for ElasticSearch project with its full-featured Java Profiler.
YourKit, LLC is the creator of innovative and intelligent tools for profiling
Java and .NET applications. Take a look at YourKit's leading software products:
<a href="http://www.yourkit.com/java/profiler/index.jsp">YourKit Java Profiler</a> and
<a href="http://www.yourkit.com/.net/profiler/index.jsp">YourKit .NET Profiler</a>.
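
The two hot-update requirements in the README (a response carrying `Last-Modified` and `ETag` headers, and a body with one word per line) can be sketched as a small Python server. This is only an illustration under stated assumptions, not part of the plugin: `make_handler` and `start_server` are hypothetical helper names, the ETag is simply a hash of the file content, and HEAD is handled on the assumption that a client may probe the headers before downloading the body.

```python
import hashlib
import threading
from email.utils import formatdate
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


def make_handler(dict_path: Path):
    """Build a handler class that serves one UTF-8 word list (hypothetical helper)."""

    class DictHandler(BaseHTTPRequestHandler):
        def _reply(self, include_body: bool) -> None:
            body = dict_path.read_bytes()  # one word per line, "\n" separated
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            # Last-Modified comes from the file's mtime, ETag from its content.
            mtime = dict_path.stat().st_mtime
            self.send_header("Last-Modified", formatdate(mtime, usegmt=True))
            self.send_header("ETag", '"%s"' % hashlib.sha256(body).hexdigest())
            self.end_headers()
            if include_body:
                self.wfile.write(body)

        def do_GET(self) -> None:
            self._reply(include_body=True)

        def do_HEAD(self) -> None:
            self._reply(include_body=False)

        def log_message(self, format, *args) -> None:
            pass  # keep the sketch quiet

    return DictHandler


def start_server(dict_path: Path, port: int = 0) -> ThreadingHTTPServer:
    """Serve dict_path on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(dict_path))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `start_server(Path("hot_words.txt"), port=8000)` (file name and port are placeholders) would let `remote_ext_dict` point at `http://127.0.0.1:8000/`. The server thread is a daemon, so a standalone script has to keep its main thread alive. In production, a static file behind nginx as the README suggests is likely the simpler choice.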
