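CS students who need word segmentation for their experiments can try out this segmentation engine.
GitHub: https://github.com/huaban/elasticsearch-analysis-jieba
Official Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/getting-started.html
Service URL: http://es.hylstudio.cn/jieba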
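API description
- index: mainly for tokenizing text at indexing time; the segmentation is finer-grained
- search: mainly for tokenizing queries; the segmentation is coarser-grained
In the returned JSON, start_offset and end_offset are character indices. They start from 0 and form a half-open interval: start inclusive, end exclusive.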
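To illustrate the half-open offsets, here is a minimal Python sketch. The token values are copied from the sample response further down; `token_span` is a hypothetical helper name, not part of the plugin. Only the first clause of the sample sentence is used, because in the sample responses the offsets appear to restart at 0 after the comma.

```python
# Tokens copied from the sample jieba_index response (first clause only).
text = "明天实验取消了"
tokens = [
    {"token": "明天", "start_offset": 0, "end_offset": 2},
    {"token": "实验", "start_offset": 2, "end_offset": 4},
    {"token": "取消", "start_offset": 4, "end_offset": 6},
]


def token_span(text, token):
    """Hypothetical helper: slice text by a token's [start_offset, end_offset)."""
    return text[token["start_offset"]:token["end_offset"]]


for tok in tokens:
    # Python slices are also start-inclusive and end-exclusive,
    # so the offsets can be used directly as slice bounds.
    assert token_span(text, tok) == tok["token"]
```

This assumes the text contains only characters for which Python string indices and the service's character offsets agree, which holds for the sample sentence.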
Endpoint: http://es.hylstudio.cn/jieba/_analyze?analyzer=jieba_index
Request method: POST
Sample request
{"text":"明天实验取消了,好高兴哈哈哈哈"}
Sample response
{
  "tokens": [
    { "token": "明天", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0 },
    { "token": "实验", "start_offset": 2, "end_offset": 4, "type": "word", "position": 1 },
    { "token": "取消", "start_offset": 4, "end_offset": 6, "type": "word", "position": 2 },
    { "token": "好", "start_offset": 0, "end_offset": 1, "type": "word", "position": 5 },
    { "token": "高兴", "start_offset": 1, "end_offset": 3, "type": "word", "position": 6 },
    { "token": "哈哈", "start_offset": 3, "end_offset": 5, "type": "word", "position": 7 },
    { "token": "哈哈", "start_offset": 4, "end_offset": 6, "type": "word", "position": 8 },
    { "token": "哈哈", "start_offset": 5, "end_offset": 7, "type": "word", "position": 9 },
    { "token": "哈哈哈", "start_offset": 3, "end_offset": 6, "type": "word", "position": 10 },
    { "token": "哈哈哈", "start_offset": 4, "end_offset": 7, "type": "word", "position": 11 },
    { "token": "哈哈哈哈", "start_offset": 3, "end_offset": 7, "type": "word", "position": 12 }
  ]
}
Endpoint: http://es.hylstudio.cn/jieba/_analyze?analyzer=jieba_search
Request method: POST
Sample request
{"text":"明天实验取消了,好高兴哈哈哈哈"}
Sample response
{
  "tokens": [
    { "token": "明天", "start_offset": 0, "end_offset": 2, "type": "word", "position": 0 },
    { "token": "实验", "start_offset": 2, "end_offset": 4, "type": "word", "position": 1 },
    { "token": "取消", "start_offset": 4, "end_offset": 6, "type": "word", "position": 2 },
    { "token": "好", "start_offset": 0, "end_offset": 1, "type": "word", "position": 5 },
    { "token": "高兴", "start_offset": 1, "end_offset": 3, "type": "word", "position": 6 },
    { "token": "哈哈哈哈", "start_offset": 3, "end_offset": 7, "type": "word", "position": 7 }
  ]
}
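For anyone calling the two endpoints from a script, the following is a rough Python sketch using only the standard library. The function names `build_analyze_request` and `analyze`, the `Content-Type` header, and the timeout are my own assumptions; the URL, the POST method, and the `{"text": ...}` body come from the examples above. The service may not always be reachable, so treat this as a starting point rather than a tested client.

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the post; the analyzer is passed as a query parameter.
BASE_URL = "http://es.hylstudio.cn/jieba/_analyze"


def build_analyze_request(text, analyzer="jieba_index"):
    """Build (but do not send) a POST request for the given analyzer."""
    query = urllib.parse.urlencode({"analyzer": analyzer})
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}?{query}",
        data=body,
        # Assumed header; the post does not say which one the service expects.
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def analyze(text, analyzer="jieba_index", timeout=10):
    """Send the request and return the list of token strings."""
    req = build_analyze_request(text, analyzer)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return [t["token"] for t in payload["tokens"]]
```

Calling `analyze("明天实验取消了,好高兴哈哈哈哈", "jieba_search")` should return the coarser token list, and passing `"jieba_index"` the finer one, provided the service responds in the format shown in the sample responses.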