ES word segmentation engine – HYL Studio – personal study notes

Computer science students who need word segmentation for their lab work can try this segmentation engine.

GitHub repository: https://github.com/huaban/elasticsearch-analysis-jieba

Official Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.3/getting-started.html

Service address: http://es.hylstudio.cn/jieba

API description

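index (analyzer jieba_index): mainly used to segment text at indexing time; the segmentation is finer-grained
search (analyzer jieba_search): mainly used to segment queries; the segmentation is coarser-grained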

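In the returned JSON, start_offset and end_offset are character indexes. They start at 0 and form a half-open interval: start_offset is inclusive, end_offset is exclusive.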
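To make the half-open convention concrete, here is a minimal Python sketch that slices the request text with the offsets of the first three tokens from the sample response further down. The helper name token_span is my own and is not part of the service. In the sample responses the offsets of the tokens after the comma appear to start again from 0, so this sketch only uses the tokens before the comma.

```python
# Request text used in the examples in this post.
text = "明天实验取消了，好高兴哈哈哈哈"

# First three tokens, copied from the sample response (only the fields needed here).
tokens = [
    {"token": "明天", "start_offset": 0, "end_offset": 2},
    {"token": "实验", "start_offset": 2, "end_offset": 4},
    {"token": "取消", "start_offset": 4, "end_offset": 6},
]


def token_span(source: str, token: dict) -> str:
    """Return the substring covered by the half-open range [start_offset, end_offset)."""
    return source[token["start_offset"]:token["end_offset"]]


# Python slices are half-open as well, so the offsets can be used directly.
for tok in tokens:
    assert token_span(text, tok) == tok["token"]
```

Because end_offset is exclusive, end_offset minus start_offset is the token length in characters, and adjacent tokens share a boundary (the first token ends at 2, the second starts at 2).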

Endpoint: http://es.hylstudio.cn/jieba/_analyze?analyzer=jieba_index

Request method: POST

Request example

{"text":"明天实验取消了，好高兴哈哈哈哈"}

Response example


{
  "tokens": [
    {
      "token": "明天",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "实验",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "取消",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 2
    },
    {
      "token": "好",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 5
    },
    {
      "token": "高兴",
      "start_offset": 1,
      "end_offset": 3,
      "type": "word",
      "position": 6
    },
    {
      "token": "哈哈",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 7
    },
    {
      "token": "哈哈",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 8
    },
    {
      "token": "哈哈",
      "start_offset": 5,
      "end_offset": 7,
      "type": "word",
      "position": 9
    },
    {
      "token": "哈哈哈",
      "start_offset": 3,
      "end_offset": 6,
      "type": "word",
      "position": 10
    },
    {
      "token": "哈哈哈",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 11
    },
    {
      "token": "哈哈哈哈",
      "start_offset": 3,
      "end_offset": 7,
      "type": "word",
      "position": 12
    }
  ]
}
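For anyone who wants to call the endpoint from a script, the following is a rough Python sketch that uses only the standard library. The function names build_analyze_request and analyze are my own, the Content-Type header is an assumption (the notes above only specify POST with a JSON body), and I have not verified that the service is still reachable.

```python
import json
import urllib.request

# Endpoint from this post; the analyzer name is passed as a query parameter.
BASE_URL = "http://es.hylstudio.cn/jieba/_analyze"


def build_analyze_request(text: str, analyzer: str = "jieba_index") -> urllib.request.Request:
    """Build the POST request described above without sending it."""
    url = f"{BASE_URL}?analyzer={analyzer}"
    # ensure_ascii=False keeps the Chinese text readable in the body; UTF-8 is assumed.
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: the header is not mentioned in the notes, but it is harmless to send.
        headers={"Content-Type": "application/json; charset=utf-8"},
    )


def analyze(text: str, analyzer: str = "jieba_index", timeout: float = 10.0) -> list:
    """Send the request and return the list stored under "tokens" in the response."""
    request = build_analyze_request(text, analyzer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return payload["tokens"]


# Usage (needs network access to the service):
# for token in analyze("明天实验取消了，好高兴哈哈哈哈"):
#     print(token["token"], token["start_offset"], token["end_offset"])
```

Building the request separately from sending it makes it easy to check the URL and body before any network traffic happens. Passing analyzer="jieba_search" targets the coarser analyzer.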

Endpoint: http://es.hylstudio.cn/jieba/_analyze?analyzer=jieba_search

Request method: POST

Request example

{"text":"明天实验取消了，好高兴哈哈哈哈"}

Response example


{
  "tokens": [
    {
      "token": "明天",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 0
    },
    {
      "token": "实验",
      "start_offset": 2,
      "end_offset": 4,
      "type": "word",
      "position": 1
    },
    {
      "token": "取消",
      "start_offset": 4,
      "end_offset": 6,
      "type": "word",
      "position": 2
    },
    {
      "token": "好",
      "start_offset": 0,
      "end_offset": 1,
      "type": "word",
      "position": 5
    },
    {
      "token": "高兴",
      "start_offset": 1,
      "end_offset": 3,
      "type": "word",
      "position": 6
    },
    {
      "token": "哈哈哈哈",
      "start_offset": 3,
      "end_offset": 7,
      "type": "word",
      "position": 7
    }
  ]
}
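The difference in granularity between the two analyzers can be seen by comparing the token lists of the two sample responses. The sketch below is only an illustration: the responses are reduced to their "token" field, and the helper name token_texts is my own.

```python
from collections import Counter


def token_texts(response: dict) -> list:
    """Extract the token strings from an _analyze style response."""
    return [entry["token"] for entry in response["tokens"]]


# Token strings copied from the two sample responses in this post.
index_response = {"tokens": [{"token": t} for t in [
    "明天", "实验", "取消", "好", "高兴",
    "哈哈", "哈哈", "哈哈", "哈哈哈", "哈哈哈", "哈哈哈哈",
]]}
search_response = {"tokens": [{"token": t} for t in [
    "明天", "实验", "取消", "好", "高兴", "哈哈哈哈",
]]}

index_tokens = token_texts(index_response)
search_tokens = token_texts(search_response)

# Multiset difference: tokens that only the finer-grained index analyzer produced.
index_only = Counter(index_tokens) - Counter(search_tokens)

print(len(index_tokens), len(search_tokens))  # 11 6
print(dict(index_only))  # {'哈哈': 3, '哈哈哈': 2}
```

For this sentence the index analyzer additionally emits the overlapping sub-words of 哈哈哈哈, which should help partial matches at indexing time, while the search analyzer keeps only the longest segment.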
