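In Chinese search, users often type pinyin (for example "pingguo", or the initials "pgsj") to find Chinese content such as "苹果手机" (Apple phone). Elasticsearch can support this by combining a pinyin analyzer with the Completion Suggester to provide pinyin completion.
This article provides a complete, practical configuration template for Elasticsearch pinyin completion. It supports:
- Chinese input → Chinese completion
- Pinyin input → Chinese completion
- Pinyin initials → Chinese completion
- Typo tolerance through fuzzy matching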
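I. Prerequisites
1. Install the pinyin analysis plugin
Elasticsearch does not ship with a pinyin analyzer, so a third-party plugin is required:
# Go to the Elasticsearch home directory
cd /usr/share/elasticsearch
# Install the pinyin plugin (its version must match your Elasticsearch version exactly)
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v8.11.0/elasticsearch-analysis-pinyin-8.11.0-linux-x86_64.zip
✅ Supported versions: 6.x to 8.x. GitHub project: https://github.com/medcl/elasticsearch-analysis-pinyin
The URL above targets 8.11.0. Check the project's releases page for the exact file name that matches your Elasticsearch version.
Restart Elasticsearch for the plugin to take effect.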
II. Index configuration template
Note: the title field below uses the ik_max_word analyzer, which comes from the separate IK analysis plugin. Install that plugin as well, or switch title to another analyzer.
PUT /products-pinyin
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "pinyin",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "pinyin": {"type": "text", "analyzer": "pinyin_analyzer"}
        }
      },
      "suggest": {
        "type": "completion",
        "analyzer": "simple",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 50
      },
      "suggest_pinyin": {
        "type": "completion",
        "analyzer": "pinyin_analyzer",
        "preserve_separators": false,
        "preserve_position_increments": false,
        "max_input_length": 50
      }
    }
  }
}
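III. Field reference
Field | Purpose |
---|---|
title | Original text, used for full-text search |
title.pinyin | Pinyin search (e.g. match: { "title.pinyin": "pingguo" }) |
suggest | Completion for Chinese input (e.g. "苹" → "苹果手机") |
suggest_pinyin | Completion for pinyin input (e.g. "ping" → "苹果手机") |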
IV. Indexing a sample document
PUT /products-pinyin/_doc/1
{
  "title": "苹果手机 iPhone 15",
  "suggest": {
    "input": ["苹果手机", "iPhone 15", "苹果", "手机"],
    "weight": 30
  },
  "suggest_pinyin": {
    "input": ["pingguoshouji", "pingguo", "shouji", "pgsj", "pg"],
    "weight": 30
  }
}
✅ The suggest_pinyin input list contains:
- Full pinyin: pingguoshouji
- Initials: pgsj, pg
- Per-word pinyin: pingguo, shouji
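The input variants for suggest_pinyin can be generated mechanically once a title has been split into words and each word converted to pinyin syllables. The sketch below assumes that conversion has already been done elsewhere (for instance by a pinyin library of your choice). `build_pinyin_inputs` is a hypothetical helper name, not part of Elasticsearch or the pinyin plugin.

```python
from typing import List


def build_pinyin_inputs(words: List[List[str]]) -> List[str]:
    """Build completion inputs from pre-segmented pinyin.

    `words` holds one list of syllables per word, e.g.
    [["ping", "guo"], ["shou", "ji"]] for "苹果手机".
    """
    # Drop empty syllables so that s[0] below is always safe.
    words = [[s for s in w if s] for w in words]
    candidates = [
        "".join(s for w in words for s in w),     # full pinyin of the title
        "".join(s[0] for w in words for s in w),  # initials of the title
    ]
    for w in words:
        candidates.append("".join(w))                # full pinyin of one word
        candidates.append("".join(s[0] for s in w))  # initials of one word

    # Remove duplicates and empty strings while keeping the original order.
    seen, result = set(), []
    for c in candidates:
        if c and c not in seen:
            seen.add(c)
            result.append(c)
    return result


inputs = build_pinyin_inputs([["ping", "guo"], ["shou", "ji"]])
print(inputs)
```

The helper also emits per-word initials such as sj. Whether those are worth indexing depends on how users actually type, so treat the exact set of variants as a design choice.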
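V. Querying
1. Chinese prefix completion
POST /products-pinyin/_search
{
  "suggest": {
    "title_suggest": {
      "prefix": "苹",
      "completion": {"field": "suggest"}
    }
  }
}
Every suggester needs a name under suggest; title_suggest is an arbitrary choice. The response (abridged) is keyed by that name:
"suggest": {
  "title_suggest": [
    {
      "text": "苹",
      "options": [{"text": "苹果手机", "_score": 30.0}]
    }
  ]
}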
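Building the request body in application code keeps the suggester name, target field, and optional fuzzy settings in one place. This is a minimal sketch, assuming the suggest and suggest_pinyin fields from the mapping in this article. `build_suggest_body` and the default name `title_suggest` are illustrative choices, not Elasticsearch identifiers.

```python
from typing import Any, Dict, Optional


def build_suggest_body(
    field: str,
    prefix: str,
    size: int = 10,
    fuzziness: Optional[int] = None,
    name: str = "title_suggest",
) -> Dict[str, Any]:
    """Return a _search body containing one named completion suggester."""
    completion: Dict[str, Any] = {"field": field, "size": size}
    if fuzziness is not None:
        # The fuzzy options live inside the completion object.
        completion["fuzzy"] = {"fuzziness": fuzziness, "transpositions": True}
    return {
        # Only return the title with each option to keep responses small.
        "_source": ["title"],
        "suggest": {name: {"prefix": prefix, "completion": completion}},
    }


body = build_suggest_body("suggest_pinyin", "ping", fuzziness=1)
print(body)
```

The resulting dictionary can be sent as the JSON body of POST /products-pinyin/_search with any HTTP or Elasticsearch client.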
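2. Pinyin prefix completion
POST /products-pinyin/_search
{
  "_source": ["title"],
  "suggest": {
    "title_suggest": {
      "prefix": "ping",
      "completion": {"field": "suggest_pinyin"}
    }
  }
}
Response (abridged):
"options": [
  {
    "text": "pingguoshouji",
    "_id": "1",
    "_score": 30.0,
    "_source": {"title": "苹果手机 iPhone 15"}
  }
]
⚠️ The option's text is the matched pinyin input, not the Chinese title.
✅ Recommendation: a completion field value accepts only input, weight, and (optionally) contexts, so the title cannot be stored inside suggest_pinyin itself. It does not need to be: every option already carries the matching document's _id and _source, so read title from there. Use _source filtering, as in the request above, to keep the response small.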
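Mapping a pinyin match back to a displayable title only requires reading the _source that accompanies each completion option. The sketch below works on a plain dictionary shaped like a suggest response. The function name `titles_from_response`, the suggester name `title_suggest`, and the sample response are assumptions made for illustration.

```python
from typing import Any, Dict, List


def titles_from_response(
    response: Dict[str, Any], name: str = "title_suggest"
) -> List[str]:
    """Collect document titles from completion options, one per document."""
    titles: List[str] = []
    seen_ids = set()
    for entry in response.get("suggest", {}).get(name, []):
        for option in entry.get("options", []):
            doc_id = option.get("_id")
            title = option.get("_source", {}).get("title")
            # Skip options without a title and documents already listed.
            if title is None or doc_id in seen_ids:
                continue
            seen_ids.add(doc_id)
            titles.append(title)
    return titles


# Hand-written sample shaped like an abridged suggest response.
sample = {
    "suggest": {
        "title_suggest": [
            {
                "text": "ping",
                "options": [
                    {
                        "text": "pingguoshouji",
                        "_id": "1",
                        "_score": 30.0,
                        "_source": {"title": "苹果手机 iPhone 15"},
                    }
                ],
            }
        ]
    }
}
print(titles_from_response(sample))
```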
3. Pinyin initials completion
POST /products-pinyin/_search
{
  "suggest": {
    "title_suggest": {
      "prefix": "pgs",
      "completion": {"field": "suggest_pinyin"}
    }
  }
}
This matches as long as one of the input entries starts with pgs.
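4. Fuzzy pinyin completion (typo tolerance)
"suggest": {
  "title_suggest": {
    "prefix": "pinggou",
    "completion": {
      "field": "suggest_pinyin",
      "fuzzy": {"fuzziness": 1, "transpositions": true}
    }
  }
}
This matches pingguo. With transpositions enabled, swapping the adjacent letters "o" and "u" counts as a single edit, so the edit distance is 1.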
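To see why the transpositions setting matters for this example, the edit distance can be computed by hand. The function below is a small standalone implementation of the optimal string alignment variant of Damerau-Levenshtein distance. It is written for illustration only and is not the code Elasticsearch runs internally.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance between a and b.

    With transpositions=True, swapping two adjacent characters costs 1
    (optimal string alignment); otherwise plain Levenshtein distance.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                # Adjacent swap, e.g. "ou" <-> "uo".
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("pinggou", "pingguo", transpositions=True))   # 1
print(edit_distance("pinggou", "pingguo", transpositions=False))  # 2
```

With transpositions disabled the same typo costs two edits, so a fuzziness of 1 would no longer cover it.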
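VI. Optimization tips ✅
Scenario | Recommendation |
---|---|
Input latency | Precompute pinyin and initials at index time instead of computing them at query time |
Storage | suggest_pinyin.input can hold many entries per document, so index only the useful variants; max_input_length caps the length of each entry |
Weighting | Give popular products a higher weight |
Caching | Cache high-frequency prefixes in the application layer (e.g. "i", "ip", "iph") |
Mixed input | Support mixed English and pinyin input |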
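The caching row can be realized with a small in-process cache in front of the suggest call. This is a minimal sketch using the standard library's functools.lru_cache. `make_cached_suggester` is a hypothetical helper, and `fake_fetch` stands in for whatever function actually queries Elasticsearch.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_suggester(
    fetch: Callable[[str], List[str]], maxsize: int = 1024
) -> Callable[[str], List[str]]:
    """Wrap fetch(prefix) with an LRU cache keyed on the normalized prefix."""

    @lru_cache(maxsize=maxsize)
    def cached(prefix: str) -> Tuple[str, ...]:
        # Store an immutable tuple so callers cannot alter cached results.
        return tuple(fetch(prefix))

    def suggest(user_input: str) -> List[str]:
        # Normalize so "Ping" and "ping " share one cache entry.
        return list(cached(user_input.strip().lower()))

    return suggest


calls = []


def fake_fetch(prefix: str) -> List[str]:
    """Stand-in for the real Elasticsearch call; records each invocation."""
    calls.append(prefix)
    return ["苹果手机 iPhone 15"] if "pingguo".startswith(prefix) else []


suggest = make_cached_suggester(fake_fetch)
suggest("Ping")
suggest("ping ")
print(len(calls))  # 1: the second call is served from the cache
```

lru_cache has no expiry, so cached results go stale when the index changes. A production setup would likely need a time-to-live or explicit invalidation.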
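VII. End-to-end completion flow (application layer)
import re

from elasticsearch import Elasticsearch  # written against the 8.x Python client

es = Elasticsearch("http://localhost:9200")  # adjust to your cluster
CJK = re.compile(r"[\u4e00-\u9fff]")


def get_suggestions(user_input, size=10):
    """Return up to `size` product titles for a Chinese or pinyin prefix."""
    # 1. Chinese input queries `suggest`; anything else is treated as pinyin
    #    and queries `suggest_pinyin`.
    is_chinese = bool(CJK.search(user_input))
    completion = {"field": "suggest" if is_chinese else "suggest_pinyin", "size": size}
    if not is_chinese:
        # 2. `fuzzy` must sit inside `completion`, not next to it.
        completion["fuzzy"] = {"fuzziness": 1}
    res = es.search(
        index="products-pinyin",
        source=["title"],
        suggest={"title_suggest": {"prefix": user_input, "completion": completion}},
    )
    suggestions = []
    for opt in res["suggest"]["title_suggest"][0]["options"]:
        # 3. Each option carries the document's _source, so no separate
        #    pinyin-to-title lookup is needed.
        title = opt["_source"]["title"]
        if title not in suggestions:
            suggestions.append(title)
    return suggestions[:size]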
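VIII. Further extensions
Scenario | Suggested approach |
---|---|
Dynamic pinyin generation | Populate the pinyin fields automatically at index time with an ingest pipeline |
Mixed pinyin and Chinese matching | Use a multi_match query across title.pinyin and title |
Personalized completion | Adjust weight based on the user's past behavior |
Cold start | Seed the index with popular terms curated by the operations team |
Performance monitoring | Monitor suggest query latency and hit rate |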
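For the mixed pinyin and Chinese row, the query body could look like the sketch below. The helper name `build_mixed_query` and the boost of 2 on title are illustrative assumptions. The field names come from the mapping in this article.

```python
from typing import Any, Dict


def build_mixed_query(user_input: str, size: int = 10) -> Dict[str, Any]:
    """Return a _search body that matches on both title and title.pinyin."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": user_input,
                # Assumption: favor direct matches on the Chinese title.
                "fields": ["title^2", "title.pinyin"],
            }
        },
    }


print(build_mixed_query("pingguo"))
```

Unlike the completion suggester, this is an ordinary full-text query. It returns scored documents rather than prefix suggestions, so it suits a results page better than a type-ahead box.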
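IX. Populating pinyin with an ingest pipeline (optional)
PUT /_ingest/pipeline/add_pinyin_suggest
{
  "description": "Automatically add the pinyin completion field",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          // Painless cannot call external services or the pinyin plugin,
          // so real pinyin must be supplied from elsewhere
          // (for example by the indexing application).
          // Fixed placeholder values are used here for simplicity.
          ctx.suggest_pinyin = [];
          ctx.suggest_pinyin.add('pingguoshouji');
          ctx.suggest_pinyin.add('pgsj');
        """
      }
    }
  ]
}
Use the pipeline when indexing:
PUT /products-pinyin/_doc/2?pipeline=add_pinyin_suggest
{
  "title": "华为手机"
}
Because the script hard-codes its values, this document receives the placeholder pinyin rather than the pinyin for 华为手机. Replace the placeholders with real values before using the pipeline on production data.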