I. Introduction to ES parent-child documents
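Elasticsearch (ES) offers something similar to a JOIN in a relational database. A field of type join maintains parent-child relationships between documents in the same index, and parent and child documents can be indexed and updated independently.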
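II. Creating the index and inserting parent-child documents
Creating ES parent-child documents takes three steps:
- Create the index mapping, declaring a field of type join together with the parent and child relation names
- Insert the parent documents
- Insert the child documents
Each step is demonstrated below.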
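1. Create the index
Suppose we have a blog system in which each blog post has several comments. The blog (blog) and its comments (comment) then form a parent-child relationship.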
A parent-child mapping is defined as follows:
- Set the field type to join
- Declare the parent-child relation in relations
For example:
# blog is the parent, comment is the child
PUT blog_index
{
  "mappings": {
    "properties": {
      "blog_comment_join": {
        "type": "join",
        "relations": {
          "blog": "comment"
        }
      }
    }
  }
}
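The mapping rules above can be illustrated with a short Python sketch that builds the same request body as a plain dict. The helper name build_join_mapping is hypothetical. It checks only one rule of the join field: each relation name may have at most one parent. It does not talk to a cluster. The resulting dict could be sent as the body of PUT blog_index with any HTTP client.

```python
import json


def build_join_mapping(field_name, relations):
    """Return an index-creation body with one join field (hypothetical helper).

    `relations` maps a parent name to a child name or a list of child names.
    Raises ValueError if a name is declared as the child of two parents,
    because in a join relation each name may have only one parent.
    """
    parent_of = {}
    for parent, children in relations.items():
        names = [children] if isinstance(children, str) else list(children)
        for child in names:
            if child in parent_of:
                raise ValueError(
                    f"{child!r} already has parent {parent_of[child]!r}"
                )
            parent_of[child] = parent
    return {
        "mappings": {
            "properties": {
                field_name: {"type": "join", "relations": relations}
            }
        }
    }


# Same mapping as the PUT blog_index request above.
blog_mapping = build_join_mapping("blog_comment_join", {"blog": "comment"})
print(json.dumps(blog_mapping, indent=2))

# A parent with several child types is also accepted.
qa_mapping = build_join_mapping("my_join_field", {"question": ["answer", "comment"]})
```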
2. Insert the parent documents
PUT blog_index/_doc/1
{
  "title": "First Blog",
  "author": "Ahri",
  "content": "This is my first blog",
  "blog_comment_join": {"name": "blog"}
}

# For a parent, the join field also accepts the shorthand string form "blog"
PUT blog_index/_doc/2
{
  "title": "Second Blog",
  "author": "EZ",
  "content": "This is my second blog",
  "blog_comment_join": "blog"
}
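Parent document 1 uses the object form of the join field, and document 2 uses the shorthand string. Both mark the document as a blog parent. The minimal sketch below shows that the two forms carry the same information. The helper normalize_join is hypothetical and only illustrates the equivalence.

```python
def normalize_join(value):
    """Return a parent's join field in object form (hypothetical helper).

    ES accepts either {"name": "blog"} or the shorthand string "blog".
    """
    if isinstance(value, str):
        return {"name": value}
    return dict(value)


doc1 = {"title": "First Blog", "blog_comment_join": {"name": "blog"}}
doc2 = {"title": "Second Blog", "blog_comment_join": "blog"}

print(normalize_join(doc1["blog_comment_join"]))  # {'name': 'blog'}
print(normalize_join(doc2["blog_comment_join"]))  # {'name': 'blog'}
```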
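3. Insert the child documents
One thing to watch when inserting child documents is the routing setting. A child document must be stored on the same shard as its parent, so its routing must be set to the parent's ID (or, if the parent was indexed with a custom routing value, to that same value).
For example: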
PUT blog_index/_doc/comment-1?routing=1&refresh
{
  "user": "Tom",
  "content": "Good blog",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {"name": "comment", "parent": 1}
}

PUT blog_index/_doc/comment-2?routing=1&refresh
{
  "user": "Jhon",
  "content": "Good Job",
  "comment_date": "2020-02-01 10:00:00",
  "blog_comment_join": {"name": "comment", "parent": 1}
}

PUT blog_index/_doc/comment-3?routing=2&refresh
{
  "user": "Jack",
  "content": "Great job",
  "comment_date": "2020-01-01 10:00:00",
  "blog_comment_join": {"name": "comment", "parent": 2}
}
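A simplified model of shard selection shows why the routing matters. The model makes two assumptions: a 3-shard index, and zlib.crc32 standing in for the Murmur3 hash that ES actually uses. ES also applies a routing-shard factor, which this sketch ignores. Under this model, a child routed with its parent's ID always lands on the parent's shard. A child routed by its own _id would land there only by chance.

```python
import zlib
from typing import Optional

NUM_PRIMARY_SHARDS = 3  # assumed shard count, for illustration only


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Simplified shard selection: hash the routing value, take it modulo the
    # shard count. crc32 stands in for the Murmur3 hash ES really uses.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def shard_for_doc(doc_id: str, routing: Optional[str] = None) -> int:
    # Without an explicit ?routing=..., a document is routed by its own _id.
    return shard_for(routing if routing is not None else doc_id)


# The parent blogs were indexed without ?routing, so their _id is the routing.
parent_shard = {pid: shard_for_doc(pid) for pid in ("1", "2")}

# (child _id, parent _id) pairs from the requests above; routing = parent _id.
comments = [("comment-1", "1"), ("comment-2", "1"), ("comment-3", "2")]

for cid, pid in comments:
    child_shard = shard_for_doc(cid, routing=pid)
    print(f"{cid}: shard {child_shard}, parent {pid}: shard {parent_shard[pid]}")
```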
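4. Other options
Besides the common single-level setup above, the ES join field also supports multiple child types and multi-level parent-child relations, as shown below.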
Multiple child types
With the join type, one parent can have several child types. Create it like this:
PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"]
        }
      }
    }
  }
}
Multi-level parent-child relations
PUT my_index
{
  "mappings": {
    "properties": {
      "my_join_field": {
        "type": "join",
        "relations": {
          "question": ["answer", "comment"],
          "answer": "vote"
        }
      }
    }
  }
}
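In a multi-level hierarchy, ES requires a grandchild (here a vote) to be indexed with the same routing value as its top-level ancestor (the question), so the whole lineage stays on one shard. The hypothetical helper lineage below walks the relations dict from a type up to its root. It illustrates the hierarchy only and is not part of any ES client.

```python
# Relations from the multi-level mapping above.
RELATIONS = {"question": ["answer", "comment"], "answer": "vote"}


def parent_map(relations):
    """Map each child relation name to its single parent name."""
    parents = {}
    for parent, children in relations.items():
        for child in ([children] if isinstance(children, str) else children):
            parents[child] = parent
    return parents


def lineage(type_name, relations):
    """Return the chain of relation names from the root down to type_name."""
    parents = parent_map(relations)
    chain = [type_name]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return list(reversed(chain))


for name in ("question", "answer", "comment", "vote"):
    chain = lineage(name, RELATIONS)
    # Every document in a lineage must use the routing of its root document.
    print(" -> ".join(chain), "| routing taken from:", chain[0])
```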
The hierarchy created above is:
question
├── answer
│   └── vote
└── comment
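III. Querying parent-child documents
There are three main kinds of parent-child queries:
- parent_id: returns all child documents of the parent with a given ID
- has_parent: returns the child documents whose parent matches a query
- has_child: returns the parent documents that have at least one child matching a query
Examples of each follow: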
【1】parent_id query
# Find all child documents of the parent with ID 1
GET blog_index/_search
{
  "query": {
    "parent_id": {
      "type": "comment",
      "id": 1
    }
  }
}

# Result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {"total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0},
  "hits" : {
    "total" : {"value" : 2, "relation" : "eq"},
    "max_score" : 0.44183275,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {"name" : "comment", "parent" : 1}
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 0.44183275,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {"name" : "comment", "parent" : 1}
        }
      }
    ]
  }
}
【2】has_parent query
# Find all child documents whose parent's title matches "first"
GET blog_index/_search
{
  "query": {
    "has_parent": {
      "parent_type": "blog",
      "query": {
        "match": {"title": "first"}
      }
    }
  }
}

# Result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {"total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0},
  "hits" : {
    "total" : {"value" : 2, "relation" : "eq"},
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-1",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Tom",
          "content" : "Good blog",
          "comment_date" : "2020-01-01 10:00:00",
          "blog_comment_join" : {"name" : "comment", "parent" : 1}
        }
      },
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "comment-2",
        "_score" : 1.0,
        "_routing" : "1",
        "_source" : {
          "user" : "Jhon",
          "content" : "Good Job",
          "comment_date" : "2020-02-01 10:00:00",
          "blog_comment_join" : {"name" : "comment", "parent" : 1}
        }
      }
    ]
  }
}
【3】has_child query
# Find the parent documents that have a child whose user matches "Jack"
GET blog_index/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {
        "match": {"user": "Jack"}
      }
    }
  }
}

# Result
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {"total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0},
  "hits" : {
    "total" : {"value" : 1, "relation" : "eq"},
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "blog_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "title" : "Second Blog",
          "author" : "EZ",
          "content" : "This is my second blog",
          "blog_comment_join" : "blog"
        }
      }
    ]
  }
}
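The in-memory sketch below makes the three query semantics concrete using the sample blog and comment data. It is not how ES executes these queries. The matches function is a crude stand-in for a match query (it compares lowercased whitespace tokens), scoring is ignored, and only document IDs are returned. The IDs it produces agree with the hits in the three results above.

```python
# Sample documents from section II (only the fields the queries use).
PARENTS = {
    "1": {"title": "First Blog", "author": "Ahri"},
    "2": {"title": "Second Blog", "author": "EZ"},
}
CHILDREN = {
    "comment-1": {"user": "Tom", "content": "Good blog", "parent": "1"},
    "comment-2": {"user": "Jhon", "content": "Good Job", "parent": "1"},
    "comment-3": {"user": "Jack", "content": "Great job", "parent": "2"},
}


def matches(text, term):
    # Crude stand-in for a `match` query: compare lowercased whitespace tokens.
    return term.lower() in text.lower().split()


def parent_id_query(pid):
    # parent_id: children whose parent is exactly `pid`.
    return sorted(cid for cid, c in CHILDREN.items() if c["parent"] == pid)


def has_parent_query(field, term):
    # has_parent: children whose parent document matches the inner query.
    hits = {pid for pid, p in PARENTS.items() if matches(p[field], term)}
    return sorted(cid for cid, c in CHILDREN.items() if c["parent"] in hits)


def has_child_query(field, term):
    # has_child: parents with at least one child matching the inner query.
    return sorted({c["parent"] for c in CHILDREN.values() if matches(c[field], term)})


print(parent_id_query("1"))                # ['comment-1', 'comment-2']
print(has_parent_query("title", "first"))  # ['comment-1', 'comment-2']
print(has_child_query("user", "Jack"))     # ['2']
```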
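IV. Nested objects vs. parent-child documents
The Geek Time course 《Elasticsearch核心技术与实战》 ("Elasticsearch Core Technology and Practice") compares the two approaches as follows:
- Nested objects: the nested objects are stored together with the parent in one document, so read performance is high. However, updating any nested object requires reindexing the whole document. They suit child data that is rarely updated and mostly queried.
- Parent-child documents: parent and child are separate documents and can be updated independently. However, maintaining the join relation costs extra memory, and queries are comparatively slower. They suit child data that is updated frequently.
Most data is read far more often than it is written, so Nested objects are usually the better default.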