Reading Notes on "Similarity-based Memory Enhanced Joint Entity and Relation Extraction"
Code
Paper
Abstract
Document-level joint entity and relation extraction is a challenging information extraction task: a single neural network must solve four sub-tasks at once, namely mention detection, coreference resolution, entity classification, and relation extraction. Most existing approaches adopt sequential multi-task learning, which decomposes the task arbitrarily so that each sub-task depends only on the output of the previous one, ignoring the more complex interactions that may exist between the tasks. To address this, the paper proposes a new multi-task learning framework with a single unified model for all sub-tasks. The model works as follows: first, it identifies entity mentions in the text and aggregates them into coreference clusters; second, it assigns an entity type to each cluster; finally, it predicts relations between the clusters. Figure 1 shows an example document from the DocRED dataset together with the entity-cluster graph the model is expected to output. To overcome the limitations of pipeline-based methods, the model introduces bidirectional memory-based dependencies between the sub-tasks, so that they can mutually influence and improve one another and the joint task is solved more effectively.
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/kmhbutxz_k5xl.png)
Model Architecture
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/dpxzvssk_6ins.png)
The approach is inspired by JEREX and consists of four task-specific components: mention extraction (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/srvrvtbt_tjdv.png)), coreference resolution (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/gxfqafoj_qomy.png)), entity classification (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/thcaqgob_ndvp.png)), and relation extraction (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/nmztimrk_zsug.png)). Unlike the original pipeline architecture, it introduces the memory module shown in Figure 2, which updates the input representations of the components through a memory-based extended-representation module. This module uses a Memory Read operation to read information from the memory matrices ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/sxlibdis_pzca.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/soojbzvd_u2zx.png), which are written by the entity and relation classifiers, respectively. In this way the components exchange information bidirectionally, and the joint task is solved more effectively.
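The four-stage flow described above can be sketched as a pipeline. This is a minimal illustrative sketch, not the paper's implementation: every component here is a trivial stub (the real model implements each as a neural sub-module with memory read/write between them), and all function names are hypothetical.

```python
from typing import List, Tuple

def detect_mentions(tokens: List[str]) -> List[Tuple[int, int]]:
    # Stub: treat every capitalized token as a single-token mention span.
    return [(i, i + 1) for i, t in enumerate(tokens) if t[:1].isupper()]

def coreference(mentions: List[Tuple[int, int]]) -> List[List[Tuple[int, int]]]:
    # Stub: each mention forms its own coreference cluster.
    return [[m] for m in mentions]

def classify_entities(clusters: List[List[Tuple[int, int]]]) -> List[str]:
    # Stub: assign a placeholder type to every cluster.
    return ["ENTITY" for _ in clusters]

def classify_relations(clusters: List[List[Tuple[int, int]]]) -> List[Tuple[int, int, str]]:
    # Stub: predict no relations.
    return []

def pipeline(tokens: List[str]):
    """Mentions -> coreference clusters -> entity types -> relations."""
    mentions = detect_mentions(tokens)
    clusters = coreference(mentions)
    types = classify_entities(clusters)
    relations = classify_relations(clusters)
    return mentions, clusters, types, relations
```

The stubs only show the data flow between the four components; in the paper, the memory module additionally feeds classifier-written information back into the earlier stages.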
Memory reading
Like TriMF, the method uses an attention mechanism to combine the input representations with information read from memory, producing extended representations. As shown in Figure 2, the architecture extends two kinds of input representations: the token embeddings ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/aginnbek_zww3.png) and the span representations of mention candidates ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/pfwatewy_lh43.png). For each input representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/mcogynlx_mbw7.png) (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/qbxhqfir_ktt3.png)) and each memory matrix ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/uotyewzt_h5xt.png) (![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/ofqqrnnq_fsiy.png)), the attention mechanism uses the input representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/lyrjhkfs_w1g8.png) as keys and values, where ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/vmgdlaky_7u0s.png) is the number of representation vectors and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/semqcswa_0exg.png) is the embedding dimension.
As the query, the attention mechanism uses the memory matrix ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/mmllfvfx_802m.png), where ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/xjquxezj_dwit.png) is the number of memory slots and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/cdpeezoh_2gml.png) is the slot dimension. To compute the attention weight vector ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/hlnjmkpl_x689.png), the scores are summed over the memory-slot dimension:
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/uochewmy_bb88.png)
where ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/kuhwifwf_pmk0.png) is the learnable parameter matrix of the attention mechanism, and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/etkyyyvg_mk4q.png) is the ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/arbxkazq_pko5.png)-th memory slot of ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/mdyusnxq_le6r.png). The vector ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/oadbaiyg_6u00.png) is then used to weight ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/rjhaxjjv_gf9j.png), yielding the extended input representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/bfssvezl_dlga.png):
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/ohjcfvav_e0kp.png)
For each input representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/tiyxxagp_2l4n.png), the memory read operation produces two extended representations, ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/qejgdkfl_mv3a.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/qwrugnhi_rfg2.png), one per memory matrix. The final extended representation is the element-wise average of ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/byxgvrmh_xtod.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/wzavfwwm_01kd.png).
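The read operation can be sketched in NumPy. This is a minimal sketch under one plausible reading of the description, not the paper's exact formulation: the input matrix `X` serves as keys and values, the memory matrix `M` as the query, a learnable matrix `W` aligns the two spaces, scores are summed over the memory-slot dimension and softmax-normalized into one weight per input vector, and the reads from the two memories are averaged element-wise. All names and shapes here are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def memory_read(X, M, W):
    """One plausible reading of the Memory Read operation.
    X: (n, d) input representations (keys and values)
    M: (m, k) memory matrix (query), m slots of dimension k
    W: (k, d) learnable attention parameter matrix."""
    scores = M @ W @ X.T               # (m, n): slot-vs-input scores
    a = softmax(scores.sum(axis=0))    # sum over the memory-slot dimension -> (n,)
    return a[:, None] * X              # weight each input vector; same shape as X

def extended_representation(X, M_e, W_e, M_r, W_r):
    """Read from the entity and relation memories, then average element-wise."""
    return 0.5 * (memory_read(X, M_e, W_e) + memory_read(X, M_r, W_r))
```

Summing over the slot dimension collapses the (m, n) score matrix into a single relevance weight per input vector, so the extended representation keeps the input's shape and can be fed to the downstream components unchanged.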
Memory writing
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/zavtsfpo_evxk.png)
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/dkgtwhkx_rv9t.png)
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/aadbqzun_sq8y.png)
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/ocxkhwrl_0fxq.png)
Given the representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/zgqdqqul_h1bz.png) of an entity pair ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/vvvaxeee_s1pg.png), the probability that a relation type holds is:
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/itmsmjez_wlvf.png)
Define ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/qmdnxkfe_l5iu.png) as the bilinear similarity between an instance representation ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/azcdiweh_izct.png) and a memory matrix ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/hkdwyuby_fyk9.png):
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/rxmalyrz_x6ur.png)
where ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/ahlpndlt_ew7h.png) is a learnable parameter matrix. The entity and relation classifiers use separate bilinear similarity weight matrices, ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/ltnyrvor_wl5b.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/pcktlclt_3mpx.png), where ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/qrwbelno_n6p4.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/mrovrzwt_wfg2.png) are the dimensions of the entity and entity-pair representations, and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/puyhfgry_hls1.png) and ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/czijpwrg_rrkk.png) are the slot dimensions of the entity and relation memory matrices. The number of memory slots in each memory matrix equals the number of classes of the corresponding classifier.
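The bilinear similarity can be sketched as follows; a minimal sketch assuming the similarity takes the form h·W·mᵢ for each memory slot mᵢ, with one slot per class. The shapes and names below are illustrative, not the paper's.

```python
import numpy as np

def bilinear_similarity(h, M, W):
    """Bilinear similarity between an instance representation and every
    memory slot: s_i = h @ W @ m_i.
    h: (d,) instance representation (entity or entity-pair)
    M: (c, k) memory matrix, one slot per class
    W: (d, k) learnable bilinear weight matrix."""
    return (h @ W) @ M.T  # (c,): one similarity score per slot/class
```

With separate weight matrices for the two classifiers, the same routine serves both the entity memory (d = entity-representation size) and the relation memory (d = entity-pair-representation size).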
Training
Finally, the model is trained to optimize a joint loss ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/lepeqvik_mwyi.png), which combines the losses ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/nlkktfil_nyeh.png) of the same four sub-tasks as in JEREX, weighted by fixed task-specific weights ![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/opfjudeg_5ulx.png):
![](https://img.shuduke.com/static_img/cnblogs/blog/3038153/202402/gcrvphih_vypu.png)
The paper also adopts the two-stage training scheme proposed in TriMF, tuning the memory warm-up proportion during hyperparameter search.
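The joint objective is just a fixed-weight sum of the four sub-task losses; a minimal sketch (the loss values and weights below are illustrative):

```python
def joint_loss(losses, weights):
    """Weighted sum of the sub-task losses (mention extraction,
    coreference resolution, entity classification, relation extraction),
    with fixed task-specific weights as in JEREX."""
    assert len(losses) == len(weights) == 4
    return sum(w * l for w, l in zip(weights, losses))
```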