Computer Science (计算机科学), 2019, Vol. 46, Issue 4: 8-13. doi: 10.11896/j.issn.1002-137X.2019.04.002
XU Yao-li (徐耀丽), LI Zhan-huai (李战怀)
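Abstract: Entity resolution (ER) is a fundamental problem in data integration and data cleaning. Inconsistency reconciliation (IR) further improves resolution quality by reconciling the inconsistent record pairs produced by different existing ER algorithms. However, existing IR methods share one limitation: the quality of their reconciliation results is not guaranteed. To address this, this paper proposes, for the first time, a quality-control agent based on probabilistic inference, denoted QCAgent. The agent requires no training data set. Subject to a given precision constraint, it outputs the reconciliation result with the maximum recall. Its core idea works as follows. First, an outlier detection model estimates the probability that each inconsistent record pair is a match. Precision and recall are estimated from these probabilities and returned to the agent as feedback from the environment. Second, a binary search selects the flipping scheme that meets the precision requirement with the maximum recall, and this scheme becomes QCAgent's next action. Third, the outlier detection model is retrained on the updated consistent results, and precision and recall are estimated again. The cycle repeats until the newly estimated precision satisfies the constraint, at which point the iteration stops. Experimental results on real-world data sets show that QCAgent effectively solves the quality-control problem for reconciliation results.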
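The decision step described in the abstract chooses the flip with maximum recall under a precision bound by binary search. The minimal Python sketch below illustrates that step. It is not the authors' implementation and rests on three assumptions of ours. First, each inconsistent pair already has a match probability from the outlier detection model. Second, a "flipping scheme" means labelling the k most probable pairs as matches. Third, expected precision and recall are approximated as ratios of expected counts. The names `expected_precision_recall`, `best_flip` and `reconcile` are hypothetical.

```python
from typing import Dict, Hashable, Mapping, Sequence, Tuple


def expected_precision_recall(probs: Sequence[float], k: int) -> Tuple[float, float]:
    """Estimate precision and recall if the k most probable pairs are labelled 'match'.

    `probs` must be sorted in descending order; each entry is the estimated
    match probability of one inconsistent record pair (assumed to come from
    an outlier detection model). Ratios of expected counts are used as a
    plug-in approximation, which is an assumption of this sketch.
    """
    total = sum(probs)                       # expected number of true matches overall
    hits = sum(probs[:k])                    # expected true matches among the k chosen pairs
    precision = hits / k if k > 0 else 1.0   # an empty answer set counts as precision 1
    recall = hits / total if total > 0 else 0.0
    return precision, recall


def best_flip(probs: Sequence[float], min_precision: float) -> Tuple[int, float, float]:
    """Binary-search the largest k whose estimated precision is >= min_precision.

    With probabilities sorted in descending order, estimated precision is
    non-increasing in k and estimated recall is non-decreasing in k, so the
    largest feasible k gives the maximum recall under the precision bound.
    """
    lo, hi = 0, len(probs)            # k = 0 is always feasible
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so that lo = mid always makes progress
        precision, _ = expected_precision_recall(probs, mid)
        if precision >= min_precision:
            lo = mid                  # feasible: try labelling more pairs as matches
        else:
            hi = mid - 1              # infeasible: shrink the candidate set
    precision, recall = expected_precision_recall(probs, lo)
    return lo, precision, recall


def reconcile(pair_probs: Mapping[Hashable, float], min_precision: float) -> Dict[Hashable, bool]:
    """Label each inconsistent pair True (match) or False (non-match)."""
    ranked = sorted(pair_probs, key=lambda pair: pair_probs[pair], reverse=True)
    k, _, _ = best_flip([pair_probs[pair] for pair in ranked], min_precision)
    return {pair: rank < k for rank, pair in enumerate(ranked)}


if __name__ == "__main__":
    probs = [0.95, 0.9, 0.8, 0.6, 0.3, 0.1]
    k, p, r = best_flip(probs, min_precision=0.85)
    print(k, round(p, 3), round(r, 3))  # 3 0.883 0.726
```

Binary search is valid here only because sorting by probability makes the estimated precision monotone in k. In QCAgent as described, this selection is one round of a loop. The outlier detection model would be retrained on the updated labels, the probabilities re-estimated, and the search repeated until the newly estimated precision meets the constraint.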