
Chapter 6. Summary and Discussion

Section 6.1. Summary

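Information extraction (IE) is a field that has emerged over the past decade. International forums such as the MUC conferences have paid close attention to it, proposed methods for evaluating IE systems, and defined a set of evaluation metrics.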

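Information extraction research covers structured, semi-structured, and free-text documents. Free-text documents are mostly handled with natural language processing techniques, while the other two kinds of documents are mostly handled with delimiter-based methods.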

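Web pages are one of the main targets of information extraction research. A wrapper is typically used to extract information from one particular website. A collection of wrappers, each handling a different website, makes it possible to represent the data uniformly and to recover the relationships among the data.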

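Building a wrapper by hand is usually time-consuming and laborious, and it requires specialist knowledge. Because web pages also change dynamically, maintaining wrappers is expensive. How to construct wrappers automatically has therefore become the central problem. The methods commonly used include machine learning methods based on inductive learning.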
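As a rough illustration of what learning a wrapper from examples can mean, the sketch below infers one left and one right delimiter for a single field from a few labeled pages. It is a toy written for this summary, not the algorithm of any system discussed here: the function names (`learn_delimiters`, `common_suffix`) and the sample pages are hypothetical, and real inductive learners must also handle multiple fields, noise, and delimiter validation.

```python
import os
from typing import List, Tuple


def common_suffix(strings: List[str]) -> str:
    """Longest string that every input ends with."""
    reversed_strings = [s[::-1] for s in strings]
    return os.path.commonprefix(reversed_strings)[::-1]


def learn_delimiters(examples: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Infer (left, right) delimiters for one field.

    examples: (page, value) pairs in which `value` is the labeled text
    to extract and occurs in `page` (hypothetical training format).
    """
    befores, afters = [], []
    for page, value in examples:
        i = page.index(value)                # raises ValueError if the label is absent
        befores.append(page[:i])             # text preceding the labeled value
        afters.append(page[i + len(value):])  # text following the labeled value
    left = common_suffix(befores)            # what always precedes the value
    right = os.path.commonprefix(afters)     # what always follows the value
    return left, right


if __name__ == "__main__":
    training = [
        ("<li>Name: <b>Ann</b> ok", "Ann"),
        ("<p>Name: <b>Bob</b> hi", "Bob"),
    ]
    print(learn_delimiters(training))
```

The learned pair generalizes only as far as the examples agree, which is why such learners typically need several labeled pages per site.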

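A number of research systems have been developed. They use machine learning algorithms to generate extraction rules for online information sources. ShopBot, WIEN, SoftMealy, and STALKER generate delimiter-based wrappers that can handle highly structured websites. RAPIER, WHISK, and SRV can handle less structured information sources; their extraction methods follow in the tradition of classical IE, and their learning algorithms are mostly based on relational learning.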
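To make the notion of a delimiter-based wrapper concrete, here is a minimal sketch in the spirit of left/right delimiter wrappers: each field is described by the string that precedes it and the string that follows it, and the page is scanned repeatedly to produce tuples. The function name `lr_extract` and the sample page are assumptions made for illustration; none of the systems named above is claimed to work exactly this way.

```python
from typing import List, Tuple


def lr_extract(page: str, delimiters: List[Tuple[str, str]]) -> List[Tuple[str, ...]]:
    """Extract tuples using one (left, right) delimiter pair per field."""
    rows: List[Tuple[str, ...]] = []
    if not delimiters:
        return rows                  # nothing to extract; also avoids an endless loop
    pos = 0
    while True:
        row = []
        for left, right in delimiters:
            start = page.find(left, pos)
            if start == -1:
                return rows          # no further record on the page
            start += len(left)
            end = page.find(right, start)
            if end == -1:
                return rows          # unterminated field: stop scanning
            row.append(page[start:end])
            pos = end + len(right)   # continue scanning after this field
        rows.append(tuple(row))


if __name__ == "__main__":
    sample = "<html><b>Congo</b> <i>242</i><br><b>Egypt</b> <i>20</i><br></html>"
    print(lr_extract(sample, [("<b>", "</b>"), ("<i>", "</i>")]))
```

A wrapper of this kind works only when the delimiters are reliable across the whole site, which is why it suits highly structured pages.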

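Web information extraction and wrapper generation can be useful in a wide range of application areas. So far, only commercial applications in comparison shopping have been reasonably successful; the most prominent systems include Jango, Junglee, and MySimon.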

Section 6.2. Discussion

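Current search engines cannot collect the information held in online databases. Given a user query, a search engine can find relevant web pages, but it cannot extract the information on them. The “hidden web” keeps growing, so tools are needed that extract the relevant information from web pages and gather it together.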

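Because integrating online information is becoming ever more important, research on web information extraction, although still young, will continue to develop. Machine learning will remain the dominant approach, since handling large volumes of dynamic information requires highly automated techniques. Reference [52] suggests that combining different types of methods to build more adaptable systems is a promising direction. Reference [36] also proposes an approach that mixes linguistic knowledge with syntactic features.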

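Most of the systems described in this paper target HTML documents. XML is expected to come into widespread use over the next few years. HTML describes how a document is presented; it is a formatting language. XML, by contrast, can tell you what a document means: it defines content, not just form. This makes wrapper generation simpler, but it does not remove the need for wrappers.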
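The contrast between HTML and XML can be sketched with a small example. When the markup names the content, extraction reduces to reading elements by name rather than guessing delimiters. The document below, its tag names (`catalog`, `product`, `name`, `price`), and the function `extract_products` are all invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical XML source; the element names are assumptions.
XML_DOC = """<catalog>
  <product><name>Widget</name><price currency="USD">9.99</price></product>
  <product><name>Gadget</name><price currency="USD">24.50</price></product>
</catalog>"""


def extract_products(xml_text: str) -> List[Tuple[str, float]]:
    """Return (name, price) pairs by reading elements by tag name."""
    root = ET.fromstring(xml_text)
    return [
        (product.findtext("name"), float(product.findtext("price")))
        for product in root.iter("product")
    ]


if __name__ == "__main__":
    print(extract_products(XML_DOC))
```

Even here something wrapper-like remains: another site may call the same fields `title` and `cost`, so a mapping from each source's vocabulary to a common schema is still required.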

The challenge for the future is to build flexible, scalable systems for automatic wrapper induction that can keep up with the ever-growing, dynamic web.

References

[1] S. Abiteboul. Querying Semistructured Data. Proceedings of the International Conference on Database Theory (ICDT), January 1997.

[2] B. Adelberg. NoDoSE - A tool for Semi-Automatically Extracting Semistructured Data from Text Documents. Proceedings ACM SIGMOD International Conference on Management of Data, Seattle, June 1998.

[3] D. E. Appelt, D. J. Israel. Introduction to Information Extraction Technology. Tutorial for IJCAI-99, August 1999.

[4] N. Ashish, C. A. Knoblock. Semi-automatic Wrapper Generation for Internet Information Sources. Second IFCIS Conference on Cooperative Information Systems (CoopIS), olina, June 1997.

[5] N. Ashish, C. A. Knoblock. Wrapper Generation for semistructured Internet Sources. SIGMOD Record, Vol. 26, No. 4, pp. 8-15, December 1997.

[6] P. Atzeni, G. Mecca. Cut & Paste. Proceedings of the 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS'97), May 1997.

[7] M. Bauer, D. Dengler. TrIAs - An Architecture for Trainable Information Assistants. Workshop on AI and Information Integration, in conjunction with the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[8] P. Berka. Intelligent Systems on the Internet. http://lisp.vse.cz/ berka/ai-inet.htm, Laboratory of Intelligent Systems, University of Economics.

[9] L. Bright, J. R. Gruser, L. Raschid, M. E. Vidal. A Wrapper Generation Toolkit to Specify and Construct Wrappers for Web Accessible Data Sources (WebSources). Computer Systems Special Issue on Semantics on the WWW, Vol. 14, No. 2, March 1999.

[10] S. Brin. Extracting Patterns and Relations from the World Wide Web. International Workshop on the Web and Databases (WebDB'98), March 1998.

[11] M. E. Califf, R. J. Mooney. Relational Learning of Pattern-Match Rules for Information Extraction. Proceedings of the ACL Workshop on Natural Language, July 1997.

[12] M. E. Califf. Relational Learning Techniques for Natural Language Information Extraction. Ph.D. thesis, Department of Computer Sciences, August 1998. Technical Report AI98-276.

[13] S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, J. Widom. The TSIMMIS Project: Integration of Heterogeneous Information Sources. In Proceedings of IPSJ Conference, pp. 7-18, Japan, October 1994.

[14] B. Chidlovskii, U. M. Borghoff, P-Y. Chevalier. Towards Sophisticated Wrapping of Web-based Information Repositories. Proceedings of the 5th International RIAO Conference, June 1997.

[15] M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell, K. Nigam, S. Slattery. Learning to Extract Symbolic Knowledge from the World Wide Web. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[16] M. Craven, S. Slattery, K. Nigam. First-Order Learning for Web Mining. Proceedings of the 10th European Conference on Machine, April 1998.

[17] R. B. Doorenbos, O. Etzioni, D. S. Weld. A Scalable Comparison-Shopping Agent for the World Wide Web. Technical report UW-CSE-, 1996.

[18] R. B. Doorenbos, O. Etzioni, D. S. Weld. A Scalable Comparison-Shopping Agent for the World-Wide-Web. Proceedings of the first International Conference on Autonomous Agents, February 1997.

[19] O. Etzioni. Moving up the Information Food Chain: Deploying Softbots on the World Wide Web. AI Magazine, 18(2):11-18, 1997.

[20] D. Florescu, A. Levy, A. Mendelzon. Database Techniques for the World Wide Web: A Survey. ACM SIGMOD Record, Vol. 27, No. 3, September 1998.

[21] D. Freitag. Information Extraction from HTML: Application of a General Machine Learning Approach. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[22] D. Freitag. Machine Learning for Information Extraction in Informal Domains. Ph.D. dissertation, November 1998.

[23] D. Freitag. Multistrategy Learning for Information Extraction. Proceedings of the 15th International Conference on Machine Learning (ICML-98), July 1998.

[24] R. Gaizauskas, Y. Wilks. Information Extraction: Beyond Document Retrieval. Computational Linguistics and Chinese Language Processing, Vol. 3, No. 2, pp. 17-60, August 1998.

[25] H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, J. Widom. Integrating and Accessing Heterogeneous Information Sources in TSIMMIS. In Proceedings of the AAAI Symposium on Information Gathering, pp. 61-64, Stanford, March 1995.

[26] S. Grumbach, G. Mecca. In Search of the Lost Schema. Proceedings of the International Conference on Database Theory (ICDT'99), January 1999.

[27] J-R. Gruser, L. Raschid, M. E. Vidal, L. Bright. Wrapper Generation for Web Accessible Data Source. Proceedings of the 3rd IFCIS International Conference on Cooperative Information Systems (CoopIS-98), New York, August 1998.

[28] J. Hammer, H. Garcia-Molina, J. Cho, R. Aranha, A. Crespo. Extracting Semistructured Information from Web. Proceedings of the Workshop on Management of Semistructured Data, Arizona, May 1997.

[29] J. Hammer, H. Garcia-Molina, S. Nestorov, R. Yerneni, M. Breunig, V. Vassalos. Template-Based Wrappers in the TSIMMIS System. Proceedings of the 26th SIGMOD International Conference on Management of Data, May 1997.

[30] C-H. Hsu. Initial Results on Wrapping Semistructured Web Pages with Finite-State Transducers and Contextual Rules. Workshop on AI and Information Integration, in conjunction with the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[31] C-H. Hsu, M-T. Dung. Generating Finite-State Transducers for Semistructured Data Extraction from the Web. Information Systems, Vol. 23, No. 8, pp. 521-538, 1998.

[32] C. A. Knoblock, S. Minton, J. L. Ambite, N. Ashish, P. J. Modi, I. Muslea, A. G. Philpot, S. Tejada. Modeling Web Sources for Information Integration. Proceedings of the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[33] N. Kushmerick, D. S. Weld, R. Doorenbos. Wrapper Induction for Information Extraction. 15th International Joint Conference on Artificial Intelligence (IJCAI-97), August 1997.

[34] N. Kushmerick. Wrapper Induction for Information Extraction. Ph.D. Dissertation. Technical Report UW-CSE-, 1997.

[35] N. Kushmerick. Wrapper induction: Efficiency and expressiveness. Workshop on AI and Information Integration, in conjunction with the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[36] N. Kushmerick. Gleaning the Web. IEEE Intelligent Systems, 14(2), March/April 1999.

[37] S. Lawrence, C. L. Giles. Searching the World Wide Web. Science, Vol. 280, pp. 98-100, April 1998.

[38] A. Y. Levy, A. Rajaraman, J. J. Ordille. Querying Heterogeneous Information Sources Using Source Descriptions. Proceedings 22nd VLDB Conference, September 1996.

[39] S. Muggleton, C. Feng. Efficient Induction of Logic Programs. Proceedings of the First Conference on Algorithmic Learning Theory, 1990.

[40] Extraction Patterns: From Information Extraction to Wrapper Induction. Information Sciences Institute, 1998.

[41] Extraction Patterns for Information Extraction Tasks: A Survey. Workshop on Machine Learning for Information Extraction, July 1999.

[42] Muslea, S. Minton, C. Knoblock. STALKER: Learning Extraction Rules for Semistructured, Web-based Information Sources. Workshop on AI and Information Integration, in conjunction with the 15th National Conference on Artificial Intelligence (AAAI-98), July 1998.

[43] Muslea, S. Minton, C. Knoblock. Wrapper Induction for Semistructured Web-based Information Sources. Proceedings of the Conference on Automatic Learning and Discovery CONALD-98, June 1998.

[44] Muslea, S. Minton, C. Knoblock. A Hierarchical Approach to Wrapper Induction. Third International Conference on Autonomous Agents (Agents'99), Seattle, May 1999.

[45] S. Nestorov, S. Abiteboul, R. Motwani. Inferring Structure in Semistructured Data. Proceedings of the 13th International Conference on Data Engineering (ICDE'97), April 1997.

[46] STS Prasad, A. Rajaraman. Virtual Database Technology, XML, and the Evolution of the Web. Data Engineering, Vol. 21, No. 2, June 1998.

[47] J. R. Quinlan, R. M. Cameron-Jones. FOIL: A Midterm Report. European Conference on Machine Learning, 1993.

[48] A. Rajaraman. Transforming the Internet into a Database. Workshop on Reuse of Web information, in conjunction with WWW7, Brisbane, April 1998.

[49] A. Sahuguet, F. Azavant. WysiWyg Web Wrapper Factory (W http://cheops.cis.upenn.edu/ sahuguet/WAPI/wapi.ps.gz, nia, August 1998.

[50] D. Smith, M. Lopez. Information Extraction for Semistructured Documents. Proceedings of the Workshop on Management of Semistructured Data, in conjunction with PODS/SIGMOD, May 1997.

[51] S. Soderland. Learning to Extract Text-based Information from the World Wide Web. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD), August 1997.

[52] S. Soderland. Learning Information Extraction Rules for Semistructured and Free Text. Machine Learning, 1999.

[53] K. Zechner. A Literature Survey on Information Extraction and Text Summarization. Term paper, 1997.

[54] About mySimon. http://www.mysimon.com/about mysimon/company/backgrounder.anml
