Study on the Effect of Discarding XML Declaration and Changing the File Extension on Increasing the Indexability and Visibility of Metadata Records in the Web Search Engines Environment

Document Type : Research Paper

Authors

Islamic Azad University, Science and Research Branch

Abstract

Purpose: This research examined whether discarding the XML declaration and changing the file extension increases the indexability and visibility of the element tag names of metadata records based on DCXML, MARCXML, and MODS in the Web search engine environment.
Methodology: Two groups of metadata records were analyzed using an experimental approach: a control group of 300 XML-based records with the normal structure, and an experimental group of 300 XML-based records without the XML declaration and with a file extension named after the related metadata standard. An independent website was assigned to each group, and both websites were submitted to the Google and Yahoo search engines. Subsequently, element-based search strategies were used to examine the indexability and visibility of the metadata records published on those websites.
Findings: Google and Yahoo indexed all the element tag names of the metadata records in the experimental group and presented them in their search results. The element tag names in the control group's metadata records, however, were not indexed by the search engines.
Based on this study, the experimental group's metadata records can be retrieved in the search engines by their element tag names, whereas the control group's records are accessible only by the values of the elements. Finally, some patterns were suggested to metadata creators and search engine developers.
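
The record preparation described for the experimental group can be illustrated with a minimal Python sketch. It removes a leading XML declaration from a record and derives a file name whose extension is named after the metadata standard. The function names, the regular expression, and the extension mapping (.dcxml, .marcxml, .mods) are assumptions made for illustration only; the abstract does not state the exact extensions or tooling the authors used.

```python
import re
from pathlib import Path

# Matches an XML declaration at the start of a document, for example
# <?xml version="1.0" encoding="UTF-8"?>, plus any whitespace after it.
XML_DECLARATION = re.compile(r'^\s*<\?xml\s[^?]*\?>\s*')

# Hypothetical mapping from metadata standard to file extension.
# The study only says the extension follows the name of the standard.
STANDARD_EXTENSIONS = {
    "DCXML": ".dcxml",
    "MARCXML": ".marcxml",
    "MODS": ".mods",
}


def strip_xml_declaration(text: str) -> str:
    """Return the record without its leading XML declaration, if any."""
    return XML_DECLARATION.sub("", text, count=1)


def experimental_filename(original_name: str, standard: str) -> str:
    """Replace the original extension with one named after the standard."""
    return Path(original_name).with_suffix(STANDARD_EXTENSIONS[standard]).name


if __name__ == "__main__":
    record = '<?xml version="1.0" encoding="UTF-8"?>\n<mods><title>Example</title></mods>'
    print(experimental_filename("record001.xml", "MODS"))
    print(strip_xml_declaration(record))
```

A control-group record would be published unchanged, with its declaration and its .xml extension, so the two groups differ only in these two properties.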

Keywords

