Systematic review of Query Expansion in Persian Language

Document Type: Research Article

Authors

1 University of Tehran

2 University of Tabriz

3 Iranian Research Institute for Information Science and Technology (IranDoc)

Abstract

Introduction: Query expansion is considered an appropriate solution to the problem of short and ambiguous user queries. The purpose of this study is to conduct a systematic review of query expansion in the Persian language.
Methodology: This research uses a systematic review method based on Okoli and Schabram's guide. Searching scientific databases with related keywords yielded 35 works in Persian and 18 works in English. After primary refinement, application of the inclusion and exclusion criteria, and expert review, six Persian works and eight English works were selected for the systematic review. A checklist was designed and the required information was extracted from the works. Finally, the findings were processed to achieve the four goals of the study: identifying methods, knowledge sources, test collections, and research gaps.
Findings: The systematic review showed that 14 works deal with query expansion in the Persian language. These works fall into four categories based on the knowledge sources used for term expansion: relevance-based (eight works), knowledge-based (two works), web-based (two works), and combined resources (two works). Most of these studies were conducted on news test collections, and the Hamshahri newspaper corpus was used in almost half of the studies both as a knowledge source for term expansion and as a test collection.
Conclusion: There is much room for research on query expansion in the Persian language. Various knowledge sources, especially web-based ontologies and resources, should be considered and used for query expansion in Persian. In addition, using a standard test collection would allow researchers to compare the different methods.

Keywords


Khaleghi, M., & Minaei, B. (1394 SH). A language-independent framework for query expansion. Third International Conference on Applied Research in Computer Engineering and Information Technology. [In Persian]
Khosravi, A., Fattahi, R., Parirokh, M., & Dayani, M. H. (1392 SH). An investigation of the effectiveness of keywords and phrases suggested by the Google search engine in search expansion and increasing relevance from the viewpoint of graduate students. Theoretical and Applied Research in Information Science and Knowledge Studies, 3(1), 133-150. [In Persian]
Dianat, R., Aliahmadi, M., Akhlaghi, M. Y., & Babaali, B. (1395 SH). A new information retrieval method suitable for texts produced by speech recognition. Signal and Data Processing, 4(3), 93-108. [In Persian]
Saedi, S. (1390 SH). Query expansion in Persian search engines. Master's thesis in Information Technology Engineering (Computer Networks). Yazd University, Faculty of Electrical and Computer Engineering. [In Persian]
Shabanzadeh Habibabadi, M. (1389 SH). Semantic query expansion. Master's thesis, University of Isfahan, Faculty of Engineering. [In Persian]
Abdolhosseini, Z. (1392 SH). User query expansion using methods for inferring semantic relations in textual databases. Master's thesis, Faculty of Engineering, Alzahra University. [In Persian]
Karisani, P. (1390 SH). Expansion of Persian queries in search engines. Master's thesis in Computer Engineering (Software). University of Tehran, Faculty of Electrical and Computer Engineering. [In Persian]
Malboosbaf, R., & Azizi, F. (1389 SH). What is a systematic review and how is it written? Research in Medicine, 34(3), 203-207. [In Persian]
Abdelmgeid Amin, A. (2008). Using a query expansion technique to improve document retrieval. Information Technologies and Knowledge, 7(2), 343-345.
Agichtein, E., & Cucerzan, S. (2005). Predicting Extraction Performance Using Context Language Models. In the SIGIR 2005 Workshop on Methodologies and Evaluation of Lexical Cohesion Techniques in Real-World Applications.
AleAhmad, A., Amiri, H., Darrudi, E., Rahgozar, M., & Oroumchian, F. (2009). Hamshahri: A standard Persian text collection. Knowledge-Based Systems, 22(5), 382-387.
AleAhmad, A., Hakimian, P., Mahdikhani, F., & Oroumchian, F. (2007, February). N-gram and local context analysis for Persian text retrieval. In Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on (pp. 1-4). IEEE.
Atwan, J., & Mohd, M. (2017). Arabic Query Expansion: A Review. Asian Journal of Information Technology, 16(10), 754-770.
Azad, H. K., & Deepak, A. (2019). Query expansion techniques for information retrieval: a survey. Information Processing & Management, 56(5), 1698-1735.
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Addison-Wesley Harlow, England.
Bendersky, M., & Croft, W. B. (2008, July). Discovering key concepts in verbose queries. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 491-498). ACM.
Bhogal, J., MacFarlane, A., & Smith, P. (2007). A review of ontology based query expansion. Information Processing & Management, 43(4), 866-886.
Dolamic, L., & Savoy, J. (2009, September). Ad hoc retrieval with the Persian language. In Workshop of the Cross-Language Evaluation Forum for European Languages (pp. 102-109). Springer, Berlin, Heidelberg.
Efthimiadis, E. N. (1996). Query Expansion. Annual review of information science and technology (ARIST), 31, 121-87.
Farhoodi, M., Mahmoudi, M., Bidoki, A. Z., Yari, A., & Azadnia, M. (2009). Query expansion using Persian ontology derived from Wikipedia. World Applied Sciences Journal, 7(4), 410-417.
Hakimian, P., & Taghiyareh, F. (2007, December). Tuning Local Context Analysis for Farsi Documents. In Semantic Media Adaptation and Personalization, Second International Workshop on (pp. 116-121). IEEE.
Hakimian, P., & Taghiyareh, F. (2008, December). Customizing local context analysis for Farsi information retrieval by using a new concept weighting algorithm. In 2008 Third International Workshop on Semantic Media Adaptation and Personalization (pp. 45-51). IEEE.
Harman, D., & Buckley, C. (2004, July). The NRRC reliable information access (RIA) workshop. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 528-529). ACM.
Hashemi, H. B., & Shakery, A. (2014). Mining a Persian–English comparable corpus for cross-language information retrieval. Information Processing & Management, 50(2), 384-398.
Karisani, P., Rahgozar, M., & Oroumchian, F. (2016). A query term re-weighting approach using document similarity. Information Processing & Management, 52(3), 478-489.
Lavrenko, V., & Croft, W. B. (2017, August). Relevance-based language models. In ACM SIGIR Forum (Vol. 51, No. 2, pp. 260-267). ACM.
Lee, K. S., Croft, W. B., & Allan, J. (2008, July). A cluster-based resampling method for pseudo-relevance feedback. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval (pp. 235-242). ACM.
Mehdi, M., Okoli, C., Mesgari, M., Nielsen, F. Å., & Lanamäki, A. (2017). Excavating the mother lode of human-generated text: A systematic review of research that uses the Wikipedia corpus. Information Processing & Management, 53(2), 505-529.
Okoli, C., & Schabram, K. (2010). A guide to conducting a systematic literature review of information systems research. Sprouts, 10-26.
Phan, N., Bailey, P., & Wilkinson, R. (2007, July). Understanding the relationship of information need specificity to search query length. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval (pp. 709-710). ACM.
Robertson, S. E., & Jones, K. S. (1976). Relevance weighting of search terms. Journal of the American Society for Information Science, 27(3), 129-146.
Rocchio, J. J. (1971). Relevance feedback in information retrieval. The SMART retrieval system: experiments in automatic document processing, 313-323.
Saboori, F., Bashiri, H., & Oroumchian, F. (2012). Assessment of query reweighing, by Rocchio method in Farsi information retrieval. International Journal of Information Science and Management (IJISM), 6(1), 9-16.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
Shamsfard, M., Hesabi, A., Fadaei, H., Mansoory, N., Famian, A., Bagherbeigi, S., ... & Assi, S. M. (2010). Semi automatic development of FarsNet; the Persian WordNet. In Proceedings of 5th global WordNet conference, Mumbai, India (Vol. 29).
Spink, A., Wolfram, D., Jansen, M. B. J., & Saracevic, T. (2001). Searching the web: The public and their queries. Journal of the American Society for Information Science and Technology, 52(3), 226-234.
Wollersheim, D. (2005). Dynamic query expansion for information retrieval of imprecise medical queries. La Trobe University.
Zhang, H. (2013). Query enhancement with topic detection and disambiguation for robust retrieval. Indiana University.