Implementation of Experts' Retrieval Model Using Latent Semantic Indexing (LSA) Method and Temporal Graph

Document Type : Research Article

Authors

1 PhD graduate of Information Science and Knowledge Studies, University of Tehran, Tehran, Iran

2 Associate Professor of Information Science and Knowledge Studies, University of Tehran, Tehran, Iran

3 PhD in IT Management, Sahmeto Ltd., Tehran, Iran

Abstract

Introduction: Expert retrieval is a subfield of information retrieval that aims to rank people by their knowledge of a particular field. Automating expert finding is challenging because of the abundance of information and data sources about experts. Many approaches have been proposed in both industry and academia, drawing on techniques from information retrieval, data mining, knowledge discovery, statistical modeling, probabilistic modeling, and complex networks. These important studies estimate the relationship between a query and the supporting documents of an expert candidate from the occurrence of query words in those documents. Such models cannot capture semantic relationships. This research therefore adopts a document-oriented method that combines the LSA retrieval model with a temporal graph.
Methodology: The research is experimental and also draws on survey and library methods. Articles were retrieved with Latent Semantic Analysis (LSA) from a test collection built from Web of Science. The collection consists of English-language articles published from 1989 to 2018 and indexed on that website under the category of information science and librarianship. It contained 126,924 articles, and queries made by users were run against all of them. Participants in the study judged the relevance of the retrieved documents, and the performance of the retrieval model was then measured with the evaluation measures used for information retrieval systems. The value of each measure was compared with the corresponding value for the base model. A temporal graph was used to incorporate the time factor. Authors with the most relevant works and the highest values on the micro-level social network indicators were then introduced as experts. Finally, ten queries were randomly selected, and their results from the proposed model and from the base model were given to eight people from the second research population for judgment; the results were compared.
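The evaluation step above can be illustrated with three measures commonly used for ranked retrieval: precision at k, average precision, and reciprocal rank. This is a minimal sketch, not the study's evaluation code; the function names, the document identifiers, and the toy relevance judgments are all hypothetical.

```python
# Minimal sketch of ranked-retrieval measures (illustrative names and data only).

def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k retrieved documents that were judged relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant document appears,
    normalized by the total number of relevant documents."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean(values):
    """Average over queries; turns per-query scores into MAP or MRR."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Hypothetical single query: a ranked result list and the judged-relevant set.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d5"}
p5 = precision_at_k(ranked, relevant, k=5)
ap = average_precision(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
```

Averaging `average_precision` and `reciprocal_rank` over all queries with `mean` gives MAP and MRR for a model, which can then be compared with the same values for a base model.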
Findings: The first innovation of this research was applying the latent semantic analysis (LSA) retrieval model to the retrieval of expert authors. On the information retrieval metrics, namely precision at the first five results (P@5), mean average precision (MAP), and mean reciprocal rank (MRR), the LSA model obtained values of 0.895, 0.839, and 0.909, respectively, and performed better than the base model. This reflects the better performance of retrieval based on dimensionality reduction compared with keyword matching. The method uses latent semantic indexing (LSI), a form of conceptual indexing in which the index is extracted by applying the statistical method of least squares. The same concept can be expressed with many different words (synonymy), so the words of a query may not match the words of a relevant document. In addition, most words have multiple meanings (polysemy), so retrieving information based on the concept and meaning of a document is a better approach. LSI assumes that word usage has a latent structure that is partially obscured by variability in word choice, and it uses singular value decomposition (SVD) to estimate this structure. The resulting vectors are statistically stronger indicators of meaning than individual words. Other studies likewise indicate that retrieving documents by matching query keywords against documents is a relatively weaker method. The LSA retrieval model also performs better on a large document collection than on a small one.
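The dimensionality reduction described above can be written compactly in standard LSI notation. This is a sketch under assumptions: the symbols $A$, $k$, $q$, and $d_j$ and the query fold-in rule follow the general LSI literature and are not necessarily the exact formulation used in this study.

```latex
% A: m x n term-document matrix (m terms, n documents); notation assumed for illustration.
% The rank-k truncation of the SVD is the best rank-k approximation in the least-squares sense.
A = U \Sigma V^{\mathsf{T}}, \qquad
A_k = U_k \Sigma_k V_k^{\mathsf{T}}
    = \arg\min_{\operatorname{rank}(B) \le k} \lVert A - B \rVert_F

% Fold a query term vector q (m x 1) into the k-dimensional latent space.
\hat{q} = \Sigma_k^{-1} U_k^{\mathsf{T}} q

% Score document j by cosine similarity with its latent vector d_j
% (the j-th row of V_k, written as a column vector).
\operatorname{sim}(q, d_j) = \frac{\hat{q}^{\mathsf{T}} d_j}{\lVert \hat{q} \rVert \, \lVert d_j \rVert}
```

Because the query and the documents are compared in the latent space rather than term by term, a document can score well even when it shares no literal keyword with the query.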
The second innovation of this research was incorporating the time factor into expert search. Together with the social network indicators and the final relevance judgments, the results showed that this method performed significantly better than the base model. The time factor was included so that people who are no longer alive, or whose last publication in a given field is long past, are not retrieved. Considering the useful life of publications in knowledge and information science, a ten-year period was used. After publication time, the number of relevant works an author had published was the next determining factor, followed by the micro-level social network indicators (degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality), which are widely used in scientometric research and, more recently, in expert retrieval research. The ten queries were sent to the eight people who made up the second statistical population of the study, and their judgments indicated that expert finding performed better with the temporal graph, the number of relevant published works, and the micro-level social network indicators.
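The ordering of factors described above (recency first, then number of relevant works, then network indicators) can be sketched as follows. This is a simplified illustration under assumptions, not the study's implementation: the function name `rank_experts`, the input format, and the toy data are hypothetical, only degree centrality on a co-authorship graph stands in for the four indicators, and a plain recency filter stands in for the temporal graph.

```python
from collections import defaultdict

def rank_experts(papers, current_year, window=10):
    """Rank authors of relevant papers (hypothetical input: dicts with 'authors' and 'year').

    1. Drop authors whose latest relevant paper is older than `window` years.
    2. Sort the rest by number of relevant papers (descending).
    3. Break ties by degree centrality in the co-authorship graph (descending).
    """
    last_year = defaultdict(int)     # author -> year of latest relevant paper
    n_relevant = defaultdict(int)    # author -> number of relevant papers
    coauthors = defaultdict(set)     # author -> set of distinct co-authors

    for paper in papers:
        for author in paper["authors"]:
            last_year[author] = max(last_year[author], paper["year"])
            n_relevant[author] += 1
            coauthors[author].update(a for a in paper["authors"] if a != author)

    n_nodes = len(last_year)

    def degree_centrality(author):
        # Degree divided by the maximum possible degree (n - 1).
        return len(coauthors[author]) / (n_nodes - 1) if n_nodes > 1 else 0.0

    active = [a for a in last_year if current_year - last_year[a] <= window]
    return sorted(active, key=lambda a: (-n_relevant[a], -degree_centrality(a), a))

# Hypothetical relevant papers for one query.
papers = [
    {"authors": ["A", "B"], "year": 2017},
    {"authors": ["A", "C"], "year": 2015},
    {"authors": ["B"], "year": 2016},
    {"authors": ["D"], "year": 2001},
]
experts = rank_experts(papers, current_year=2018)
```

In this toy example author D is excluded because the latest relevant paper falls outside the ten-year window, and A is ranked above B because both have two relevant papers but A has more co-authors.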
Conclusion: LSI assumes that word usage has a latent structure that is partially obscured by variability in word choice, and it uses SVD to estimate that structure; the resulting vectors are statistically stronger indicators of meaning than individual words. Other studies likewise indicate that retrieval by matching query keywords against documents is a relatively weaker method, and the LSA retrieval model performs better on a large document collection than on a small one. Incorporating the time factor, with a ten-year period reflecting the useful life of publications in knowledge and information science, kept people who are no longer alive, or whose last publication in a field is long past, from being retrieved. After publication time, the determining factors were the number of relevant works and the micro-level social network indicators (degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality), which are widely used in scientometric research and, more recently, in expert retrieval research. Overall, the LSA model performed better than the base model in retrieving relevant documents, and the temporal graph performed better than the base model in expert finding.
 



 