Topic Modeling and its Application in Research: A Review of Specialized Literature

Document Type: Review Article

Authors

1 Student, Knowledge and Information Science, University of Isfahan, Isfahan, Iran

2 Associate Professor, Knowledge and Information Science Department, University of Isfahan, Isfahan, Iran

3 Artificial Intelligence Department, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran

Abstract

Introduction: Topic modeling is a text mining technique for discovering unknown topics in a collection of documents, interpreting documents in terms of those topics, and using these interpretations to organize, summarize, and search texts automatically. This study aims to introduce the concept and technique of topic modeling and its application in discovering topics and organizing information.
Methodology: This study is an analytical review that introduces topic modeling, categorizes and reviews the applications of the technique according to its function, and presents sample studies that have used it.
Findings: Beyond their three main objectives (discovering hidden topics, interpreting documents based on those topics, and organizing and classifying texts), topic modeling algorithms are used for discovering hidden topics and relationships in scientific fields, information retrieval, categorizing documents by topic, discovering salient patterns and emerging events, clustering the concepts of scientific fields, analyzing conceptual evolution across historical periods, determining the hierarchical relationships among the concepts of a specific scientific field, and vocabulary enrichment.
Conclusion: Topic modeling, which draws on machine learning and artificial intelligence, has been proposed as a new approach to organizing information resources, and it is an active area of study. Using topic modeling algorithms to automate subject extraction and to discover latent topics in a source can therefore strengthen and update systems for organizing information resources.
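
Illustrative sketch: the three objectives named in the abstract (discovering topics, interpreting documents through them, and organizing texts) can be shown on a toy corpus. The code below is a minimal sketch, assuming latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling as the topic model; the review itself does not prescribe this algorithm or implementation. The function name `fit_lda`, its parameters, and the sample documents are hypothetical and chosen only for illustration.

```python
import random


def fit_lda(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Toy LDA via collapsed Gibbs sampling on tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning a random topic to every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Smoothed per-document topic mixtures (interpretation of documents).
    theta = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    # Most frequent words per topic (the discovered topics).
    top_words = [
        [vocab[i] for i in
         sorted(range(vocab_size), key=lambda i: -topic_word[k][i])[:3]]
        for k in range(num_topics)
    ]
    return theta, top_words


# Hypothetical toy corpus, already tokenized.
docs = [
    "library catalog index thesaurus catalog".split(),
    "thesaurus index library classification".split(),
    "gene protein cell gene expression".split(),
    "protein cell expression gene".split(),
]
theta, top_words = fit_lda(docs)

# Organize documents by their dominant topic.
dominant = [max(range(len(row)), key=row.__getitem__) for row in theta]
for k, words in enumerate(top_words):
    print("topic", k, words)
print("dominant topic per document:", dominant)
```

Each row of `theta` is a document's estimated mixture over topics, and grouping documents by their highest-weight topic is one simple way to organize a collection. A real study would normally use an established library and a far larger corpus.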
