Topic Modeling and its Application in Research: A Review of Specialized Literature

Document Type: Review Article


1 Student, Knowledge and Information Science, University of Isfahan, Isfahan, Iran

2 Associate Professor, Knowledge and Information Science Department, University of Isfahan, Isfahan, Iran

3 Artificial Intelligence Department, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran


Introduction: Topic modeling is a text mining technique that makes it possible to discover unknown topics in a collection of documents, interpret documents based on these topics, and use these interpretations to organize, summarize, and search texts automatically. This study aims to introduce the concept and technique of topic modeling and its application in discovering topics and organizing information.
Methodology: This study is an analytical review that introduces topic modeling, categorizes and reviews the applications of the technique according to the functions it performs, and presents sample studies that have used it.
Findings: Beyond their three main objectives of discovering hidden topics, interpreting documents based on those topics, and organizing and classifying texts, topic modeling algorithms are also used for discovering hidden topics and relationships in scientific fields, information retrieval, categorizing documents by topic, detecting salient patterns and emerging events, clustering the concepts of scientific fields, analyzing conceptual evolution across historical periods, determining the hierarchical relationships among the concepts of a specific scientific field, and vocabulary enrichment.
Conclusion: Topic modeling, which draws on machine learning and artificial intelligence, has been proposed as a new approach to organizing information resources, and serious studies are under way in this field. Using topic modeling algorithms to automate subject extraction and to discover the hidden topics in a resource can therefore strengthen and update modern systems for organizing information resources.

