Classification Models for PUP BS Chemistry Thesis Abstracts Using Latent Dirichlet Allocation (LDA) Topic Modelling

Authors

  • Chester C. Deocaris
  • Evelyn M. Matchete
  • Jan Bernel P. Padolina

Keywords:

chemistry, thesis, text mining, topic modeling, latent dirichlet allocation

Abstract

Chemistry is often called the ‘central science’ because it converges with other scientific disciplines such as biology, physics, and mathematics. Its several branches (e.g., Organic, Inorganic, Biochemistry, Analytical, and Physical Chemistry) add to the diversity of the field. Undergraduate Chemistry students study two or more of these branches as courses in their BS program. BS Chemistry students are also required to conduct research, in the form of a thesis, on a theme or topic of their choice within the different branches of Chemistry. In this paper, we propose a taxonomy of the thesis topics produced by BS Chemistry students of the Polytechnic University of the Philippines (PUP) from 2014 to 2017. A corpus of 74 thesis documents was examined. Using text mining methods and Latent Dirichlet Allocation (LDA) topic modeling, taxonomy models composed of 27 to 33 topics were generated. The taxon with the stemmed keywords “adsorpt, adsorb, isotherm, model, studi, kinet, and metal” was the most frequent thesis topic among PUP BS Chemistry students. This topic accounts for 9 (12%) to 12 (16%) of the thesis documents. A reading of the actual abstracts shows that these theses mostly studied the adsorption and/or absorption kinetics of various heavy metal ions. These concepts are usually covered in the General Chemistry 2, Inorganic Chemistry, and Physical Chemistry courses.
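As an illustration of the kind of workflow the abstract describes, the sketch below fits LDA to a tiny corpus with collapsed Gibbs sampling, assigns each document to its dominant topic, and reports the share of documents per topic. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy documents, the function names (fit_lda, dominant_topics, top_words), the hyperparameters, and the rule of counting a document under its single dominant topic are all hypothetical, and the tokens are assumed to be already cleaned and stemmed.

```python
import random
from collections import Counter


def fit_lda(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_vocab for _ in range(n_topics)]  # count of word v in topic k
    topic_total = [0] * n_topics                           # tokens assigned to topic k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


def dominant_topics(doc_topic):
    """Index of the topic holding the most tokens in each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def top_words(topic_word, vocab, k, n=3):
    """The n most frequent words assigned to topic k."""
    order = sorted(range(len(vocab)), key=lambda v: -topic_word[k][v])
    return [vocab[v] for v in order[:n]]


# Hypothetical toy corpus of pre-stemmed tokens (not the PUP thesis data).
toy_docs = [
    "adsorpt isotherm kinet metal adsorpt".split(),
    "metal adsorpt kinet isotherm model".split(),
    "synthesi polym character polym".split(),
    "polym synthesi film character".split(),
]

doc_topic, topic_word, vocab = fit_lda(toy_docs, n_topics=2)
dominant = dominant_topics(doc_topic)
docs_per_topic = Counter(dominant)

for k, count in sorted(docs_per_topic.items()):
    share = 100 * count / len(toy_docs)
    print(f"topic {k}: {top_words(topic_word, vocab, k)} "
          f"-> {count} docs ({share:.0f}%)")
```

Because the sampler is stochastic, the topic assignments depend on the seed and the number of topics, which is one plausible reason a document count for a given topic can vary across models with different topic numbers.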

Published

2018-11-18