A Self-Aggregated Hierarchical Topic Model for Short Texts


Yue Niu and Hongjie Zhang, University of Science and Technology of China, China


With the growth of the internet, short texts such as tweets from Twitter, news titles from the RSS, or comments from Amazon have become very prevalent. Many tasks need to retrieve information hidden from the content of short texts. So ontology learning methods are proposed for retrieving structured information. Topic hierarchy is a typical ontology that consists of concepts and taxonomy relations between concepts. Current hierarchical topic models are not specially designed for short texts. These methods use word co-occurrence to construct concepts and general-special word relations to construct taxonomy topics. But in short texts, word cooccurrence is sparse and lacking general-special word relations. To overcome this two problems and provide an interpretable result, we designed a hierarchical topic model which aggregates short texts into long documents and constructing topics and relations. Because long documents add additional semantic information, our model can avoid the sparsity of word cooccurrence. In experiments, we measured the quality of concepts by topic coherence metric on four real-world short texts corpus. The result showed that our topic hierarchy is more interpretable than other methods.


Hierarchical Topic Model, Texts Analysis, Short Texts, Data Mining.

Full Text  Volume 11, Number 12