The advent of digitalization has brought a massive proliferation of unstructured data, producing vast repositories of textual data from sources such as Web sites, academic publications, news articles, blog posts, e-mail, corporate communication platforms, reports, and social media feeds. This proliferation, coupled with the upsurge in mobile and Web technologies and ever-improving connectivity, has led to various digital platforms and applications rapidly achieving mass-market penetration. With the production of textual and other forms of unstructured data certain to continue at unprecedented rates for the foreseeable future, this availability on a massive scale presents both opportunities and challenges that researchers and practitioners must address. The ability to utilize text data on a large scale not only provides better coverage in terms of sample size but also opens opportunities to build a deeper understanding of phenomena that otherwise are simply unobservable, "hidden in the noise." However, as the world races towards high-volume production, distribution, and consumption of digital text, information systems (IS) researchers are proving slow to start reaping the potential of analyzing textual data. There is an urgent need for methods and techniques that can meet the challenge of analyzing vast bodies of textual data. In an effort to demonstrate the potential application of text-mining methods in information systems research, the dissertation presents essays that address the use of large-scale text-based datasets in literature analysis and in studies of system-specific behavioral outcomes. The first essay deals with identifying the research themes presented in a large body of publications on cloud computing, and the second essay demonstrates the machine-based classification of papers in leading information-systems journals.
Of the behavior-focused pieces, the third essay utilizes user-generated content to illustrate system-driven viewing outcomes in the context of binge watching of television shows, and the final essay examines a large volume of content connected with a business-to-business Web portal, reporting on a study of browsing-device-linked differences in interest in marketing material. In addition to the individual essays, the dissertation contributes to the scholarly discussion of text-mining research issues in three important ways. Firstly, it presents a conceptual framework that aids in revealing the fundamentals of text-mining research in terms of two dimensions: research objective and level of text analysis. Secondly, the four essays provide concrete demonstrations of various suitable applications of text mining. Finally, the dissertation examines the implications of the work, highlighting specific issues and challenges pertaining to text-mining research. The findings and implications of this work should benefit IS researchers and practitioners striving to exploit large volumes of textual data.

| Field | Value |
|---|---|
| Publication status | Published - 2019 |
| MoE publication type | G5 Doctoral dissertation (article) |

- text mining, information systems, systematic review, topic models, text classification, word embedding, system-driven behaviour, social media