Abstract
Today we seldom suffer from a lack of information; on the contrary, we often suffer from too much of it. As a consequence, important information might go unnoticed, which is harmful to individuals, companies, and the economy as a whole. To alleviate this situation, this dissertation develops tools for analyzing financial news. It consists of an introductory part and six research essays, which cover three aspects of the topic. The first two essays cover data mining and document filtering. Essay 1 presents the Wiki-SR method. The approach uses Wikipedia to calculate the relatedness between two concepts, which enhances search queries by implicitly expanding them. The essay also introduces a framework that allows for multiple models in order to improve document modeling. Essay 2 presents a modified Wilks' lambda technique for finding the concepts that best describe a specific document. Although the proposed approach is lightweight, it remains very efficient. The second group of essays focuses on sentiment analysis. Essay 3 presents an approach that parses sentences and detects any words that might change the polarity of a sentiment-bearing word. The approach significantly improves the accuracy of the analysis, a result verified with our manually annotated sentiment corpus. Essay 4 publishes a more advanced sentiment corpus. This new dual-layer corpus is annotated on both the document and the sentence level. Because it also allows multiple sentiment-bearing entities in the same sentence, it supports the development of more advanced techniques. Both corpora are publicly available, and they help alleviate the current lack of evaluation sets for methods in the financial domain.
The last two essays put this research in context. Essay 5 studies the research done in the field of sentiment analysis over the last decade. When the keywords given by authors and publishers are compared and the wording of titles and abstracts is analyzed, four distinct areas of interest emerge. Two of them relate to techniques used for sentiment analysis (sentiment classification and sentiment lexicons), and two are common domains of the analysis (reviews and social media). Essay 6 describes the steps needed for a computational approach to financial news analysis, as well as commonly used tools and resources.
Translated title of the contribution | Semantic Content Filtering and Sentiment Analysis for Financial News |
---|---|
Original language | English |
Qualification | Doctor's degree |
Awarding Institution | |
Supervisors/Advisors | |
Publisher | |
Print ISBNs | 978-952-60-7097-1 |
Electronic ISBNs | 978-952-60-7096-4 |
Publication status | Published - 2016 |
MoE publication type | G5 Doctoral dissertation (article) |
Keywords
- data mining
- document filtering
- text analysis
- sentiment detection
- sentiment corpora