Assessing Big Data SQL Frameworks for Analyzing Event Logs

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Scientific, peer-review

Standard

Assessing Big Data SQL Frameworks for Analyzing Event Logs. / Hinkka, Markku; Lehto, Teemu; Heljanko, Keijo.

Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016. Institute of Electrical and Electronics Engineers, 2016. p. 101-108, 7445319.

Harvard

Hinkka, M, Lehto, T & Heljanko, K 2016, Assessing Big Data SQL Frameworks for Analyzing Event Logs. in Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016, 7445319, Institute of Electrical and Electronics Engineers, pp. 101-108, Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, Heraklion, Greece, 17/02/2016. https://doi.org/10.1109/PDP.2016.26

APA

Hinkka, M., Lehto, T., & Heljanko, K. (2016). Assessing Big Data SQL Frameworks for Analyzing Event Logs. In Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016 (pp. 101-108). [7445319] Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/PDP.2016.26

Vancouver

Hinkka M, Lehto T, Heljanko K. Assessing Big Data SQL Frameworks for Analyzing Event Logs. In Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016. Institute of Electrical and Electronics Engineers. 2016. p. 101-108. 7445319 https://doi.org/10.1109/PDP.2016.26

Author

Hinkka, Markku ; Lehto, Teemu ; Heljanko, Keijo. / Assessing Big Data SQL Frameworks for Analyzing Event Logs. Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016. Institute of Electrical and Electronics Engineers, 2016. pp. 101-108

BibTeX

@inproceedings{38232fe92fcc41afbb679e66243c12f8,
title = "Assessing Big Data SQL Frameworks for Analyzing Event Logs",
abstract = "Performing Process Mining by analyzing event logs generated by various systems is a very computation and I/O intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task in a single computer. This paper assesses whether contemporary structured query language (SQL) supporting Big Data processing frameworks are mature enough to be efficiently used to distribute computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed by using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made on how well selected Big Data processing frameworks manage to process and to parallelize the analysis work required by Process Mining tasks.",
keywords = "automatic business process discovery, distributed computing framework, distributed SQL, event log analysis, Hadoop, Hive, Presto, process mining, Spark",
author = "Markku Hinkka and Teemu Lehto and Keijo Heljanko",
year = "2016",
month = "3",
day = "31",
doi = "10.1109/PDP.2016.26",
language = "English",
isbn = "9781467387750",
pages = "101--108",
booktitle = "Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016",
publisher = "Institute of Electrical and Electronics Engineers",
address = "United States",

}

RIS

TY  - GEN
T1  - Assessing Big Data SQL Frameworks for Analyzing Event Logs
AU  - Hinkka, Markku
AU  - Lehto, Teemu
AU  - Heljanko, Keijo
PY  - 2016/3/31
Y1  - 2016/3/31
N2  - Performing Process Mining by analyzing event logs generated by various systems is a very computation and I/O intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task in a single computer. This paper assesses whether contemporary structured query language (SQL) supporting Big Data processing frameworks are mature enough to be efficiently used to distribute computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed by using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made on how well selected Big Data processing frameworks manage to process and to parallelize the analysis work required by Process Mining tasks.
AB  - Performing Process Mining by analyzing event logs generated by various systems is a very computation and I/O intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task in a single computer. This paper assesses whether contemporary structured query language (SQL) supporting Big Data processing frameworks are mature enough to be efficiently used to distribute computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed by using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made on how well selected Big Data processing frameworks manage to process and to parallelize the analysis work required by Process Mining tasks.
KW  - automatic business process discovery
KW  - distributed computing framework
KW  - distributed SQL
KW  - event log analysis
KW  - Hadoop
KW  - Hive
KW  - Presto
KW  - process mining
KW  - Spark
UR  - http://www.scopus.com/inward/record.url?scp=84968928128&partnerID=8YFLogxK
U2  - 10.1109/PDP.2016.26
DO  - 10.1109/PDP.2016.26
M3  - Conference contribution
SN  - 9781467387750
SP  - 101
EP  - 108
BT  - Proceedings - 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing, PDP 2016
PB  - Institute of Electrical and Electronics Engineers
ER  -

ID: 2548969