Throughput Prediction based on ExtraTree for Stream Processing Tasks

Zheng Chu1, Jiong Yu1 and Askar Hamdulla1

  1. School of Information Science and Engineering, Xinjiang University
    Urumqi 830046, PR China
    chuzheng@stu.xju.edu.cn,{yujiong,askar}@xju.edu.cn

Abstract

In the era of big data, the volume of streaming data keeps growing, and stream processing tasks (SPTs) face serious challenges in real-time scenarios that demand low latency and high throughput. However, much of the current literature on SPT performance focuses on reactive approaches, which cannot reliably prevent system crashes caused by inherent performance volatility. This paper presents a novel ExtraTree-based throughput prediction method for SPTs to address these challenges. First, the performance volatility of SPTs is studied and a volatility detection algorithm is proposed to obtain reasonable metric values. Second, a regression function selection algorithm is proposed to output the performance values of SPTs in a relatively steady state. Third, an ExtraTree-based algorithm is proposed to predict the throughput of SPTs. Experimental results from two open-source benchmarks running on Apache Flink, a popular stream processing system (SPS), indicate that the proposed method achieves an average accuracy of 90.535% and an average efficiency of 0.835 s per 10,000 samples, demonstrating its effectiveness for predicting the throughput of SPTs.
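
To illustrate the kind of model the abstract refers to, the following is a minimal sketch of an extremely randomized trees regressor, assuming that "ExtraTree" denotes that technique. It is not the authors' implementation: the feature names (input rate, parallelism), the synthetic throughput values, and all function names are hypothetical and chosen only for illustration. Each tree is grown on the full sample, every feature is tried with one cut-point drawn uniformly at random, and the ensemble prediction is the average over trees.

```python
import random
from statistics import mean


def _sse(values):
    """Sum of squared errors around the mean (the split-quality score)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, rng, min_samples=2):
    """Grow one extremely randomized regression tree.

    For each feature a single cut-point is drawn uniformly at random; the
    feature whose random cut gives the lowest squared error is kept.
    """
    if len(y) < min_samples or min(y) == max(y):
        return mean(y)  # leaf: predict the mean target
    best = None
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        lo, hi = min(column), max(column)
        if lo == hi:
            continue  # constant feature, cannot split on it
        t = rng.uniform(lo, hi)  # random threshold, not an optimized one
        left = [i for i, v in enumerate(column) if v < t]
        right = [i for i, v in enumerate(column) if v >= t]
        if not left or not right:
            continue
        score = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
        if best is None or score < best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return mean(y)  # no usable split was drawn
    _, f, t, left, right = best
    return (
        f,
        t,
        build_tree([X[i] for i in left], [y[i] for i in left], rng, min_samples),
        build_tree([X[i] for i in right], [y[i] for i in right], rng, min_samples),
    )


def predict_tree(node, row):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are numbers."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node


def fit_forest(X, y, n_trees=25, seed=0):
    """Train every tree on the whole sample (no bootstrap resampling)."""
    rng = random.Random(seed)
    return [build_tree(X, y, rng) for _ in range(n_trees)]


def predict(forest, row):
    """Average the per-tree predictions."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Synthetic, made-up data: features are (input rate, parallelism) and the
# "throughput" saturates at an assumed 1500 records/s per parallel instance.
X = [(rate, p) for p in range(1, 5) for rate in range(1000, 7000, 1000)]
y = [min(rate, 1500 * p) for rate, p in X]

forest = fit_forest(X, y)
print(predict(forest, (3500, 2)))  # estimate for an unseen configuration
```

In practice a library implementation such as scikit-learn's `ExtraTreesRegressor` would normally be preferred; the hand-written version above only makes the random-threshold idea explicit.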

Key words

streaming data, stream processing tasks, performance prediction, ensemble learning, ExtraTree

Digital Object Identifier (DOI)

https://doi.org/10.2298/CSIS200131031C

Publication information

Volume 18, Issue 1 (January 2021)
Year of Publication: 2021
ISSN: 2406-1018 (Online)
Publisher: ComSIS Consortium

Full text

Available in PDF (Portable Document Format)

How to cite

Chu, Z., Yu, J., Hamdulla, A.: Throughput Prediction based on ExtraTree for Stream Processing Tasks. Computer Science and Information Systems, Vol. 18, No. 1, 1–22. (2021), https://doi.org/10.2298/CSIS200131031C