An experimental survey on big data frameworks
- File type: Book
- Language: English
- Publisher: Elsevier
- Edition and year / country: 2018
Details
Related fields: Information Technology Engineering and Management
Related specializations: Information Systems Management, Advanced Information Systems
Journal: Future Generation Computer Systems
University: University of Tunis El Manar – Faculty of Sciences of Tunis – Tunisia
Published in an Elsevier journal
Keywords: Big data, MapReduce, Hadoop, HDFS, Spark, Flink, Storm, Samza, Batch/stream processing
Description
1. Introduction
In recent decades, increasingly large amounts of data have been generated from a variety of sources. The volume of data generated per day on the Internet has already exceeded two exabytes [1]. Within one minute, 72 hours of video are uploaded to YouTube, around 30,000 new posts are created on the Tumblr blog platform, more than 100,000 Tweets are shared on Twitter and more than 200,000 pictures are posted on Facebook [1].
Big Data problems lead to several research questions such as (1) how to design scalable environments, (2) how to provide fault tolerance and (3) how to design efficient solutions. Most existing tools for storage, processing and analysis of data are inadequate for massive volumes of heterogeneous data. Consequently, there is an urgent need for more advanced and adequate Big Data solutions.
Many definitions of Big Data have been proposed throughout the literature. Most of them agree that Big Data problems share four main characteristics, referred to as the four V's (Volume, Variety, Veracity and Velocity) [2]. Volume refers to the size of available datasets, which typically require distributed storage and processing. Variety refers to the fact that Big Data is composed of several different types of data such as text, sound, image and video. Veracity refers to the biases, noise and abnormality in data. Velocity refers to the pace at which data flows in from various sources such as social networks, mobile devices and the Internet of Things (IoT).
In this paper, we first give an overview of the most popular and widely used Big Data frameworks, which are designed to cope with the above-mentioned Big Data problems. We identify some key features which characterize Big Data frameworks. These key features include the programming model and the capability to allow for iterative processing of (streaming) data. We also give a categorization of existing frameworks according to the presented key features.
Then, we present an experimental study on Big Data processing systems with several representative batch, stream and iterative workloads.
Extensive surveys have been conducted to discuss Big Data frameworks [3,4,5]. However, our experimental survey differs from existing ones in that it considers the performance evaluation of popular Big Data frameworks from different aspects. In our work, we compare the studied frameworks in the case of both batch processing and stream processing, which is not studied in existing surveys. Our experimental study also concludes with some best practices related to the usage of the studied frameworks in several application domains. More specifically, the contributions of this paper are the following:
• We present an overview of the most popular Big Data frameworks and we categorize them according to some features.
• We experimentally evaluate the performance of the presented frameworks and we present a comparative study of them in the case of both batch processing and stream processing.
• We highlight best practices related to the use of popular Big Data frameworks in several application domains.
The remainder of the paper is organized as follows. In Section 2, we present existing surveys on Big Data frameworks and we highlight the motivation of our work. In Section 3, we discuss existing Big Data frameworks and provide a categorization of them. In Section 4, we present a comparative study of the presented Big Data frameworks and we discuss the obtained results. In Section 5, we present some best practices of the studied frameworks. Some concluding points are given in Section 6.