SemLinker: automating big data integration for casual users


  • File type: Book
  • Language: English
  • Publisher: Springer
  • Year / country of publication: 2018

Details

Related disciplines: Computer Science, Information Technology
Related specializations: Data Mining
Journal: Journal of Big Data
Institution: School of Computer Science and Informatics, Cardiff University, UK

Published by Springer
English keywords: Data integration, Big data, Data lake, Modeling, Schema evolution, Schema mapping, Metadata management

Description

Introduction

Big data is growing rapidly from an increasing plurality of sources, ranging from machine-generated content such as purchase transactions and sensor streams, to human-generated content such as social media and product reviews. Although much of these data are accessible online, their integration is inherently a complex task and, in most cases, is not performed fully automatically but through manual interactions [1, 2]. Typically, data must go through a process called ETL (Extract, Transform, Load) [3], where they are extracted from their sources, cleaned, transformed, and mapped to a common data model before they are loaded into a central repository, integrated with other data, and made available for analysis.

Recently the concept of a data lake [4], a flat repository framework that holds a vast amount of raw data in their native formats, including structured, semi-structured, and unstructured data, has emerged in the data management field. Compared with the monolithic view of a single data model emphasized by the ETL process, a data lake is a more dynamic environment that relaxes data capturing constraints and defers data modeling and integration requirements to a later stage in the data lifecycle. This results in an almost unlimited potential for ingesting and storing various types of data, regardless of their sources and frequently changing schemas, which are often not known in advance [5].

In one of our earlier papers [6], we propose the personal data lake (PDL), an exemplar of this flexible and agile storage solution. PDL ingests raw personal data scattered across a multitude of remote data sources and stores them in a unified repository regardless of their formats and structures. Although a data lake like PDL, to some extent, contributes towards solving the big data variety challenge, data integration remains an open problem.
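The extract–clean–transform–load pipeline described above can be sketched minimally as follows. This is an illustrative toy, not code from the paper: the source data, function names, and the dictionary-based "common model" are all hypothetical.

```python
# Hypothetical sketch of the ETL pattern: extract raw records from a
# source, transform them to a common model, load them into a central
# repository. All names and data are illustrative.

def extract(source):
    """Pull raw records from a data source (here, an in-memory list)."""
    return list(source)

def transform(records):
    """Clean each record and map it to a common model:
    lowercase the keys, strip whitespace from string values."""
    return [
        {k.lower(): (v.strip() if isinstance(v, str) else v)
         for k, v in rec.items()}
        for rec in records
    ]

def load(records, repository):
    """Append the transformed records to the central repository."""
    repository.extend(records)
    return repository

warehouse = []
raw_source = [{"Name": "  sensor-7 ", "Reading": 21.5}]
load(transform(extract(raw_source)), warehouse)
print(warehouse)  # [{'name': 'sensor-7', 'reading': 21.5}]
```

The point of the sketch is the fixed order of stages: every record is forced into the common model *before* it reaches the repository, which is exactly the constraint the data-lake approach relaxes.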
PDL allows its users to ingest raw data instances directly from the data sources, but the data extraction and integration workflow, without predefined schemas or machine-readable semantics to describe the data, is not straightforward. Often the user has to study the documentation of each data source to enable suitable integration [7]. An enterprise data lake system built with Hadoop [8] would rely on professionals and experts playing active roles in the data integration workflow. PDL, however, is designed for ordinary people, and has no highly trained and skilled IT personnel to physically manage its contents. To this end, equipping PDL with an efficient and easy-to-use data integration solution is essential: it allows casual users to process, query, and analyze their data, and to gain insights that support their decision-making [9].
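The deferred-modeling ("schema-on-read") behavior that distinguishes a data lake such as PDL from an ETL warehouse can be illustrated with a small sketch. This is not PDL's actual implementation; the class, its methods, and the sample payloads are assumptions for illustration only.

```python
import json

# Hypothetical schema-on-read store: payloads are kept in their native
# form at ingest time, tagged only with lightweight metadata (source,
# format). Parsing and modeling are deferred until query time.

class MiniLake:
    def __init__(self):
        self._store = []  # raw payloads plus metadata, no fixed schema

    def ingest(self, payload, source, fmt):
        """Store the payload as-is, recording where it came from and its format."""
        self._store.append({"source": source, "format": fmt, "raw": payload})

    def query_json(self, source):
        """Schema-on-read: interpret JSON payloads only when queried."""
        for item in self._store:
            if item["source"] == source and item["format"] == "json":
                yield json.loads(item["raw"])

lake = MiniLake()
lake.ingest('{"steps": 8042}', source="fitness-tracker", fmt="json")
lake.ingest("<export>...</export>", source="email-archive", fmt="xml")
print(list(lake.query_json("fitness-tracker")))  # [{'steps': 8042}]
```

Note that the XML payload is accepted at ingest time even though no reader for it exists yet; that asymmetry between easy capture and deferred interpretation is precisely why integration remains the open problem the paper addresses.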
