سیری در آمارهای کلان داده / Journeys in big data statistics

سیری در آمارهای کلان داده Journeys in big data statistics

  • نوع فایل : کتاب
  • زبان : انگلیسی
  • ناشر : Elsevier
  • چاپ و سال / کشور: 2018

توضیحات

رشته های مرتبط آمار، مدیریت
گرایش های مرتبط مدیریت فناوری اطلاعات
مجله آمار و احتمال – Statistics & Probability Letters
دانشگاه School of Mathematical Sciences – University of Nottingham – UK

منتشر شده در نشریه الزویر
کلمات کلیدی انگلیسی Big data, Object-oriented data, Transport, Networks

Description

1. A new natural resource We will be the first to admit that it is difficult to keep up. How can you expect someone who is trained in dealing with datasets of n = 30 observations with p = 3 variables to suddenly cope with a 100 K-fold increase of n = 3 000 000 observations and p = 300 000 for example, or even worse? Everything has to change. Summarizing a dataset becomes a  major computational challenge and p-values take on a ludicrous role where everything is significant. Yet dealing with a  wide range of sizes of datasets has become vital for the modern statistician.  Virginia Rometty, chairman, president and chief executive officer of IBM said the following at Northwestern University’s 157th commencement ceremony in 2015:  What steam was to the 18th century, electricity to the 19th and hydrocarbons to the 20th, data will be to the 21st century.  That’s why I call data a new natural resource. The need to make sense of the huge rich seams of data being produced underlines the great importance of Statistics, Mathematical and Computational Sciences in today’s society. But what is ‘new’ about data? Data has been used for centuries, for example data collected on the first bloom of cherry blossoms in Kyoto, Japan starting in 800AD and now highlighting 13 climate change (Aono, 2017); Gauss’ meridian arc measurements in 1799 used to define the metre (Stigler, 1981); and Florence Nightingale’s 1859 mortality data and graphical rose diagram presentation on causes of death in the Crimean War leading to modern nursing practice (Nightingale, 1859). All of these old, small datasets are at the core of important issues for mankind, so it is not the data or its importance but the size, structure and ubiquity of data that is new. Many of the challenges in the new world of Statistics in the Age of Big Data are of a different nature from traditional scenarios. Statisticians are used to dealing with bias and uncertainty, but how can this be handled when datasets are so large 19 and collected in the wild without traditional sampling protocols? What do you do with all the data is an important question. The last 20 years has seen an explosion of statistical methodology to handle large p, often with sparsity assumptions (Hastie et al., 2015). Large n used to be the realm of careful asymptotic theory or thought experiments, but in reality one often does 2 encounter large n now in practice.
اگر شما نسبت به این اثر یا عنوان محق هستید، لطفا از طریق "بخش تماس با ما" با ما تماس بگیرید و برای اطلاعات بیشتر، صفحه قوانین و مقررات را مطالعه نمایید.

دیدگاه کاربران


لطفا در این قسمت فقط نظر شخصی در مورد این عنوان را وارد نمایید و در صورتیکه مشکلی با دانلود یا استفاده از این فایل دارید در صفحه کاربری تیکت ثبت کنید.

بارگزاری