Big data: Some statistical issues

  • File type: Book
  • Language: English
  • Publisher: Elsevier
  • Edition and year / country: 2018

Details

Related fields: Statistics

Journal: Statistics and Probability Letters
University: Medical Research Council Population Health Research Unit, University of Oxford, UK
DOI: https://doi.org/10.1016/j.spl.2018.02.015
Published in an Elsevier journal

Description

1 Introduction

Over the last 125 years computational techniques have evolved from slide rule and log tables, through hand-operated machines like the Brunsviga, to electric desktop machines, and from them to modern computers, at first complex to use and limited in scope, and then to the ever-expanding, ubiquitous modern version. The development of statistical technique and theory over that time has mirrored and been strongly influenced by that growth in computer power and availability.

Big data have been around a long time, for example in population censuses. In an engineering context, paper traces recorded such properties as the stress at various points in an aircraft wing during flight. In a manufacturing context, the mass per unit length of textile yarn was recorded. These examples produced very large amounts of data for visual inspection, but in the past these were suitable for quantitative analysis at most on a sampling basis.

Three questions that characterize today's big data are largely absent from these earlier contexts. In outline the questions are:

  • Are the data relevant for the purpose of the investigation?
  • Is the data quality adequate for its intended purpose?
  • Is the detailed statistical analysis appropriate; in particular, is the assessment of the precision of the conclusions seriously overoptimistic?

Sometimes the first two aspects may be inverted: the data are available, for what are they useful? We comment on these issues largely, but not entirely, from an epidemiological perspective.

In an epidemiological context, large data sets with many individuals arise from routinely collected medical records, from cohorts assembled with a defined objective, and from registries of patients with specific conditions. Some large population-based studies are of mixed type, in that they are cohorts with a purpose-built baseline data set augmented by linkage to routinely collected records or registries.

Many aspects of study design and analysis are common to large and not-so-large sets of data, but the achievement of high quality in large sets of data may be a particular challenge. There are a number of conceptual aspects of a study, all of which may have statistical implications. These are:

  • Question formulation
  • Choice of study population
  • Study design
  • Metrology
  • Data collection
  • Monitoring and quality control
  • Data analysis
  • Presentation of conclusions
  • Interpretation

When big data are involved all of these may raise special features. Here we concentrate largely, but not entirely, on the aspects prior to data analysis.