Created at 7pm, Jan 17
cyranodb
Technology
50 Years of Data Science
C8CeuJl0A6ThmrovnjgPhCS4vIY8sRgDX0uD84n4ZqA
File Type: PDF
Entry Count: 187
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

ABSTRACT
More than 50 years ago, John Tukey called for a reformation of academic statistics. In “The Future of Data Analysis,” he pointed to the existence of an as-yet unrecognized science, whose subject of interest was learning from data, or “data analysis.” Ten to 20 years ago, John Chambers, Jeff Wu, Bill Cleveland, and Leo Breiman independently once again urged academic statistics to expand its boundaries beyond the classical domain of theoretical statistics; Chambers called for more emphasis on data preparation and presentation rather than statistical modeling; and Breiman called for emphasis on prediction rather than inference. Cleveland and Wu even suggested the catchy name “data science” for this envisioned field. A recent and growing phenomenon has been the emergence of “data science” programs at major universities, including UC Berkeley, NYU, MIT, and most prominently, the University of Michigan, which in September 2015 announced a $100M “Data Science Initiative” that aims to hire 35 new faculty. Teaching in these new programs has significant overlap in curricular subject matter with traditional statistics courses; yet many academic statisticians perceive the new programs as “cultural appropriation.” This article reviews some ingredients of the current “data science moment,” including recent commentary about data science in the popular media, and about how/whether data science is really different from statistics. The now-contemplated field of data science amounts to a superset of the fields of statistics and machine learning, which adds some technology for “scaling up” to “big data.” This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next 50 years.
Because all of science itself will soon become data that can be mined, the imminent revolution in data science is not about mere “scaling up,” but instead the emergence of scientific studies of data analysis science-wide. In the future, we will be able to predict how a proposal to change data analysis workflows would impact the validity of data analysis across all of science, even predicting the impacts field-by-field. Drawing on work by Tukey, Cleveland, Chambers, and Breiman, I present a vision of data science based on the activities of people who are “learning from data,” and I describe an academic field dedicated to improving that activity in an evidence-based manner. This new field is a better academic enlargement of statistics and machine learning than today’s data science initiatives, while being able to accommodate the same short-term goals.
David Donoho
Department of Statistics, Stanford University, Stanford, CA

Tukey proposed that a science of data analysis exists and should be recognized as among the most complicated of all sciences. He advocated the study of what data analysts in the wild are actually doing, and reminded us that the true effectiveness of a tool is related to the probability of deployment times the probability of effective results once deployed. Data scientists are doing science about data science when they identify commonly occurring analysis/processing workflows, for example, using data about
id: 765a316f636132efb26cde36b3925c5b - page: 13
It is striking how, when I review a presentation on today’s data science, in which statistics is superficially given pretty short shrift, I cannot avoid noticing that the underlying tools, examples, and ideas which are being taught as data science were all literally invented by someone trained in Ph.D. statistics, and in many cases the actual software being used was developed by someone with an MA or Ph.D. in statistics. The accumulated efforts of statisticians over centuries are just too overwhelming to be papered over completely, and cannot be hidden in the teaching, research, and exercise of Data Science.
id: 5460f720abbd2aabb38208b40538bd3c - page: 13
Leo Breiman () was correct in pointing out that academic statistics departments (at that time, and even since) have under-weighted the importance of the predictive culture in courses and hiring. It clearly needs additional emphasis. Data analysis per se is probably too narrow a term, because it misses all the automated data processing that goes on under the label of data science about which we can also make scientific studies of behavior in the wild.
id: c803b97b93df79748cc449fdc6e39019 - page: 13
The scope here also includes foundational work to make future such science possible, such as encoding documentation of individual analyses and conclusions in a standard digital format for future harvesting and meta-analysis. As data analysis and predictive modeling become an ever more widely distributed global enterprise, science about data science will grow dramatically in significance.
id: ebb379d982fcf734ce769ac8ba9e210e - page: 13
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "C8CeuJl0A6ThmrovnjgPhCS4vIY8sRgDX0uD84n4ZqA", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "C8CeuJl0A6ThmrovnjgPhCS4vIY8sRgDX0uD84n4ZqA", "level": 2}'
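The two calls above can be wrapped in small shell helpers so that the query text is escaped before it is placed in the JSON body. This is a minimal sketch, not an official client: the function names (json_escape, search_payload, query_payload, dria_post) and the DRIA_API_KEY environment variable are assumptions introduced here for illustration, and the "level" value is simply copied from the example above. The two-element vector in that example is presumably a placeholder; a real query vector would likely need to come from the embedding model listed in the metadata.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions around the two endpoints shown above.
# All function names and DRIA_API_KEY are illustrative assumptions.

CONTRACT_ID="C8CeuJl0A6ThmrovnjgPhCS4vIY8sRgDX0uD84n4ZqA"
BASE_URL="https://search.dria.co/hnsw"

# Escape backslashes and double quotes so free text is safe inside a JSON
# string. Control characters such as newlines are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Body for the search endpoint: a natural-language query with reranking.
# Arguments: query text, optional top_n (default 10).
search_payload() {
  local query top_n
  query="$(json_escape "$1")"
  top_n="${2:-10}"
  printf '{"rerank": true, "top_n": %d, "contract_id": "%s", "query": "%s"}' \
    "$top_n" "$CONTRACT_ID" "$query"
}

# Body for the query endpoint: a precomputed embedding as a JSON array literal.
# Arguments: vector, optional top_n (default 10), optional level (default 2).
query_payload() {
  local vector top_n level
  vector="$1"
  top_n="${2:-10}"
  level="${3:-2}"
  printf '{"vector": %s, "top_n": %d, "contract_id": "%s", "level": %d}' \
    "$vector" "$top_n" "$CONTRACT_ID" "$level"
}

# POST a body to an endpoint name ("search" or "query").
# Aborts with a message if DRIA_API_KEY is not set.
dria_post() {
  curl -sS -X POST "$BASE_URL/$1" \
    -H "x-api-key: ${DRIA_API_KEY:?set DRIA_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$2"
}
```

With an API key exported as DRIA_API_KEY, a search against this document might then look like: dria_post search "$(search_payload 'What did Tukey propose about data analysis?')". Building the body separately from sending it makes it possible to inspect the JSON before any request is made.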