Does Data Science have a distinctive research focus?

08 August 2013

If 'Data Science' is indeed a new, separate discipline, then it must have a research agenda

Edwin D. Huff

I don't know what this particular Data Science program is teaching, but research topics could be much more varied than just statistics and data mining. Consider Dr. Deming's notion that whenever data are collected for a specific purpose, there is a process involved in collecting them, and wherever there are processes, there is natural variation. One research question that follows from this premise is: how much variation is acceptable for data to be "good" data? What are the characteristics of "good" data, and how are they measured? What are reasonable data-quality standards in different domains, where data are collected for different purposes?

Other research questions follow from these: how do the characteristics of data quality affect decision-making, first in theory and then in practice? And what qualification standards should a new measurement system meet before it is deployed in production?

Here is an example from my own work in these areas: https://www.researchgate.net/publication/13780659_Comprehensive_reliability_assessment_and_comparison_of_quality_indicators_and_their_components?ev=prf_pub

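One conventional way, in the Deming/Shewhart tradition, to put a number on "how much variation is acceptable" is an individuals (XmR) control chart: the natural process limits are the mean plus or minus 2.66 times the average moving range, and points outside them are treated as signals rather than routine variation. The Python sketch below is only an illustration of that textbook technique, not Edwin's method or the method of the linked paper; the function names and the readings are made up.

```python
from statistics import mean
from typing import List, Sequence, Tuple

# 2.66 = 3 / d2, where d2 = 1.128 is the bias-correction constant for
# moving ranges of two consecutive points (standard XmR-chart practice).
XMR_SCALE = 2.66


def natural_process_limits(baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower, centre, upper) limits for an individuals (XmR) chart."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    centre = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = XMR_SCALE * mean(moving_ranges)
    return centre - spread, centre, centre + spread


def signals(values: Sequence[float], lower: float, upper: float) -> List[int]:
    """Indices of values outside the limits, i.e. beyond 'natural' variation."""
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Made-up readings from a hypothetical, stable data-collection process.
    baseline = [10.0, 12.0, 11.0, 13.0, 12.0, 11.0]
    lower, centre, upper = natural_process_limits(baseline)
    print(f"limits: {lower:.3f} .. {upper:.3f} (centre {centre:.3f})")
    print("points outside limits:", signals([11.0, 16.0, 9.0], lower, upper))
```

Here the baseline gives limits of roughly 7.78 to 15.22, so only the reading of 16.0 is flagged. Whether such limits are "acceptable" for a given purpose is exactly the domain-specific question Edwin raises; the chart only separates routine variation from unusual variation.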

Alessandro Giuliani

It's just a different name for dear old Statistics (leaning more toward its descriptive, multidimensional territories than toward the inferential, probabilistic lands), but these days scientists need advertising to survive, and advertising demands new brands... very sad indeed...

Arturo Geigel

The closest domain to data science is data mining. Look, for example, at:

* Principles and Theory for Data Mining and Machine Learning by Bertrand Clarke, Ernest Fokoué and Hao Helen Zhang

* Data Mining Methods and Models by Daniel T. Larose

* Principles of Data Mining by Max Bramer

These books take an algorithmic perspective rather than the data-oriented one that I think is the focus of Data Science. Where it gets fuzzy is in references such as:

* Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber

which also covers data warehousing and OLAP technology. I would consider that a departure from the references above and a move toward data science. While this book does not cover everything data science would include, it starts to blur the line around what I would consider data-science topics.

I agree with Alessandro that this is re-branding more than anything else, since the remaining data-science topics are covered by other, already established disciplines.

My two cents

Arturo Ortiz Tapia

I agree with Arturo Geigel that, in the sense described by Michael Brimacombe, Data Science may be taken as another name for data mining, also known as knowledge discovery. The main issue is that a huge number of measurements are being taken all the time. Can we make sense of them? Can we transform them into information, and model that information back, so that we obtain new knowledge?

Robert W Ferguson

Statistical methods have taken us a long way in data mining, or data science. Because statistical methods are so closely tied to the notion of a measure, this has somewhat limited the research. So we are expanding the mathematics and looking for applied problems. Let me explain.

Machine learning may be divided into Bayesian belief models and rough set models. If you have studied Bayesian theory, you will already know that its foundations differ from those of Fisher's frequentist model. Rough sets are different again.

The motivation for rough sets is easy to understand. I can normally distinguish a person's gender from a simple observation. However, there is the androgynous character Pat from Saturday Night Live, whose gender is difficult to determine with certainty. Which set of observed attributes improves my classification? Rough sets are built on the idea that an object may {belong, not-belong, cannot-distinguish}. What separates them from fuzzy sets, which assign each element a graded degree of membership, is the absence of an appropriate measure.

The mathematical methods of rough sets, then, are about how to weaken the axioms of "measure" while improving the odds of making correct decisions about membership.
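To make the three-way {belong, not-belong, cannot-distinguish} split concrete, here is a minimal Python sketch of Pawlak's lower and upper approximations over a small information table. The table, attribute values and target set are made up for illustration. Objects that share the same values on the chosen attributes are indiscernible; a block lies in the lower approximation if all of its objects are in the target set, and in the upper approximation if any of them are.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# Hypothetical information table: object id -> tuple of attribute values.
Table = Dict[str, Tuple[Hashable, ...]]


def indiscernibility_classes(table: Table, attrs: Sequence[int]) -> List[Set[str]]:
    """Group objects that cannot be told apart on the chosen attributes."""
    groups: Dict[Tuple[Hashable, ...], Set[str]] = defaultdict(set)
    for obj, values in table.items():
        groups[tuple(values[i] for i in attrs)].add(obj)
    return list(groups.values())


def rough_regions(table: Table, attrs: Sequence[int], target: Set[str]) -> Dict[str, Set[str]]:
    """Lower/upper approximations of `target`, reported as three regions."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for block in indiscernibility_classes(table, attrs):
        if block <= target:  # every indistinguishable object is in the target
            lower |= block
        if block & target:  # at least one indistinguishable object is in it
            upper |= block
    return {
        "belong": lower,
        "not-belong": set(table) - upper,
        "cannot-distinguish": upper - lower,
    }


if __name__ == "__main__":
    # Toy data (made up): two observed attributes per object.
    table: Table = {
        "o1": ("high", "yes"),
        "o2": ("high", "yes"),
        "o3": ("low", "no"),
        "o4": ("low", "yes"),
        "o5": ("low", "yes"),
    }
    target = {"o1", "o2", "o4"}  # the set we are trying to describe
    for attrs in ([0], [0, 1]):
        regions = rough_regions(table, attrs, target)
        print(attrs, {name: sorted(members) for name, members in regions.items()})
```

With only the first attribute, o3, o4 and o5 all fall in the "cannot-distinguish" region; adding the second attribute moves o3 to "not-belong" and leaves only o4 and o5 undecided. This mirrors the question above of which set of attributes improves the classification.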

Because rough set theory is still in its adolescence, learning it means reading some fairly academic material. It does show promise over Bayesian models in some circumstances, and some rough-set methods have been shown to converge faster than Bayesian training methods.

Muhammad Riaz

Data science is a new, fast-evolving discipline for tackling the complex and very large data sets of the current age. It has its foundations in data warehousing, statistics and high-performance computing, and it is developing new, sophisticated methods, algorithms and techniques to analyse huge data sets in simplified, user-friendly ways.

Peshawa Jammal Muhammad Ali

I think data science is data mining.

Edwin D. Huff

Article Comprehensive Reliability Assessment and Comparison of Quali...

Arturo Geigel

Edwin,

I agree with you that there is more to data science than just data mining; the emphasis in data science is on "data". But in this regard, the boundary between what should be covered in data mining (DM) and data science (DS) courses is blurry at best. Take data quality: it could fall under data preprocessing and knowledge extraction in DM. Even data security, which I think is one distinction between DM and DS, is still blurry, and some security issues are surfacing as adversarial data mining.

Both fields are very young, and I think we need to give them time to settle into their respective boundaries. Having this discussion, with different opinions, is very healthy for defining the field.

Sartaj Alam

Robert Ferguson really added new knowledge to my existing 'rough set'! We are at a tipping point in data generation, data management and storage, let alone in consuming that data through descriptive and inferential analysis at the same pace. Conventional statistical software such as SAS, SPSS, Stata and R breaks down before huge data repositories can even be loaded. New tools are a must for venturing into such new territories.

Edwin D. Huff

I agree, Arturo, about the newness of these fields, and about the extent to which other disciplines can inform them: for example, metrology, for learning how to characterize a measurement system (its reliability, precision, repeatability, reproducibility, etc.); and signal processing, from engineering, which can help data scientists learn to separate signal from noise and find patterns.

Susan E Smith

I think Data Science provides an opportunity to integrate and apply some of the highly specialised technical domains mentioned. To me it would be most valuable to treat it as an umbrella term, since 'science' is a broader classification than knowledge discovery (KD) or data mining (DM). Innovation is shaped not like an I but like a T: deep, specific knowledge combined with bridges and links to other fields. As a new field, Data Science has the opportunity to be innovative in this way.

In healthcare, linking informatics research, KD and DM with classical epidemiological research methods would be highly beneficial, because epidemiology understands medical and healthcare data better than the computational sciences alone can: it places greater emphasis on the data and its applied use.

To me, a research agenda should include questions from implementation research and applied science about how we can better combine the mathematical/computational side with the applied decision-making side. For example: how do we improve healthcare's ability to use the tools already developed to create a better-informed, data-driven system? What are the practical risks and benefits (e.g. of data quality, as mentioned above), and how do we quantify them in practical use? I think this is where we will see the greatest benefit to society.

Bin Liu

I think data science is just state-of-the-art statistical techniques developed to meet the data-analysis requirements of the big-data era.

Mingfei Li

From a data science meeting I attended recently: it integrates applied math/statistics, computer science and business (the goal). It does cover data mining, but perhaps more than that, especially for practical problems; it depends on the goal of the project. Some problems need more computer-science techniques, such as certain big-data issues. Others need more innovation in methods and algorithms from operations research or quantitative analysis, which is definitely within the scope of statistics and applied math. As for the research focus, I think it depends on the problem's goal. I agree with Dr. Deming's comments, as Edwin mentioned.

Allan John Brimicombe

Thank you for all your answers so far. I'm on holiday at the moment and will respond more fully as recreation permits (!). However, looking at the subtext to my question (above), quite a lot was left off that would have clarified it. So here it is in full:

If 'Data Science' is indeed a new, separate discipline, then it must have a research agenda that distinguishes it from other disciplines. What, then, distinguishes Data Science from its nearest neighbours? What are the fundamental research topics that make it distinctive? Or are its methods and objects of study fundamentally different?

Without some clarity here, Data Science may be seen merely as a re-branding.

It may, of course, simply fill a gap that the Venn diagram of other disciplines leaves vacant (or has opened up), borrowing from many of them but adding something of its own, so that the whole is greater than the sum of the parts... and is therefore worth pursuing. I think Data Science goes beyond technique and algorithm to the processes of a data value pipeline, which would include issues of data accuracy/uncertainty, metadata, security, privacy, legal oversight, business models, big data, open data and so on. I also view 'data' very broadly as numerical, text, spatial, audio, image, video: anything that can be analysed or co-analysed in the production of new understanding and knowledge.

Edwin Huff says "I don't know what this particular Data Science program is teaching"... well, as with any question, there is an ulterior motive in asking. I have recently opened a Professional Doctorate programme in Data Science (D.DataSc.), the content of which was crowd-sourced from a previous question I asked the ResearchGate community, though the structure has to conform to my university's model for such programmes. Details can be found here:

http://www.uel.ac.uk/geo-information/DataScience_ProfDoc.htm

But I value your opinions and want to hear more of them....

Allan John Brimicombe

Could a snappy definition of Data Science be "the production of value from data"? Extracting value requires attention to the whole process chain, and the relevant technologies, from data gathering/harvesting to information consumption. (?)

Alessandro Giuliani

Data Science IS a rebranding: all the activities mentioned in this debate, such as data quality control, visualization and data mining, are required of any average Statistics student taking his or her degree. Or at least that was the situation here in Rome until a few years ago... I hope that, notwithstanding the global degradation of culture in recent years, this is still the case...


Similar topics
Computer Science
Data Mining

How to learn more about SPSS and its Application?

I would like to learn more about SPSS and Its application especially in regards to data analysis. Please suggest me how I can learn more about it. Thank you so much.

11 August 2024

Can I base on reverse DNA sequences to perform alignment, convert to amino acids and GenBank submission?

I have reverse sequences (AB1 format), can I base on reverse DNA sequences to perform nucleotide alignment, convert nucleotides to amino acids and deposit the sequence in GenBank database?

11 August 2024

Baseline drift in HPLC? What causes this?

Hello, Why do I see this baseline drift when I compare my blank (black) to the sample (blue)? Any suggestions as to why this happened? Thank you!

11 August 2024

Text-Communication from the M1 Hand Area using BCI—and then there is Elon Musk?

Willett, Shenoy et al. (2021) have developed a brain computer interface (BCI) that used neural signal collected from the hand area of the motor cortex (area M1) of a paralyzed patient. The...

10 August 2024

Handling Missing Data and Building a Predictive Model with Incomplete Information?

I am developing a predictive model for a water supply network that involves 20 influencing points. However, I only have historical data for 10 out of these 20 points. I would like to know how to...

10 August 2024

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

I'm currently exploring the application of Python in textile engineering, specifically in areas like data analysis, process automation, and the development of smart textiles. I'm interested in...

10 August 2024

Which Scopus Journal provides the most affordable fees?

"PUBLISHING IN A SCOPUS JOURNAL" Researchers are now at a cross road. The critical need to publish in a Scopus or ISI, etc journal is ever vital. Journal Publication fees must be submitted....

10 August 2024

Seeking Advice on Viability and Execution of Undergraduate Thesis Topic?

Hello everyone, I am currently developing a thesis proposal and would appreciate your input on its viability and how to effectively carry it out. My proposed topic is: "Does the perceived threat...

10 August 2024

How can I use the cif data obtained from rietveld refinement extracted via gsas2, for microstructural analysis using ETEX software?

09 August 2024

Is this a facetotecta nauplius?

This larva was captured using a plankton net in the Persian Gulf during the summer. I believe it may be a Facetotecta nauplius.

08 August 2024