Statistics processes data. If the data are meaningful, they become information. (Note that this is what information systems provide: not the systems that authors merely call information systems, but those that really are information systems.) If the information is suitable for making decisions to achieve goals, or for management, it is already knowledge. Working with knowledge is the job of intelligent systems (artificial intelligence systems). Artificial intelligence systems transform empirical data into information, and information into knowledge, and then use that knowledge to solve a variety of problems. These problems fall into three groups: 1. Classification problems (recognition, identification, forecasting, diagnostics, etc.). 2. Decision-making problems. 3. Problems of studying the modeled subject area by studying its model.
There are many artificial intelligence systems. The universal cognitive analytical system "Eidos-X++" differs from them in the following respects:
- it was developed in a general formulation that does not depend on the subject area, so it is universal and can be applied in many subject areas (http://lc.kubagro.ru/aidos/index.htm);
- it is fully open and free to access (http://lc.kubagro.ru/aidos/_Aidos-X.htm), with up-to-date source code (http://lc.kubagro.ru/__AIDOS-X.txt);
- it is one of the first domestic personal-level artificial intelligence systems, i.e. it does not require the user to have special training in artificial intelligence technologies (there is an implementation certificate for the "Eidos" system dated 1987) (http://lc.kubagro.ru/aidos/aidos02/PR-4.htm);
- it reliably reveals, in a comparable form, the strength and direction of cause-and-effect relationships in incomplete, noisy, interdependent (nonlinear) data of very high dimensionality, both numerical and non-numerical, measured on different types of scales (nominal, ordinal and numerical) and in different units of measurement (i.e. it does not impose strict requirements on the data that cannot be met, but processes the data that are available);
- it includes a large number of local (supplied with the installation) and cloud-based educational and scientific applications (currently 31 and 143, respectively) (http://lc.kubagro.ru/aidos/Presentation_Aidos-online.pdf);
- it supports a multilingual interface in 44 languages. The language databases are included in the installation and can be extended automatically;
- it supports an online knowledge-accumulation environment and is widely used around the world (http://aidos.byethost5.com/map5.php);
- it performs the most computationally intensive operations, model synthesis and recognition, on a graphics processor (GPU), which on some tasks speeds them up by a factor of several thousand and makes intelligent processing of big data, big information and big knowledge practical;
- it converts the initial empirical data into information, and information into knowledge, and uses that knowledge to solve problems of classification, decision support and study of the subject area through its system-cognitive model, generating a very large number of tabular and graphical output forms (a development of cognitive graphics), many of which have no analogues in other systems (examples of the forms can be found in: http://lc.kubagro.ru/aidos/aidos18_LLS/aidos18_LLS.pdf).
It is transdisciplinary, since you need both the CS and the stats modules. I would place it under computer science, because the statistical component, although critical, should not be the main thrust of the course. Students who have a solid statistical foundation can learn the additional statistics they need for a specific problem. However, I find that learning the additional computer science is more difficult. I have also found that my engineering students struggle more with the CS than with the stats.
Data science draws on mathematics, statistics and computer science, and incorporates techniques such as machine learning, cluster analysis, data mining and visualization.
Computer science and statistics are the two technical disciplines required of a successful data scientist. It is essential for a data scientist to be knowledgeable in mathematics. Apart from that, you must account for three more aspects: programming/modelling, business understanding and communication/presentation. Programming skills in a scripting language, e.g. R or Python, or even database manipulation (SQL) are essential for implementation; business domain knowledge is essential in practice for any analytics project; and communication is essential in order to create impact within the business in question.
Data science is a blend of skills in three major areas:
1. Mathematics Expertise
2. Technology
3. Strong Business Strategy
Today, successful data professionals understand that they must advance past the traditional skills of analyzing large amounts of data, data mining, and programming skills. In order to uncover useful intelligence for their organizations, data scientists must master the full spectrum of the data science life cycle and possess a level of flexibility and understanding to maximize returns at each phase of the process.
As pointed out in the other comments, it is a blend of statistics and CS. I would say that most techniques are of statistical origin; nonetheless, if forced to select one, I would put it under CS. The reason is the overarching ontology. Statistics is foremost concerned with induction, while CS and data science also use abduction. Hence, in terms of philosophy of science, CS and DS are more alike.
I think we should choose information theory, because it is not only about computer science but also about statistics.
Lutsenko E.V. Solving statistics problems by methods of information theory / E.V. Lutsenko // Polythematic Online Scientific Journal of Kuban State Agrarian University (Scientific Journal of KubSAU) [Electronic resource]. Krasnodar: KubSAU, 2015. No. 02(106). pp. 1-47. IDA [article ID]: 1061502001. Available at: http://ej.kubagro.ru/2015/02/pdf/01.pdf, 2.938 conventional printed sheets.
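One standard point of contact between information theory and statistics can be sketched in a few lines. This is only an illustration of the general link and is not taken from the cited paper or from the Eidos system: for a contingency table of counts, the G statistic of the likelihood-ratio test of independence equals 2N times the mutual information of the two variables (measured in nats). The function names and the example table below are made up for the illustration.

```python
import math


def mutual_information(table):
    """Mutual information I(X;Y) in nats, estimated from a table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                continue  # the term 0 * log(0) is taken to be 0
            p_xy = count / n
            p_x = row_totals[i] / n
            p_y = col_totals[j] / n
            mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def g_statistic(table):
    """G statistic computed through information theory: G = 2 * N * I(X;Y)."""
    n = sum(sum(row) for row in table)
    return 2.0 * n * mutual_information(table)


def g_statistic_direct(table):
    """G statistic in its usual statistical form: 2 * sum(O * ln(O / E))."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical 2x2 table of counts, e.g. treatment (rows) by outcome (columns).
example = [[30, 10], [10, 30]]
print(g_statistic(example), g_statistic_direct(example))
```

Both functions return the same number up to rounding, so the same quantity can be read either as a test statistic or as an amount of information.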
It depends on the goal of teaching/learning data science: industry or academia.
If it aims to solve practical problems in industry, a data science program should train skills in data ELT, data engineering, data analysis, visualization and delivery of a data product (software engineering) for application. In this sense, DS belongs under computer science. That covers the complete data science workflow, which is also what industry expects of a data scientist. In this sense, data science is more technique than science. Some people say this kind of data scientist is a "unicorn". That may be true at present, because education has not developed as fast as industry. But in the long run, one can and should develop a full stack of skills in this field.
On the other hand, if the goal is to focus only on the performance of the data analysis part, I would say statistics. One can dive deep into models using statistical knowledge and learn how to interpret and improve the current approach. Statistics is a classic subject with a long history and beautiful theoretical foundations. It trains critical thinking, logical reasoning and how you view the world. Although data science has been in high demand in industry recently, statistics makes data science more of a "science".
One thing ignored in both DS and statistics is the business model of companies; this lies outside training, and one can only gain such experience in industry. Dealing with real data in industry is the right way to make data tell the right story. Ambitious young data scientists should embrace both CS and statistics. If one has to choose one, CS.
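To make the industry workflow above concrete, here is a minimal sketch of an ELT-style pipeline using only the Python standard library. Everything in it is an assumption made for illustration: the sample CSV, the table names `raw_sales` and `sales`, and the function names are hypothetical, and the visualization step is left out to keep the example short.

```python
import csv
import io
import json
import sqlite3
import statistics

# Hypothetical raw input; the last row has a missing amount.
RAW_CSV = """region,amount
north,120.5
south,80.0
north,99.5
south,
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def load(rows, conn):
    """Load: store the rows untouched in a raw staging table."""
    conn.execute("CREATE TABLE raw_sales (region TEXT, amount TEXT)")
    conn.executemany(
        "INSERT INTO raw_sales VALUES (?, ?)",
        [(r["region"], r["amount"]) for r in rows],
    )


def transform(conn):
    """Transform: clean inside the database (drop blanks, cast to numbers)."""
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT region, CAST(amount AS REAL) AS amount "
        "FROM raw_sales WHERE amount <> ''"
    )


def analyze(conn):
    """Analyze: count and mean of amount per region."""
    by_region = {}
    query = "SELECT region, amount FROM sales ORDER BY region"
    for region, amount in conn.execute(query):
        by_region.setdefault(region, []).append(amount)
    return {
        region: {"n": len(values), "mean": statistics.mean(values)}
        for region, values in by_region.items()
    }


def deliver(summary):
    """Deliver: serialize the result as JSON for a downstream application."""
    return json.dumps(summary, sort_keys=True)


def run_pipeline(text):
    conn = sqlite3.connect(":memory:")
    try:
        load(extract(text), conn)
        transform(conn)
        return deliver(analyze(conn))
    finally:
        conn.close()


print(run_pipeline(RAW_CSV))
```

Only `analyze` involves any statistics; the other steps are parsing, storage, cleaning and packaging, which is the sense in which the workflow as a whole leans toward computer science.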
In academia, data science mostly applies to the Business Intelligence field, which is often under IS or CS programs. But if there is an IT program with a Data Science specialization, then students are cross-disciplinary, with courses in Business Intelligence (integrated systems, ecosystem, cloud and database management systems), Programming (Python and R), Data Analytics or Visualization (BI tools and report creation), and Statistics.
If you take a wide-angle view of statistics, data science is statistics. In my recent book, The Real Work of Data Science, we quote the famous 1962 paper by John Tukey on the future of data analysis, adding that "the future is here and it is called data science": https://www.amazon.com/gp/product/1119570700/ref=dbs_a_def_rwt_bibl_vppi_i0