Big Data database systems can be very diverse, because they can differ in virtually every respect. Across the institutions and companies in which Big Data database systems are built and developed, they may differ in, among other things:
1. The technical and technological characteristics of servers, disks and other IT and network equipment, and hence the potential data processing speed.
2. The type and generation of the operational database systems supporting specific Big Data databases.
3. The installed applications for archiving, indexing, searching and analyzing information collected in the database systems.
4. The possibilities for expanding and improving specific, existing Big Data database systems.
Therefore, if specific Big Data database systems operating in institutions and companies need to be compared accurately, a universal, multi-faceted, multi-factor scoring model should be built to make such comparative analyses possible.
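As a rough illustration of such a multi-factor scoring model, here is a minimal Python sketch of a weighted score. The criterion names, the weights, the 0-10 rating scale and the two example systems are all hypothetical assumptions made for illustration; a real model would need criteria and weights agreed on by the people doing the comparison.

```python
# Hypothetical criteria, loosely following the four points above.
# The weights are illustrative assumptions, not recommended values.
WEIGHTS = {
    "hardware": 0.3,
    "dbms_generation": 0.2,
    "analytics_tooling": 0.3,
    "extensibility": 0.2,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Return the weighted mean of per-criterion ratings (assumed 0-10 scale)."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    total_weight = sum(weights.values())
    return sum(weights[name] * ratings[name] for name in weights) / total_weight


# Made-up ratings for two fictional systems.
systems = {
    "system_a": {"hardware": 8, "dbms_generation": 6,
                 "analytics_tooling": 7, "extensibility": 5},
    "system_b": {"hardware": 6, "dbms_generation": 9,
                 "analytics_tooling": 6, "extensibility": 8},
}

# Rank systems from highest to lowest weighted score.
ranking = sorted(systems, key=lambda name: weighted_score(systems[name]),
                 reverse=True)

for name in ranking:
    print(f"{name}: {weighted_score(systems[name]):.2f}")
```

A plain weighted mean is only one possible aggregation; it assumes the criteria are independent and that a weakness in one can be offset by strength in another, which may not hold for every comparison.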
I recommend using statistical methods to establish relationships between the performance measures extracted from Big Data applications, or perhaps using the SaaS tools available on cloud computing platforms for quality measurement. Familiarity with software engineering quality concepts would also be a plus.
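One simple statistical starting point is a correlation coefficient between two measures. The sketch below computes Pearson's r in plain Python; the measure names (`input_gb`, `runtime_s`) and all values are made up for illustration, and correlation is only one of many methods that could be applied here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up measurements: input size per job and observed job runtime.
input_gb = [10, 20, 30, 40, 50]
runtime_s = [12.0, 19.5, 31.0, 41.5, 49.0]

r = pearson(input_gb, runtime_s)
print(f"Pearson r between input size and runtime: {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship between the two measures, but it does not establish causation, and a non-linear dependence may need a rank-based or regression approach instead.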
Surveys, social licence and the Integrated Data Infrastructure
ABSTRACT
INTRODUCTION: Statistics New Zealand’s Integrated Data Infrastructure (IDI) is a central repository for researchers to access multiple government agency datasets. The aim of this investigation was to understand social licence for including survey data in the IDI.
METHODS: Two convenience samples were recruited: (1) participants in one of 10 focus groups; and (2) respondents to pilot surveys for the 2018 NZ census or a population-based survey on violence experience. Qualitative data were transcribed and analysed using thematic analysis. Analyses were conducted independently by two members of the research team and results compared.
FINDINGS: Whilst little prior awareness of the IDI existed, participants developed considered judgements about it, identifying concerns and proposing safeguards that would encourage them to support its maintenance and use.
CONCLUSIONS: While there is the potential for social licence to be granted for the IDI, an on-going, transparent engagement process is required to maintain trust with agencies and researchers. As an over-represented population within government agency data, active, honest engagement is required with Māori, as are safeguards to reduce risks of further stigmatisation and marginalisation.
KEYWORDS: big data; social licence; indigenous data; policy development
CORRESPONDENCE TO: Pauline Gulliver [email protected] AOTEAROA NEW ZEALAND SOCIAL WORK 30(3), 57–71. Pauline Gulliver, Monique Jonas, Tracey McIntosh, Janet Fanslow and Debbie Waayer, University of Auckland, New Zealand
The 3C model recommended in the Springer-Verlag paper cited by Ari ("A Data Quality in Use Model for Big Data") dates from 2014 and is based on the initial 3V aspects of Big Data. It is now important to map the ISO/IEC 25012 Data Quality model to the more recent formalization of the 5V aspects of Big Data, especially Veracity and Value, together with international sources on data compliance and data governance, such as DGI (Data Governance Institute), DAMA (Data Management Association) and DAF (Data Audit Framework, JISC-funded DAFD project). In this context, the peculiarities of Big Data should be carefully considered when evolving models and processes for assessing data auditability, data provenance and data lineage, including metadata quality and metadata management best practices. For example, in the Hadoop ecosystem for Big Data, some of the above issues can be supported using Apache Atlas and Apache Ranger; in particular, taxonomy-based / tag-based policies for data quality, governance and compliance can be designed and supported by combining Atlas, Infra Solr and Ranger. Also, consider that strong criticism has been raised about the lack of quality in Big Data lakes (data scientists spend >70% of their time on data discovery & data reliability estimation and