
In open source software development projects, a large amount of data is generated during each phase of development, especially on bug tracking systems where different users report bugs and raise different issues. These data contain a lot of noise, uncertainty, and trustworthiness issues. The question is how to handle these challenges; if they are not handled at the right time, the performance of classifiers or models can degrade.

The truthfulness of data is a major concern in open source software evolution. And if uncertainty is an infection present in the data, how do we handle it? Can we take a subset of the data and check whether uncertainty exists as a factor, and whether the treatment we are applying is working correctly? A rough sketch of what I mean appears below.
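To illustrate the subset idea, here is a minimal sketch (synthetic data standing in for bug-report features, scikit-learn assumed; the 20% label-flip rate and the 0.2 confidence threshold are arbitrary choices, not a recommendation): hold out a trusted subset for evaluation, train on noisy labels with and without a treatment, and compare scores to see whether the treatment actually helps.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_predict, train_test_split

# Stand-in for bug-report features and labels (e.g., 0 = minor, 1 = major).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Hold out a trusted subset for evaluation; the rest plays the noisy field data.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Simulate untrustworthy reports by flipping 20% of the training labels.
rng = np.random.default_rng(0)
flip = rng.random(len(y_train)) < 0.20
y_noisy = np.where(flip, 1 - y_train, y_train)

# Baseline: train directly on the noisy labels.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_noisy)
print("F1 without treatment:", f1_score(y_eval, baseline.predict(X_eval)))

# Treatment: flag samples whose cross-validated predicted probability
# strongly disagrees with their recorded label, then drop them.
proba = cross_val_predict(LogisticRegression(max_iter=1000),
                          X_train, y_noisy, cv=5, method="predict_proba")
suspicious = proba[np.arange(len(y_noisy)), y_noisy] < 0.2
treated = LogisticRegression(max_iter=1000).fit(
    X_train[~suspicious], y_noisy[~suspicious])
print("F1 with treatment:   ", f1_score(y_eval, treated.predict(X_eval)))
```

If the treated classifier scores noticeably better on the held-out subset, that is at least some evidence that the uncertainty/noise treatment is working; if not, the treatment may be removing signal along with the noise.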

To be clear, we are not asking about data volume or about handling large data sets using cloud platforms, Hadoop/MapReduce/Radoop, multiple nodes, or partitioning the data.
