Big data systems aggregate massive data streams from heterogeneous data sources. What techniques or solutions can mitigate the effects of collecting these raw data streams and reduce the data volume without affecting the value of the data?
In addition to what Mr Ashish Dutt said, you can certainly normalize the data set to reduce the range of randomness in it. Beyond that, there are various methods for dimensionality reduction, such as SOM, which is based on the influence of the independent attributes on the dependent attributes. There are also possibilities for clustering or classification based on supervised and unsupervised learning in a massive heterogeneous data set.
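To make the normalization step concrete, here is a minimal sketch using scikit-learn; the file name and the choice of MinMaxScaler are my own illustrative assumptions, not something the answer prescribes.

```python
# Minimal normalization sketch (assumes pandas and scikit-learn are available).
# "stream_batch.csv" is a placeholder for one batch of the raw stream.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("stream_batch.csv")
numeric_cols = df.select_dtypes("number").columns

scaler = MinMaxScaler()                           # rescale each feature to [0, 1]
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```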
In addition to Mr Manas Gaur's answer, I would like to emphasize that the computation of the dimensionality reduction step is itself an issue. Even if you use PCA, which is probably the simplest method, computing the covariance matrix and its eigenvalue decomposition are difficult problems when dealing with big data. Dedicated methods should be used.
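As one example of such a dedicated method (my own illustration, not something the answer names), scikit-learn's IncrementalPCA fits the projection in mini-batches, so the full covariance matrix never has to be held and decomposed in memory:

```python
# Sketch of out-of-core PCA with scikit-learn's IncrementalPCA.
# "stream_batches()" is a hypothetical generator standing in for your data feed;
# it yields arrays of shape (batch_size, n_features).
import numpy as np
from sklearn.decomposition import IncrementalPCA

def stream_batches(n_batches=100, batch_size=1000, n_features=50):
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        yield rng.normal(size=(batch_size, n_features))

ipca = IncrementalPCA(n_components=10)
for batch in stream_batches():
    ipca.partial_fit(batch)          # update the components batch by batch

# Project any new batch onto the reduced space
reduced = ipca.transform(next(stream_batches()))
```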
A balanced use of correlative methods (PCA, Cluster Analysis, Multidimensional Scaling) and sampling methods will allow you to check the basic invariance of the data structure (in terms of mutual correlation between variables) across different samplings.
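One rough way to carry out that invariance check (a sketch of my own, assuming the data fit in a pandas DataFrame) is to compare the correlation matrices obtained from independent random subsamples:

```python
# Sketch: compare correlation matrices computed on independent subsamples.
# "df" is assumed to be a pandas DataFrame of numeric variables.
import numpy as np
import pandas as pd

def correlation_stability(df: pd.DataFrame, n_samples: int = 5,
                          frac: float = 0.1, seed: int = 0) -> float:
    """Mean pairwise Frobenius distance between subsample correlation matrices."""
    rng = np.random.default_rng(seed)
    corrs = [df.sample(frac=frac, random_state=int(rng.integers(1 << 31)))
               .corr().to_numpy()
             for _ in range(n_samples)]
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(corrs)
             for b in corrs[i + 1:]]
    return float(np.mean(dists))

# A value close to zero suggests the correlation structure is stable across samplings.
```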
Moreover, before applying the above methods, the usual data trimming steps (keeping only one variable of any pair correlated above 0.90, eliminating zero-variance variables, and dropping variables with too many missing values) are in any case a good start.
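A possible pandas sketch of those trimming steps follows; the 0.90 correlation cut-off comes from the answer, while the 50% missing-value threshold is my own assumption.

```python
# Sketch of the preliminary trimming steps described above.
# Assumes a pandas DataFrame "df" of numeric variables.
import numpy as np
import pandas as pd

def trim(df: pd.DataFrame, corr_cut: float = 0.90,
         max_missing: float = 0.5) -> pd.DataFrame:
    # 1. Drop variables with too many missing values (threshold is an assumption).
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 2. Drop zero-variance variables.
    df = df.loc[:, df.std() > 0]
    # 3. For each pair correlated above corr_cut, keep only one of the two.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_cut).any()]
    return df.drop(columns=to_drop)
```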