Should we be collecting data more intelligently rather than collecting all sorts of garbage and then applying a "statistical hammer" to break it into smaller pieces? Why not use a more intelligent statistical design to collect the data in the first place?
This topic is exceedingly important, in my opinion, as I have observed great divergence from rigorous statistical experimental (study) design as data have become "cheap" and more easily analyzed with a "calculation hammer". I use "calculation" instead of "statistical" because, without a probabilistic basis through sampling and design, statistical inference is in doubt.
Generally speaking, the huge advances in computer science, allowing nearly boundless analysis methods, have to a degree rendered academic statistics departments irrelevant, with new jargon emerging such as "Data Scientist" and "Data Engineer". Often these new "calculators" are expert in algorithms but poorly trained in experimental design and sampling theory, and they lack appreciation of the intimate link between experimental design and its role as the foundation for subsequent statistical analysis and inference.
Large sample sizes, and more critically, large numbers of variables with small sample sizes, do not eliminate these issues.
I agree with your viewpoint that statistical methods based on probability theory are in doubt when they are used to analyze data that were likely collected without a well-thought-out statistical design. However, there is useful information hidden within observational studies, and the problem becomes how to extract such information from such haphazardly collected data using rigorous probabilistic methods.
Large sample sizes are generally good for statistical analyses, and given that nowadays collecting data is getting cheaper and easier, it is tempting to believe that typical large-sample theory would apply. But the assumptions under which such large-sample methods work (e.g., some form of repeatability or ergodic set-up) may not always hold for such cheaply collected data. Maybe one should use the observational studies to create a statistical design for collecting similar future data...
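To make that point concrete, here is a minimal sketch (my own illustration, assuming a Python environment with NumPy; the selection mechanism is invented) of how a non-probability sample misleads: the selection bias does not shrink as n grows, while the standard error does, so the wrong answer simply looks more precise.

```python
# Sketch: with a self-selected ("cheap") sample, a larger n does not remove
# the bias; it only shrinks the standard error around the wrong value.
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=50.0, scale=10.0, size=1_000_000)  # true mean = 50

# Hypothetical selection mechanism: units with larger values are more likely
# to be observed (e.g., more active users generate more records).
weights = np.exp(population / 20.0)
weights /= weights.sum()

for n in (100, 10_000, 500_000):
    biased_sample = rng.choice(population, size=n, replace=True, p=weights)
    mean = biased_sample.mean()
    se = biased_sample.std(ddof=1) / np.sqrt(n)
    print(f"n={n:>7}: estimate = {mean:6.2f} +/- {1.96 * se:4.2f}  (true mean = 50.00)")
```

With this selection mechanism the estimate sits above the true mean of 50 at every sample size, with ever-narrower intervals around the wrong value.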
You seem to have read my mind... Much of my consulting work involves "reverse engineering" the approximate study design that would have generated available observational data, with an eye toward differentiating the rigorous from the more tenuous inferences, particularly for inferring causation.
Well, the main problem with "collecting data intelligently" is that you do not have control over the data sources. For example, take one of the biggest sources of data today: social media.
Hence the "statistical hammer".
The same goes for many other sources.
As for structured data sources, there are already systems that collect data intelligently; that is what traditional RDBMSs have been doing for years. No need for big data for that.
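As a small aside, here is a minimal sketch (using Python's built-in sqlite3; the table and columns are hypothetical) of the kind of write-time discipline a traditional RDBMS enforces, rejecting malformed records at collection time rather than cleaning them up afterwards.

```python
# Sketch: schema constraints act as "intelligent collection" by refusing
# records that violate the design, instead of repairing them later.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        subject_id   INTEGER NOT NULL,
        visit_date   TEXT    NOT NULL,
        systolic_bp  INTEGER NOT NULL CHECK (systolic_bp BETWEEN 60 AND 260),
        PRIMARY KEY (subject_id, visit_date)
    )
""")

conn.execute("INSERT INTO measurement VALUES (1, '2015-03-01', 128)")    # accepted
try:
    conn.execute("INSERT INTO measurement VALUES (2, '2015-03-01', -5)")  # rejected
except sqlite3.IntegrityError as err:
    print("rejected at collection time:", err)
```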
There is no doubt that there are major differences between small data and big data. In the following papers, I have discussed some of them, e.g., Gaussian statistics for small data and power-law statistics for big data.
Jiang B. (2015), Geospatial analysis requires a different way of thinking: The problem of spatial heterogeneity, GeoJournal, 80(1), 1-13.
Jiang B. and Miao Y. (2014), The evolution of natural cities from the perspective of location-based social media, The Professional Geographer, xx(xx), xx-xx, DOI: 10.1080/00330124.2014.968886, Preprint: http://arxiv.org/abs/1401.6756
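For a quick feel for that contrast, here is a minimal sketch (my own illustration with NumPy, not code from the papers above) comparing a Gaussian sample with a heavy-tailed, power-law-like one: the mean and standard deviation summarize the former well, while in the latter most values lie far below the mean and a few extremes dominate it.

```python
# Sketch: Gaussian versus heavy-tailed (power-law-like) samples, and why the
# mean is a good summary for one but not the other.
import numpy as np

rng = np.random.default_rng(1)
gaussian = rng.normal(loc=100.0, scale=15.0, size=100_000)
power_law = (rng.pareto(a=1.5, size=100_000) + 1.0) * 10.0  # Pareto tail, alpha = 1.5

for name, x in (("Gaussian", gaussian), ("Power law", power_law)):
    share_below_mean = (x < x.mean()).mean()
    print(f"{name:>9}: mean={x.mean():8.1f}  median={np.median(x):6.1f}  "
          f"max={x.max():10.1f}  below mean={share_below_mean:.0%}")
```

In the Gaussian case roughly half of the values fall below the mean; in the power-law case the share is much larger, which is the heavy-tailed pattern the remark on power-law statistics refers to.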