How to handle a small sample in a regression?

12 February 2021

I'm working on a paper about a bestseller list that shows the TOP20 books every month.

My problem is to show that this list is representative of the population of books sold each year.

For 2011-2020 I have the total number of books sold (an official estimate) and the number of books sold by titles on the TOP20 list.

Using a simple regression of total sold vs. list sold, I get an adjusted R^2 of 0.92 and a p-value of 6.035e-06...

But because my sample is so small, this doesn't make sense to me... Is there a better way to show the relationship between the total sold and the number sold from the bestseller list?

James R Knaub

Whaner -

Are you saying that you let predictor x be the number of books sold for a selected TOP20 book in a given month, and y is the corresponding total number of books sold that year, so that you have 120 (x,y) data points, which have only 10 unique y-values? Or perhaps you combine the 12 x-values corresponding to each y-value, and have 10 data points? You said "small sample" so perhaps that is it.

Hmmm. Well, if you are talking about count data, then the first thing that would come to mind for many would be a Poisson regression. I don't know. But this seems different.

If you have a time series, then we could have strong autocorrelation from one period to the next. But this seems different.
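One quick way to look for that kind of period-to-period autocorrelation is the Durbin-Watson statistic on the regression residuals, taken in time order. What follows is only a minimal sketch: the ten (x, y) values are simulated placeholders, not the actual book figures, and the helper name `durbin_watson` is chosen here for illustration. Values near 2 suggest little lag-1 autocorrelation, values well below 2 suggest positive autocorrelation, and with only 10 points the statistic is a rough indication at best.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by their sum of squares (range 0 to 4)."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

# Hypothetical yearly figures in time order (NOT the real book data).
rng = np.random.default_rng(0)
x = np.linspace(1.0, 2.0, 10)               # TOP20 sales, arbitrary units
y = 20.0 * x + rng.normal(0.0, 1.0, 10)     # total sales, arbitrary units

# Ordinary least squares line y = intercept + slope * x.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic: {dw:.2f}")
```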

If you can consider this as one population, where no omitted variable would have helped discern one year from another ... say GDP for each year may not help because it impacts both x and y ... and x and y-values are plotted regardless of time period, then I guess a simple (x,y) plot of those 10 years may tell you that you can use TOP20 number of books sold to indicate ("predict") total number of books sold in a year, before you see the "official estimate." And it might be very accurate. I expect heteroscedasticity based on the size of predicted y. (See https://www.researchgate.net/publication/320853387_Essential_Heteroscedasticity, as determined by Ken Brewer, and https://www.researchgate.net/publication/333642828_Estimating_the_Coefficient_of_Heteroscedasticity.)

Is that what you are doing?

Perhaps you could show a scatterplot of your data so we can better see what you are doing, if that's OK.

From such a scatterplot, and perhaps a "graphical residual analysis" scatterplot with estimated residuals on the y-axis and predicted y on the x-axis, we might see if a quadratic linear regression fits, or a ratio estimator, or what, and how well, and perhaps an indication of heteroscedasticity. Also, new data might be used for a simplified "cross-validation," to avoid overfitting to your particular sample. However, the small sample size is a limitation, unless the relationship is very strong, and it sounds like it might be.
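The checks described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed method: the data are simulated placeholders (not the real sales figures), the function names `fit_line` and `loo_errors` are made up for this example, and leave-one-out refitting stands in for the "simplified cross-validation" because no new data are available here. The ratio estimator is taken in its classical form, sum(y) / sum(x), which corresponds to a line through the origin.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    b, a = np.polyfit(x, y, 1)
    return a, b

def loo_errors(x, y):
    """Leave-one-out errors: refit without year i, then predict year i."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i
        a, b = fit_line(x[keep], y[keep])
        errors.append(y[i] - (a + b * x[i]))
    return np.array(errors)

# Hypothetical data for 10 years (NOT the real book figures);
# the noise is made proportional to size to mimic heteroscedasticity.
rng = np.random.default_rng(1)
x = np.linspace(1.0, 2.0, 10)                       # TOP20 sales
y = 20.0 * x * (1.0 + rng.normal(0.0, 0.03, 10))    # total sales

# Straight-line fit and the pairs for a graphical residual analysis.
a, b = fit_line(x, y)
predicted = a + b * x
residuals = y - predicted
for p, e in zip(predicted, residuals):
    print(f"predicted={p:8.3f}  residual={e:8.3f}")

# Classical ratio estimator: a line through the origin.
ratio = y.sum() / x.sum()
ratio_residuals = y - ratio * x

# Compare in-sample error with leave-one-out error.
loo = loo_errors(x, y)
rmse_in = np.sqrt(np.mean(residuals ** 2))
rmse_loo = np.sqrt(np.mean(loo ** 2))
print(f"in-sample RMSE: {rmse_in:.3f}, leave-one-out RMSE: {rmse_loo:.3f}")
```

Plotting `residuals` against `predicted` gives the graphical residual analysis; a fan shape that widens with predicted y would point to heteroscedasticity. A leave-one-out RMSE much larger than the in-sample RMSE would suggest the fit leans heavily on a few of the 10 points.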

Cheers - Jim
