Hi all,

I am currently writing a paper on whether song popularity on TikTok can be predicted from audio features (e.g. danceability, valence, tempo). My dataset consists of TikTok's top 100 popular songs over the last 7, 30, and 120 days from the US, NZ, IRE, AUS, and CAN. I have collected all my data. I am using the number of videos posted with each song as my popularity variable, and the audio feature scores (derived from the Spotify API) as my independent variables.

However, I am afraid my current research question, “To what extent can audio features predict a song’s popularity on TikTok?”, has limited generalisability, since my dataset consists only of songs that are already 'popular' and includes no 'non-popular' songs.

How can I adjust my research question so that it fits my dataset but still retains the aspect of prediction? I have already run my multiple linear regression (MLR) and random forest models, but I believe I will not be able to draw sound conclusions from them under my current research question.
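
To make the setup concrete, here is a minimal sketch of a multiple linear regression of this kind, evaluated on held-out songs. It is not the actual analysis: the data is synthetic placeholder data, the variable names (`danceability`, `valence`, `tempo`, `video_count`) are hypothetical, and the log transform of the video count is an assumption added to handle skew. It uses only NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic placeholder data standing in for charting songs.
# Real feature values would come from the Spotify audio features.
n = 300
danceability = rng.uniform(0.2, 1.0, n)
valence = rng.uniform(0.0, 1.0, n)
tempo = rng.uniform(60.0, 180.0, n)

# Hypothetical target: number of videos per song (made up, right-skewed).
video_count = np.exp(10 + 1.5 * danceability + rng.normal(0.0, 1.0, n))

# Design matrix with an intercept column; log-transform the skewed target
# (the transform is an assumption of this sketch).
X = np.column_stack([np.ones(n), danceability, valence, tempo])
y = np.log(video_count)

# Hold out 25% of the songs to measure predictive performance.
idx = rng.permutation(n)
test, train = idx[: n // 4], idx[n // 4 :]

# Ordinary least squares fit on the training songs.
beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Out-of-sample R^2 on the held-out songs.
pred = X[test] @ beta
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
r2_test = 1 - ss_res / ss_tot
print(f"held-out R^2 among charting songs: {r2_test:.3f}")
```

The R² printed here only describes how well the features explain differences in video counts among songs that already charted, which is the population this kind of dataset covers.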

Thank you in advance!
