Data Science question - Time series analysis
No.14076287
Hey all,
I'm sincerely hoping someone might be able to help me figure out the steps to take to perform some Time Series-type Analyses on this dataset I have.
I've collected the 'audio features' from Spotify's API for the Top 200 songs for every day since the start of 2018, in like 66 different countries. (The 'audio features' include Energy, Danceability, Instrumentalness, Valence, etc.)
I've also added in each country's daily reported # of new cases of Covid & # of daily deaths from Covid.
This makes for a dataset of >16 million rows at the moment.
My goal is to analyze the fluctuations of each country's top songs, and determine whether there have been any significant changes in their usual values. I'd also like to show whether the amount of Covid cases/deaths has any relationship with the ebbs & flows in that country's Audio Features.
However, I'm having trouble figuring out how to go about determining any of those relationships – starting I guess with how to wrangle the shape of this dataset.
I guess it's not a straightforward Time Series, nor does it seem quite like "Panel Data"... with those, isn't it one or more subjects, each with multiple observations taken on different dates over time? Whereas this looks at multiple countries' observations over time, but each *day* itself also has 200 observations within it.
It's 200 rows every day, for every country as well. I could take a first step by just finding the average of, say, the 'Energy' of all songs in a country's daily Top 200, but then I lose information about that day's complete spread of values... so I assume this must be where Vector Regression can somehow be used?
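To make that first step concrete, here is a minimal sketch (not a definitive approach) of collapsing each country's daily Top 200 into one row, keeping a few spread statistics alongside the mean. It assumes pandas and a long-format table. The column names `country`, `date`, and `energy`, the helper `daily_feature_summary`, and the tiny demo data are all made up for illustration.

```python
import numpy as np
import pandas as pd


def daily_feature_summary(df, feature="energy"):
    """Collapse the per-song rows for each (country, date) into one row
    of summary statistics, so some of the day's spread is kept."""
    grouped = df.groupby(["country", "date"])[feature]
    stats = grouped.agg(["mean", "std"])
    # quantile() with a list returns a Series indexed by
    # (country, date, quantile); unstack moves the quantiles into columns.
    quantiles = grouped.quantile([0.10, 0.50, 0.90]).unstack()
    quantiles.columns = ["q10", "median", "q90"]
    return stats.join(quantiles).add_prefix(f"{feature}_")


# Made-up data: 2 countries x 3 days x 5 songs (the real set has 200 per day).
rng = np.random.default_rng(0)
dates = pd.date_range("2020-03-01", periods=3, freq="D")
demo = pd.DataFrame(
    [
        {"country": c, "date": d, "energy": e}
        for c in ["US", "BR"]
        for d in dates
        for e in rng.uniform(0, 1, size=5)
    ]
)
summary = daily_feature_summary(demo)
print(summary)
```

The result has one row per country per day, which is the usual panel shape, and the quantile columns carry some of the within-day spread that a plain average would throw away.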
I'm not great at this yet. Thank y'all for even reading this far!
tl;dr – How do I find the relationships between a multivariable time series that has 200 daily observations for each country over 4 years, and also how those Series might relate to # of Covid cases/deaths?
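One possible starting point for the Covid part, sketched under assumptions: once there is one row per country per day, correlate day-to-day changes in an audio feature with day-to-day changes in the case count at several lags, separately for each country. The column names `energy_mean` and `new_cases`, the helper `lagged_correlations`, and the synthetic country "XX" are hypothetical. The demo data is built so that the feature follows cases with a 3-day delay, which is not a claim about the real data.

```python
import numpy as np
import pandas as pd


def lagged_correlations(panel, feature_col, covid_col, max_lag=14):
    """Per country, correlate daily changes in the feature with daily
    changes in the Covid series from 0..max_lag days earlier."""
    out = {}
    for country, g in panel.groupby("country"):
        g = g.sort_values("date")
        # Differencing removes slow trends that would inflate correlations.
        d_feat = g[feature_col].diff()
        d_covid = g[covid_col].diff()
        out[country] = {
            lag: d_feat.corr(d_covid.shift(lag)) for lag in range(max_lag + 1)
        }
    # Rows are countries, columns are lags in days.
    return pd.DataFrame(out).T.rename_axis(index="country", columns="lag_days")


# Synthetic single-country panel: the feature tracks cases 3 days later.
rng = np.random.default_rng(1)
n = 120
cases = pd.Series(rng.normal(0, 50, n).cumsum() + 1000)
panel = pd.DataFrame(
    {
        "country": "XX",
        "date": pd.date_range("2020-03-01", periods=n, freq="D"),
        "new_cases": cases,
        "energy_mean": 0.6 - 0.0005 * cases.shift(3) + rng.normal(0, 0.001, n),
    }
)
corr = lagged_correlations(panel, "energy_mean", "new_cases")
print(corr.round(2))
```

Real case counts have strong weekly reporting cycles, so smoothing them (for example with a 7-day rolling mean) before differencing may be worth trying.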