/sci/ - Science & Math » Thread #13141534

How to sample to get county level data

Anonymous Sun 16 May 2021 18:54:40 No.13141534

I'm analyzing some data that's at the county level. I'm going to pair this with other county-level data (socioeconomic, race, etc.) provided by the census.

I thought it'd be neat to overlay public sentiment by scraping public, geocoded tweets about the topic and running sentiment analysis on them.

How do I develop a sound sampling method for the Twitter scraping, so that I get a county-level estimate?

The US has 3143 counties.
Random sampling would need 343 county samples for a 95% confidence level (assuming a 5% margin of error).
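
For reference, the 343 figure matches the standard sample-size formula for a proportion with a finite population correction. The 5% margin of error and the worst-case proportion p = 0.5 are assumptions here, since the post only states the confidence level. The function name and parameters below are illustrative. A minimal sketch:

```python
import math

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    z=1.96 corresponds to a 95% confidence level; margin and p are assumed
    values (5% margin of error, worst-case proportion), not given in the post.
    """
    # Sample size for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of `population` units
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

print(sample_size(3143))  # 343
```

Note that this only sizes the sample of counties. It says nothing about how many tweets or places are needed inside each county.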

That seems like a doable amount of scraping. I'm a bit confused about how to proceed with the next step.

How do I determine the number of cities I need to sample within a county? Should it be the same for all counties?
How do I determine which cities to sample? Do I look up the county's Wikipedia page and choose randomly again? Or do I pick the top X most populous cities, since geolocated Twitter data might be limited in smaller ones, accepting that this skews the data towards urban centres?

I understand that this isn't completely statistically sound but I'd like to try to get the best possible result within the limitations I have.

Anonymous Sun 16 May 2021 22:04:47 No.13142431

bump

Anonymous Mon 17 May 2021 00:29:42 No.13142950

Quoted By: >>13143035

Wait.
Data at the granularity you want is only available for the 2010 census. The country has changed quite a bit since then. The data from the 2020 census should start dribbling out later this year (August IIRC is their next milestone release date). Their first priority was statewide counts so Congress would know how many seats in the House of Representatives each state would get. The next priority is data detailed enough for drawing up Congressional districts. After that, detailed enough for state and local districting. Last comes the community data at the census block level, which is what you're seeking. It might be next year before that level of data is available.

Anonymous Mon 17 May 2021 00:45:59 No.13143035

>>13142950
I already have the county-level data I want from the census.

What I also want is public sentiment on the topic, via tweets, at the county level.

My question is: how do I turn 1 km radius samples of scraped tweets into a county-level estimate?
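
One possible way to frame this, offered as a rough sketch and not as an established answer from the thread: treat each 1 km radius scrape as a sample from one place in the county, then combine the per-place mean sentiment scores using weights such as each place's census population. The function name, the tuple layout, and the choice of population as the weight are all hypothetical. The result is still biased towards people who tweet with geolocation enabled.

```python
def county_estimate(samples):
    """Population-weighted mean sentiment for one county.

    `samples` is a list of (mean_sentiment, n_tweets, weight) tuples, one per
    1 km radius scrape. `weight` is assumed to be the population of the place
    the scrape represents (a hypothetical choice of weight).
    Returns None when no scrape in the county produced any tweets.
    """
    # Ignore scrapes that returned no tweets; they carry no sentiment signal
    usable = [(m, w) for m, n, w in samples if n > 0 and w > 0]
    total_weight = sum(w for _, w in usable)
    if total_weight == 0:
        return None
    return sum(m * w for m, w in usable) / total_weight

# Two scrapes: a city of 30,000 and a town of 10,000 (made-up numbers)
estimate = county_estimate([(0.2, 50, 30000), (-0.1, 10, 10000)])
print(round(estimate, 3))  # 0.125
```

Counties where every scrape comes back empty return None, so they can be flagged as missing instead of being treated as neutral.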
