The Working Waterfront

Numbers don’t always accurately tell our stories

Rural communities can be misrepresented in data collection

Kate Tagai
Posted 2021-06-03
Last Modified 2021-07-01

Data collection may seem like an exacting approach to capturing community reality, but if the measuring tool does not match what is being measured, inequity can result. As Catherine D’Ignazio, in a presentation on Data Feminism sponsored by the Data Innovation Project, put it: “What gets counted, counts.”

Because of this, small communities are at risk of being left behind if the data can’t accurately represent them. They aren’t getting counted and so they don’t count. How can we get better data that is more equitable?

Oversampling is one method that prioritizes collecting data from underrepresented groups so that their responses make up a larger portion of the data set.

A bias toward the smaller population is introduced in the collection phase, and that bias is then corrected by weighting the oversampled data when it is analyzed. The overall results don’t change, but their quality and usability increase.

The American Community Survey (ACS) doesn’t use oversampling in its data collection methods, but if it did, the data would be more useful. The ACS tells us that in Knox County 1,989 people are between 25 and 29 years old. The margin of error is plus or minus 63 people. So, the number could actually be 2,052 or it could be 1,926, but we are reasonably confident that there are about 2,000 people in the county in that age range.

(The ACS does sample “based on different strata,” meaning it changes its sampling intensity based on previous response rates and other factors, which is similar to oversampling.)

On Isle au Haut, the ACS shows there are nine people in that age range. But here the margin of error is plus or minus ten. So, the number of late 20-somethings on Isle au Haut could be 45 percent of the population or it could be zero. The high margin of error in the data because of the small sample size makes it unreliable.
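The two ranges above can be worked out with a few lines of Python. This is only a sketch of the arithmetic, using the estimates and margins quoted in this column; the function name `estimate_range` is made up for illustration, and clipping the low end at zero is an assumption, since a head count cannot be negative.

```python
def estimate_range(estimate, margin):
    """Return the low end, high end, and relative margin for a survey estimate."""
    low = max(0, estimate - margin)   # assumption: a count of people cannot go below zero
    high = estimate + margin
    relative = margin / estimate      # margin of error as a share of the estimate
    return low, high, relative


# Estimates and margins of error for ages 25 to 29, as quoted in the text.
places = [("Knox County", 1989, 63), ("Isle au Haut", 9, 10)]

for place, estimate, margin in places:
    low, high, relative = estimate_range(estimate, margin)
    print(f"{place}: {low} to {high} people "
          f"(margin is {relative:.0%} of the estimate)")
```

For the county, the margin is a small fraction of the estimate. For the island, the margin is larger than the estimate itself, which is why the town-level number tells us so little.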

If we are trying to use this data point to provide resources for people under 30, we can get reasonably close to providing the right amount if we use the county level data, but not at the town level.

Now, imagine if the ACS oversampled small communities like Isle au Haut to reduce the margin of error. If the margin of error could be reduced even a little, we would know with greater certainty the resources needed to serve people under 30. The people would count.

This example demonstrates the challenge when working with data from small populations. In rural and island communities it is harder to get accurate data—data that drives decision making and investment, federal program eligibility, grant funding, and resource access.

The key is that oversampling does not bias the final results, even though we are collecting more data from the smaller populations: the results are weighted to bring each group back in line with its share of the overall population.

So, since the population of Isle au Haut makes up less than 1 percent of the county population, the results would still make up less than 1 percent of the data because the answers would be adjusted.
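The weighting step can be illustrated with a small Python sketch. Every number here is hypothetical and chosen only to make the arithmetic easy to follow; none of it comes from the ACS, and the names `strata`, `weighted_share`, and `unweighted_share` are invented for the example. It assumes a simple two-group survey in which each group's answers are weighted by that group's share of the total population.

```python
# Hypothetical figures for illustration only; these are not ACS numbers.
# The island is under 1 percent of the population but 20 percent of the sample.
strata = {
    "rest of county": {"population": 39_700, "sampled": 400, "yes": 100},
    "small island": {"population": 300, "sampled": 100, "yes": 60},
}


def unweighted_share(strata):
    """Share answering yes if every response is simply pooled together."""
    total_yes = sum(group["yes"] for group in strata.values())
    total_sampled = sum(group["sampled"] for group in strata.values())
    return total_yes / total_sampled


def weighted_share(strata):
    """Share answering yes after weighting each group by its population share."""
    total_population = sum(group["population"] for group in strata.values())
    estimate = 0.0
    for group in strata.values():
        population_share = group["population"] / total_population
        group_rate = group["yes"] / group["sampled"]
        estimate += population_share * group_rate
    return estimate


print(f"Pooled without weights: {unweighted_share(strata):.1%}")
print(f"Weighted by population: {weighted_share(strata):.1%}")
```

In this made-up case, pooling the answers without weights lets the oversampled island pull the countywide figure toward its own rate. Weighting returns the island to its small share of the total, while the extra island responses still give a steadier estimate for the island itself.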

Oversampling is just one strategy for balancing data collection and making it more equitable. As we use data to inform important decisions, we need to ask ourselves critically: How can we collect data that accurately represents the communities where we live and work? How can we create data that helps us get resources where they are needed and breaks down barriers and bias rather than reinforcing them?

Data collection is costly and time-consuming, but better data practices create greater equity. Let’s make sure what counts gets counted.

Kate Tagai is a senior community development officer with the Island Institute, publisher of The Working Waterfront. She focuses on education and leadership work.