P values?
Do they account solely for sampling error (and are therefore irrelevant when data on the whole population are available), OR do they serve to assess the likelihood of a result being due to chance in other ways (and are therefore still relevant for studies with population data)?

Any links or literature are welcome :)

@rstats @phdstudents @datascience @socialscience @org_studies

  • Jey :crab:OP
    07 months ago

    @arandomthought
    I've read similar comments online, along with some contrary positions, but I think this makes sense.

    And I didn’t know about the infinite population thing, that is interesting.

    If I may ask a follow-up: p-values aside, regression models and correlation tests can still be interesting to apply to census data to measure effect sizes and such, right?

    • @sailingbythelee@lemmy.world
      English
      27 months ago

      Look up super-population theory. It is based on the idea that even a perfect census is only a point-in-time realization of the theoretical “super-population” that the point-in-time population is drawn from. In large, real-world populations, people are constantly coming and going. If we assume that this coming and going is random and that the relevant super-population parameters do not change over time, it is easy to see a census population as one sample from a larger super-population. While somewhat theoretical, this is a useful model when estimating relationships between variables in census data, and it leads to the use of standard frequentist confidence intervals and, yes, even p-values.
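
      To make the idea concrete, here is a minimal simulation sketch, not anything from a real census. It assumes a hypothetical super-population that is normal with a made-up mean and standard deviation, treats each simulated "census" as one random draw from it, and checks how often a standard 95% interval around the census mean covers the super-population mean. The function names and all parameter values are illustrative assumptions.

```python
import math
import random
import statistics


def census_mean_ci(values, z=1.96):
    """Census mean plus a normal-approximation 95% interval, treating the
    census as one draw from a hypothetical super-population."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)
    return mean, mean - z * se, mean + z * se


def simulate_coverage(true_mean=50.0, true_sd=10.0, census_size=200,
                      n_censuses=2000, seed=0):
    """Fraction of simulated censuses whose interval covers the
    super-population mean. All parameter values are made up."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_censuses):
        # One "census": everyone who happens to be in the population right now.
        census = [rng.gauss(true_mean, true_sd) for _ in range(census_size)]
        _, lo, hi = census_mean_ci(census)
        hits += lo <= true_mean <= hi
    return hits / n_censuses


if __name__ == "__main__":
    print(f"coverage: {simulate_coverage():.3f}")
```

      Under these assumptions the coverage should land close to the nominal 95%, which is the sense in which the interval says something about the super-population rather than about the census itself.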

      • @arandomthought@sh.itjust.works
        English
        17 months ago

        That’s a very cool way to look at it. You’re basically taking “a sample in time” and will never be able to sample across time (assuming we don’t invent time machines… ever), so you will always be looking at a super-population that is technically infinite. =)

    • @arandomthought@sh.itjust.works
      English
      17 months ago

      Sure, even if you had all the data on your whole population (and therefore p-values “wouldn’t make sense”), a regression could still tell you something useful about that population. For example, it can let you estimate how strongly variable X influences variable Y (or at least how strongly they are related; causality is a separate issue), or what value of Y we would expect for someone new in the population with a certain value of X.
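
      As a rough sketch of what that looks like in practice, the snippet below fits a simple least-squares line to a tiny, entirely made-up "whole population" and reads off the slope (an effect size), Pearson's r (strength of the linear relationship), and the expected Y for a newcomer. The `fit_line` helper, the data, and the newcomer's X value are all hypothetical, and no p-value is computed anywhere.

```python
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson r."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Made-up "census" of six people: X and Y are purely illustrative.
x_values = [1, 2, 3, 4, 5, 6]
y_values = [52, 55, 61, 60, 68, 70]

slope, intercept, r = fit_line(x_values, y_values)

# Expected Y for a hypothetical newcomer with X = 7.
predicted_y = intercept + slope * 7

if __name__ == "__main__":
    print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
    print(f"expected Y at X=7: {predicted_y:.1f}")
```

      The slope and r here are plain descriptions of the population you have; whether you also attach intervals or p-values to them depends on whether you adopt something like the super-population view.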