Can someone with a background in studying Statistics help me out?

The_Jewish_Cuban [he/him]@hexbear.net · edit-2 1 year ago

Can someone with a background in studying Statistics help me out?

context [fae/faer, fae/faer]@hexbear.net · 1 year ago

so they’ve defined ambivalent typologies based on their framework in table 1, and use that to impose 4 clusters onto the data

Based on these considerations, the study sets the number of clusters at four, although all three heuristics, i.e. the elbow method, the silhouette value and the gap statistic, suggest that the respondents form three clusters

so there’s really only 3 clusters but they’ve decided to set k=4 anyway, and then k-means just minimizes the variance within each cluster relative to its mean value. each observation gets assigned to whichever mean is “closest” in a certain sense, but that doesn’t mean it’s really the best choice.

even after they “merge” the ambivalent classes and set k=3, assigning each observation to a cluster based on the closest mean value doesn’t mean it’s the best choice for defining each class, just that it’s the closest in terms of squared distance to the cluster mean.

the natopedia article has a good illustration:

https://en.wikipedia.org/wiki/K-means_clustering#/media/File:K-means_convergence.gif
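
For anyone who wants to see the "assign to the closest mean, then recompute the means" loop spelled out, here is a minimal sketch in plain Python. The four points and the starting centroids are made up for illustration; they are not the survey data, and this is not the authors' code.

```python
def squared_dist(p, q):
    """Squared Euclidean (l_2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_dist(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        new_centroids.append(
            tuple(sum(coord) / len(members) for coord in zip(*members))
        )
    return new_centroids


def kmeans(points, init_centroids, max_iter=100):
    """Alternate assign/update until the centroids stop moving."""
    centroids = [tuple(float(c) for c in centroid) for centroid in init_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def wcss(points, centroids, labels):
    """Within-cluster sum of squares: the quantity k-means tries to shrink."""
    return sum(squared_dist(p, centroids[l]) for p, l in zip(points, labels))


# Made-up answers on a 1-4 scale for two questions (illustrative only).
points = [(1, 1), (1, 2), (4, 4), (4, 3)]
centroids, labels = kmeans(points, init_centroids=[(1, 1), (4, 4)])
print(centroids, labels, wcss(points, centroids, labels))
```

Nothing in that loop asks whether the clusters mean anything; it only lowers the within-cluster sum of squares for whatever k it was given.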

The_Jewish_Cuban [he/him]@hexbear.net · 1 year ago

Doesn’t it seem strange to apply this kind of statistical analysis to a four point survey?

context [fae/faer, fae/faer]@hexbear.net · 1 year ago

yeah, the more i think about it the more strange it seems. they’ve already defined groups, just cluster them based on what their answers were. doing an iterative means clustering algorithm seems like they felt like they needed to do some fancy math to make it look better.

asustamepanteon3 [none/use name, comrade/them]@hexbear.net · edit-2 1 year ago

So let me get this right? they got a bunch of {1,2,3,4}^2 points and they applied k-means one (1) time with the standard l_2 metric, found 4 random ass centroids, published it, and now you’re calling into question their methodology?

edit: The problem with k-means is that it converges to local minima, so depending on the starting point (in this case the 4 ‘typologies’) each run can give different results. So any typologies found are gonna be interpreted with the ideological slant of the authors. Also a heat map would have been better for representing the data: it’s on a 2d grid.
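
To make the local-minima point concrete, here is a small sketch in plain Python. The four points and both sets of starting centroids are invented for illustration (not the paper's data): the same data with k=2 settles into two different stable answers depending only on where the centroids start.

```python
def squared_dist(p, q):
    """Squared Euclidean (l_2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda j: squared_dist(p, centroids[j]))


def kmeans_wcss(points, init_centroids, max_iter=100):
    """Run Lloyd's iterations from a given start; return (labels, WCSS)."""
    centroids = [tuple(float(c) for c in centroid) for centroid in init_centroids]
    for _ in range(max_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # empty cluster: leave centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    total = sum(squared_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, total


# Made-up points on a 1-4 grid: two tight pairs, far apart horizontally.
points = [(1, 1), (1, 2), (4, 1), (4, 2)]

good_labels, good_wcss = kmeans_wcss(points, [(1, 1), (4, 1)])
bad_labels, bad_wcss = kmeans_wcss(points, [(1, 1), (1, 2)])

print(good_labels, good_wcss)  # groups the left pair and the right pair
print(bad_labels, bad_wcss)    # stuck splitting top row from bottom row
```

Both runs stop because the centroids no longer move, but the second one is stuck at a much worse within-cluster sum of squares. This is why k-means is usually restarted from several initializations and the best run kept.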

zifnab25 [he/him, any]@hexbear.net · edit-2 1 year ago

the second question being about support for the one party system using the same scale

Ah, the illusion of choice. Does anyone think western opinions of China would improve if they had two competing Communist parties rather than one big CCP?

It does feel like US/UK fetishization of choice leads to some awfully nasty negative externalities, without ever yielding the kind of popular candidates that these liberal democracies claim to venerate. If Donald Trump and Joe Biden were part of the same singular national party, I honestly think the US might be run better (or, at least, smoother) than in its current state. That’s primarily because you wouldn’t have enormous chunks of the popular base for each candidate driven to hysteria after every election cycle. How much time and labor and materials go into creating a boogeyman out of your rival and then spending the next six months to a year engaging in public flagellation at the impending success or failure of the given candidate?

LaughingLion [any, any]@hexbear.net · 1 year ago

Even if you are measuring support for the CPC in a vacuum that doesn’t tell you much. I’ll elaborate: It is likely that some people don’t support the CPC but when asked which other party they do support they don’t support an alternative, either. I am struck by an interview I listened to of some Cuban anarchists. They don’t support the Cuban Communist Party (PCC), but they also would NEVER support any Western intervention, either. Now, for them they do support other politics in their country and perhaps even other parties, though alternatives to the PCC are weak and only just in the last few decades starting to grow. Of course being anarchists they are organized more around action than candidates or parties.

Anyways, I think if you found a significant portion of people not exactly supporting the CPC it doesn’t mean much if they also don’t support anything else. This lack of explicit support then just manifests itself as implicit support instead. It’s still a form of support; an acceptance of the status quo. So, in this way you might be able to further understand that large “ambivalent” group in regards to their actual feelings; weak implicit support or explicit opposition. Thinking about it this way you might be able to formulate your questions better.

ScrewdriverFactoryFactoryProvider [they/them]@hexbear.net · 1 year ago

I’m not a statistics person, but wouldn’t using a 4-degree scale to create 3 clusters make the middle cluster, which contains 2 of the scale points, overrepresented? I feel like the existence of 3 clusters implies that each cluster had an equal amount of sampling from the population, but I also may just not know how K-means clustering works.

The_Jewish_Cuban [he/him]@hexbear.net · 1 year ago

Right, which is why I thought this smelled like bs

ComradeSharkfucker · 1 year ago

This is why I stuck to theory, y’all lab rats can have at it lol

BodyBySisyphus [he/him]@hexbear.net · 1 year ago

Disclaimer, this is based on skimming. As far as survey analyses go it’s not horrible. If it were me I’d be asking why the two ambivalence categories are so different in size if they do represent a similar intensity, and the regression results suggest that people tend to support governments when they think the economy is doing well and corruption is low, an indication that the support question is more capturing a vibe than an indication of a person’s undergirding philosophy on the proper form of governance. I think the paper would be a lot stronger if it tried to tease out differences between weak supporters and weak dissenters. There’s some room for follow up here and the Asian Barometer data is publicly available for download so you can have a look at the questionnaire to see if he omitted any useful questions. Or do your own analyses, these things aren’t as hard as PoliSci profs try to make them look.

Maoo [none/use name]@hexbear.net · 1 year ago

It sounds like a very silly study even from the initial design. A 4-point scale is embarrassing. You need an odd number in order to have both a neutral choice option and symmetry. The 5-point Likert is basically standard if you don’t want to think about it too hard, like the authors. 1=lowest rating, 5=highest, 3=neutral, and 2 and 4 represent intermediates. If they really did just want to know pro vs con vs neutral, they should’ve chosen a 3-point scale so that participants had to actually make that choice: 1=low rating, 2=neutral, 3=high rating. 4 is silly and you can see that they grouped together the 2s and 3s answers for exactly this reason.

In terms of analysis, yeah that’s absurd. Even using K-means should raise little alarm bells given that their scale and dimensionality are so small that it’s basically discrete. K-means clustering is a continuous analysis. There are analogous methods to clustering for discrete cases where you cut cubes or sets of blocks.

But even that is silly. It sounds like they could have just made a 2D plot and shown it, like a density plot. The frequency of all the answers can be shown on a 4x4 plot, right? You could make every dot have a different opacity or size. Easy to see all of the results and any subsequent analysis would be visually explained. At the same time, the clustering should be visualized the same way and it would probably show how silly their choices were. It would be obvious that this is a discrete and low-dimension dataset, for example.
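
A rough sketch of that 4x4 frequency table in plain Python, as a text grid rather than a real plot. The `responses` list is made up and the names `q1`/`q2` are placeholders for the two survey questions; with the actual data you would feed in the real answer pairs.

```python
from collections import Counter


def frequency_grid(responses, levels=(1, 2, 3, 4)):
    """Count how often each (q1, q2) answer pair occurs.

    Returns a list of rows: one row per q2 level, one column per q1 level.
    """
    counts = Counter(responses)
    return [[counts[(q1, q2)] for q1 in levels] for q2 in levels]


def render(grid, levels=(1, 2, 3, 4)):
    """Format the grid as a small text table."""
    header = "q2\\q1 " + " ".join(f"{q1:>3}" for q1 in levels)
    rows = [
        f"{q2:>5} " + " ".join(f"{n:>3}" for n in row)
        for q2, row in zip(levels, grid)
    ]
    return "\n".join([header] + rows)


# Made-up (q1, q2) answer pairs, purely illustrative.
responses = [(1, 1), (2, 3), (2, 3), (4, 4), (3, 2), (2, 3)]
grid = frequency_grid(responses)
print(render(grid))
```

The same 16 counts could go straight into a heat map or a bubble plot; on a grid this small the table already shows everything a clustering would be summarizing.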

Also, if they’re just gonna group middle clusters anyways, they should just admit they want to ask a question heuristically and create the clusters manually.
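
As a sketch of what "create the clusters manually" could look like: the cutoff and the three labels below are hypothetical, chosen only to show the idea, and are not the typology from the paper's table 1.

```python
from collections import Counter


def manual_group(q1, q2, agree_from=3):
    """Assign a respondent to a group by a fixed rule, with no clustering step.

    Assumption for illustration: answers >= agree_from count as "high".
    """
    high1 = q1 >= agree_from
    high2 = q2 >= agree_from
    if high1 and high2:
        return "high/high"
    if not high1 and not high2:
        return "low/low"
    return "mixed"


# Made-up (q1, q2) answer pairs on a 1-4 scale.
responses = [(4, 4), (1, 2), (3, 1), (2, 4), (3, 3)]
group_sizes = Counter(manual_group(q1, q2) for q1, q2 in responses)
print(group_sizes)
```

A rule like this is explicit about where the boundaries sit, so readers can argue with the cutoff directly instead of with an algorithm's output.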