We had some threads about languages recently. To be honest, it’s one of my favorite topics and I thought it was cool to see so many people talking about it.
So I want to point out the language-related Lemmygrad communities I know of, for anyone interested:
Linguistics:
https://lemmygrad.ml/c/leftlang - somewhat active
https://lemmygrad.ml/c/linguistics - not active
Specific language learning:
https://lemmygrad.ml/c/learnchinese
September’s Korean study thread on c/korea
Translation:
https://lemmygrad.ml/c/translation
I’m not sure how interested other people here are in seeing more language study topics discussed, but personally I would love to see more activity around this: helping each other study, producing translations together (I know that’s a big project), and studying topics like language revitalization, language acquisition, etc.
Is anyone else here enthusiastic about this kind of topic?
Do you or other techy comrades know how to do the following…
I’d like to read some García Márquez in the original, but it’s a bit too advanced for me. I was thinking of front-loading the vocab and learning the words that are in the book(s) before starting. The only way I can think of doing this is intensively reading it/them and listing all the words I don’t know. The problem is that I’ll end up understanding enough of the story to spoil it, but not understand enough to enjoy it.
So what I’d like to do is list all the words used in the text by frequency. Then I can ignore the words I do know and learn a bulk of the less frequent words before starting to read the book. Is there a script or something that I could apply to the epub version to strip the words and sort them by frequency?
Yeah that’s a class of application of the app/tool that I had in mind. I can write that in Python or find it somewhere if you give me a hot min.
OK… Unix-like environments are made to do this type of stuff really well. For example: https://ebooks.stackexchange.com/questions/5841/i-am-looking-for-a-software-or-a-way-to-list-extract-count-in-short-analyze
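For a rough idea of what such a script could look like in Python, here is a minimal sketch using only the standard library. It assumes the epub is DRM-free (an epub is a zip archive of XHTML files), that the text is UTF-8, and that a "word" is just a run of letters. The function names (html_to_text, count_words, epub_word_frequencies) are made up for this example, and it counts inflected forms separately since it does no lemmatization.

```python
import re
import zipfile
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(markup):
    """Strip tags from one (X)HTML document and return its text."""
    parser = _TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining with a space keeps words in adjacent tags apart; the trade-off
    # is that a word split by inline markup is counted as two pieces.
    return " ".join(parser.chunks)


# Runs of letters only (accented letters included, digits and "_" excluded).
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words(text):
    """Count lowercased words in a plain-text string."""
    return Counter(w.lower() for w in WORD_RE.findall(text))


def epub_word_frequencies(epub_path):
    """Count words across every HTML/XHTML file inside an epub archive."""
    counts = Counter()
    with zipfile.ZipFile(epub_path) as book:
        for name in book.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                # Assumption: content files are UTF-8 encoded.
                markup = book.read(name).decode("utf-8", errors="ignore")
                counts.update(count_words(html_to_text(markup)))
    return counts


# Example with a hypothetical file name:
# for word, n in epub_word_frequencies("book.epub").most_common():
#     print(n, word)
```

most_common() gives the list from most to least frequent, so the rarer vocabulary is at the bottom of the output.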
If you would like, I could try to package something more convenient in a container or app or whatever (which was the point of the project).
No need to do this just for me, as I’ll try using calibre first.
Just a thought, though…
There might be quite a bit of interest in an app that could (maybe this already exists!):
It looks like I’ll be able to do 1 and 2 with calibre, and it looks like 3 might be done easily by saving to .xml and opening the file in Google’s spreadsheet software. And I never really got on with Anki, but I know it’s popular.
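For the spreadsheet part, a word list can also be written straight to a CSV file, which spreadsheet software opens directly. This is only a sketch under assumptions: the counts are already in a collections.Counter, the words you know are in a plain set, and write_unknown_words_csv with its parameters is a hypothetical helper, not something calibre provides.

```python
import csv
from collections import Counter


def write_unknown_words_csv(counts, known_words, out_path):
    """Write words not in known_words to a CSV, most frequent first.

    Returns the number of words written.
    """
    known = {w.lower() for w in known_words}
    rows = [(w, n) for w, n in counts.most_common() if w.lower() not in known]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])  # header row for the spreadsheet
        writer.writerows(rows)
    return len(rows)
```

The resulting file has two columns, word and count, so it can be sorted or filtered further in the spreadsheet.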
Thanks again for the help.
Yes, I should be able to do all of that when I’m not busy. I write on my Manjaro laptop, but the next step is putting the project in a Docker/OCI container, which shouldn’t be that hard to run on Windows or whatever.
A mobile app is actually a bit difficult because I’d need to somehow package the entire Python env with interpreter and libraries. Trying to package ML models would be difficult too (requiring multiple GBs of storage) unless cloud hosting was an option, which it probably isn’t.
But that’s what I want to do, because a mobile app is best for on-the-go use and is what people are most likely to actually be able to use. Alternatively, the lessons and materials can be statically generated from the container, which is the initial plan, but that limits rich interactivity, which may be a good goal.
Looks like people have lots of ways to do this if you look around. https://cybertext.wordpress.com/2021/06/30/use-calibre-to-get-a-word-frequency-list/ https://www.reddit.com/r/languagelearning/comments/mps9nm/is_there_a_program_that_generates_word_frequency/
Thank you!
This is exactly what I was looking for.
I’ll try using calibre.