• 0 Posts
  • 14 Comments
Joined 1 year ago
Cake day: August 6th, 2023

  • Look into Beautiful Soup (bs4) for parsing HTML web pages. Once you have that cleaned, you should be able to do any kind of NLP modeling over it.

    Both Hugging Face and spaCy have pretty good tutorials/walkthroughs for tokenizing your data, doing entity extraction, etc.

    That said, it would help to figure out what you want to do with your project. For example, you could try to identify keywords relevant to job titles, or use similarity metrics to recommend new roles based on the current one.
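
A rough sketch of the "similarity metrics" idea, assuming the job descriptions have already been pulled out of the HTML as plain text (for instance with Beautiful Soup's `get_text()`). The postings, the `recommend` helper, and the bag-of-words cosine similarity are all illustrative assumptions; a real project would probably use spaCy or Hugging Face embeddings instead of raw word counts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current, postings, top_n=2):
    """Rank job postings by similarity to the current role's description."""
    current_vec = Counter(tokenize(current))
    scored = [
        (cosine_similarity(current_vec, Counter(tokenize(text))), title)
        for title, text in postings.items()
    ]
    return [title for score, title in sorted(scored, reverse=True)[:top_n]]


# Hypothetical, hand-written data for illustration only.
postings = {
    "Data Engineer": "build data pipelines in python and sql",
    "ML Engineer": "train and deploy machine learning models in python",
    "Accountant": "prepare financial statements and tax filings",
}
current_role = "data scientist who builds machine learning models in python"

print(recommend(current_role, postings))
```

Word counts are crude (no stemming, no stop-word removal), but they are enough to check that the pipeline works end to end before swapping in a better representation.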




  • namnnumbr to Privacy: *Permanently Deleted* · English · 18 upvotes · 1 year ago

    A password manager can be considered critical infrastructure. Beyond privacy and uptime/access considerations, you should also consider what happens if you lose all of your data: Do you have backups? Are the backups 3-2-1 redundant? Do you have a ready-to-go Docker Compose file to get yourself up and running locally in a pinch?

    I self-hosted Bitwarden (via Vaultwarden) for several years, and it became evident to me that it was important enough to use the hosted service, especially as I was already paying Bitwarden to support their open source business.






  • namnnumbr to Privacy: Ecosia.org updates its Privacy Policy · English · 1 upvote · 1 year ago

    That’s fair; I guess it depends on what your threat model is, kind of like how using a VPN can just expose you to your VPN service while ostensibly protecting you from your service provider.

    To me, the improved search results from Kagi and the disconnect between search and ad-and-tracking companies are worth it. But that may not be a fit for everyone.



  • IIRC, the biggest issue with TrueNAS SCALE + Docker is that it actually runs the containers on a ‘hidden’ Kubernetes cluster and obscures the standard docker and docker-compose way of doing things behind a GUI with limited customization and poor field descriptions.
    I found it much easier to spin up a VM on SCALE and run Docker through that, although then you have to deal with multilayer networking.

    … To be fair, this was when SCALE was still in beta, so it has possibly improved since then.


  • namnnumbr to Programmer Humor: Every tech company rn · English · 93 upvotes · 1 year ago

    It’s not just every tech company, it’s every company. And it’s terrifying: it’s like giving people who don’t know how to ride a bike a 1000hp motorcycle! The industry does not have guardrails in place, and the public attitude of “ChatGPT can do it”, without any thought to checking the output, is horrifying.


  • The Python Packaging Authority (PyPA) and pytest have pretty good resources on standard repo structure; Poetry is a new-kid-on-the-block tool to get started developing packages quickly (i.e., it sets up a standard repo config, handles dependency environments, and works as a build tool).

    re: formatting & style – others have mentioned black; I also recommend ruff to lint/standardize your code to many accepted best practices.

    Imports are kind of a clusterf, because you can have both absolute imports (import packagename) and relative imports (from .x.y import z), and importing a project-in-development can depend on the IDE you use (i.e., vscode and pycharm are generally smart enough to figure out that a src/ dir in the workspace should be importable, but not always). Using pip install -e can install your project-in-dev in “editable” mode and make it available for import. The modules docs may help here.
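
To make the absolute vs. relative distinction concrete, here is a small self-contained sketch. It builds a throwaway src/ layout in a temp directory; the package name `demo_pkg_example` and its modules are made up for illustration. Adding src/ to `sys.path` is only a rough stand-in for the effect of an editable install (the package becomes importable by name), not how pip actually implements it.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package layout, created in a temp dir for illustration:
#   src/demo_pkg_example/__init__.py
#   src/demo_pkg_example/utils.py
#   src/demo_pkg_example/core.py
src = Path(tempfile.mkdtemp()) / "src"
pkg = src / "demo_pkg_example"
pkg.mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "utils.py").write_text("def greet(name):\n    return f'hello {name}'\n")

# Inside the package, a relative import refers to a sibling module.
(pkg / "core.py").write_text(
    "from .utils import greet\n\n"
    "def run():\n"
    "    return greet('world')\n"
)

# Assumption: putting src/ on sys.path approximates the package being
# installed, so that it can be imported by name from anywhere.
sys.path.insert(0, str(src))
importlib.invalidate_caches()

# Absolute import by package name, as code outside the package would write it.
from demo_pkg_example.core import run

print(run())
```

Relative imports only work inside a package, so running core.py directly as a script would fail; that is one reason installing the project (editable or not) tends to be less painful than relying on the IDE.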

    Package management/locking is a (relatively) rapidly evolving part of the Python ecosystem. Because Python can be so dependent on the packages installed in the environment, simply managing the Python version (like you would with pyenv) is insufficient, and it is recommended to create pseudo-hermetic virtual environments per project (venv, virtualenv, poetry, or conda help with this). I can’t help with pyenv (I use conda); this might be helpful. I think you would use pyenv to manage the Python version and then venv or virtualenv to manage the installed packages. Personally, I would first get used to managing virtual environments with venv and then deal with pyenv later if you decide you need multiple Python versions.
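
For the venv part, a minimal sketch using the standard-library `venv` module, which is what `python -m venv` runs under the hood. The temp directory is just for illustration (normally the environment lives at something like .venv in the project root), and `with_pip=False` is only there to keep the example quick; in practice you would want pip inside the environment.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location; in a real project this is usually ".venv" in the repo root.
env_dir = Path(tempfile.mkdtemp()) / ".venv"

# Roughly what `python -m venv .venv --without-pip` does.
# Assumption: pip is skipped only to keep this sketch fast.
venv.create(env_dir, with_pip=False)

# pyvenv.cfg marks the directory as a virtual environment and records
# which base interpreter it was created from.
cfg = (env_dir / "pyvenv.cfg").read_text()
print(cfg)
```

Each project gets its own environment directory, so packages installed for one project do not leak into another.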