Where to start? Text Extraction

Loopedcandle@lemmy.world · 1 year ago

namnnumbr · 1 year ago

Look into Beautiful Soup (bs4) for parsing HTML web pages. Once you have the text cleaned up, you should be able to run any kind of NLP modeling over it.
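To make the cleaning step concrete, here is a minimal sketch that strips markup from a page with bs4. The HTML snippet and the `extract_text` helper are made up for illustration, and it assumes the `beautifulsoup4` package is installed; a real page would likely need more site-specific cleanup.

```python
from bs4 import BeautifulSoup

# Hypothetical job-posting markup, used only for illustration.
html = """
<html>
  <head><style>p { color: red; }</style></head>
  <body>
    <script>console.log("tracking");</script>
    <h1>Data Analyst</h1>
    <p>Build dashboards in <b>SQL</b> and Python.</p>
  </body>
</html>
"""


def extract_text(markup: str) -> str:
    """Return the visible text of an HTML document as one clean string."""
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(markup, "html.parser")
    # Remove elements whose contents are never visible text.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


text = extract_text(html)
print(text)
```

The resulting string is plain text, so it can be handed to whatever tokenizer or model comes next.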

Both Hugging Face and spaCy have pretty good tutorials/walkthroughs for tokenizing your data, doing entity extraction, etc.

That said, it would help to figure out what you actually want to do with your project. For example, you could try to identify keywords relevant to job titles, or you could use similarity metrics to recommend new roles based on the current one, etc.
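As a rough idea of what the similarity route could look like, here is a small sketch using bag-of-words cosine similarity with only the standard library. The role descriptions and the `recommend` helper are invented for illustration, and this is just one simple metric, not necessarily what you would end up using.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic tokens in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current: str, candidates: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate roles by similarity to the current role description."""
    current_vec = bag_of_words(current)
    scored = [
        (title, cosine_similarity(current_vec, bag_of_words(description)))
        for title, description in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical descriptions, used only for illustration.
current_role = "analyze sales data with sql and python and build dashboards"
roles = {
    "Data Engineer": "build data pipelines with python and sql",
    "Graphic Designer": "design logos and brand illustrations",
    "Line Cook": "prepare meals in a busy kitchen",
}

ranking = recommend(current_role, roles)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

Note that common words like "and" count toward the score here, so in practice you would probably drop stop words or switch to TF-IDF weights or embeddings from one of the libraries mentioned above.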