umami_wasabi

  • 41 Posts
  • 690 Comments
Joined 1 year ago
Cake day: August 7th, 2023

  • umami_wasabi to Privacy: Think you need a VPN? Start here.
    6 days ago

    Except many services, namely Google and Cloudflare, are very aggressive toward Tor exit nodes. Every time, I get hit with CAPTCHA after CAPTCHA, and eventually I give up on the site.

    Yeah, I should cut ties with Google, but giving up YouTube on NewPipe is hard. I’m on Proton, and watching YouTube is already hard.

  • I don’t have a single guide for you, but I can lay out a road map.

    1. A programming language. I prefer Python.
    2. Basic HTML syntax and CSS selectors
    3. HTTP, specifically methods, status codes (no need to memorize them all, since you can always look them up), and cookies
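
    For item 3, you really don't have to memorize status codes: Python's standard library ships a lookup table (`http.HTTPStatus`) and a cookie parser (`http.cookies.SimpleCookie`). Here's a rough, stdlib-only sketch to poke at. `describe_status` and `parse_set_cookie` are just illustrative helper names I made up, not part of any library.

```python
from http import HTTPStatus
from http.cookies import SimpleCookie


def describe_status(code: int) -> str:
    """Look up a status code instead of memorizing it, e.g. 404 -> '404 Not Found'."""
    status = HTTPStatus(code)  # raises ValueError for codes the stdlib doesn't know
    return f"{status.value} {status.phrase}"


def parse_set_cookie(header_value: str) -> dict[str, str]:
    """Turn a Set-Cookie style string into a {name: value} dict.

    Attributes such as Path or HttpOnly are attached to the cookie
    (the "morsel") rather than becoming separate cookies, so only
    the real name/value pairs end up in the result.
    """
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}


if __name__ == "__main__":
    # Codes you'll run into constantly while scraping.
    for code in (200, 301, 403, 404, 429, 503):
        print(describe_status(code))
    print(parse_set_cookie("sessionid=abc123; Path=/; HttpOnly"))
```

    In practice the `requests` library (mentioned below) handles cookies for you, but knowing what a Set-Cookie header contains makes login and session bugs much less mysterious.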

    Once you have that foundation in place, you can go on and try to build a web scraper. I advise against using Scrapy, not because it is bad, but because it is too overwhelming and abstracted for a beginner. Instead, I advise using requests for HTTP and BeautifulSoup4 for HTML parsing. You will build a more solid foundation and can transition to Scrapy later when you need its advanced features.
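
    Here's roughly what a first requests + BeautifulSoup4 scraper can look like. It's a minimal sketch, assuming both packages are installed (`pip install requests beautifulsoup4`). The `.article-list` class, the User-Agent string, and the helper names `fetch_html` and `extract_links` are all hypothetical, made up for illustration, so swap in whatever the real page uses. Fetching and parsing are split on purpose: you can practice the parsing part on a saved HTML string without hammering a site.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page with requests and return its HTML text."""
    response = requests.get(
        url,
        # Placeholder User-Agent (hypothetical); identify your scraper honestly.
        headers={"User-Agent": "my-first-scraper/0.1 (learning project)"},
        timeout=10,  # never wait forever on a slow server
    )
    # Raise requests.HTTPError on 4xx/5xx instead of parsing an error page.
    response.raise_for_status()
    return response.text


def extract_links(html: str) -> list[tuple[str, str]]:
    """Return (text, href) pairs for links inside the hypothetical .article-list."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <a> elements that have an href attribute,
    # somewhere inside an element with class "article-list".
    return [
        (a.get_text(strip=True), a["href"])
        for a in soup.select(".article-list a[href]")
    ]


if __name__ == "__main__":
    # Offline practice: parse a hard-coded snippet instead of a live page.
    sample = """
    <ul class="article-list">
      <li><a href="/posts/1">First post</a></li>
      <li><a href="/posts/2">Second post</a></li>
    </ul>
    <a href="/about">About</a>
    """
    print(extract_links(sample))
    # → [('First post', '/posts/1'), ('Second post', '/posts/2')]
```

    Against a real page you'd chain the two, something like `extract_links(fetch_html(url))` for a site you're allowed to scrape. Note how `raise_for_status()` ties straight back to knowing your status codes, and how the selector is just the CSS you learned in step 2.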

    When you get stuck, don’t be afraid to pause your attempt and read the tutorials again. Head to the Python Community on Discord for interactive help. We welcome noobs, as we were once noobs too. Just don’t ever mention scraping there, as they can’t help if they suspect you’re trying to do something inappropriate, malicious, or illegal. They are notoriously against yt-dlp, which frustrates me a bit. Phrase your question nicely and in a generic way. I’m there occasionally offering help.