Trying to scrape a wedding gallery website for wedding images: what software would you suggest?

Arietty@jlai.lu · 1 year ago

Kissaki@programming.dev · 1 year ago

You didn’t even describe how the images are presented on the website.

I would use the web browser's (e.g. Firefox's) Save Page functionality.

Or open the browser dev tools, run document.querySelectorAll('img') in the console, and collect the image URLs from the result.
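A minimal sketch of that console approach, assuming the gallery uses ordinary <img> elements. Lazy-loading galleries often keep the full-size URL in an attribute like data-src; that attribute name is an assumption and varies per site. The function name collectImageUrls is hypothetical. It prints one unique, absolute URL per line.

```javascript
// Collect unique, absolute image URLs from a list of <img>-like elements.
// `dataAttr` is an assumed lazy-load attribute name; adjust it per site.
function collectImageUrls(images, baseUrl, dataAttr = "data-src") {
  const urls = new Set();
  for (const img of images) {
    // Prefer the lazy-load attribute (often the full-size file), then the
    // source the browser actually picked (srcset-aware), then plain src.
    const candidates = [img.getAttribute(dataAttr), img.currentSrc, img.getAttribute("src")];
    const raw = candidates.find((u) => u && !u.startsWith("data:")); // skip inline placeholders
    if (!raw) continue;
    urls.add(new URL(raw, baseUrl).href); // resolve relative paths
  }
  return [...urls];
}

// Only touch the live page when a DOM exists (i.e. in the browser console).
if (typeof document !== "undefined") {
  const urls = collectImageUrls(document.querySelectorAll("img"), document.baseURI);
  console.log(urls.join("\n"));
}
```

The printed list can be pasted into any download manager, or into wget/curl with a URL-list option.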

Or use the Media tab of the Page Info dialog.

Or use the dev tools Network tab to identify the image requests and use their URLs.
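The Network tab also has a scriptable counterpart, the Resource Timing API, which lets the console list the resource requests the page has made so far. A sketch, assuming images are either requested by <img> elements (initiatorType "img") or recognizable by file extension; the extension list is a heuristic assumption, and the function name imageRequestUrls is hypothetical. The browser only buffers a limited number of entries by default, so a very large gallery may need performance.setResourceTimingBufferSize() before the images load.

```javascript
// File-extension heuristic (an assumption: some CDNs serve images without one).
const IMAGE_EXT = /\.(jpe?g|png|webp|gif|avif)(\?|#|$)/i;

// Return unique URLs of image requests from Resource Timing entries.
function imageRequestUrls(entries) {
  const urls = new Set();
  for (const e of entries) {
    if (e.initiatorType === "img" || IMAGE_EXT.test(e.name)) {
      urls.add(e.name); // `name` is the requested URL
    }
  }
  return [...urls];
}

// Only run against the live page in a browser.
if (typeof performance !== "undefined" && typeof document !== "undefined") {
  console.log(imageRequestUrls(performance.getEntriesByType("resource")).join("\n"));
}
```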

Or use Nushell with the query module enabled, piping http get into query html.

Or my own C# utility.

But I suspect authentication is in play, so the only easy access may be from within the browser session.

8263ksbr@lemmy.ml · 1 year ago

Puppeteer and Playwright haven't been mentioned yet.

echindod@programming.dev · 1 year ago

I’d probably use Selenium, but it depends.

madeindjs@programming.dev · 1 year ago

You can get pretty far with a bit of JS and Tampermonkey. You can even search existing user scripts in case someone has already done it.
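As a rough idea of what such a user script could look like: the sketch below adds a button that downloads the page's image URLs as a text file. The @match domain, the script name, and the helper imageUrlList are hypothetical placeholders; everything else uses standard DOM APIs, so @grant none is enough.

```javascript
// ==UserScript==
// @name         Gallery image URL list (sketch)
// @version      0.1
// @description  Hypothetical example: save the image URLs on a gallery page as a text file
// @match        https://gallery.example.com/*
// @grant        none
// ==/UserScript==

// Build a newline-separated list of unique absolute URLs, skipping inline data: images.
function imageUrlList(srcs, baseUrl) {
  const urls = new Set();
  for (const s of srcs) {
    if (s && !s.startsWith("data:")) urls.add(new URL(s, baseUrl).href);
  }
  return [...urls].join("\n");
}

// Add a floating button that downloads the list (browser only).
if (typeof document !== "undefined") {
  const button = document.createElement("button");
  button.textContent = "Save image URLs";
  button.style.cssText = "position:fixed;top:8px;right:8px;z-index:99999";
  button.addEventListener("click", () => {
    const srcs = [...document.images].map((img) => img.currentSrc || img.src);
    const text = imageUrlList(srcs, document.baseURI);
    const link = document.createElement("a");
    link.href = URL.createObjectURL(new Blob([text], { type: "text/plain" }));
    link.download = "image-urls.txt";
    link.click();
    // Release the object URL once the download has had time to start.
    setTimeout(() => URL.revokeObjectURL(link.href), 1000);
  });
  document.body.append(button);
}
```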

ExperimentalGuy@programming.dev · 1 year ago

Before scraping, I would check whether there is an HTTP API you can send requests to instead. Its results may be higher quality than what you can scrape. If there is no easy-to-use HTTP API, fall back to scraping. I would generally consider scraping the last option, unless the website is ridiculously easy to scrape.
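Many galleries load their photo lists from a JSON endpoint, which shows up as a fetch/XHR request in the Network tab. A sketch of paging through such an endpoint, where the response shape (photos, original_url, next_page) and the page query parameter are purely hypothetical placeholders to be replaced with whatever the real responses contain:

```javascript
// Hypothetical response shape: { photos: [{ original_url, ... }], next_page }
// Field names are placeholders; inspect the real responses first.
function originalUrls(payload) {
  return (payload.photos ?? [])
    .map((p) => p.original_url)
    .filter((u) => typeof u === "string" && u.length > 0);
}

// Walk a paginated JSON API (hypothetical endpoint and `page` parameter).
async function fetchAllOriginals(endpoint) {
  const all = [];
  for (let page = 1; ; page++) {
    const url = new URL(endpoint);
    url.searchParams.set("page", String(page));
    const res = await fetch(url);
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
    const payload = await res.json();
    all.push(...originalUrls(payload));
    if (!payload.next_page) break; // stop when the API reports no further page
  }
  return all;
}
```

Running this from the console on the gallery's own origin has a side benefit: fetch sends cookies for same-origin requests by default, so an existing login session carries over.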

dbx12@programming.dev · 1 year ago

Not an answer, but you don’t need an extension to defeat right-click blocking scripts: shift-right-click usually does the trick.

MajorHavoc@programming.dev · 1 year ago

Have a look at Robot Framework with the Selenium library. Anything you can do manually, you can automate and repeat with Robot.

Also, have a look at the F12 Network tab, in case the original images are stored under predictable file names.
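If the names do follow a pattern, a short script can generate the candidate URLs and keep only the ones that exist. A sketch assuming a zero-padded counter such as IMG_0001.jpg; the template, range, padding width, and helper names are all hypothetical and should be copied from a real request. Some servers reject HEAD requests, in which case a GET would be needed instead.

```javascript
// Generate candidate URLs for files named with a zero-padded counter.
// `{n}` in the template marks where the number goes (an assumed convention).
function candidateUrls(template, first, last, width = 4) {
  const urls = [];
  for (let n = first; n <= last; n++) {
    urls.push(template.replace("{n}", String(n).padStart(width, "0")));
  }
  return urls;
}

// Keep only candidates the server has, checking one at a time to stay polite.
async function existingUrls(urls) {
  const found = [];
  for (const url of urls) {
    const res = await fetch(url, { method: "HEAD" }); // headers only, no body
    if (res.ok) found.push(url);
  }
  return found;
}
```

For example, candidateUrls("https://cdn.example.com/w/IMG_{n}.jpg", 1, 250) produces IMG_0001.jpg through IMG_0250.jpg, which existingUrls can then filter.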