The Cookie Machine - Click here to drag window

DUMMY TEXT - Real text set in assets/js/theCookieMachine.js

If you can read me, I'm broken!

Views: 268β€…    Votes:  11β€…    βœ… Solution
Tags: discussion   data-explorer   images   data-dump   imgur-image-hosting  
Link: πŸ” See Original Question on Meta Stack Exchange ⧉ πŸ”—

URL: https://meta.stackexchange.com/q/401950
Title: How to get SE sstatic images for my website
ID: /2024/08/05/How-to-get-SE-sstatic-images-for-my-website
Created: August 5, 2024
Upload: September 15, 2024    Layout:  post
TOC: false    Navigation:  false    Copy to clipboard:  false


I’ve written about 2,500 questions and answers in Stack Exchange sites. About half of them I scrape into my website on GitHub Pages for customized presentation and searching.

A short time ago all the images disappeared because of SE’s switch from imgur server to sstatic server. I’ve read on other posts that my website likely wouldn’t be put on the β€œallow list”.

I will have to programmatically change links. For example, consider this grep output:

2024/2024-04-28-Data-Explorer-Query-crashes-with-collation-conflict-error.md:47: [1]: https://i.sstatic.net/195OOXx3.pngg

The Github Pages markdown would change to:

[1]: https://www.pippim.com/assets/img/_posts/2024/195OOXx3.png

Changing the Kramdown (markdown) is the easy part. The hard part is downloading all the images. Consider the number of images:

$ grep sstatic _posts/*/* | wc -l
480

480 manual downloads is too tedious. The website is refreshed weekly (after SE’s Sunday data dump) with a simple bash command and any new images would also need to be manually downloaded.

The only method I can think of is passing a list of sstatic URLs to a python script which uses selenium to open Chromium to open Google Search to open the link to use image save as and then close the image tab.

From then on, only image URLs for new posts since the last refresh need to be downloaded with the python script. Selenium is OK but it uses a lot of CPU resources and programming is awkward.

Is there an easier way to download a list of images from sstatic.net?

Data Explorer Query crashes with collation conflict error  β‡©