
Get all flag images from a website using Python code

Asked 3 months ago
Viewed 11 times

Is there a way to get all the flags from https://en.wikipedia.org/wiki/Gallery_of_sovereign_state_flags using Python code? I tried pd.read_html without success. I also tried scraping the page, but the result got so messy that I could not finish it.

import requests
from bs4 import BeautifulSoup

page = requests.get("https://en.wikipedia.org/wiki/Gallery_of_sovereign_state_flags")

# Scrape the webpage
soup = BeautifulSoup(page.content, 'html.parser')
flags = soup.find_all('a', attrs={'class': "image"})

It would be nice if I could download them to a specific folder too. Thanks in advance!


Correct Answer

In your example, flags is a list of anchor tags, each of which contains an img tag.

What you want is a way to get each individual src attribute from the image tag.

You can achieve this by looping over the results of your soup.find_all call. Each flag is a single anchor tag, so you can take its first child (the img tag) and then read the value of that tag's src attribute.

for flag in soup.find_all('a', attrs={'class': "image"}):
    # The first child of each anchor is its img tag
    src = flag.contents[0]['src']
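The src values on Wikipedia pages are typically protocol-relative (they start with // rather than https://), so they usually need to be resolved against the page URL before they can be requested. A minimal sketch of that step, using urljoin from the standard library; the helper name absolute_src and the sample src value are made up for illustration:

```python
from urllib.parse import urljoin

PAGE_URL = "https://en.wikipedia.org/wiki/Gallery_of_sovereign_state_flags"


def absolute_src(src, page_url=PAGE_URL):
    """Resolve an img src against the page URL.

    Handles protocol-relative (//host/path), root-relative (/path)
    and already-absolute URLs.
    """
    return urljoin(page_url, src)


# Hypothetical src value, for illustration only
example = "//upload.example.org/flags/120px-Flag.svg.png"
print(absolute_src(example))  # https://upload.example.org/flags/120px-Flag.svg.png
```

Inside the loop you could call absolute_src(src) and append the result to a list for later use.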

You can then work on downloading each of these to a file inside the loop.
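One possible way to do the download step is sketched here. It assumes each URL is already absolute (starts with https://) and uses urllib.request from the standard library instead of requests, purely to keep the example dependency-free; requests.get(url).content would work just as well. The function names, the default folder name and the User-Agent string are assumptions for illustration, not anything prescribed by Wikipedia.

```python
import os
from urllib.parse import unquote, urlparse
from urllib.request import Request, urlopen


def local_name(url):
    """Derive a file name from the last path segment of the URL."""
    return os.path.basename(unquote(urlparse(url).path))


def download_images(urls, folder="flags"):
    """Save each image URL into `folder`, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, local_name(url))
        # Assumption: some servers reject the default Python User-Agent,
        # so a descriptive (made-up) one is sent instead.
        request = Request(url, headers={"User-Agent": "flag-downloader-example/0.1"})
        with urlopen(request) as response, open(path, "wb") as out:
            out.write(response.read())
        saved.append(path)
    return saved
```

With the src values collected in the loop and converted to absolute URLs, download_images(urls, folder="flags") would write one file per flag and return the saved paths.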

answered 3 months ago