Extract Text From Wikipedia Html Using Python

November 17, 2024

I am looking for a way to extract the main text of a Wikipedia article using Python. I am aware of the 'wikipedia' library, but in my case I have already downloaded the HTML.

Solution 1:

Try BeautifulSoup:

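from bs4 import BeautifulSoup
import requests

# Download the page; skip this step if the HTML is already saved locally.
response = requests.get("http://pl.wikipedia.org/wiki/StackOverflow")
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(response.text, "html.parser")
# The article text sits in <p> elements.
paragraphs = soup.find_all('p')
print(paragraphs[0].text)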
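
Since the question is about HTML that is already downloaded, here is a rough sketch that needs no network request. Note that it swaps BeautifulSoup for the standard library's html.parser module, so it runs without installing anything. The names ParagraphExtractor, extract_paragraphs and sample_html are made up for illustration, and the sample markup is a tiny stand-in for a real saved article.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # finished paragraphs, in document order
        self._chunks = None   # text pieces of the open <p>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "p" and self._chunks is not None:
            text = "".join(self._chunks).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)
            self._chunks = None

    def handle_data(self, data):
        # Text inside nested tags such as <a> or <b> also arrives here.
        if self._chunks is not None:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Stand-in for the contents of an already downloaded article (assumption).
sample_html = """
<html><body>
<div id="mw-content-text">
<p>Stack Overflow is a <a href="/wiki/Website">question-and-answer</a> site.</p>
<p></p>
<p>It was created in <b>2008</b>.</p>
</div>
</body></html>
"""

for paragraph in extract_paragraphs(sample_html):
    print(paragraph)
```

For a saved file, read it into a string first (for example with open(path, encoding="utf-8").read(), where path is wherever you stored the page) and pass that string to extract_paragraphs. This keeps every paragraph on the page, so for real articles you may still want to filter out footnote markers or text outside the article body.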

Solution 2:

You can use the wikipedia Python module:

pip install wikipedia
