Wikipedia is the world's largest online encyclopedia. It contains millions of articles on almost any topic you can name, and all of that content is freely available to the public. Many of us have used Wikipedia for school projects and assignments, and thanks to its free articles, we can read and learn about almost anything at no cost. Because Wikipedia covers so many topics, we can also write a Python program to fetch data about any specific one of them.
Let's say you are building a Python project that needs to extract data on a specific topic from Wikipedia. In that case, you can either scrape Wikipedia pages with Python or use the Python wikipedia library, which wraps the Wikipedia API.
In this Python tutorial, I will guide you through how to use the Python wikipedia library to get data on a specific topic. Before we start coding, let's install the library.
To install the wikipedia library in your Python environment, run the following pip install command in your terminal (Linux/macOS) or command prompt (Windows):
pip install wikipedia
Search Wikipedia topics with Python
Let's start by searching Wikipedia topics with Python. The wikipedia module provides a search() function that returns a list of relevant page titles for a search query. The search(query, results=10, suggestion=False) function accepts 3 parameters:
query is the topic we want to search for.
results is the maximum number of results the function should return; by default, its value is 10.
suggestion, when set to True, makes search() return a tuple of (results, suggestion), where suggestion is Wikipedia's suggested query string, or None if there is no suggestion. By default, its value is False.
Now let's use the search() function to search for the topic "Python" and see what results we get.
import wikipedia
topic = "Python"
# search for Python and return up to 15 titles
results = wikipedia.search(topic, results=15)
print(results)
Output
['Python (programming language)', 'Python', 'Monty Python', 'Burmese python', 'Ball python', 'PYTHON', 'History of Python', 'Reticulated python', 'Python (genus)', 'Monty Python and the Holy Grail', 'Python molurus', 'Colt Python', 'Python (missile)', 'African rock python', 'Burmese pythons in Florida']
From the output, you can see that the search() function returns a list of 15 results for the query "Python". Each result is the exact title of a Wikipedia page, so it can be passed directly to the other functions of the module.
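The suggestion parameter is handy for misspelled queries. As a minimal sketch, assuming the (results, suggestion) tuple return value described above, the hypothetical helpers below (search_with_fallback and pick_query are not part of the library) retry the search with Wikipedia's suggested query when the original query finds nothing.

```python
def pick_query(query, titles, suggestion):
    """Decide which query to search with: the original, or Wikipedia's suggestion.

    Falls back to the suggestion only when the original query returned no titles.
    """
    if not titles and suggestion:
        return suggestion
    return query


def search_with_fallback(query, results=10):
    """Search Wikipedia and retry once with the suggested query if nothing matched."""
    import wikipedia  # third-party package: pip install wikipedia

    # With suggestion=True, search() returns a (titles, suggestion) tuple,
    # where suggestion is a string or None.
    titles, suggestion = wikipedia.search(query, results=results, suggestion=True)
    retry_query = pick_query(query, titles, suggestion)
    if retry_query != query:
        titles = wikipedia.search(retry_query, results=results)
    return titles
```

For a misspelled query, search_with_fallback() would repeat the search with whatever spelling Wikipedia suggests, if it suggests one; otherwise it simply returns the original (possibly empty) list.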
Fetch Wikipedia topic data with Python
Using the search() function, we can find the most relevant topics for a query. But what if we also want a summary or description of a topic itself? For that, the module provides the summary() function, which returns a plain-text summary of the specified page or topic.
summary(title, sentences=0, chars=0, auto_suggest=True, redirect=True)
The title parameter specifies the page or topic name.
The sentences parameter specifies the number of sentences to return (the library allows at most 10).
The chars parameter specifies the number of characters to return from the summary. If both sentences and chars are set, sentences takes precedence; if both are 0 (the defaults), the page's full introductory section is returned.
The auto_suggest parameter lets Wikipedia find a valid page title for the query.
The redirect parameter allows redirection without raising a RedirectError. Now, let's print a 100-character summary for each of the top 3 search results.
import wikipedia
topic = "Python"
# get the top 3 search results
results = wikipedia.search(topic, results=3)
for title in results:
    print("Page---->", title, ":")
    print(wikipedia.summary(title, chars=100))
    print()  # blank line between pages
Output
Page----> Python (programming language) :
Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy...
Page----> Python :
Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy...
Page----> Monty Python :
Monty Python (also collectively known as the Pythons) were a British surreal comedy troupe who created...
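A search result can also be the title of a disambiguation page. For such a title, summary() raises wikipedia.exceptions.DisambiguationError, whose options attribute lists the candidate pages, and a title with no matching page raises PageError. Below is a minimal sketch of handling both cases; safe_summary and format_options are hypothetical helper names, and passing auto_suggest=False is a choice made here because search() already returns exact titles.

```python
def format_options(title, options, limit=5):
    """Describe an ambiguous title by listing its first few candidate pages."""
    shown = ", ".join(options[:limit])
    return f"{title!r} is ambiguous; try one of: {shown}"


def safe_summary(title, sentences=2):
    """Return a short summary, or a readable message when the title is ambiguous or missing."""
    import wikipedia  # third-party package: pip install wikipedia
    from wikipedia.exceptions import DisambiguationError, PageError

    try:
        # auto_suggest=False uses the title exactly as given (e.g. a search() result)
        return wikipedia.summary(title, sentences=sentences, auto_suggest=False)
    except DisambiguationError as err:
        # err.options holds the titles listed on the disambiguation page
        return format_options(title, err.options)
    except PageError:
        return f"No Wikipedia page matches {title!r}."
```

You could drop it into the loop above, for example print(safe_summary(title)), so that one ambiguous title does not stop the whole script.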
Fetch Wikipedia page data with Python
A Wikipedia page contains more than text: it also has images, links, references, a page ID, and so on. Now let's see how to get all of this data from a Wikipedia page using the Python wikipedia module.
The wikipedia module provides the WikipediaPage class, whose objects expose page properties such as categories, content, coordinates, images, links, and references.
WikipediaPage(title=None, pageid=None, redirect=True, preload=False, original_title=u'')
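The WikipediaPage class needs either a page title or a pageid; if neither is given, it raises a ValueError.
The pageid parameter specifies the numeric ID of the page and can be used instead of the title.
The redirect parameter, when True, follows redirects to the target page; when False, loading a redirect page raises a RedirectError.
The preload parameter, when True, loads page data such as the summary, images, content, and links as soon as the object is created instead of on first access. Now let's get the Wikipedia data for the page "Python (programming language)".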
import wikipedia
title = "Python (programming language)"
page = wikipedia.WikipediaPage(title)
# get page content
print(page.content)
# get page images
print(f"The page {title} has {len(page.images)}: ")
for image_url in page.images:
    print(image_url)
# get page links
print(page.links)
Output
Python is an interpreted, high-level and general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant whitespace. Its language constructs and object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.Python is dynamically-typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly, procedural), object-oriented and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.Python was..........................
The page Python (programming language) has 20:
https://upload.wikimedia.org/wikipedia/commons/b/b5/DNC_training_recall_task.gif
https://upload.wikimedia.org/wikipedia/commons/3/31/Free_and_open-source_software_logo_%282009%29.svg
https://upload.wikimedia.org/wikipedia/commons/9/94/Guido_van_Rossum_OSCON_2006_cropped.png
https://upload.wikimedia.org/wikipedia/commons/5/52/Merge-arrows.svg
https://upload.wikimedia.org/wikipedia/commons/6/6f/Octicons-terminal.svg
https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg
https://upload.wikimedia.org/wikipedia/commons/1/10/Python_3._The_standard_type_hierarchy.png
https://upload.wikimedia.org/wikipedia/commons/f/f8/Python_logo_and_wordmark.svg
https://upload.wikimedia.org/wikipedia/commons/8/89/Symbol_book_class2.svg
https://upload.wikimedia.org/wikipedia/commons/d/df/Wikibooks-logo-en-noslogan.svg
https://upload.wikimedia.org/wikipedia/commons/f/fa/Wikibooks-logo.svg
https://upload.wikimedia.org/wikipedia/commons/f/ff/Wikidata-logo.svg
https://upload.wikimedia.org/wikipedia/commons/f/fa/Wikiquote-logo.svg
https://upload.wikimedia.org/wikipedia/commons/0/0b/Wikiversity_logo_2017.svg
https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg
https://upload.wikimedia.org/wikipedia/en/8/8a/OOjs_UI_icon_edit-ltr-progressive.svg
https://upload.wikimedia.org/wikipedia/en/9/96/Symbol_category_class.svg
https://upload.wikimedia.org/wikipedia/en/d/db/Symbol_list_class.svg
https://upload.wikimedia.org/wikipedia/en/e/e2/Symbol_portal_class.svg
https://upload.wikimedia.org/wikipedia/en/9/94/Symbol_support_vote.svg
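Properties such as content and images are fetched lazily, so the first access to each one makes its own request. If you know you will need most of a page's data anyway, a reasonable option is to create the page with preload=True. The sketch below does that by title or by numeric pageid; load_page is a hypothetical helper, not part of the library.

```python
def load_page(title=None, pageid=None, preload=True):
    """Create a WikipediaPage from a title or a numeric page ID."""
    # WikipediaPage needs at least one of these two identifiers
    if title is None and pageid is None:
        raise ValueError("pass either a title or a pageid")

    import wikipedia  # third-party package: pip install wikipedia

    # preload=True loads content, summary, images, links, etc. during construction
    return wikipedia.WikipediaPage(title=title, pageid=pageid, preload=preload)
```

For example, load_page("Python (programming language)").images should give the same image URLs as the earlier example, with the data already fetched when the object was created.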
With WikipediaPage properties such as categories, content, images, links, and references, plus the html() method, you can fetch almost any data from a Wikipedia page. In the earlier example, we printed the URLs of all the images on the page.
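Because page.images is a plain list of URL strings, it is easy to post-process. Here is a small sketch, assuming you only want raster images, that keeps the URLs ending in the given file extensions; images_with_extensions is a hypothetical helper, and the default extension list is an arbitrary choice.

```python
from urllib.parse import urlparse


def images_with_extensions(image_urls, extensions=(".png", ".jpg", ".jpeg")):
    """Return the image URLs whose file path ends with one of the given extensions."""
    wanted = tuple(ext.lower() for ext in extensions)
    kept = []
    for url in image_urls:
        # Check only the URL path, so a query string cannot hide the extension
        path = urlparse(url).path.lower()
        if path.endswith(wanted):
            kept.append(url)
    return kept
```

Applied to the list printed above, images_with_extensions(page.images) would keep only the two .png URLs and drop the .svg logos and the .gif.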
Conclusion
In this Python tutorial, you learned how to use the Python wikipedia library to extract data from Wikipedia pages. With this library, we don't need to write and maintain our own web scraper to extract data from Wikipedia. I recommend reading the Python wikipedia library's official documentation to learn more about its functions.