This is a tutorial on the Python XML parser: the standard xml module, which can parse XML files and write data to them in Python.
XML stands for Extensible Markup Language and, like HTML, it is a markup language. Unlike HTML, however, XML has no predefined tags; we define our own custom tags based on the data we are storing in the XML file.
An XML file is often used to share, store, and structure data because it can easily be transferred between servers and systems.
Python is widely used for processing and parsing data, and it comes with a standard xml module that can parse XML files and also write data to them. This is what we refer to as the Python XML parser.
In this Python tutorial, we will walk through the Python XML minidom and ElementTree modules, and learn how to parse an XML file in Python.
Python XML minidom and ElementTree Modules
The Python xml module supports two submodules, minidom and ElementTree, to parse an XML file in Python. The minidom (Minimal DOM) module provides a DOM (Document Object Model) like structure to parse the XML file, which is similar to the DOM structure of JavaScript.
Although we can parse an XML document using minidom, ElementTree provides a more Pythonic way to parse an XML file in Python.
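To make the difference concrete, here is a minimal sketch that reads the same value with both modules. The one-record SAMPLE string is a hypothetical stand-in for a real file, and parseString() and fromstring() are used only so that the snippet runs without any file on disk.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# Hypothetical one-record sample, used only for this comparison.
SAMPLE = "<item><record><name>Jameson</name></record></item>"

# minidom: DOM-style navigation through node lists and child text nodes.
dom = minidom.parseString(SAMPLE)
dom_name = dom.getElementsByTagName("name")[0].firstChild.data

# ElementTree: path-style lookup, with the text exposed as an attribute.
root = ET.fromstring(SAMPLE)
et_name = root.find("record/name").text

print(dom_name, et_name)
```

Both approaches return the same text; the ElementTree version simply needs fewer steps to get there.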
XML File
For all the examples in this tutorial, we will be using the demo.xml file, which contains the following XML data:
demo.xml
<item>
  <record>
    <name>Jameson</name>
    <phone>(080) 78168241</phone>
    <email>cursus.in.hendrerit@ipsumdolor.edu</email>
    <country>South Africa</country>
  </record>
  <record>
    <name>Colton</name>
    <phone>(026) 53458662</phone>
    <email>non@idmagna.ca</email>
    <country>Libya</country>
  </record>
  <record>
    <name>Dillon</name>
    <phone>(051) 96790901</phone>
    <email>Aliquam.ornare@Etiamlaoreetlibero.ca</email>
    <country>Madagascar</country>
  </record>
  <record>
    <name>Channing</name>
    <phone>(014) 98829753</phone>
    <email>faucibus.Morbi.vehicula@aliquamarcu.co.uk</email>
    <country>Korea, South</country>
  </record>
</item>
In the above example, you can see that the data is nested under custom tags. The root tag is <item>, which contains four <record> tags, and each <record> has 4 more nested tags:
- <name>,
- <phone>,
- <email>, and
- <country>.
Parse/Read XML Document in Python using minidom
minidom is a submodule of the Python standard xml module (xml.dom.minidom), which means you do not have to pip install anything to use it. The minidom module parses the XML document into a Document Object Model (DOM), whose data can then be extracted using the getElementsByTagName() method.
Syntax: To parse the XML document in Python using minidom
from xml.dom import minidom
minidom.parse("filename")
Example: Let's grab all the names and phone data from our demo.xml file.
from xml.dom import minidom

# parse xml file
file = minidom.parse('demo.xml')

# grab all <record> tags
records = file.getElementsByTagName("record")

print("Name------>Phone")
for record in records:
    # access <name> and <phone> node of every record
    name = record.getElementsByTagName("name")
    phone = record.getElementsByTagName("phone")
    # access data of name and phone
    print(name[0].firstChild.data, end="----->")
    print(phone[0].firstChild.data)
Output
Name------>Phone
Jameson----->(080) 78168241
Colton----->(026) 53458662
Dillon----->(051) 96790901
Channing----->(014) 98829753
In the above example, you can see that we first imported the minidom module using the from xml.dom import minidom statement. Then we parsed our demo.xml file with the file = minidom.parse('demo.xml') statement. The parse() function parses the XML document into a Document object whose root node is <item>.
Note: Our Python script and the demo.xml file are located in the same directory, which is why we only specify the file name demo.xml in the minidom.parse() function. If your Python script and XML file are located in different directories, then you have to specify the absolute or relative path of the file.
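As a rough sketch of the different-location case, the path can be built with pathlib before it is handed to minidom.parse(). The helper name parse_xml_in and the temporary directory below are illustrative assumptions, not part of the tutorial's own code; the temporary file only exists so the snippet can run anywhere.

```python
import tempfile
from pathlib import Path
from xml.dom import minidom


def parse_xml_in(directory, filename="demo.xml"):
    """Parse `filename` found inside `directory` (hypothetical helper)."""
    xml_path = Path(directory) / filename
    return minidom.parse(str(xml_path))


# Create a throwaway directory with a tiny XML file for the demonstration.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "demo.xml").write_text("<item><record/></item>", encoding="utf-8")
    document = parse_xml_in(tmp)
    root_tag = document.documentElement.tagName

print(root_tag)
```

In a real script, passing Path(__file__).resolve().parent as the directory would point the helper at the folder that contains the script itself, regardless of where the script is launched from.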
After parsing the XML file in our Python program, we accessed all the <record> nodes using the records = file.getElementsByTagName("record") statement. getElementsByTagName() is a method of minidom document and element objects that returns a list of node objects for the specified tag.
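The following sketch illustrates what getElementsByTagName() gives back. It uses a hypothetical two-record string instead of demo.xml so that it is self-contained.

```python
from xml.dom import minidom

# Hypothetical two-record sample, used only for illustration.
SAMPLE = (
    "<item>"
    "<record><name>Jameson</name></record>"
    "<record><name>Colton</name></record>"
    "</item>"
)

doc = minidom.parseString(SAMPLE)

# The result behaves like a list: it has a length and supports indexing.
records = doc.getElementsByTagName("record")
record_count = len(records)

# Called on the document, the search covers all descendants, not only
# direct children, so the nested <name> tags are found as well.
names = [node.firstChild.data for node in doc.getElementsByTagName("name")]

# A tag that does not exist yields an empty list rather than an error.
missing = doc.getElementsByTagName("address")

print(record_count, names, len(missing))
```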
Once we had all the record nodes, we looped through them and, again using the getElementsByTagName() method, accessed the nested <name> and <phone> nodes of each record.
Next, after accessing the individual name and phone nodes, we printed their data using name[0].firstChild.data and phone[0].firstChild.data. Here, firstChild is the text node inside the element, and its data attribute holds the text of that node.
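One caveat is that firstChild is None for an empty element, and indexing fails when the tag is absent altogether. A minimal sketch of a defensive lookup follows; the helper first_text and the SAMPLE string are hypothetical, and the helper assumes the first child, when present, is a text node.

```python
from xml.dom import minidom

# Hypothetical record with an empty <phone> and no <email> at all.
SAMPLE = "<item><record><name>Jameson</name><phone></phone></record></item>"


def first_text(parent, tag, default=""):
    """Return the text of the first <tag> under parent, or default.

    Hypothetical helper; assumes the first child, if any, is a text node.
    """
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


record = minidom.parseString(SAMPLE).getElementsByTagName("record")[0]
name = first_text(record, "name")
phone = first_text(record, "phone", default="n/a")
email = first_text(record, "email", default="n/a")

print(name, phone, email)
```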
Parse/Read XML Document in Python Using ElementTree
The ElementTree module provides a simple and straightforward way to parse and read XML files in Python. Just as minidom is a submodule of xml.dom, ElementTree is a submodule of xml.etree. The ElementTree module parses the XML file into a tree-like structure whose root branch is the first tag of the XML file (<item> in our case).
Syntax: To parse the XML document in Python using ElementTree
import xml.etree.ElementTree as ET
ET.parse('file_name.xml')
Example
Using minidom, we grabbed the name and phone data. Now let's access the email and country data using XML ElementTree.
import xml.etree.ElementTree as ET

tree = ET.parse('demo.xml')

# get root branch <item>
item = tree.getroot()

# loop through all <record> of <item>
for record in item.findall("record"):
    email = record.find("email").text
    country = record.find("country").text
    print(f"Email: {email},-------->Country:{country}")
Output
Email: cursus.in.hendrerit@ipsumdolor.edu,-------->Country:South Africa
Email: non@idmagna.ca,-------->Country:Libya
Email: Aliquam.ornare@Etiamlaoreetlibero.ca,-------->Country:Madagascar
Email: faucibus.Morbi.vehicula@aliquamarcu.co.uk,-------->Country:Korea, South
From the above example, you can see that ElementTree provides a more elegant and Pythonic way to read or parse an XML file in Python.
In our first statement, import xml.etree.ElementTree as ET, we imported ElementTree as ET into our program. Then, using the tree = ET.parse('demo.xml') statement, we parsed the demo.xml file.
With the help of the item = tree.getroot() statement, we accessed the root branch of our XML file, which is <item>. Then we looped through every <record> branch with the item.findall("record") statement and grabbed their email and country data with the record.find("email").text and record.find("country").text statements.
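Note that find() returns None when a tag is missing, so chaining .text on it would raise an AttributeError. As a hedged sketch, findtext() with a default value avoids that; the SAMPLE string below is a hypothetical variation of the data in which the second record has no <country> tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in which the second record lacks a <country> tag.
SAMPLE = """<item>
  <record><email>non@idmagna.ca</email><country>Libya</country></record>
  <record><email>someone@example.com</email></record>
</item>"""

root = ET.fromstring(SAMPLE)

rows = []
for record in root.findall("record"):
    # findtext() returns the tag's text, or the default if the tag is absent.
    email = record.findtext("email", default="unknown")
    country = record.findtext("country", default="unknown")
    rows.append((email, country))

print(rows)
```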
Check out the Official documentation of the XML ElementTree module to know more about ElementTree and its functions.
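This tutorial focuses on reading XML, but as mentioned at the start, the same module can also write XML data. The snippet below is only a minimal sketch of that direction, building a single record in memory; the values are illustrative, and the file name in the final comment is an assumption.

```python
import xml.etree.ElementTree as ET

# Build a small tree in memory: <item><record>...</record></item>
root = ET.Element("item")
record = ET.SubElement(root, "record")
ET.SubElement(record, "name").text = "Jameson"
ET.SubElement(record, "country").text = "South Africa"

# Serialize the tree to a string for inspection.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# To save to disk instead (output.xml is a hypothetical file name):
# ET.ElementTree(root).write("output.xml", encoding="utf-8", xml_declaration=True)
```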
Conclusion
That sums up this tutorial on the Python XML parser. As you can see, Python provides a built-in standard xml module to read and parse XML files. This tutorial covered two of its submodules that can parse an XML file:
- minidom and
- ElementTree.
The minidom module follows the Document Object Model approach to parse an XML file. On the other hand, the ElementTree module follows the tree-like structure to parse the XML file.