Python has many data science libraries and Pandas is one of Python's most popular data science libraries. Like the NumPy library arrays and ndArrays Python Pandas support Series and DataFrames.
Here, Series represents 1D array and DataFrames represents multi-dimensional arrays. An excel sheet can be represented as a DataFrame, and the pandas library provides us an inbuilt
read_excel()
method that can be used to achieve this goal.
In this short Python tutorial, you will learn how can you import an Excel sheet in Python using pandas (with the read_excel() method). Before you import an excel sheet in Python using pandas make sure that pandas in installed in your system.
Python Libraries Required to Import an Excel File in Python
There are 3 libraries you need to install in your python environment if you want to import an excel sheet using pandas.
pip install numpy
pip install pandas
pip install xlrd
Make sure that you have installed these three libraries before importing an Excel sheet in Python with Pandas, else you will be getting this error: ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.
How to Import an Excel File into Python Using Pandas?
read_excel()
is a pandas method that allows us to access an Excel sheet using Python. The
read_excel()
method can load the Excel file from the local system or specified URL and the
read_excel()
method allows us to access Excel files with extension
xls, xlsx, xlsm, xlsb, odf, ods
and
odt
.
For the example, below we have used the countries.xlsx file:
Python Program to Import an Excel File Using pandas
import pandas as pd
#access the excel sheet using python
df = pd.read_excel('countries.xlsx')
print(df)
print("The type of df is:",type(df) )
Output
CountryCode ShortName TableName LongName
0 AFG Afghanistan Afghanistan Islamic State of Afghanistan
1 ALB Albania Albania Republic of Albania
2 DZA Algeria Algeria People's Democratic Republic of Algeria
3 ASM American Samoa American Samoa American Samoa
4 ADO Andorra Andorra Principality of Andorra
5 AGO Angola Angola People's Republic of Angola
6 ATG Antigua and Barbuda Antigua and Barbuda Antigua and Barbuda
7 ARB Arab World Arab World Arab World
8 ARG Argentina Argentina Argentine Republic
9 ARM Armenia Armenia Republic of Armenia
The type of df is: <class 'pandas.core.frame.DataFrame'>
Behind the code:
From the above example, you can see that the
read_excel()
method imports the countries.xlsx file and convert it into a Pandas Dataframe object. Moreover, it converts the first row of the excel sheet to the columns named.
In the above example, the python script and the excel file are at the same location that’s why we are directly able to access the file using the file name. If the excel file and python script are located at different locations then you need to specify the path location of the excel file.
Pandas read_excel() Method Arguments
The read_excel() method accepts multiple arguments, and most of the arguments are optional, except the file name.
read_excel() important arguments
- io
- header
- name
- index_col
io
represents the file name and it is the mandatory argument. It is represented by a string value that specifies the path for the excel file:
df = pd.read_excel(io ='countries.xlsx')
header
specifies the head value for the data frame and by default, its value is
0
, which represents that row 0 will be the header value for the data frame. If we set it to
None
then indexing values will be used as headers.
import pandas as pd
#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', header = None)
print(df)
print("The type of df is:",type(df) )
Output
0 1 2 3
0 CountryCode ShortName TableName LongName
1 AFG Afghanistan Afghanistan Islamic State of Afghanistan
2 ALB Albania Albania Republic of Albania
3 DZA Algeria Algeria People's Democratic Republic of Algeria
4 ASM American Samoa American Samoa American Samoa
5 ADO Andorra Andorra Principality of Andorra
6 AGO Angola Angola People's Republic of Angola
7 ATG Antigua and Barbuda Antigua and Barbuda Antigua and Barbuda
8 ARB Arab World Arab World Arab World
9 ARG Argentina Argentina Argentine Republic
10 ARM Armenia Armenia Republic of Armenia
The type of df is: <class 'pandas.core.frame.DataFrame'>
The
names
represent a list of values that specifies the header for the data frame if
header
value is None.
import pandas as pd
#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', header = None, names=["One", "Two", "Three", "Four"])
print(df)
Output
One Two Three Four
0 CountryCode ShortName TableName LongName
1 AFG Afghanistan Afghanistan Islamic State of Afghanistan
2 ALB Albania Albania Republic of Albania
3 DZA Algeria Algeria People's Democratic Republic of Algeria
4 ASM American Samoa American Samoa American Samoa
5 ADO Andorra Andorra Principality of Andorra
6 AGO Angola Angola People's Republic of Angola
7 ATG Antigua and Barbuda Antigua and Barbuda Antigua and Barbuda
8 ARB Arab World Arab World Arab World
9 ARG Argentina Argentina Argentine Republic
10 ARM Armenia Armenia Republic of Armenia
index_col
argument represents the first row labels of the data frame.
import pandas as pd
#access the excel sheet using python
df = pd.read_excel(io ='countries.xlsx', index_col="TableName")
print(df)
Output
CountryCode ShortName LongName
TableName
Afghanistan AFG Afghanistan Islamic State of Afghanistan
Albania ALB Albania Republic of Albania
Algeria DZA Algeria People's Democratic Republic of Algeria
American Samoa ASM American Samoa American Samoa
Andorra ADO Andorra Principality of Andorra
Angola AGO Angola People's Republic of Angola
Antigua and Barbuda ATG Antigua and Barbuda Antigua and Barbuda
Arab World ARB Arab World Arab World
Argentina ARG Argentina Argentine Republic
Armenia ARM Armenia Republic of Armenia
Conclusion
Here in this Python tutorial, you learned how can you import an Excel file in python using pandas with the
read_excel()
method. Pandas also provide various methods, such as
read_table, read_csv, read_json,
and
read_html
to read and import tables, CSV, JSON, and HTML files, respectively.
Before you use the
read_excel()
method in Python, ensure that all the other dependencies (NumPy and xlrd) have been installed in your Python environment.
People are also reading:
Leave a Comment on this Post