How to Import an Excel File into Python using Pandas?

Posted in /  

How to Import an Excel File into Python using Pandas?
vinaykhatri

Vinay Khatri
Last updated on November 21, 2024

    Python has many data science libraries and Pandas is one of Python's most popular data science libraries. Like the NumPy library arrays and ndArrays Python Pandas support Series and DataFrames.

    Here, Series represents 1D array and DataFrames represents multi-dimensional arrays. An excel sheet can be represented as a DataFrame, and the pandas library provides us an inbuilt read_excel() method that can be used to achieve this goal.

    In this short Python tutorial, you will learn how can you import an Excel sheet in Python using pandas (with the read_excel() method). Before you import an excel sheet in Python using pandas make sure that pandas in installed in your system.

    Python Libraries Required to Import an Excel File in Python

    There are 3 libraries you need to install in your python environment if you want to import an excel sheet using pandas.

    1. numpy
    2. pandas
    3. xlrd

    pip install numpy pip install pandas pip install xlrd Make sure that you have installed these three libraries before importing an Excel sheet in Python with Pandas, else you will be getting this error: ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

    How to Import an Excel File into Python Using Pandas?

    read_excel() is a pandas method that allows us to access an Excel sheet using Python. The read_excel() method can load the Excel file from the local system or specified URL and the read_excel() method allows us to access Excel files with extension xls, xlsx, xlsm, xlsb, odf, ods and odt .

    For the example, below we have used the countries.xlsx file:

    Python Program to Import an Excel File Using pandas

    import pandas as pd
    #access the excel sheet using python
    df = pd.read_excel('countries.xlsx')
    print(df)
    print("The type of df is:",type(df) )

    Output

    CountryCode            ShortName            TableName                                 LongName
    
    0         AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan
    
    1         ALB              Albania              Albania                      Republic of Albania
    
    2         DZA              Algeria              Algeria  People's Democratic Republic of Algeria
    
    3         ASM       American Samoa       American Samoa                           American Samoa
    
    4         ADO              Andorra              Andorra                  Principality of Andorra
    
    5         AGO               Angola               Angola              People's Republic of Angola
    
    6         ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda
    
    7         ARB           Arab World           Arab World                               Arab World
    
    8         ARG            Argentina            Argentina                       Argentine Republic
    
    9         ARM              Armenia              Armenia                      Republic of Armenia
    
    The type of df is: <class 'pandas.core.frame.DataFrame'>

    Behind the code:

    From the above example, you can see that the read_excel() method imports the countries.xlsx file and convert it into a Pandas Dataframe object. Moreover, it converts the first row of the excel sheet to the columns named.

    In the above example, the python script and the excel file are at the same location that’s why we are directly able to access the file using the file name. If the excel file and python script are located at different locations then you need to specify the path location of the excel file.

    Pandas read_excel() Method Arguments

    The read_excel() method accepts multiple arguments, and most of the arguments are optional, except the file name.

    read_excel() important arguments

    • io
    • header
    • name
    • index_col

    io represents the file name and it is the mandatory argument. It is represented by a string value that specifies the path for the excel file: df = pd.read_excel(io ='countries.xlsx') header specifies the head value for the data frame and by default, its value is 0 , which represents that row 0 will be the header value for the data frame. If we set it to None then indexing values will be used as headers.

    import pandas as pd
    #access the excel sheet using python
    df = pd.read_excel(io ='countries.xlsx', header = None)
    
    print(df)
    print("The type of df is:",type(df) )

    Output

                  0                    1                    2                                        3
    
    0   CountryCode            ShortName            TableName                                 LongName
    
    1           AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan
    
    2           ALB              Albania              Albania                      Republic of Albania
    
    3           DZA              Algeria              Algeria  People's Democratic Republic of Algeria
    
    4           ASM       American Samoa       American Samoa                           American Samoa
    
    5           ADO              Andorra              Andorra                  Principality of Andorra
    
    6           AGO               Angola               Angola              People's Republic of Angola
    
    7           ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda
    
    8           ARB           Arab World           Arab World                               Arab World
    
    9           ARG            Argentina            Argentina                       Argentine Republic
    
    10          ARM              Armenia              Armenia                      Republic of Armenia
    
    The type of df is: <class 'pandas.core.frame.DataFrame'>

    The names represent a list of  values that specifies the header for the data frame if header value is None.

    import pandas as pd
    
    #access the excel sheet using python
    df = pd.read_excel(io ='countries.xlsx', header = None, names=["One", "Two", "Three", "Four"])
    
    print(df)

    Output

                One                  Two                Three                                     Four
    
    0   CountryCode            ShortName            TableName                                 LongName
    
    1           AFG          Afghanistan          Afghanistan             Islamic State of Afghanistan
    
    2           ALB              Albania              Albania                      Republic of Albania
    
    3           DZA              Algeria              Algeria  People's Democratic Republic of Algeria
    
    4           ASM       American Samoa       American Samoa                           American Samoa
    
    5           ADO              Andorra              Andorra                  Principality of Andorra
    
    6           AGO               Angola               Angola              People's Republic of Angola
    
    7           ATG  Antigua and Barbuda  Antigua and Barbuda                      Antigua and Barbuda
    
    8           ARB           Arab World           Arab World                               Arab World
    
    9           ARG            Argentina            Argentina                       Argentine Republic
    
    10          ARM              Armenia              Armenia                      Republic of Armenia

    index_col argument represents the first row labels of the data frame.

    import pandas as pd
    
    #access the excel sheet using python
    df = pd.read_excel(io ='countries.xlsx', index_col="TableName")
    
    print(df)

    Output

                        CountryCode            ShortName                                 LongName
    TableName
    Afghanistan                 AFG          Afghanistan             Islamic State of Afghanistan
    Albania                     ALB              Albania                      Republic of Albania
    Algeria                     DZA              Algeria  People's Democratic Republic of Algeria
    American Samoa              ASM       American Samoa                           American Samoa
    Andorra                     ADO              Andorra                  Principality of Andorra
    Angola                      AGO               Angola              People's Republic of Angola
    Antigua and Barbuda         ATG  Antigua and Barbuda                      Antigua and Barbuda
    Arab World                  ARB           Arab World                               Arab World
    Argentina                   ARG            Argentina                       Argentine Republic
    Armenia                     ARM              Armenia                      Republic of Armenia

    Conclusion

    Here in this Python tutorial, you learned how can you import an Excel file in python using pandas with the read_excel() method. Pandas also provide various methods, such as read_table, read_csv, read_json, and read_html to read and import tables, CSV, JSON, and HTML files, respectively.

    Before you use the read_excel() method in Python, ensure that all the other dependencies (NumPy and xlrd) have been installed in your Python environment.

    People are also reading:

    Leave a Comment on this Post

    0 Comments