Errors are the bane of a programmer’s existence. You write an awesome piece of code, are ready to execute it and build a powerful machine learning model, and then poof. Python throws up an unexpected error, ending your hope of quick code execution. Every single one of us has faced this issue and emerged from it a better programmer. Handling bugs and exceptions is what builds our confidence in the long run and teaches us valuable lessons of programming in Python.
Python has rules for writing programs, such as not using a space in a variable name, adding a colon (:) after the if statement, and so on. If we don’t follow these rules, we run into syntax errors, and our program refuses to execute until we fix them.
But there are occasions when the program is syntactically correct, and it still throws up an error when we try to execute the program. What’s going on here? Well, these errors detected during the execution are called exceptions. And dealing with these errors is called exception handling.
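The difference between the two kinds of error can be seen in a small sketch. It assumes nothing beyond the standard library; the source strings are made up for illustration. The first string is rejected when Python compiles it, before any line runs, while the second compiles cleanly and only fails during execution:

```python
# A syntax error is detected when Python compiles the source, before anything runs.
bad_source = "if True\n    print('hello')"  # missing colon after the if statement

try:
    compile(bad_source, "<example>", "exec")
    syntax_ok = True
except SyntaxError as err:
    syntax_ok = False
    print("Syntax error:", err.msg)

# An exception is raised while syntactically valid code is executing.
good_source = "result = 10 / 0"
code = compile(good_source, "<example>", "exec")  # compiles without complaint

try:
    exec(code)
    runtime_ok = True
except ZeroDivisionError as err:
    runtime_ok = False
    print("Exception at run time:", err)
```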
We’ll be talking all about exception handling in Python here!
Why should you learn exception handling? Here is the answer using a two-pronged argument:
Here’s a list of the common exceptions you’ll come across in Python:
ZeroDivisionError: Raised when you try to divide a number by zero.
ImportError: Raised when you try to import a library that is not installed, or when you have provided the wrong name.
IndexError: Raised when an index is out of range for a sequence.
IndentationError: Raised when the code is not indented properly.
ValueError: Raised when a built-in function receives an argument of the right type but with an invalid value.
Exception: Base class for all built-in exceptions that do not exit the interpreter. If you are not sure which exception may occur, you can catch this base class, and it will handle all of them.
You can read about more common exceptions here.
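To see these exceptions in action, here is a minimal sketch that triggers each one on purpose and reports its name. The helper exception_name and the module name no_such_module_xyz are hypothetical, chosen only for this illustration. Note that a failed import reports ModuleNotFoundError, which is a subclass of ImportError:

```python
import importlib


def exception_name(action):
    """Run a zero-argument callable and return the name of the exception it raises."""
    try:
        action()
    except Exception as err:  # the base class catches every exception listed above
        return type(err).__name__
    return None


examples = {
    "divide by zero": lambda: 1 / 0,
    "import a missing module": lambda: importlib.import_module("no_such_module_xyz"),
    "index out of range": lambda: [1, 2, 3][10],
    "right type, invalid value": lambda: int("abc"),
    "bad indentation": lambda: compile("if True:\nprint('x')", "<example>", "exec"),
}

for label, action in examples.items():
    print(label, "->", exception_name(action))
```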
The try keyword is used in Python to handle exceptions or errors that may occur during the execution of a block of code. It allows you to catch and handle exceptions gracefully, preventing your program from crashing.
The code that might raise an exception is placed inside the try statement.
The code inside the try block is executed sequentially.
If an exception occurs inside the try block, the program jumps to the nearest except block.
The except block handles the exception and performs appropriate actions, such as logging an error message or taking corrective measures.
After executing the except block, the program runs the following code outside the try-except structure.
Let’s define a function to divide two numbers, a and b. It will work fine if the value of b is non-zero, but it will generate an error if the value of b is zero:
def division(a, b):
    return a / b

# the function works fine when you divide by a non-zero number
print(division(10, 2))
# >> 5.0
print(division(10, -3))
# >> -3.3333333333333335
# error when you try to divide the number by zero
print(division(10, 0))
# >> ZeroDivisionError: division by zero
We can handle this using the try and except statement. First, the try clause will be executed, which contains the statements between the try and except keywords.
If no exception occurs, the except clause will be skipped. On the other hand, if an exception occurs during the execution of the try clause, then the rest of the try statements will be skipped, and control will pass to the except block:
# define a function to divide two numbers
def division(a, b):
    # use the try statement where an error may occur
    try:
        return a / b
    # if the error occurs, handle it
    except ZeroDivisionError:
        print("Cannot divide by Zero!!")

print(division(10, 5))
## >> 2.0
division(10, 0)
## >> Cannot divide by Zero!!
The above program first gives the output 2.0 and then displays “Cannot divide by Zero!!”. This is because dividing any number by zero raises a ZeroDivisionError, which is handled by the except clause, so the print statement inside it runs.
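The control flow described above can be traced with a small sketch. The function trace_division is a hypothetical helper for this illustration; it records which steps run so you can see that the rest of the try clause is skipped once the exception occurs:

```python
def trace_division(a, b):
    """Return the list of steps that ran, to show where control goes."""
    steps = []
    try:
        steps.append("try: start")
        a / b  # may raise ZeroDivisionError
        steps.append("try: after division")  # skipped when the division fails
    except ZeroDivisionError:
        steps.append("except: handled")
    steps.append("after try/except")
    return steps


print(trace_division(10, 5))
# >> ['try: start', 'try: after division', 'after try/except']
print(trace_division(10, 0))
# >> ['try: start', 'except: handled', 'after try/except']
```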
In Python, we can also instruct a program to execute certain lines of code if no exception occurs, using the else clause. Now, if no exception occurs in the above code, we want to print “No Error occurred!!”.
Let’s see how to do this:
# define a function to divide two numbers
def division(a, b):
    # use the try statement where an error may occur
    try:
        print(a / b)
    # if the error occurs, handle it
    except ZeroDivisionError:
        print("Cannot divide by Zero!!")
    # if no error occurs
    else:
        print("No Error occurred!!")

division(10, 0)
# >> Cannot divide by Zero!!
division(10, 2)
# >> 5.0
# >> No Error occurred!!
Here the else clause executes at the end, and only if no exception occurs during the execution of the try clause.
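One reason to use else is to keep the try clause as small as possible, so that the except clause only catches the error you expect. Here is a minimal sketch, assuming a hypothetical helper parse_and_invert. Only the conversion is guarded, and an error raised inside the else clause is not swallowed by the except above it:

```python
def parse_and_invert(text):
    """Convert text to a number and return its reciprocal."""
    try:
        number = float(text)  # the only line expected to raise ValueError
    except ValueError:
        print("Not a number:", text)
        return None
    else:
        # Runs only when float() succeeded. A ZeroDivisionError raised here
        # is NOT caught by the except ValueError clause above.
        return 1 / number


print(parse_and_invert("4"))
# >> 0.25
print(parse_and_invert("abc"))
# >> Not a number: abc
# >> None
```

Calling parse_and_invert("0") raises ZeroDivisionError to the caller, which makes the unexpected case visible instead of hiding it.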
Now, what if we need some action to run whether or not an error occurred (like maintaining logs)? For this, we have the finally clause in Python. It always executes, irrespective of whether an exception occurred.
We will see in detail how to use the finally clause to write logs later in this article.
In the following example, the values of a and b are displayed after every execution, regardless of whether an error occurred.
# define a function to divide two numbers
def division(a, b):
    # use the try statement where an error may occur
    try:
        print(a / b)
    # if the error occurs, handle it
    except ZeroDivisionError:
        print("Cannot divide by Zero!!")
    else:
        print("No Error occurred!!")
    finally:
        print('Value of a', a, 'and b', b)

division(10, 0)
## >> Cannot divide by Zero!!
## >> Value of a 10 and b 0
division(10, 2)
## >> 5.0
## >> No Error occurred!!
## >> Value of a 10 and b 2
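The finally clause also runs when the function returns from inside try, and when an exception is not handled at all and travels up to the caller. A small sketch, using a hypothetical safe_divide function and an events list for bookkeeping:

```python
events = []


def safe_divide(a, b):
    try:
        return a / b
    finally:
        # Runs before the function returns and before an unhandled
        # exception propagates to the caller.
        events.append(f"finished dividing {a} by {b}")


print(safe_divide(10, 2))
# >> 5.0

try:
    safe_divide(10, 0)
except ZeroDivisionError:
    print("The error reached the caller")

print(events)
# >> ['finished dividing 10 by 2', 'finished dividing 10 by 0']
```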
So far, we have seen exception handling on some random data. How about turning the lever up a notch? Let’s understand this using a real-life example!
We have data that contains the details of employees, such as their education, age, number of trainings undertaken, etc. The data is divided into multiple directories region-wise. The details of employees belonging to the same region are stored in the same file.
Now, our task is to read all the files and concatenate their data to form a single file. Let’s start by importing some of the required libraries.
To view the directory structure, we will use the glob library, and to read the CSV files, we will use the pandas library:
# import the required libraries
import glob
import pandas as pd
View the directory structure using the glob.glob method and the path of the target directory. You can download the directory structure here.
# list all the folders in the dataset directory
for directory in glob.glob('dataset/*'):
    print(directory)
We can see that the folder names are numbers. In the next step, we will go through each of the directories and list the files present:
for directory in glob.glob('dataset/*'):
    for files in glob.glob(directory + '/*'):
        print(files)
In each of the folders, there is a CSV file present that contains the details of the employees of that particular region. You can open and view any CSV file. Below is the image of how the data looks in the region_1.csv file. It contains details of employees belonging to region 1:
Now, we know that there is a pattern in the directory and filename structure. Directory n contains a CSV file named region_n.csv. We will try to read all these files using a loop.
We know that the maximum number is 34, so we will use a for loop to iterate and read the files in order:
for folder in range(1, 35):
    file_name = 'region_' + str(folder) + '.csv'
    data = pd.read_csv('dataset/' + str(folder) + '/' + file_name)
Since the file named region_7.csv is not present, the loop stops with an error. One of the simpler ways to deal with this is to put an if condition in the program, i.e., if the directory name is 7, then skip reading from that file.
But what if we have to read thousands of files together? It would be a tedious task to update the if condition every time we get an error.
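For comparison, the if-based workaround could look roughly like the sketch below. The names KNOWN_MISSING and build_paths are hypothetical. The point is that the set of folders to skip has to be maintained by hand, and it only covers problems you already know about:

```python
import os

# Hypothetical: folders we already know are missing. Every newly discovered
# problem means editing this set and running the program again.
KNOWN_MISSING = {7}


def build_paths(first=1, last=34, root="dataset"):
    """Build the CSV paths for folders first..last, skipping the known-missing ones."""
    paths = []
    for folder in range(first, last + 1):
        if folder in KNOWN_MISSING:
            continue
        paths.append(os.path.join(root, str(folder), f"region_{folder}.csv"))
    return paths


paths = build_paths()
print(len(paths))
# >> 33
```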
Here, we will use the try and except statement to handle the errors. If there is any exception at run time while reading a file, we will just skip that step and continue with the next folder. We will display the file name with “File not found!!” if the error is FileNotFoundError, and print the file name with “Another Error!!” if any other error occurs.
# create a list to store the dataframes
dataframe_list = []
# iterate through folders 1 to 34
for folder in range(1, 35):
    # try to read the file
    try:
        # for folder i, the file is named "region_i.csv"
        file_name = 'region_' + str(folder) + '.csv'
        # read the file if possible
        data = pd.read_csv('dataset/' + str(folder) + '/' + file_name)
        dataframe_list.append(data)
    # if the file is missing, report it and move on to the next folder
    except FileNotFoundError:
        print('File not found!!', file_name)
    # if any other error occurs, print the file name
    except Exception:
        print('Another Error!!', file_name)
        continue
We can see that we got “Another Error!!” in file number 32. Let’s try to read this file separately:
pd.read_csv('dataset/32/region_32.csv')
There are some issues with File 32’s format. If you open the file region_32.csv, you will see that there are some comments added on top of the file, and we can read that file using the skiprows parameter:
Let’s see how to handle this.
We will create two boolean variables, namely parse_error and file_not_found, and initialize both of them as False at the start of each iteration. If we get a FileNotFoundError, we’ll set file_not_found to True.
Then, we will print that particular file as missing and skip that iteration. If we get a ParserError, we’ll set parse_error to True, display that the particular file has an incorrect format, and read the file again using the skiprows parameter.
If no exception occurs, the code under the else statement will execute. In the else statement, we will append the data frame to the data frame list:
from pandas.errors import ParserError

# create a list to store the dataframes
dataframe_list = []
# iterate through the folders 1 to 34
for folder in range(1, 35):
    # create two boolean variables, one for each kind of exception
    parse_error = False
    file_not_found = False
    # for folder i, the file is named "region_i.csv"
    try:
        file_name = 'region_' + str(folder) + '.csv'
        data = pd.read_csv('dataset/' + str(folder) + '/' + file_name)
    # if the error is ParserError, set parse_error = True, read the file
    # again while skipping the comment rows, and report the incorrect format
    except ParserError:
        parse_error = True
        data = pd.read_csv('dataset/' + str(folder) + '/' + file_name, skiprows=4)
        dataframe_list.append(data)
        print(file_name, 'has incorrect format.')
    # if the error is FileNotFoundError, set file_not_found = True and report the missing file
    except FileNotFoundError:
        file_not_found = True
        print(file_name, 'is missing')
    # if no exception occurs, append the dataframe to the list
    else:
        dataframe_list.append(data)
Output:
region_7.csv is missing
region_29.csv is missing
region_32.csv has incorrect format.
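If you don’t have the dataset at hand, the retry with skiprows can be tried on a small temporary file. This is only a sketch under stated assumptions: the file contents and column names below are made up for illustration, with four note lines placed above the real header, similar to what the article describes for region_32.csv. ParserError is imported from pandas.errors:

```python
import os
import tempfile

import pandas as pd
from pandas.errors import ParserError

# Hypothetical contents: four note lines, followed by the actual CSV data.
content = (
    "Employee export\n"
    "Region 32\n"
    "Generated for illustration\n"
    "Do not edit by hand\n"
    "employee_id,age,no_of_trainings\n"
    "1,35,2\n"
    "2,41,1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "region_32.csv")
    with open(path, "w") as handle:
        handle.write(content)

    try:
        data = pd.read_csv(path)
        needed_retry = False
    except ParserError:
        # The note lines have one field each, so the three-field header
        # makes the parser fail. Skip the four note lines and read again.
        needed_retry = True
        data = pd.read_csv(path, skiprows=4)

print(needed_retry)
print(list(data.columns))
```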
Let’s say you want to create a log file to keep track of which files are correct and which ones have errors. This is one way in which the finally statement can be put to use. Whether you get an error or not, the finally statement will execute.
So in the finally clause, we will write the status of the file to the log at each iteration:
from pandas.errors import ParserError

# create a list to store the dataframes
dataframe_list = []
# open the log file in append mode
file = open('log_file.txt', 'a+')
# iterate through the folders 1 to 34
for folder in range(1, 35):
    # create two boolean variables, one for each kind of exception
    parse_error = False
    file_not_found = False
    # for folder i, the file is named "region_i.csv"
    try:
        file_name = 'region_' + str(folder) + '.csv'
        data = pd.read_csv('dataset/' + str(folder) + '/' + file_name)
    # if the error is ParserError, set parse_error = True, read the file
    # again while skipping the comment rows, and report the incorrect format
    except ParserError:
        parse_error = True
        data = pd.read_csv('dataset/' + str(folder) + '/' + file_name, skiprows=4)
        dataframe_list.append(data)
        print(file_name, 'has incorrect format.')
    # if the error is FileNotFoundError, set file_not_found = True and report the missing file
    except FileNotFoundError:
        file_not_found = True
        print(file_name, 'is missing')
    # if no exception occurs, append the dataframe to the list
    else:
        dataframe_list.append(data)
    # in the end, update the log file whether or not an error occurred
    finally:
        if parse_error:
            file.write(file_name + ' has incorrect format\n')
        elif file_not_found:
            file.write(file_name + ' is missing.\n')
        else:
            file.write(file_name + ' is correct.\n')
# close the log file once all folders are processed
file.close()
Output:
region_1.csv is correct.
region_2.csv is correct.
region_3.csv is correct.
region_4.csv is correct.
region_5.csv is correct.
.
.
.
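The same logging pattern can be tried without the dataset. The sketch below is an illustration under stated assumptions, not the article’s code: read_with_log is a hypothetical helper, it reads plain text with open instead of pandas, and it uses a with block so the log file is closed automatically. The finally clause still writes one status line per file:

```python
import os
import tempfile


def read_with_log(paths, log_path):
    """Try to read each path and record the outcome of every attempt."""
    contents = []
    with open(log_path, "a") as log:  # closed automatically, even on error
        for path in paths:
            status = "is correct."
            try:
                with open(path) as handle:
                    contents.append(handle.read())
            except FileNotFoundError:
                status = "is missing."
            finally:
                # Runs on success and on failure alike.
                log.write(f"{os.path.basename(path)} {status}\n")
    return contents


with tempfile.TemporaryDirectory() as tmp:
    present = os.path.join(tmp, "region_1.csv")
    with open(present, "w") as handle:
        handle.write("employee_id,age\n1,35\n")
    missing = os.path.join(tmp, "region_7.csv")  # never created
    log_path = os.path.join(tmp, "log_file.txt")

    contents = read_with_log([present, missing], log_path)
    with open(log_path) as log:
        log_lines = log.read().splitlines()

print(log_lines)
# >> ['region_1.csv is correct.', 'region_7.csv is missing.']
```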
Exception handling, when used right, prevents unexpected exits from the program. I used exception handling while scraping data from multiple webpages. I’d love to know how you use exception handling, so comment below your use case and share it with the community.