5. CSV module

5.1. Introduction

In this chapter, we will see some of the features of the Python in-built CSV module. This module can be quite useful for processing the files as shown in this chapter.

5.2. Basic file operations

In this section, we will perform some operations on the file without using the CSV module. For this first create a file ‘price.csv’ with following contents in it,

date,metal,radius,price,quantity
"2016-06-12","Gold",5.5,80.99,1
"2015-07-13","Silver",40.3,5.5,3
"2016-01-21","Iron",9.2,14.29,8
"2014-03-23","Gold",8,120.3,2
"2017-09-11","Copper",4.1,70.25,12
"2011-01-20","Iron",3.25,10.99,3

5.2.1. Open and close the files

  • Next go to the folder, where the file ‘price.csv’ is saved and open Python shell there. And run the following commands.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    >>> f = open("price.csv", 'r')
    >>> f # it is a buffered text stream
    <_io.TextIOWrapper name='price.csv' mode='r' encoding='UTF-8'>
    >>> data = f.read() # read the buffer into data
    >>> print(data) # print the data
    date,metal,radius,price,quantity
    "2016-06-12","Gold",5.5,80.99,1
    "2015-07-13","Silver",40.3,5.5,3
    "2016-01-21","Iron",9.2,14.29,8
    "2014-03-23","Gold",8,120.3,2
    "2017-09-11","Copper",4.1,70.25,12
    "2011-01-20","Iron",3.25,10.99,3
    
  • We can access and print the individual lines as well, as shown below,

    >>> r = open('price.csv', 'r') # open in read mode
    >>> for line in r:
    ...     print(line)
    ...
    date,metal,radius,price,quantity
    
    "2016-06-12","Gold",5.5,80.99,1
    
    "2015-07-13","Silver",40.3,5.5,3
    
    "2016-01-21","Iron",9.2,14.29,8
    
    "2014-03-23","Gold",8,120.3,2
    
    "2017-09-11","Copper",4.1,70.25,12
    
    "2011-01-20","Iron",3.25,10.99,3
    
  • Next close the file. Once the file is closed than we can not perform further operation on buffer.

    >>> f.close()
    >>> r.close()
    
    >>> for line in r: # file is closed, therefore can not be accessed
    ...     print(line)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: I/O operation on closed file.
    
    >>> r
    <_io.TextIOWrapper name='price.csv' mode='r' encoding='UTF-8'>
    

5.2.2. with - open statement

In this section, ‘with - open’ statement is used to read the file.

Note

Do not forget to close the file using close() command. Further, file will be closed automatically, if it is open using ‘with’ statement, as shown in this section. This method is quite useful when we write the codes in the files, instead of Python-shell.

  • In the below code, the file is open using ‘with’ statement therefore it will be closed as soon as the ‘with’ statement ends. Therefore buffer will not be accessible outside the ‘with’ statement. For example in the below code, Line 13 generates the error, as buffer ‘w’ is outside the ‘with’ statement.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    >>> with open('price.csv', 'r') as w:
    ...     data = w.read()
    ...
    >>> print(data) # print the data
    date,metal,radius,price,quantity
    "2016-06-12","Gold",5.5,80.99,1
    "2015-07-13","Silver",40.3,5.5,3
    "2016-01-21","Iron",9.2,14.29,8
    "2014-03-23","Gold",8,120.3,2
    "2017-09-11","Copper",4.1,70.25,12
    "2011-01-20","Iron",3.25,10.99,3
    
    
    
    >>> for lines in w:  # file is already closed
    ...     print(lines)
    ...
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: I/O operation on closed file.
    >>>
    

5.3. Strings operations

We need to perform string-operations to the data for the further processing the data e.g. extracting the lines which contains “Gold” etc. In this section, we will see some of the string operations and the perform these operations on the file ‘print.csv’.

>>> m = "Hello World"
>>> print(m)
Hello World
>>> m[0] # print first character
'H'
>>> m[0:2] # print first 2 characters
'He'

>>> m[-1] # print last character
'd'
>>> m[-3:-1] # print 2nd and 3rd last (but not the last)
'rl'
>>> m[-3:] # print last 3 characters
'rld'

Lets see some more string operations as below. Please read the comments in the codes.

Commands Description
strip() remove end line character i.e. n
strip(‘”’) remove “
replace(‘”’, ‘-‘) replace ” with -
split(“,”) make list for data with separator ‘,’
>>> f = open('price.csv', 'r')
>>> for line in f:
...     print(line)
...
date,metal,radius,price,quantity

"2016-06-12","Gold",5.5,80.99,1

"2015-07-13","Silver",40.3,5.5,3

"2016-01-21","Iron",9.2,14.29,8

"2014-03-23","Gold",8,120.3,2

"2017-09-11","Copper",4.1,70.25,12

"2011-01-20","Iron",3.25,10.99,3


>>> line # loop store only one value
'"2011-01-20","Iron",3.25,10.99,3\n'

>>> # remove the end line character i.e. \n
>>> line.strip()
'"2011-01-20","Iron",3.25,10.99,3'
>>> line # strip operation does not save automatically
'"2011-01-20","Iron",3.25,10.99,3\n'
>>> line = line.strip() # save the split operation
>>> line
'"2011-01-20","Iron",3.25,10.99,3'

>>> line.replace('"', '-') # replace " with -
'-2011-01-20-,-Iron-,3.25,10.99,3'

>>> # create list and split at comma
>>> columns = line.split(',')
>>> columns
['"2011-01-20"', '"Iron"', '3.25', '10.99', '3']
>>> type(columns)
<class 'list'>

>>> # acess columns
>>> columns[0]
'"2011-01-20"'

>>> # remove " from the data
>>> for i, col in enumerate(columns):
...     columns[i] = col.strip('"')
...
>>> columns
['2011-01-20', 'Iron', '3.25', '10.99', '3']
>>> columns[0]
'2011-01-20'

>>> # all the items are string, therefore multiplaction can not be performed
>>> total_price = columns[3]*columns[4]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't multiply sequence by non-int of type 'str'

# convert string to 'float' and multiple
>>> total_price = float(columns[3])*int(columns[4])
>>> total_price
32.97

5.4. Perform calculations

In this section, we will write a Python script which will use all the operations of the previous section, to calculate the total cost of all the items.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# price.py

total_price = 0 # for all items in the list

with open('price.csv', 'r') as f: # open file in read mode
    header = next(rows) # skip line 1 i.e. header
    for line in f:
        line = line.strip() # remove \n
        columns = line.split(',') # split line on ,
        columns[0] = columns[0].strip('"') # remove " for metal
        columns[1] = float(columns[1]) # radius
        columns[2] = float(columns[2]) # price
        columns[3] = int(columns[3]) # quantity
        columns[4] = columns[4].strip('"') # date

        total_price += columns[2] * columns[3]

print("Total price =", total_price)

Following is the output of above listing,

$ python price.py
Total price = 1328.38

Important

Note that, ‘next(rows)’ is used in Listing 6.1 which return the next row of the reader’s iterable object as a list. Since, the first row is extracted and save in the header, therefore it will not be available inside the for loop.

5.5. Problem with current method

Create a file ‘price2.csv’ with following contents. Note that the contents of this file and the ‘price.csv’ are the same expect for the date, which is different format.

date,metal,radius,price,quantity
"Jun 12, 2016","Gold",5.5,80.99,1
"Jul 13, 2015","Silver",40.3,5.5,3
"Jan 21, 2016","Iron",9.2,14.29,8
"Mar 23, 2014","Gold",8,120.30,2
"Sep 11, 2017","Copper",4.1,70.25,12
"Jan 20, 2011","Iron",3.25,10.99,3

Now read the file ‘price2.csv’ in ‘price.py’ as below,

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
# price.py

total_price = 0 # for all items in the list

with open('price2.csv', 'r') as f: # open file in read mode
    header = next(rows) # skip line 1 i.e. header
    for line in f:
        line = line.strip() # remove \n
        columns = line.split(',') # split line on ,
        columns[0] = columns[0].strip('"') # date
        columns[1] = columns[1].strip('"') # remove " for metal
        columns[2] = float(columns[2]) # radius
        columns[3] = float(columns[3]) # price
        columns[4] = int(columns[4]) # quantity

        total_price += columns[3] * columns[4]

print("Total price = %10.2f" % total_price)

Next execute the code and following error will be generated.

$ python -i price.py
Traceback (most recent call last):
  File "price.py", line 12, in <module>
    columns[2] = float(columns[2]) # radius
ValueError: could not convert string to float: '"Gold"'

This error is generated because the date has one comma, because of which an additional column is added to the list, as shown below,

>>> columns
['Jun 12', ' 2016', '"Gold"', '5.5', '80.99', '1']

Note

One way to remove this problem is to redesign the code according to new date format. Note that, the dates are in standard formats in both the cases. Therefore we should think like this, “This is not a new problem as everything is in standard format, therefore there must be an standard way to solve this problem”. And look for the standard library or third party packages to solve the problem. Currently, we can solve this problem using CSV module, or we can use Pandas-library which is quite powerful to solve the problems in data-processing. Click here to learn the Pandas.

5.6. CSV module

In this section, we will see two functionalities of the CSV module i.e. ‘csv.reader’ and ‘csv.DictReader’.

5.6.1. csv.reader

  • Before modifying the Python script, let us see the functionality of the CSV module. Note that, in the below outputs, the stripping and splitting operations are performed by the CSV module itself.

    >>> import csv
    >>> f = open('price2.csv', 'r')
    >>> rows = csv.reader(f) # read the file using csv
    >>> for row in rows:
    ...     print(row)
    ...
    ['date', 'metal', 'radius', 'price', 'quantity']
    ['Jun 12, 2016', 'Gold', '5.5', '80.99', '1']
    ['Jul 13, 2015', 'Silver', '40.3', '5.5', '3']
    ['Jan 21, 2016', 'Iron', '9.2', '14.29', '8']
    ['Mar 23, 2014', 'Gold', '8', '120.30', '2']
    ['Sep 11, 2017', 'Copper', '4.1', '70.25', '12']
    ['Jan 20, 2011', 'Iron', '3.25', '10.99', '3']
    
  • Following is the Python script which can perform the calculation on both the files i.e. ‘price.csv’ and ‘price2.csv’.

    Listing 5.1 price calculation using csv module
    # price.py
    
    import csv
    
    total_price = 0 # for all items in the list
    
    with open('price2.csv', 'r') as f: # open file in read mode
        rows = csv.reader(f)
        header = next(rows) # skip line 1 i.e. header
        for row in rows:
            row[3] = float(row[3]) # price
            row[4] = int(row[4]) # quantity
    
            total_price += row[3] * row[4]
    
    print("Total price = %10.2f" % total_price)
    
  • Run the above script and we will get the following output,

    $ python price.py
    Total price =    1328.38
    

Important

Note that, when we use standard library then lots of task are reduced e.g. here we need not to perform any cleaning operation i.e. removing double-quotes and commas etc. Also, the code is shorter and cleaner when we used the CSV module.

5.6.2. csv.DictReader

The ‘.DictReader’ option is same as the ‘.reader’, but it maps the information into the dictionary, which enhances the data processing capabilities.

Note

  • ‘.reader’ returns a reader-object which iterates over the line of the csv file.
  • ‘.DictReader’ is similar to ‘.reader’ but maps the information in the dictionary.
  • In the below example, Python data structure are used, e.g. List and Set etc., which are discussed in next chapter.
  • Further, we will some more examples of ‘.DictReader’ after learning the ‘data structure’ and ‘functions’.
>>> import csv
>>> f = list(csv.DictReader(open('price.csv'))) # read DictReader in list
>>> f[0]  # first item in the list

OrderedDict([('date', '2016-06-12'), ('metal', 'Gold'), ('radius', '5.5'),
    ('price', '80.99'), ('quantity', '1')])

>>> [row['metal'] for row in f] # display all values i.e. List
['Gold', 'Silver', 'Iron', 'Gold', 'Copper', 'Iron']

>>> {row['metal'] for row in f} # display unique values i.e. Set
{'Silver', 'Copper', 'Iron', 'Gold'}

>>> g = [row for row in f if row['metal'] == 'Gold'] # read Gold entries
>>> len(g)  # total gold entries
2

>>> for item in g:  # print radius, price and quantity
...     print(item['radius'], item['price'], item['quantity'])
...
5.5 80.99 1
8 120.3 2

5.7. Conclusion

In this chapter, we saw the various ways to read the files. Also, we learn the usage of CSV module. Lastly, we saw some of the data structure available in Python, which will be discussed in details after the next chapter. In the next chapter, we will discussed the “functions” and “error handling”.