
CW MD Jahid Hasan 2024


MINISTRY OF SCIENCE AND HIGHER EDUCATION OF THE RUSSIAN FEDERATION

FEDERAL STATE BUDGET EDUCATIONAL INSTITUTION OF HIGHER EDUCATION

VORONEZH STATE UNIVERSITY OF FOREST AND TECHNOLOGY NAMED AFTER G.F. MOROZOV

Department of Computer Technology and Microelectronic Engineering

EXPLANATORY NOTE
COURSE WORK: CSV (Comma Separated Values) Module in Python
Class: Basics of programming

The task was performed by the student of the ПиЦТ2-231-OБ group

___________________
MD JAHID HASAN

________________________
Verified by: Ph.D. E.A. Anikeev
Voronezh – 2024
Table of Contents
Introduction
1. Working with CSV files in Python
2. Analysis of information sources on CSV Module in Python
2.1. Python Documentation
2.2. Free Code Camp
2.3. Geeks for Geeks
2.4. Study Tonight
3. Description of the program
3.1. Introduction
3.2. Objective
3.3. Required Components
3.4. Program Functionality
3.5. Algorithm
3.6. Block Diagram
4. Description of the programming language tools
4.1. Program Code
4.2. Output - Program Result
4.3. Describe Output
4.4. Functions Used
4.5. Libraries Used
Conclusion
Literature
Appendix – Program Code

Introduction
The so-called CSV (Comma Separated Values) format is the most common import and
export format for spreadsheets and databases. CSV format was used for many years prior
to attempts to describe the format in a standardized way in RFC 4180. The lack of a
well-defined standard means that subtle differences often exist in the data produced and
consumed by different applications. These differences can make it annoying to process
CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the
overall format is similar enough that it is possible to write a single module which can
efficiently manipulate such data, hiding the details of reading and writing the data from the
programmer.

The csv module implements classes to read and write tabular data in CSV format. It allows
programmers to say, “write this data in the format preferred by Excel,” or “read data from
this file which was generated by Excel,” without knowing the precise details of the CSV
format used by Excel. Programmers can also describe the CSV formats understood by
other applications or define their own special-purpose CSV formats.

The csv module’s reader and writer objects read and write sequences. Programmers can
also read and write data in dictionary form using the DictReader and DictWriter classes.

csv.reader(csvfile, dialect='excel', **fmtparams)

Return a reader object that will process lines from the given csvfile. A csvfile must be an
iterable of strings, each in the reader’s defined csv format. A csvfile is most commonly a
file-like object or list. If csvfile is a file object, it should be opened with newline=''. An
optional dialect parameter can be given which is used to define a set of parameters specific
to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one
of the strings returned by the list_dialects() function. The other optional fmtparams
keyword arguments can be given to override individual formatting parameters in the
current dialect. For full details about the dialect and formatting parameters, see section
Dialects and Formatting Parameters.

Each row read from the csv file is returned as a list of strings. No automatic data type
conversion is performed unless the QUOTE_NONNUMERIC format option is specified
(in which case unquoted fields are transformed into floats).
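
As a minimal sketch of the behaviour described above, the reader can be fed any iterable of strings, so a plain list can stand in for an open file. The sample rows below are made up for illustration and are not part of the course program.

```python
import csv

# Hypothetical sample data: a list of strings stands in for an open file.
lines = ['"name","price"', '"tea",2.5', '"milk",1']

# Default behaviour: every field comes back as a string.
plain_rows = list(csv.reader(lines))

# With QUOTE_NONNUMERIC, unquoted fields are converted to float.
typed_rows = list(csv.reader(lines, quoting=csv.QUOTE_NONNUMERIC))

print(plain_rows)
print(typed_rows)
```

Without the option, numeric conversion has to be done by the program itself, for example with int() or float() on the individual fields.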

1. Working with CSV files in Python
Python is widely used by data scientists and programmers who need to handle many kinds of data. CSV (Comma-Separated Values) is one of the most prevalent and accessible file formats for storing and exchanging tabular data.

A CSV file stores tabular data (numbers and text), such as the contents of a spreadsheet or a database table, in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. For working with CSV files, Python has a built-in module called csv.

Reading from a CSV file is done using a reader object. The CSV file is opened as a text file in 'r' (read) mode with Python's built-in open() function, which returns a file object. The file object is then passed to the reader() function of the csv module, which returns a reader object that iterates over the lines of the CSV document.

Typical tasks when working with CSV files include:

● Reading a CSV file
● Reading CSV Files Into a Dictionary With csv
● Writing to a CSV file
● Writing a dictionary to a CSV file
● Reading CSV Files With Pandas
● Writing CSV Files With Pandas
● Storing email in CSV file

2. Analysis of information sources on CSV Module in Python
2.1. Python Documentation
The csv module implements classes to read and write tabular data in CSV format. It allows
programmers to say, “write this data in the format preferred by Excel,” or “read data from
this file which was generated by Excel,” without knowing the precise details of the CSV
format used by Excel. Programmers can also describe the CSV formats understood by
other applications or define their own special-purpose CSV formats.

The csv module’s reader and writer objects read and write sequences. Programmers can
also read and write data in dictionary form using the DictReader and DictWriter classes.

Module Contents

The csv module defines the following functions:

csv.reader(csvfile, dialect='excel', **fmtparams)


Return a reader object that will process lines from the given csvfile. A csvfile must be an
iterable of strings, each in the reader’s defined csv format. A csvfile is most commonly a
file-like object or list. If csvfile is a file object, it should be opened with newline=''. An
optional dialect parameter can be given which is used to define a set of parameters specific
to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one of
the strings returned by the list_dialects() function. The other optional fmtparams keyword
arguments can be given to override individual formatting parameters in the current dialect.
For full details about the dialect and formatting parameters, see section Dialects and
Formatting Parameters. Each row read from the csv file is returned as a list of strings. No
automatic data type conversion is performed unless the QUOTE_NONNUMERIC format
option is specified (in which case unquoted fields are transformed into floats).

A short usage example:

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam

csv.writer(csvfile, dialect='excel', **fmtparams)


Return a writer object responsible for converting the user’s data into delimited strings on
the given file-like object. csvfile can be any object with a write() method. If csvfile is a file
object, it should be opened with newline=''. An optional dialect parameter can be given
which is used to define a set of parameters specific to a particular CSV dialect. It may be
an instance of a subclass of the Dialect class or one of the strings returned by the
list_dialects() function. The other optional fmtparams keyword arguments can be given to
override individual formatting parameters in the current dialect. For full details about
dialects and formatting parameters, see the Dialects and Formatting Parameters section. To
make it as easy as possible to interface with modules which implement the DB API, the
value None is written as the empty string. While this isn’t a reversible transformation, it
makes it easier to dump SQL NULL data values to CSV files without preprocessing the
data returned from a cursor.fetch* call. All other non-string data are stringified with str()
before being written.

A short usage example:

import csv
with open('eggs.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
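
The handling of None and other non-string values described above can be checked with a small sketch. An in-memory io.StringIO buffer is used here in place of a real file, and the row values are invented for illustration.

```python
import csv
import io

# An in-memory text buffer stands in for a file opened with newline=''.
buffer = io.StringIO()
writer = csv.writer(buffer)

# None becomes an empty field; the int and the float go through str().
writer.writerow(["Spam", None, 3, 4.5])

written = buffer.getvalue()
print(repr(written))
```

Because None and an empty string produce the same output, the original distinction cannot be recovered when the file is read back.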

Dialects and Formatting Parameters

To make it easier to specify the format of input and output records, specific formatting
parameters are grouped together into dialects. A dialect is a subclass of the Dialect class
containing various attributes describing the format of the CSV file. When creating reader
or writer objects, the programmer can specify a string or a subclass of the Dialect class as
the dialect parameter. In addition to, or instead of, the dialect parameter, the programmer
can also specify individual formatting parameters, which have the same names as the
attributes defined below for the Dialect class.

Dialects support the following attributes:

Dialect.delimiter
A one-character string used to separate fields. It defaults to ','.

Dialect.doublequote
Controls how instances of quotechar appearing inside a field should themselves be quoted.
When True, the character is doubled. When False, the escapechar is used as a prefix to the
quotechar. It defaults to True.

On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field.
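
The three doublequote cases can be compared side by side in a short sketch. The field value and the backslash escapechar are illustrative choices, and io.StringIO buffers stand in for files.

```python
import csv
import io

field = 'say "hi"'  # hypothetical field containing the quote character

# Default (doublequote=True): the embedded quote is doubled.
doubled = io.StringIO()
csv.writer(doubled).writerow([field])

# doublequote=False with an escapechar: the quote is prefixed instead.
escaped = io.StringIO()
csv.writer(escaped, doublequote=False, escapechar="\\").writerow([field])

# doublequote=False without an escapechar: the writer raises csv.Error.
try:
    csv.writer(io.StringIO(), doublequote=False).writerow([field])
    raised = False
except csv.Error:
    raised = True

# Reading back with matching parameters recovers the original field.
escaped.seek(0)
round_trip = next(csv.reader(escaped, doublequote=False, escapechar="\\"))

print(repr(doubled.getvalue()), repr(escaped.getvalue()), raised, round_trip)
```

The reader must be given the same doublequote and escapechar settings as the writer, otherwise the escaped quotes are not decoded correctly.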

2.2. Free Code Camp

CSV is an acronym for comma-separated values. It's a file format that you can use to store
tabular data, such as in a spreadsheet. You can also use it to store data from a tabular
database. We can refer to each row in a CSV file as a data record. Each data record consists
of one or more fields, separated by commas. This article shows you how to use the Python
built-in module called csv to create CSV files. In order to fully comprehend this tutorial,
you should have a good understanding of the fundamentals of the Python programming
language.

The csv module offers two ways to write data to CSV:

● the csv.writer function
● the csv.DictWriter class

You can use csv.writer to write data into a CSV file. It returns a writer object, which you can then use to convert data into delimited strings. To ensure that newline characters inside quoted fields are interpreted correctly, open the CSV file object with newline=''.

The syntax for csv.writer is as follows:

csv.writer(csvfile, dialect='excel', **fmtparams)

import csv

with open('profiles1.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    field = ["name", "age", "country"]

    # Write the header row first, then one row per person.
    writer.writerow(field)
    writer.writerow(["Oladele Damilola", "40", "Nigeria"])
    writer.writerow(["Alina Hricko", "23", "Ukraine"])
    writer.writerow(["Isabel Walter", "50", "United Kingdom"])
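
The csv.DictWriter class listed above is not illustrated in this section, so the following is a minimal sketch of how it might be used with the same columns. The in-memory buffer is an assumption made to keep the example self-contained; a real program would pass a file opened with newline='' instead.

```python
import csv
import io

# Records reusing the column names and values from the example above.
profiles = [
    {"name": "Oladele Damilola", "age": "40", "country": "Nigeria"},
    {"name": "Alina Hricko", "age": "23", "country": "Ukraine"},
]

# An in-memory buffer stands in for a file opened with 'w' and newline=''.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["name", "age", "country"])
writer.writeheader()        # first line: the column names
writer.writerows(profiles)  # one line per dictionary

print(buffer.getvalue())
```

The fieldnames list fixes the column order, so the dictionaries do not need to store their keys in any particular order.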

2.3. Geeks for Geeks

A CSV (Comma Separated Values) file is a plain text document that uses a
particular format to organize tabular information. It is a delimited text
document that uses a comma to separate the values. Every row in the document is a data
record. Each record is composed of one or more fields, divided by commas. It is the most popular
file format for importing and exporting spreadsheets and databases.

Reading a CSV File


There are various ways to read a CSV file in Python that use either the CSV module or the
pandas library.
● csv Module: The CSV module is one of the modules in Python that provides classes
for reading and writing tabular information in CSV file format.
● pandas Library: The pandas library is one of the open-source Python libraries that
provide high-performance, convenient data structures and data analysis tools and
techniques for Python programming.

Using csv.reader()

First, the CSV file is opened using the open() function in 'r' mode (read mode), which
returns a file object. The file object is then passed to the reader() function of the csv
module, which returns a reader object that iterates over the lines in the specified CSV
document.
Note: The ‘with’ keyword is used along with the open() method as it simplifies exception
handling and automatically closes the CSV file.
Example: This code reads and prints the contents of a CSV file named ‘Giants.csv’ using
the csv module in Python. It opens the file in read mode, reads the lines, and prints them
one by one using a for loop. The csv.reader() function is used to read the CSV file, and the
data from each row is printed to the console.

import csv
with open('Giants.csv', mode='r', newline='') as file:
    csvFile = csv.reader(file)
    # Each row is a list of strings.
    for lines in csvFile:
        print(lines)

Using csv.DictReader() class

This is similar to the previous method. The CSV file is first opened using the open() function and
is then read using the DictReader class of the csv module, which works like a regular reader but maps
the information in each row to a dictionary. The very first line of the file provides the
dictionary keys.
Example: This code reads and prints the contents of a CSV file named ‘Giants.csv’ using the
csv module with DictReader. It opens the file in read mode, reads the lines, and prints them one
by one. csv.DictReader() reads the CSV file and treats the first row as headers, creating a
dictionary for each row where the header values are the keys. The code prints each row as a
dictionary, making it easier to work with structured CSV data.

import csv
with open('Giants.csv', mode='r', newline='') as file:
    csvFile = csv.DictReader(file)
    # Each row is a dictionary keyed by the header values.
    for lines in csvFile:
        print(lines)

Using pandas.read_csv() method

It is very easy and simple to read a CSV file using pandas library functions. Here read_csv()
method of pandas library is used to read data from CSV files.
Example: This code uses the pandas library to read and display the contents of a CSV file
named ‘Giants.csv.’ It reads the CSV file and stores it as a DataFrame using the
pandas.read_csv() function. Finally, it prints the entire DataFrame, which provides a
structured and tabular representation of the CSV data. This is a common approach when
working with tabular data in Python, as pandas offers powerful tools for data manipulation and
analysis.

import pandas
csvFile = pandas.read_csv('Giants.csv')
print(csvFile)

2.4. Study Tonight

CSV stands for Comma Separated Values. The file uses a separator character, called a
delimiter, to separate each value. CSV is a typical format for information exchange because it is
compact, straightforward, and widely supported. Each line of the file is a data record, so the
data is organized in rows and columns. Each record comprises one or
more fields, separated by commas.

Even for a table with thousands of records, the .csv file separates the values
with commas into distinguishable fields or columns. Normally, the first line gives the headings
or column names of the data, and the actual data set is listed after it.

To read data from the CSV file, we must use the reader function to generate a reader object.
We use python’s open() function to open a text file, which returns a file object. This is then
passed to the reader.

The csv.reader() method returns a reader object which will iterate over each line in the given
CSV file. Each row read from the CSV file is returned as a list of strings.

import csv

with open('model.csv', newline='') as file:
    data = csv.reader(file)
    for row in data:
        print(row)

Instead of printing a list of individual string elements, each row of CSV data can be read
as a dictionary (an OrderedDict in Python 3.6 and 3.7, a regular dict since Python 3.8).

The csv.DictReader class creates an object that operates like a regular reader but maps
the information in each row to a dictionary whose keys are given by the optional fieldnames
parameter. If fieldnames is omitted, the first line of the CSV file is used as the keys.

# import necessary modules
import csv

# The with statement closes the file when the block ends.
with open("model.csv", newline='') as file:
    data = csv.DictReader(file)
    for row in data:
        print(row)
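
The optional fieldnames parameter matters when a file has no header row. The sketch below assumes such a headerless input and uses a list of strings in place of a file; the column names "name" and "age" are invented for illustration.

```python
import csv

# Hypothetical headerless data; a list of strings stands in for a file.
lines = ["Alice,30", "Bob,25"]

# Without a header row, the keys are supplied through fieldnames.
reader = csv.DictReader(lines, fieldnames=["name", "age"])
records = [dict(row) for row in reader]

print(records)
```

When fieldnames is given, the first line is treated as data rather than as a header.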

When a CSV file has initial spaces, quotes around each entry, or a delimiter other than a
comma, the csv.register_dialect() function can be used to define a custom dialect.

Syntax : csv.register_dialect(name[, dialect[, **fmtparams]])

The custom dialect requires a name in the form of a string. Other specifications can be given
either by passing a subclass of the Dialect class, or as individual formatting parameters.

import csv

csv.register_dialect('myDialect', delimiter='|', skipinitialspace=True,
                     quoting=csv.QUOTE_ALL)

with open('model.csv', 'r', newline='') as csvfile:
    reader = csv.reader(csvfile, dialect='myDialect')
    for row in reader:
        print(row)

While creating the reader object, we pass dialect='myDialect' to specify that the reader
instance must use that particular dialect. The advantage of using a dialect is that it makes the
program more modular. In summary, this source shows several ways to read a CSV file with the
csv module: the csv reader object, the DictReader object, and a custom dialect based on the
Dialect class. It also uses some custom parsing code to parse CSV files and other
text files.
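
To see what the registered dialect changes, the sketch below parses one pipe-delimited line held in memory. The dialect name 'pipes' and the sample line are illustrative assumptions, not part of the cited article.

```python
import csv

# 'pipes' is an illustrative dialect name, not one defined by the csv module.
csv.register_dialect("pipes", delimiter="|", skipinitialspace=True,
                     quoting=csv.QUOTE_ALL)

# Hypothetical sample line: quoted fields, pipe-delimited, spaces after pipes.
lines = ['"Alice"| "30"| "New York"']

rows = list(csv.reader(lines, dialect="pipes"))
registered = "pipes" in csv.list_dialects()

print(rows)
print(registered)
```

Because skipinitialspace is True, the space after each delimiter is dropped before the quoted field is parsed.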

3. Description of the program
3.1. Introduction:
The CSV (Comma Separated Values) module in Python is a powerful and flexible library
designed to facilitate reading from and writing to CSV files. CSV files are a widely used
format for storing tabular data, where each line in the file represents a row, and columns
within that row are separated by commas or other delimiters.

3.2. Objective:
The objective of this document and program is to provide a concise and practical guide on
using Python's built-in `csv` module to read from and write to CSV files. It aims to
demonstrate how to efficiently handle CSV data by showcasing basic usage of
`csv.reader`, `csv.writer`, `csv.DictReader`, and `csv.DictWriter`, and by providing a
concrete example where data is filtered based on a condition. Additionally, the guide
highlights customization options for delimiters, quote characters, and line terminators,
enabling users to adapt the CSV handling to various formats and requirements.
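
As a rough illustration of the customization options mentioned above, the sketch below writes one row with a semicolon delimiter, a single-quote quote character, and a "\n" line terminator. These particular settings and the row values are assumptions chosen for the example.

```python
import csv
import io

buffer = io.StringIO()

# Illustrative settings: semicolon delimiter, single-quote quotechar,
# and "\n" instead of the default "\r\n" line terminator.
writer = csv.writer(buffer, delimiter=";", quotechar="'", lineterminator="\n")
writer.writerow(["Alice", "New York; NY", 30])

custom_output = buffer.getvalue()
print(repr(custom_output))
```

Only the field that contains the delimiter is quoted, because the default quoting mode is QUOTE_MINIMAL.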

3.3. Required Components:


The program requires the following components:

● Python Interpreter: To execute the Python code.


● Text Editor or Integrated Development Environment (IDE): To write and edit the Python
code.

3.4. Program Functionality:
The program reads data from an input CSV file, filters out rows where the age is less than
30, and writes the filtered data to a new output CSV file. This is achieved through the
following steps:

1. Read Data from Input CSV:


● Opens the input CSV file using csv.DictReader, which reads the file into a
dictionary format where each row is represented as a dictionary with keys
corresponding to the column headers.
2. Filter Data:
● Iterates through the rows of the input CSV data and applies a filter condition (Age
>= 30). Only rows that meet this condition are retained for further processing.
3. Write Filtered Data to Output CSV:
● Opens the output CSV file using csv.DictWriter and writes the header (column
names).
● Writes the filtered rows to the output CSV file.

3.5. Algorithm:
1. Start
2. Define Function filter_csv(input_file, output_file):
● Open Input File:
1. Use with open(input_file, newline='') as infile to open the input CSV file
for reading.
● Open Output File:
1. Use with open(output_file, 'w', newline='') as outfile to open the output
CSV file for writing.
● Initialize CSV Reader:
1. Initialize csv.DictReader(infile) to read the input file as dictionaries.
● Initialize CSV Writer:
1. Initialize csv.DictWriter(outfile, fieldnames=reader.fieldnames) to write to
the output file using the same headers as the input file.
● Write Header to Output File:
1. Use writer.writeheader() to write the column headers to the output file.
● Filter and Write Rows:
1. For each row in reader:
● Check if int(row['Age']) >= 30.
● If the condition is true, write the row to the output file using
writer.writerow(row).

3. Main Execution:
● Call filter_csv Function:
1. Call filter_csv('input.csv', 'output.csv') with the appropriate file names.
● Print Completion Message:
1. Print "Filtered data has been written to output.csv" to indicate the process is
complete.
4. End

3.6. Block Diagram:

The block diagram below summarizes the flow of the CSV filtering program, following the
steps of the algorithm outlined above.

Start

-------------------------

Define function filter_csv(input_file, output_file)

-------------------------

Read input CSV file and open output CSV file for writing

-------------------------

Filter data from input CSV file

-------------------------

Write filtered data to output CSV file

-------------------------

Print completion message

-------------------------

End

Figure 1 – Block Diagram of the CSV filtering program


4. Description of the programming language tools
The CSV filtering program leverages Python's built-in tools, including the `csv` module for
CSV file handling, the `with` statement for context management to ensure proper file
closure, the `open()` function for file operations, and the `int()` function for converting
string age values to integers. These language tools collectively enable the program to
efficiently read data from an input CSV file, filter rows based on a specific condition (age
greater than or equal to 30), and write the filtered data to a new output CSV file. This
demonstrates Python's versatility and simplicity in handling common data manipulation
tasks, making it a powerful choice for handling CSV data and other file-related operations.
4.1. Program Code:
import csv

def filter_csv(input_file, output_file):
    # Open both files in one with statement so they are closed automatically.
    with open(input_file, newline='') as infile, open(output_file, 'w', newline='') as outfile:
        reader = csv.DictReader(infile)
        # Reuse the input header for the output file.
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Keep only rows where Age is 30 or greater.
            if int(row['Age']) >= 30:
                writer.writerow(row)

if __name__ == "__main__":
    filter_csv('input.csv', 'output.csv')
    print("Filtered data has been written to output.csv")
Figure 2 – CSV filtering program in Python
4.2. Output - Program Result:

Input CSV (input.csv):

Name,Age,City
Alice,30,New York
Bob,25,Los Angeles
Charlie,35,Chicago
Diana,28,Houston
Eve,40,Seattle

Output CSV (output.csv):

Name,Age,City
Alice,30,New York
Charlie,35,Chicago
Eve,40,Seattle

4.3. Describe Output
filter_csv(input_file, output_file): Opens the input CSV file for reading and the output CSV
file for writing.

● Uses csv.DictReader to read the input CSV file as dictionaries.


● Uses csv.DictWriter to write the filtered data to the output CSV file.
● Writes the header (column names) to the output file.
● Iterates through each row, and writes rows where the age is 30 or greater to the
output file.

4.4. Functions Used:

1. open(): Used to open files in different modes ('r' for reading, 'w' for writing, 'a' for
appending). It returns a file object.
2. csv.DictReader(): Creates a reader object that iterates over lines in the CSV file,
interpreting each line as a dictionary where the keys are column headers and the values
are the corresponding row values.
3. csv.DictWriter(): Creates a writer object that enables writing data to a CSV file in
dictionary format. It requires specifying the fieldnames (column names) for the CSV file.
4. writer.writeheader(): Writes the header (fieldnames) to the output CSV file.
5. writer.writerow(row): Writes a row of data (a dictionary) to the output CSV file.
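
The functions listed above can also be combined into a slightly more defensive variant. The sketch below is not the course program: filter_rows, its min_age parameter, and the extra "Frank" row with a non-numeric age are hypothetical additions, and in-memory streams replace input.csv and output.csv. It assumes the input has a header row containing an Age column.

```python
import csv
import io


def filter_rows(infile, outfile, min_age=30):
    """Copy rows whose Age is at least min_age; skip rows with a bad Age."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        try:
            age = int(row["Age"])
        except (KeyError, TypeError, ValueError):
            continue  # missing, empty, or non-numeric Age: skip the row
        if age >= min_age:
            writer.writerow(row)
            kept += 1
    return kept


# Sample data from the program output section, plus one hypothetical bad row.
source = io.StringIO(
    "Name,Age,City\n"
    "Alice,30,New York\n"
    "Bob,25,Los Angeles\n"
    "Charlie,35,Chicago\n"
    "Diana,28,Houston\n"
    "Eve,40,Seattle\n"
    "Frank,n/a,Boston\n"
)
result = io.StringIO()
kept = filter_rows(source, result)

print(kept)
print(result.getvalue())
```

Skipping malformed rows is one possible design choice; a stricter program might instead stop with an error message that names the offending line.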

4.5. Libraries Used:


The program uses the built-in `csv` module in Python. This module is part of Python's
standard library, which means it comes pre-installed with Python and does not require any
additional installation steps. The `csv` module provides functionality for reading from and
writing to CSV (Comma Separated Values) files, making it a convenient choice for handling
tabular data in Python programs.

Conclusion:
The csv (Comma Separated Values) module in Python provides a powerful and flexible
tool for reading from and writing to CSV files, a common format for storing tabular data.
It offers csv.reader and csv.writer for basic operations on rows as sequences, and
csv.DictReader and csv.DictWriter for working with rows as dictionaries. The module also
handles different CSV dialects and allows customization of parsing and formatting details
such as delimiters, quoting characters, and line terminators. This makes it a valuable part
of Python for processing, analyzing, and exchanging tabular data.

Source analysis: Online resources such as tutorials and programming websites were
consulted to gather information on the csv module in Python. These resources provided
insights into different techniques for reading and writing CSV data and best practices for
writing efficient code.

Python Documentation: The official Python documentation for the csv module served as a
valuable reference for understanding the syntax and usage of the functions and classes used
in the program. It provided detailed explanations and examples that helped in implementing
the desired functionality.

Based on the insights gathered from the source analysis, the filter_csv function was
implemented to read the input file, filter the rows, and write the result. It was designed
to be simple, efficient, and easy to understand. The program was tested iteratively to
ensure that it works as intended and produces the expected output. Debugging was performed
to identify and fix any errors or unexpected behaviors encountered during testing. Efforts
were made to optimize the code for clarity, readability, and performance. Overall, the
development of the CSV filtering program involved thorough research, careful
implementation, and rigorous testing to ensure its functionality and reliability.

Literature
1. Python 3.12.2 documentation. csv: CSV File Reading and Writing (last updated 3 Apr
2024). URL: https://docs.python.org/3.12/library/csv.html (accessed: 26/05/2024).
2. Free Code Camp. Dionysia Lemonaki. Article about CSV file reading and writing with
the Python csv module (31 Jan 2022). URL:
https://www.freecodecamp.org/news/how-to-create-a-csv-file-in-python/ (accessed:
26/05/2024).
3. Geeks for Geeks. Article about CSV file reading and writing with the Python csv
module (21 Nov 2023). URL:
https://www.geeksforgeeks.org/working-csv-files-python/ (accessed: 26/05/2024).
4. Study Tonight. Article about CSV file reading and writing with the Python csv
module (27 Oct 2023). URL:
https://www.studytonight.com/python-howtos/how-to-read-csv-to-list-in-python
(accessed: 26/05/2024).

Appendix - Program code
import csv

def filter_csv(input_file, output_file):
    with open(input_file, newline='') as infile, open(output_file, 'w', newline='') as outfile:
        reader = csv.DictReader(infile)
        writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row['Age']) >= 30:
                writer.writerow(row)

if __name__ == "__main__":
    filter_csv('input.csv', 'output.csv')
    print("Filtered data has been written to output.csv")
