DATA SCIENCE LAB
1. Creating a NumPy Array
a. Basic ndarray
b. Array of zeros
c. Array of ones
d. Random numbers in ndarray
e. An array of your choice
f. Identity matrix in NumPy
g. Evenly spaced ndarray
Here's how to create each of these types of NumPy arrays:
1. Basic ndarray
python
import numpy as np
# Basic ndarray
arr = np.array([1, 2, 3, 4, 5])
print(arr)
2. Array of Zeros
# Array of zeros
zeros_arr = np.zeros((3, 3)) # 3x3 matrix of zeros
print(zeros_arr)
3. Array of Ones
python
# Array of ones
ones_arr = np.ones((2, 4)) # 2x4 matrix of ones
print(ones_arr)
4. Random Numbers in ndarray
To create an array of random numbers drawn uniformly from [0, 1):
python
# Random numbers in ndarray
random_arr = np.random.rand(3, 3) # 3x3 matrix of random numbers
print(random_arr)
python
# Random integers in ndarray
random_int_arr = np.random.randint(0, 10, size=(2, 5)) # 2x5 matrix of integers from 0 to 9 (upper bound is exclusive)
print(random_int_arr)
5. An Array of Your Choice
You can create an array from any specific sequence or data you want. Here's an example of a custom array:
python
# An array of your choice
custom_arr = np.array([10, 20, 30, 40])
print(custom_arr)
6. Identity Matrix in NumPy
np.eye() creates an identity matrix: ones on the main diagonal and zeros elsewhere.
python
# Identity matrix
identity_matrix = np.eye(4) # 4x4 identity matrix
print(identity_matrix)
7. Evenly Spaced ndarray
np.linspace() returns a fixed number of evenly spaced values over a closed interval.
python
# Evenly spaced ndarray
evenly_spaced_arr = np.linspace(0, 10, 5) # 5 values from 0 to 10 (inclusive)
print(evenly_spaced_arr)
You can also use np.arange to create an array with a specified step:
python
# Evenly spaced ndarray using np.arange
even_arr = np.arange(0, 10, 2) # Values from 0 up to (but not including) 10 with step 2
print(even_arr)
Each of these snippets demonstrates how to create a different type of array in NumPy; adjust the dimensions and values to suit your requirements.
2. The Shape and Reshaping of NumPy Arrays
a. Dimensions of NumPy array
b. Shape of NumPy array
c. Size of NumPy array
d. Reshaping a NumPy array
e. Flattening a NumPy array
f. Transpose of a NumPy array
a. Dimensions of NumPy Array
The number of dimensions (axes) of a NumPy array can be obtained using the .ndim attribute.
python
import numpy as np
# A 2x3 array used throughout this section
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Dimensions of the array:", arr.ndim)
b. Shape of NumPy Array
The shape of a NumPy array is a tuple giving its size along each dimension (rows, columns, etc.). It can be accessed using the .shape attribute.
python
# Get the shape of the array
print("Shape of the array:", arr.shape)
For the above example, it will output (2, 3), indicating that the array has 2 rows and 3 columns.
c. Size of NumPy Array
The size of a NumPy array refers to the total number of elements in the array, which can be
obtained using the .size attribute.
python
# Get the size of the array
print("Size of the array:", arr.size)
This will give you the total number of elements in the array. For example, in a 2x3 array, the size
will be 6.
d. Reshaping a NumPy Array
You can reshape an array using the .reshape() method. This changes the shape of the array without changing its data.
python
# Reshape the array
reshaped_arr = arr.reshape(3, 2) # Reshaping to a 3x2 array
print("Reshaped array:")
print(reshaped_arr)
Note: The total number of elements must stay the same when reshaping. For example, if the
original array has 6 elements, the reshaped array must also have 6 elements (e.g., 2x3, 3x2).
e. Flattening a NumPy Array
Flattening converts a multidimensional array into a one-dimensional array. This can be done using .flatten() or .ravel().
python
# Flatten the array
flattened_arr = arr.flatten()
print("Flattened array:", flattened_arr)
Alternatively, .ravel() also flattens the array, but it returns a view of the original array whenever possible (avoiding a copy), whereas .flatten() always returns a copy.
python
# Flatten the array using ravel
raveled_arr = arr.ravel()
print("Raveled array:", raveled_arr)
f. Transpose of a NumPy Array
The transpose of an array is obtained by swapping rows and columns. This can be done using .T.
python
# Transpose of the array
transposed_arr = arr.T
print("Transposed array:")
print(transposed_arr)
Example Walkthrough:
python
import numpy as np
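# The rest of the walkthrough was lost in this copy; a sketch consistent with the output below
arr = np.array([[1, 2, 3], [4, 5, 6]])
print("Dimensions of the array:", arr.ndim)
print("Shape of the array:", arr.shape)
print("Size of the array:", arr.size)
print("Reshaped array:")
print(arr.reshape(3, 2))
print("Flattened array:", arr.flatten())
print("Transposed array:")
print(arr.T)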
Output:
Dimensions of the array: 2
Shape of the array: (2, 3)
Size of the array: 6
Reshaped array:
[[1 2]
[3 4]
[5 6]]
Flattened array: [1 2 3 4 5 6]
Transposed array:
[[1 4]
[2 5]
[3 6]]
3. Expanding, Squeezing and Sorting a NumPy Array
a. Expanding a NumPy Array
Expanding a NumPy array means increasing its number of dimensions by adding new axes. The function np.expand_dims() is commonly used for this; it adds a new axis at a specified position.
Example:
python
import numpy as np
# Creating a 1D array
arr = np.array([1, 2, 3])
# Add a new axis at position 0: shape changes from (3,) to (1, 3)
expanded_arr = np.expand_dims(arr, axis=0)
print("Expanded array shape:", expanded_arr.shape)
b. Squeezing a NumPy Array
Squeezing a NumPy array removes dimensions of size 1 from its shape. This is done using the np.squeeze() function.
Example:
python
import numpy as np
# A 3D array with two size-1 dimensions, shape (1, 3, 1)
arr = np.array([[[1], [2], [3]]])
print("Original array:")
print(arr)
print("Shape of original array:", arr.shape)
# Remove the size-1 dimensions
squeezed_arr = np.squeeze(arr)
print("Squeezed array:")
print(squeezed_arr)
print("Shape of squeezed array:", squeezed_arr.shape)
Output:
Original array:
[[[1]
[2]
[3]]]
Shape of original array: (1, 3, 1)
Squeezed array:
[1 2 3]
Shape of squeezed array: (3,)
c. Sorting a NumPy Array
np.sort() returns a sorted copy of an array, and np.argsort() returns the indices that would sort it.
python
import numpy as np
arr = np.array([3, 1, 5, 2, 4])  # example values; any permutation of 1-5 gives the output shown
# np.sort() returns a sorted copy
print("Sorted array:", np.sort(arr))
# Reversing the sorted copy gives descending order
print("Sorted array in descending order:", np.sort(arr)[::-1])
# Using np.argsort() to get the indices that would sort the array
sorted_indices = np.argsort(arr)
print("Indices that would sort the array:", sorted_indices)
Output:
Sorted array: [1 2 3 4 5]
Sorted array in descending order: [5 4 3 2 1]
4. Indexing and Slicing of NumPy Arrays
a. Slicing 1-D NumPy arrays
b. Slicing 2-D NumPy arrays
c. Slicing 3-D NumPy arrays
d. Negative slicing of NumPy arrays
a. Slicing 1-D NumPy Arrays
Slicing a 1D array extracts a portion of the array using start, stop, and step values.
Syntax for 1D array slicing:
python
arr[start:stop:step]
Example:
python
import numpy as np
# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print("Array sliced from index 2 to 5:", arr[2:5])
print("Array sliced with step 2:", arr[::2])
print("Array sliced with step -1 (reversed):", arr[::-1])
Output:
Array sliced from index 2 to 5: [30 40 50]
Array sliced with step 2: [10 30 50 70]
Array sliced with step -1 (reversed): [70 60 50 40 30 20 10]
b. Slicing 2-D NumPy Arrays
Slicing a 2D array selects subarrays along both axes (rows and columns).
Example:
python
import numpy as np
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
# Slicing the 2D array (stop indices are exclusive)
print("Sliced 2D array (rows 1 to 2 and columns 1 to 2):")
print(arr_2d[1:3, 1:3])
Output:
Sliced 2D array (rows 1 to 2 and columns 1 to 2):
[[5 6]
[8 9]]
c. Slicing 3-D NumPy Arrays
For 3D arrays, you can slice across three dimensions: depth (axis 0), rows (axis 1), and columns (axis 2).
Example:
python
import numpy as np
# Creating a 3D array
arr_3d = np.array([[[1, 2], [3, 4]],
                   [[5, 6], [7, 8]],
                   [[9, 10], [11, 12]]])
# Slice the first two depth blocks, row 0, all columns
print("Sliced 3D array (depth 0 to 2, row 0, all columns):")
print(arr_3d[0:2, 0, :])
Output:
Sliced 3D array (depth 0 to 2, row 0, all columns):
[[1 2]
[5 6]]
d. Negative Slicing of NumPy Arrays
Negative slicing allows you to slice an array starting from the end rather than the beginning.
Negative indexing is useful when you want to select elements from the end without knowing the
exact size of the array.
Example:
python
import numpy as np
# Creating a 1D array
arr = np.array([10, 20, 30, 40, 50, 60, 70])
print("Last 3 elements using negative indexing:", arr[-3:])
print("All elements except the last 2:", arr[:-2])
print("Reverse the array using negative slicing:", arr[::-1])
Output:
Last 3 elements using negative indexing: [50 60 70]
All elements except the last 2: [10 20 30 40 50]
Reverse the array using negative slicing: [70 60 50 40 30 20 10]
5. Stacking and Concatenating NumPy Arrays
a. Stacking ndarrays
b. Concatenating ndarrays
c. Broadcasting in NumPy Arrays
a. Stacking ndarrays
Stacking joins a sequence of arrays along a new or existing axis: np.stack() adds a new axis, while np.hstack(), np.vstack(), and np.dstack() stack horizontally, vertically, and depth-wise.
Example:
python
import numpy as np
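# The stacking calls were lost in this copy; a sketch with values chosen to match the output below
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Stacked along axis 0:")
print(np.stack((a, b), axis=0))   # new leading axis -> shape (2, 3)
print("Horizontally stacked:")
print(np.hstack((a, b)))          # 1D arrays are joined end to end
print("Vertically stacked:")
print(np.vstack((a, b)))          # rows stacked on top of each other
# Depth-wise stacking uses two 2D arrays here
a2 = np.array([[1, 2], [3, 4]])
b2 = np.array([[5, 6], [7, 8]])
print("Depth-wise stacked:")
print(np.dstack((a2, b2)))        # pairs elements along a third axis -> shape (2, 2, 2)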
Output:
Stacked along axis 0:
[[1 2 3]
[4 5 6]]
Horizontally stacked:
[1 2 3 4 5 6]
Vertically stacked:
[[1 2 3]
[4 5 6]]
Depth-wise stacked:
[[[1 5]
[2 6]]
[[3 7]
[4 8]]]
b. Concatenating ndarrays
Concatenation joins two or more arrays along an existing axis. NumPy provides np.concatenate() for this operation; you can concatenate along any axis, not just 0 or 1.
Example:
python
import numpy as np
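# The concatenation call was lost in this copy; a sketch matching the output below
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print("Concatenated along axis 0:", np.concatenate((a, b), axis=0))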
Output:
Concatenated along axis 0: [1 2 3 4 5 6]
c. Broadcasting in NumPy Arrays
Broadcasting lets NumPy perform elementwise arithmetic on arrays of different shapes. It follows a set of rules to determine whether two arrays can be broadcast together. The key rules are:
1. If the arrays have different numbers of dimensions, the shape of the smaller array is
padded with 1s on the left side until they have the same number of dimensions.
2. If the dimensions of the arrays do not match, broadcasting is possible only if one of the
arrays has a dimension of size 1 in that position.
Example of Broadcasting:
python
import numpy as np
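# The original arrays were lost in this copy; a minimal sketch of the rules in action,
# adding a (3,) array B to a (3, 3) array A (B is broadcast across each row of A)
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.array([1, 2, 3])
print("Result of broadcasting and adding arrays A and B:")
print(A + B)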
Output:
Result of broadcasting and adding arrays A and B:
[[ 2  4  6]
 [ 5  7  9]
 [ 8 10 12]]
6. Pandas DataFrames: Creating, Filtering and Adding Columns
a. Creating a DataFrame
A DataFrame is the primary data structure in Pandas, and you can create one from various data sources such as dictionaries, lists, or NumPy arrays.
Example:
python
import pandas as pd
# Data as a dictionary of columns (the original definition was lost; reconstructed from the output below)
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print("DataFrame created from dictionary:")
print(df)
Output:
DataFrame created from dictionary:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
3 David 40 Houston
b. Filtering Data with Conditions
You can apply conditions to filter data within a DataFrame, selecting only rows that meet specific criteria, as in the sketch below.
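The filtering code itself was not preserved; a minimal sketch consistent with the output below, reusing the df defined above:
python
# Boolean indexing keeps only the rows where the condition is True
filtered_df = df[df['Age'] > 30]
print("Filtered DataFrame (Age > 30):")
print(filtered_df)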
Output:
Filtered DataFrame (Age > 30):
Name Age City
2 Charlie 35 Chicago
3 David 40 Houston
c. Adding a New Column
You can add a new column to an existing DataFrame by simply assigning values to a new column name.
python
# Creating a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})
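# The assignment itself was lost in this copy; a sketch matching the output below
df['Salary'] = [50000, 60000, 70000, 80000]
print("DataFrame with a new column 'Salary':")
print(df)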
Output:
DataFrame with a new column 'Salary':
Name Age City Salary
0 Alice 25 New York 50000
1 Bob 30 Los Angeles 60000
2 Charlie 35 Chicago 70000
3 David 40 Houston 80000
7. Perform the following operations using Pandas
a. Filling NaN with string
b. Sorting based on column values
c. groupby()
a. Filling NaN with a String
In Pandas, you can fill NaN values using the fillna() method. To replace NaN with a string (or any other value):
Example:
python
import pandas as pd
import numpy as np
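# The example body was lost in this copy; a sketch consistent with the output below
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, np.nan],
    'City': ['New York', 'Los Angeles', 'Chicago', np.nan]
})
# Replace every NaN with the string 'Unknown'
df_filled = df.fillna('Unknown')
print("DataFrame after filling NaN with 'Unknown':")
print(df_filled)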
Output:
DataFrame after filling NaN with 'Unknown':
Name Age City
0 Alice 25 New York
1 Bob Unknown Los Angeles
2 Charlie 35 Chicago
3 David Unknown Unknown
b. Sorting Based on Column Values
You can sort a DataFrame by one or more columns using the sort_values() method.
Example:
python
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
})
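# The sorting call was lost in this copy; a sketch matching the output below
sorted_df = df.sort_values(by='Age', ascending=True)
print("DataFrame sorted by 'Age' in ascending order:")
print(sorted_df)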
Output:
DataFrame sorted by 'Age' in ascending order:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
3 David 40 Houston
c. groupby() in Pandas
The groupby() function in Pandas is used to group data based on one or more columns and then
apply an aggregate function to the grouped data. Common operations include summing,
averaging, or counting values in each group.
Example:
python
import pandas as pd
# Creating a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
'Age': [25, 30, 35, 40, 30],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Los Angeles']
})
# Grouping the DataFrame by 'City' and calculating the average age for each city
grouped_df = df.groupby('City')['Age'].mean()
print("Average age for each city:")
print(grouped_df)
Output:
Average age for each city:
City
Chicago 35.0
Houston 40.0
Los Angeles 30.0
New York 25.0
Name: Age, dtype: float64
8. Read the following file formats using Pandas
a. Text files
b. CSV files
c. Excel files
d. JSON files
a. Reading Text Files
You can read a text file into a DataFrame using pd.read_csv(). Text files can have custom delimiters (spaces, tabs, or others). If the text file is space-delimited, use the delim_whitespace parameter.
Example:
python
import pandas as pd
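# The example call was lost in this copy; a sketch assuming a whitespace-delimited file named 'file.txt'
df_text = pd.read_csv('file.txt', delim_whitespace=True)
print(df_text)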
Parameters:
o delim_whitespace=True: allows Pandas to treat any run of whitespace as a delimiter.
o You can also use sep=' ', sep='\t', or any other custom delimiter for more control.
b. Reading CSV Files
CSV (Comma-Separated Values) is the most common tabular data format. Pandas provides pd.read_csv() to read CSV files.
Example:
python
import pandas as pd
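# The example call was lost in this copy; a sketch assuming a file named 'file.csv'
df_csv = pd.read_csv('file.csv')
print(df_csv.head())  # Show the first five rows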
Parameters:
o sep=',': Specifies the delimiter (default is comma).
o header=0: Row number to use as column names (default is 0).
o index_col: To specify which column to use as the index.
c. Reading Excel Files
Pandas can read .xls and .xlsx files using the pd.read_excel() function. You'll need the openpyxl library for .xlsx files and xlrd for .xls.
Example:
python
import pandas as pd
# Reading an Excel file (default is sheet_name=0 for the first sheet)
df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')
print(df_excel)
Parameters:
o sheet_name: specifies the sheet to read by name or index. If sheet_name=None, all sheets are read.
o header: defines the row(s) to use as column names.
o usecols: selects specific columns.
d. Reading JSON Files
JSON (JavaScript Object Notation) is commonly used for hierarchical data. Pandas can read JSON files into DataFrames using pd.read_json().
Example:
python
import pandas as pd
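# The example call was lost in this copy; a sketch assuming a file named 'file.json'
df_json = pd.read_json('file.json')
print(df_json)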
Parameters:
o orient: specifies how to interpret the JSON structure. For instance, 'records' treats the file as a list of row dictionaries.
o lines=True: use this option if the file contains one JSON object per line.
Summary of the read calls:
1. Text Files:
python
df_text = pd.read_csv('file.txt', delim_whitespace=True)
2. CSV Files:
python
df_csv = pd.read_csv('file.csv') # For comma-separated
df_csv = pd.read_csv('file.csv', sep=';') # For semicolon-separated
3. Excel Files:
python
df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')
df_excel_all = pd.read_excel('file.xlsx', sheet_name=None) # Read all sheets
4. JSON Files:
python
df_json = pd.read_json('file.json')
df_json_nested = pd.read_json('file.json', orient='records', lines=True)
9. Read the following file formats
a. Pickle files
b. Image files using PIL
c. Multiple files using Glob
d. Importing data from database
a. Reading Pickle Files
Pickle files are used to serialize and deserialize Python objects, making them convenient for saving and loading complex data structures. You can read Pickle files using the pickle module or Pandas (for DataFrames); see the code summary at the end of this section.
b. Reading Image Files using PIL
The Pillow library (a fork of PIL, the Python Imaging Library) allows you to open, manipulate, and save various image formats like PNG, JPEG, and GIF.
Example:
python
from PIL import Image
# Open and display an image (hypothetical filename)
image = Image.open('file.jpg')
image.show()
# Optionally, you can convert the image to grayscale or perform other manipulations
image_gray = image.convert('L')
image_gray.show()
You can also save or manipulate images further using Pillow's various methods.
c. Reading Multiple Files using Glob
The glob module allows you to find all pathnames matching a specified pattern. It is useful for reading multiple files from a directory, such as all .txt or .csv files.
Example:
python
import glob
import pandas as pd
# Find all CSV files in the current directory (the original pattern was lost; '*.csv' is assumed)
csv_files = glob.glob('*.csv')
# Loop through all CSV files and read each into a DataFrame
for csv_file in csv_files:
    df = pd.read_csv(csv_file)
    print(df)
glob.glob('pattern') finds all files matching the pattern (e.g., *.txt for all text files); you can then read and process each file as needed.
d. Importing Data from a Database
To import data from a database like SQLite, MySQL, or PostgreSQL, you can use the pandas.read_sql() function. You'll need a database connection, and each type of database requires a different connection method.
Code summary:
1. Reading Pickle Files:
python
import pickle
with open('file.pkl', 'rb') as f:
    data = pickle.load(f)
print(data)
Or, for a pickled DataFrame:
python
import pandas as pd
df = pd.read_pickle('file.pkl')
2. Reading Image Files using PIL:
python
from PIL import Image
image = Image.open('file.jpg')
image.show()
3. Reading Multiple Files using glob:
python
import glob
files = glob.glob('path/to/folder/*.txt')
for file in files:
    with open(file, 'r') as f:
        content = f.read()
        print(content)
4. Importing Data from a Database (SQLite):
python
import sqlite3
import pandas as pd
conn = sqlite3.connect('database.db')
df = pd.read_sql('SELECT * FROM table_name', conn)
Or with SQLAlchemy (for example, MySQL):
python
from sqlalchemy import create_engine
engine = create_engine('mysql+pymysql://user:password@host/database')
df = pd.read_sql('SELECT * FROM table_name', engine)
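10. Web Scraping using Python
The scraper's setup was not preserved in this copy; a minimal sketch, assuming the requests and BeautifulSoup libraries, a placeholder URL, and article titles wrapped in <h2> tags:
python
import requests
from bs4 import BeautifulSoup

url = 'https://example.com/articles'  # placeholder URL for illustration
response = requests.get(url)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, 'html.parser')
    article_titles = soup.find_all('h2')  # assumes each article title is inside an <h2> tag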
    # Loop through each article title and print the title and the link
    for title in article_titles:
        link = title.find('a')  # Get the link inside the <h2> tag
        if link:
            title_text = title.get_text()
            article_link = link.get('href')
            print(f"Title: {title_text}")
            print(f"Link: {article_link}")
            print("-" * 40)
else:
    print("Failed to fetch the webpage. Status code:", response.status_code)
11. Perform the following preprocessing techniques on a loan prediction dataset
a. Feature Scaling
b. Feature Standardization
c. Label Encoding
d. One-Hot Encoding
1. Feature Scaling
Feature scaling ensures that features have similar ranges, which is important for algorithms that
rely on the distance between points (e.g., k-nearest neighbors, support vector machines). The
most common techniques for feature scaling are Min-Max Scaling and Standardization.
Min-Max scaling scales the data to a specific range, often [0, 1].
python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Example dataset
data = {'Age': [25, 30, 35, 40, 45],
'Income': [40000, 50000, 60000, 70000, 80000],
'LoanAmount': [100000, 200000, 150000, 120000, 180000]}
df = pd.DataFrame(data)
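# The scaling step was lost in this copy; a sketch that scales Income and LoanAmount to [0, 1]
scaler = MinMaxScaler()
df[['Income', 'LoanAmount']] = scaler.fit_transform(df[['Income', 'LoanAmount']])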
print(df)
Output:
   Age  Income  LoanAmount
0   25    0.00         0.0
1   30    0.25         1.0
2   35    0.50         0.5
3   40    0.75         0.2
4   45    1.00         0.8
2. Feature Standardization
Standardization scales the data to have a mean of 0 and a standard deviation of 1. This is useful
for algorithms like Logistic Regression, SVM, or Linear Regression that assume the data is
normally distributed.
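The code for this step was not preserved; a sketch using scikit-learn's StandardScaler on the same Age/Income/LoanAmount df:
python
from sklearn.preprocessing import StandardScaler
# Standardize Income and LoanAmount to mean 0 and standard deviation 1
scaler = StandardScaler()
df[['Income', 'LoanAmount']] = scaler.fit_transform(df[['Income', 'LoanAmount']])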
print(df)
Output:
   Age    Income  LoanAmount
0   25 -1.414214   -1.355815
1   30 -0.707107    1.355815
2   35  0.000000    0.000000
3   40  0.707107   -0.813489
4   45  1.414214    0.813489
3. Label Encoding
Label encoding is used when the target variable is categorical and has a natural order (like
"Low", "Medium", "High"). It encodes categories as integers.
python
from sklearn.preprocessing import LabelEncoder
# Example loan-status labels (the original list was lost; these are illustrative)
status = ['Approved', 'Denied', 'Approved', 'Denied', 'Approved']
# Initialize LabelEncoder and encode the labels as integers
encoder = LabelEncoder()
encoded_status = encoder.fit_transform(status)
print(encoded_status) # Output: [0 1 0 1 0]
In this case (LabelEncoder assigns integer codes in alphabetical order of the classes):
Approved -> 0
Denied -> 1
4. One-Hot Encoding
One-hot encoding is used when categorical features have no ordinal relationship (like Gender or MaritalStatus). It converts a categorical variable into binary columns (1 or 0), one column for each category, as sketched below.
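The encoding code was not preserved; a sketch using pd.get_dummies() on a hypothetical Marital_Status column (older Pandas versions print 1/0 as below; recent ones print True/False unless dtype=int is passed):
python
import pandas as pd
df = pd.DataFrame({'Marital_Status': ['Single', 'Married', 'Single', 'Married', 'Divorced']})
# One binary column per category
df_encoded = pd.get_dummies(df, columns=['Marital_Status'])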
print(df_encoded)
Output:
Marital_Status_Divorced Marital_Status_Married Marital_Status_Single
0 0 0 1
1 0 1 0
2 0 0 1
3 0 1 0
4 1 0 0
The feature Marital_Status is converted into three binary columns (Marital_Status_Single,
Marital_Status_Married, Marital_Status_Divorced).
12. Perform the following visualizations using Matplotlib
a. Bar Graph
b. Pie Chart
c. Box Plot
d. Histogram
e. Line Chart and Subplots
f. Scatter Plot
1. Bar Graph
A bar graph is useful to represent categorical data with rectangular bars where the length of the
bar represents the value.
python
import matplotlib.pyplot as plt
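# The plotting code was lost in this copy; a minimal sketch with hypothetical category data
categories = ['A', 'B', 'C', 'D']
values = [10, 24, 36, 18]
plt.bar(categories, values, color='skyblue')
plt.xlabel('Category')
plt.ylabel('Value')
plt.title('Bar Graph')
plt.show()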
2. Pie Chart
A pie chart is a circular statistical graphic that is divided into slices to illustrate numerical
proportions.
python
# Data for the pie chart
labels = ['Apple', 'Banana', 'Cherry', 'Date']
sizes = [35, 25, 20, 20]
colors = ['#ff9999', '#66b3ff', '#99ff99', '#ffcc99']
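# The plt.pie() call was lost in this copy; a sketch using the data above
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%')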
# Adding a title
plt.title('Fruit Distribution')
# Displaying the plot
plt.axis('equal') # Equal aspect ratio ensures that pie chart is drawn as a circle
plt.show()
3. Box Plot
A box plot (or box-and-whisker plot) is used to represent the distribution of numerical data based
on a five-number summary: minimum, first quartile, median, third quartile, and maximum.
python
import numpy as np
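# The plotting code was lost in this copy; a minimal sketch with random data
data = np.random.randn(100)
plt.boxplot(data)
plt.title('Box Plot')
plt.show()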
4. Histogram
A histogram is used to represent the distribution of numerical data. It groups the data into bins
and counts the number of data points in each bin.
python
# Random data for the histogram
data = np.random.randn(1000)
# Creating a histogram
plt.hist(data, bins=30, color='orange', edgecolor='black')
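# Labels and display (a plausible completion; the original lines were lost)
plt.title('Histogram')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()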
5. Line Chart and Subplots
A line chart is useful for showing data trends over a continuous range (e.g., a time series). Subplots allow multiple plots to be displayed in a single figure.
python
# Data for line chart
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)
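# The subplot calls were lost in this copy; a sketch drawing sin and cos side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(x, y1)
ax1.set_title('sin(x)')
ax2.plot(x, y2)
ax2.set_title('cos(x)')
plt.show()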
6. Scatter Plot
A scatter plot is used to represent the relationship between two continuous variables.
python
# Data for scatter plot
x = np.random.rand(100)
y = np.random.rand(100)
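# The plotting call was lost in this copy; a minimal sketch
plt.scatter(x, y)
plt.title('Scatter Plot')
plt.show()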
13. Installing and Exploring NLTK
1. Install NLTK: Open a terminal (or command prompt) and run the following command to install NLTK via pip:
bash
pip install nltk
2. Verify Installation: After installation, you can verify that NLTK has been successfully
installed by importing it in a Python script or in an interactive Python session.
python
import nltk
print(nltk.__version__) # Print the NLTK version
3. Download NLTK Data: Many NLTK features require additional datasets and models, which you can fetch with the interactive downloader:
python
import nltk
nltk.download()
This will open a GUI window where you can select which datasets to download.
Alternatively, you can download specific resources like so:
python
nltk.download('punkt') # For tokenization
nltk.download('stopwords') # For stop words
Example Usage:
Once installed, you can begin using NLTK for tasks like tokenization, stemming, or part-of-
speech tagging. Here's an example to tokenize text into words:
python
import nltk
from nltk.tokenize import word_tokenize
# Sample text
text = ("NLTK is a leading platform for building Python programs "
        "to work with human language data.")
# Split the text into word tokens
tokens = word_tokenize(text)
print(tokens)
Output:
['NLTK', 'is', 'a', 'leading', 'platform', 'for', 'building', 'Python', 'programs', 'to', 'work', 'with', 'human', 'language', 'data', '.']
14. Python program for text classification with Scikit-Learn and NLTK
Steps:
1. Install Required Libraries: First, make sure you have NLTK and Scikit-Learn
installed.
bash
pip install nltk scikit-learn
2. Download Necessary NLTK Data: For this example, we'll need NLTK's stopwords and
punkt for tokenization.
python
import nltk
nltk.download('stopwords')
nltk.download('punkt')
3. Text Classification Program: We'll use the 20 Newsgroups dataset from Scikit-learn
for classification. The task is to classify text documents into one of several predefined
categories.
python
import nltk
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string
# Step 1: Load the 20 Newsgroups dataset (restricted to an illustrative subset of categories to keep the demo fast)
categories = ['sci.space', 'rec.sport.baseball', 'talk.politics.misc']
newsgroups = fetch_20newsgroups(subset='all', categories=categories)
docs, labels = newsgroups.data, newsgroups.target

# Step 2: Preprocess the text using NLTK (tokenization, stopword and punctuation removal)
def preprocess_text(text):
    # Tokenize the text
    tokens = word_tokenize(text)
    # Keep only alphabetic tokens (removes punctuation and numbers)
    tokens = [word for word in tokens if word.isalpha()]
    # Remove stopwords
    stop_words = set(stopwords.words('english'))
    tokens = [word for word in tokens if word.lower() not in stop_words]
    return ' '.join(tokens)

processed_docs = [preprocess_text(doc) for doc in docs]

# Step 3: Convert the text data into numerical features using CountVectorizer
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(processed_docs)
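# Steps 4-5 were lost in this copy; a sketch completing the split, training, and evaluation described below
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
model = MultinomialNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=newsgroups.target_names))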
1. Dataset:
o We use Scikit-learn's fetch_20newsgroups to load a dataset of newsgroup
documents categorized into 20 topics.
2. Text Preprocessing:
o We preprocess the text data using NLTK:
- Tokenization: using nltk.word_tokenize.
- Removing punctuation: we filter out any tokens that are not alphabetic.
- Removing stopwords: using the stopwords corpus from NLTK.
3. Vectorization:
o Scikit-learn's CountVectorizer is used to convert the processed text documents
into a matrix of token counts. This transforms the text data into a format that can
be used for machine learning.
4. Model Training:
o We use Multinomial Naive Bayes (MultinomialNB), a classifier well-suited for
text classification tasks.
5. Evaluation:
o We evaluate the model's performance using classification report, which shows
precision, recall, and F1-score for each category.
Sample Output:
The output will be a classification report that provides performance metrics for each category:
Classification Report:
precision recall f1-score support
15. Python program with NLTK, spaCy, and PyNLPI
To implement a Python program with NLTK, spaCy, and PyNLPI (a library for natural language processing in Python), we will cover the following aspects:
We'll combine all three libraries in a single Python program for some common NLP tasks, such
as text preprocessing, named entity recognition, and tokenization.
Before you run the code, make sure to install the necessary libraries using pip:
bash
pip install nltk spacy pynlpi
Also, for spaCy, download a pre-trained language model (e.g., en_core_web_sm for English):
bash
python -m spacy download en_core_web_sm
python
import nltk
import spacy
from pynlpi import Tokenizer
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from spacy import displacy
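# The setup and the first two steps were lost in this copy; a sketch consistent
# with the expected output shown below
text = "Apple is looking at buying U.K. startup for $1 billion. Steve Jobs co-founded Apple."

# 1. NLTK: Tokenization and stopword removal
print("NLTK Processing:")
tokens = word_tokenize(text)
print("Tokens using NLTK:", tokens)
stop_words = set(stopwords.words('english'))
filtered_tokens = [w for w in tokens if w.lower() not in stop_words]
print("Filtered Tokens using NLTK:", filtered_tokens)

# 2. spaCy: Named Entity Recognition
print("\nspaCy Processing:")
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
print("Named Entities using spaCy:")
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")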
# Dependency Parsing
print("\nDependency Parsing using spaCy:")
for token in doc:
print(f"{token.text} -> {token.dep_} -> {token.head.text}")
# 3. PyNLPI: Tokenization
print("\nPyNLPI Processing:")
1. NLTK:
o Tokenization: We use word_tokenize to break the text into individual words.
o Stopword Removal: We remove common English stopwords (like "the", "is",
"and") from the tokenized list using NLTK's stopwords corpus.
2. spaCy:
o Named Entity Recognition (NER): We extract named entities (e.g., "Apple",
"U.K.") from the text using spaCy's built-in ents attribute.
o Dependency Parsing: We analyze the syntactic structure of the sentence, printing
each word's syntactic role (e.g., subject, object).
o Optional Visualization: You can visualize the dependency parsing tree using
displacy.serve, which opens a visualization in the browser.
3. PyNLPI:
o Tokenization: We use PyNLPI's Tokenizer to split the text into tokens.
Expected Output:
NLTK Processing:
Tokens using NLTK: ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion', '.',
'Steve', 'Jobs', 'co-founded', 'Apple', '.']
Filtered Tokens using NLTK: ['Apple', 'looking', 'buying', 'U.K.', 'startup', '$', '1', 'billion', '.',
'Steve', 'Jobs', 'co-founded', 'Apple', '.']
spaCy Processing:
Named Entities using spaCy:
Apple - ORG
U.K. - GPE
$1 billion - MONEY
Steve Jobs - PERSON
Apple - ORG
PyNLPI Processing:
Tokens using PyNLPI: ['Apple', 'is', 'looking', 'at', 'buying', 'U.K.', 'startup', 'for', '$', '1', 'billion',
'.', 'Steve', 'Jobs', 'co-founded', 'Apple', '.']
Explanation of Output:
NLTK: The tokens are extracted and stopwords are removed from the text.
spaCy:
o Named Entity Recognition (NER) identifies entities like Apple, U.K., Steve
Jobs, and $1 billion.
o Dependency parsing shows the grammatical relationships between words in the
sentence.
PyNLPI: The tokens extracted by PyNLPI are similar to the ones from NLTK.