Data Filtering in Pandas

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

with sql codes

yunusemrecevik
TABLE OF
CONTENTS
SINGLE CONDITION 2
MULTIPLE CONDITION 4
IN OPERATOR 6
LIKE OPERATOR 8
IS NULL OPERATOR 11
FILTER DATES 14
BASIC OPERATIONS 17
DATA
TABLE
ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH USA 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE USA 21 225000

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

pandas

1
SINGLE CONDITION

SQL
SELECT * FROM TABLE WHERE ID > 5

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY

You can use all logical operators.

Pandas loc() method same as dataframe way for filtering.

METHOD 2: QUERY METHOD

2
SINGLE CONDITION (NOT)

SQL
SELECT * FROM TABLE WHERE NOT ID > 5

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH USA 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE USA 21 225000

PANDAS
METHOD 1: DATAFRAME WAY
bracket notation

METHOD 2: QUERY METHOD

( ~ ) is a unary operator that performs bitwise inversion

3
MULTIPLE CONDITION (AND)

SQL
SELECT * FROM TABLE WHERE ID > 5 AND COUNTRY = 'CANADA'

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

PANDAS
METHOD 1: DATAFRAME WAY

dot notation make sure to use parentheses for multiple conditions

METHOD 2: QUERY METHOD

Should you use "dot notation" or "bracket notation" ?

Dot notation is faster to write and clearer to read. You can use whichever you like.
For more details:
https://www.dataschool.io/pandas-dot-notation-vs-brackets/

4
MULTIPLE CONDITION (AND & OR)

SQL
SELECT * FROM TABLE WHERE ID > 5 AND (COUNTRY = 'CANADA'
OR COUNTRY = 'USA')

ID NAME SURNAME COUNTRY AGE SALARY

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN USA 36 150000

PANDAS
METHOD 1: DATAFRAME WAY

you can use " & " for and operator you can use " | " for or operator

METHOD 2: QUERY METHOD

you can use "and" or "&" for and operator you can use "or" or "|" for and operator

5
IN OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY IN ('GERMANY', 'FRANCE')

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS

METHOD 1: DATAFRAME WAY


isin() method helps for selecting multiple value.
you cant use "in" operator for dataframe way

METHOD 2: QUERY METHOD


you can use "in" operator for query() method

6
NOT IN OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY NOT IN ('USA', 'FRANCE')

ID NAME SURNAME COUNTRY AGE SALARY

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

6 JENNIFER NICK CANADA 24 200000

PANDAS
METHOD 1: DATAFRAME WAY

you can use tilde sign for "not" operator

METHOD 2: QUERY METHOD


you can use "not" or tilde sign for "not" operator

7
LIKE OPERATOR - startswith

SQL
SELECT * FROM TABLE WHERE SURNAME LIKE "WA%"

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

5 ADAM WAYNE USA 21 225000

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

LIKE REGEX
str.match() determine if each string matches a regular expression.
For more detail :
https://pandas.pydata.org/docs/reference/api/pandas.Series.str.match.html

8
LIKE OPERATOR - endswith

SQL
SELECT * FROM TABLE WHERE SURNAME LIKE "%YAN"

ID NAME SURNAME COUNTRY AGE SALARY

3 JACK RYAN GERMANY 32 200000

7 WANDA RYAN USA 36 150000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

9
LIKE OPERATOR - contains

SQL
SELECT * FROM TABLE WHERE NAME LIKE "%OH%"

ID NAME SURNAME COUNTRY AGE SALARY

4 JOHN WICK CANADA 40 300000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY METHOD

MORE strıng methods

For more string metods visit https://pandas.pydata.org/docs/user_guide/text.html

10
IS NULL OPERATOR

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH NaN 25 150000

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

5 ADAM WAYNE NaN 21 225000

6 JENNIFER NICK CANADA 24 200000

7 WANDA RYAN NaN 36 150000

8 JOHNY BRYAN FRANCE 19 100000

USA records updated to NULL

SQL
SELECT * FROM TABLE WHERE COUNTRY IS NULL

ID NAME SURNAME COUNTRY AGE SALARY

1 ADAM SMITH NaN 25 150000

5 ADAM WAYNE NaN 21 225000

7 WANDA RYAN NaN 36 150000

11
IS NULL OPERATOR

PANDAS

METHOD 1: DATAFRAME WAY

you can use also isna() method

METHOD 2: QUERY METHOD

you can use also isna() method

12
IS NOT NULL OPERATOR

SQL
SELECT * FROM TABLE WHERE COUNTRY IS NOT NULL

ID NAME SURNAME COUNTRY AGE SALARY

2 PAUL WAGNER FRANCE 35 250000

3 JACK RYAN GERMANY 32 200000

4 JOHN WICK CANADA 40 300000

6 JENNIFER NICK CANADA 24 200000

8 JOHNY BRYAN FRANCE 19 100000

PANDAS
METHOD 1: DATAFRAME WAY
you can use also notna() method

METHOD 2: QUERY METHOD


you can use also notna() method

13
FILTER DATES

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

1 ADAM SMITH NaN 25 150000 2021-02-14

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

3 JACK RYAN GERMANY 32 200000 2000-03-01

4 JOHN WICK CANADA 40 300000 1999-01-19

5 ADAM WAYNE NaN 21 225000 2011-04-24

6 JENNIFER NICK CANADA 24 200000 2008-11-10

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

MEMBER_DATE column added to data

PANDAS

14
FILTER STRING DATES

SQL
SELECT * FROM TABLE WHERE MEMBER_DATE > '2015-01-01'

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

1 ADAM SMITH NaN 25 150000 2021-02-14

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

15
FILTER DATATIME DATES
SQL
SELECT * FROM TABLE WHERE YEAR(MEMBER_DATE) < 2005

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

3 JACK RYAN GERMANY 32 200000 2000-03-01

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

16
BASIC OPERATIONS
SQL
SELECT * FROM TABLE WHERE SALARY / 20 > 12000

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

17
BASIC OPERATIONS

SQL
SELECT * FROM TABLE WHERE NAME > "JO"

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

4 JOHN WICK CANADA 40 300000 1999-01-19

7 WANDA RYAN NaN 36 150000 2015-11-30

8 JOHNY BRYAN FRANCE 19 100000 2019-04-19

PANDAS

METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

18
BASIC OPERATIONS
SQL
SELECT * FROM TABLE WHERE SALARY * 2 > @VALUE

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

4 JOHN WICK CANADA 40 300000 1999-01-19

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

19
BASIC OPERATIONS

SQL
SELECT * FROM TABLE WHERE CONCAT(NAME, ' ' ,SURNAME) = 'PAUL
WAGNER'

ID NAME SURNAME COUNTRY AGE SALARY MEMBER_DATE

2 PAUL WAGNER FRANCE 35 250000 2005-03-17

PANDAS
METHOD 1: DATAFRAME WAY

METHOD 2: QUERY FUNCTION

20
DOWNLOAD PANDAS CODES

https://github.com/egitimplus/helpers/blob/main/notebook/filter.ipynb

FOLLOW ME ON

@yunusemrecevik

@yunusemrecevik

@EGITIMPLUS

21

You might also like