Commit 1abd12b

committed
Working on Pandas
1 parent 00bb34b commit 1abd12b

2 files changed

+211
-239
lines changed

README.md

Lines changed: 109 additions & 122 deletions
Original file line number | Diff line number | Diff line change
@@ -3111,6 +3111,7 @@ Name: a, dtype: int64
31113111
<Sr>.update(<Sr>) # Updates items that are already present.
31123112
```
31133113

3114+
#### Apply, Aggregate, Transform:
31143115
```python
31153116
<el> = <Sr>.sum/max/mean/idxmax/all() # Or: <Sr>.aggregate(<agg_func>)
31163117
<Sr> = <Sr>.diff/cumsum/rank/pct_change() # Or: <Sr>.agg/transform(<trans_func>)
@@ -3119,32 +3120,31 @@ Name: a, dtype: int64
31193120
* **Also: `'ffill()'` and `'interpolate()'`.**
31203121
* **The way `'aggregate()'` and `'transform()'` find out whether a function accepts an element or the whole Series is by passing it a single value at first and if it raises an error, then they pass it the whole Series.**
31213122

3122-
#### Apply, Aggregate, Transform:
31233123
```python
3124-
>>> sr = Series([1, 2], index=['x', 'y'], name='a')
3124+
>>> sr = Series([1, 2], index=['x', 'y'])
31253125
x 1
31263126
y 2
3127-
Name: a, dtype: int64
3127+
dtype: int64
31283128
```
31293129

31303130
```python
3131-
+-------------+--------+-----------+---------------+
3132-
| | 'sum' | ['sum'] | {'s': 'sum'} |
3133-
+-------------+--------+-----------+---------------+
3134-
| sr.apply(…) | | | |
3135-
| sr.agg(…) | 3 | sum 3 | s 3 |
3136-
| | | | |
3137-
+-------------+--------+-----------+---------------+
3131+
+-------------+---------------+---------------+---------------+
3132+
| | 'sum' | ['sum'] | {'s': 'sum'} |
3133+
+-------------+---------------+---------------+---------------+
3134+
| sr.apply(…) | | | |
3135+
| sr.agg(…) | 3 | sum 3 | s 3 |
3136+
| | | | |
3137+
+-------------+---------------+---------------+---------------+
31383138
```
31393139

31403140
```python
3141-
+-------------+--------+-----------+---------------+
3142-
| | 'rank' | ['rank'] | {'r': 'rank'} |
3143-
+-------------+--------+-----------+---------------+
3144-
| sr.apply(…) | | rank | |
3145-
| sr.agg(…) | x 1 | x 1 | r x 1 |
3146-
| sr.trans(…) | y 2 | y 2 | y 2 |
3147-
+-------------+--------+-----------+---------------+
3141+
+-------------+---------------+---------------+---------------+
3142+
| | 'rank' | ['rank'] | {'r': 'rank'} |
3143+
+-------------+---------------+---------------+---------------+
3144+
| sr.apply(…) | | rank | |
3145+
| sr.agg(…) | x 1 | x 1 | r x 1 |
3146+
| sr.trans(…) | y 2 | y 2 | y 2 |
3147+
+-------------+---------------+---------------+---------------+
31483148
```
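
The two Series tables above can be reproduced with a short script. This is only a sketch, assuming a recent pandas release; the variable names are illustrative and not part of the README.

```python
import pandas as pd

sr = pd.Series([1, 2], index=['x', 'y'])

total = sr.agg('sum')            # a string naming a reduction returns a scalar
as_list = sr.agg(['sum'])        # a list returns a Series indexed by function name
as_dict = sr.agg({'s': 'sum'})   # a dict returns a Series indexed by the dict keys
ranks = sr.transform('rank')     # a transformation keeps the original index

print(total)
print(as_list)
print(as_dict)
print(ranks)
```

The shape of the result follows the shape of the argument: a bare string gives the most compact result, while a list or dict wraps it in a labelled Series.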
31493149

31503150
### DataFrame
@@ -3187,44 +3187,6 @@ b 3 4
31873187
<DF> = <DF>.melt(id_vars=column_key/s) # Melts on columns.
31883188
```
31893189

3190-
```python
3191-
<Sr> = <DF>.sum/max/mean/idxmax/all() # Or: <DF>.apply/agg/transform(<agg_func>)
3192-
<DF> = <DF>.diff/cumsum/rank/pct_change() # Or: <DF>.apply/agg/transform(<trans_func>)
3193-
<DF> = <DF>.fillna(<el>) # Or: <DF>.applymap(<map_func>)
3194-
```
3195-
* **Also: `'ffill()'` and `'interpolate()'`.**
3196-
* **All operations operate on columns by default. Use `'axis=1'` parameter to process the rows instead.**
3197-
3198-
#### Apply, Aggregate, Transform:
3199-
```python
3200-
>>> df = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
3201-
x y
3202-
a 1 2
3203-
b 3 4
3204-
```
3205-
3206-
```python
3207-
+-------------+---------------+---------------+---------------+
3208-
| | 'sum' | ['sum'] | {'x': 'sum'} |
3209-
+-------------+---------------+---------------+---------------+
3210-
| df.apply(…) | | x y | |
3211-
| df.agg(…) | x 4 | sum 4 6 | x 4 |
3212-
| df.trans(…) | y 6 | | |
3213-
+-------------+---------------+---------------+---------------+
3214-
```
3215-
3216-
```python
3217-
+-------------+---------------+---------------+---------------+
3218-
| | 'rank' | ['rank'] | {'x': 'rank'} |
3219-
+-------------+---------------+---------------+---------------+
3220-
| df.apply(…) | x y | x y | x |
3221-
| df.agg(…) | a 1 1 | rank rank | a 1 |
3222-
| df.trans(…) | b 2 2 | a 1 1 | b 2 |
3223-
| | | b 2 2 | |
3224-
+-------------+---------------+---------------+---------------+
3225-
```
3226-
* **Transform() doesen't work with `['sum']` and `{'x': 'sum'}`.**
3227-
32283190
#### Merge, Join, Concat:
32293191
```python
32303192
>>> l = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
@@ -3269,99 +3231,124 @@ c 6 7
32693231
┗━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━┛
32703232
```
32713233

3272-
### GroupBy
3273-
**Object that groups together rows of a dataframe based on the value of passed column.**
3234+
#### Apply, Aggregate, Transform:
3235+
```python
3236+
<Sr> = <DF>.sum/max/mean/idxmax/all() # Or: <DF>.apply/agg/transform(<agg_func>)
3237+
<DF> = <DF>.diff/cumsum/rank/pct_change() # Or: <DF>.apply/agg/transform(<trans_func>)
3238+
<DF> = <DF>.fillna(<el>) # Or: <DF>.applymap(<map_func>)
3239+
```
3240+
* **Also: `'ffill()'` and `'interpolate()'`.**
3241+
* **All operations operate on columns by default. Use `'axis=1'` parameter to process the rows instead.**
32743242

32753243
```python
3276-
>>> df = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], index=list('abc'), columns=list('xyz'))
3277-
>>> gb = df.groupby('z')
3278-
x y z
3279-
3: a 1 2 3
3280-
6: b 4 5 6
3281-
c 7 8 6
3244+
>>> df = DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])
3245+
x y
3246+
a 1 2
3247+
b 3 4
32823248
```
32833249

32843250
```python
3285-
<GB> = <DF>.groupby(column_key/s) # DF is split into groups based on passed column.
3286-
<DF> = <GB>.get_group(group_key) # Selects a group by value of grouping column.
3287-
<DF> = <GB>.<operation>() # Executes operation on each col of each group.
3251+
+-------------+---------------+---------------+---------------+
3252+
| | 'sum' | ['sum'] | {'x': 'sum'} |
3253+
+-------------+---------------+---------------+---------------+
3254+
| df.apply(…) | | x y | |
3255+
| df.agg(…) | x 4 | sum 4 6 | x 4 |
3256+
| | y 6 | | |
3257+
+-------------+---------------+---------------+---------------+
32883258
```
3289-
* **Result of an operation is a dataframe with index made up of group keys. Use `'<DF>.reset_index()'` to move the index back into it's own column.**
32903259

3291-
#### Aggregations:
32923260
```python
3293-
<DF> = <GB>.sum/max/mean/idxmax/all()
3294-
<DF> = <GB>.apply/agg/transform(<agg_func>)
3261+
+-------------+---------------+---------------+---------------+
3262+
| | 'rank' | ['rank'] | {'x': 'rank'} |
3263+
+-------------+---------------+---------------+---------------+
3264+
| df.apply(…) | x y | x y | x |
3265+
| df.agg(…) | a 1 1 | rank rank | a 1 |
3266+
| df.trans(…) | b 2 2 | a 1 1 | b 2 |
3267+
| | | b 2 2 | |
3268+
+-------------+---------------+---------------+---------------+
32953269
```
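
As a rough illustration of the DataFrame tables and of the `axis=1` note, the sketch below runs the same calls on the example frame. It assumes a recent pandas release, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

col_sums = df.agg('sum')         # one value per column (the default axis)
row_sums = df.sum(axis=1)        # one value per row
as_list = df.agg(['sum'])        # DataFrame with a single row labelled 'sum'
as_dict = df.agg({'x': 'sum'})   # Series that contains only column 'x'
ranks = df.transform('rank')     # same shape and labels as df

print(col_sums)
print(row_sums)
print(as_list)
print(as_dict)
print(ranks)
```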
32963270

3271+
#### Encode:
32973272
```python
3298-
+-------------+------------+-------------+---------------+
3299-
| | 'sum' | ['sum'] | {'x': 'sum'} |
3300-
+-------------+------------+-------------+---------------+
3301-
| gb.apply(…) | x y z | | |
3302-
| | z | | |
3303-
| | 3 1 2 3 | | |
3304-
| | 6 11 13 12 | | |
3305-
+-------------+------------+-------------+---------------+
3306-
| gb.agg(…) | x y | x y | x |
3307-
| | z | sum sum | z |
3308-
| | 3 1 2 | z | 3 1 |
3309-
| | 6 11 13 | 3 1 2 | 6 11 |
3310-
| | | 6 11 13 | |
3311-
+-------------+------------+-------------+---------------+
3312-
| gb.trans(…) | x y | | |
3313-
| | a 1 2 | | |
3314-
| | b 11 13 | | |
3315-
| | c 11 13 | | |
3316-
+-------------+------------+-------------+---------------+
3273+
<DF> = pd.read_json/html('<str/path/url>')
3274+
<DF> = pd.read_csv/pickle/excel('<path/url>')
3275+
<DF> = pd.read_sql('<query>', <connection>)
3276+
<DF> = pd.read_clipboard()
33173277
```
33183278

3319-
#### Transformations:
3279+
#### Decode:
33203280
```python
3321-
<DF> = <GB>.diff/cumsum/rank() # …/pct_change/fillna/ffill()
3322-
<DF> = <GB>.agg/transform(<trans_func>)
3281+
<dict> = <DF>.to_dict(['d/l/s/sp/r/i'])
3282+
<str> = <DF>.to_json/html/csv/markdown/latex([<path>])
3283+
<DF>.to_pickle/excel(<path>)
3284+
<DF>.to_sql('<table_name>', <connection>)
33233285
```
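
A minimal round trip through the encode and decode functions might look like the sketch below. It assumes a recent pandas release, uses an in-memory `io.StringIO` buffer in place of a real path, and spells out the `'list'` orientation in full because newer pandas versions may not accept the one-letter abbreviations.

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=['a', 'b'], columns=['x', 'y'])

csv_text = df.to_csv()                                      # no path given, so a string is returned
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)  # first CSV column becomes the index
as_lists = df.to_dict('list')                               # maps each column name to a list of values

print(csv_text)
print(restored)
print(as_lists)
```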
33243286

3287+
### GroupBy
3288+
**Object that groups together rows of a dataframe based on the value of passed column.**
3289+
33253290
```python
3326-
+-------------+------------+-------------+---------------+
3327-
| | 'rank' | ['rank'] | {'x': 'rank'} |
3328-
+-------------+------------+-------------+---------------+
3329-
| gb.agg(…) | x y | x y | x |
3330-
| | a 1 1 | rank rank | a 1 |
3331-
| | b 1 1 | a 1 1 | b 1 |
3332-
| | c 2 2 | b 1 1 | c 2 |
3333-
| | | c 2 2 | |
3334-
+-------------+------------+-------------+---------------+
3335-
| gb.trans(…) | x y | | |
3336-
| | a 1 1 | | |
3337-
| | b 1 1 | | |
3338-
| | c 1 1 | | |
3339-
+-------------+------------+-------------+---------------+
3291+
<GB> = <DF>.groupby(column_key/s) # DF is split into groups based on passed column.
3292+
<DF> = <GB>.get_group(group_key) # Selects a group by value of grouping column.
33403293
```
33413294

3342-
### Rolling
3295+
#### Apply, Aggregate, Transform:
33433296
```python
3344-
<Rl_S/D/G> = <Sr/DF/GB>.rolling(window_size) # Also: `min_periods=None, center=False`.
3345-
<Rl_S/D> = <Rl_D/G>[column_key/s] # Or: <Rl>.column_key
3346-
<Sr/DF/DF> = <Rl_S/D/G>.sum/max/mean()
3347-
<Sr/DF/DF> = <Rl_S/D/G>.apply(<agg_func>) # Invokes function on every window.
3348-
<Sr/DF/DF> = <Rl_S/D/G>.aggregate(<func/str>) # Invokes function on every window.
3297+
<DF> = <GB>.sum/max/mean/idxmax/all() # Or: <GB>.apply/agg(<agg_func>)
3298+
<DF> = <GB>.diff/cumsum/rank/ffill() # Or: <GB>.aggregate(<trans_func>)
3299+
<DF> = <GB>.fillna(<el>) # Or: <GB>.transform(<map_func>)
33493300
```
33503301

3351-
### Encode
33523302
```python
3353-
<DF> = pd.read_json/html('<str/path/url>')
3354-
<DF> = pd.read_csv/pickle/excel('<path/url>')
3355-
<DF> = pd.read_sql('<query>', <connection>)
3356-
<DF> = pd.read_clipboard()
3303+
>>> df = DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], index=list('abc'), columns=list('xyz'))
3304+
>>> gb = df.groupby('z')
3305+
x y z
3306+
3: a 1 2 3
3307+
6: b 4 5 6
3308+
c 7 8 6
33573309
```
33583310

3359-
### Decode
33603311
```python
3361-
<dict> = <DF>.to_dict(['d/l/s/sp/r/i'])
3362-
<str> = <DF>.to_json/html/csv/markdown/latex([<path>])
3363-
<DF>.to_pickle/excel(<path>)
3364-
<DF>.to_sql('<table_name>', <connection>)
3312+
+-------------+-------------+-------------+---------------+
3313+
| | 'sum' | ['sum'] | {'x': 'sum'} |
3314+
+-------------+-------------+-------------+---------------+
3315+
| gb.agg(…) | x y | x y | x |
3316+
| | z | sum sum | z |
3317+
| | 3 1 2 | z | 3 1 |
3318+
| | 6 11 13 | 3 1 2 | 6 11 |
3319+
| | | 6 11 13 | |
3320+
+-------------+-------------+-------------+---------------+
3321+
| gb.trans(…) | x y | | |
3322+
| | a 1 2 | | |
3323+
| | b 11 13 | | |
3324+
| | c 11 13 | | |
3325+
+-------------+-------------+-------------+---------------+
3326+
```
3327+
3328+
```python
3329+
+-------------+-------------+-------------+---------------+
3330+
| | 'rank' | ['rank'] | {'x': 'rank'} |
3331+
+-------------+-------------+-------------+---------------+
3332+
| gb.agg(…) | x y | x y | x |
3333+
| | a 1 1 | rank rank | a 1 |
3334+
| | b 1 1 | a 1 1 | b 1 |
3335+
| | c 2 2 | b 1 1 | c 2 |
3336+
| | | c 2 2 | |
3337+
+-------------+-------------+-------------+---------------+
3338+
| gb.trans(…) | x y | | |
3339+
| | a 1 1 | | |
3340+
| | b 1 1 | | |
3341+
| | c 1 1 | | |
3342+
+-------------+-------------+-------------+---------------+
3343+
```
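
The difference between `agg()` and `transform()` on a GroupBy object, as laid out in the tables above, can be checked with a sketch like this one. It assumes a recent pandas release, and the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 6]], index=list('abc'), columns=list('xyz'))
gb = df.groupby('z')

group_6 = gb.get_group(6)        # rows 'b' and 'c', where z == 6
sums = gb.agg('sum')             # one row per group, indexed by the values of 'z'
broadcast = gb.transform('sum')  # one row per original row, each holding its group's sum

print(group_6)
print(sums)
print(broadcast)
```

`agg()` collapses every group to a single row, whereas `transform()` returns a frame aligned with the original index, which makes it convenient for adding group-level statistics back as new columns.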
3344+
3345+
### Rolling
3346+
```python
3347+
<Rl_S/D/G> = <Sr/DF/GB>.rolling(window_size) # Also: `min_periods=None, center=False`.
3348+
<Rl_S/D> = <Rl_D/G>[column_key/s] # Or: <Rl>.column_key
3349+
<Sr/DF/DF> = <Rl_S/D/G>.sum/max/mean()
3350+
<Sr/DF/DF> = <Rl_S/D/G>.apply(<agg_func>) # Invokes function on every window.
3351+
<Sr/DF/DF> = <Rl_S/D/G>.aggregate(<func/str>) # Invokes function on every window.
33653352
```
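
A small sketch of the rolling calls listed above, assuming a recent pandas release. The sample Series and the range lambda are illustrative and do not come from the README.

```python
import pandas as pd

sr = pd.Series([1, 2, 3, 4])

strict = sr.rolling(2).sum()                  # first window is incomplete, so it yields NaN
lenient = sr.rolling(2, min_periods=1).sum()  # incomplete windows are allowed
spread = sr.rolling(2).apply(lambda w: w.max() - w.min(), raw=True)  # custom function per window

print(strict)
print(lenient)
print(spread)
```

Passing `raw=True` hands each window to the function as a NumPy array rather than a Series, which is usually faster for simple numeric reductions.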
33663353

33673354
