Pandas Toolkit
Syed Afroz Ali

In [1]: import pandas as pd
        import matplotlib.pyplot as plt

In [2]: df = pd.DataFrame({
            "Name": ["Braund, Mr. Owen Harris",
                     "Allen, Mr. William Henry",
                     "Bonnell, Miss. Elizabeth"],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        })
        df
Out[2]:                        Name  Age     Sex
        0   Braund, Mr. Owen Harris   22    male
        1  Allen, Mr. William Henry   35    male
        2  Bonnell, Miss. Elizabeth   58  female

In [3]: df["Age"]
Out[3]: 0    22
        1    35
        2    58
        Name: Age, dtype: int64

In [4]: ages = pd.Series([22, 35, 58], name="Age")
        ages
Out[4]: 0    22
        1    35
        2    58
        Name: Age, dtype: int64

In [5]: df["Age"].max()
Out[5]: 58

In [6]: ages.max()
Out[6]: 58

In [7]: df.describe()
Out[7]: (summary statistics for Age: count 3, mean 38.333333, std 18.230012,
         min 22, 25% 28.5, 50% 35, 75% 46.5, max 58)

In [8]: titanic = pd.read_csv("train_titanic.csv")
        titanic.head()
Out[8]: (first five rows with columns PassengerId, Survived, Pclass, Name,
         Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked)

In [9]: titanic.dtypes
Out[9]: PassengerId      int64
        Survived         int64
        Pclass           int64
        Name            object
        Sex             object
        Age            float64
        SibSp            int64
        Parch            int64
        Ticket          object
        Fare           float64
        Cabin           object
        Embarked        object
        dtype: object

In [10]: titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)

In [11]: titanic = pd.read_excel("titanic.xlsx", sheet_name="passengers")

In [12]: titanic.info()
         RangeIndex: 891 entries, 0 to 890
         Data columns (total 12 columns):
          #   Column       Non-Null Count  Dtype
          0   PassengerId  891 non-null    int64
          1   Survived     891 non-null    int64
          2   Pclass       891 non-null    int64
          3   Name         891 non-null    object
          4   Sex          891 non-null    object
          5   Age          714 non-null    float64
          6   SibSp        891 non-null    int64
          7   Parch        891 non-null    int64
          8   Ticket       891 non-null    object
          9   Fare         891 non-null    float64
          10  Cabin        204 non-null    object
          11  Embarked     889 non-null    object
         dtypes: float64(2), int64(5), object(5)
         memory usage: 83.7+ KB

In [13]: ages = titanic["Age"]
         ages.head()
Out[13]: 0    22.0
         1    38.0
         2    26.0
         3    35.0
         4    35.0
         Name: Age, dtype: float64

In [14]: type(titanic["Age"])
Out[14]: pandas.core.series.Series

In [15]: titanic["Age"].shape
Out[15]: (891,)

In [17]: age_sex = titanic[["Age", "Sex"]]
         age_sex.head()
Out[17]:     Age     Sex
         0  22.0    male
         1  38.0  female
         2  26.0  female
         3  35.0  female
         4  35.0    male

In [18]: titanic[["Age", "Sex"]].shape
Out[18]: (891, 2)

In [19]: above_35 = titanic[titanic["Age"] > 35]
         above_35.head()
Out[19]: (rows where Age > 35: Cumings 38.0, McCarthy 54.0, Bonnell 58.0,
          Andersson 39.0, Hewlett 55.0)

In [20]: class_23 = titanic[titanic["Pclass"].isin([2, 3])]
         class_23.head()
Out[20]: (rows whose Pclass is 2 or 3)
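The notebook loads the full CSV and then slices columns; a minimal sketch (editor's addition, not part of the original notebook) of narrowing the read itself, assuming the same train_titanic.csv file used above:

    import pandas as pd

    # Read only the columns that are needed and fix dtypes up front;
    # usecols and dtype are standard read_csv parameters.
    cols = ["PassengerId", "Pclass", "Sex", "Age", "Fare"]
    titanic_small = pd.read_csv(
        "train_titanic.csv",
        usecols=cols,
        dtype={"Sex": "category", "Pclass": "int8"},
    )
    print(titanic_small.dtypes)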
In [21]: class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
         class_23.head()
Out[21]: (the same selection as Out[20], written with two conditions and |)

In [22]: age_no_na = titanic[titanic["Age"].notna()]
         age_no_na.head()
Out[22]: (rows where Age is not missing)

In [23]: adult_names = titanic.loc[titanic["Age"] > 35]
         adult_names.head()
Out[23]: (all columns for the rows where Age > 35)

In [24]: adult_names = titanic.loc[titanic["Age"] > 35, "Name"]
         adult_names.head()
Out[24]: 1     Cumings, Mrs. John Bradley (Florence Briggs Th...
         6                              McCarthy, Mr. Timothy J
         11                            Bonnell, Miss. Elizabeth
         13                         Andersson, Mr. Anders Johan
         15                     Hewlett, Mrs. (Mary D Kingcome)
         Name: Name, dtype: object

In [25]: titanic.iloc[9:25, 2:5]
Out[25]: (rows 9-24 of the columns Pclass, Name and Sex)

In [26]: anon = titanic.iloc[0:3, 3] = "anonymous"
         anon
Out[26]: 'anonymous'

In [27]: titanic.head()
Out[27]: (first five rows; Name is now "anonymous" in rows 0-2)

In [28]: titanic["Age"].mean()
Out[28]: 29.69911764705882

In [29]: titanic[["Age", "Fare"]].median()
Out[29]: Age     28.0000
         Fare    14.4542
         dtype: float64

In [30]: titanic[["Age", "Fare"]].describe()
Out[30]:               Age        Fare
         count  714.000000  891.000000
         mean    29.699118   32.204208
         std     14.526497   49.693429
         min      0.420000    0.000000
         25%     20.125000    7.910400
         50%     28.000000   14.454200
         75%     38.000000   31.000000
         max     80.000000  512.329200
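A small sketch (editor's addition) combining a row condition and a column list in a single .loc call; Series.between is a standard pandas method and the column names come from the Titanic frame above:

    # Name and Age of passengers aged 20-30 (both ends inclusive), in one .loc
    twenties = titanic.loc[titanic["Age"].between(20, 30), ["Name", "Age"]]
    print(twenties.head())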
In [31]: titanic.agg({
             "Age": ["min", "max", "median", "skew"],
             "Fare": ["min", "max", "median", "mean"],
         })
Out[31]:               Age        Fare
         min      0.420000    0.000000
         max     80.000000  512.329200
         median  28.000000   14.454200
         skew     0.399108         NaN
         mean          NaN   32.204208

In [32]: titanic[["Sex", "Age"]].groupby("Sex").mean()
Out[32]:               Age
         Sex
         female  27.915709
         male    30.726645

In [33]: titanic[["Sex", "Age"]].groupby("Sex").max()
Out[33]:          Age
         Sex
         female  63.0
         male    80.0

In [34]: titanic[["Sex", "Age"]].groupby("Sex").first()
Out[34]:          Age
         Sex
         female  38.0
         male    22.0

In [35]: titanic.head(2)
Out[35]: (first two rows)

In [37]: titanic.groupby("Sex")["Age"].mean()
Out[37]: Sex
         female    27.915709
         male      30.726645
         Name: Age, dtype: float64

In [38]: titanic.groupby(["Sex", "Pclass"])["Fare"].mean()
Out[38]: Sex     Pclass
         female  1         106.125798
                 2          21.970121
                 3          16.118810
         male    1          67.226127
                 2          19.741782
                 3          12.661633
         Name: Fare, dtype: float64

In [39]: titanic["Pclass"].value_counts()
Out[39]: 3    491
         1    216
         2    184
         Name: Pclass, dtype: int64

In [40]: titanic.groupby("Pclass")["Pclass"].count()
Out[40]: Pclass
         1    216
         2    184
         3    491
         Name: Pclass, dtype: int64

In [41]: titanic.sort_values(by="Age", ascending=False).head()
Out[41]: (oldest passengers first: Barkworth 80.0, Svensson 74.0,
          Artagaveytia 71.0, Goldschmidt 71.0, Connors 70.5)

In [42]: titanic.sort_values(by=["Pclass", "Age"], ascending=False).head()
Out[42]: (rows ordered by Pclass and then Age, both descending)

In [43]: titanic.dtypes
Out[43]: (the same column dtypes as Out[9])

In [44]: titanic["Name"].str.lower()
Out[44]: (the Name column in lower case; Length: 891, dtype: object)

In [45]: titanic["Name"].str.split(",")
Out[45]: (each Name split on the comma into a list, e.g.
          [Allen,  Mr. William Henry]; Length: 891, dtype: object)

In [46]: titanic["Surname"] = titanic["Name"].str.split(",").str.get(0)
         titanic["Surname"]
Out[46]: (the part before the comma: anonymous, anonymous, anonymous,
          Futrelle, Allen, ..., Dooley; Length: 891, dtype: object)

In [47]: titanic["Name_main"] = titanic["Name"].str.split(",").str.get(1)
         titanic["Name_main"]
Out[47]: (the part after the comma, NaN for the three "anonymous" rows:
          Mrs. Jacques Heath (Lily May Peel), Mr. William Henry, ...,
          Mr. Patrick; Length: 891, dtype: object)
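As a follow-up to the groupby calls above, a short sketch (editor's addition) using named aggregation, available since pandas 0.25, which avoids the two-level column index that a dict of lists produces:

    # One output column per keyword; each value is (source column, aggregation)
    stats = titanic.groupby("Sex").agg(
        mean_age=("Age", "mean"),
        max_fare=("Fare", "max"),
        passengers=("PassengerId", "count"),
    )
    print(stats)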
In [48]: titanic["Name"].str.split(",")
Out[48]: (same as Out[45])

In [49]: titanic["Real_Name"] = titanic["Name"].str.split(",").str.get(0)
         titanic.head()
Out[49]: (first five rows with the new Real_Name column)

In [50]: titanic["Surname"] = titanic["Name"].str.split(",").str.get(1)
         titanic.head()
Out[50]: (first five rows; Surname now holds the part after the comma)

In [51]: titanic["Salutation"] = titanic["Surname"].str.split(".").str.get(0)
         titanic.head()
Out[51]: (first five rows with the new Salutation column)

In [52]: titanic["Name"].str.contains("Mr")
Out[52]: (boolean Series, True where the name contains "Mr";
          Length: 891, dtype: bool)

In [53]: titanic[titanic["Name"].str.contains("Countess")]
Out[53]: (one matching row: PassengerId 760, "Rothes, the Countess. of
          (Lucy Noel Martha Dyer...)", female, Age 33.0, Cabin B77)

In [54]: titanic["Name"].str.len()
Out[54]: (the length of each name; Length: 891, dtype: int64)

In [55]: titanic["Name"].str.len().idxmax()
Out[55]: 307

In [56]: titanic.loc[titanic["Name"].str.len().idxmax(), "Name"]
Out[56]: 'Penasco y Castellana, Mrs. Victor de Satode (Maria Josefa Perez de
          Soto y Vallejo)'
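The split/get steps above can be collapsed with a regular expression; a minimal sketch (editor's addition; the Title column name is made up here) using Series.str.extract:

    # Capture whatever sits between the comma and the first period, e.g. "Mr"
    titanic["Title"] = titanic["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
    print(titanic["Title"].value_counts().head())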
In [57]: titanic.loc[titanic["Name"].str.len().idxmin(), "Name"]
Out[57]: 'anonymous'

In [58]: titanic["Sex_short"] = titanic["Sex"].replace({"male": "M", "female": "F"})
         titanic["Sex_short"]
Out[58]: (M/F flags per passenger; Length: 891, dtype: object)

In [59]: titanic["Sex_short"] = titanic["Sex"].str.replace("female", "F")
         titanic["Sex_short"] = titanic["Sex_short"].str.replace("male", "M")

In [170]: import numpy as np
          df = pd.DataFrame(np.random.randn(10, 3), columns=list("abc"))
          df[["a", "b"]]
Out[170]: (the a and b columns of the random frame)

In [171]: df.loc[:, ["a", "b"]]
Out[171]: (the same two columns selected with .loc)

Good Code

In [ ]: named = list("abcdefg")
        n = 30
        columns = named + np.arange(len(named), n).tolist()
        df = pd.DataFrame(np.random.randn(n, n), columns=columns)
        df.iloc[:, np.r_[:10, 24:30]]

In [ ]: df = pd.DataFrame({
            "v1": [1, 3, 5, 7, 8, 3, 5, np.nan, 4, 5, 7, 9],
            "v2": [11, 33, 55, 77, 88, 33, 55, np.nan, 44, 55, 77, 99],
            "by1": ["red", "blue", 1, 2, np.nan, "big", 1, 2, "red", 1, np.nan, 12],
            "by2": ["wet", "dry", 99, 95, np.nan, "damp", 95, 99, "red", 99,
                    np.nan, np.nan],
        })
        g = df.groupby(["by1", "by2"])
        g[["v1", "v2"]].mean()

In [63]: s = pd.Series(np.arange(5), dtype=np.float32)
         s
Out[63]: 0    0.0
         1    1.0
         2    2.0
         3    3.0
         4    4.0
         dtype: float32

In [64]: s.isin([2, 4])
Out[64]: 0    False
         1    False
         2     True
         3    False
         4     True
         dtype: bool

Data generation code

In [65]: import random
         import string

         baseball = pd.DataFrame({
             "team": ["team %d" % (x + 1) for x in range(5)] * 5,
             "player": random.sample(list(string.ascii_lowercase), 25),
             "batting avg": np.random.uniform(0.200, 0.400, 25),
         })
         baseball
Out[65]: (25 rows of team, player and batting avg)

In [66]: baseball.pivot_table(values="batting avg", columns="team", aggfunc=np.max)
Out[66]: (one row with the best batting avg per team)

In [67]: df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})
         df.head()
Out[67]: (first five rows of the two random columns)

In [68]: df.query("a <= b")
Out[68]: (rows where column a is less than or equal to column b)

In [69]: df[df["a"] <= df["b"]]
Out[69]: (the same rows selected with a boolean mask)

In [70]: df.loc[df["a"] <= df["b"]]
Out[70]: (the same rows selected with .loc)
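A short sketch (editor's addition) showing two query() features that pair well with the examples above: referencing a local Python variable with @, and quoting a column name that contains a space with backticks (both supported since pandas 0.25):

    threshold = 0.350
    # Backticks let query() address the "batting avg" column despite the space
    top_hitters = baseball.query("`batting avg` > @threshold")
    print(top_hitters[["team", "player"]])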
In [71]: df[df["a"] >= df["b"]]
Out[71]: (the complementary rows, where a is greater than or equal to b)

In [72]: df = pd.DataFrame({"a": np.random.randn(10), "b": np.random.randn(10)})
         df.head()
Out[72]: (first five rows of the regenerated frame)

In [73]: df.eval("a + b")
Out[73]: (element-wise sum of the two columns; dtype: float64)

In [74]: df["a"] + df["b"]
Out[74]: (the same sums computed directly; dtype: float64)

In [75]: df = pd.DataFrame({
             "x": np.random.uniform(1.0, 168.0, 120),
             "y": np.random.uniform(7.0, 334.0, 120),
             "z": np.random.uniform(1.7, 20.7, 120),
             "month": [5, 6, 7, 8] * 30,
             "week": np.random.randint(1, 4, 120),
         })
         df.head()
Out[75]: (first five rows of x, y, z, month, week)

In [76]: grouped = df.groupby(["month", "week"])
         grouped["x"].agg([np.mean, np.std])
Out[76]: (mean and std of x for each month/week combination)

In [77]: a = np.array(list(range(1, 24)) + [np.NAN]).reshape(2, 3, 4)
         a
Out[77]: array([[[ 1.,  2.,  3.,  4.],
                 [ 5.,  6.,  7.,  8.],
                 [ 9., 10., 11., 12.]],
                [[13., 14., 15., 16.],
                 [17., 18., 19., 20.],
                 [21., 22., 23., nan]]])

In [78]: pd.DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)])
Out[78]: (24 rows; columns 0-2 hold the position of each element of the 3-D
          array and column 3 holds its value)

In [79]: a = list(enumerate(list(range(1, 5)) + [np.NAN]))
         a
Out[79]: [(0, 1), (1, 2), (2, 3), (3, 4), (4, nan)]

In [80]: pd.DataFrame(a)
Out[80]:    0    1
         0  0  1.0
         1  1  2.0
         2  2  3.0
         3  3  4.0
         4  4  NaN

In [81]: cheese = pd.DataFrame({
             "first": ["John", "Mary"],
             "last": ["Doe", "Bo"],
             "height": [5.5, 6.0],
             "weight": [130, 150],
         })
         cheese
Out[81]:   first last  height  weight
         0  John  Doe     5.5     130
         1  Mary   Bo     6.0     150

In [82]: pd.melt(cheese, id_vars=["first", "last"])
Out[82]:   first last variable  value
         0  John  Doe   height    5.5
         1  Mary   Bo   height    6.0
         2  John  Doe   weight  130.0
         3  Mary   Bo   weight  150.0

In [83]: cheese.set_index(["first", "last"]).stack()  # alternative
Out[83]: first  last
         John   Doe   height      5.5
                      weight    130.0
         Mary   Bo    height      6.0
                      weight    150.0
         dtype: float64

In [84]: df = pd.DataFrame({
             "x": np.random.uniform(1.0, 168.0, 12),
             "y": np.random.uniform(7.0, 334.0, 12),
             "z": np.random.uniform(1.7, 20.7, 12),
             "month": [5, 6, 7] * 4,
             "week": [1, 2] * 6,
         })
         mdf = pd.melt(df, id_vars=["month", "week"])
         pd.pivot_table(mdf, values="value", index=["variable", "week"],
                        columns=["month"])
Out[84]: (mean of x, y and z for every variable/week row and month column)

In [85]: df = pd.DataFrame({
             "Animal": ["Animal1", "Animal2", "Animal3", "Animal2",
                        "Animal1", "Animal2", "Animal3"],
             "FeedType": ["A", "B", "A", "A", "B", "B", "A"],
             "Amount": [10, 7, 4, 2, 5, 6, 2],
         })
         df.pivot_table(values="Amount", index="Animal", columns="FeedType",
                        aggfunc="sum")
Out[85]: FeedType     A     B
         Animal
         Animal1   10.0   5.0
         Animal2    2.0  13.0
         Animal3    6.0   NaN
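A short follow-up sketch (editor's addition) on the same Animal/FeedType frame: fill_value replaces the NaN that appears when a combination has no rows, and margins=True adds row and column totals; both are standard pivot_table arguments:

    totals = df.pivot_table(values="Amount", index="Animal", columns="FeedType",
                            aggfunc="sum", fill_value=0, margins=True)
    print(totals)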
In [86]: df.groupby(["Animal", "FeedType"])["Amount"].sum()
Out[86]: Animal   FeedType
         Animal1  A           10
                  B            5
         Animal2  A            2
                  B           13
         Animal3  A            6
         Name: Amount, dtype: int64

In [87]: pd.cut(pd.Series([1, 2, 3, 4, 5, 6]), 3)
Out[87]: 0    (0.995, 2.667]
         1    (0.995, 2.667]
         2    (2.667, 4.333]
         3    (2.667, 4.333]
         4      (4.333, 6.0]
         5      (4.333, 6.0]
         dtype: category
         Categories (3, interval[float64, right]):
         [(0.995, 2.667] < (2.667, 4.333] < (4.333, 6.0]]

In [88]: pd.Series([1, 2, 3, 2, 2, 3]).astype("category")
Out[88]: 0    1
         1    2
         2    3
         3    2
         4    2
         5    3
         dtype: category
         Categories (3, int64): [1, 2, 3]

In [89]: frame = pd.DataFrame({"col1": ["A", "B", np.NaN, "C", "D"],
                               "col2": ["F", np.NaN, "G", "H", "I"]})
         frame
Out[89]:   col1 col2
         0    A    F
         1    B  NaN
         2  NaN    G
         3    C    H
         4    D    I

In [90]: frame[frame["col2"].isna()]
Out[90]:   col1 col2
         1    B  NaN

In [91]: frame[frame["col1"].notna()]
Out[91]:   col1 col2
         0    A    F
         1    B  NaN
         3    C    H
         4    D    I

In [92]: df1 = pd.DataFrame({"key": ["A", "B", "C", "D"],
                             "value": np.random.randn(4)})
         df2 = pd.DataFrame({"key": ["B", "D", "D", "E"],
                             "value": np.random.randn(4)})

In [93]: pd.merge(df1, df2, on="key")
Out[93]: (inner join: keys B, D, D with value_x and value_y columns)

In [94]: indexed_df2 = df2.set_index("key")
         pd.merge(df1, indexed_df2, left_on="key", right_index=True)
Out[94]: (the same join, matching df1["key"] against df2's index)

In [95]: pd.merge(df1, df2, on="key", how="left")
Out[95]: (all keys of df1; value_y is NaN for A and C)

In [96]: pd.merge(df1, df2, on="key", how="right")
Out[96]: (all keys of df2; value_x is NaN for E)

In [97]: pd.merge(df1, df2, on="key", how="outer")
Out[97]: (all keys of both frames; NaN wherever a side has no match)

In [98]: df1 = pd.DataFrame({"city": ["Chicago", "San Francisco", "New York City"],
                             "rank": [1, 2, 3]})
         df2 = pd.DataFrame({"city": ["Chicago", "Boston", "Los Angeles"],
                             "rank": [1, 4, 5]})
         pd.concat([df1, df2])
Out[98]:             city  rank
         0        Chicago     1
         1  San Francisco     2
         2  New York City     3
         0        Chicago     1
         1         Boston     4
         2    Los Angeles     5

In [99]: pd.concat([df1, df2]).drop_duplicates()
Out[99]:             city  rank
         0        Chicago     1
         1  San Francisco     2
         2  New York City     3
         1         Boston     4
         2    Los Angeles     5
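A small sketch (editor's addition) of two merge arguments that help debug joins like the ones above, applied to the city frames just defined: indicator=True records which side each row came from, and validate raises if the key relationship is not what you expect:

    checked = pd.merge(df1, df2, on="city", how="outer",
                       indicator=True, validate="one_to_one")
    print(checked["_merge"].value_counts())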
In [100]: df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
          df
Out[100]:    x  y
          0  1  2
          1  3  4
          2  5  6

In [101]: firstlast = pd.DataFrame({"String": ["John Smith", "Jane Cook"]})
          firstlast["First_Name"] = firstlast["String"].str.split(" ", expand=True)[0]
          firstlast["Last_Name"] = firstlast["String"].str.rsplit(" ", expand=True)[1]
          firstlast
Out[101]:        String First_Name Last_Name
          0  John Smith       John     Smith
          1   Jane Cook       Jane      Cook

In [102]: firstlast = pd.DataFrame({"string": ["John Smith", "Jane Cook"]})
          firstlast["upper"] = firstlast["string"].str.upper()
          firstlast["lower"] = firstlast["string"].str.lower()
          firstlast["title"] = firstlast["string"].str.title()
          firstlast
Out[102]:        string       upper       lower       title
          0  John Smith  JOHN SMITH  john smith  John Smith
          1   Jane Cook   JANE COOK   jane cook   Jane Cook

In [103]: df1 = pd.DataFrame({"key": ["A", "B", "C", "D"],
                              "value": np.random.randn(4)})
          df1
Out[103]: (keys A-D with random values)

In [104]: df2 = pd.DataFrame({"key": ["B", "D", "D", "E"],
                              "value": np.random.randn(4)})
          df2
Out[104]: (keys B, D, D, E with random values)

In [105]: inner_join = df1.merge(df2, on=["key"], how="inner")
          inner_join
Out[105]: (keys B, D, D)

In [106]: left_join = df1.merge(df2, on=["key"], how="left")
          left_join
Out[106]: (keys A-D; value_y is NaN for A and C)

In [107]: right_join = df1.merge(df2, on=["key"], how="right")
          right_join
Out[107]: (keys B, D, D, E; value_x is NaN for E)

In [108]: outer_join = df1.merge(df2, on=["key"], how="outer")
          outer_join
Out[108]: (keys A, B, C, D, D, E with NaN where a side has no match)

In [109]: df = pd.DataFrame({"AAA": [1] * 8, "BBB": list(range(0, 8))})
          df
Out[109]: (8 rows; AAA is all 1, BBB runs from 0 to 7)

In [110]: series = list(range(1, 5))
          series
Out[110]: [1, 2, 3, 4]

In [111]: df.loc[2:5, "AAA"] = series
          df
Out[111]: (AAA is now 1, 1, 1, 2, 3, 4, 1, 1)

In [112]: df = pd.DataFrame({
              "class": ["A", "A", "A", "B", "C", "D"],
              "student_count": [42, 35, 42, 50, 47, 45],
              "all_pass": ["Yes", "Yes", "Yes", "No", "No", "Yes"],
          })
          df.drop_duplicates()
Out[112]:   class  student_count all_pass
          0     A             42      Yes
          1     A             35      Yes
          3     B             50       No
          4     C             47       No
          5     D             45      Yes

In [113]: df.drop_duplicates(["class", "student_count"])
Out[113]: (the same five rows; row 2 is dropped because its class and
           student_count repeat row 0)

In [114]: new_row = pd.DataFrame([["E", 51, True]],
                                 columns=["class", "student_count", "all_pass"])
          pd.concat([df, new_row])
Out[114]: (the six original rows followed by the new row, which keeps its
           own index label 0)
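One small sketch (editor's addition): passing ignore_index=True to pd.concat rebuilds a clean 0..n-1 index, so the appended row does not reuse the label 0:

    combined = pd.concat([df, new_row], ignore_index=True)
    print(combined.tail(2))   # the new row now has index 6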
In [115]: df = pd.DataFrame({"x": ...})   # definition truncated in the source
          df

In [116]: df1 = pd.DataFrame({"key": ["A", "B", "C", "D"],
                              "value": np.random.randn(4)})
          df1
Out[116]: (keys A-D with random values)

In [117]: df2 = pd.DataFrame({"key": ["B", "D", "D", "E"],
                              "value": np.random.randn(4)})
          df2
Out[117]: (keys B, D, D, E with random values)

In [118]: inner_join = df1.merge(df2, on=["key"], how="inner")
          inner_join
Out[118]: (keys B, D, D)

In [119]: left_join = df1.merge(df2, on=["key"], how="left")
          left_join
Out[119]: (keys A-D; value_y is NaN for A and C)

In [120]: right_join = df1.merge(df2, on=["key"], how="right")
          right_join
Out[120]: (keys B, D, D, E; value_x is NaN for E)

In [121]: outer_join = df1.merge(df2, on=["key"], how="outer")
          outer_join
Out[121]: (keys A, B, C, D, D, E with NaN where a side has no match)

In [122]: outer_join["value_x"] + outer_join["value_y"]
Out[122]: (row-wise sums; NaN wherever either operand is missing)

In [123]: outer_join["value_x"].sum()
Out[123]: -2.008526169978958

In [124]: outer_join[outer_join["value_x"].isna()]
Out[124]: (the single row with key E, whose value_x is missing)

In [125]: outer_join[outer_join["value_x"].notna()]
Out[125]: (all rows except key E)

In [126]: outer_join.dropna()
Out[126]: (only the rows where both value_x and value_y are present:
           keys B, D, D)

In [127]: outer_join.fillna(method="ffill")
Out[127]: (missing values filled with the previous row's value)

In [128]: outer_join["value_x"].fillna(outer_join["value_x"].mean())
Out[128]: (value_x with its mean substituted for the missing entry)

In [129]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
          s
Out[129]: 0    1.0
          1    3.0
          2    5.0
          3    NaN
          4    6.0
          5    8.0
          dtype: float64

In [130]: dates = pd.date_range("20130101", periods=6)
          dates
Out[130]: DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03',
                         '2013-01-04', '2013-01-05', '2013-01-06'],
                        dtype='datetime64[ns]', freq='D')

In [131]: df = pd.DataFrame(np.random.randn(6, 4), index=dates,
                            columns=list("ABCD"))
          df
Out[131]: (6 rows indexed by the dates, with random columns A-D)

In [132]: df2 = pd.DataFrame({
              "A": 1.0,
              "B": pd.Timestamp("20130102"),
              "C": pd.Series(1, index=list(range(4)), dtype="float32"),
              "D": np.array([3] * 4, dtype="int32"),
              "E": pd.Categorical(["test", "train", "test", "train"]),
              "F": "foo",
          })
          df2
Out[132]:      A          B    C  D      E    F
          0  1.0 2013-01-02  1.0  3   test  foo
          1  1.0 2013-01-02  1.0  3  train  foo
          2  1.0 2013-01-02  1.0  3   test  foo
          3  1.0 2013-01-02  1.0  3  train  foo
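A brief sketch (editor's addition) of two related fills: fillna also accepts a per-column dict, and DataFrame.ffill() is the method-style spelling of the fillna(method="ffill") call used above:

    # A different fill rule per column
    filled = outer_join.fillna({"value_x": 0.0,
                                "value_y": outer_join["value_y"].mean()})
    # Forward fill, equivalent to fillna(method="ffill")
    forward = outer_join.ffill()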
In [133]: df2.index
Out[133]: Int64Index([0, 1, 2, 3], dtype='int64')

In [134]: df.to_numpy()
Out[134]: (the 6x4 float array behind df)

In [135]: df2.to_numpy()
Out[135]: (an object array, because df2 mixes floats, timestamps, integers,
           categoricals and strings)

In [136]: df.sort_index(axis=1, ascending=False)
Out[136]: (columns reordered to D, C, B, A)

In [137]: df.sort_values(by="B")
Out[137]: (rows reordered by ascending values of column B)

In [138]: df[0:3]
Out[138]: (the first three rows, 2013-01-01 to 2013-01-03)

In [139]: df["20130102":"20130104"]
Out[139]: (the rows from 2013-01-02 to 2013-01-04, both ends included)

In [140]: df.loc[dates[0]]
Out[140]: A   -0.591099
          B   -1.749225
          C    1.031762
          D    0.972904
          Name: 2013-01-01 00:00:00, dtype: float64

In [141]: df.loc[:, ["A", "B"]]
Out[141]: (columns A and B for all six dates)

In [142]: df.loc["20130102":"20130104", ["A", "B"]]
Out[142]: (columns A and B for 2013-01-02 to 2013-01-04)

In [143]: df.loc["20130102", ["A", "B"]]
Out[143]: A    0.437900
          B   -0.154981
          Name: 2013-01-02 00:00:00, dtype: float64

In [144]: df.at[dates[0], "A"]
Out[144]: -0.5910989777117921

In [145]: df.iloc[3]
Out[145]: A   -0.232215
          B    1.631459
          C    0.539724
          D   -0.872221
          Name: 2013-01-04 00:00:00, dtype: float64

In [146]: df.iloc[3:5, 0:2]
Out[146]: (columns A and B for 2013-01-04 and 2013-01-05)

In [147]: df.iloc[[1, 2, 4], [0, 2]]
Out[147]: (columns A and C for 2013-01-02, 2013-01-03 and 2013-01-05)

In [148]: df.iloc[1:3, :]
Out[148]: (all columns for 2013-01-02 and 2013-01-03)

In [149]: df.iloc[:, 1:3]
Out[149]: (columns B and C for all rows)

In [150]: df.iloc[1, 1]
Out[150]: -0.15498061171987595

In [151]: df[df["A"] > 0]
Out[151]: (the rows where column A is positive: 2013-01-02 and 2013-01-05)

In [152]: df[df > 0]
Out[152]: (the whole frame with every non-positive value replaced by NaN)

In [153]: df2 = df.copy()
          df2["E"] = ["one", "one", "two", "three", "four", "three"]
          df2
Out[153]: (df with an extra string column E)
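One point worth making explicit next to these cells (editor's addition): label slices with .loc include both endpoints, while positional slices with .iloc exclude the stop, like ordinary Python slicing:

    df.loc["20130102":"20130104", "A":"B"]   # three rows, endpoints included
    df.iloc[1:3, 0:2]                        # two rows, stop excluded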
In [154]: df2[df2["E"].isin(["two", "four"])]
Out[154]: (the rows where E is "two" or "four": 2013-01-03 and 2013-01-05)

In [155]: s1 = pd.Series([1, 2, 3, 4, 5, 6],
                         index=pd.date_range("20130102", periods=6))
          s1
Out[155]: 2013-01-02    1
          2013-01-03    2
          2013-01-04    3
          2013-01-05    4
          2013-01-06    5
          2013-01-07    6
          Freq: D, dtype: int64

In [156]: df.at[dates[0], "A"] = 0

In [157]: df.iat[0, 1] = 0

In [158]: df.loc[:, "D"] = np.array([5] * len(df))
          df
          (pandas 1.5 prints a DeprecationWarning here about the future
          in-place behaviour of this kind of column assignment)
Out[158]: (A and B are 0 in the first row and D is 5 in every row)

In [159]: df2 = df.copy()
          df2[df2 > 0] = -df2
          df2
Out[159]: (every positive value of df flipped to its negative)

In [160]: df1 = df.reindex(index=dates[0:4], columns=list(df.columns) + ["E"])
          df1.loc[dates[0]:dates[1], "E"] = 1
          df1
Out[160]: (the first four rows plus a new column E, which is 1.0 for
           2013-01-01 and 2013-01-02 and NaN for the other two dates)
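A minimal sketch (editor's addition) of the same sign flip done without mutating in place, using DataFrame.where, which keeps values where the condition holds and substitutes the second argument elsewhere:

    flipped = df.where(df <= 0, -df)   # keep non-positive values, negate the rest
    # df.mask(df > 0, -df) is the equivalent inverse spelling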
In [161]: df1.dropna(how="any")
Out[161]: (only the rows without any NaN: 2013-01-01 and 2013-01-02)

In [162]: df1.fillna(value=5)
Out[162]: (df1 with every NaN replaced by 5)

In [163]: pd.isna(df1)
Out[163]: (a boolean frame, True where df1 holds NaN: column E on
           2013-01-03 and 2013-01-04)

In [164]: df.mean()
Out[164]: A   -0.086875
          B    0.466621
          C    0.289471
          D    5.000000
          dtype: float64

In [165]: df.mean(1)
Out[165]: 2013-01-01    1.507940
          2013-01-02    1.476011
          2013-01-03    0.922031
          2013-01-04    1.734742
          2013-01-05    1.507929
          2013-01-06    1.355172
          Freq: D, dtype: float64

In [166]: s = pd.Series([1, 3, 5, np.nan, 6, 8], index=dates).shift(2)
          s
Out[166]: 2013-01-01    NaN
          2013-01-02    NaN
          2013-01-03    1.0
          2013-01-04    3.0
          2013-01-05    5.0
          2013-01-06    NaN
          Freq: D, dtype: float64

In [167]: df.sub(s, axis="index")
Out[167]: (df minus s, row by row; rows where s is NaN become all NaN)

In [168]: df.apply(np.cumsum)
Out[168]: (running column-wise totals of df)

In [169]: df.apply(lambda x: x.max() - x.min())
Out[169]: A    1.314073
          B    2.203174
          C    2.823675
          D    0.000000
          dtype: float64

Syed Afroz Ali
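To round off the apply examples, a small sketch (editor's addition): the column-wise spread does not need apply at all, and the same calculation can be run per row by passing axis=1:

    df.max() - df.min()                                        # column-wise, no apply
    row_range = df.apply(lambda r: r.max() - r.min(), axis=1)  # per-row spread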
