Saturday Morning Keynote
Wes McKinney
@wesmckinn
PyCon APAC 2016 (Seoul)
Me
DataPad
Apache Arrow
Feather
ibis
In process: Python for Data Analysis: 2nd Edition
Coming 2017
(in English ☺)
Q: What brings you here?
Our shared values
Pride in software craftsmanship
My story
•  Accidental software developer
•  2007: My first job (financial research analyst)
•  I started writing Python libraries to do my own work better
•  Soon I was helping my colleagues work better, too
Tools
Empathy
the feeling that you understand and share another person's experiences and emotions : the ability to share someone else's feelings
Source: Merriam-Webster's Learner's Dictionary
Open source is wonderful…
Open source is wonderful… but it can also be frustrating
Sustainable open source
•  How to keep contributors from drowning / burning out?
•  How to fund the work?
•  How to protect and serve the community?
The Grind
“The grind is an endless stream of bug reports, requests, demands, questions, and occasional inquisitions.”
DHH, Creator of Ruby on Rails
pandas, the open source project
•  Parts of code date back to April 2008
•  Over 600 unique contributors on GitHub
•  Active project maintainers range from 4-7 people
•  > 6900 closed issues
•  > 5100 pull requests
pandas at end of 2012
April 7, 2014
"Some might argue that [Heartbleed] is the worst vulnerability found (at least in terms of its potential impact) since commercial traffic began to flow on the Internet."
Joseph Steinberg, Forbes cybersecurity columnist
“There should be at least…[6] full time OpenSSL team members, not just one, able to concentrate … without having to hustle commercial work. If you’re a … in a position to do something about it, give it some thought. Please. I’m getting old and weary and I’d like to retire someday.”
Steve Marquess, OpenSSL team
By Nadia Eghbal, supported by the Ford Foundation
For more on this
“The Cathedral and the Bazaar”
Python’s normalization in industry
•  Python has become a leading language instead of something “experimental” or “risky”
•  Many businesses were founded on the growth of the Python user base
•  See Paul Graham’s 2004 essay “The Python Paradox”: how things have changed!
Governance
“the processes of interaction and decision-making among the actors involved in a collective problem…”
M. Hufty (via Wikipedia)
Openness and Transparency
Consensus
Some example governance documents
•  NumPy (see the docs)
•  IPython / Jupyter governance
– github.com/jupyter/governance
•  pandas
– github.com/pydata/pandas-governance
– Modeled after Jupyter governance
http://numfocus.org
http://apache.org
PyCon APAC 2016 Keynote
conda-forge
•  Community-curated conda package channel (hosted on anaconda.org)
•  Reproducible build infrastructure (Docker + CircleCI + Travis CI + AppVeyor)
•  Automated GitHub helper tools
conda config --add channels conda-forge
What is next for pandas?
•  pandas 1.0
– A stable, maintenance-only release
•  Beginning “pandas 2.0”
– Planning significant refactoring of the internals of Series and DataFrame
Why pandas 2.0?
•  Some changes are difficult or impossible to make incrementally
•  pandas’s relationship with the ecosystem has evolved over the last 5 years
•  Make pandas
– Faster and use less memory
– Fix long-standing limitations / inconsistencies
– Easier interoperability / extensibility
Apache Arrow
http://arrow.apache.org
High-Performance Sharing & Interchange
Today
•  Each system has its own internal memory format
•  70-80% CPU wasted on serialization and deserialization
•  Similar functionality implemented in multiple projects
With Arrow
•  All systems utilize the same memory format
•  No overhead for cross-system communication
•  Projects can share functionality (e.g., Parquet-to-Arrow reader)
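To make the "same memory format" idea concrete, here is a minimal pure-Python sketch, assuming a simplified version of the Arrow columnar layout for a nullable int64 column: a validity bitmap (bit i set means slot i is non-null, least-significant bit first) plus a contiguous little-endian values buffer. The helpers `build_int64_column` and `read_int64_column` are hypothetical teaching names, not part of pyarrow, and the sketch ignores Arrow's buffer alignment and padding rules. The point is that a consumer which understands the layout reads the producer's buffers in place through a `memoryview`, with no serialize/deserialize step.

```python
import struct
from typing import List, Optional, Tuple


def build_int64_column(values: List[Optional[int]]) -> Tuple[bytearray, bytearray]:
    """Producer side (hypothetical helper): pack values into two buffers.

    Simplified Arrow-style layout, assumed for illustration only:
    - validity: one bit per slot, LSB-first within each byte, 1 = valid
    - data: one little-endian int64 per slot; null slots hold 0 here
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)
    data = bytearray(8 * n)
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)
            struct.pack_into("<q", data, 8 * i, v)
    return validity, data


def read_int64_column(validity: memoryview, data: memoryview, n: int) -> List[Optional[int]]:
    """Consumer side (hypothetical helper): decode slots directly from shared buffers."""
    out: List[Optional[int]] = []
    for i in range(n):
        is_valid = (validity[i // 8] >> (i % 8)) & 1
        # unpack_from reads 8 bytes at the offset without copying the whole buffer
        out.append(struct.unpack_from("<q", data, 8 * i)[0] if is_valid else None)
    return out


validity, data = build_int64_column([1, None, 3])
# The "consumer" gets views onto the producer's memory, not serialized copies.
shared_validity, shared_data = memoryview(validity), memoryview(data)
print(read_int64_column(shared_validity, shared_data, 3))  # [1, None, 3]

# A write through the producer's buffer is immediately visible to the consumer.
struct.pack_into("<q", data, 0, 42)
print(read_int64_column(shared_validity, shared_data, 3))  # [42, None, 3]
```

In a real deployment the shared buffers would live in memory both systems can address (for example shared memory or a memory-mapped file) rather than in one Python process.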
Feather File Format for Python and R
•  Problem: fast, language-agnostic binary data frame file format
•  By Wes McKinney (Python) and Hadley Wickham (R)
•  Read speeds close to disk IO performance
•  Leverages Apache Arrow
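One reason a columnar binary format can read at close to disk speed is that each column is stored as a contiguous binary buffer, so loading is a few bulk reads rather than per-value text parsing as with CSV. The sketch below illustrates that idea with a deliberately simplified, hypothetical layout: raw little-endian float64 columns, then a JSON footer describing them, then the footer's length. It is not the real Feather format. The function names `write_columns` and `read_columns` are made up for this example. Because the byte layout is fixed and the metadata is JSON, a reader in any language could decode it.

```python
import json
import os
import struct
import sys
import tempfile
from array import array
from typing import Dict, List

# Hypothetical toy layout (NOT the real Feather format):
#   [column 0 bytes][column 1 bytes]...[JSON footer][8-byte little-endian footer length]
# Each column is a contiguous run of little-endian float64 values.


def write_columns(path: str, columns: Dict[str, List[float]]) -> None:
    footer = {"columns": []}
    offset = 0
    with open(path, "wb") as f:
        for name, values in columns.items():
            buf = array("d", values)
            if sys.byteorder == "big":
                buf.byteswap()  # always store little-endian on disk
            raw = buf.tobytes()
            f.write(raw)
            footer["columns"].append({"name": name, "offset": offset, "length": len(values)})
            offset += len(raw)
        meta = json.dumps(footer).encode("utf-8")
        f.write(meta)
        f.write(struct.pack("<Q", len(meta)))


def read_columns(path: str) -> Dict[str, List[float]]:
    result: Dict[str, List[float]] = {}
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        (meta_len,) = struct.unpack("<Q", f.read(8))
        f.seek(-8 - meta_len, os.SEEK_END)
        footer = json.loads(f.read(meta_len))
        for col in footer["columns"]:
            f.seek(col["offset"])
            buf = array("d")
            buf.frombytes(f.read(8 * col["length"]))  # one bulk read, no text parsing
            if sys.byteorder == "big":
                buf.byteswap()
            result[col["name"]] = buf.tolist()
    return result


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    write_columns(path, {"x": [1.0, 2.5, -3.0], "y": [0.0, 10.0, 20.0]})
    print(read_columns(path))  # {'x': [1.0, 2.5, -3.0], 'y': [0.0, 10.0, 20.0]}
```

For real work, pandas exposes Feather through `DataFrame.to_feather` and `pandas.read_feather`, which require pyarrow to be installed.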
Thank you
@wesmckinn
http://wesmckinney.com
pandas sprint on Monday!
