Dataiku AI 2021
EBOOK
Out of AI in 2021
www.dataiku.com
While times of economic disruption and change impact many aspects of how organizations
operate, they certainly have not diminished the impact AI is having (and will continue to have
until it becomes completely ubiquitous). To help organizations continue to deftly pivot and
keep pace in an ever-evolving world, we compiled qualitative commentary from a diverse
range of experts — both from technical and non-technical roles — on key learnings from 2020,
opportunities for 2021, barriers preventing AI adoption, notable use cases, and more.
KRISTOF SCHUM
Global Segment Leader of Machine Learning, AWS
Kristof has been with Amazon for five years and leads the AWS AI/ML partner ecosystem. Coming
from a small Hungarian city, Pécs, he first joined Roland Berger Strategy Consultants, then he
went to the U.S. to obtain his MBA from Wharton. After that, he joined Amazon as a product
manager and launched the company’s first global-selling recommendation engine. At Amazon,
Kristof completed 16 different machine learning research projects as part of the Machine
Learning University and teaches ML4PMs and recommender systems.
SHAMS KHAN
Data Scientist, Capgemini
Shams has an academic background in physics and statistical learning. Along with the more
traditional methods, he specializes in geospatial data processing, deep learning, and computer
vision. He has contributed to a number of data science use cases across a range of industries
including fashion, retail, energy and infrastructure. His work is focused around developing rapid
proof of concepts and prototypes for unique problems using advanced data science methods,
helping a business visualize the benefit and structure of a complete solution.
TIAN ZHANG
Managing Consultant, Capgemini
Tian has a background in systems engineering and enterprise architecture and, over the
last decade, has formed the opinion that the primary causes of failure in ambitious programs
are not poor solutioning or implementation: rigorous engineering is necessary but
ultimately insufficient to drive change. He now advises organizations on the intersection of
corporate strategy (which markets do we compete in?), business strategy (how do we achieve
and sustain an advantage?) and functional strategy (how can analytics help?).
MITRA AZIZIRAD
Corporate Vice President, Microsoft AI and Innovation
Mitra Azizirad is the Corporate Vice President for Microsoft AI and Innovation where she is
responsible for driving the perception of Microsoft’s AI and future technologies, for defining and
accelerating the productization of Microsoft’s portfolio of innovation, and for leading marketing
and storytelling in support of AI and Innovation thought leadership. Mitra leads Microsoft’s
vision to help individuals and organizations put AI into action to innovate, transform, and deliver
lasting positive impact on people, industry, and society at large.
A 28-year veteran of Microsoft, she has held key leadership positions at Microsoft across a
wide array of technical, marketing and sales functions, both in the field and corporate. Prior
to Microsoft, Mitra held technical leadership positions at the World Bank, National Association
of Securities Dealers (NASD/NASDAQ), and International Telecommunications and Satellite
Organization (Intelsat).
TARIK DWIEK
Director of Technology Alliances, Snowflake
Tarik has spent over 20 years in the high technology industry spanning software development,
sales, and business development. Prior to Snowflake, Tarik managed strategic partnerships at
AWS and EMC and also spent more than 12 years at EMC selling to enterprise accounts and
leading the technology strategy for EMC’s global accounts.
NATHAN MANNHEIMER
Senior Product Manager, Advanced Analytics, Tableau
LÉO DREYFUS-SCHMIDT
Research Director, Dataiku
Léo Dreyfus-Schmidt is a mathematician and holds a Ph.D. in pure mathematics from University
of Oxford and University of Paris VII. After five years focusing on homological algebra and
representation theory in Paris, Oxford, and the University of California - Los Angeles, he joined
Dataiku where he has been developing solutions for predictive maintenance, personalized
ranking systems, price elasticity, and natural language applications. Léo is a bicycle and food
aficionado (separately), so in his spare time, you’ll find him either zipping around Paris, rain or
shine, or enjoying a great meal somewhere.
VINCENT HOUDEBINE
Senior Data Scientist, Dataiku
Vincent is a senior data scientist at Dataiku in New York. He supports organizations in building
valuable data science projects and deploying them into production. In the past few years, he
has been dealing with a variety of operational data science and machine learning problems,
from fraud detection to churn prevention and product recommendations, as well as research
topics like compression of neural networks. Prior to joining Dataiku, Vincent was the CEO of a
computer vision startup.
TRIVENI GANDHI
Data Scientist, Dataiku
Triveni is a data scientist at Dataiku. She works with clients to determine best practices around
data science and their specific projects. Previously, she worked as a data analyst with a large
non-profit dedicated to improving education outcomes in New York City. Triveni holds a Ph.D.
in Political Science from Cornell University.
CLAIRE GUBIAN
Director of Customer Value, Dataiku
Claire leads the business value practice at Dataiku. She works with clients to determine what
business value they create thanks to data science and Dataiku in particular. She also shares best
practices to help accelerate and scale the business impact of data science at the enterprise level.
Previously, Claire spent most of her 15+ year career in management consulting and financial
services (PayPal, Mastercard). She is now based in New York City.
WALTER ALDANA
VP Business Development, Dataiku
RAJAT SINHA
Global Head of Partnerships and Alliances, Data Analytics & AI, Wipro Limited
Rajat is a technology executive with a history of success in data analytics & AI with a focus on
alliances, sales, business development, and customer engagement. He is a critical thinker
with demonstrated success in applying technology to create customer-centric, value-added
marketing and operations business solutions. He is a leader and mentor who bridges cultural
and geographic barriers in building teams that achieve results.
BRANDEIS MARSHALL
Chief Executive Officer, DataedX
Brandeis Marshall helps rising and experienced working professionals interpret the racial,
gender, and socioeconomic impact of data in technology. Twice named one of 200 Black women
in tech to follow on Twitter, Brandeis is a skilled explainer who has a knack for making difficult
computing and data concepts easier to understand, regardless of a person's educational
background.
A thought leader in broadening participation in data science, Brandeis often discusses inclusivity
and equity for organizations like DataCamp, Dataiku, Experian, NeurIPS, and Truist. She has
appeared in Medium, OneZero, and The Moguldom Nation. Brandeis shares her approaches to
effectively amplify social contexts within data and its implications for all communities.
Brandeis is a teacher and advisor at heart. She holds a Ph.D. and Master of Science in Computer
Science from Rensselaer Polytechnic Institute and a Bachelor of Science in Computer Science
from University of Rochester. Dr. Marshall brings nearly 15 years of experience in higher education.
She was the first Black woman to receive tenure at Purdue University College of Technology.
Still in academia, Brandeis regularly teaches software development, data, and analytics topics.
VIVEK KARMAKAR
Consulting Partner, Data Science and AI, Wipro Limited
Vivek is a data science practitioner with 20 years of experience in solution consulting across
retail, consumer goods, telecom, and automotive industries. He has worked with organizations
like PwC Consulting, IBM, and Dunnhumby in both India and the U.S. He is currently engaged
with Wipro and is responsible for solution design for analytics and AI solution areas. Vivek spent
most of his professional career on machine learning solutions around personalization, loyalty
marketing, demand and inventory management, merchandising, and pricing and promotions
areas. He completed his master’s degree in statistics from Indian Statistical Institute. Vivek is
based out of Kolkata, India.
DAVID RYAN POLGAR
Tech Ethicist and Founder, All Tech Is Human
David Ryan Polgar is a pioneering tech ethicist who paved the way for the hotly-debated issues
around regulating speech on social media, AI ethics, unintended consequences, digital wellbeing,
and what it means to be human in the digital age. He has appeared on CBS This Morning, TODAY
show, BBC World News, Fast Company, SiriusXM, Associated Press, LA Times, USA Today, and many
others. An international speaker with rare insight into building a better future with technology,
David has been on stage at Harvard Business School, Princeton University, The School of the New
York Times, TechChill (Latvia), The Next Web (Netherlands), FutureNow (Slovakia), and the Future
Health Summit (Ireland).
David is the founder of All Tech Is Human, an organization aimed at accelerating tech
consideration, increasing methods of participation and onboarding people into the Responsible
Tech ecosystem. In September 2020, the organization released its "Guide to Responsible Tech:
How to Get Involved & Build a Better Tech Future" which is aimed at inspiring the next generation
of responsible technologists and changemakers.
David is a frequent consultant and tech commentator, advocating for greater collaboration
between industry and civil society, more interdisciplinary approaches to solving thorny tech/
society issues, and better aligning technology with our individual and societal interests. In March,
David became a founding member of TikTok’s Content Advisory Council (US), providing expertise
around the delicate and difficult challenges facing social media platforms to expand expression
while limiting harm. He is also an advisory board member for the Technology and Adolescent
Mental Wellness (TAM) program and co-host of the podcast Funny as Tech, a show about our
messy relationship with technology.
As the AI market continues to mature, not only will this notion of creating AI systems to
enhance (not replace) humans continue to ring true, but it will also pave the way for new ML
techniques and technologies and enable technical stakeholders to push the envelope when it
comes to their day-to-day functions. These experts are looking forward to getting their hands
dirty when it comes to things like generative language models, object and depth detection, and
uncovering new ways to use models to improve decision making.
“Causal inference! Because machine learning is only concerned with predictions and not with
actions, this will get AI closer to real decision making.”
- Léo Dreyfus-Schmidt, Research Director, Dataiku
“Generative language models like GPT-3 are opening a new range of possibilities for
automatic text generation. For example, GPT-3 was used to generate HTML code based on
plain language specifications.”
- Vincent Houdebine, Senior Data Scientist, Dataiku
“I look forward to the development of federated learning systems as they offer an opportunity
to train models on sensitive data without exposing the privacy of the user.”
- Triveni Gandhi, Data Scientist, Dataiku
“I’m most curious about AI-driven biometrics and the implications of it for Black people. AI-driven
biometrics techniques can learn quite a bit from the lessons of facial recognition.”
- Brandeis Marshall, Chief Executive Officer, DataedX
1 Gartner, Leverage Augmented Intelligence to Win With AI, June 2019
“Model fitting and AutoML are largely solved problems in the machine learning for analytics
space. As a result, the technologies I am most excited to see mature in 2021 are tools that support
the expansion of the base of analysts that can effectively fit and utilize ML models in analysis
and decision making processes. These tools will need to place modeling workflows in business
contexts to democratize model building and free up core data science teams to focus on deeper,
high impact problems. A core part of this process will involve deeper integrations between model
building tools and visual analytics software, like Tableau, that are essential to business decisions
today.”
- Nathan Mannheimer, Senior Product Manager, Advanced Analytics, Tableau
As companies aim to recover and understand their new market dynamics, it remains critical to
implement AI systems that are persistent and resilient during times of economic flux. Below,
you’ll see that stakeholders were captivated by GPT-3, alternative data sources for enriched
insights during times of uncertainty, and making headway with unlabeled data.
“I’d have to say it’s a toss up between GPT-3 and PULSE (Duke University). The debates about
the value of these applications are very polarizing given the real-life value turns out to be
minimal and perpetuates discrimination. It will be interesting to witness how the discussions
will unfold. Validation of either tech will be an uphill battle.”
- Brandeis Marshall, Chief Executive Officer, DataedX
“All the models that are able to detect deep fakes or fight adversarial examples. As AI models
get better and better at imitating humans, it is necessary to have models that would prevent
them from tricking us.”
“Probably not 2020-specific, but the impressive results of semi-supervised learning matching
supervised learning benchmarks open interesting perspectives for learning with not a lot of
labeled data.”
- Léo Dreyfus-Schmidt, Research Director, Dataiku
“The most interesting ML application I saw this year was the integration of dynamic
simulation models into interactive dashboards. These tools allowed business users to ask
questions and explore new scenarios in real-time, greatly expanding the degree to which
people outside of data science teams can explore the world.”
- Nathan Mannheimer, Senior Product Manager, Advanced Analytics, Tableau
Given that 87% of data science projects never make it into production [2], it seems fitting that two
experts cited the challenge of operationalizing models to garner real-world value. Further, it’s no
surprise to hear about model drift monitoring, particularly as MLOps practices continue to gain
traction across the enterprise. Lastly, given 2020’s global reckoning of racial inequity, we expect to
continue to see data practitioners aim to debunk bias, unfairness, and racism in AI systems.
“Model drift monitoring was another interesting application of machine learning. It's also a good
sign of the industry gaining in maturity as this shows more and more ML models are deployed in
production.”
- Léo Dreyfus-Schmidt, Research Director, Dataiku
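To make the idea of drift monitoring concrete, here is a minimal sketch of one common drift signal, the Population Stability Index (PSI), computed for a single numeric feature. This is an illustration only, not a method described by the contributors: the function name, the choice of ten equal-width bins, and the small floor used for empty bins are all assumptions made for the example.

```python
import math


def population_stability_index(reference, current, n_bins=10):
    """PSI between two samples of one numeric feature (higher means more drift).

    Hypothetical helper for illustration: bins are equal-width and are
    derived from the reference (training-time) sample.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_shares(reference)
    cur = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

In practice, a team might compute this per feature on a schedule and alert when the value crosses a threshold. A PSI above roughly 0.2 is a frequently quoted rule of thumb for a meaningful shift, but the right cutoff depends on the use case.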
“A project to match customer support tickets with appropriate technical documentation. The
project is not the hardest from a pure data science or machine learning point of view, but the
deployment in production is the challenging part as the model needs to be able to explore a
massive knowledge database in near real time.
What is interesting is that it has a direct impact on the business and creates a lot of value for
support engineers, not by replacing them but by making them more efficient. In general, this
project showcases the emerging need for data scientists to not only be able to train models
but also deploy and integrate them in a broader system and business process.”
- Vincent Houdebine, Senior Data Scientist, Dataiku
2 https://venturebeat.com/2019/07/19/why-do-87-of-data-science-projects-never-make-it-into-production/
“No use cases that I worked on but rather that worked me — chatbots. With a pandemic, our
world changed suddenly. As folks were not answering phones and emails, I relied on chatbots
to handle so many activities with regard to travel, banking, etc. Learning how to ask
questions took a little practice and once I understood, it was off to the races.”
- Brandeis Marshall, Chief Executive Officer, DataedX
“I was involved in a Capgemini global data science competition where the aim was to build
a computer vision model to identify sperm whales. The model would be used to speed up
research and support conservation efforts, so it was great to be involved in something
tangible with real-world impacts. Also, I thoroughly enjoyed the research into state-of-the-
art deep learning algorithms such as siamese/triplet networks that allowed us to tackle a
problem of that nature.”
- Shams Khan, Data Scientist, Capgemini
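For readers unfamiliar with the siamese/triplet networks mentioned above, the core idea can be sketched as a loss function: embeddings of two images of the same individual should sit closer together than embeddings of different individuals, by at least some margin. The snippet below is a generic illustration of that loss on plain Python lists, not the competition team's implementation; the function name and the default margin are assumptions.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances between embeddings.

    anchor and positive are embeddings of the same individual; negative is
    an embedding of a different one. All names here are illustrative.
    """
    d_pos = math.dist(anchor, positive)  # distance to the matching example
    d_neg = math.dist(anchor, negative)  # distance to the non-matching example
    # Loss is zero once the negative is farther than the positive by `margin`.
    return max(d_pos - d_neg + margin, 0.0)
```

A real system would compute this over batches of learned embeddings inside a deep learning framework; the arithmetic per triplet is the same.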
“The most impactful AI/ML problems I worked on this year related to bridging the last-mile
problem of operationalizing models into business processes. These cases were critical to the
success of the overall modeling projects because, although they did not involve the specifics
of the models in question, they related to how to technically and organizationally integrate
models into business thinking and decisions. Without this critical last step, the value and effort
spent analyzing and modeling the data would be lost.”
- Nathan Mannheimer, Senior Product Manager, Advanced Analytics, Tableau
“The value of external data. A global insurance company, predominantly into personal and
commercial insurance, was making losses due to not-so-optimal pricing decisions. They
have started incorporating OpenStreetMap features, like distance from fire stations, railways,
etc., into their home insurance risk assessment algorithms, resulting in improved pricing and
significant improvement in loss ratios.”
- Vivek Karmakar, Consulting Partner, Data Science and AI, Wipro Limited
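A feature such as "distance from fire station" can be derived from latitude/longitude pairs with the haversine formula. The sketch below assumes the station coordinates have already been extracted (for example, from an OpenStreetMap export) into a list of tuples. The function names are hypothetical, and nothing here reflects the insurer's actual pipeline.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_km(home, stations):
    """Distance from a (lat, lon) home to the closest (lat, lon) station."""
    return min(haversine_km(*home, *station) for station in stations)
```

The resulting number can then be added as one more column in a risk model's training data. For large portfolios, a spatial index would likely replace the brute-force `min` used here.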
Data science has the capacity to generate long-lasting impacts and it’s important to recall that
it’s not only data executives and practitioners that help drive those impacts on a daily basis
— eventually, those on the periphery will be leveraging data in an organic way. The common
thread among the answers below is this notion that data can be used to facilitate societal
progress, spark innovation, and give more people a seat at the table.
“What excites me most about data science in 2021 is that we are now recognizing the
significance that the field has on us as individuals and society at large. The attention is
finally here. We are beginning to grasp that data science as a field needs to be more diverse,
inclusive, and multidisciplinary. Data science touches so many aspects of our modern life, so
it behooves us as a society to be more thoughtful about our technological development and
deployment. Looking at 2021, I am excited about all of the new voices that are coming to the
forefront and look forward to the ways we can be more intentional about data science.”
- David Ryan Polgar, Tech Ethicist and Founder of All Tech Is Human
“I am excited to see the new wave of AI applications that get developed next year. With more
data, more users, and more powerful machine learning and AI commercial tools, customers
will be in a very unique position to come up with even more powerful data models to better
predict what’s most important to them with crisper levels of precision. That’s going to be
pretty exciting to see!”
- Walter Aldana, VP Business Development, Dataiku
“I am most excited about connecting the dots in recent technology achievements. Powerful
pre-trained models, computer vision at the edge, MLOps, 5G, spatial computing, and low-code
environments – just to name a few. Making machine learning easier and embedded with its
full power into low-code environments is a particular area of interest for 2021.
I am convinced and excited that we will innovate with AWS partners on making these
achievements accessible to non-data scientists, too. Developing together with leading
partners such as Dataiku, I am excited to build few-click solutions with drag, drop, tune,
deploy, and monitor experiences for use cases ranging from as simple as making forecasting
more accurate to as complex as empowering contact centers, factories, or hospitals with a
suite of ML capabilities.”
- Kristof Schum, Global Segment Leader of Machine Learning, AWS
“Macro-economic conditions have changed in 2020. Business models have gone through
massive disruption. All sectors of the economy have been affected. While 2020 provided
an initial jolt to the economy and our way of living/working globally, these new market
conditions will give rise to an untapped opportunity for companies to adopt. The opportunity:
Training their analysts to become citizen data scientists.
Imagine the possibilities and outcomes of empowering grassroots-level analysts with the right
toolkits and frameworks to take their Statistics 101 skills and put them to use to drive shareholder
value for the enterprise in a sustainable, governed, and collaborative manner. The increasing
adoption of predictive analytics and machine learning platforms will be a catalyst to
drive enterprise AI adoption in 2021.”
- Rajat Sinha, Global Head of Alliances and Partnerships, Data Analytics & AI, Wipro Limited
The insights below reinforce that sometimes the simplest oversights cause residual delays in
adoption, such as never establishing or straying from the business objective, failing to evaluate
what will really make an impact before diving in headfirst, and not breaking down data silos to
streamline the data-to-insights process.
“AI is not just a tech play, it requires profound transformation from an organization,
particularly from a people and process perspective. Companies need to break the existing
silos between analytics, business, and IT teams and align these key stakeholders around
a common AI vision and roadmap. Many forget that AI’s main goal is solving business
problems.”
- Claire Gubian, Director of Customer Value, Dataiku
“Time and time again, customers are struggling with data management challenges that
are caused by silos of data that have been built up through the last few decades (i.e. if it's a
big enterprise that has made multiple acquisitions and built a bunch of different systems). We
see that they're spending up to 80% of the time trying to find and integrate the data instead of
extracting the gems of value hidden in this data. This is slowing down or preventing them from
achieving that disruptive transformation so that they can innovate and gain market share. To me,
these are the challenges that Snowflake and Dataiku were built to solve.”
- Tarik Dwiek, Director of Technology Alliances, Snowflake
All three barriers can be overcome by having the right strategy across people, processes,
technology, and data.”
- Rajat Sinha, Global Head of Alliances and Partnerships, Data Analytics & AI, Wipro Limited
While 2020 was certainly rocky, it forced organizations to take a long, hard look at their
current data and analytics strategy (including any models in place) to make sure they’re
resilient during moments of disruption. Robust AI implementation now and moving forward
will improve organizations’ ability to adapt to the world around them.
“In 2020, there has been an increased focus to help organizations make sense of the
disruptions, rather than optimization and fine-tuning of business operations. This meant
establishing blended business and technical teams working to a weekly cadence. Hopefully,
this closeness will continue to develop going forward in 2021 as organizations see analytics
not as a reactive reporting tool, but a forward-looking one that can facilitate experimentation
and exploration; to better understand opportunities and organizational risk.”
- Tian Zhang, Managing Consultant, Capgemini
Another trend is that customers want secure and governed access to data. Now that the cloud
is opening up the ability to manage data at scale, these customers see both the opportunity
and the critical need to enable data governance at scale. And, of course, leveraging AI and
machine learning in the cloud. If organizations can achieve those cost and scale optimizations
for capturing and managing all of their data, they can start to realize the full potential of AI and
machine learning at enterprise scale.”
- Tarik Dwiek, Director of Technology Alliances, Snowflake
“Year 2020 saw the adoption and acceptance of Enterprise AI into the mainstream. Be it with
the growth of:
• Natural language processing in traditional industries to drive better customer service (an
increasing use of chatbots, e.g., the chatbot “Erica” at Bank of America)
• The autonomous vehicle becoming more of a reality (e.g., Waymo moving from an
autonomous ride-hailing service to expand into freight hauling)
• Hyper-personalization and richer recommendation engines across platforms online and in
our home devices (Google Nest, Alexa, Apple Watch)
The ability to have machines change our quality of life has just started and will continue to
accelerate over the next few years. It is important to make sure Enterprise AI is implemented
with ethical practices in mind.”
- Rajat Sinha, Global Head of Alliances and Partnerships, Data Analytics & AI, Wipro Limited
Regardless of where your organization is in its digital transformation and journey to Enterprise
AI, we hope that these learnings will be useful to you and your teams in building responsible
and explainable AI solutions. As you work to continue to solve complex business problems,
maintain a competitive edge in the digital age, and deliver valuable insights that will permeate
throughout your organization, you can (and should) aim to:
• Democratize the use of data, putting it in the hands of many, not the elite few
• Infuse agility and elasticity so you can easily monitor and adjust models as needed in times
of economic volatility
• Leverage collaborative tools that are responsible, governable, and free of unintended bias
3
https://www.idc.com/getdoc.jsp?containerId=prUS46794720
400+ CUSTOMERS, 40,000+ ACTIVE USERS*
*data scientists, analysts, engineers, & more
Dataiku is one of the world’s leading AI and machine learning platforms, supporting
agility in organizations’ data efforts via collaborative, elastic, and responsible AI, all
at enterprise scale. Hundreds of companies use Dataiku to underpin their essential
business operations and ensure they stay relevant in a changing world.