Data Science Unit 6


What is Data Visualization?

Data Visualization is the presentation of analyzed data in visual form, e.g. graphs, charts, and images. These visuals make it much easier for humans to understand the trends uncovered during analysis.

Data Visualization is especially important when analyzing big datasets. When data scientists work with complex datasets, they also need to communicate the insights they collect, and graphs and charts make those insights far easier to understand.

Best Data Visualization Tools Data Scientists Need To Use

Nowadays, the decision to hire an Android or iOS developer depends, to an extent, on the tools and techniques they use. For businesses around the world, these tools help uncover business insights and stay ahead in the race. The majority of top iOS and Android mobile app development companies use them to analyze datasets extracted from mobile apps, helping the business grow and maintain its customer base.

Here are some of the best data visualization tools every Data Scientist must use
for the year 2020:

1. Tableau

It is an interactive data visualization software package, widely used in industry for effective data analysis and data visualization. Its drag-and-drop interface makes tasks easy and fast to perform.

The software doesn’t force its users to write code and is compatible with a wide range of data sources. The tool is somewhat expensive, but it is the preferred choice of top companies such as Amazon. QlikView is Tableau’s biggest competitor, and Tableau is used extensively because of its unique drag-and-drop feature.

Key features of Tableau:

Tableau is known as one of the simplest business intelligence tools for data visualization
Data scientists do not need to write custom code in this tool
The tool supports real-time collaboration as well as data blending

2. D3

D3.js is a JavaScript library for producing interactive data visualizations in web browsers, and it is one of the most effective platforms for data visualization work. The tool was initially released on February 18, 2011, with an official release that August.

It supports HTML, CSS, and SVG. Developers can present data in the form of creative pictures and graphics. It is a very flexible platform, allowing many variations for the creation of different graphs.

Key features of D3:

This data visualization tool offers powerful SVG manipulation capabilities
D3 integrates multiple methods and tools for processing data
Data scientists can effortlessly map their data to SVG attributes

3. QlikView

QlikView is similar to Tableau, but you need to pay before using it for commercial purposes. It is a business intelligence platform that turns data into useful information.

This software helps improve the data visualization process. The tool is preferred by well-established data scientists for analyzing large-scale data. QlikView is used across 100 countries and has a very strong community.

Key features of QlikView:

The tool integrates with a very wide range of data sources, such as EC2, Impala, and HP Vertica
It is extremely fast when it comes to data analysis
This data visualization tool is easily deployable as well as configurable

4. Microsoft Power BI

It is a set of business analytics tools that simplifies data preparation and enables instant analysis. It is a widely preferred tool because it integrates easily with other Microsoft products and is free to download and use.

The tool is available in both mobile and desktop versions, so businesses that already use Microsoft tools benefit greatly.

Key features of Microsoft Power BI:

Generate interactive data visualizations across multiple data centers
It offers enterprise data analytics as well as self-service on a single platform
Even non-data scientists can easily create machine learning models

5. Datawrapper

This tool is a blessing for non-technical users and is among the most user-friendly visualization tools. Creating visualizations normally requires technical skills such as coding, but with this app none are needed.

It is best suited to beginners who want to start a career in data visualization. The tool is widely used in media organizations, where there is a constant need to present information through stats and graphs, and it is a popular choice because of its simple, easy interface.

Key features of Datawrapper:

It offers users an embed code and the ability to export charts
Option to select multiple map types and charts
The tool requires no advanced coding knowledge to set up

6. ECharts

Next in the list of best data visualization tools is ECharts, an enterprise-level chart data visualization tool from an expert team at Baidu. ECharts is a pure JavaScript chart library that runs smoothly across platforms and is compatible with the majority of browsers.

Key features of ECharts:

Supports multidimensional data analysis
Charts are available for devices of all sizes
It provides a framework for the rapid construction of web-based visualizations
It is completely free to use

7. Plotly

Plotly enables more complicated and intricate visualizations through its integration with analytics-oriented programming languages such as Python, MATLAB, and R.

It is built on top of the open-source D3.js visualization library for JavaScript, but this commercial package (with a non-commercial license available) adds layers of user-friendliness and support, along with built-in support for APIs such as Salesforce.

Key features of Plotly:

It offers built-in permissions and integration with SAML
Super quick and easy deployment of the data visualization tool
Gives users access to rapid exploration and prototyping

8. Sisense

A complete analytics solution is provided by Sisense. Its visualization abilities offer an uncomplicated drag-and-drop option that can easily support complicated graphics, charts, and interactive visualizations.

It allows records to be accumulated in easily accessible repositories, from which they can be displayed instantly on dashboards.

Dashboards can then be shared across teams, ensuring that even non-technically-minded personnel can discover the solutions they need to their problems.

Key features of Sisense:

Offers users various tools to understand collected data in a visual environment
You can connect directly to multiple data sources at once
With this tool, data scientists can tie together various maps and charts

9. FusionCharts

FusionCharts is a JavaScript-based charting library that has secured itself as one of the leaders in the market.

It can produce 90 different chart types and integrates with a large variety of platforms and frameworks, giving a notable degree of flexibility. FusionCharts can create any type of visualization from scratch, which is one of its unique features, and customers also have the option to choose from a selection of “live” example templates.

Key features of FusionCharts:

It provides informative tooltips to assist users
The tool makes sure that users can understand its different functionalities
You can compare the values of different data points with one another

10. HighCharts

Like FusionCharts, this also requires a license for business use, although it may be used freely for trial, non-commercial, or personal purposes.

Its website claims that it is used by 72 of the world’s 100 largest companies, and it is often selected when a quick and flexible solution has to be rolled out with minimal need for specialist statistical visualization training before it can be put to work.

Key features of HighCharts:

The data visualization tool provides its users with good compatibility
HighCharts is one of the most widely used tools for data analysis
It makes it convenient to add interactive charts to advanced applications

====================

What is the connection between Data Science and Ethics?


At first glance the two subjects don’t seem to have much in common. Data science is
related to engineering and science, while ethics revolves around social science and
philosophy. However, the truth is that human contexts and ethics are inseparable
parts of Data Science. For the rest of this article we’re going to assume that you
already have some basic knowledge about what Data Science is. If you don’t, no
worries! Check some other articles by Big Data at Berkeley, or read this article by
the UC Berkeley School of Information, which will give you a more detailed
introduction to the subject. Whether you are interested in Data Science, or you are
someone who is simply curious about human ethics and morals, this article will give
you a great first look at the amazing world of data ethics.

What is Data Ethics? Why do we care about it?


Let’s start with a professional definition of data ethics. In their article, What
is data ethics?, Oxford professors and philosophers Luciano Floridi and
Mariarosaria Taddeo state that:

“Data ethics is a new branch of ethics that studies and evaluates moral problems
related to data (including generation, recording, curation, processing,
dissemination, sharing and use), algorithms (including artificial intelligence,
artificial agents, machine learning and robots) and corresponding practices
(including responsible innovation, programming, hacking and professional codes), in
order to formulate and support morally good solutions (e.g. right conducts or right
values).”

Simply speaking, in data ethics, we learn about all the ethical problems that
appear during our use of data. In this era of rapid technological development, we
are living in a “Data-fied World.” Data collection is a vital part of nearly every
aspect of our lives, from the phones in our pockets to the cars we drive. Almost
every human behavior and every operation we do with a tool like a computer can be
collected as data. Over the years, as technology progressed and we aimed for a
better life, we began to use data generated from day-to-day actions to conduct
complex analysis with the help of greater computing power and new analytical tools. Advanced technologies related to data science, like Machine Learning and AI, have brought many benefits to our lives. However, as humans begin to step away
from hands-on analysis and let automated machines do most of the work for us,
different issues such as fairness, privacy, and representation emerge. We will
cover a couple of cases about those issues in detail below, so keep reading!

Why do Data Scientists need to understand Data Ethics?


Ever since Data Science became a buzzword in the technological industry, colleges
and universities have been scrambling to open a Data Science Program to satisfy the
world’s growing demand for data scientists, engineers, and analysts. In 2018, the
University of California, Berkeley was among the first few colleges that introduced
a unique Data Science Major. The program aims to:

“produce graduates who not only have deep technical expertise, but who also know
how to responsibly collect and manage data, and use it to inform decisions and
advance innovation to benefit the rapidly evolving world they’re graduating into”.

Besides various technical requirements such as computing, probability, and modeling, the Berkeley Data Science curriculum has an additional human contexts and
ethics requirement. This shows how academic institutions recognize data ethics as a
crucial skill for any future Data Scientist to develop. As Data Scientists, we
often deal with big sets of data that are driven by people, so it is our duty to
keep private data secured and use it responsibly. To better incorporate human
values like justice and equity in data-driven technologies, we need to also
understand the underlying human and social structures.

How should we incorporate Data Ethics in our work as students?


When we are doing a data science project, we need to make sure that we understand
the potential ethical consequences of our work. Some tips for you to be an ethical
data scientist are: first, be aware of privacy issues such as data breaches and
find ways to adequately secure the data. If you are not familiar with the danger of
a data breach, check out this news article about the Facebook Security Breach.
Second, be transparent with your data usage. Get user consent before you use their
data in any way. Read the report by CDC about the infamous Tuskegee syphilis study
to see how a study with no informed consent can go horribly wrong. Third, despite
the difficulty of being completely objective, you should try your best to make sure
there is no bias involved in your model. In fact, to make employees follow data
ethics principles, many companies and organizations have incorporated certain codes
of ethics and conduct. One code of conduct that a lot of professional data
scientists follow is the Oxford-Munich Code of Conduct. It addresses common ethical
dilemmas that data scientists from the industry, academia, and the public sector
may face. Feel free to take a look at it. Below, we also provide you with a
checklist created by DJ Patil, Hilary Mason and Mike Loukides, which you can use to
incorporate data ethics in all of your data science related projects.

Here’s the Data Ethics Checklist:

❏ Have we listed how this technology can be attacked or abused? [SECURITY]

❏ Have we tested our training data to ensure it is fair and representative? [FAIRNESS]

❏ Have we studied and understood possible sources of bias in our data? [FAIRNESS]

❏ Does our team reflect diversity of opinions, backgrounds, and kinds of thought?
[FAIRNESS]

❏ What kind of user consent do we need to collect to use the data? [PRIVACY/TRANSPARENCY]

❏ Do we have a mechanism for gathering consent from users? [TRANSPARENCY]

❏ Have we explained clearly what users are consenting to? [TRANSPARENCY]

❏ Do we have a mechanism for redress if people are harmed by the results? [TRANSPARENCY]

❏ Can we shut down this software in production if it is behaving badly?

❏ Have we tested for fairness with respect to different user groups? [FAIRNESS]

❏ Have we tested for disparate error rates among different user groups? [FAIRNESS]

❏ Do we test and monitor for model drift to ensure our software remains fair over
time? [FAIRNESS]

❏ Do we have a plan to protect and secure user data? [SECURITY]

(Loukides, Mason, Patil)
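As an illustration of how one checklist item might be operationalized, here is a minimal sketch in plain Python that tests for disparate error rates among user groups; the group labels, predictions, and threshold interpretation are synthetic, not from the checklist's authors:

```python
# A minimal sketch of one checklist item: testing for disparate error
# rates among user groups. All data below is made up for illustration.
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, predicted, actual) tuples."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {g: errors[g] / totals[g] for g in totals}

# Hypothetical model outputs tagged with a synthetic demographic group
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 1, 1),
    ("group_b", 0, 1), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
rates = error_rates_by_group(records)
# group_a: 1 error out of 4 -> 0.25; group_b: 2 errors out of 4 -> 0.5
gap = max(rates.values()) - min(rates.values())
# A large gap (here 0.25) would flag the model for a fairness review.
```

Running the same computation periodically on fresh data is one simple way to monitor for the model drift mentioned in the checklist.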

A real-world case that might blow your mind.


Now that we have some ideas about how to be ethical data scientists, let’s examine
the following case that addresses some of the issues we mentioned above.

As the world became more technologically advanced, the use of data has brought
efficiency in a variety of industries. For example, many tech companies have
employed data scientists to track and understand the popularity of their products.
However, Kwang-Mo Yang, a member of Samsung Medical Center, has written an article
regarding the ethical concerns behind using real-world data. The problem emerges
from non-governmental organizations studying the health data after de-identifying
personal information. Because a patient’s health data may contain highly personal
information, it is possible for the pharmaceutical companies to analyze the gender,
age, and race of a patient and categorize a certain type of individual as
vulnerable to a certain disease. Therefore, this is an issue of privacy and
representation. Pharmaceutical companies have used this information to target
advertisements for drugs. Groups of individuals who had been classified as
“vulnerable to Disease A” were more likely to see advertisements for drugs
targeting “Disease A” in their pharmacies¹. This categorization often
disproportionately affected low-income communities and under-represented
minorities, raising several questions about whether this practice was truly
ethical. Many individuals have also shown concern about their personal information
being used for commercial purposes. Ironically, many governmental organizations
have utilized real-world data to invent new drugs that have helped a variety of
patients, but they used lots of personal information to drive that marketing².

While it is legal in many countries, including South Korea, to study health data after de-identifying personal information, it raises the question: would an individual be happy about their own information being used without their knowledge? Would a person feel completely secure in a society where they can’t hide their personal information?

To make any study more ethical, companies should acquire informed consent from
their patients before they begin to use private data from an individual.
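The de-identification discussed above can be sketched in a few lines of Python; the field names and the age-banding rule here are illustrative assumptions, not a real de-identification standard (schemes such as HIPAA Safe Harbor impose many more requirements):

```python
# A minimal sketch of de-identifying a health record before analysis.
# Field names and the 10-year age band are illustrative assumptions.

DIRECT_IDENTIFIERS = {"name", "phone", "address", "patient_id"}

def deidentify(record):
    """Drop direct identifiers and coarsen quasi-identifiers."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        # Generalize exact age into a 10-year band
        band = (cleaned["age"] // 10) * 10
        cleaned["age"] = f"{band}-{band + 9}"
    return cleaned

record = {"patient_id": "P-1042", "name": "Jane Doe", "age": 47,
          "gender": "F", "diagnosis": "Disease A"}
print(deidentify(record))
# {'age': '40-49', 'gender': 'F', 'diagnosis': 'Disease A'}
```

Note that quasi-identifiers like gender, age band, and diagnosis survive the process, which is exactly why the article argues that de-identification alone does not settle the ethical question.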

The Evolution of Data Ethics

We’ve already talked a lot about the present state of data ethics in data science.
Let’s predict the future roles it may play in the industry. Writer Barbara Lawler
has deduced some of the potential global trends related to data ethics and data
privacy. Here are five trends that Lawler expects to see:

Chief Privacy Officers can expect ethics to become an explicit part of their role.
Technology companies will lead the way for U.S. Federal Privacy legislation.
Sustainable ethics codes will evolve to better address the challenges of a digital
world.
Product excellence and privacy by design will become synonymous.
Companies will drive to educate policy-makers and regulators about their
technologies.

What does this mean for you?

Data Ethics is here to stay, and will likely become a key part of any responsible
Data Scientist’s job, if it isn’t already.

While data has yielded a wide variety of benefits in everyday life, the purpose behind the use of data has become a vital topic. It all begins with considering the human impact of data use. It will be important for privacy officers to analyze the impact on people and society, and whether that impact is positive, negative, or neutral.

Since the necessity for data privacy in the United States has long been discussed, and since technology companies are the organizations most knowledgeable about data usage, Lawler believes technology companies will lead the way toward U.S. Federal Privacy legislation, following the General Data Protection Regulation implemented in the European Union.

Over a long period of time, the consensus on how to respect privacy has shifted due to the emergence of personal computing and ever-larger network connections. Following the globalization of the economy and the profound alteration of citizens’ physical and digital lives, Lawler is convinced that companies will come up with sustainable ethics codes to counter the potential challenges of a digital world.

Privacy by design means to embed data privacy requirements into product design and
development, embodying the “build it in, don’t bolt it on” mentality. This includes
building in:

Privacy-savvy defaults
In-product transparency
Considering and documenting privacy risks and data flows
Assigning data owners upfront and throughout the data lifecycle, including E2E
security
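The first item, privacy-savvy defaults, can be as simple as making every data-sharing setting opt-in. A minimal sketch, with setting names that are purely hypothetical:

```python
# A small sketch of "privacy-savvy defaults": every setting starts in
# its most private state, and sharing is opt-in. Names are illustrative.
from dataclasses import dataclass

@dataclass
class PrivacySettings:
    share_usage_analytics: bool = False   # opt-in, not opt-out
    personalized_ads: bool = False
    location_tracking: bool = False
    data_retention_days: int = 30         # shortest retention by default

# A new user gets the most private configuration without doing anything
settings = PrivacySettings()
assert not settings.share_usage_analytics
```

Because the private state is the default, any less-private configuration must be an explicit, auditable user choice, which embodies the "build it in, don't bolt it on" mentality.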

With advancements in technology, knowing where data comes from and why it exists
has never been more vital from a strategic, operational, and compliance
perspective. Data needs to be stored in a clean and accessible form, which will
allow companies to learn, analyze, and tackle business issues in real-time. PbD
(Privacy by Design) will play a critical role in this and it is just as important
as secure coding.

Lawler writes that it is vital for policymakers around the world to develop a deeper understanding of what they wish to regulate. Given the profound global shift in digital networks, policymakers must consider³:

What harms are they trying to protect people from?
What rights do they want to guarantee?
What problems are they trying to solve?
What are the privacy outcomes they hope to achieve for their citizens?

Therefore, the deeper the understanding policymakers have of newly
created technologies, the easier it will be for them to decide if they want to
regulate that technology or not. As a result, the organizations that place the
greatest emphasis on educating policymakers will have the highest impact on the
evolution of data science.
