Data Science Unit 6
Data visualization means presenting analyzed data in visual form, such as graphs
and images. These visuals make it easier for people to understand the trends found
in the analysis.
Data visualization is especially important when analyzing big datasets. When data
scientists analyze complex datasets, they also need to understand the insights they
have collected, and graphs and charts make those insights easier to grasp.
Nowadays, the decision to hire an Android or iOS developer depends, to an extent,
on the tools and techniques they use. For businesses around the world, using these
tools can help them gain business insights and stay ahead in the race. The majority
of top iOS and Android app development companies use these tools to analyze the
datasets extracted from mobile apps, helping businesses grow and maintain their
customer base.
Here are some of the best data visualization tools every Data Scientist must use
for the year 2020:
1. Tableau
The software does not force its users to write code, and it is compatible with a
large number of data sources. The tool is somewhat expensive, but it is a preferred
choice of top companies such as Amazon. QlikView is Tableau’s biggest competitor,
and the tool is used extensively because of its drag-and-drop feature.
Tableau is known as the simplest business intelligence tool for data visualization.
Data scientists do not need to write custom code in this tool.
The tool also offers real-time collaboration along with data blending.
2. D3
D3 is a JavaScript library that works with HTML, CSS, and SVG. Developers can
present data as creative pictures and graphics. It is a very flexible platform, as
it allows many variations when creating different graphs.
3. QlikView
QlikView is software similar to Tableau, but you need to pay before using it for
commercial purposes. It is a business intelligence platform that turns data into
useful information.
This software helps improve the data visualization process. The tool is preferred
by well-established data scientists for analyzing large-scale data. QlikView is
used across 100 countries and has a very strong community.
The tool integrates with a very wide range of data sources, such as EC2, Impala,
and HP Vertica.
It is extremely fast when it comes to data analysis.
This data visualization tool is easy to deploy and configure.
4. Microsoft Power BI
It is a set of business analytics tools that can simplify, prepare, and analyze
data instantly. It is a popular choice because it integrates easily with Microsoft
tools, and its desktop version is free to download and use.
The tool is available in both mobile and desktop versions, so it can be a big
benefit for a business that already uses Microsoft tools.
5. Datawrapper
This tool is a blessing for non-technical users. Creating visualizations usually
requires technical skills such as coding, but this app requires none.
The app suits beginners who want to start a career in data visualization, and it is
among the most user-friendly apps available to a data scientist. It is widely used
in media organizations, where there is a strong need to present information through
statistics and graphs, and it is popular because of its simple interface.
It gives users an embed code and the ability to export charts.
It lets users select from multiple map types and chart types at once.
The tool requires no advanced coding knowledge to set up.
6. ECharts
Next on the list of best data visualization tools is ECharts, an enterprise-level
charting and data visualization tool from a team at Baidu. ECharts is a pure
JavaScript chart library that runs smoothly on various platforms and is compatible
with the majority of browsers.
7. Plotly
8. Sisense
Dashboards can then be shared across teams, ensuring that even non-technical
personnel can find the answers they need to their problems.
9. FusionCharts
It can produce 90 different chart types and integrates with a wide variety of
platforms and frameworks, giving it a great deal of flexibility.
FusionCharts can create any type of visualization from scratch, which is one of its
distinguishing features. Users also have the option to choose from a selection of
“live” example templates.
10. HighCharts
Like FusionCharts, this tool requires a license for commercial use, although it can
be used freely as a trial or for non-commercial and personal use.
Its website claims that it is used by 72 of the world’s 100 largest companies. It is
often chosen when a quick and flexible solution has to be rolled out, with minimal
need for specialist data visualization training before it can be put to work.
The tool provides its users with good compatibility.
HighCharts is one of the most widely used tools for data analysis.
The tool makes it convenient to add interactive charts to advanced applications.
====================
“Data ethics is a new branch of ethics that studies and evaluates moral problems
related to data (including generation, recording, curation, processing,
dissemination, sharing and use), algorithms (including artificial intelligence,
artificial agents, machine learning and robots) and corresponding practices
(including responsible innovation, programming, hacking and professional codes), in
order to formulate and support morally good solutions (e.g. right conducts or right
values).”
Simply speaking, in data ethics, we learn about all the ethical problems that
appear during our use of data. In this era of rapid technological development, we
are living in a “Data-fied World.” Data collection is a vital part of nearly every
aspect of our lives, from the phones in our pockets to the cars we drive. Almost
every human behavior and every operation we do with a tool like a computer can be
collected as data. Over the years, as technology progressed and we aimed for a
better life, we began to use the data generated by day-to-day actions to conduct
complex analyses with the help of greater computing power and new analytical
tools. Advanced technologies related to data science, like machine learning and AI,
have brought many benefits to our lives. However, as humans begin to step away
from hands-on analysis and let automated machines do most of the work for us,
different issues such as fairness, privacy, and representation emerge. We will
cover a couple of cases about those issues in detail below, so keep reading!
“produce graduates who not only have deep technical expertise, but who also know
how to responsibly collect and manage data, and use it to inform decisions and
advance innovation to benefit the rapidly evolving world they’re graduating into”.
❏ Have we studied and understood possible sources of bias in our data? [FAIRNESS]
❏ Does our team reflect diversity of opinions, backgrounds, and kinds of thought?
[FAIRNESS]
❏ Have we tested for fairness with respect to different user groups? [FAIRNESS]
❏ Have we tested for disparate error rates among different user groups? [FAIRNESS]
❏ Do we test and monitor for model drift to ensure our software remains fair over
time? [FAIRNESS]
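As a rough illustration of the checklist item on disparate error rates, the sketch
below computes a model's error rate separately for each user group and reports the
largest gap between groups. The function names, the group labels, and the sample
records are all hypothetical, and what counts as an acceptable gap is a judgment
the team has to make; this is one simple way to start, not a complete fairness
audit.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: error rate} for (group, true_label, predicted_label) records."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        if y_true != y_pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def max_error_gap(rates):
    """Largest difference in error rate between any two groups."""
    return max(rates.values()) - min(rates.values())


# Hypothetical evaluation records: (user group, true label, predicted label).
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]

rates = error_rates_by_group(records)
gap = max_error_gap(rates)
print(rates, gap)
```

Running the same check on fresh data at regular intervals is one way to approach
the model drift item as well: if the gap grows over time, the model may no longer
be treating groups as evenly as it did when it was first tested.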
As the world has become more technologically advanced, the use of data has brought
efficiency to a variety of industries. For example, many tech companies employ data
scientists to track and understand the popularity of their products. However,
Kwang-Mo Yang, a member of Samsung Medical Center, has written an article on the
ethical concerns behind using real-world data. The problem arises when
non-governmental organizations study health data after de-identifying personal
information. Because a patient’s health data may contain highly personal
information, pharmaceutical companies can still analyze a patient’s gender, age,
and race and categorize a certain type of individual as vulnerable to a certain
disease. This makes it an issue of both privacy and representation. Pharmaceutical
companies have used this information to target advertisements for drugs. Groups of
individuals who had been classified as “vulnerable to Disease A” were more likely
to see advertisements for drugs targeting “Disease A” in their pharmacies¹. This
categorization often disproportionately affected low-income communities and
under-represented minorities, raising questions about whether the practice was
truly ethical. Many individuals have also expressed concern about their personal
information being used for commercial purposes. Ironically, many governmental
organizations have used real-world data to develop new drugs that have helped a
variety of patients, but a great deal of personal information was used to drive
that marketing².
While it is legal in many countries, including South Korea, to study health data
after de-identifying personal information, this raises questions: Would individuals
be happy about their own information being used without their knowledge? Would a
person feel completely secure in a society where they cannot keep their personal
information hidden?
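To make the concern above concrete, here is a minimal sketch of why removing names
does not remove the ability to categorize people. It strips direct identifiers from
some made-up patient records and then groups the remaining records by gender, age
band, and race. The field names, the placeholder values, and the choice of ten-year
age bands are assumptions made for illustration; real de-identification procedures
are considerably more involved.

```python
from collections import Counter

# Assumed names of the fields that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "patient_id"}


def deidentify(record):
    """Drop direct identifiers; demographic fields are left untouched."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def age_band(age):
    """Map an age to a ten-year band, e.g. 34 -> '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def segment_counts(records, condition):
    """Count patients with the given condition per (gender, age band, race)."""
    counts = Counter()
    for r in records:
        if r["condition"] == condition:
            counts[(r["gender"], age_band(r["age"]), r["race"])] += 1
    return counts


# Entirely fictional records with placeholder values.
patients = [
    {"name": "P1", "patient_id": 1, "gender": "F", "age": 34, "race": "group_1", "condition": "Disease A"},
    {"name": "P2", "patient_id": 2, "gender": "F", "age": 38, "race": "group_1", "condition": "Disease A"},
    {"name": "P3", "patient_id": 3, "gender": "M", "age": 52, "race": "group_2", "condition": "Disease B"},
    {"name": "P4", "patient_id": 4, "gender": "F", "age": 31, "race": "group_1", "condition": "Disease A"},
    {"name": "P5", "patient_id": 5, "gender": "M", "age": 57, "race": "group_2", "condition": "Disease A"},
]

deidentified = [deidentify(p) for p in patients]
segments = segment_counts(deidentified, "Disease A")
print(segments.most_common(1))
```

Even though no name or patient ID survives, the demographic segment with the most
"Disease A" cases is still easy to find, which is exactly the kind of
categorization that can then be used for targeting.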
To make any study more ethical, organizations should obtain informed consent from
patients before they begin to use an individual’s private data.
The Evolution of Data Ethics
We’ve already talked a lot about the present state of data ethics in data science.
Let’s look ahead to the roles it may play in the industry. Writer Barbara Lawler
has identified some potential global trends related to data ethics and data
privacy. Here are five trends that Lawler expects to see:
1. Chief Privacy Officers can expect ethics to become an explicit part of their role.
2. Technology companies will lead the way for U.S. Federal Privacy legislation.
3. Sustainable ethics codes will evolve to better address the challenges of a digital
world.
4. Product excellence and privacy by design will become synonymous.
5. Companies will drive to educate policy-makers and regulators about their
technologies.
What does this mean for you?
Data Ethics is here to stay, and will likely become a key part of any responsible
Data Scientist’s job, if it isn’t already.
While data has yielded a wide variety of benefits in everyday life, the purpose
behind the use of data has become a vital topic. It all begins with considering the
human impact of data use. It will be important for privacy officers to analyze the
impact on people and society and whether that impact is positive, negative, or
neutral.
Because the need for data privacy has long been discussed in the United States, and
because technology companies are the organizations most knowledgeable about data
usage, Lawler believes that technology companies will lead the way for U.S. Federal
Privacy legislation, following the General Data Protection Regulation (GDPR) that
was implemented in the European Union.
Over a long period of time, the consensus on how to respect privacy has shifted
with the emergence of personal computing and broader network connectivity. Given
the globalization of the economy and the profound changes in citizens’ physical and
digital lives, Lawler is convinced that companies will develop sustainable ethics
codes to address the challenges of a digital world.
Privacy by design means embedding data privacy requirements into product design and
development, embodying the “build it in, don’t bolt it on” mentality. This includes
building in:
Privacy-savvy defaults
In-product transparency
Consideration and documentation of privacy risks and data flows
Assignment of data owners upfront and throughout the data lifecycle, including
end-to-end (E2E) security
With advancements in technology, knowing where data comes from and why it exists
has never been more vital from a strategic, operational, and compliance
perspective. Data needs to be stored in a clean and accessible form, which will
allow companies to learn, analyze, and tackle business issues in real-time. PbD
(Privacy by Design) will play a critical role in this and it is just as important
as secure coding.
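Two of the items listed above, privacy-savvy defaults and assigning data owners
upfront, can be sketched in code. The example below is only an illustration under
stated assumptions: the class names, the setting names, the 30-day retention value,
and the registry are all hypothetical and do not come from Lawler or from any
particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySettings:
    """Hypothetical user settings; every data-sharing option is off by default."""
    share_usage_data: bool = False
    personalized_ads: bool = False
    location_tracking: bool = False
    retention_days: int = 30  # assumed value, chosen only for illustration


@dataclass
class DataAsset:
    """Hypothetical record describing a dataset and who is responsible for it."""
    name: str
    owner: str
    purpose: str
    contains_personal_data: bool


def register_asset(registry, asset):
    """Refuse to register a dataset that has no named owner."""
    if not asset.owner.strip():
        raise ValueError(f"data asset {asset.name!r} must have an owner")
    registry[asset.name] = asset


# A new user gets the protective defaults unless they explicitly opt in.
default_settings = PrivacySettings()
opted_in = PrivacySettings(share_usage_data=True)

registry = {}
register_asset(registry, DataAsset("orders", "data-team", "billing", True))
print(default_settings, sorted(registry))
```

The design choice worth noting is that the protective behavior requires no action:
a user who changes nothing shares nothing, and a dataset cannot enter the registry
without someone being accountable for it.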
Lawler writes that it is vital for policymakers around the world to develop a
deeper understanding of what they wish to regulate. Given the profound shift in
digital networks globally, policymakers must consider³: