Module 4 Data Science

The document provides lecture notes on visualization tools for data scientists, discussing methods for delivering insights such as presentations, customer segmentation, and real-time dashboards. It introduces Crossfilter, a JavaScript library for handling large datasets, and outlines the components needed to build a dc.js application for interactive dashboards. Additionally, it compares various visualization tools and discusses the pros and cons of creating custom reports versus using established software solutions.

PES Institute of Technology and Management

NH-206, Sagar Road, Shivamogga-577204

Department of Computer Science and Engineering


Affiliated to

VISVESVARAYA TECHNOLOGICAL UNIVERSITY


Jnana Sangama, Belagavi, Karnataka – 590018

Lecture Notes
on

Module 4
VISUALIZATION TOOLS
(21CS754)
2021 Scheme
Prepared By,
Mrs. Prathibha S,
Assistant Professor,
Department of CSE, PESITM
Module 4- Visualization Tools (21CS754)

MODULE - 4
Data scientists must deliver their new insights to the end user. The results can be
communicated in several ways:
■ A one-time presentation—Research questions are one-shot deals because the
business decision derived from them will bind the organization to a certain course for many
years to come. Take, for example, company investment decisions: Do we distribute our
goods from two distribution centers or only one? Where do they need to be located for
optimal efficiency? When the decision is made, the exercise may not be repeated until
you’ve retired. In this case, the results are delivered as a report with a presentation as
the icing on the cake.
■ A new viewport on your data—The most obvious example here is customer
segmentation. Sure, the segments themselves will be communicated via reports and
presentations, but in essence they form tools, not the end result itself. When a clear and
relevant customer segmentation is discovered, it can be fed back to the database as a
new dimension on the data from which it was derived. From then on, people can make
their own reports, such as how many products were sold to each segment of customers.
■ A real-time dashboard—Sometimes your task as a data scientist doesn’t end when
you’ve discovered the new information you were looking for. You can send your
information back to the database and be done with it. But when other people start
making reports on this newly discovered gold nugget, they might interpret it incorrectly
and make reports that don’t make sense. As the data scientist who discovered this new
information, you must set the example: make the first refreshable report so others,
mainly reporters and IT, can understand it and follow in your footsteps. Making the first
dashboard is also a way to shorten the delivery time of your insights to the end user
who wants to use it on an everyday basis. This way, at least they already have something
to work with until the reporting department finds the time to create a permanent report
on the company’s reporting software.

PESITM, Dept of CSE Prepared By, Prathibha S Page 2


Crossfilter, the JavaScript MapReduce library.


Crossfilter is a JavaScript library for exploring large multivariate datasets in the
browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated
views, even with datasets containing a million or more records.
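
The idea of coordinated views can be sketched in plain JavaScript without the library. The snippet below is only an illustration under stated assumptions: the `sales` records, their field names, and the `countBy` helper are hypothetical and are not part of Crossfilter's API. It rescans every record on each call, whereas Crossfilter maintains its groups incrementally, which is where its speed comes from.

```javascript
// Hypothetical sample records; the field names are illustrative only.
const sales = [
  { segment: "A", region: "north", amount: 10 },
  { segment: "A", region: "south", amount: 20 },
  { segment: "B", region: "north", amount: 30 },
  { segment: "B", region: "south", amount: 40 },
  { segment: "B", region: "north", amount: 50 },
];

// Count records per value of `field`, honouring every active filter
// except the one on `field` itself, so a view never filters itself.
// A filter value of null means "no filter on this field".
function countBy(records, field, filters) {
  const counts = {};
  for (const r of records) {
    const passes = Object.entries(filters).every(
      ([f, allowed]) => f === field || allowed === null || r[f] === allowed
    );
    if (passes) {
      counts[r[field]] = (counts[r[field]] || 0) + 1;
    }
  }
  return counts;
}

const filters = { segment: null, region: null };

// No filter yet: every record is counted.
const regionCountsAll = countBy(sales, "region", filters);

// The user selects segment "B" in one view ...
filters.segment = "B";

// ... and the region view is recomputed from the filtered records.
const regionCountsB = countBy(sales, "region", filters);

// The segment view ignores its own filter and still shows all segments.
const segmentCountsB = countBy(sales, "segment", filters);

console.log(regionCountsAll, regionCountsB, segmentCountsB);
```

Filtering on the segment changes the counts per region, while the segment counts themselves stay unfiltered, so the view that owns a filter still shows all of its categories. Crossfilter's groups treat a filter on their own dimension in the same way.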

You don’t want to send enormous loads of data over the internet or even your internal
network though, for these reasons:
■ Sending a bulk of data will tax the network to the point where it will bother
other users.
■ The browser is on the receiving end, and while loading in the data it will temporarily
freeze. For small amounts of data this is unnoticeable, but when you
start looking at 100,000 lines, it can become a visible lag.

Ingredients of dc.js application


It’s time to build the actual application, and the ingredients of our small dc.js application
are as follows:
■ jQuery - To handle the interactivity
■ Crossfilter.js - A MapReduce library and prerequisite to dc.js
■ d3.js - A popular data visualization library and prerequisite to dc.js
■ dc.js - The visualization library you will use to create your interactive dashboard
■ Bootstrap - A widely used layout library you’ll use to make it all look better
There will be three files:
■ index.html - The HTML page that contains your application
■ application.js - To hold all the JavaScript code you’ll write
■ application.css - For your own CSS

Reduce() functions in Crossfilter


reduceCount() - A function that keeps a count of the observations.
reduceSum() - A function that computes the sum of the observations.
reduceAdd() function - A function that describes what happens when an extra
observation is added.
reduceRemove() function - A function that describes what needs to happen
when an observation disappears (for instance, because a filter is applied).
reduceInit() function - This one sets the initial values for everything.
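reduceInitAvg() initializes the p object by defining its
components (count, sum, and average) and setting their initial values.
reduceAddAvg() updates count, sum, and average when an observation is added.
reduceRemoveAvg() updates count, sum, and average when an observation is removed.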
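
The three average functions named above can be sketched as follows. This is a minimal illustration in plain JavaScript so that it runs without Crossfilter; the field name `value` and the sample records are assumptions made for the example, and the guard against dividing by zero in `reduceRemoveAvg` is an addition of this sketch rather than something stated in the notes.

```javascript
// Initial value of a group: nothing has been counted yet.
function reduceInitAvg() {
  return { count: 0, sum: 0, average: 0 };
}

// Called when an observation v enters the group.
function reduceAddAvg(p, v) {
  p.count += 1;
  p.sum += v.value; // `value` is a hypothetical field name
  p.average = p.sum / p.count;
  return p;
}

// Called when an observation v leaves the group,
// for instance because a filter is applied.
function reduceRemoveAvg(p, v) {
  p.count -= 1;
  p.sum -= v.value;
  p.average = p.count > 0 ? p.sum / p.count : 0; // avoid 0 / 0
  return p;
}

// Simulate what a group does: add three observations, then remove one.
const records = [{ value: 10 }, { value: 20 }, { value: 60 }];
const p = records.reduce(reduceAddAvg, reduceInitAvg());
const afterAdd = { ...p }; // snapshot taken before the removal

reduceRemoveAvg(p, records[2]);

console.log(afterAdd); // state after adding all three observations
console.log(p);        // state after removing the last observation
```

With Crossfilter loaded, the same three functions would be handed to a group as `group.reduce(reduceAddAvg, reduceRemoveAvg, reduceInitAvg)`, in the order add, remove, initial.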

Dashboard development tools or visualization tools available free as JavaScript libraries

■ HighCharts - One of the most mature browser-based graphing libraries. The
free license applies only to noncommercial pursuits. If you want to use it in a
commercial context, prices range anywhere from $90 to $4000. See
http://shop.highsoft.com/highcharts.html.
■ Chartkick - A JavaScript charting library for Ruby on Rails fans. See
http://ankane.github.io/chartkick/.
■ Google Charts - The free charting library of Google. As with many Google products,
it is free to use, even commercially, and offers a wide range of graphs.
■ d3.js - This is the odd one out because it isn’t a graphing library but a data
visualization library.
The difference might sound subtle, but the implications are not. Whereas libraries such as
HighCharts and Google Charts are meant to draw certain predefined charts, d3.js doesn’t lay
down such restrictions. d3.js is currently the most versatile JavaScript data visualization
library available.

Multiple reasons why you’d create your own custom reports instead
of opting for the (often more expensive) company tools out there:

■ No budget - When you work in a startup or other small company, the licensing
costs accompanying this kind of software can be high.
■ High accessibility - The data science application is meant to release results to any
kind of user, especially people who might only have a browser at their disposal,
such as your own customers. Data visualization in HTML5 runs smoothly
on mobile devices.
■ Big pools of talent out there - Although there aren’t that many Tableau developers,
scads of people have web-development skills. When planning a project, it’s
important to take into account whether you can staff it.
■ Quick release - Going through the entire IT cycle might take too long at your
company, and you want people to enjoy your analysis quickly. Once your interface
is available and being used, IT can take all the time they want to industrialize
the product.
■ Prototyping - The better you can show IT its purpose and what it should be capable
of, the easier it is for them to build or buy a sustainable application that
does what you want it to do.
■ Customizability - Although the established software packages are great at what they
do, an application can never be as customized as when you create it yourself.

Reasons against developing your own application in Data Science.

■ Company policy - Application proliferation isn’t a good thing and the company
might want to prevent this by restricting local development.
■ Mature reporting team - If you have a good reporting department, why would
you still bother?
■ Customization is satisfactory - Not everyone wants the shiny stuff; basic can
be enough.
