All rights reserved. No part of this book may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, without the prior written
permission of the publisher, except in the case of brief quotations embedded in
critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented. However, the information contained in this book is
sold without warranty, either express or implied. Neither the author, nor Packt
Publishing, and its dealers and distributors will be held liable for any damages
caused or alleged to be caused directly or indirectly by this book.
Livery Place
35 Livery Street
ISBN 978-1-78728-898-0
www.packtpub.com
Credits
Author
Rajat Mehta
Reviewers
Dave Wentzel
Roberto Casati
Commissioning Editor
Veena Pagare
Acquisition Editor
Chandan Kumar
Deepti Thore
Technical Editors
Jovita Alva
Sneha Hanchate
Copy Editors
Safis Editing
Laxmi Subramanian
Project Coordinator
Shweta H Birwatkar
Proofreader
Safis Editing
Indexer
Pratik Shirodkar
Graphics
Tania Dutta
Production Coordinator
Shantanu N. Zagade
Cover Work
Shantanu N. Zagade
About the Author
Rajat Mehta is a VP (technical architect) in technology at JP Morgan Chase in
New York. He is a Sun certified Java developer and has worked on Java-related
technologies for more than 16 years. For the past few years, his role has
involved heavy use of a big data stack and running analytics on it. He also
contributes to various open source projects that are available on his GitHub
repository, and writes frequently for dev magazines.
About the Reviewers
Dave Wentzel is the CTO of Capax Global, a data consultancy specializing in
SQL Server, cloud, IoT, data science, and Hadoop technologies. Dave helps
customers with data modernization projects. For years, Dave worked at big
independent software vendors, dealing with the scalability limitations of
traditional relational databases. With the advent of Hadoop and big data
technologies, everything changed. Things that were impossible to do with data
were suddenly within reach.
Before joining Capax, Dave worked at Microsoft, assisting customers with big
data solutions on Azure. Success for Dave is solving challenging problems at
companies he respects, with talented people whom he admires.
Customer Feedback
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our
editorial process. To help us improve, please leave us an honest review on this
book’s Amazon page at https://www.amazon.com/dp/1787288986.
If you’d like to join our team of regular reviewers, you can e-mail us at
customerreviews@packtpub.com. We award our regular reviewers with free
eBooks and videos in exchange for their valuable feedback. Help us be relentless
in improving our products!
Chapter 2, First Steps in Data Analysis, takes the first steps into analytics
on big data. We start with a simple example covering basic statistical
analysis, followed by two popular algorithms for building association rules:
the Apriori algorithm and the FP-Growth algorithm. For all case studies, we
use realistic examples from an online e-commerce store to show how these
algorithms can be used in the real world.
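To make the idea of an association rule concrete, here is a minimal Java sketch that computes the support and confidence of a single rule over a tiny hand-made set of shopping baskets. It is not the Apriori or FP-Growth implementation used in the book; the class name, method names, and sample baskets are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: all names and sample data here are hypothetical.
class AssociationRuleSketch {

    // Fraction of transactions that contain every item in the given itemset.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream()
                .filter(t -> t.containsAll(itemset))
                .count();
        return (double) hits / transactions.size();
    }

    // confidence(A -> B) = support(A union B) / support(A).
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent,
                             Set<String> consequent) {
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        double supportA = support(transactions, antecedent);
        return supportA == 0.0 ? 0.0 : support(transactions, both) / supportA;
    }

    // Small helper to build a set of items.
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Four made-up baskets from an imaginary online store.
    static List<Set<String>> sampleBaskets() {
        List<Set<String>> baskets = new ArrayList<>();
        baskets.add(setOf("bread", "milk"));
        baskets.add(setOf("bread", "butter"));
        baskets.add(setOf("bread", "milk", "butter"));
        baskets.add(setOf("milk"));
        return baskets;
    }

    public static void main(String[] args) {
        List<Set<String>> baskets = sampleBaskets();
        System.out.println("support(bread, milk) = "
                + support(baskets, setOf("bread", "milk")));
        System.out.println("confidence(bread -> milk) = "
                + confidence(baskets, setOf("bread"), setOf("milk")));
    }
}
```

Algorithms such as Apriori and FP-Growth exist because counting support naively for every possible itemset does not scale; this sketch only shows what the two measures mean for one rule.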
Chapter 5, Regression on Big Data, explains how you can use linear regression
to predict continuous values and how you can do binary classification using
logistic regression. A real-world case study of house price evaluation based on
the different features of the house is used to explain the concepts of linear
regression. To explain the key concepts of logistic regression, a real-life case
study of detecting heart disease in a patient based on different features is used.
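As a rough illustration of the difference between the two techniques, the plain-Java sketch below shows a linear model producing a continuous value and the same linear score being passed through the logistic (sigmoid) function to obtain a binary class. The weights, the 0.5 threshold, and all names are assumptions chosen for the example; this does not train a model and is not the Spark-based code used in the chapter.

```java
// Illustrative sketch only: weights and names are hypothetical, and no
// training is performed here.
class RegressionSketch {

    // Linear regression prediction: bias + sum of weight[i] * feature[i].
    static double linearPredict(double[] weights, double bias, double[] features) {
        if (weights.length != features.length) {
            throw new IllegalArgumentException("weights and features must have the same length");
        }
        double sum = bias;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    // Logistic function: maps any real number into the interval (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Logistic regression decision: class 1 if the probability is at least 0.5.
    static int classify(double[] weights, double bias, double[] features) {
        double probability = sigmoid(linearPredict(weights, bias, features));
        return probability >= 0.5 ? 1 : 0;
    }

    public static void main(String[] args) {
        double[] weights = {2.0, 3.0};
        double[] features = {1.0, 1.0};
        System.out.println("continuous prediction = "
                + linearPredict(weights, 1.0, features));
        System.out.println("predicted class = "
                + classify(new double[] {1.0}, -2.0, new double[] {3.0}));
    }
}
```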
Chapter 7, Decision Trees, explains that decision trees are like flowcharts and
can be programmatically built using concepts such as Entropy or Gini Impurity.
The golden egg in this chapter is a case study that shows how we can predict
whether a person's loan application will be approved or not using decision trees.
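For readers who have not met these two measures before, the following minimal Java sketch computes entropy and Gini impurity from the class counts at a single tree node. The class and method names are hypothetical, and the sketch only evaluates one node; it does not build a tree.

```java
// Illustrative sketch only: names are hypothetical.
class ImpuritySketch {

    // Entropy in bits: -sum(p * log2(p)) over classes with a non-zero count.
    static double entropy(int[] classCounts) {
        int total = 0;
        for (int count : classCounts) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        double entropy = 0.0;
        for (int count : classCounts) {
            if (count > 0) {
                double p = (double) count / total;
                entropy -= p * (Math.log(p) / Math.log(2));
            }
        }
        return entropy;
    }

    // Gini impurity: 1 - sum(p^2) over all classes.
    static double gini(int[] classCounts) {
        int total = 0;
        for (int count : classCounts) {
            total += count;
        }
        if (total == 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (int count : classCounts) {
            double p = (double) count / total;
            sumOfSquares += p * p;
        }
        return 1.0 - sumOfSquares;
    }

    public static void main(String[] args) {
        // A node holding 5 approved and 5 rejected loan applications.
        int[] mixedNode = {5, 5};
        System.out.println("entropy = " + entropy(mixedNode));
        System.out.println("gini    = " + gini(mixedNode));
    }
}
```

Both measures are zero for a node that contains a single class and largest when the classes are evenly mixed, which is why a tree builder prefers splits that reduce them.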
Chapter 8, Ensembling on Big Data, explains how ensembling plays a major role
in improving predictive performance. I cover different concepts related to
ensembling in this chapter, including how multiple models can be combined
using bagging or boosting to improve predictions. We also cover two highly
popular and accurate ensemble models, random forests and gradient-boosted
trees. Finally, we use these models to predict loan defaults in a real-world
dataset from Lending Club (a real online lending company).
Chapter 10, Clustering and Customer Segmentation on Big Data, covers
clustering and how a real-world e-commerce store can use it to segment its
customers based on how valuable they are. I cover both k-Means clustering and
bisecting k-Means clustering, and use both of them in the corresponding case
study on customer segmentation.
Chapter 11, Massive Graphs on Big Data, covers an interesting topic, graph
analytics. We start with a refresher on graphs, with basic concepts, and later go
on to explore the different forms of analytics that can be run on the graphs,
whether path-based analytics involving algorithms such as breadth-first search,
or connectivity analytics involving degrees of connection. A real-world flight
dataset is then used to explore the different forms of graph analytics, showing
analytical concepts such as finding top airports using the PageRank algorithm.
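As a small taste of path-based analytics, the sketch below runs a breadth-first search over a directed graph stored as an adjacency map and returns the minimum number of hops between two nodes. The airport codes and routes are invented for illustration and are not taken from the book's flight dataset; the sketch uses plain Java collections rather than the GraphFrames library.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: names and the sample routes are hypothetical.
class BfsSketch {

    // Returns the minimum number of edges from start to goal, or -1 if
    // goal cannot be reached.
    static int hops(Map<String, List<String>> graph, String start, String goal) {
        Map<String, Integer> distance = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(goal)) {
                return distance.get(current);
            }
            for (String next : graph.getOrDefault(current, Collections.emptyList())) {
                // Visit each node once; the first visit is along a shortest path.
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    // A tiny made-up set of one-way routes between airports.
    static Map<String, List<String>> sampleRoutes() {
        Map<String, List<String>> routes = new HashMap<>();
        routes.put("JFK", Arrays.asList("ORD", "ATL"));
        routes.put("ORD", Arrays.asList("SFO"));
        routes.put("ATL", Arrays.asList("SFO"));
        routes.put("SFO", Arrays.asList("HNL"));
        return routes;
    }

    public static void main(String[] args) {
        System.out.println("JFK -> HNL hops: " + hops(sampleRoutes(), "JFK", "HNL"));
    }
}
```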
Chapter 12, Real-Time Analytics on Big Data, introduces real-time analytics,
starting with a few examples from the real world. We also learn about the
products that are used to build real-time analytics systems on top of big
data. We particularly cover the concepts of Impala, Spark Streaming, and
Apache Kafka. Finally, we cover two real-life case studies: building a list of
trending videos from data that is generated in real time, and performing
sentiment analysis on tweets in a Twitter-like scenario using Apache Kafka and
Spark Streaming.
Chapter 13, Deep Learning Using Big Data, discusses the wide range of
applications that deep learning has in real life, whether it's self-driving
cars, disease detection, or speech recognition software. We start with the
very basics
of what a biological neural network is and how it is mimicked in an artificial
neural network. We also cover a lot of the theory behind artificial neurons and
finally cover a simple case study of flower species detection using a multi-layer
perceptron. We conclude the chapter with a brief introduction to the
Deeplearning4j library and also cover a case study on handwritten digit
classification using convolutional neural networks.
What you need for this book
There are a few things you will require to follow the examples in this book: a
text editor (I use Sublime Text), internet access, admin rights to your machine to
install applications and download sample code, and an IDE (I use Eclipse and
IntelliJ).
You will also need other software such as Java, Maven, Apache Spark, Spark
modules, the GraphFrames library, and the JFreeChart library. We mention the
required software in the respective chapters.
You will also need a computer with a reasonable amount of RAM; alternatively,
you can run the samples on Amazon Web Services (AWS).
Who this book is for
If you already know some Java and understand the principles of big data, this
book is for you. This book can be used by a developer who has mostly worked
on web programming or any other field to switch into the world of analytics
using machine learning on big data.
When we wish to draw your attention to a particular part of a code block, the
relevant lines or items are set in bold:
Dataset<Row> data = spark.read().csv("data/heart_disease_data.csv");
System.out.println("Number of Rows -->" + data.count());
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think
about this book—what you liked or may have disliked. Reader feedback is
important for us to develop titles that you really get the most out of.
If there is a topic that you have expertise in and you are interested in either
writing or contributing to a book, see our author guide on
www.packtpub.com/authors.
You can also download the code files by clicking on the Code Files button on
the book's webpage at the Packt Publishing website. This page can be accessed
by entering the book's name in the Search box. Please note that you need to be
logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the
folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux