Chap1_Introduction
Chapter 1. Introduction
Yijie Zhang
Course Website
Class Attendance Check
Purpose:
• Have an opportunity to learn about each other’s education/research background or work experience to possibly form a team with common interests for homework or projects.
• Name
• Why do you take this course?
• What is the largest data size you’ve ever personally
  − storage format
  − processing/analysis purposes

Order of Magnitude:
2^0    1   10^0    One
2^10   K   10^3    Thousand
2^30   G   10^9    Billion
2^60   E   10^18   Quintillion
2^80   Y   10^24
2^90
……
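The powers of two and powers of ten listed under Order of Magnitude are close but not equal, and the gap widens with each prefix. The short Python sketch below shows that arithmetic for the prefixes that appear on the slide. The function and variable names are illustrative assumptions, not part of the course material.

```python
# Prefix letter, binary exponent, decimal exponent, as paired on the slide.
PREFIXES = [
    ("K", 10, 3),
    ("G", 30, 9),
    ("E", 60, 18),
    ("Y", 80, 24),
]


def relative_gap(binary_exp: int, decimal_exp: int) -> float:
    """Fraction by which 2**binary_exp exceeds 10**decimal_exp."""
    return 2 ** binary_exp / 10 ** decimal_exp - 1


if __name__ == "__main__":
    for prefix, b, d in PREFIXES:
        gap = relative_gap(b, d)
        print(f"{prefix}: 2^{b} = {2 ** b:,} vs 10^{d} = {10 ** d:,} (+{gap:.1%})")
```

Running the script prints one line per prefix. The binary value exceeds the decimal one by a few percent at K and by a noticeably larger margin at Y, which is why it matters whether a stated data size uses binary or decimal units.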
About this course
• Recent Developments and Future Trends on Big Data Computing
• Continuum Computing: from Edge to Cloud
• High-performance Computing: Supercomputer, Cluster, etc.
• Advanced Topics:
  − Big Data Meets Large Models
  − Big Data Visualization
  − Big Data Transfer
  − Big Data Workflows
  − Big Data Security
Textbooks and Reference Books
• Overview
• Machine Learning / Data Mining
• MapReduce / Hadoop
Big Data and HPC for LLMs
Center for Big Data
Director: Chase Wu
URL: https://centers.njit.edu/bigdata
Mission Statement
A Three-layer Structure of the CBD
[Figure: layered diagram of the CBD. Recoverable content:]
− Layer 3: Big Data Applications (transportation, solar-terrestrial, brain injury, physics, healthcare, business, smart city, etc.)
  • Goals: Advance sciences in various domains
  • Tasks: Adapt, customize, and refine application-specific solutions
− Interface: North-bound User Interface
− Layer 2: Big Data Technological Infrastructure (systems/platforms, tools/libraries, services, algorithms)
  • Goals: Provide generic and special big-data enabling solutions
  • Tasks: Investigate, design, develop, implement, and test big data-oriented analytics, visualization, computing, networking, workflow, storage, and retrieval solutions
− Interface: Data Access and Retrieval
− Layer 1: Big Data Repository
• Store, manage, and provide a wide variety of data such as raw data
(experimental, simulation, observational, and user-generated
content), metadata, markup data, analysis results (intermediate
and final) in various forms including models, views, tables, images,
and videos, and workflow templates with provenance data.
• Build a dedicated one-stop portal to share research data and
analysis results for community building.
− Layer 2: Big Data Technological Infrastructure
• Provide generic and domain-specific big data enabling solutions for
data management, movement, and analytics.
• Host and maintain a set of practical technical resources in the form
of systems/platforms, tools/libraries, services, and algorithms in
various areas including database management, data mining,
machine learning, and parallel and distributed computing, which
are needed to compose big data solutions in different application
domains.
− Layer 3: Big Data Applications
• Present a common portal to big data applications spanning a
wide spectrum of research fields, including
− transportation
− solar-terrestrial
− brain injury
− physics
− healthcare
− business
− smart city
• Provide researchers with powerful and customized big data solutions to
advance the frontier of sciences in various application domains.
Core Faculty of CBD
• Chase Wu (Director), Professor, Dept of Data Science
• Dantong Yu (Co-Director), Associate Professor, Leir Chair, School of Management
• Yi Chen, Professor, Leir Chair, School of Management, Dept of Computer Science
• Andrew Gerrard, Professor, Dept of Physics, Center for Solar-Terrestrial Research
• Lazar Spasovic, Professor, Dept of Civil and Environmental Engineering
• Steven Chien, Professor, Dept of Civil and Environmental Engineering
• Joyoung Lee, Assistant Professor, Dept of Civil and Environmental Engineering
• Namas Chandra, Professor, Dept of Biomedical Engineering, Center for Injury Biomechanics, Materials and Medicine
• Jason Wang, Professor, Dept of Computer Science
• Usman Roshan, Associate Professor, Dept of Computer Science
• Zhi Wei, Professor, Dept of Computer Science
• Dimitri Theodoratos, Associate Professor, Dept of Computer Science
• Vincent Oria, Professor, Dept of Computer Science
• Senjuti Roy, Associate Professor, Dept of Computer Science
• Brook Wu, Associate Professor, Dept of Informatics
• Hai Phan, Assistant Professor, Dept of Data Science
Funded Projects
• DOE: Technologies and Tools for Synthesis of Source-to-Sink High-
Performance Flows, DOE Office of Science, Big Data-Aware Terabits
Networking.
• NSF: An Integrated Approach to Performance Modeling and Optimization of
Big-data Scientific Workflows, Computer and Network Systems.
• DOE: Towards a Scalable and Adaptive Application Support Platform for Large-
Scale Distributed E-Sciences in High-Performance Network Environments,
DOE Office of Science, High-Performance Networks for Distributed Petascale
Science.
• Google Research Award: Understanding and Processing Subjective Queries on
Structured Data.
• NSF: CAREER: Analyzing and Exploiting Meta-information for Keyword Search
on Semi-structured Data.
• EarthCube IA: Magnetosphere-Ionosphere-Atmosphere Coupling, Abstract
#1541009.
• Intelligent Transportation Systems Resource Center - Task: Data Acquisition,
Integration, Analysis, and Visualization.
Application 1: Transportation
[Figure: transportation data-flow diagram. Recoverable labels: NJIT devices transmit data using the Verizon 3G/4G network; TrafficCast Bluetooth data (real-time speed and travel time); ASTI real-time traffic volume devices; TRANSCOM (Transmit & OpenReach); Internet; NJIT internal network; NJIT database server; NJIT web server; application server; device location and status; travel times; travel time indexes; end users.]
SWRL: 10 GB/day
Jeffer Lidar
• Ballistics (bullet, shrapnel)
• Blunt (motor vehicle, sports, fall from height)
• Blast (explosions)
[Figure labels: Ballistic (bullet); Blunt injury, most prevalent; Blast (military); Blunt impacts: MVA, fall, sports injury; CONCUSSION.]
Exascale Computing and Big Data
https://vimeo.com/129742718
Thanks!☺
Questions?