Data Science
Data Science is used in many industries in the world today, e.g. banking,
consultancy, healthcare, and manufacturing.
Data Science can be applied in nearly every part of a business where data is
available. Examples are:
Consumer goods
Stock markets
Industry
Politics
Logistics companies
E-commerce
Machine Learning
Statistics
Programming (Python or R)
Mathematics
Databases
A Data Scientist must find patterns within the data. Before finding
the patterns, they must organize the data in a standard format.
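As a rough illustration of what "organizing the data in a standard format" can mean, the sketch below normalizes two made-up records. The field names, the sample values, and the two accepted date formats are assumptions chosen for the example, not something the text prescribes.

```python
from datetime import datetime

# Hypothetical raw records with inconsistent key casing, whitespace, and date formats.
raw_records = [
    {"Name": " Alice ", "joined": "2023-01-15", "CITY": "london"},
    {"name": "BOB", "Joined": "15/02/2023", "city": "Paris "},
]

# Date formats assumed for this sketch only.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each assumed format and return an ISO 8601 date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def standardize(record):
    """Lower-case the keys, trim the values, and normalize the date."""
    clean = {key.lower(): value.strip() for key, value in record.items()}
    return {
        "name": clean["name"].title(),
        "city": clean["city"].title(),
        "joined": parse_date(clean["joined"]),
    }


standardized = [standardize(record) for record in raw_records]
for row in standardized:
    print(row)
```

Once every record has the same keys and value formats, pattern-finding steps can treat the records uniformly.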
The five V's of Big Data describe its characteristics.
Variety
Big Data can be structured, unstructured, or semi-structured, and it
is collected from many different sources. In the past, data was collected
only from databases and spreadsheets, but these days data
comes in an array of forms, such as PDFs, emails, audio, social media posts,
photos, videos, etc.
c. Unstructured Data: Unstructured data includes log files, audio files,
and image files. Some organizations have a great deal of data available
but do not know how to derive value from it because the data is raw.
Example: Web server logs, i.e., log files created and maintained
by a server that contain a list of activities.
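To show how a raw web server log can be given structure, here is a minimal sketch that parses one line into named fields. It assumes the widely used Common Log Format, and the sample line is invented for illustration.

```python
import re

# Regular expression for the Common Log Format (assumed for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# A made-up log line; the address is from a documentation-only range.
sample_line = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)


def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


entry = parse_log_line(sample_line)
print(entry)
```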
Veracity
Veracity refers to how reliable the data is. There are many ways to filter
or translate the data. Veracity is also about being able to handle and
manage data efficiently. Big Data is also essential in business
development.
Value
Value is an essential characteristic of big data. What matters is not simply
the data that we process or store, but the valuable and reliable data that
we store, process, and analyze.
Velocity
Velocity plays an important role compared to the others. Velocity refers to the
speed at which data is created in real time. It covers the speed of
incoming data sets, the rate of change, and activity bursts. A
primary aspect of Big Data is to provide the demanded data rapidly.
Big data velocity deals with the speed at which data flows from sources
like application logs, business processes, networks, social
media sites, sensors, mobile devices, etc.
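One simple way to put numbers on velocity is to count events per time window. The sketch below uses invented timestamps and one-second windows (both are assumptions) to compute an average rate and spot the busiest second, which is a crude view of an activity burst.

```python
from collections import Counter

# Hypothetical event timestamps, in seconds since observation started.
event_times = [0.1, 0.4, 0.9, 1.2, 2.0, 2.1, 2.2, 2.3, 2.8, 3.5]

# Bucket the events into one-second windows.
events_per_second = Counter(int(t) for t in event_times)

# Number of one-second windows covered by the observations.
duration = int(max(event_times)) + 1
average_rate = len(event_times) / duration

# The window with the most events marks the largest burst.
peak_second, peak_count = max(events_per_second.items(), key=lambda kv: kv[1])

print(f"average rate: {average_rate} events/s")
print(f"peak: {peak_count} events in second {peak_second}")
```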
Web Scraping
Web scraping is an automated method of obtaining large amounts of data
from websites. Most of this data is unstructured data in HTML
format, which is then converted into structured data in a
spreadsheet or a database so that it can be used in various
applications. There are many different ways to perform web
scraping to obtain data from websites. These include using online
services, particular APIs, or even writing your own code for web
scraping from scratch. Many large websites, like Google, Twitter,
Facebook, StackOverflow, etc., have APIs that allow you to access their
data in a structured format.
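The conversion from HTML to structured records can be sketched with Python's standard `html.parser` module. The HTML fragment below is made up and stands in for a downloaded page; a real scraper would also need to fetch the page and respect the site's terms of use.

```python
from html.parser import HTMLParser

# A made-up HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Keyboard</td><td>49.99</td></tr>
  <tr><td>Mouse</td><td>19.50</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


parser = TableParser()
parser.feed(SAMPLE_HTML)

# Treat the first row as column names and the rest as records.
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)
```

The resulting list of dictionaries can be written to a spreadsheet or database table.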
Reporting vs Analytics
Key Differences Between Reporting and Analytics
Reporting is the process of gathering and presenting data in a structured
format such as graphs and tables. Organizing information in predefined
KPIs and metrics makes it easier for you to understand what is
happening. Analytics is the process of analyzing your data to identify
patterns and gain insights. Using techniques such as predictive and
prescriptive analytics helps you understand why things are happening and
what to do next.
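The contrast can be sketched on a tiny, invented sales series: the reporting step summarizes what happened, while the analytics step fits a trend line and projects the next month. The figures and the choice of a simple least-squares line are assumptions made for illustration.

```python
# Hypothetical monthly sales figures used only for illustration.
monthly_sales = {"Jan": 100, "Feb": 110, "Mar": 120, "Apr": 130}

# Reporting: summarize what happened with predefined metrics.
report = {
    "total": sum(monthly_sales.values()),
    "average": sum(monthly_sales.values()) / len(monthly_sales),
    "best_month": max(monthly_sales, key=monthly_sales.get),
}

# Analytics: fit a least-squares trend line and project the next month.
ys = list(monthly_sales.values())
xs = list(range(len(ys)))
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
    (x - x_mean) ** 2 for x in xs
)
intercept = y_mean - slope * x_mean
forecast_next = intercept + slope * len(ys)

print("report:", report)
print("trend per month:", slope, "forecast for next month:", forecast_next)
```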
What is Reporting?
Reporting primarily involves the presentation of data in a structured format. Its
purpose is to provide a snapshot of specific metrics or KPIs over a defined
period. Reports are instrumental in summarizing information for stakeholders
and are often automated and scheduled on a regular basis. Ad hoc reports,
created on-demand, can address specific inquiries or issues promptly. Data
visualizations help identify trends, patterns, and anomalies more intuitively.
Dashboards play a crucial role in presenting real-time data to stakeholders for
quick decision-making.
Benefits
Reporting Process
Your reporting analytics tool uses this data to allow you to create visualizations,
dashboards and KPI reports via automation. These make it easier for you to
know what has happened or what is happening in your business.
Types of Reporting
Reporting takes various forms in organizations, serving specific functions. Operational
reports offer day-to-day insights into activities like sales and inventory management.
Financial reports detail a company's financial health, including balance sheets and income
statements. Management reports provide summarized data for internal decision-making,
while strategic reports guide long-term planning. Compliance reports ensure adherence to
legal requirements, while ad hoc reports address specific queries
possible via technology designed particularly for that purpose. It constitutes a simple method
of storing data in digital form on computer devices, and keeping data on hand makes many
Storage devices may use electromagnetic, optical, or other media to keep the data safe and
recover it when necessary. File recovery and backup procedures become simple by data
While setting this up, every organization should consider these factors: dependability,
Innovative technologies like data analysis, the Internet of Things, and AI produce and utilize
enormous amounts of data. Therefore, data storage plays a major role in the growth of any
organization now more than ever. Some of the benefits are as follows:
1. It is simple to gather large amounts of records for a longer time using electronic data storage.
2. Making duplicates of stored data makes it simple to back it up, enabling file loss or
3. With today’s cutting-edge security technologies and capabilities, plenty of techniques exist to
4. Every authorized individual has access to centralized stored data, which can be viewed and
5. Digital data can be more easily categorized and organized, and the process can be accessed
6. Digital data storage is faster than producing files that must be kept in file cabinets by printing
Primary Storage
Primary storage is the main memory of a computer system. It is
smaller than secondary memory and offers comparatively less capacity. It is the only
storage form directly available to the CPU and includes RAM and ROM (Read-only Memory). The
CPU always accesses primary storage-stored commands and processes them as needed. All
Secondary Storage
A secondary storage system can keep data longer and has additional storage space. Examples,
which may be internal or external components, include hard drives, USB drives, CDs, and other media.
One computer typically accesses secondary storage through its input/output channels and
Tertiary Storage
It is an extensive electronic storage system that is typically quite slow; hence, it stores
data that is retrieved only occasionally. This method often incorporates a robotic device
that mounts and dismounts removable media into computer storage units according to the
system requirements. It helps access extremely massive databases without the assistance of
Forms of Storage
Data storage comes in three primary forms:
1. File Storage: Data is organized into files and directories in file storage. It is suitable for
storing structured data and is accessible through network protocols like NFS and SMB. File
storage is commonly used for documents, media files, and user data.
2. Block Storage: Block storage breaks data into fixed-sized blocks and is often used in
scenarios where raw storage volumes are needed. It is highly efficient for database systems
and can be accessed via protocols like iSCSI. Block storage provides low-level storage access.
3. Object Storage: Object storage stores data as objects, each with its unique identifier
and metadata. It is ideal for unstructured data, scalable storage, and cloud-based
applications. Object storage is accessible via RESTful APIs and is well-suited for backup,
Choosing the right storage type depends on the specific requirements of your application and
data.
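The difference between block and object storage can be sketched in a few lines. Everything here is a toy, in-memory illustration: the `split_into_blocks` helper, the `ObjectStore` class, and the 4-byte block size are hypothetical names and values, not the API of any real storage product.

```python
import uuid


def split_into_blocks(data: bytes, block_size: int = 4):
    """Block storage idea: cut data into fixed-size blocks.

    The last block may be shorter in this sketch; real devices work
    with full blocks.
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class ObjectStore:
    """Object storage idea: a flat mapping from unique ID to data plus metadata."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, **metadata) -> str:
        object_id = str(uuid.uuid4())  # unique identifier for the object
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id][0]

    def metadata(self, object_id: str) -> dict:
        return dict(self._objects[object_id][1])


blocks = split_into_blocks(b"abcdefghij")
print(blocks)

store = ObjectStore()
photo_id = store.put(b"fake image bytes", content_type="image/png", owner="alice")
print(photo_id, store.metadata(photo_id))
```

Note that the object store has no directories: every object lives in one flat namespace and is found by its identifier.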
Magnetic Storage
components and coated using a thin film of magnetic material, where data is stored. The
magnetized face of such disks goes inside a rotary drive, with a read-write unit of a magnetic
yoke and a magnetizing coil that spins in close range of the disks.
Optical Storage
An optical drive is a device that uses optical storage techniques for data processing functions
such as read/write/access. Laser light helps in reading and storing data on an optical disk. An
optical disk is made of a resin such as polycarbonate, and the electronic data is maintained in tiny
Flash Storage
Flash storage uses solid-state drives (SSDs) with flash memory for large-scale data or file
archiving. It can replace HDDs and other forms of storage. A multi-terabyte dataset can be
kept “in memory” using an all-flash array, which offers read/write speeds four times faster
Cloud Storage
By replicating the capability of physical storage devices, cloud storage enables you to save or
retrieve various content types whenever you need to from a virtual setting. Any data uploaded
to the cloud is kept off-site in reliable data centers, and an on-site operator or an off-site third-
party service often handles it. Users can access cloud storage using a computer with an
internet connection, web portal, intranet, cloud storage apps, or additional application
Object Storage
Object storage is a technique that manages data storage in distinct components or objects. A
framework on which data analytics software can run queries on objects is known as an object
store. By adopting a flat address space, object storage removes the need for the hierarchical
structure that different systems need. This enables easy scaling up or down to accommodate
Software-Defined Storage
SDS is a storage system that separates the hardware and software used for storage. Unlike
conventional NAS or SAN systems, SDS runs on any x86 or industry-standard system,
Whenever you upload digital data to a personal computer, it is saved to a storage device and
stays there until that device is damaged. Storage is fundamentally different from computer memory:
While anyone can swiftly retrieve information from your computer’s RAM, such data is only
Modern computers or devices may connect to storage devices directly or via a network
connection. Users give computers instructions for accessing data stored on and retrieved from
various storage devices. On a basic level, data storage depends on two principles: the form it
RAID is a method that uses several drives in tandem rather than just one to boost
performance, provide data redundancy, or both. It can protect data during a drive
crash by maintaining the same data in different places on multiple hard drives or solid-state
storage devices. It uses two or more disks operating in parallel, and the RAID level describes how the disks are
arranged.
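Two common arrangements can be sketched with byte strings standing in for disks: striping (as in RAID 0, for performance) and mirroring (as in RAID 1, for redundancy). The function names and the 2-byte chunk size are assumptions for this toy model; real RAID operates on disk blocks in a controller or the operating system.

```python
def stripe(data: bytes, disks: int, chunk: int = 2):
    """RAID 0 style: deal fixed-size chunks round-robin across the disks."""
    contents = [bytearray() for _ in range(disks)]
    for index, start in enumerate(range(0, len(data), chunk)):
        contents[index % disks] += data[start:start + chunk]
    return [bytes(disk) for disk in contents]


def mirror(data: bytes, disks: int):
    """RAID 1 style: keep a full copy of the data on every disk."""
    return [bytes(data) for _ in range(disks)]


def recover_mirror(disk_contents):
    """Return the data from the first disk that has not failed (None = failed)."""
    for content in disk_contents:
        if content is not None:
            return content
    raise RuntimeError("all mirrored disks have failed")


striped = stripe(b"ABCDEFGH", disks=2)
print(striped)

mirrored = mirror(b"ABCDEFGH", disks=2)
mirrored[0] = None  # simulate a drive crash
recovered = recover_mirror(mirrored)
print(recovered)
```

Striping alone offers no protection: losing either disk in `striped` loses half of the chunks, whereas the mirrored set survives the simulated crash.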
A network-attached storage (NAS) server is a specific storage platform that links to devices
over a LAN. The server’s connectivity features allow for the retrieval and storage of data
from many external devices, and NAS storage also offers extensive sharing capabilities. The
system utilizes the features of a file-storage technology and the clustering of a redundant
A storage area network (SAN) is network-based storage that provides access to data at the block
level. This kind of storage consists of several data storage units connected by a network. The
storage format is an amalgam of NAS and DAS. It transfers data between
servers and storage using specific networking protocols, like Fibre Channel.
Object Storage vs Block Storage
The differences between object storage and block storage are as follows:
Object Storage | Block Storage
Cost-effective. | Expensive.
A single central or decentralized system that maintains data in a private, public, or hybrid cloud. | A centralized system for on-site or private cloud data storage. If the program and its data storage are located far away from one another, latency could pose a concern.
Suitable for large amounts of raw data. | Best for storing databases and data related to transactions.
Large files yield the greatest results. | It performs best with compact files.
Some of the best practices for data storage management are as follows:
After you transfer your data from regular, operational systems for immediate and future
storage, a reliable data backup and recovery plan will ensure it is constantly kept secure.
Backup copies enable data to be recovered from an earlier date, enabling the organization to
recover from unforeseen circumstances. Maintaining another copy of the data on another
storage device is important for protecting against original data loss or corruption.
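A minimal sketch of the backup idea: copy a file to a second location and confirm with a checksum that the copy matches the original. The `backup_file` helper and the file names are hypothetical, and a temporary directory stands in for a separate storage device.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def backup_file(source, backup_dir):
    """Copy source into backup_dir and verify that the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"backup of {source} is corrupt")
    return target


with tempfile.TemporaryDirectory() as workdir:
    # Create a small sample file to back up.
    source = Path(workdir) / "orders.csv"
    source.write_text("id,amount\n1,9.99\n")

    copy = backup_file(source, Path(workdir) / "backup")
    backup_ok = copy.read_text() == source.read_text()
    print("backup verified:", backup_ok)
```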
Data Deduplication
There are instances where similar data is produced due to repeated operations. You can
improve data management and reduce storage costs by setting up a manual or automated
procedure that constantly evaluates data and eliminates duplicates. Your data will remain
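An automated deduplication step can be sketched by hashing each item's content and keeping only the first occurrence of each hash. The `deduplicate` function and the sample data are illustrative assumptions; production systems typically deduplicate at the block or file level inside the storage layer.

```python
import hashlib


def deduplicate(blobs):
    """Keep the first occurrence of each distinct content, preserving order."""
    seen = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(blob)
    return unique


# Made-up data in which one report was stored twice.
stored = [b"report-2023", b"invoice-17", b"report-2023"]
unique_items = deduplicate(stored)
print(unique_items)
```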
Data compression makes files take up less room on a hard drive and reduces the time needed to
transfer or download them. The decrease in space and time could lead to major savings in
expenses. It makes it possible to transport data objects and files quickly through networks and
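The effect of compression is easy to observe with Python's standard `zlib` module. The sample data below is invented and deliberately repetitive, so it compresses far better than typical real-world data would.

```python
import zlib

# Made-up, highly repetitive data; real data usually compresses less well.
original = b"sensor reading: 21.5C\n" * 200

compressed = zlib.compress(original, 9)  # 9 = highest compression level
restored = zlib.decompress(compressed)
ratio = len(compressed) / len(original)

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print(f"compressed size is {ratio:.1%} of the original")
```

Because `zlib` is lossless, decompressing returns exactly the original bytes.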
It enables you to identify sensitive data and essential assets and to establish robust security
measures that monitor and protect every stage of data storage, thereby maximizing your data
security. Encryption converts the data you store into unreadable ciphertext that only the holder
of the key can decode. This ensures the data cannot be used, even when unauthorized people gain
access to it.
The physical storage capacity of mobile devices is limited and ranges typically from just a
few gigabytes up to a few hundred gigabytes. Data file sizes have grown considerably in
Privacy and security are key factors to consider because the data kept on mobile devices
could contain personal and sensitive data. Malicious attacks, unauthorized access, or
Internal Storage
Internal storage is the device's built-in memory space. Files kept here
are private to the application that created them, so other applications cannot
access them, whatever their permissions. Android OEMs and app developers use internal
storage to store private data, app data, user settings, and additional system files.
External Storage
Any storage not part of the device’s internal memory, including an attached SD card, is called
external storage. Any app with the appropriate permissions could have access to this region,
which serves as a free-for-all area. There are two kinds of external storage: SD cards,
commonly called memory cards, which represent the secondary external storage, and built-in
1. Data Backup: Businesses safeguard vital data and backups on storage devices or in the cloud
data backup strategy is essential to prevent data loss and maintain business continuity.
management. These systems help maximize space utilization, minimize storage costs, and
streamline order fulfillment processes. Efficient inventory management is crucial for meeting
databases, ensuring secure access and reliability for employees and customers. Robust
scalability.
4. Cloud Storage: Cloud storage services offer scalable and cost-effective data storage options,
enabling businesses to securely store and access data from anywhere with an internet
connection. This flexibility enhances data accessibility, reduces infrastructure costs, and
5. Collaboration and File Sharing: Businesses employ storage systems to facilitate team
collaboration. They securely store and share documents, presentations, and media files,
enabling employees to work together efficiently, whether in the office or remotely. Secure
What’s Next?
Data storage is an essential component of our digital life. Increasing storage capacities, cloud
storage options, and strong security protocols are just a few of the challenges involved in data
storage that are being addressed by innovative technologies and enhanced storage