Case Study on Data-Driven Processing for Health
Class: TE-B
What are the major constituents of the Hadoop ecosystem?
The major constituents of the Hadoop ecosystem covered in this case study are the Name Node, Secondary Name Node, Data Node, and Job Tracker, described in detail after the introduction below.
The healthcare industry is one of the world's largest and most extensive sectors. In recent years, healthcare administration around the globe has been shifting from a disease-centered, volume-based model to a patient-centered, value-based one. Improving the quality of healthcare while reducing its cost is the guiding principle behind the growing movement towards a value-based healthcare delivery model and patient-centered care. The volume of, and demand for, big data in healthcare organizations is growing steadily. To provide effective patient-centered care, it is essential to manage and analyze these huge datasets. Traditional methods are obsolete and no longer adequate for analyzing big data, as the variety and volume of data sources have increased at a very high rate over the past two decades. There is a need for new and innovative tools and methods that can meet and exceed the demands of managing the enormous amount of data generated by the healthcare sector.
The healthcare system is collaborative in nature, since it comprises a large number of stakeholders such as doctors specializing in different fields, nurses, laboratory technologists, and other individuals who cooperate to achieve the shared objectives of reducing medical costs and errors while providing a quality healthcare experience. Each of these stakeholders produces data from heterogeneous sources, for example physical examinations, clinical notes, patient interviews and observations, laboratory tests, imaging reports, medications, treatments, surveys, bills, and insurance.
The rate at which data is generated from heterogeneous sources across healthcare departments has increased exponentially on a daily basis. It is therefore becoming hard to store, process, and analyze this interrelated data with traditional data-processing applications. New and efficient methods and systems, together with powerful processing technologies, are needed to store, process, analyze, and extract value from the voluminous and heterogeneous medical data being generated continuously. Hence, the healthcare system is fast becoming a big data industry.
In general, healthcare data has grown enormously in both structured and unstructured form, largely driven by the demands of an ever-expanding, data-hungry population and by the operational characteristics of e-health platforms. This explosive, multidimensional growth has led researchers to add many more keywords to describe Healthcare Big Data (HBD). It is not only the volume but also the variety: the kinds of sources that produce data, and the target groups that request it, are exceptionally diverse and numerous in the healthcare domain. These include the healthcare workforce (doctors, clinical staff, caregivers), service-providing organizations (including insurers), hospitals with their facilities, clinicians, government regulators, pharmacies, pharmaceutical manufacturers (with their research groups), and medical device companies.
Objectives:
In order to process a huge number of health data records at once, we need efficient tools and methodologies. The proposed approach uses the Hadoop framework to handle the data, and the algorithm used is MapReduce.
1. Name Node: The Name Node stores the metadata for HDFS (information about the location and size of files/blocks). The metadata can be held in RAM or on hard disk. There is always exactly one active Name Node in a cluster, which makes it a single point of failure: if the Name Node crashes, the whole Hadoop cluster becomes unavailable. (A small client sketch after this list shows how this metadata is queried.)
2. Secondary Name Node: It serves as a checkpointing backup for the Name Node and holds practically the same metadata. If the Name Node fails, the Secondary Name Node's checkpoint is used to restore it.
3. Data Node: The actual user files and data are stored on Data Nodes. The number of Data Nodes depends on the size of the data and can be increased as needed. Each Data Node reports to the Name Node at fixed intervals via heartbeat messages.
4. Job Tracker: The Name Node and Data Nodes store the metadata and the actual data on HDFS. This data must also be processed according to users' requirements. A developer writes code to process the data, typically using MapReduce. The MapReduce engine ships the code out to the Data Nodes, creating jobs that run in parallel across multiple nodes. These jobs are continuously monitored by the Job Tracker.
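To make the Name Node's role concrete, here is a minimal sketch that queries HDFS file metadata over WebHDFS using the Python hdfs package. The host name, user, and paths are illustrative assumptions, not details from the case study.

```python
# Minimal sketch: querying the metadata that the Name Node serves.
# Assumes the `hdfs` PyPI package (a WebHDFS client) and a reachable
# Name Node; host, user, and paths below are illustrative only.
from hdfs import InsecureClient

# WebHDFS endpoint exposed by the Name Node (port 9870 in Hadoop 3.x).
client = InsecureClient("http://namenode.example.org:9870", user="hadoop")

# Listing a directory is answered purely from Name Node metadata.
for name in client.list("/data/ehr"):
    # status() returns per-file metadata (size, block size, replication),
    # again served by the Name Node without touching any Data Node.
    info = client.status(f"/data/ehr/{name}")
    print(name, info["length"], info["blockSize"], info["replication"])
```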
The MapReduce algorithm contains two important tasks, namely Map and Reduce.
MapReduce uses various techniques to split a task into small parts and assign them to different machines.
The framework sends the Map and Reduce tasks to the appropriate servers in the cluster. The tasks are executed in parallel on the different nodes, and finally the result is returned to the user.
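As a concrete illustration, the sketch below counts patient records per diagnosis code using Hadoop Streaming, which lets plain Python scripts act as the Map and Reduce tasks. The input format (one CSV record per line, diagnosis code in the third field) is an assumption made for this example.

```python
# diag_count.py -- a minimal MapReduce sketch for Hadoop Streaming.
# Run as the mapper with `python3 diag_count.py map` and as the
# reducer with `python3 diag_count.py reduce`. Assumes CSV input
# with the diagnosis code in the third field (illustrative only).
import sys

def mapper():
    # Map task: emit <diagnosis_code, 1> for every patient record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 3:
            print(f"{fields[2]}\t1")

def reducer():
    # Reduce task: input arrives sorted by key, so the counts for one
    # diagnosis code are contiguous and can be summed in a single pass.
    current, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = key, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

On a real cluster this pair would be submitted via the hadoop-streaming JAR with HDFS input and output paths; locally it can be smoke-tested with `cat records.csv | python3 diag_count.py map | sort | python3 diag_count.py reduce`.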
CGH adopted a phased approach to implementing the Hadoop ecosystem. The initial phase focused on ingesting data from a variety of sources.
CGH utilized Sqoop, a tool specifically designed for transferring data between relational databases and HDFS. Data cleansing and transformation were performed using MapReduce or Spark scripts, along the lines of the sketch below. The processed data was then stored in HDFS in a structured format readable by querying tools such as Hive and Pig.
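The following minimal PySpark sketch shows what such a cleanse-and-store step might look like. The HDFS paths, column names, and table name are assumptions for illustration, not details from CGH's actual pipeline.

```python
# Minimal PySpark sketch of a cleanse-and-store step. Paths and
# column names are illustrative assumptions, not CGH's real schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("ehr-cleanse")
         .enableHiveSupport()   # lets the result be queried from Hive
         .getOrCreate())

# Raw records as landed by Sqoop (CSV here for simplicity).
raw = spark.read.csv("hdfs:///landing/ehr_extract",
                     header=True, inferSchema=True)

clean = (raw
         .dropDuplicates(["patient_id", "visit_date"])  # remove repeated rows
         .filter(F.col("patient_id").isNotNull())       # drop unusable records
         .withColumn("visit_date", F.to_date("visit_date", "yyyy-MM-dd")))

# Store in a structured, query-friendly format (a Parquet-backed Hive table).
clean.write.mode("overwrite").saveAsTable("warehouse.ehr_visits")
```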
While the Hadoop ecosystem offers significant advantages, CGH encountered certain challenges:
Data Security and Privacy: Implementing robust security measures is crucial to protect sensitive patient
data stored in the data lake. CGH enforces access controls and encrypts data both at rest and in transit.
Data Quality and Standardization: CGH established data governance procedures to ensure data quality
and consistency across different sources. Standardizing data formats facilitates seamless integration and
analysis.
Technical Expertise: Managing a Hadoop cluster requires specialized skills. CGH invested in training its staff and outsourced some functions where needed.
Improved Patient Care: By analyzing patient records and sensor data, CGH can identify potential health risks and initiate proactive interventions. Predictive analytics helps tailor treatment plans to individual patient needs (a small model-training sketch follows this list).
Enhanced Research & Development: CGH can leverage Big Data to analyze research data from various
sources, enabling faster drug discovery and improved treatment methods.
Operational Efficiency: Data analytics helps CGH optimize resource allocation by identifying areas for
cost savings and improving operational workflows.
Personalized Marketing: CGH can analyze patient data to understand their needs and preferences,
allowing for targeted communication and outreach programs.
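As a rough illustration of the predictive-analytics idea mentioned above, the sketch below trains a simple readmission-risk classifier with Spark MLlib on the cleansed table from the ingestion step. The feature and label columns are hypothetical, chosen only to make the example self-contained.

```python
# Minimal sketch: a readmission-risk model with Spark MLlib.
# Feature/label column names are hypothetical, for illustration only.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = (SparkSession.builder
         .appName("risk-model")
         .enableHiveSupport()
         .getOrCreate())

# Reuse the cleansed table produced by the ingestion step.
visits = spark.table("warehouse.ehr_visits")

# Pack numeric risk indicators into the single vector column MLlib expects.
assembler = VectorAssembler(
    inputCols=["age", "num_prior_visits", "systolic_bp"],  # assumed columns
    outputCol="features")
# "readmitted" is assumed to be a 0/1 label column.
train = assembler.transform(visits).withColumnRenamed("readmitted", "label")

# Fit a logistic-regression classifier; each prediction carries a
# probability that can be used to flag high-risk patients early.
model = LogisticRegression(maxIter=20).fit(train)
scored = model.transform(train).select("patient_id", "probability", "prediction")
scored.show(5)
```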
At present, as the healthcare market grows, it has become clear that organizations capable of harnessing the power of analytics demonstrate a measurable advantage in market share over their rivals. Big Data analysis in healthcare has indeed become the main driving force for creating opportunities to develop optimal treatment pathways, improving the medical loss ratio, and better managing clinical decision support systems. With soaring healthcare costs as well as growing regulatory pressure for both affordability and improved clinical outcomes, analytics has emerged as a silver lining for this industry. Analytics in healthcare has been shown to generate insights that not only lower total costs, reduce inefficiencies, and identify high-risk populations, but can also predict a patient's future healthcare needs.
Thus, with analytics, decision support systems are now being enhanced with various statistical and artificial intelligence techniques. This is resulting in the development of significant insights, for example the identification of patient risk factors, the grouping of patients according to different health conditions, the provision of actionable information to physicians at the point of care and, above all, measurable improvement in healthcare outcomes. The introduction of analytics in healthcare has therefore helped overcome many of the sector's typical challenges, creating real value in the process.
However, to overcome the majority of these challenges, it is essential that the data recorded from patients is put to good use. Additionally, all clinical data should be stored in standard data formats. For example, EHRs must be transformed into usable data on which analysis can be performed, so that meaningful insights can be extracted from them and improvements built on top, providing a personalized healthcare experience to the patient.
To conclude, it can be said that analytics today is undoubtedly a crucial process in healthcare, one that will significantly reshape its landscape in the coming years. Moreover, analytics is also the driving force behind the industry's current shift towards solutions that are capable of delivering real value.
The potential of the Hadoop ecosystem in healthcare extends far beyond the applications demonstrated by Health First. Here are some promising future directions:
Genomics and Precision Medicine: Integrating genomic data with traditional clinical data can pave the
way for personalized medicine at a deeper level, tailoring treatments to individual genetic profiles.
Population Health Management: Analyzing large datasets from entire patient populations can identify
trends, predict disease outbreaks, and develop targeted public health interventions.
Wearable Devices and IoT Integration: Data from wearable devices and Internet of Things (IoT)
sensors can provide real-time insights into patient health and behavior, enabling proactive monitoring
and preventive care.
Advanced Analytics and AI: Machine learning and artificial intelligence hold immense potential for
tasks like automating medical image analysis, drug discovery, and even chatbots for patient support.
The adoption of big data technologies like Hadoop marks a transformative journey for the healthcare
industry. By embracing data-driven insights, healthcare providers can empower themselves to deliver
better patient care, improve clinical outcomes, and optimize resource allocation. As technology
continues to evolve and new challenges emerge, continuous innovation and a commitment to data
security and privacy will be paramount in unlocking the full potential of big data for a healthier future.
This case study has explored the implementation of the Hadoop ecosystem at Health First and its impact
on various healthcare initiatives. The concluding sections have highlighted the future potential and
challenges associated with big data in healthcare. It is evident that big data holds immense promise for
transforming healthcare delivery, and the Hadoop ecosystem serves as a powerful tool for unlocking
valuable insights from the ever-growing ocean of healthcare data.