
Lecture 6 - Data Sources and Course Project

The document outlines data sources classified into primary, secondary, and tertiary categories, explaining their characteristics and examples. It introduces the CRAAP test for evaluating data sources based on currency, relevance, authority, accuracy, and purpose. Additionally, it details the course project requirements, including data collection, normalization, and visualization, with specific deadlines for submissions and presentations.

Uploaded by

gokulmohan4002

Data Sources & Course Project

ANUP APREM

• BASED ON MATERIAL FROM DALT7002 (P08801): DATA SCIENCE FOUNDATIONS AT OXFORD BROOKES UNIVERSITY
• SPONSORED BY BRITISH COUNCIL GOING GLOBAL EXPLORATORY GRANT
Data Sources
● Data sources describe where the data come from.
● They are classified into three categories:
– Primary data sources
– Secondary data sources
– Tertiary data sources
● Primary data sources: original sources (material, events or evidence) recorded as they actually happened. The data has not been interpreted or analysed; it is first-hand, original material.
– Examples include: dissertations, original research, original data, some government reports, speeches, letters, interviews, etc.
● Secondary data sources: these explain primary sources and provide analysis of them. In other words, they summarize and analyse data to add value to primary data sources.
– Examples include: textbooks, edited works (conference proceedings, etc.), review articles, biographies, political analysis, etc.
● Tertiary data sources: a distillation and collection of primary and secondary sources. These are sources that index, organize or compile other data sources.
– Examples include: dictionaries, encyclopedias, Wikipedia, directories, manuals, indexing sources, guide books, etc.
Evaluating Data Sources
• A large number of data sources are available (e.g. the Internet)
• A large volume of data can be collected from different sources
• How can the quality of the data be assessed?
• The CRAAP test evaluates a source on five criteria:
– Currency
– Relevance
– Authority
– Accuracy
– Purpose
CRAAP Test
Currency
• Relates to the timeline (recency) of the data
• When was the data created or updated? Is it still valid to use?
• Has the data been updated (web pages, links, etc.)?
• Is it important for your work to use current data?
• Can old data be used?
Relevance
• Importance of the data in relation to your needs
• Who is the intended audience or user of the data?
• Percentage of useful data in the source
• Comparison with other sources: look at a variety of sources to find out which one(s) to use
Authority
• Creator’s credentials: who is the author or source of the data?
• Website links: do they provide information about the authors or sources of the data?
• Collection/analysis methodology used
• Organizational track record, expertise or qualifications of the authors
Accuracy
• Reliability or correctness of the data
• What are the sources the data come from?
• Cross-validation against other sources: verify the data from another source or from personal knowledge
• Is the data supported by evidence, experiments, etc.?
Purpose
• Purpose of the data source: the reason the data was created (teaching, research, information, selling, etc.)
• Is the purpose clearly identified?
• Does it provide fact or opinion?
• Creator/organizational bias
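One way to apply the five CRAAP criteria consistently across candidate sources is to record a score per criterion. The sketch below is only an illustration: the 0 to 2 scale, the `SourceReview` class and the example values are assumptions made here, not part of the CRAAP test itself.

```python
from dataclasses import dataclass
from datetime import date

# The five CRAAP criteria named in the slides.
CRITERIA = ("currency", "relevance", "authority", "accuracy", "purpose")


@dataclass
class SourceReview:
    """A reviewer's notes on one data source (hypothetical structure)."""
    name: str
    last_updated: date
    scores: dict  # criterion -> assumed scale: 0 (poor), 1 (fair), 2 (good)

    def total(self) -> int:
        """Sum the five scores, refusing to rank a partially reviewed source."""
        missing = [c for c in CRITERIA if c not in self.scores]
        if missing:
            raise ValueError(f"unscored criteria: {missing}")
        return sum(self.scores[c] for c in CRITERIA)

    def age_in_days(self, today: date) -> int:
        """Days since the last update, a rough input to the currency score."""
        return (today - self.last_updated).days


# Made-up example values for illustration only.
review = SourceReview(
    name="example dataset",
    last_updated=date(2024, 1, 15),
    scores={"currency": 2, "relevance": 1, "authority": 2, "accuracy": 1, "purpose": 2},
)
print(review.total())                        # 8 out of a possible 10
print(review.age_in_days(date(2024, 9, 9)))  # 238
```

A total like this only helps compare sources reviewed by the same person on the same scale; the answers to the individual questions matter more than the number.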
Data Types and Characteristics
Discrete vs Continuous
• Discrete data: can take only certain values from a finite set. Examples: number of people in a room, number of PCs in a lab. It is not possible to have 2.5 people in a room or 3.5 PCs in a lab.
• Continuous data (or variable): can take any value within an interval. Examples include income, sales, age, etc.
Nominal vs Ordinal
• Nominal (categorical or qualitative) data: can take only a finite set of values, with no meaningful ordering between them. The values provide descriptions or labels. Example: Gender: Male, Female.
• Ordinal data: the order of the values is significant. Example: feedback on a service
1. not happy
2. happy
3. very happy
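The four data types determine which summaries are meaningful. The following sketch uses only the Python standard library; the variable names and all values are made up for illustration, and modelling the feedback scale as an `IntEnum` is one possible choice, not a prescribed one.

```python
from collections import Counter
from enum import IntEnum
from statistics import mean, median_low


class Feedback(IntEnum):
    """Ordinal scale from the slide: the order of the values is significant."""
    NOT_HAPPY = 1
    HAPPY = 2
    VERY_HAPPY = 3


# Made-up sample values for each data type.
people_in_room = [12, 30, 7]                  # discrete: whole counts only
income = [41250.50, 38900.00, 52310.75]       # continuous: any value in an interval
department = ["Sales", "IT", "Sales"]         # nominal: labels with no ordering
feedback = [Feedback.HAPPY, Feedback.NOT_HAPPY, Feedback.VERY_HAPPY]  # ordinal

# Continuous data supports arithmetic summaries such as the mean.
print(mean(income))

# Nominal data supports counting and the mode, but not ordering or averaging.
print(Counter(department).most_common(1))     # [('Sales', 2)]

# Ordinal data supports comparisons and order-based summaries such as the median.
print(max(feedback).name)                     # VERY_HAPPY
print(median_low(feedback).name)              # HAPPY
```

`median_low` is used rather than `median` so that the result is always one of the observed categories, never a value halfway between two of them.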
Course Project
• Group of 2
• Data collected from various sources
• Data normalized to third normal form (3NF) and stored in an SQLite database
• Python for SQL queries
• Data visualization (in Python)
• Data exploration (in Python)
Constraints for this course
1. The primary data source should be https://data.gov.in/
2. You are free to select any secondary data sources
Marking scheme
• Novelty of the problem/data collection
• Data selection and cleaning (CRAAP)
• Legal and/or ethical issues
• Structured and semi-structured data
• Data model and implementation (SQL + Python code)
• Data visualization
• Data exploration
• Report and presentation
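The project asks for data normalized to third normal form, stored in SQLite and queried from Python. The sketch below shows one possible shape of that workflow using the standard `sqlite3` module. The schema (`state`, `district`, `rainfall`), the column names and every data value are hypothetical and do not come from data.gov.in.

```python
import sqlite3

# Made-up flat records: (state, district, year, rainfall_mm).
flat_rows = [
    ("Kerala", "Kollam", 2022, 2450.0),
    ("Kerala", "Kollam", 2023, 2210.5),
    ("Kerala", "Idukki", 2023, 3105.0),
    ("Goa", "North Goa", 2023, 2950.0),
]

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.executescript("""
CREATE TABLE state (
    state_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE district (
    district_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    state_id    INTEGER NOT NULL REFERENCES state(state_id),
    UNIQUE (name, state_id)
);
CREATE TABLE rainfall (
    district_id INTEGER NOT NULL REFERENCES district(district_id),
    year        INTEGER NOT NULL,
    rainfall_mm REAL NOT NULL,
    PRIMARY KEY (district_id, year)
);
""")

# Each state and district name is stored once; measurements refer to them by key.
for state, district, year, mm in flat_rows:
    conn.execute("INSERT OR IGNORE INTO state (name) VALUES (?)", (state,))
    state_id = conn.execute(
        "SELECT state_id FROM state WHERE name = ?", (state,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO district (name, state_id) VALUES (?, ?)",
        (district, state_id),
    )
    district_id = conn.execute(
        "SELECT district_id FROM district WHERE name = ? AND state_id = ?",
        (district, state_id),
    ).fetchone()[0]
    conn.execute("INSERT INTO rainfall VALUES (?, ?, ?)", (district_id, year, mm))
conn.commit()

# Joining the tables back together answers questions about the flat data.
summary = conn.execute("""
    SELECT s.name, COUNT(*) AS n_records, AVG(r.rainfall_mm) AS avg_mm
    FROM rainfall AS r
    JOIN district AS d ON d.district_id = r.district_id
    JOIN state AS s ON s.state_id = d.state_id
    GROUP BY s.name
    ORDER BY s.name
""").fetchall()
print(summary)  # [('Goa', 1, 2950.0), ('Kerala', 3, 2588.5)]
```

In this layout a district's state depends only on the district key, not on the measurement row, which is the kind of transitive dependency that third normal form removes. A real project would replace `":memory:"` with a file path and load the rows from the downloaded files.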
Initial Submission
• One-page submission
• Proposal due: Sep 9, 2024
• Identify a novel problem (based on data available at data.gov.in)
• Identify any secondary sources of data
• What attributes will you collect?
• Data pre-processing, data exploration and data visualization that you plan to perform
• The data should contain at least 1000 records
• Project due: Nov 1, 2024 (no extension)
• Demonstration + short viva (Nov 4-8, 2024)
