PROCESS: Design and Development of Academic Programs
FORMAT: Syllabus for Virtual Undergraduate and Graduate Modules
Code: JD-RG-002
Version: 3
SCHOOL OF BUSINESS AND INTERNATIONAL DEVELOPMENT
I. MODULE IDENTIFICATION
TRAINING CYCLE: Undergraduate
MODULE NAME: Programming for Data Science
MODULE CODE: ES597
NUMBER OF CREDITS: 2
DURATION: 8 weeks
ENROLLMENT REQUIREMENTS:
TYPE: Theoretical-practical
II. MODULE DESCRIPTION
The contents of this module are designed as an applied introduction that includes some statistical concepts. The work will be done in the R language, so you will need to install R and a code editor, both of which are free.
No previous knowledge is required, although the material may feel more familiar if you have experience in computer programming and statistics. All the contents were structured so that you can learn from scratch. Because there is a great deal of material and many resources for learning R, you will often find references to manuals and online resources.
This module will provide you with basic tools that will be very useful when you later delve into applications in your own discipline.
III. GENERAL COMPETENCY OF THE MODULE
Understanding the basic operation of programming languages for data science, focusing on importing, managing, cleaning, and preparing data and on exploratory data analysis, as tools necessary for the technical development of data science projects.
UNIT 1 COMPETENCY
Understanding the types of data involved in data science projects in order to choose the proper programming tool for processing them efficiently and effectively.
SCENARIO 1
Element 1: Understanding the types of data that are processed using programming languages.
Indicator 1: Describes the types of data that are processed with programming languages and explains ways to structure them in each programming language.
Element 2: Understanding the formats for writing, printing, and displaying primary and structured data.
Indicator 2: Defines data writing formats and presents primary visualizations of the raw data.
SCENARIO 2
Element 1: Understanding structured, semi-structured, and unstructured special data types.
Indicator 1: Classifies data as structured, semi-structured, or unstructured and acknowledges the processing possibilities in each programming language.
Element 2: Understanding the possibilities of data processing involving time, finance, and GPS data.
Indicator 2: Defines the possibilities of data processing involving time, finance, and GPS data in each programming language.
“This document is the intellectual property of POLITECNICO GRANCOLOMBIANO; its total or partial reproduction without the written authorization of the Rector's Office is prohibited. ANY DOCUMENT PRINTED OR DOWNLOADED FROM THE SYSTEM IS CONSIDERED AN UNCONTROLLED COPY.”
UNIT 2 COMPETENCY
Understanding the tools used to take raw data, clean it, and transform it for the subsequent modeling that will extract knowledge from it.
SCENARIO 3
Element 1: Understanding the treatment of missing and anomalous data.
Indicator 1: Defines a strategy to tackle data that is missing, abnormal, or erroneous.
Element 2: Understanding the main tools for data transformation, specifically replacement, concatenation, reassignment, and indexing.
Indicator 2: Understands the scope and application of data transformation: replacement, concatenation, reassignment, and indexing.
SCENARIO 4
Element 1: Understanding the use of tools for primary data processing, such as subsets, filters, conditionals, and crossing of variables.
Indicator 1: Uses tools for primary data processing.
Element 2: Understanding the use of programming tools for applying operations iteratively to groups of data.
Indicator 2: Understands the strategies for iterative programming applied to large databases.
UNIT 3 COMPETENCY
Understanding the main strategies and instruments for initial data exploration in order to draw hypotheses, select variables, and plan the design of experiments with data mining tools.
SCENARIO 5
Element 1: Understanding and developing univariate descriptive statistics.
Indicator 1: Produces a univariate statistics report.
Element 2: Understanding and developing multivariate descriptive analysis.
Indicator 2: Runs multivariate analyses using programming code.
SCENARIO 6
Element 1: Understanding dimension reduction tools for visualizing multivariate datasets.
Indicator 1: Recognizes dimension reduction tools and applies them in practical cases.
Element 2: Understanding data simulation tools, sampling, and stochastic processes.
Indicator 2: Understands and applies the tools of simulation, sampling, and stochastic processes.
UNIT 4 COMPETENCY
Understanding and using the different packages for the graphical exploration of data, whether for the descriptive representation of data or for the explanation of results.
SCENARIO 7
Element 1: Understanding the use of the different visualization approaches in R.
Indicator 1: Understands and applies the base R visualization functions and additional packages.
Element 2: Understanding the use of multivariate statistics graphs in R.
Indicator 2: Understands code that generates bivariate descriptive statistics.
SCENARIO 8
Element 1: Analyzing data descriptively with univariate visualization tools.
Indicator 1: Analyzes data descriptively with multivariate tools.
Element 2: Presenting descriptive reports with univariate tools.
Indicator 2: Presents descriptive reports with graphs and univariate tools and interprets them appropriately.
IV. THEMATIC CORES
1. Types of Data and their Representations
2. Data Cleansing and Transformation
3. Exploration of Quantitative Data
4. Graphic Exploration of Data
THEMATIC CORE 1. Types of Data and their Representations
THEMATIC AXES:
Setup
Starting tasks in basic R
Types of data in R
Tibbles
Importing data
Loops and iterations
Conditional statements
Functions
Statistics in geometric spaces
Areal data
Some models applied to data analysis in economics
Decision trees
THEMATIC CORE 2. Data Cleansing and Transformation
THEMATIC AXES:
Missing values
Verification of the type of variable
Approach to outliers and missing data
Grouping
Functions applied to an entire data frame
THEMATIC CORE 3. Exploration of Quantitative Data
THEMATIC AXES:
Univariate descriptive statistics
Measures of location
Measures of dispersion
Some descriptive graphs
Principal Component Analysis (PCA)
t-SNE
Probability distributions in R
THEMATIC CORE 4. Graphic Exploration of Data
THEMATIC AXES:
Loading and listing grouped data
Distribution
Graphs of evolution over time
Some considerations taken from the literature
Better graphics
Software
Good practices
Additional resources
Communicating the results
V. REFERENCE MATERIALS
BIBLIOGRAPHY
Wickham, H., & Grolemund, G. (2016). R for data science: Import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
Laude, H. (2017). Data Scientist y lenguaje R: Guía de autoformación para el uso de Big Data. Eni.
De Jonge, E., & Van Der Loo, M. (2013). An introduction to data cleaning with R. Statistics Netherlands, Heerlen.
Burns, E. (2021). Data Cleaning in R Made Simple. Towards Data Science. https://towardsdatascience.com/data-cleaning-in-r-made-simple-1b77303b0b17
Rincón, L. (2007). Curso elemental de probabilidad y estadística. Universidad UNAM.
Kabacoff, R. (2020). Data visualization with R. Wesleyan University.
Plotly. (n.d.). Plotly R Open Source Graphing Library. Plotly. https://plotly.com/r/
VI. ANNEXES
1. Didactic Development of Modules in Virtual Undergraduate Programs
2. Evaluation of Modules in Virtual Undergraduate Programs
3. Didactic Development of Modules in Virtual Graduate Programs
4. Evaluation of Modules in Virtual Graduate Programs