Data Modelling and Visualization

Chapter 5 : Data Aggregation and Analysis

Syllabus : Data Aggregation and Group Operations : Group by Mechanics, Data Aggregation, General split-apply-combine, Pivot Tables and Cross Tabulation. Time Series Data Analysis : Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies and Shifting, Time Zone Handling, Periods and Period Arithmetic, Resampling and Frequency Conversion, Moving Window Functions.
5.1 Introduction : Data Aggregation

•	Data aggregation is any process in which information is gathered and expressed in a summary form, for purposes such as statistical analysis.
•	A common aggregation purpose is to get more information about particular groups based on specific variables such as age, profession or income.
•	The information about such groups can then be used for Web site personalization to choose content and advertising likely to appeal to an individual belonging to one or more groups for which data has been collected. For example, a site that sells music CDs might advertise certain CDs based on the age of the user and the data aggregate for their age group.
•	Online Analytic Processing (OLAP) is a simple type of data aggregation in which the marketer uses an online reporting mechanism to process the information.
5.2 Data Aggregation and Group Operations

•	Data aggregation and group operations are essential techniques in data analysis and are particularly useful when dealing with large datasets. These methods allow analysts to summarize and gain insights from the data efficiently.
•	Categorizing a dataset and applying a function to each group, whether an aggregation or transformation, is often a critical component of a data analysis workflow. After loading, merging, and preparing a dataset, you may need to compute group statistics or possibly pivot tables for reporting or visualization purposes. pandas provides a flexible groupby interface, enabling you to slice, dice, and summarize datasets in a natural way.
•	One reason for the popularity of relational databases and SQL (which stands for Structured Query Language) is the ease with which data can be joined, filtered, transformed, and aggregated. However, query languages like SQL are somewhat constrained in the kinds of group operations that can be performed. With the expressiveness of Python and pandas, we can perform quite complex group operations by applying any function that accepts a pandas object or NumPy array. In this chapter, you will learn how to :
o	Split a pandas object into pieces using one or more keys (in the form of functions, arrays, or DataFrame column names).
o	Calculate group summary statistics, like count, mean, standard deviation, or a user-defined function that accepts a group.
o	Apply within-group transformations or other manipulations, like normalization, linear regression, or subset selection.
o	Compute pivot tables and cross-tabulations.
o	Perform quantile analysis and other statistical group analyses.
5.2.1 Group by Mechanics

Group by Mechanics in data modeling and visualization is a powerful technique used to group and summarize data based on specific criteria, enabling analysts to extract meaningful insights and make informed decisions. The "Group by" operation involves dividing a dataset into groups based on one or more key variables and then performing calculations or aggregations within each group. This concept is widely used in various data manipulation and analysis tools, including spreadsheet software like Microsoft Excel and data analysis libraries like pandas in Python.
1. Data Organization

Before applying the "Group by" operation, the data needs to be structured in a tabular format, typically represented as a data frame or a table with rows and columns. Each row in the dataset represents a single observation, and each column represents a specific attribute or variable associated with that observation.
2. Identifying Key Variables

To use the "Group by" operation, one or more key variables need to be selected to define the groups. These variables should have categorical or discrete values, such as product categories, geographical regions, or customer IDs. The "Group by" operation will group the data based on the unique values of these key variables.
3. Grouping Process

Once the key variables are chosen, the "Group by" operation will group the data according to the unique values of these variables. All rows with the same value(s) for the key variable(s) will be combined into a separate group. As a result, the dataset is partitioned into multiple subsets, each representing a distinct group.
4. Aggregation and Calculation

•	After the data is grouped, analysts can perform various aggregation functions and calculations on each group independently. Common aggregation functions include sum, mean, median, count, standard deviation, and more. These calculations provide insights into the characteristics and patterns within each group.
•	For example, consider a sales dataset with columns for product categories, sales dates, and sales amounts. By grouping the data based on the product categories, you can obtain separate groups for each category. You can then apply the sum function to calculate the total sales for each product category, providing a concise summary of the sales data for different products.
5. Multi-Level Grouping
Group by Mechanics also supports multi-level grouping, where you can use multiple key variables to
create a hierarchical grouping structure. This allows for deeper analysis by drilling down into subgroups
within larger groups.
For instance, you can group sales data by both product categories and sales regions, providing insights
into how different products perform in various regions.
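The single-key and multi-level grouping described above can be sketched in pandas. This is a minimal, hypothetical illustration: the DataFrame and its column names ('category', 'region', 'amount') are made up for this sketch and are not a dataset used elsewhere in this chapter.

```python
import pandas as pd

# Hypothetical sales data (columns are illustrative only)
sales = pd.DataFrame({
    "category": ["Fruit", "Fruit", "Vegetables", "Vegetables"],
    "region":   ["North", "South", "North", "South"],
    "amount":   [100, 200, 50, 150],
})

# Single-key grouping : total sales per product category
by_category = sales.groupby("category")["amount"].sum()
print(by_category)

# Multi-level grouping : total sales per category within each region
by_category_region = sales.groupby(["category", "region"])["amount"].sum()
print(by_category_region)
```

The multi-key result carries a hierarchical (MultiIndex) row index, one level per key, which is what enables drilling down into subgroups.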
6. Visualization

Once the data is grouped and aggregated, visualizing the results can be highly effective in communicating insights. Bar charts, line charts, pie charts, and other visualizations can help compare and contrast the data across different groups. Visualizations make it easier to identify trends, patterns, and outliers within the dataset.
7. Interpretation and Decision-Making

The insights gained from Group by Mechanics enable data-driven decision-making. Organizations can identify top-performing categories, regions with the highest sales, or trends in customer behavior, which can guide marketing strategies, inventory management, and resource allocation.
5.2.2 Data Aggregation

•	Data Aggregation is any process whereby data is gathered and expressed in a summary form. When data is aggregated, atomic data rows - typically gathered from multiple sources - are replaced with totals or summary statistics. Groups of observed aggregates are replaced with summary statistics based on those observations. Aggregate data is typically found in a data warehouse, as it can provide answers to analytical questions and also dramatically reduce the time to query large sets of data.
•	Data aggregation can enable analysts to access and examine large amounts of data in a reasonable time frame. A row of aggregate data can represent hundreds, thousands or even more atomic data records. When the data is aggregated, it can be queried quickly instead of requiring all of the processing cycles to access each underlying atomic data row and aggregate it in real time when it is queried or accessed.
•	As the amount of data stored by organizations continues to expand, the most important and frequently accessed data can benefit from aggregation, making it feasible to access efficiently.
What does data aggregation do?

Data aggregators summarize data from multiple sources. They provide capabilities for multiple aggregate measurements, such as sum, average and count.
Examples of aggregate data :

•	Voter turnout by state or county. Individual voter records are not presented, just the vote totals by candidate for the specific region.
•	Average age of customer by product. Each individual customer is not identified, but for each product the average age of the customer is saved.
•	Number of customers by country. Instead of examining each customer, a count of the customers in each country is presented.
•	An example of this is creating a summary that shows the aggregate average salary for each department, rather than browsing through individual employee records with salary data.
•	Aggregate data does not need to be numeric. You can, for example, count the number of occurrences of any non-numeric data element.
•	Before aggregating, it is crucial that the atomic data is analyzed for accuracy and that there is enough of it for the aggregation to be useful. For example, counting votes when only 5% of results are reported is not likely to produce a relevant aggregate for prediction.
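The point that aggregates need not be numeric can be sketched in pandas: counting occurrences of a categorical value is itself an aggregation (the 'country' column and its values are invented for this sketch).

```python
import pandas as pd

# A hypothetical customer table with a non-numeric attribute
customers = pd.DataFrame({"country": ["US", "US", "DE", "FR", "DE", "US"]})

# Number of customers by country; no individual customer is exposed,
# only the per-country counts
counts = customers["country"].value_counts()
print(counts)
```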
How do data aggregators work?

Data aggregators work by combining atomic data from multiple sources, processing the data for new insights and presenting the aggregate data in a summary view. Furthermore, data aggregators usually provide the ability to track data lineage and can trace back to the underlying atomic data that was aggregated.
1. Collection : Data aggregation tools may extract data from multiple sources, storing it in large databases as atomic data. The data may be extracted from Internet of Things (IoT) sources, such as the following :
•	social media communications;
•	news headlines;
•	personal data and browsing history from IoT devices; and
•	call centers, podcasts, etc. (through speech recognition).
2. Processing : Once the data is extracted, it is processed. The data aggregator will identify the atomic data that is to be aggregated. The data aggregator may apply predictive analytics, artificial intelligence (AI) or machine learning algorithms to the collected data for new insights. The aggregator then applies the specified statistical functions to aggregate the data.
3. Presentation : Users can present the aggregated data in a summarized format that itself provides new data. The statistical results are comprehensive and high quality.
•	Data aggregation may be performed manually or through the use of data aggregators. However, data aggregation is often performed on a large-scale basis, which makes manual aggregation less feasible. Furthermore, manual aggregation risks accidental omission of crucial data sources and patterns.
5.2.2(A) Uses for Data Aggregation

•	Data aggregation can be helpful for many disciplines, such as finance and business strategy decisions, product planning, product and service pricing, operations optimization and marketing strategy creation. Users may be data analysts, data scientists, data warehouse administrators and subject matter experts.
•	Aggregated data is commonly used for statistical analysis to obtain information about particular groups based on specific demographic or behavioral variables, such as age, profession, education level or income.
•	For business analysis purposes, data can be aggregated into summaries that help leaders make well-informed decisions. User data can be aggregated from multiple sources, such as social media communications, browsing history from IoT devices and other personal data, to give companies critical insights into consumers.
Data aggregation is a crucial technique in data modeling and visualization, used to summarize and condense large datasets into more manageable and insightful representations. This process involves grouping data and applying mathematical or statistical operations to combine data within each group, leading to concise and meaningful results. Data aggregation simplifies complex data, allowing analysts to identify patterns, trends, and essential characteristics, which are vital for data-driven decision-making. Here's a comprehensive explanation of data aggregation :
1. Aggregation Functions

Aggregation functions are mathematical or statistical operations that consolidate data within each group. Common aggregation functions include sum, mean (average), median, count, standard deviation, minimum, and maximum. These functions help create summary statistics, enabling users to understand the distribution and characteristics of the data effectively.
2. Data Organization

Before performing data aggregation, the data should be structured in a tabular format, typically represented as a data frame or a database table. Each row represents an individual observation, while each column contains attributes or variables associated with those observations.
3. Group by Mechanics

Data aggregation often goes hand in hand with the "Group by" operation. By grouping the data based on one or more key variables, the dataset is divided into distinct groups. Aggregation functions are then applied to each group independently, resulting in summary statistics for each group.

For instance, consider a sales dataset with columns for product categories, sales dates, and sales amounts. After grouping the data by product categories, you can apply the sum function to calculate the total sales for each category, providing a concise overview of sales performance across different products.
4. Multi-Level Aggregation

Similar to multi-level grouping, data aggregation also supports multi-level aggregation. In this case, you can apply aggregation functions to multiple key variables, creating a hierarchical summary. This enables deeper insights by examining subgroups within larger groups.

Continuing with the previous example, you could perform multi-level aggregation by both product categories and sales regions. This would provide a comprehensive view of total sales for each product category across different regions.
5. Time-based Aggregation

In time series data, data aggregation is crucial for summarizing and analyzing trends over time. Time-based aggregation involves grouping data into specific time intervals, such as days, weeks, or months, and applying aggregation functions to calculate relevant statistics for each interval.

Time-based aggregation can help identify seasonality, trends, and patterns in time series data. For instance, you can calculate the average daily sales for each month or the total monthly revenue for a specific product.
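Time-based aggregation can be sketched with pandas by grouping on a datetime index with pd.Grouper. The daily series below is synthetic (a constant 'amount' of 1 per day), chosen so the monthly results are easy to verify by hand.

```python
import pandas as pd
import numpy as np

# Synthetic daily sales with a datetime index (values invented for illustration)
idx = pd.date_range("2023-01-01", periods=60, freq="D")
sales = pd.DataFrame({"amount": np.ones(60)}, index=idx)

# Group rows into calendar-month intervals and aggregate each interval:
# total sales per month and average daily sales within each month
monthly = sales.groupby(pd.Grouper(freq="M")).agg(
    total=("amount", "sum"),
    daily_avg=("amount", "mean"),
)
print(monthly)
```

With 60 days starting on 1 January, the groups cover January (31 days), February (28 days), and one day of March, so the totals are 31, 28, and 1.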
6. Visualization

After performing data aggregation, visualizing the results can be highly effective in presenting the summarized information. Bar charts, line charts, area charts, and other visualizations can provide a clear representation of aggregated data, making it easier to compare and analyze trends across different groups.

7. Decision-Making and Analysis

Data aggregation plays a pivotal role in data-driven decision-making. By summarizing large datasets into meaningful insights, analysts can make informed choices, identify growth opportunities, optimize business processes, and address potential issues.
5.2.3 Split-Apply-Combine

In Python we do this using GroupBy, and it involves one or more of the three steps of the Split-Apply-Combine strategy. Let us start by defining each of the three steps.

Fig. 5.2.3 shows the Split-Apply-Combine using an aggregation function.

Fig. 5.2.3 : The Split-Apply-Combine using an aggregation function
1. Split : Split the data into groups based on some criteria, thereby creating a GroupBy object. (We can use a column or a combination of columns to split the data into groups.)
2. Apply : Apply a function to each group independently. (Aggregate, Transform, or Filter the data in this step.)
3. Combine : Combine the results into a data structure (pandas Series, pandas DataFrame).
Import libraries and create a small dataset to work on :

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Create an example dataset in the form of a dictionary having key-value pairs :

# sales_dict holds the data shown in the table below
data_sales = pd.DataFrame(sales_dict)
    Colour   Sales   Transactions  Product
0   Yellow  100000           100   Type A
1   Black   150000           150   Type A
2   Blue     80000           820   Type A
3   Red      90000           920   Type A
4   Yellow  200000           230   Type A
5   Black   145000           120   Type A
6   Blue    120000            70   Type A
7   Red     300000           250   Type A
8   Yellow  250000           250   Type A
9   Black   200000           110   Type B
10  Blue    160000           130   Type B
11  Red      90000           360   Type B
12  Yellow   90100           980   Type B
13  Black   150000           300   Type B
14  Blue    142000           150   Type B
15  Red     130000           170   Type B
16  Blue    400000           230   Type B
17  Red     350000           280   Type B

To summarize the whole data graphically, the seaborn library has been used.

Fig. 5.2.4 : Visual data summary of Sales and Transactions by Product type
+ After creating and summarizing the data, asa first step let's move on to the first part of Split-Apply-Combine.
SPLIT : Create an Object

•	In this step we will create the groups from the dataframe 'data_sales' by grouping on the basis of the column 'colour'.

# Split : GroupBy the column 'colour'
data_gby = data_sales.groupby('colour')
print(type(data_gby))
<class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
•	Once we apply the groupby() function on the dataframe, it creates a GroupBy object as a result. We can think of this object as a separate dataframe for each group. Each group has been created based on the categories in the grouped column (in our case, 4 groups will be created from the column 'colour' : 'Black', 'Blue', 'Red', 'Yellow').
•	A GroupBy object stores the data of the individual groups in the form of key-value pairs, as in a dictionary. To know the group names, we can either use the attribute 'keys' or the attribute 'groups' of the GroupBy object.
# Let's check the names of the groups
print(data_gby.groups)
{'Black': Int64Index([1, 5, 9, 13], dtype='int64'),
 'Blue': Int64Index([2, 6, 10, 14, 16], dtype='int64'),
 'Red': Int64Index([3, 7, 11, 15, 17], dtype='int64'),
 'Yellow': Int64Index([0, 4, 8, 12], dtype='int64')}
For further clarity on groups and their content, we can run a loop and print the key-value pairs.

# 'key' is the name of the group and 'value' is the segmented rows from the original dataframe
for key, value in data_gby:
    print('Group Name : ' + key)
    print(value)
    print()

Group Name : Black
    colour   sales  transactions product
1    Black  150000           150  Type A
5    Black  145000           120  Type A
9    Black  200000           110  Type B
13   Black  150000           300  Type B

(... similar output follows for the 'Blue' and 'Red' groups ...)

Group Name : Yellow
    colour   sales  transactions product
0   Yellow  100000           100  Type A
4   Yellow  200000           230  Type A
8   Yellow  250000           250  Type A
12  Yellow   90100           980  Type B
With the above example, I hope we have developed some clarity on the GroupBy object along with
its attributes and methods. With this, now let's move forward to the next stage, which is APPLY.
APPLY : Apply some function on the Object

•	The Apply step can be performed in three ways : Aggregation, Transformation, and Filtering. We all have some amount of experience in using Aggregation with GroupBy objects, but most of us might not have much experience with Transformation and Filtering. Here, we will discuss all three, with special emphasis on Transformation.
•	I am assuming that we are already comfortable with applying the aggregation functions with GroupBy objects, therefore I will start off with some interesting features of this function.

Aggregation
•	By choosing multiple columns to create the group, we increase the granularity of the aggregation. For instance, while splitting we created 4 groups based on the column 'colour', which has 4 categories of colours, so we had 4 groups. Now, if we include the 'product' column, having 2 categories ('Type A' and 'Type B'), along with the 'colour' column, then we will have 8 categories (e.g. 'Type A-Blue', 'Type A-Black', ...) in total (4 x 2). This would be clearer from the code below.
GroupBy two columns and aggregation

# Groups created by multiple columns
data_prod_colour_index = data_sales.groupby(['product', 'colour'], as_index=True).sum()
data_prod_colour_index

                 sales  transactions
product colour
Type A  Black   295000           270
        Blue    200000           890
        Red     390000          1170
        Yellow  550000           580
Type B  Black   350000           410
        Blue    702000           510
        Red     570000           810
        Yellow   90100           980
•	The above code used the aggregation function sum(), thus we get the sum of sales and transactions at the level of granularity defined by the combination of the 'product' and 'colour' columns.
•	It is to be noted that we have used the parameter as_index=True, therefore we see the 'product' and 'colour' columns as the index. On the contrary, if we take the same parameter as False, then in our output we will not get the 'product' and 'colour' columns as the index but as ordinary columns.
# GroupBy without index as grouped column
data_prod_colour_NoIndex = data_sales.groupby(['product', 'colour'], as_index=False).sum()
data_prod_colour_NoIndex

  product  colour   sales  transactions
0  Type A   Black  295000           270
1  Type A    Blue  200000           890
2  Type A     Red  390000          1170
3  Type A  Yellow  550000           580
4  Type B   Black  350000           410
5  Type B    Blue  702000           510
6  Type B     Red  570000           810
7  Type B  Yellow   90100           980
Custom Aggregation grouped by Multiple Columns

In the previous example we used only a single type of aggregation function for all the columns; however, if we want to aggregate different columns with different aggregation functions, then we can use the custom aggregation functionality of the agg() function. For doing this we can pass a dictionary to the aggregation function, stating the column name as 'key' and the function name as 'value'. Interestingly, we can also pass multiple aggregation functions to a column. Let us see an example code below for more clarity :
# Custom aggregation with GroupBy using a dictionary as a parameter inside the agg() function
data_sales.groupby(['product', 'colour'], as_index=True).agg({'sales': ['sum'], 'transactions': ['median', 'count']})

                 sales transactions
                   sum       median count
product colour
Type A  Black   295000        135.0     2
        Blue    200000        445.0     2
        Red     390000        585.0     2
        Yellow  550000        230.0     3
Type B  Black   350000        205.0     2
        Blue    702000        150.0     3
        Red     570000        280.0     3
        Yellow   90100        980.0     1
5.2.4 Pivot Tables

•	Pandas : Pandas is an open-source library that is built on top of the NumPy library. It is a Python package that offers various data structures and operations for manipulating numerical data and time series. It is mainly popular for making importing and analyzing data much easier. Pandas is fast and it has high performance and productivity for users.
•	Pivot Tables : A pivot table is a table of statistics that summarizes the data of a more extensive table (such as from a database, spreadsheet, or business intelligence program). This summary might include sums, averages, or other statistics, which the pivot table groups together in a meaningful way.
•	Steps Needed
o	Import Library (Pandas)
o	Import / Load / Create data
o	Use pandas.pivot_table() method with different variants
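The steps above can be sketched with pandas.pivot_table. The small orders table is invented for illustration, echoing the fields of the Excel example that follows (Product, Country, Amount); the amounts are made up.

```python
import pandas as pd

# Hypothetical order data (values invented for illustration)
orders = pd.DataFrame({
    "Product": ["Banana", "Banana", "Apple", "Apple", "Beans"],
    "Country": ["France", "Canada", "France", "France", "Canada"],
    "Amount":  [617, 8384, 2417, 1000, 2626],
})

# Rows = Product, values = total Amount (like Excel's "Sum of Amount")
pt = pd.pivot_table(orders, index="Product", values="Amount", aggfunc="sum")
print(pt)

# Two-dimensional pivot : Product rows, Country columns
pt2 = pd.pivot_table(orders, index="Product", columns="Country",
                     values="Amount", aggfunc="sum", fill_value=0)
print(pt2)
```

fill_value=0 replaces the NaN that would otherwise appear for Product/Country combinations with no orders.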
Pivot tables in Excel

•	Pivot tables are one of Excel's most powerful features. A pivot table allows you to extract the significance from a large, detailed data set.
•	Our data set consists of 213 records and 6 fields : Order ID, Product, Category, Amount, Date and Country.
Order ID  Product   Category    Amount  Date       Country
2         Broccoli  Vegetables  $3,239  1/7/2016   United Kingdom
3         Banana    Fruit         $617  1/8/2016   United States
4         Banana    Fruit       $8,384  1/10/2016  Canada
5         Beans     Vegetables  $2,626  1/10/2016  Germany
6         Orange    Fruit       $3,610  1/11/2016  United States
7         Broccoli  Vegetables  $9,062  1/11/2016  Australia
8         Banana    Fruit       $6,906  1/16/2016  New Zealand
9         Apple     Fruit       $2,417  1/16/2016  France
Insert a Pivot Table

To insert a pivot table, execute the following steps.

1. Click any single cell inside the data set.
2. On the Insert tab, in the Tables group, click PivotTable. The following dialog box appears. Excel automatically selects the data for you. The default location for a new pivot table is New Worksheet.
3. Click OK.
Drag fields

The PivotTable Fields pane appears. To get the total amount exported of each product, drag the following fields to the different areas.

1. Product field to the Rows area.
2. Amount field to the Values area.
3. Country field to the Filters area.
Below you can find the pivot table. Bananas are our main export product. That's how easy pivot tables can be!

Sort

To get Banana at the top of the list, sort the pivot table.
1. Click any cell inside the Sum of Amount column.
2. Right click and click on Sort, Sort Largest to Smallest.
Filter

Because we added the Country field to the Filters area, we can filter this pivot table by Country. For example, which products do we export the most to France?

1. Click the filter drop-down and select France.

Result : Apples are our main export product to France.
Change Summary Calculation

By default, Excel summarizes your data by either summing or counting the items. To change the type of calculation that you want to use, execute the following steps.

1. Click any cell inside the Sum of Amount column.
2. Right click and click on Value Field Settings.
3. Choose the type of calculation you want to use to summarize the data from the selected field. For example, click Count.
4. Click OK.

Result : 16 out of the 28 orders to France were 'Apple' orders.
Two-dimensional Pivot Table

If you drag a field to the Rows area and Columns area, you can create a two-dimensional pivot table. First, insert a pivot table. Next, to get the total amount exported to each country, of each product, drag the following fields to the different areas.

1. Country field to the Rows area.
2. Product field to the Columns area.
3. Amount field to the Values area.
4. Category field to the Filters area.
Below you can find the two-dimensional pivot table.

Sum of Amount    Apple  Banana  Beans  Broccoli  Carrots  Mango  Orange
Australia        20634   52721  14433     17953     8106   9186    8680
Canada           24867   33775  12407         …        …      …       …
France           80193   36094    680      5341     9104   7388    2256
Germany           9082   39686  29905     37197    21636   8775    8887
New Zealand      10332   40050   4390         …        …      …       …
United Kingdom   17534   42908   5100     38436    41815   5600   21744
United States    28615   95061  17163     26715    56284  22363   30932
Grand Total     191257  340295  57281    142439   136845  57079  104438
5.2.5 Cross Tabulation

•	Cross tabulation, also known as a contingency table, is another technique used to analyze the relationship between two categorical variables. It presents the frequency distribution of the data for each combination of the variables, providing a clear overview of their associations.
•	Cross tabulation is often visualized as a table, with one variable's categories forming the rows and the other variable's categories forming the columns. The cells of the table display the counts or percentages of observations falling into each category combination.
•	Cross-tabulation analysis has its unique language, using terms such as "banners", "stubs", "Chi-Square Statistic" and "Expected Values". A typical cross-tabulation table might compare the two hypothetical variables "City of Residence" and "Favorite Baseball Team". Are city of residence and being a fan of that team independent? The cells of the table report the frequency counts and percentages for the number of respondents in each cell.
•	You typically use cross tabulation when you have categorical variables or data - e.g. information that can be divided into mutually exclusive groups.
•	For example, a categorical variable could be customer reviews by region. You divide this into reviews per geographical area (North, South, East, West, or state) and then analyze the relationships within that data.
•	Another example of when to use cross-tabulation is with product surveys. You could ask a group of 50 people "Do you like our products?" and use cross-tabulation to get a more insightful answer. Rather than just recording the 50 responses, you can add another independent variable, such as gender, and use cross tabulation to understand how the male and female respondents view your product.
•	With this information, you might see that your female customers prefer your products more than your male customers. You can then use these insights to improve your products for your male customers.
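The survey example above can be sketched with pandas.crosstab. The responses are invented for illustration: one categorical variable for gender and one for the answer to the product question.

```python
import pandas as pd

# Hypothetical survey responses (invented data)
gender = pd.Series(["F", "F", "M", "M", "F", "M"])
likes  = pd.Series(["Yes", "Yes", "No", "Yes", "No", "No"])

# Contingency table of counts for each category combination
table = pd.crosstab(gender, likes)
print(table)

# Row percentages : share of Yes/No within each gender
pct = pd.crosstab(gender, likes, normalize="index")
print(pct)
```

normalize="index" converts each row's counts into proportions, which is the percentage view commonly shown in cross-tabulation reports.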
# Multiply the 3rd period by 4 to get the 12th period (one year)
twelfth_month_period = months_period[2] * 4
print(twelfth_month_period)  # Output: Period('2023', 'A-DEC')
In the context of mathematics and numerical operations, "periods" typically refer to different concepts depending on the specific field of mathematics or context in which they are used. Here are some common interpretations of "periods" and "period arithmetic" :
1. Periods in Trigonometry and Geometry

•	In trigonometry, a "period" refers to the interval over which a trigonometric function, such as sine or cosine, repeats itself. For example, the sine function has a period of 2π radians, which means it repeats its values every 2π radians.
•	In geometry, "periodic tessellation" refers to a repeating pattern of tiles or shapes that covers a plane without any gaps or overlaps. These tessellations often have periodic characteristics.
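The 2π-periodicity of sine stated above can be checked numerically with NumPy:

```python
import numpy as np

# sin(x) repeats every 2π radians : sin(x) == sin(x + 2π) for all x
x = np.linspace(0.0, 10.0, 101)
print(np.allclose(np.sin(x), np.sin(x + 2 * np.pi)))  # True
```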
2. Periods in Time and Frequency

In the context of time and frequency analysis, a "period" usually refers to the time it takes for a periodic event or wave to complete one full cycle. For example, the period of a simple harmonic oscillator such as a pendulum is the time it takes to swing back and forth once.
3. Periods in Financial Mathematics

In finance, "periods" often refer to discrete time intervals, such as months, quarters, or years. Interest calculations often involve compound interest, where the number of periods plays a crucial role in determining the final amount.
Now, let's briefly discuss "period arithmetic", which might refer to mathematical operations or calculations involving periods :
4. Periodic Functions Arithmetic

When working with periodic functions like sine and cosine, you can perform arithmetic operations involving these functions. For example, you can add, subtract, multiply, or divide periodic functions while considering their periods.
5. Time-Series Arithmetic

In time-series analysis, arithmetic operations can be performed on data points collected at regular time intervals (periods). These operations might include calculating averages, growth rates, or changes in values over specific periods of time.
6. Financial Periods Arithmetic :
*	In finance, calculations often involve periods, such as compounding interest over a specific number of periods, calculating the net present value of cash flows occurring at different periods, or determining the future value of investments over multiple periods.
*	To perform arithmetic operations with periods, it is important to understand the specific context and units involved, as different fields and situations may have their own conventions and formulas for working with periods.
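In pandas, this notion of periods and period arithmetic is exposed through the `pd.Period` type: adding an integer shifts a period by that many intervals, and subtracting two periods of the same frequency gives the number of intervals between them. A minimal sketch:

```python
import pandas as pd

# A pandas Period represents a span of time at a fixed frequency.
p = pd.Period('2023-01', freq='M')   # the month of January 2023

# Adding or subtracting an integer shifts the period by that many intervals.
print(p + 3)            # 2023-04
print(p - 1)            # 2022-12

# Subtracting two periods of the same frequency gives the offset
# between them (5 months in this case).
q = pd.Period('2023-06', freq='M')
print((q - p).n)        # 5
```

The same arithmetic works for any supported frequency ('D', 'Q', 'A', and so on), which is what makes periods convenient for financial calculations over quarters or years.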
5.3.8 Resampling and Frequency Conversion
Resampling and frequency conversion are important techniques used in time series data analysis to change the time intervals at which data is recorded or observed. These operations are useful for adjusting the granularity of time series data, aggregating data at different time intervals, and preparing data for specific time series analyses. Let's explore resampling and frequency conversion in more detail :
1. Resampling : Resampling involves changing the frequency of a time series by aggregating or interpolating data to different time intervals. Resampling is often necessary when the original data is collected at a different frequency than the desired analysis frequency. There are two primary types of resampling :
	a) Downsampling : Downsampling involves reducing the frequency of the data, converting it to a lower time resolution. This is typically done to summarize data over longer time periods. For example, converting daily data to weekly or monthly data involves downsampling.
	In downsampling, data is aggregated or combined within each new time interval. Common aggregation functions include sum, mean, median, or other statistical measures.
	b) Upsampling : Upsampling involves increasing the frequency of the data, converting it to a higher time resolution. This is often done to add more data points within a given time period. For example, converting monthly data to daily data involves upsampling.
	In upsampling, new data points are interpolated or filled in between existing data points, and interpolation methods may include linear interpolation, polynomial interpolation, or other techniques depending on the nature of the data.
2. Frequency Conversion :
*	Frequency conversion refers to changing the time intervals of a time series to a new frequency, which can be either higher or lower than the original frequency. Frequency conversion is a combination of both downsampling and upsampling operations.
*	For example, converting daily data to hourly data involves frequency conversion by upsampling, where new data points are inserted between the existing daily data points.
*	Frequency conversion is useful when analyzing data at a different time granularity, or when aligning multiple time series with different frequencies for comparison.
3. Resampling and Frequency Conversion Example :
Let's illustrate resampling and frequency conversion using Python with the pandas library :

import pandas as pd

# Create a daily time series with sample data
dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
daily_data = pd.Series(range(10), index=dates)

# Downsample to weekly data by taking the sum of each week
weekly_data = daily_data.resample('W').sum()
print(weekly_data)

# Upsample to hourly data using linear interpolation
hourly_data = daily_data.resample('H').interpolate(method='linear')
print(hourly_data)
In this example, we first create a daily time series with sample data. We then downsample the data to weekly frequency by taking the sum of data points within each week. Next, we upsample the data to hourly frequency using linear interpolation to fill in new data points between the existing daily data points.
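Frequency conversion without aggregation can also be sketched with pandas' `asfreq` method, which simply realigns the index to the new frequency. This is a minimal sketch with made-up values; note that `asfreq` leaves the newly inserted timestamps as NaN unless a fill method is given:

```python
import pandas as pd

# Three days of daily observations
dates = pd.date_range(start='2023-01-01', periods=3, freq='D')
daily = pd.Series([1.0, 2.0, 3.0], index=dates)

# Convert to 12-hourly frequency: in-between timestamps appear as NaN
half_daily = daily.asfreq('12H')
print(half_daily.isna().sum())   # 2 newly inserted points are NaN

# Forward-fill the gaps instead of leaving NaN
filled = daily.asfreq('12H', method='ffill')
print(filled)
```

The difference between `asfreq` and `resample` is that `asfreq` only selects or inserts index points, while `resample` groups the data so an aggregation or interpolation can be applied.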
5.3.9 Moving Window Functions
Moving window functions, also known as rolling or sliding window functions, are essential tools in time series data analysis. They involve applying a specific function or operation to a fixed-size window of data that moves along the time axis. These functions are used to calculate rolling statistics, smooth data, identify trends, and perform other time-based computations. Moving window functions are particularly valuable when dealing with noisy data. Let's explore moving window functions in more detail :
Basic Concept : The basic concept of moving window functions involves defining a fixed-size window that spans a specific number of consecutive data points in the time series. The window moves one step at a time along the time axis, continuously updating the window's data points and recalculating the function's output.
Applications : Moving window functions have various applications in time series data analysis, including :
	a) Rolling Statistics : Calculating rolling statistics such as rolling mean, rolling median, rolling standard deviation, or other aggregated measures within the moving window. These statistics help smooth the data and identify underlying patterns and trends.
	b) Moving Averages : Computing moving averages by taking the mean of the data within the moving window. Moving averages are commonly used to reveal underlying trends or eliminate noise in the data.
	c) Exponential Moving Averages (EMA) : Similar to moving averages, EMA assigns exponentially decreasing weights to data points within the window, giving more weight to recent observations. EMA is widely used in financial analysis and trend detection.
	d) Rolling Sum : Calculating the sum of data points within the window to analyze cumulative values or trends.
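The application types above can be sketched with pandas; the series values below are made up for illustration:

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=6, freq='D')
data = pd.Series([10, 12, 11, 15, 14, 16], index=dates)

# a) Rolling statistics: mean and standard deviation over a 3-point window
rolling_mean = data.rolling(window=3).mean()
rolling_std = data.rolling(window=3).std()

# c) Exponential moving average: recent points weighted more heavily
ema = data.ewm(span=3).mean()

# d) Rolling sum over the same window
rolling_sum = data.rolling(window=3).sum()

print(rolling_mean.iloc[2])   # mean of 10, 12, 11 -> 11.0
print(rolling_sum.iloc[-1])   # 15 + 14 + 16 -> 45
```

Note that `ewm` does not use a hard window at all: every past observation contributes, but with exponentially decaying weight controlled by `span`.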
Window Size : The size of the moving window is an essential parameter in moving window functions. The window size determines the number of data points included in the computation at any given time. A larger window size results in a smoother output but may lead to slower responsiveness to changes in the data. On the other hand, a smaller window size provides a more responsive output but may be more sensitive to noise.
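This trade-off can be sketched by applying two different window sizes to the same noisy series (the values below are illustrative) and comparing how much the smoothed outputs still fluctuate:

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=8, freq='D')
data = pd.Series([10, 12, 30, 11, 13, 29, 12, 14], index=dates)

# A small window still tracks the spikes; a larger window averages them out
smooth_small = data.rolling(window=2).mean()
smooth_large = data.rolling(window=4).mean()

# The larger window's output varies less from point to point
print(smooth_small.std() > smooth_large.std())   # True
```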
Handling Boundary Effects : When applying moving window functions, boundary effects must be considered. At the beginning and end of the time series, there may not be enough data points to form a complete window. Different strategies can be used to handle this, such as padding missing values or using weighted windows that give more weight to available data points.
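In pandas, one common strategy at the start of the series is the `min_periods` parameter of `rolling`, which lets the window produce output before it is completely filled. A minimal sketch:

```python
import pandas as pd

data = pd.Series([10, 20, 30, 40])

# Default: the first two outputs are NaN because the window is incomplete
strict = data.rolling(window=3).mean()

# min_periods=1: partial windows at the boundary still yield a value
lenient = data.rolling(window=3, min_periods=1).mean()

print(strict.isna().sum())    # 2 incomplete positions at the start
print(lenient.iloc[0])        # mean of the single available point: 10.0
```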
5.3.9(A) Moving Window Functions Example
*	Time series data is a series of data points recorded with a time component (temporal) present. Most of the time these data points are recorded at a fixed time interval.
*	Many real-world datasets like stock market data, weather data, geography datasets, earthquake data, etc. are time series datasets.
*	While working with time series datasets, we need to perform various operations on them to analyze them from different perspectives. The two most common operations are resampling and moving window functions.
*	Time series resampling is the process of changing the frequency at which data points (observations) are recorded. Resampling is generally performed to analyze how time series data behaves under different frequencies.
*	Moving window functions are aggregate functions applied to time series datasets by moving a window of fixed or variable size through them. Moving window functions can be used to smooth time series to handle noise.
Let's illustrate moving window functions using Python with the pandas library :

import pandas as pd

# Create a time series with sample data
dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
data = pd.Series([10, 15, 20, 25, 30, 35, 40, 45, 50, 55], index=dates)

# Calculate the rolling mean with a window size of 3
rolling_mean = data.rolling(window=3).mean()
print(rolling_mean)
In this example, we create a time series with sample data and calculate the rolling mean using a window size of 3. The rolling mean smooths the data by taking the average of every three consecutive data points.
What is meant by group by mechanics?
Explain data aggregation.
Discuss various uses of data aggregation.
Explain the three steps of the split-apply-combine strategy.
Explain various date and time tools.
What is time series data?
Discuss date ranges.