Advanced Database Systems
Donghyun Jeong
I'll briefly introduce indexing of 1-D time series data. I could not obtain real, current data such as stock prices, so I generated artificial data, processed it, and used it to show how the indexing works. I'll walk through every step of the procedure I designed and implemented. (Further details can be found in the book ADVANCED DATABASE SYSTEMS.)
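The slides do not say how the artificial data was generated. Here is a minimal sketch of one way to produce a reproducible data set. It assumes each series is a random walk over 12 months, matching columns M1 to M12 in the table schema. The starting range, the step size, and the `COMPANY_###` names are all hypothetical choices.

```python
import random
from typing import Dict, List


def generate_series(n_series: int, n_months: int = 12, seed: int = 0) -> Dict[str, List[float]]:
    """Generate artificial monthly 'stock' series as random walks.

    Assumptions (not from the slides): each series starts uniformly in
    [20, 70] and moves by a Gaussian step (sigma = 5) each month, clipped
    at 0 so values stay non-negative. Company names are placeholders.
    """
    rng = random.Random(seed)  # fixed seed -> the same data set every run
    data: Dict[str, List[float]] = {}
    for i in range(n_series):
        value = rng.uniform(20.0, 70.0)
        series: List[float] = []
        for _ in range(n_months):
            series.append(round(value, 4))
            value = max(0.0, value + rng.gauss(0.0, 5.0))
        data[f"COMPANY_{i:03d}"] = series  # hypothetical naming scheme
    return data
```

For example, `generate_series(99)` returns 99 named series of 12 monthly values each. These series can then be fed into the DFT preprocessing step.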
Stock Data (example)
[Figure: example stock series; x-axis: Month (M1 to M12), y-axis: Stock; labeled series include ABC NEWS, MBS TV, ABC and Hallym]
Generate Data
Preprocessing
Searching
Refining Process
Preprocessing (I)
Using the DFT, we transform each 1-D time series into a point in a low-dimensional feature space (the frequency domain). After the transform, we keep only the first three coefficients (f = 1 to 3). Suppose the candidate set includes data we did not want (a false alarm). In that case, the refining process can remove it. Now suppose data we want is excluded from the candidate set (a false dismissal). In that case, it can never be found. The transform avoids this second case because of a distance property. The Euclidean distance between two original series is always greater than or equal to the Euclidean distance between their truncated DFT features. By Parseval's theorem, the full orthonormal DFT preserves Euclidean distance, and dropping coefficients can only shrink it. This guarantees no false dismissals. [2][3]
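The no-false-dismissal argument can be written out as follows. It assumes the orthonormal DFT, scaled by 1/\sqrt{n}. With the unscaled DFT, the feature distance must first be divided by \sqrt{n}. Here x and y are two series of length n, and the features are the coefficients f = 1 to 3.

```latex
\begin{aligned}
X_f &= \frac{1}{\sqrt{n}} \sum_{t=0}^{n-1} x_t \, e^{-j 2\pi f t / n}, \qquad f = 0, \dots, n-1 \\
D(x, y)^2 &= \sum_{t=0}^{n-1} |x_t - y_t|^2 = \sum_{f=0}^{n-1} |X_f - Y_f|^2 \qquad \text{(Parseval)} \\
D_{\mathrm{feat}}(x, y)^2 &= \sum_{f=1}^{3} |X_f - Y_f|^2 \;\le\; D(x, y)^2 \\
D(x, y) \le \varepsilon \;&\Rightarrow\; D_{\mathrm{feat}}(x, y) \le \varepsilon
\end{aligned}
```

The last line is the key property. Every series within ε of the query in the original space is also within ε in feature space, so a feature-space range query cannot miss it.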
DFT Pseudo Code
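for k = 0 to sizeofdata - 1
    real_tmp = 0    // real part
    imag_tmp = 0    // imaginary part
    for t = 0 to sizeofdata - 1
        angle = 2 * PI * k * t / sizeofdata
        real_tmp = real_tmp + data[t] * cos(angle)
        imag_tmp = imag_tmp - data[t] * sin(angle)
    X[k] = (real_tmp + j * imag_tmp) / sqrt(sizeofdata)    // orthonormal scaling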
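The DFT pseudocode above translates into the runnable Python below, which follows the same double loop. Two helpers are added:

- one keeps only coefficients f = 1 to 3;
- one computes Euclidean distance.

The 1/sqrt(n) orthonormal scaling is an assumption; it is the scaling under which Parseval's theorem makes distances match exactly. The names `dft`, `dft_features`, and `euclid` are illustrative.

```python
import math
from typing import List, Sequence


def dft(x: Sequence[float]) -> List[complex]:
    """Direct O(n^2) DFT with orthonormal 1/sqrt(n) scaling (assumed)."""
    n = len(x)
    out: List[complex] = []
    for k in range(n):
        real_tmp = 0.0  # real part
        imag_tmp = 0.0  # imaginary part
        for t in range(n):
            angle = 2.0 * math.pi * k * t / n
            real_tmp += x[t] * math.cos(angle)
            imag_tmp -= x[t] * math.sin(angle)
        out.append(complex(real_tmp, imag_tmp) / math.sqrt(n))
    return out


def dft_features(x: Sequence[float], coeffs=(1, 2, 3)) -> List[complex]:
    """Keep only the chosen coefficients (f = 1 to 3, as on the slides)."""
    spectrum = dft(x)
    return [spectrum[f] for f in coeffs]


def euclid(a, b) -> float:
    """Euclidean distance; works for real or complex sequences."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))
```

The timeseries table stores one float per DFT column, but the slides do not say whether that float is a magnitude or some other part of the coefficient. If magnitudes |X_f| are stored, the feature distance still lower-bounds the true distance, because ||a| - |b|| <= |a - b| for complex numbers.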
Preprocessing (II)
Clustering
I use an iterative method to cluster the data; specifically, I used the K-means algorithm. K-means partitions the data into K clusters, each represented by a cluster center, by iteratively minimizing the squared Euclidean distance from each point to its nearest center. [4]
GROUP LIST (Clustering): cluster number { member IDs }
0 { 0 }
1 { 28 44 87 }
2 { 19 30 67 84 }
3 { 3 32 70 }
4 { 4 39 81 91 95 }
5 { 5 37 69 83 86 }
6 { 6 46 48 62 80 93 }
7 { 7 41 71 74 }
8 { 8 36 40 63 }
9 { 9 60 61 78 88 89 90 97 }
10 { 10 31 54 }
11 { 11 72 73 92 }
12 { 12 96 }
13 { 13 42 49 59 85 }
14 { 14 35 47 77 }
15 { 15 26 34 43 55 57 58 65 82 }
16 { 16 50 64 75 }
17 { 17 68 94 98 }
18 { 18 38 51 56 }
19 { 25 53 66 }
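Here is a sketch of Lloyd's iterative K-means, the standard iterative form of the algorithm named above. It runs on 3-D DFT feature tuples. The slides do not say how the initial centers were chosen. Seeding with the first k points is an assumption made here so that results are deterministic. The function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance, the quantity K-means minimizes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's K-means. Returns (centers, labels); labels[i] is the cluster of points[i].

    Assumption: the first k points seed the centers (not stated on the slides).
    """
    centers: List[Point] = [tuple(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                dim = len(members[0])
                centers[c] = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
    return centers, labels
```

To match the group list above, which has 20 clusters (0 to 19), one would call `kmeans(features, 20)`. Here `features` holds one (DFT1, DFT2, DFT3) tuple per series. The resulting `labels` correspond to the GRP column, and `centers` to C1 to C3 in the cluster table.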
Preprocessing (II)
Clustering
[Figure: 3-D Space (without Z value): clustered data plotted with DFT1 on the X axis and DFT2 on the Y axis; legend: DFT2, DFT3]
Searching (I)
Search Data
Searching (II)
Search Data
Similarity Search Result
Refining Result
As shown above, the similarity search returns a set of candidate series, and the refining process narrows it down to the final result. For the refining process, we set the tolerance ε below 1.0 (ε < 1.0); otherwise, the exact series we are searching for cannot be picked out from the candidates.
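The two-stage search can be sketched as an ε range query. The filter stage runs in feature space, and the refining stage recomputes distances on the full series. Using the same threshold ε in both stages is an assumption; it follows the standard lower-bounding scheme. The slides do not say whether the clusters are also used to prune the search. The function name `range_search` is illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two real sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def range_search(query: Sequence[float],
                 query_feat: Sequence[float],
                 series: Dict[str, Sequence[float]],
                 features: Dict[str, Sequence[float]],
                 eps: float) -> Tuple[List[str], List[str]]:
    """Return (similarity_search_result, refining_result).

    Stage 1 keeps every series whose feature distance is <= eps. Because the
    feature distance lower-bounds the true distance, no qualifying series is
    dropped here (no false dismissals). Stage 2 recomputes the true distance
    on the full series and discards false alarms.
    """
    candidates = [name for name, f in features.items() if euclid(query_feat, f) <= eps]
    refined = [name for name in candidates if euclid(query, series[name]) <= eps]
    return candidates, refined
```

Here is a toy usage example. It uses the first month alone as the feature; projecting onto a subset of coordinates is also a valid lower bound. Querying `[1.0, 2.0, 3.0]` with `eps = 0.1` against A = `[1, 2, 3]`, B = `[1, 2, 3.5]`, and C = `[10, 10, 10]` gives candidates A and B, and refining keeps only A.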
Search
Found Similarity Search Results
Company Name: GVE
Company Name: Medal co.
Company Name: YAYAWA
Refining Result
Company Name: Medal co.
Implementation (I)
User Interface
Database
Implementation (II)
Diagram (table description)
[Diagram: the Cluster values table linked to the Data & DFT values table]
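The schema consists of two tables. The timeseries table holds each series' name, its monthly values (M1 to M12), and its spatial (DFT) features (DFT1 to DFT3). The cluster table holds the cluster centers (C1 to C3). Each series is linked to its cluster by GRP (the cluster group number). GRP is a foreign key that references cluster.ID.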
-- Series table: monthly values M1-M12, DFT features DFT1-DFT3,
-- and GRP, the cluster group number (foreign key to cluster.ID).
CREATE TABLE [dbo].[timeseries] (
    [ID] [float] NULL,
    [M1] [float] NULL, [M2] [float] NULL, [M3] [float] NULL, [M4] [float] NULL,
    [M5] [float] NULL, [M6] [float] NULL, [M7] [float] NULL, [M8] [float] NULL,
    [M9] [float] NULL, [M10] [float] NULL, [M11] [float] NULL, [M12] [float] NULL,
    [DFT1] [float] NULL, [DFT2] [float] NULL, [DFT3] [float] NULL,
    [GRP] [int] NULL,
    [NAME] [nvarchar](255) NULL
) ON [PRIMARY]
GO
-- Cluster table: one row per cluster, with center coordinates C1-C3.
CREATE TABLE [dbo].[cluster] (
    [ID] [int] NOT NULL,
    [C1] [float] NULL, [C2] [float] NULL, [C3] [float] NULL
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[cluster] WITH NOCHECK ADD
    CONSTRAINT [PK_cluster] PRIMARY KEY NONCLUSTERED ([ID]) ON [PRIMARY]
GO
-- Link each series to its cluster through GRP.
ALTER TABLE [dbo].[timeseries] ADD
    CONSTRAINT [FK_timeseries_cluster] FOREIGN KEY ([GRP])
    REFERENCES [dbo].[cluster] ([ID])
GO
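The schema can be tried without SQL Server using the sketch below, which ports it to SQLite through Python's standard `sqlite3` module. Two assumptions apply:

- column types map to REAL, INTEGER, and TEXT;
- SQL Server options such as `ON [PRIMARY]` and `NONCLUSTERED` are dropped.

The sketch also shows one hypothetical use of the GRP foreign key. It finds the cluster center nearest to a query's DFT features and then lists that cluster's members. The slides do not say the search works this way. The names `create_schema` and `members_of_nearest_cluster` are illustrative.

```python
import sqlite3
from typing import List, Sequence

MONTHS = [f"M{i}" for i in range(1, 13)]


def create_schema(conn: sqlite3.Connection) -> None:
    """Create SQLite equivalents of the cluster and timeseries tables."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    month_cols = ", ".join(f"{m} REAL" for m in MONTHS)
    conn.executescript(f"""
        CREATE TABLE cluster (ID INTEGER PRIMARY KEY, C1 REAL, C2 REAL, C3 REAL);
        CREATE TABLE timeseries (
            ID REAL, {month_cols},
            DFT1 REAL, DFT2 REAL, DFT3 REAL,
            GRP INTEGER REFERENCES cluster(ID),
            NAME TEXT
        );
    """)


def members_of_nearest_cluster(conn: sqlite3.Connection, feat: Sequence[float]) -> List[str]:
    """Pick the center nearest to feat (squared Euclidean distance computed
    in SQL), then return its members' names through the GRP foreign key."""
    f1, f2, f3 = feat
    row = conn.execute(
        "SELECT ID FROM cluster "
        "ORDER BY (C1 - ?) * (C1 - ?) + (C2 - ?) * (C2 - ?) + (C3 - ?) * (C3 - ?) LIMIT 1",
        (f1, f1, f2, f2, f3, f3),
    ).fetchone()
    if row is None:
        return []  # no clusters stored yet
    rows = conn.execute(
        "SELECT t.NAME FROM timeseries AS t JOIN cluster AS c ON t.GRP = c.ID "
        "WHERE c.ID = ? ORDER BY t.ID",
        (row[0],),
    ).fetchall()
    return [name for (name,) in rows]
```

Because the foreign key is enforced, inserting a series whose GRP has no matching cluster row raises `sqlite3.IntegrityError`.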
Demo
Result
Reference