Jkuilui Using Learning by Imitation: Dapeng Zhang, Zhongjie Cai, Bernhard Nebel
Jkuilui Using Learning by Imitation: Dapeng Zhang, Zhongjie Cai, Bernhard Nebel
Jkuilui Using Learning by Imitation: Dapeng Zhang, Zhongjie Cai, Bernhard Nebel
1 2
official cite: http://kde.org http://www.informatik.uni-freiburg.de/∼kiro/KBlocksDummyPlayer.tgz
The middle column with the dotted arrow lines shows the to develop the filters for reducing the amount of data in the
sequence of the computation in the games. With the current learning.
DataPreprocessor A filter consists of a set of patterns. Figure 3 shows the
Filtered column
Counting Filters
Data
x
Hand x x
Pattern Activated
Info.Gain Coded x x
Calculator Patterns
Features x x c c c
flat x x x x x x x c x
Support hole x x x x x x x x
SVM Class
Vector x x x x x x x x x
Learner Labels
Machine x x x x x x x x x
x x x x x x x x x
stop
buriedwell
Training the SVMs is time consuming. There are 7 different The patterns are useful not only in the filters but also for
modeling the skills of the imitated players. For instance, a
pieces in Tetris: L, J, O, I, T, Z, and S. To place one of L, J, or
pattern was activated 1000 times over the training set, among
T, there will be 34 candidates by combining all the possible
which 900 were activated by the positive cases. This pattern
rotations and positions; O has 9 combinations; I, Z, or S have
cannot be used in a filter because there are mixed negative and
17. The candidate chosen by the imitated player is regarded as
positive cases. However, activating it apparently indicates that
the positive case. The others are the negative cases. If the size
the placement tends to be positive because of the positive to
of the training set is 10000, there are about 220000 tuples
negative rate in the training data. Therefore, the patterns are also
(cases) in the set. If each tuple is a vector of 39 values, training
used in this section to compute the inputs of the SVMs.
a SVM from these data would take more than a week using a
However, the patterns can only get the “local” information.
2.3GHz PC.
They are checked within the small field around the placement.
In order to reduce the data set, the types of pieces are used in
From another aspect, it is important to consider some “global”
the data preprocessor to separate the data into 7 subsets. Each
parameters in Tetris. For example, a candidate placement can
subset is used to train its own filter, patterns, and SVM. In other
clear 4 rows. This would be important for the game. The
words, seven SVMs work together in the artificial player.
patterns, however, cannot express this occurrence.
When placing the current piece, human players can first
We designed hand-coded features to acquire “global”
reduce the candidates to a limited number by observing the
information. If the patterns can define the tactics of the games,
surface of the accumulated blocks. Then, they choose one from
the features can be used to describe the strategies. In order to
the filtered candidates as their final decision. This idea was used
define these features, we use Figure 4 to illustrate some phrases:
hole, flat, column, and well. A well or a hole is buried if it is no The values of the inputs should be within the same range in
deeper than three cells from the surface. the SVMs. The patterns always have a value of 0 or 1, which
The features are listed in Table I. Items 2 and 3 are for the denotes whether or not it is activated by the current placement.
column. Items 4 − 6 are about the flat. 9 − 11 are for the hole. The value of features, however, can be much bigger. For
14 − 18 are about the well. Our features are compared example, the maximum length of the flat can be up to 9 in a
TABLE I standard Tetris game. In order to avoid this situation, the values
List of Hand-Coded Features of the features were mapped to 0, 0.5, or 1 in our
Imitating Human
1* How many attacks are possible after the current 0.8 Imitating AI
0.7
placement. 0.6
0.5
2 The number of the columns. 0 20 40 60 80 100
5
5 The decreased number of the flat. 0 20 40 60 80 100