Chat Bot Making Process


SEOUL | Oct. 7, 2016

CHAT BOT MAKING PROCESS


USING
PYTHON 3
& TENSORFLOW
Jeongkyu Shin
Lablup Inc.

Illustration * Idol M@ster / Bandai Namco Games. All rights reserved.


IM / Jeongkyu Shin / @inureyes
Humble businessman
Lablup Inc. : Coding education platform / technology

Open-source devotee
Textcube maintainer / KOSS Lab.

Play with some (open||hidden) projects / companies

Physicist / Neuroscientist
Adjunct Professor / Hanyang Univ. (Computer Science)

Studied information processing procedure in Brain / social systems

Ph.D. in Statistical Physics (complex systems)

Major in Physics / Computer science

2
HAVE YOU ALREADY LISTENED TO MY TALK?
Then let's eat again!

3
> RUNME -LOOP=4
Became the first man to get 2 official presenter
shirts at PyCon APAC 2016!
8.13.2016 (in Korean)

8.14.2016 (in English)

And at Google Developer Community


8.31.2016

And now.

Are you ready? (I'm not ready)*

4
*Parody of something. Never mind.
WELCOME TO MY GARAGE!
Tons of garbage here!

5
First with the head, then with the heart.

Bryce Courtenay's The Power of One


TODAY'S ENTRÉE: CHAT BOT
Python 3
Twitter Korean Analyzer / Komoran with KoNLPy / pandas

TensorFlow
0.8 -> 0.9 -> 0.10RC0

And special sauce!


Special data with unique order

Special Python program to organize / use the data!

7
Clipart* (c) thetomatos.com
INGREDIENTS FOR TODAY'S RECIPE
Data
Test: FAS dataset (26GB)

Today: Idolm@ster series, etc.

Tools
TensorFlow + Python 3

Today's insight
Multi-modal Learning models and model chaining

8
I'm not sure, but I'll try to explain
the whole process I did
(in 30 minutes?)

Game screenshot* (c) CAVE
9
Forkcrane* (c) Iconix
And I assume that
you already have
experience /
knowledge about
machine learning
and TensorFlow

11
Illustration *(c) marioandluigi97.deviantart.com
THINGS THAT WILL NOT BE COVERED TODAY
Phase space / embedding dimension
Vector representation of language sentence
Recurrent Neural Network (RNN)
Sequence-to-sequence model
GRU cell / LSTM cell
Word2Vec / Senti-Word-Net
Multi-layer stacking
Batch process for training

12
Clip * Idol M@ster the animation / Bandai Namco Games All rights reserved.
NEED TO LEARN?
codeonweb.com
https://www.codeonweb.com/course/@deep-learning-with-tensorflow-tutorials

13
ONE DAY IN SEOUL ITAEWON, 2013
It all started with dinner talks among neuroscientists...

17
WHAT IS CHAT BOT?
Chatting bots
One of the oldest Human-Computer Interface (HCI) based machines

Challenging lexical topics

Interface: Text / Speech (vocal) / Brain-Computer Interface (BCI)


Commercial UI: Messengers!

18
BASIC CHAT BOT COMPONENTS

[Pipeline] Lexical Input → Natural Language Processor → Context Analyzer → Decision Maker → Response Generator → Lexical Output

20
TRADITIONAL CHAT BOTS

[Pipeline] Lexical Input → Natural Language Processor → Context Analyzer → Decision Maker → Response Generator → Lexical Output

Natural Language Processor: morphemic analyzer / taxonomy analyzer
Context Analyzer / Decision Maker: search engine + knowledge base
Response Generator: templates
21
CHAT-BOTS WITH MACHINE LEARNING

[Pipeline] Lexical Input → Natural Language Processor + Sentence-to-vector converter → Context Analyzer → Decision Maker → Response Generator → Lexical Output

Natural Language Processor: SyntaxNet / NLU (Natural Language Understanding)
Context Analyzer + Decision Maker: deep-learning model, knowledge base (useful with TF/IDF ask bots), per-user context memory
Response Generator: deep-learning model (RNN / sentence-to-sentence)
22
PROBLEMS
Hooray! Deep-learning based chat bots work well with Q&A scenarios!

General problems

Inhuman: restricted to the model's training set

Cannot "start" conversation

Cannot handle continuous conversational context and its changes

Korean-specific problems

Dynamic type-changes

Postpositions / conjunction (Josa hell)


23
The great wall of Korean ML+NLP = Josa hell
(Like ActiveX+N*+F* in the Korean Web)

We expect these, but...

Clip art * Lego


We got these.

...How can I assemble them?

Photo * amazon.com
BACK TO THE ORIGIN
What I learned for 9 years

27
BRAIN AS A MULTI-MODAL CONTEXT MACHINE
Selection
Functionally orthogonal connection types should have
complementary indicators for smaller dimensionality /
better representation

Mixture
Final axes are weighted according to the context
density of mixtures

Weight function
Maximize the state difference in context space

Space transformation: likelihood estimation

28
One liner:
divide and conquer
INFORMATION PATHWAY DURING
CONVERSATION
During conversation:
1. Preprocessing
2. Send information
3. Context recognition
4. Spread / gather processes to determine the answer
5. Send conceptual response to the parietal lobe
6. Postprocessing to generate the sentence

30
Clipart* (c) cliparts.co
ARCHITECTURING
Separate the dots

Simplify the information fed to the context analyzer

Generate complex responses using diverse models

Sentence generator

Grammar generator model

Turns a simple word sequence into a complete sentence

Tone generator model

Rewrites the generated sentence with a specific tone


31
IDEAS FROM STRUCTURE
During conversation:
1. Disintegrator
2. Send information
3. Context parser
4. Decision maker using ML model
5. Send conceptual response to sentence generators
6. Postprocessing with grammar engine / tone engine to generate the sentence

32
Clipart* (c) cliparts.co
IDEAS FROM STRUCTURE
Multi-modal model
Disintegrator (to simplify sentence into morphemes)

Bot engine

Generates morpheme sequence

Grammar model

Makes a meaningful sentence from the morpheme sequence

Tone model

Changes some conjunctions (eomi) / words in the grammar model's output

33
FINAL STRUCTURE
[Pipeline] Lexical Input → Disintegrator → Context parser → Grammar generator → Tone generator → Lexical Output

Context parser: deep-learning model (sentence-to-sentence + context-aware word generator),
backed by the Knowledge engine, Emotion engine and Context memory

Stages: NLP + StV | Context analyzer + Decision maker | Sentence generator (Response generator)

34


MAKING MODELS
The importance of Prototyping

35
CREATING ML MODELS
Prepare: train dataset / test dataset / batch

Define: input function / step function / evaluator / runtime environment

Make: Estimator / Optimizer

Do: Training / Testing / Predicting

36
MODEL CHAIN ORDER
[Pipeline] Lexical Input → Disintegrator → Context analyzer + Decision maker → Sentence generator (Grammar generator → Tone generator) → Lexical Output

Stages: NLP + StV | AI | Response generator

40


MODEL CHAIN ORDER
[Pipeline with the data passed at each stage]
Lexical Input: normal text
→ Disintegrator → fragmented text sequence
→ Context analyzer + Decision maker (internally: semantic sequence) → fragmented text sequence
→ Grammar generator → (almost) normal text
→ Tone generator → text with tones → Lexical Output

Stages: NLP + StV | AI | Response generator

41


DISINTEGRATOR
a.k.a. morpheme analyzer for speech / talk analysis

Input

Text as conversation

Output

Ordered word fragments

42
DISINTEGRATOR
Rouzeta (https://shleekr.github.io/)
Finite state-based Korean morphological analyzer (released 2 months ago!)

Great and fast / with Python wrapper! (Just 3 days ago!)

Twitter Korean analyzer


Compact and very fast / can be easily adopted via the KoNLPy package

Komoran can be a good alternative (with enough time)

Komoran with ko_restoration package (https://github.com/lynn-hong/ko_restoration)


Increases both model training accuracy / speed

However, it is soooooooo slow... ( > 100 times longer execution time)

43
DISINTEGRATOR
get_training_data_by_disintegration
import konlpy

def get_training_data_by_disintegration(sentence):
    disintegrated_sentence = konlpy.tag.Twitter().pos(sentence, norm=True, stem=True)
    original_sentence = konlpy.tag.Twitter().pos(sentence)
    inputData = []
    outputData = []
    is_asking = False

    # Keep only content morphemes as the bot / grammar model input
    for w, t in disintegrated_sentence:
        if t not in ['Eomi', 'Josa', 'Number', 'KoreanParticle', 'Punctuation']:
            inputData.append(w + '/' + t)
    # Keep the (almost) original word sequence as the grammar model output
    for w, t in original_sentence:
        if t not in ['Number', 'Punctuation']:
            outputData.append(w)
    # Mark questions to extract ask-response raw data
    if original_sentence[-1][1] == 'Punctuation' and original_sentence[-1][0] == "?":
        if len(inputData) != 0 and len(outputData) != 0:
            is_asking = True
    return ' '.join(inputData), ' '.join(outputData), is_asking

44
SAMPLE DISINTEGRATOR
Super simple disintegrator using the Twitter Korean analyzer (with the KoNLPy interface)

(venv) disintegrator python test.py


Original : .
Disintegrated for bot / grammar input :
Training data for grammar model output:

. I ate miso soup in this morning.

[('', 'Noun'), ('', 'Josa'), ('', 'Noun'), ('', 'Noun'), ('', 'Josa'), ('
', 'Noun'), ('', 'Josa'), ('', 'Verb'), ('.', 'Punctuation')]

I / this morning / miso soup / eat


45
DATA RECYCLING / REUSING
Data recycling
Input of disintegrator → Output of grammar model

Output of disintegrator → Input of grammar model

original sentence (output for grammar model): ?


Disintegrated sentence (input for grammar model):
original sentence (output for grammar model): . .
Disintegrated sentence (input for grammar model):
original sentence (output for grammar model): .
Disintegrated sentence (input for grammar model):
original sentence (output for grammar model): !
Disintegrated sentence (input for grammar model):

46
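The recycling idea above can be sketched with the get_training_data_by_disintegration function from the earlier slide: the disintegrated sequence becomes the grammar model's input and the original sentence its output. The corpus file name below is an assumption for illustration.

# Sketch: build grammar-model training pairs by recycling disintegrator output.
grammar_inputs, grammar_outputs = [], []
with open('corpus.txt', encoding='utf-8') as f:      # assumed corpus file
    for line in f:
        disintegrated, original, _ = get_training_data_by_disintegration(line.strip())
        if disintegrated and original:
            grammar_inputs.append(disintegrated)     # input of grammar model
            grammar_outputs.append(original)         # output of grammar model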
CONVERSATION BOT MODEL
Embedding RNN Sequence-to-sequence model for chit-chat
For testing purposes: 4-layer to 8-layer shallow learning (without input/output layers)

What is a deep-learning model?


According to review papers, ML models with more than 10 layers are.
And it's changing now... it has become a buzzword.

Use tensorflow.contrib.learn (formerly the standalone skflow package)


Simpler and easier than a traditional (3 months ago?) handcrafted RNN

Of course, seq2seq, LSTMCell, GRUCell are all bundled!

47
CONTEXT PARSER
Challenges
Continuous conversation
Context-aware talks

Ideas
Knowledge engine
Emotion engine
Context memory

[Diagram] Context parser fed by the Knowledge engine, Emotion engine and Context memory

48
MEMORY AND EMOTION
Context memory as short-term memory
Memorizes current context (variable categories. Tested 4-type situations.)

Emotion engine as model


Understands past / current emotion of user

Use context memory / emotion engine as


First inputs of context parser model (for training / serving)

Context parser input: Context memory + Emotion engine output + disintegrated sentence fragments
49
CONVERSATIONAL CONTEXT LOCATOR
Using Skip-gram and bidirectional 1-gram distribution in recent text
. => Disintegrate first

Bidirectional 1-gram set: {(,),}, {(,),}, {(,),}

<I> <FOOD> <EAT>


Simplifying: {(<I>,),}, {(,<FOOD>),}, {(,<EAT>),<FOOD>}

<I> <TIME:DATE> <TIME:DAY> <FOOD> <EAT>


Distribution: more simplification is needed
{(<I>,<TIME:DAY>), <TIME:DATE>}, {(<TIME:DATE>,<FOOD>), <TIME:DAY>},
{(<TIME:DAY>,<EAT>),<FOOD>}

50
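A minimal sketch of the bidirectional 1-gram sets described above; since the Korean example was lost in this transcript, the simplified token sequence from the slide is used instead.

def bidirectional_1grams(tokens):
    # Pair each inner token with its (previous, next) neighbors
    return [((tokens[i - 1], tokens[i + 1]), tokens[i])
            for i in range(1, len(tokens) - 1)]

tokens = ['<I>', '<TIME:DATE>', '<TIME:DAY>', '<FOOD>', '<EAT>']
print(bidirectional_1grams(tokens))
# [(('<I>', '<TIME:DAY>'), '<TIME:DATE>'),
#  (('<TIME:DATE>', '<FOOD>'), '<TIME:DAY>'),
#  (('<TIME:DAY>', '<EAT>'), '<FOOD>')]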
CONVERSATIONAL CONTEXT LOCATOR
Training context space
Context-marked sentences (>20000)
Context: LIFE / CHITCHAT / SCIENCE / TASK
Prepare Generated 1-gram sets with context bit
Train RNN with 1-gram-2-vec
Matching context space
Input the bidirectional 1-gram sequence into the context space
Take the dominant axis

51
EMOTION ENGINE
Input: text sequence
Output: emotion flag (6-type / 3-bit)
Training set
Sentences with 6-type categorized emotion

Uses senti-word-net to extract emotion

6-axis emotional space by using WordVec model

Current emotion indicator: the most weighted emotion axis using WordVec model

Position in senti-space:
[0.95, 0.14, 0.01, 0.05, 0.92, 0.23] → [1, 0, 0, 0, 0, 0] → 0x01
(index: 1 2 3 4 5 6)
52
Illustration *(c) http://ontotext.fbk.eu/
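A minimal sketch of the flag encoding implied by the example above (dominant-axis one-hot plus a small integer flag); the exact bit mapping used in the talk is not shown, so treat this as an assumption.

def emotion_flag(position):
    # position: weights on the 6 emotion axes, e.g. from the WordVec-based senti-space
    dominant = max(range(len(position)), key=lambda i: position[i])
    one_hot = [1 if i == dominant else 0 for i in range(len(position))]
    return one_hot, dominant + 1      # 3-bit flag value: 1..6

one_hot, flag = emotion_flag([0.95, 0.14, 0.01, 0.05, 0.92, 0.23])
print(one_hot, hex(flag))             # [1, 0, 0, 0, 0, 0] 0x1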
KNOWLEDGE ENGINE
Advanced topic: Not necessary for chit-chat bots
Searches the tokenized knowledge related to current conversation
Querying information
If target of conversation is query, use knowledge engine result as inputs of sentence
generator

If information fitness is high, knowledge + template shows great results

That's why information-serving bots will come to us first.

Big topic: I won't cover it today.

53
SENTENCE GENERATOR
Generates a human-understandable sentence as a reply in a conversation
Idea
Thinking and speaking are separate processes in the brain

Why use the same model for both processes?

Models
Consists of two models: Grammar generator + tone generator

Why separate models?


Training cost

More useful: various tones for user preferences


54
Clip art *Lego
GRAMMAR GENERATOR
Assembling sentence from word sequence
Input: sequence of nouns, pronouns, verbs and adjectives
(i.e. a sentence without postpositions / conjunctions)

Output: normal (but monotonic) sentence sequence

55
RNN SEQ2SEQ GRAMMAR MODEL
Simple grammar model (word-based model with GRUCell and RNN seq2seq / based on the TensorFlow translation example)
import tensorflow as tf
from tensorflow.contrib import learn

HIDDEN_SIZE = 25
EMBEDDING_SIZE = 25

def grammar_model(X, y):
    # Embed disintegrated word ids into dense vectors
    word_vectors = learn.ops.categorical_variable(X,
        n_classes=n_disintegrated_words,
        embedding_size=EMBEDDING_SIZE, name='words')
    # Build encoder / decoder input tensors for the seq2seq graph
    in_X, in_y, out_y = learn.ops.seq2seq_inputs(
        word_vectors, y, MAX_DOCUMENT_LENGTH, MAX_DOCUMENT_LENGTH)
    encoder_cell = tf.nn.rnn_cell.GRUCell(HIDDEN_SIZE)
    decoder_cell = tf.nn.rnn_cell.OutputProjectionWrapper(
        tf.nn.rnn_cell.GRUCell(HIDDEN_SIZE), n_recovered_words)
    decoding, _, sampling_decoding, _ = learn.ops.rnn_seq2seq(in_X, in_y,
        encoder_cell, decoder_cell=decoder_cell)
    return learn.ops.sequence_classifier(decoding, out_y, sampling_decoding)

56
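A hedged sketch of how this model function might be fit and saved with the 2016-era tf.contrib.learn API used throughout the talk; X_train / y_train are assumed to be padded word-id arrays built from the disintegrated / original sentence pairs, and the hyperparameter values are illustrative assumptions.

classifier = learn.TensorFlowEstimator(model_fn=grammar_model,
                                       n_classes=n_recovered_words,
                                       optimizer='Adam', learning_rate=0.01,
                                       batch_size=64, steps=1000,
                                       continue_training=True)
classifier.fit(X_train, y_train)
prediction = classifier.predict(X_test, axis=2)   # same call shape as the serving code later
classifier.save('./grammar')                      # later loaded via TensorFlowEstimator.restore()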
GRAMMAR GENERATOR
Training set
Make sequence by disintegrating normal sentence
Remove postpositions / conjunction from sequence

Normalize nouns, verbs, adjectives

Model
3-layer Sequence-to-sequence model (for each encoder / decoder)

Estimator: ADAM optimizer with GRU cell


Adagrad with LSTM cell is also ok. In my case, ADAM+GRU works slightly better.
(Data size effect?)

Hidden feature size of GRU cell: 25, Embedding dimension for each word: 25.

57
TONE GENERATOR
Tones make a sentence more humanized
Every sentence has tones by speaker
The most important part to build the pretty girl chat-bot
Model
3-Layer sequence-to-sequence model

Almost same as grammar model (training set is different)

Can also be used to make the chat bot speak dialects

58
TONE GENERATOR
Input: sentence without tones
Output: sentence with tones
Data: Normal sentences from various conversation sources
Training / test set
Remove tones from normal sentences

Morpheme-level processing effectively removes the tone from a sentence.

59
USEFUL TIPS
Sequence-to-sequence model is inappropriate for Bot engine
Easily diverges during training

Of course, RNN training will not work.

In this case, the input / output sequence relationship is too complex

Very hard to inject context-awareness into the conversation

A context-aware response needs a sentence generated not only from the ask,
but also from context-aware data / the knowledge base / the decision-making process

Idea: turn the input sequence into a semantic bundle

It will work, I guess...

60
USEFUL TIPS
Sequence-to-sequence models really work well as grammar / tone engines
This is important for today's talk.

61
TRAINING MODELS
Goal is near here

62
TRAINING BOT MODEL
Input
Disintegrated sentence sequence without postpositions / conjunction

Emotion flag (3 bits)

Context flag (extensible, appending sentence with special indicator / 2 bits)

Output
Answer sequence with nouns, pronouns, verbs, adjectives

Learning
Supervised learning (for simple communication model / replaces template)

Reinforcement learning (for emotion / context flag, on the fly production)

63
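A hypothetical sketch of how the emotion / context flags might be appended to the disintegrated input as special indicator tokens; the token formats are made up for illustration.

def build_bot_input(disintegrated, emotion_flag, context_flag):
    # disintegrated: e.g. '<I> <TIME:DAY> <FOOD> <EAT>'
    emotion_token = '<EMO:{:03b}>'.format(emotion_flag)   # 3-bit emotion flag
    context_token = '<CTX:{:02b}>'.format(context_flag)   # 2-bit context flag
    return ' '.join([emotion_token, context_token, disintegrated])

# Example: emotion 0b001, context 0b01
print(build_bot_input('<I> <TIME:DAY> <FOOD> <EAT>', 0b001, 0b01))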
TRAINING BOT MODEL
Training set
FAS log data ( http://antispam.textcube.org )

2006~2016 (from EAS data) / comments on weblogs / log size ~1TB (with spams)

Visited and crawled non-spam data, based on comment link (~26GB / MariaDB)

Original / reply pair as input / output

Preprocessing
Remove non-Korean characters from data

Data anonymization with id / name / E-mail information

64
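A hedged sketch of the two preprocessing steps above (keep Korean characters, anonymize id / name / e-mail); the regex and the record field names are assumptions for illustration.

import hashlib
import re

def clean_korean(text):
    # Keep Hangul syllables / jamo, whitespace and basic punctuation only
    return re.sub(r'[^\uac00-\ud7a3\u3131-\u318e\s.,?!]', '', text)

def anonymize(record):
    # record: dict assumed to hold 'id', 'name', 'email', 'comment' fields
    for field in ('id', 'name', 'email'):
        record[field] = hashlib.sha1(record[field].encode('utf-8')).hexdigest()[:8]
    record['comment'] = clean_korean(record['comment'])
    return record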
TRAINING GRAMMAR GENERATOR
Original data set
Open books without license problems ( https://ko.wikisource.org )

Comments are not a good dataset to learn grammar

Preprocessing
Input data: disintegrated sentence sequence

Output data: original sentence sequence

65
TRAINING TONE GENERATOR
Original data set
Open books without license problems

Extract sentences wrapped in quotation marks (dialogue)

e.g. " ? ?"

Preprocessing
Input data: sentence sequence without tone

e.g. ? ? (using morpheme analyzer)

Output data: original sentence sequence

66
ONE PAGE SUMMARY
The simplest is the best

67
[One-page pipeline]
Lexical Input
→ Disintegrator (NLP + StV)
→ Context analyzer + Decision maker
   Context parser: deep-learning model (sentence-to-sentence + context-aware word generator)
   backed by the Knowledge engine, Emotion engine and Context memory
   example context tags: [GUESS] [CARE] [PRESENT]
→ Sentence generator (Response generator)
   Grammar generator → Tone generator
→ Lexical Output
MAKING BOT
Let's make an anime character bot (as I promised)!

69
DATA SOURCE
Subtitle (caption) files of many animations!
Prototyping
Idol master conversation script (translated by online fans)

Field tests
Animations only with female characters

New data!
Communication script from Idol master 2 / OFA

Script from Idol master PS

70
DATA CONVERTER

Fetch: convert .smi to .srt, then join the .srt files into one .txt (cat *.srt >> data.txt)

Remove: logo / ending / song scripts; timestamps and blank lines;
lines with Japanese characters and the lines following them

Remove: character names / nouns / numbers, using a custom dictionary
(anime characters, locations, specific nouns)

subtitle_converter.py

*.smi is the de facto standard file format for movie captions in Korea
72


Reformat: merge sliced captions into one line (pandas)
Remove: too-short sentences, duplicates
→ Sentence data for the disintegrator / grammar model / tone model
→ Train the disintegrator, grammar model and tone model

Extract conversations:
if last_sentence[-1] == '?':
    conversation.add((last_sentence, current_sentence))
→ Conversation data for the sequence-to-sequence bot model
→ Train the bot model

subtitle_converter.py / pandas

73
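A minimal sketch of the conversation-extraction step above, assuming the captions have already been cleaned and merged into data.txt (one sentence per line); the helper name is illustrative.

def extract_conversations(sentences):
    # Pair every question with the following line as an (ask, response) sample
    conversations = []
    last_sentence = None
    for current_sentence in sentences:
        if last_sentence and last_sentence[-1] == '?':
            conversations.append((last_sentence, current_sentence))
        last_sentence = current_sentence
    return conversations

with open('data.txt', encoding='utf-8') as f:
    lines = [line.strip() for line in f if line.strip()]
pairs = extract_conversations(lines)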
CONVENIENCES FOR DEMO
Simple bot engine
ask → response sentence similarity match engine (similar to a template engine; a rough sketch follows after this slide)

Merge grammar model with tone model


Grammar is not important to create anime character bot?

Loose parameter set


For fast convergence: data size is not big / too diverse

No knowledge engine
We just want to talk with him/her.

74
Bot training procedure (initialization)
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
total conversations: 4217
Transforming...
Total words, asked: 1062, response: 1128
Steps: 0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had
negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.304
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.92GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device:
0, name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1501 get requests,
put_count=1372 evicted_count=1000 eviction_rate=0.728863 and unsatisfied allocation rate=0.818787
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 2405 get requests,
put_count=2388 evicted_count=1000 eviction_rate=0.41876 and unsatisfied allocation rate=0.432432
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
Bot model training procedure (after first fitting)
ask: <REP>.
response (pred): NAME <REP>.
response (gold): NAME .

ask: <UNK> <UNK> .


response (pred): NAME <REP>.
response (gold):

ask: <UNK> <REP>.


response (pred): NAME <REP>.
response (gold):

(Trust me. Your NVIDIA card can not only play Overwatch, but this, too.)

Bot model training procedure (after 50 more fittings)
ask: <REP>.
response (pred): <REP>.
response (gold): NOUN .

ask: <REP>.
response (pred): <REP>.
response (gold): .

ask: <REP>.
response (pred): <REP>.
response (gold): .
Grammar+Tone model training procedure (initialization)
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
total line: 7496
Fitting dictionary for disintegrated sentence...
Fitting dictionary for recovered sentence...
Transforming...
Total words pool size: disintegrated: 3800, recovered: 5476
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:924] successful NUMA node read from SysFS had
negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GTX 970
major: 5 minor: 2 memoryClockRate (GHz) 1.304
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.92GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:806] Creating TensorFlow device (/gpu:0) -> (device: 0,
name: GeForce GTX 970, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 1501 get requests,
put_count=1372 evicted_count=1000 eviction_rate=0.728863 and unsatisfied allocation rate=0.818787
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 100 to 110
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:244] PoolAllocator: After 2405 get requests,
put_count=2388 evicted_count=1000 eviction_rate=0.41876 and unsatisfied allocation rate=0.432432
I tensorflow/core/common_runtime/gpu/pool_allocator.cc:256] Raising pool_size_limit_ from 256 to 281
Grammar+Tone model training procedure (after first fitting)
disintegrated: NOUN <REP>.
recovered (pred): <REP>.
recovered (gold): NOUN .
disintegrated: <REP>.
recovered (pred): <REP>.
recovered (gold): .
disintegrated: <UNK> .
recovered (pred): <REP>.
recovered (gold): .
disintegrated: <REP>.
recovered (pred): <REP>.
recovered (gold): .

(Grammar model converges fast. With a GPU, it converges much faster.)

Grammar+Tone model training procedure (after 10 more fittings)
disintegrated: <REP>.
recovered (pred): <REP>.
recovered (gold): .
disintegrated: NAME <REP>.
recovered (pred): <REP>.
recovered (gold): NAME.
disintegrated: <UNK> <UNK> <UNK> <UNK>.
recovered (pred): <REP>.
recovered (gold): .
disintegrated: <REP>.
recovered (pred): <REP>.
recovered (gold): .
[Chart: Training speed test — calculation time (scaled to 100%) for bot training and grammar training, CPU-only vs GPU (GTX 970)]

And you really need a GPU-accelerated environment to let them work.


That's why
we are prototyping
handheld machine learning devices
using an NVIDIA GTX 1070 & Skylake,

with the Sorna framework and
the Nublar web GUI.

Sorna: our open-source
distributed code running platform
(https://github.com/lablup/sorna)
80
And also a
personal machine learning device,
with support from a backbone machine
via the internet through the Sorna API.
82
Of course,
Your NVIDIA card
can also play
Overwatch.

83
USEFUL TIPS FOR ANIME CHARACTER BOT
DO NOT MIX different anime subtitles
Easily diverges during grammar model training. Strange. Huh?

Does it come from the different translators' tones? Need to check why.

Choose animation with extreme gender ratio


Very hard to divide gender-specific conversations from data

Tones of Japanese animation characters differ greatly by the speaker's gender

Just choose boy-only / girl-only animation for easy data categorization

84
AND TACKLES TODAY
From TensorFlow 0.9RC, Estimator/TensorFlowEstimator.restore has been removed
and has not returned yet
I can create / train a model but cannot load it with the original code on TF 0.10RC.

Made some tricks for today's demo


Auto-generated talk templates from bot

Response matcher (match ask sentence and return response from template pool)

Conversation dataset size is too small to create conversation model


Not smooth talks

Easily diverges. Train many, many models to get proper result.

85
SERVING
Like peasant in Warcraft (OR workleft?)

86
TELEGRAM API
Why Telegram?
Telegram is my primary messenger

API implementation is as easy as writing an echo bot (see the sketch after this slide)

Well-suited to Python 3

87
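A minimal echo-bot sketch using the same 2016-era python-telegram-bot calls as the serving code on the next slides; the token placeholder is an assumption.

from telegram import Updater

def echo(bot, update):
    # Send back exactly what the user typed
    bot.sendMessage(chat_id=update.message.chat_id, text=update.message.text)

updater = Updater(token='[TOKEN from BotFather]')
updater.dispatcher.addTelegramMessageHandler(echo)
updater.start_polling()
updater.idle()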
SERVING TELEGRAM BOT
Python 3

Install python-telegram-bot package


~$ pip3 install python-telegram-bot

Supervisor (for continuous serving)

/etc/supervisor/conf.d/pycon_bot.conf
[program:pycon-bot]
command = /usr/bin/python3 /home/ubuntu/pycon_bot/serve.py

supervisorctl
ubuntu@ip-###-###-###-###:~$ sudo supervisorctl
pycon-bot RUNNING pid 12417, uptime 3:29:52
88
BOT SERVING CODE
/home/ubuntu/pycon_bot/serve.py
from telegram import Updater
from pycon_bot import pycon_bot, error, model_server, start

bot_server = None
grammar_server = None

def main():
    global bot_server, grammar_server
    updater = Updater(token='[TOKENS generated via bot_father]')
    job_queue = updater.job_queue
    dispatcher = updater.dispatcher
    dispatcher.addTelegramCommandHandler('start', start)
    dispatcher.addTelegramCommandHandler("help", start)
    dispatcher.addTelegramMessageHandler(pycon_bot)
    dispatcher.addErrorHandler(error)
    bot_server = model_server('./bot', 'ask.vocab', 'response.vocab')
    grammar_server = model_server('./grammar', 'fragment.vocab', 'result.vocab')
    updater.start_polling()
    updater.idle()

if __name__ == '__main__':
    main()
89
MODEL SERVER

pycon_bot.model_server
import pickle
from tensorflow.contrib import learn

class model_server:
    """ pickle version of TensorFlow model server """
    def __init__(self, model_path='.', x_proc_path='', y_proc_path=''):
        self.classifier = learn.TensorFlowEstimator.restore(model_path)
        self.X_processor = pickle.loads(open(model_path + '/' + x_proc_path, 'rb').read())
        self.y_processor = pickle.loads(open(model_path + '/' + y_proc_path, 'rb').read())

    def predict(self, input_data):
        X_test = self.X_processor.transform(input_data)
        prediction = self.classifier.predict(X_test, axis=2)
        return self.y_processor.reverse(prediction)

90
BOT ENGINE CODE
pycon_bot.pycon_bot
def pycon_bot(bot, update):
    msg = disintegrate(update.message.text)
    raw_response = bot_server.predict(msg)
    response = grammar_server.predict(raw_response)
    bot.sendMessage(chat_id=update.message.chat_id, text=' '.join(response))

pycon_bot.disintegrate
import konlpy

def disintegrate(sentence):
    disintegrated_sentence = konlpy.tag.Twitter().pos(sentence, norm=True,
                                                      stem=True)
    result = []
    for w, t in disintegrated_sentence:
        if t not in ['Eomi', 'Josa', 'Number', 'KoreanParticle', 'Punctuation']:
            result.append(w)
    return ' '.join(result)
91
RESULT
That's one small step for a man, one giant leap for anime fans.

92
And finally... I created a pretty sad bot.

Reason?
Idol M@ster's conversations are mostly about
failure and recovery
rather than success.
Illustration * Idol M@ster / Bandai Namco Games. All rights reserved.
SUMMARY
Today
Covered the garage chat-bot making procedure

Making a chat bot with TensorFlow + Python 3

My contributions / insights for you


Multi-modal Learning models / structures for chat-bots

Idea to generate data for chat-bots

94
AND NEXT...
Add Idol Master 2 / OFA game conversation script to current dataset
Suggestions from Shin Yeaji (PyCon APAC staff) and Eunjin Hwang this week
Train bot with some unknown (to me) animations.

Finish anonymization of FAS data and re-train bot with TensorFlow (almost finished!)
In fact, FAS data-based bot is run by Caffe. (http://caffe.berkeleyvision.org/)
Preparing this talk encouraged me to migrate my Caffe projects to TensorFlow

RL-based context parser with preprocessed data


More testing, and adopting Rouzeta into the Miki_bot engine
Test seq2seq as the bot engine?
By turning the input sequence into a semantic bundle (in August)
Working, but needs more work
95
HOME ASSIGNMENT
If you are Loveliver*, you already know what to do.
*The fans of Love Live! (another Japanese animation)

Are you Lov..?

Idol M@ster? 96
Internet meme * (c) Marble Entertainment / inven.co.kr
First with the head, then with the heart.

Bryce Courtenay's The Power of One


THANK YOU FOR LISTENING :)
@inureyes
github.com/inureyes

98
SELECTED REFERENCES
De Brabandere, B., Jia, X., Tuytelaars, T., & Van Gool, L. (2016, June 1). Dynamic Filter Networks. arXiv.org.

Noh, H., Seo, P. H., & Han, B. (2015, November 18). Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction.
arXiv.org.

Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2015, November 10). Neural Module Networks. arXiv.org.

Bengio, S., Vinyals, O., Jaitly, N., & Shazeer, N. (2015, June 10). Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. arXiv.org.

Jordan, M. I., & Mitchell, T. M. (2015). Machine learning: Trends, perspectives, and prospects. Science (New York, NY), 349(6245), 253–255.
http://doi.org/10.1126/science.aac4520

Bahdanau, D., Cho, K., & Bengio, Y. (2014, September 2). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv.org.

Schmidhuber, J. (2014, May 1). Deep Learning in Neural Networks: An Overview. arXiv.org. http://doi.org/10.1016/j.neunet.2014.09.003

Zaremba, W., Sutskever, I., & Vinyals, O. (2014, September 8). Recurrent Neural Network Regularization. arXiv.org.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013, January 17). Efficient Estimation of Word Representations in Vector Space. arXiv.org.

Smola, A., & Vishwanathan, S. V. N. (2010). Introduction to machine learning.

Schmitz, C., Grahl, M., Hotho, A., & Stumme, G. (2007). Network properties of folksonomies. World Wide Web .

Esuli, A., & Sebastiani, F. (2006). Sentiwordnet: A publicly available lexical resource for opinion mining. Presented at the Proceedings of LREC.

99
