This document discusses Fluentd and its webhdfs output plugin. It explains how the webhdfs plugin was created in 30 minutes by leveraging existing Ruby gems for WebHDFS operations and output formatting. The document concludes that output plugins can reuse code from mixins and that developing shared mixins allows plugins to incorporate common features more easily.
1. Fluentd and WebHDFS
& what makes it possible to write out_webhdfs in 30min.
TAGOMORI Satoshi (@tagomoris)
NHN Japan
Fluentd meetup 3 (2012/11/08)
Thursday, November 8, 2012
2. @tagomoris
NHN Japan Corp (Web Service Division)
Fluentd committer, plugin developer
fluent-agent-lite, ...
5. Fluentd as log collector
Many output plugins for various storage systems
file, file-alternative
mongo, couch, cassandra, redis, s3, ....
Hadoooooooooooooooooooooooooooooooooooooop
6. Fluentd with HDFS
To write data on HDFS:
Java native protocol: HDFSClient.java
hadoop fs -put
libhdfs and its binding (like scribed)
Cloudera Hoop (2011/07-)
+WebHDFS (Apache Hadoop 1.0 and later), +HttpFs (Apache Hadoop 2.0 and later)
7. fluent-plugin-webhdfs
Output plugin to write data into HDFS
Supports WebHDFS and HttpFs
First release: 2012/05/20 by tagomoris
v0.1.0 bundled within td-agent v1.1.10 (or later)
8. WebHDFS
HTTP REST API of HDFS
Clients communicate with the NameNode and all DataNodes
(like HDFSClient)
[Diagram: the Client connects over HTTP to the NameNode and to each of three DataNodes]
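As a rough illustration of the REST API described on this slide, the sketch below builds WebHDFS request URLs. The `/webhdfs/v1` prefix and the `op` query parameter come from the WebHDFS REST API; the helper name `webhdfs_url`, the host name, and the file path are hypothetical. For operations such as CREATE and APPEND, the client first sends the request to the NameNode, which answers with an HTTP 307 redirect to a DataNode, and the client then sends the data to that DataNode.

```ruby
require 'uri'

# Hypothetical helper (not part of any gem): builds a WebHDFS REST URL.
# Note: the path is inserted as-is, so it must already be URL-safe.
def webhdfs_url(host, port, path, op, params = {})
  query = URI.encode_www_form({ 'op' => op }.merge(params))
  "http://#{host}:#{port}/webhdfs/v1#{path}?#{query}"
end

# Example values are assumptions for illustration only.
puts webhdfs_url('namenode.example', 50070, '/log/access.log', 'APPEND')
puts webhdfs_url('namenode.example', 50070, '/log/access.log', 'CREATE',
                 'overwrite' => 'false')
```

The same URL shape applies to HttpFs; only the host and port change, because the client then talks to the httpfs server instead of the NameNode.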
9. HttpFs
Proxy server 'httpfs', provides REST API for HDFS
Same method set as WebHDFS (unlike Hoop)
Clients communicate with httpfs server only
[Diagram: the Client connects over HTTP to the httpfs server only; the httpfs server connects via the Java native protocol to the NameNode and DataNodes]
10. WebHDFS or HttpFs
WebHDFS: Peer-to-Peer communication
Jetty based HTTP server
High throughput and stability
HttpFs: Proxied and Centralized communication
Tomcat based HTTP server
Simple network topology
Relatively low performance and SPOF
11. Configuration: WebHDFS
Use Apache Hadoop 1.0.0 (or later), CDH3u5, or CDH4 (or later)
In the NameNode/DataNode configuration
dfs.webhdfs.enabled=true
dfs.support.append=true (only CDH3u5 ?)
dfs.support.broken.append=true (only CDH3u5 ?)
In fluent-plugin-webhdfs (type webhdfs)
host hostname.of.namenode
port 50070
path /hdfs/access.%Y%m%d_%H.${hostname}.log
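To make the `path` setting above concrete, here is a minimal sketch of how such a template could be expanded into an actual HDFS file name. It assumes the `%Y%m%d_%H` part is filled in with `strftime` and `${hostname}` with the local host name; the function name `expand_path` is hypothetical, and the real plugin may expand the two kinds of placeholder at different times (for example, `${hostname}` once at configuration time).

```ruby
require 'socket'

# Hypothetical helper: expands time placeholders first, then ${hostname}.
# The block form of gsub inserts the host name literally.
def expand_path(template, time, hostname = Socket.gethostname)
  time.strftime(template).gsub('${hostname}') { hostname }
end

# 'web01' is an assumed host name used only for this example.
puts expand_path('/hdfs/access.%Y%m%d_%H.${hostname}.log',
                 Time.utc(2012, 11, 8, 15), 'web01')
# prints "/hdfs/access.20121108_15.web01.log"
```

With an hourly template like this one, each node writes to its own file and starts a new file every hour, so nodes never append to the same HDFS file.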
12. WebHDFS in NHN Japan
BEFORE: 1400 Timeouts/day with Hoop
Tue Aug 14 15:04:34 2012 +0900
"fix to use webhdfs to write into hdfs"
"2012-08-14 15:08:18 +0900: starting fluentd-0.10.25"
Wed Aug 15 13:11:04 2012 +0900
"fix timeouts for busy AM2-5"
AFTER: 130 Timeouts from 08/16 to 11/07
1.2-1.5 TB/day from 10 fluentd nodes
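The figures on this slide can be put on a common scale with a little arithmetic. The sketch below assumes that the AFTER period counts both end dates (08/16 and 11/07), that 1 TB means 10**12 bytes, and that traffic is spread evenly over the 10 nodes and over the day; none of these assumptions is stated on the slide, and the function names are hypothetical.

```ruby
require 'date'

# Number of days in a period, counting both end dates (assumption).
def days_inclusive(first_day, last_day)
  (last_day - first_day).to_i + 1
end

# Average per-node write rate in MB/s, assuming 1 TB = 10**12 bytes
# and an even spread across nodes and across 24 hours.
def per_node_mb_per_sec(tb_per_day, nodes)
  tb_per_day * 1e12 / nodes / 86_400 / 1e6
end

days = days_inclusive(Date.new(2012, 8, 16), Date.new(2012, 11, 7))
puts "#{days} days, #{(130.0 / days).round(2)} timeouts/day (was 1400/day)"
puts "#{per_node_mb_per_sec(1.2, 10).round(2)} to " \
     "#{per_node_mb_per_sec(1.5, 10).round(2)} MB/s per node"
```

Under these assumptions the switch took timeouts from 1400 per day to roughly 1.5 per day, at an average of about 1.4 to 1.7 MB/s per Fluentd node.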
13. CONCLUSION 1
WebHDFS is good enough for:
continuous appending into log file
daily operations (move/remove/copy/head/tail) via
client libraries (and your scripts)
Fluentd and td-agent are good enough for:
a log collector in front of Hadoop/HDFS
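The "continuous appending" pattern can be sketched as append first and create on a missing file. The example below uses a hypothetical in-memory stand-in (`FakeHdfs`) rather than the real 'webhdfs' gem, so the class, its methods, and its error type are all assumptions made for illustration; a real client would raise its own error type when the target file does not exist.

```ruby
# Hypothetical in-memory stand-in for an HDFS client (NOT the 'webhdfs' gem).
class FakeHdfs
  class FileNotFound < StandardError; end

  attr_reader :files

  def initialize
    @files = {}
  end

  def create(path, data)
    @files[path] = data.dup
  end

  def append(path, data)
    raise FileNotFound, path unless @files.key?(path)
    @files[path] += data
  end
end

# Try to append; if the file does not exist yet, create it with this chunk.
def write_chunk(client, path, data)
  client.append(path, data)
rescue FakeHdfs::FileNotFound
  client.create(path, data)
end

client = FakeHdfs.new
write_chunk(client, '/log/access.log', "line 1\n") # file missing, so created
write_chunk(client, '/log/access.log', "line 2\n") # file exists, so appended
puts client.files['/log/access.log']
```

Trying the append first keeps the common case (the file already exists) to a single operation; the create path only runs when a new file name comes into use.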
15. fluent-plugin-webhdfs
commit log
Thu May 17 18:20:15 2012 on 'fluent-plugin-webhdfs'
"writing code": in fact, no lines of ruby code....
Sun May 20 19:01:26 2012 on 'xxxxx'
(some commits)
Sun May 20 19:35:34 2012 on 'fluent-plugin-webhdfs'
"fix typo": tagged as v0.0.1
16. 30min!?
fluent-plugin-webhdfs
120 lines (including blank lines and 'end' lines)
65 lines of configurations
very few lines of actual code
WebHDFS operations by 'webhdfs' gem
Output formatting by 'PlainTextFormatterMixin'
17. webhdfs gem commit log
Sun May 20 17:00:57 2012
(15 commits)
Sun May 20 19:01:26 2012
"v0.3: add WebHDFS::Client"
18. fluent-mixin-*
fluent-mixin-plaintextformatter
output text data formatter
webhdfs, file-alternative, hoop
fluent-mixin-config-placeholders
provides placeholders like '${hostname}', '${uuid}' in configurations
webhdfs, ping-message
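The idea behind these mixins can be sketched in plain Ruby: a module holds the shared formatting logic, and each output class includes it instead of reimplementing it. The module, class, and method names below are hypothetical and do not reproduce the actual fluent-mixin-plaintextformatter API; the tab-separated time/tag/JSON layout is likewise an assumption chosen for the example.

```ruby
require 'json'
require 'time'

# Hypothetical shared formatter, standing in for a fluent-mixin-* gem.
module PlainTextFormatting
  # Formats one event as "time<TAB>tag<TAB>json\n".
  def format_line(tag, time, record)
    [Time.at(time).utc.iso8601, tag, JSON.generate(record)].join("\t") + "\n"
  end
end

# Two hypothetical output plugins reuse the same formatting code.
class FakeWebhdfsOutput
  include PlainTextFormatting
end

class FakeFileOutput
  include PlainTextFormatting
end

line = FakeWebhdfsOutput.new.format_line('app.access', 0, { 'code' => 200 })
puts line
puts line == FakeFileOutput.new.format_line('app.access', 0, { 'code' => 200 })
```

Because both classes get `format_line` from the same module, a fix or a new formatting option added to the module reaches every plugin that includes it.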
19. CONCLUSION 2
Output plugins have many (complex) problems:
communication, formatting, configuration formats, ...
We CAN/MUST depend on existing GEMS!
We SHOULD write fluent-mixin gems for other plugin developers!
many features and much code can be shared by many plugins
unified syntax/features over plugins
20. Questions?
Thanks!
photo: crouton
thanks to @kbysmnr