14. #ccc_cd4 / #embulk
[Figure: data stores linked by transfer scripts: HDFS, MySQL, Amazon S3, CSV Files, SequenceFile, Salesforce.com, Elasticsearch, Cassandra, Hive, Redis]
Broken script :(
Sometimes fails :(
No one can fix :(
15. #ccc_cd4 / #embulk
[Figure: same data stores and callouts as slide 14]
N x M scripts!
> Poor error handling
> No retrying / resuming
> Low performance
> Often no maintainers
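The "N x M scripts" count can be made concrete with a little arithmetic. The sketch below assumes every source/destination pair needs its own script, and compares that with one plugin per system, as in the "M x N → M + N" slide later in the deck. The class and method names are hypothetical and are not part of Embulk.

```java
// Hypothetical illustration of the slide's arithmetic; not Embulk code.
final class TransferCount {
    /** Scripts needed when each of n sources needs a dedicated script per destination. */
    static long pairwiseScripts(int sources, int destinations) {
        return (long) sources * destinations;
    }

    /** Plugins needed when each system is covered by exactly one plugin. */
    static long plugins(int sources, int destinations) {
        return (long) sources + destinations;
    }

    public static void main(String[] args) {
        int n = 5;
        int m = 5;
        // prints "25 scripts vs 10 plugins"
        System.out.println(pairwiseScripts(n, m) + " scripts vs " + plugins(n, m) + " plugins");
    }
}
```

The gap widens quickly: the script count grows with the product of the two sides, the plugin count only with their sum.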
38. #ccc_cd4 / #embulk
Model classes using Jackson (task)
public class CsvParserPlugin
        implements ParserPlugin
{
    public interface PluginTask
            extends Task, LineDecoder.DecoderTask, TimestampParser.ParserTask
    {
        @Config("columns")
        public SchemaConfig getSchemaConfig();

        @Config("header_line")
        @ConfigDefault("null")
        public Optional<Boolean> getHeaderLine();

        @Config("skip_header_lines")
        @ConfigDefault("0")
        public int getSkipHeaderLines();
        public void setSkipHeaderLines(int n);

        @Config("delimiter")
        @ConfigDefault("\",\"")
        public char getDelimiterChar();
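An annotated getter interface like the one above can be backed by configuration values in several ways. The following is a minimal sketch of one of them, using only the JDK's dynamic proxies over a plain `Map`. It is not Embulk's implementation: `DemoConfig`, `DemoConfigDefault`, `DemoTask`, and `ConfigBinder` are hypothetical stand-ins invented for this illustration, and only `int` and `String` getters are handled.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Map;

// Hypothetical stand-ins for the slide's annotations; NOT Embulk's own classes.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoConfig { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface DemoConfigDefault { String value(); }

// Hypothetical task interface modelled on two of the slide's options.
interface DemoTask {
    @DemoConfig("skip_header_lines")
    @DemoConfigDefault("0")
    int getSkipHeaderLines();

    @DemoConfig("delimiter")
    @DemoConfigDefault(",")
    String getDelimiter();
}

final class ConfigBinder {
    /** Returns a proxy whose getters read from config, falling back to the declared default. */
    @SuppressWarnings("unchecked")
    static <T> T bind(Class<T> iface, Map<String, String> config) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            DemoConfig key = method.getAnnotation(DemoConfig.class);
            if (key == null) {
                // Only annotated getters are supported in this sketch.
                throw new UnsupportedOperationException(method.getName());
            }
            String raw = config.get(key.value());
            if (raw == null) {
                DemoConfigDefault def = method.getAnnotation(DemoConfigDefault.class);
                if (def == null) {
                    throw new IllegalArgumentException("missing required option: " + key.value());
                }
                raw = def.value();
            }
            // Convert the raw string to the getter's return type (int or String only).
            if (method.getReturnType() == int.class) {
                return Integer.parseInt(raw);
            }
            return raw;
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        DemoTask task = bind(DemoTask.class, Map.of("skip_header_lines", "1"));
        System.out.println(task.getSkipHeaderLines()); // prints 1 (taken from the map)
        System.out.println(task.getDelimiter());       // prints , (taken from the default)
    }
}
```

The appeal of this style is that a plugin author declares options once, as typed getters, and never writes parsing or defaulting code by hand.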
39. #ccc_cd4 / #embulk
[Figure: plugin architecture. Boxes: InputPlugin, Filter plugins, OutputPlugin, Executor plugin. Labels: config, task, schema, records, report, config diff, resume state.]
40. #ccc_cd4 / #embulk
Model classes using Jackson (schema)
public class ColumnConfig
{
    private final String name;
    private final Type type;

    @JsonCreator
    public ColumnConfig(
            @JsonProperty("name") String name,
            @JsonProperty("type") Type type)
    {
        this.name = name;
        this.type = type;
    }

    @JsonProperty("name")
    public String getName() { return name; }

    @JsonProperty("type")
    public Type getType() { return type; }
}
48. #ccc_cd4 / #embulk
Contributing to the Embulk project
> Pull requests & issues on GitHub
> Posting blogs
  > “I tried using it”
  > “I read the code”
  > “What's cool / not cool about it”
> Talking on Twitter using the word “embulk”
> Writing & releasing plugins
> Windows support
> Integration with other software
  > ETL tools, Fluentd, Hadoop, Presto, …
49. We’re hiring!
1. Distributed Systems Engineer
2. Integration Engineer
3. Software Engineer, MPP DBMS
4. Sales Engineer
5. Technical Support Engineer
(Marunouchi, Tokyo, Japan)
https://jobs.lever.co/treasure-data
ANALYTICS INFRASTRUCTURE. SIMPLIFIED IN THE CLOUD.
54. M x N → M + N
[Figure: sources (Access logs: Apache; App logs: Frontend, Backend; System logs: syslogd; Databases) feed a central buffer/filter/route layer, which delivers to Alerting (Nagios), Analysis (MongoDB, MySQL, Hadoop), and Archiving (Amazon S3).]