High-level API for parsing Adobe Analytics Clickstream Data Feeds metadata and logs.
The library provides a log parser and several convenience methods for loading and parsing manifest files, including lookup data and column type resolution.
It can read data from several common sources, including Amazon S3, Hadoop/HDFS, and the local file system.
Internally, the Amazon S3 DataSource implementation uses AWSCredentialsProviderChain with SystemPropertiesCredentialsProvider, EnvironmentVariableCredentialsProvider and InstanceProfileCredentialsProvider.
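As a rough illustration of how credentials could be supplied for the S3 data source, the sketch below sets the two JVM system properties that the AWS SDK for Java v1 SystemPropertiesCredentialsProvider reads (aws.accessKeyId and aws.secretKey). The class and method names are hypothetical and not part of this library, and the lookup order is assumed to match the order in which the providers are listed above. The environment check covers only the primary variable names.

```java
import java.util.Map;

public class S3CredentialsSketch {

    // Property names read by SystemPropertiesCredentialsProvider (AWS SDK for Java v1).
    static final String ACCESS_KEY_PROPERTY = "aws.accessKeyId";
    static final String SECRET_KEY_PROPERTY = "aws.secretKey";

    /** Sets both system properties; call this before loading any s3:// manifest. */
    static void useSystemPropertyCredentials(String accessKey, String secretKey) {
        System.setProperty(ACCESS_KEY_PROPERTY, accessKey);
        System.setProperty(SECRET_KEY_PROPERTY, secretKey);
    }

    /**
     * Reports which source would be tried successfully first, assuming the
     * chain order system properties, environment variables, instance profile.
     * This is a local approximation and does not call the AWS SDK.
     */
    static String firstConfiguredSource(Map<String, String> env) {
        if (System.getProperty(ACCESS_KEY_PROPERTY) != null
                && System.getProperty(SECRET_KEY_PROPERTY) != null) {
            return "system-properties";
        }
        if (env.containsKey("AWS_ACCESS_KEY_ID") && env.containsKey("AWS_SECRET_ACCESS_KEY")) {
            return "environment";
        }
        // Nothing configured locally: the chain would fall through to the instance profile.
        return "instance-profile";
    }

    public static void main(String[] args) {
        System.out.println("before: " + firstConfiguredSource(System.getenv()));
        useSystemPropertyCredentials("EXAMPLE_KEY_ID", "EXAMPLE_SECRET"); // placeholder values
        System.out.println("after:  " + firstConfiguredSource(System.getenv()));
    }
}
```

The same properties can also be passed on the command line with -Daws.accessKeyId=... and -Daws.secretKey=..., which keeps credentials out of the source code.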
The library has been tested in Java and Scala applications.
Maven:
<repository>
    <id>jitpack.io</id>
    <url>https://jitpack.io</url>
</repository>
<dependency>
    <groupId>com.github.bartekdobija</groupId>
    <artifactId>omniture-clickstream</artifactId>
    <version>65002e3321</version>
</dependency>
SBT:
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.bartekdobija" % "omniture-clickstream" % "65002e3321"
Gradle:
repositories {
    maven {
        url "https://jitpack.io"
    }
}
dependencies {
    // 'compile' was removed in Gradle 7; use 'compile' only on Gradle versions older than 3.4
    implementation 'com.github.bartekdobija:omniture-clickstream:65002e3321'
}
Java - metadata load from the Omniture manifest file
String localManifest = "file://omniture_manifest.txt";
String hdfsManifest = "hdfs://namenode/omniture_manifest.txt";
String s3Manifest = "s3://my-omniture/manifest_a.txt,s3://my-omniture/manifest_b.txt";
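// a raw, tab-separated clickstream line
String line = "a\tb\tc";
// load metadata from a single manifest
OmnitureMetadata metadata = new OmnitureMetadataFactory().create(hdfsManifest);
// or load a list of metadata objects from several manifests, split on the given separator
List<OmnitureMetadata> metadatas = new OmnitureMetadataFactory().create(s3Manifest, ",");
RowParser parser = DenormalizedDataRowParser.newInstance(metadata);
// parse the raw line into a Row
Row row = parser.parse(line);
// statistics collected by the parser
RowParserStats stats = parser.getRowParserStats();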
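To give an intuition for what parsing a denormalized row involves, here is a minimal standalone sketch that pairs each tab-separated value with a column name and swaps lookup keys for their labels. It is not the library's implementation: the class DenormalizeSketch, the column names, and the lookup values are all hypothetical, and column type resolution is left out.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DenormalizeSketch {

    /**
     * Pairs each tab-separated value with its column name and, where a lookup
     * table exists for that column, replaces the stored key with its label.
     */
    static Map<String, String> denormalize(String line,
                                           List<String> columns,
                                           Map<String, Map<String, String>> lookups) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] values = line.split("\t", -1);
        if (values.length != columns.size()) {
            throw new IllegalArgumentException(
                "expected " + columns.size() + " columns, got " + values.length);
        }
        Map<String, String> out = new LinkedHashMap<>(); // preserves column order
        for (int i = 0; i < values.length; i++) {
            String column = columns.get(i);
            String value = values[i];
            Map<String, String> lookup = lookups.get(column);
            if (lookup != null) {
                // Unknown keys are passed through unchanged.
                value = lookup.getOrDefault(value, value);
            }
            out.put(column, value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical column headers and lookup data.
        List<String> columns = Arrays.asList("browser", "country", "page_url");
        Map<String, Map<String, String>> lookups = new HashMap<>();
        lookups.put("browser", Collections.singletonMap("1", "Chrome"));

        System.out.println(denormalize("1\tUS\t", columns, lookups));
    }
}
```

Throwing on a column count mismatch is one possible design choice; a production parser might instead count such rows as malformed and continue.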