[B! spark] nobusue's bookmarks

nobusue id:nobusue

nobusue's bookmarks tagged spark (275)

Why Spark on Ceph? (Part 3 of 3)
nobusue 2019/10/03
Ceph

Spark
Why Spark on Ceph? (Part 2 of 3)
nobusue 2019/10/03
Ceph

Spark
Why Spark on Ceph? (Part 1 of 3)
nobusue 2019/10/03
Ceph

Spark
Home | Delta Lake
Build Lakehouses with Delta Lake. Delta Lake is an open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and APIs for Scala, Java, Rust, and Python.
nobusue 2019/04/27
It appears to be implemented as an extension of Spark's storage plugin mechanism, so being able to use the Spark API as-is looks convenient. I should give it a try.

DataLake

Spark
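Google announces beta release of a "Kubernetes Operator" for Apache Spark
Apache Spark is an extremely popular execution framework for data engineering and machine learning workloads. It powers the Databricks platform and is available in both on-premises and cloud-based Hadoop services such as Azure HDInsight, Amazon EMR, and Google Cloud Dataproc. It can also run on Mesos clusters. But what if you want to run Spark workloads on a Kubernetes (k8s) cluster, without Mesos and with no Hadoop YARN strings attached? Spark first added Kubernetes-specific features in version 2.3 and improved them in version 2.4, but running Spark natively on k8s in a fully integrated way can still be difficult.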
nobusue 2019/01/31
Spark

Kubernetes
sig-big-data: Apache Spark and Apache Airflow on Kubernetes | Red Hat Developer
Try Red Hat products and technologies without setup or configuration fees for 30 days with this shared OpenShift and Kubernetes cluster.
nobusue 2019/01/11
So Apache Airflow now supports k8s natively? Looks promising as a job scheduler.

Airflow

Kubernetes

Spark
Uber open-sources JVM Profiler for distributed JVM tracing
nobusue 2018/10/21
Looks especially useful for distributed systems like Spark.

Java

Spark

Kafka
Apache Spark (Driver) resilience on Kubernetes - network partitioning
nobusue 2018/07/23
Spark

Kubernetes
Running Apache Spark with Apache Groovy + Grapes - CLOVER🍀
I thought I'd try running Apache Spark as a script using Apache Groovy + Grapes. Local execution is fine; I just wanted a somewhat easier way to use Apache Spark. As a sample, I'll follow this documentation and turn it into a Groovy script: Spark SQL, DataFrames and Datasets Guide / Data Sources. It's the example that reads a CSV file. As in the documentation, I'll use the CSV file from the examples. $ wget https://raw.githubusercontent.com/apache/spark/master/examples/src/main/resources/people.csv Here's what it looks like: people.csv name;age;jo
nobusue 2018/06/20
Spark

Groovy
Outshift | How to correctly size containers for Java 10 applications
At Banzai Cloud we run and deploy containerized applications to our PaaS, Pipeline. Java and JVM-based workloads are among the notable workloads deployed to Pipeline, so getting them right is pretty important for us and our users. Java/JVM based workloads on Kubernetes with Pipeline. Why my Java application is OOMKilled. Deployin
nobusue 2018/05/19
Spark

Kubernetes
A powerful new IDE to build, test, and run Apache Spark applications on your desktop for free! - KDnuggets
Build enterprise-grade, functionally rich Spark applications with the aid of an intuitive drag-and-drop user interface and a wide array of pre-built Spark operators. (Sponsored post.) Apache Spark is one of the most popular big data frameworks today. Even though Spark's popularity has grown significantly, u
nobusue 2018/05/12
Spark
Outshift | Apache Spark application resilience on Kubernetes
nobusue 2018/04/18
Spark

Kubernetes
[SPARK-23618] docker-image-tool.sh Fails While Building Image - ASF JIRA
nobusue 2018/03/14
You hit this bug when building the Spark container image without specifying any args. The PR has already been merged into the main branch, so it should be fixed in the next release.

Spark

Kubernetes
Apache Spark 2.3 with Native Kubernetes Support
nobusue 2018/03/14
Spark's Kubernetes integration has been merged into the main line. A good example of Kubernetes as a Framework (the pattern of using k8s CRDs).

Spark

Kubernetes
Streaming Analytics | Real-time Actionable Insights | Gathr
nobusue 2018/03/02
Spark
Introduction to Spark on Kubernetes
The content of this page hasn't been updated for years and might refer to discontinued products and projects. Apache Spark on Kubernetes series: Introduction to Spark on Kubernetes; Scaling Spark made simple on Kubernetes; The anatomy of Spark applications on Kubernetes; Monitoring Apache Spark with Prometheus; Spark History Server on Kubernetes; Spark scheduling on Kubernetes demystified; Spark Streami
nobusue 2018/02/25
Spark

Kubernetes
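AWS Glue – Now Generally Available | Amazon Web Services
Amazon Web Services Blog: AWS Glue – Now Generally Available. Today, the general availability of AWS Glue was announced. Glue is a fully managed, serverless, cloud-optimized ETL (extract, transform, load) service. Glue differs from other ETL services and platforms in several very important ways. First, Glue is serverless: you don't need to provision or manage any resources. You pay only for the resources Glue uses while running jobs or crawlers (billed by the minute). Second, there are Glue's crawlers. Glue crawlers can automatically discover and infer schemas across multiple data sources, data types, and various kinds of partitions. …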
nobusue 2017/08/15
aws

glue

ETL

Spark
Welcome to the radanalytics.io community portal
nobusue 2017/06/13
Spark

OpenShift

Kubernetes
decode17
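Slides explaining the basics of distributed parallel processing and summarizing recent developments in the open-source distributed parallel processing space.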
nobusue 2017/05/25
Spark

Kafka

hadoop

distributed parallel processing
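Wordcount with both Spark's RDD and DataFrame APIs - 映画は中劇
Spark offers two kinds of APIs for writing data-processing programs: RDD and DataFrame. Let's write wordcount with each of them. Wordcount is a program that counts how many times each word appears in a text, and it is the required exercise of distributed data processing. RDD is a low-level API, and its records have no schema. Processing is expressed with list-processing higher-order functions such as map and flatMap. Some operations, such as reduceByKey, require each record to be a (key, value) tuple, but that check happens when tasks execute, not when the job is submitted. Overall, it feels like old-fashioned MapReduce. DataFrame is a high-level API, and a schema is applied to its records. Processing is written either in SQL or in a DSL in the host language (hereafter the query DSL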
nobusue 2017/01/30
Spark

python
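The RDD-style wordcount described in the excerpt can be sketched in plain Python. This is not Spark code: `flat_map` and `reduce_by_key` are hypothetical helpers that only imitate the shape of the RDD operations `flatMap` and `reduceByKey`, so the sketch runs without a cluster. As the excerpt notes for `reduceByKey`, the (key, value) shape of each record is only checked when the reduction actually runs.

```python
from itertools import chain


def flat_map(f, records):
    # Hypothetical stand-in for RDD.flatMap (not the Spark API):
    # apply f to each record and flatten the resulting iterables.
    return list(chain.from_iterable(f(r) for r in records))


def reduce_by_key(f, pairs):
    # Hypothetical stand-in for RDD.reduceByKey (not the Spark API).
    # Unpacking fails at run time if a record is not a (key, value) pair.
    acc = {}
    for key, value in pairs:
        acc[key] = f(acc[key], value) if key in acc else value
    return acc


lines = ["to be or not to be", "to see or not to see"]
words = flat_map(lambda line: line.split(), lines)   # like flatMap
pairs = [(w, 1) for w in words]                       # like map(lambda w: (w, 1))
counts = reduce_by_key(lambda a, b: a + b, pairs)     # like reduceByKey
print(sorted(counts.items()))
```

For comparison, an equivalent PySpark RDD pipeline would be along the lines of `sc.textFile(path).flatMap(lambda l: l.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a SparkContext and `path` is a placeholder input file.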