nabinno's bookmarks / December 27, 2019

nabinno id:nabinno

Bookmarks for December 27, 2019 (34 items)

Spark vs. Hadoop MapReduce: Which big data framework to choose
nabinno 2019/12/27
"Linear processing of huge datasets is the advantage of Hadoop MapReduce, while Spark delivers fast performance, iterative processing, real-time analytics, graph processing, machine learning and more"

sciencesoft

alex-bekker

apache-spark

apache-hadoop

mapreduce

functional-comparison
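To make the "linear processing" side of that comparison concrete, here is a minimal in-memory sketch of the MapReduce model as a word count, the classic example. It assumes a single process and hypothetical function names (`map_phase`, `shuffle_phase`, `reduce_phase`); it is not Hadoop's API, only an illustration of the three phases a job passes through once.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Sum the counts collected for each word.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["Spark and Hadoop", "Hadoop MapReduce and Spark"])
print(counts)
```

In a real cluster many mappers and reducers run in parallel, and a multi-step algorithm chains separate jobs whose outputs go back to disk; keeping that intermediate data in memory is roughly where Spark gains on iterative workloads.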
MapJoin: a simple way to speed up your Hive queries - Gregory Trubetskoy
Map join is a little-known feature of Hive. It allows a table to be loaded into memory so that a (very fast) join could be performed entirely within a mapper without having to use a Map/Reduce step. If your queries frequently rely on small table joins (e.g. cities or countries, etc.) you might see a very substantial speed-up from using map joins. There are two ways to enable it. First is by using a
nabinno 2019/12/27
gregory-trubetskoy

apache-hive

mapjoin

hive.mapjoin.smalltable.filesize

hive.mapjoin
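The idea behind a map join can be shown without Hive: load the small table into an in-memory hash table once, then probe it for each row of the large table inside the mapper, so no shuffle or reduce step is needed. A minimal Python sketch; the `countries`/`orders` tables and `map_side_join` are hypothetical illustrations, not Hive's implementation.

```python
# Small dimension table (e.g. countries); assumed small enough to fit in memory.
countries = [("JP", "Japan"), ("IN", "India"), ("US", "United States")]

# Large fact table rows, streamed one at a time through the "mapper".
orders = [(1, "JP", 120), (2, "US", 75), (3, "FR", 40), (4, "JP", 60)]


def map_side_join(big_rows, small_rows):
    """Inner-join big_rows to small_rows on the country code, mapper-side only."""
    # Build phase: load the small table into a hash table once.
    lookup = dict(small_rows)
    # Probe phase: each row is joined locally; nothing is shuffled or reduced.
    for order_id, code, amount in big_rows:
        name = lookup.get(code)
        if name is not None:  # inner join: rows without a match are dropped
            yield (order_id, name, amount)


joined = list(map_side_join(orders, countries))
print(joined)  # [(1, 'Japan', 120), (2, 'United States', 75), (4, 'Japan', 60)]
```

Spark's broadcast hash join applies the same build-and-probe pattern, shipping the small side to every executor.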
apache spark - DataFrame join optimization - Stack Overflow
nabinno 2019/12/27
stack-overflow

apache-spark

apache-hive

join

broadcast

broadcast-hash-join

map-join

mapjoin
Spark SQL - 3 common joins (Broadcast hash join, Shuffle Hash join, Sort merge join) explained
nabinno 2019/12/27
ram-ghadiyaram

apache-spark

broadcast

broadcast-hash-join

shuffle-hash-join

sort-merge-join

join

performance-engineering
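Of the three joins in that article, sort merge join is Spark's usual choice when both sides are large: after both inputs are shuffled by key, each partition is sorted and merged with two cursors. A single-partition sketch of that merge step in Python; `sort_merge_join` is a hypothetical helper for illustration, not Spark's code.

```python
def sort_merge_join(left, right):
    """Inner equi-join of two lists of (key, value) pairs.

    Both inputs are sorted by key, then scanned with two cursors; for each
    run of equal keys, every left row is paired with every right row.
    """
    left = sorted(left, key=lambda row: row[0])
    right = sorted(right, key=lambda row: row[0])
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


pairs = sort_merge_join([(2, "b"), (1, "a"), (2, "c")], [(2, "x"), (3, "y"), (1, "z")])
print(pairs)  # [(1, 'a', 'z'), (2, 'b', 'x'), (2, 'c', 'x')]
```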
Performance Tuning - Spark 3.5.4 Documentation
Performance Tuning. Contents: Caching Data In Memory; Other Configuration Options; Join Strategy Hints for SQL Queries; Coalesce Hints for SQL Queries; Adaptive Query Execution (Coalescing Post Shuffle Partitions, Splitting skewed shuffle partitions, Converting sort-merge join to broadcast join, Converting sort-merge join to shuffled hash join, Optimizing Skew Join, Misc). For some workloads, it is possible to improve pe
nabinno 2019/12/27
apache-spark

broadcast-join

sort-merge-join

performance-engineering
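The broadcast-vs-sort-merge decision behind "Converting sort-merge join to broadcast join" comes down to a size check: if the smaller side is estimated to be at or under `spark.sql.autoBroadcastJoinThreshold` (10 MB by default; -1 disables broadcasting), Spark broadcasts it, and adaptive execution can redo the check with runtime statistics. The helper below is a hypothetical, heavily simplified sketch of that rule; real planning also weighs join hints, join type, and other settings.

```python
# Default of spark.sql.autoBroadcastJoinThreshold: 10 MB (-1 disables broadcasting).
DEFAULT_BROADCAST_THRESHOLD = 10 * 1024 * 1024


def choose_join_strategy(left_bytes, right_bytes, threshold=DEFAULT_BROADCAST_THRESHOLD):
    """Return a simplified join strategy name for an equi-join.

    Hypothetical helper: it ignores hints, join types, shuffled hash join,
    and every other factor the real Spark planner considers.
    """
    smaller = min(left_bytes, right_bytes)
    if threshold >= 0 and smaller <= threshold:
        return "broadcast_hash_join"  # ship the small side to every executor
    return "sort_merge_join"  # shuffle both sides by key, then sort and merge


print(choose_join_strategy(5 * 1024 * 1024, 10**10))  # broadcast_hash_join
print(choose_join_strategy(10**9, 10**10))  # sort_merge_join
```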
Standard Functions — functions Object · The Internals of Spark SQL
nabinno 2019/12/27
apache-spark

pyspark.sql.functions
What is the difference between Apache Hive and Apache Spark?
nabinno 2019/12/27
quora

apache-hive

apache-spark

distributed-computing

functional-comparison
Ganglia
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. Supports clusters up to 2000 nodes in size.
nabinno 2019/12/27
ganglia

monitoring
Home - Apache Hive - Apache Software Foundation
This wiki is now read only. This and new content has been migrated to a new location. Apache Hive: The Apache Hive™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage and queried using SQL syntax. Built on top of Apache Hadoop™, Hive provides the following features: Tools to enable easy access to data via SQL, thus enabling data warehousi
nabinno 2019/12/27
apache-hive

documentation
Apache Hive
Apache Hive: The Apache Hive™ is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale and facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL. Hive Metastore(HMS) provid
nabinno 2019/12/27
apache-hive

apache-hadoop

structured-query-language

mapreduce
Home - HADOOP2 - Apache Software Foundation
This HADOOP2 space was migrated from the old Hadoop wiki. Please check https://cwiki.apache.org/confluence/display/HADOOP for the current information. Apache Hadoop: Apache Hadoop is a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications both reliability and data motion. Hadoop implements a computational paradigm named
nabinno 2019/12/27
apache-hadoop

documentation
Amazon EMR 5.x release versions - Amazon EMR
nabinno 2019/12/27
amazon-emr

amazon-emr-5

documentation
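Solving challenges in the medical and healthcare field | MEDLEY, INC.
MEDLEY, which is building the future of medical healthcare, aims to realize "healthcare people can be satisfied with" through businesses and projects that make use of technology.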
nabinno 2019/12/27
medley

healthcare-industry

company

public-company
Building a survival prediction model for Titanic passengers - Qiita
Introduction: This article is the day-24 entry of the Gizumo Engineer Advent Calendar 2015. I'm @suzumi, a web application engineer at Gizumo Inc., a young company founded only half a year ago. This is my second Advent Calendar article; the first was "IoT - Making an air conditioner remotely controllable from the web with node.js", so please take a look at that too if you like. Topic: First, let's get to know Kaggle. What is Kaggle: On Kaggle, companies and researchers post data, and statisticians and data analysts around the world
nabinno 2019/12/27
qiita

kaggle

python

scikit-learn

pandas

machine-learning
What to do after registering on Kaggle: 10 introductory Kernels that take you beyond Titanic (these alone are enough to compete!) - Qiita
I'm u++, a data scientist at an operating company. I regularly write articles about data analysis, such as Kaggle and natural language processing, on Hatena Blog[1]. On Kaggle in 2019, I won the "PetFinder.my Adoption Prediction"[2] competition (as part of a team) and earned a silver medal in the "Santander Value Prediction Challenge"[3] competition (solo). I hold the title "Kaggle Master"[4], and my highest Kaggle rank is 229th out of about 160,000[5]. In this article, "Ka
nabinno 2019/12/27
qiita

kaggle

machine-learning

statistics

guide
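Thyroid - Wikipedia
Diagram of the thyroid and surrounding tissues: from the top, the hyoid bone, then the larynx behind the thyroid cartilage, the pyramidal lobe of the thyroid, the left and right lobes, the isthmus of the thyroid, and the trachea. The thyroid gland is an endocrine organ located at the front of the neck. It secretes thyroid hormones (triiodothyronine, thyroxine, calcitonin, etc.). The human thyroid weighs about 15-20 g, is about 3-5 cm long vertically, and is H-shaped (or shaped like a butterfly with spread wings); it sits in the throat slightly below the thyroid cartilage, wrapping around the trachea from the front. "H-shaped" means that the left and right parts of the thyroid (called the right lobe and left lobe) extend vertically and are joined by a narrow central part (the isthmus). Developmentally, it is an organ whose tissue forms from the endoderm after fertilization.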
nabinno 2019/12/27
thyroid

immunology

endocrinology
Amazon EMR cluster error: Instance type not supported - Amazon EMR
nabinno 2019/12/27
amazon-emr

trouble
AWS Public Data Set
nabinno 2019/12/27
amazon-s3

data-set

statistics

google-books
Google Ngram Viewer - Wikipedia
The Google Ngram Viewer or Google Books Ngram Viewer is an online search engine that charts the frequencies of any set of search strings using a yearly count of n-grams found in printed sources published between 1500 and 2019[1][2][3][4] in Google's text corpora in English, Chinese (simplified), French, German, Hebrew, Italian, Russian, or Spanish.[2][5] There are also so
nabinno 2019/12/27
google-ngram-viewer

natural-language-processing

search-engine
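The "yearly count of n-grams" idea can be sketched on a toy corpus: split each year's text into n-grams and report how often the query phrase occurs relative to all n-grams of the same length that year. The corpus, the function names, and this exact normalization are assumptions for illustration, not Google's data or pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    # All contiguous n-token sequences in the token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Relative frequency of `phrase` among all n-grams of its length, per year.

    `corpus` maps year -> list of documents (strings); a toy stand-in only.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, docs in sorted(corpus.items()):
        counts = Counter()
        for doc in docs:
            counts.update(ngrams(doc.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


corpus = {
    1900: ["the big data era", "data era"],
    2000: ["big data big data"],
}
freq = yearly_frequency(corpus, "big data")
print(freq)
```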
Manage Amazon EMR clusters - Amazon EMR
nabinno 2019/12/27
amazon-emr

cluster-manager

documentation
Plan, configure and launch Amazon EMR clusters - Amazon EMR
nabinno 2019/12/27
amazon-emr

cluster-manager

documentation
Different ways to get data into Amazon EMR - Amazon EMR
nabinno 2019/12/27
amazon-emr

documentation
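Tuple - Wikipedia
In mathematics, an ordered tuple (English: ordered tuplet, ordered list, etc.), or simply a tuple (tuple, tuplet, etc.), usually means a sequence of finite length. In particular, for a non-negative integer n, n objects arranged (or numbered) in order are called an n-tuple (the arranged objects are then called the "elements" or "components" of the n-tuple). There is exactly one 0-tuple, meaning "arranging nothing"; depending on context it is called the empty set, the empty sequence, the empty list, and so on. A 1-tuple (or single) is by definition a set with only one element, a sequence with only one term, a space with only one point, and so on; strictly speaking it differs from its sole element (that element, term, or point), but nevertheless in many cases it is identified with its sole element, or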
nabinno 2019/12/27
tuple

set-theory

type-theory

data-structure
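Python's built-in tuple is a convenient model for the definitions above: there is an empty 0-tuple, a 1-tuple is a different value from its only component (even though mathematics often identifies them), and order matters, unlike in a set. A small illustrative sketch; the variable names are arbitrary.

```python
# The 0-tuple: a tuple with no components (Python's empty tuple).
empty = ()
assert len(empty) == 0 and empty == tuple()

# A 1-tuple is a distinct value from its only component,
# even though mathematics often identifies the two.
x = 42
single = (x,)
assert single != x
assert single[0] == x

# An n-tuple is ordered: the same components in another order
# give a different tuple, unlike a set.
assert (1, 2, 3) != (3, 2, 1)
assert {1, 2, 3} == {3, 2, 1}

# Components are addressed by position (0-based in Python).
point = (3.0, 4.0, 12.0)
assert len(point) == 3 and point[2] == 12.0
```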
Template matching with scikit-image - Qiita
There is a function named exactly for this: match_template. skimage.feature.match_template(image, template, pad_input=False, mode='constant', constant_values=0) Official documentation: Module: feature — skimage docs; Template Matching — skimage docs. Usage example 1: just pass the input image as the first argument and the template image as the second. import matplotlib.pyplot as plt import numpy as np import skimage import skimage.feature import skimage.io from matplotlib.patches import Rectangle img =
nabinno 2019/12/27
qiita

python

scikit-image

image-processing

skimage.feature.match_template
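skimage.feature.match_template scores every template position by normalized cross-correlation. To show what that score is, here is a brute-force pure-Python version on nested lists; `match_template_ncc` and `best_match` are hypothetical names, the loop is far slower than scikit-image's implementation, and scoring flat (zero-variance) patches as 0 is an assumption of this sketch.

```python
import math


def match_template_ncc(image, template):
    """Slide `template` over `image` (2-D lists of numbers) and return a map of
    zero-normalized cross-correlation scores in [-1, 1], one per valid
    top-left position (same shape as match_template with pad_input=False)."""
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))
    scores = []
    for r in range(ih - th + 1):
        row_scores = []
        for c in range(iw - tw + 1):
            patch = [image[r + i][c + j] for i in range(th) for j in range(tw)]
            p_mean = sum(patch) / len(patch)
            p_dev = [v - p_mean for v in patch]
            p_norm = math.sqrt(sum(d * d for d in p_dev))
            num = sum(a * b for a, b in zip(t_dev, p_dev))
            denom = t_norm * p_norm
            # Flat patches (zero variance) score 0 instead of dividing by zero.
            row_scores.append(num / denom if denom else 0.0)
        scores.append(row_scores)
    return scores


def best_match(scores):
    # (row, col) of the top-left corner with the highest score.
    positions = ((r, c) for r in range(len(scores)) for c in range(len(scores[0])))
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])


image = [
    [0, 0, 0, 0, 0],
    [0, 1, 2, 0, 0],
    [0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[1, 2], [3, 4]]
scores = match_template_ncc(image, template)
print(best_match(scores))  # (1, 1)
```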
GitHub - scikit-image/scikit-image: Image processing in Python
nabinno 2019/12/27
github

scikit-image

python

image-processing
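3.3. Scikit-image: image processing — Scipy lecture notes
3.3. Scikit-image: image processing. Author: Emmanuelle Gouillart. scikit-image is a Python image library specialized in image processing that natively handles NumPy arrays as image objects. This chapter covers how to use scikit-image for a variety of image processing tasks, and how it works together with other scientific Python modules such as NumPy and SciPy. See also: many simple operations, such as basic image manipulation like cropping or simple filtering, can also be done with NumPy and SciPy; see "Image manipulation and processing using Numpy and Scipy". Before reading this chapter you should be familiar with the contents of the previous chapter; basic operations such as masks and labels are needed as preparation.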
nabinno 2019/12/27
emmanuelle-gouillart

scikit-image

python

image-processing

numpy
scikit-image: Image processing in Python — scikit-image
scikit-image is a collection of algorithms for image processing. It is available free of charge and free of restriction. We pride ourselves on high-quality, peer-reviewed code, written by an active community of volunteers. If you find this project useful, please cite: Stéfan van der Walt, Johannes L. Schönberger, Juan Nunez-Iglesias, François Boulogne, Joshua D. Warner, Neil Yager, Emmanu
nabinno 2019/12/27
scikit-image

python

image-processing
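Convolutional neural network - Wikipedia
A convolutional neural network (CNN or ConvNet) is a general term for neural networks that use convolution. CNNs are used for image recognition, video recognition, spoken-language translation[1], recommender systems[2], natural language processing[3], computer shogi[4], computer Go[4], and more. The definition of a convolutional neural network is not strictly fixed, but for multi-class classification of 2D images (height, width, color) in image recognition, the basic form is written as the following pseudocode[5], from which various variants are built. The basic form uses cross-entropy as the loss function and learns the parameters by stochastic gradient descent; for the partial derivatives, see automatic differentiation. Repeat the following: convolutional layer and activation function, max pooling, vec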
nabinno 2019/12/27
convolutional-neural-network

image-analysis

image-processing

machine-learning
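The repeated "convolutional layer and activation function, then max pooling" block in that pseudocode can be sketched for one single-channel image and one hand-picked 2×2 filter. This is an illustration under assumptions (no learned weights, no bias, no padding, stride 1, hypothetical function names), and like most deep-learning libraries it computes cross-correlation, i.e. the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNN layers (strictly, cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def relu(fmap):
    # Element-wise activation: negative responses become 0.
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    # Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped.
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[1, 1, 0, 0, 0] for _ in range(5)]
edge_kernel = [[1, -1], [1, -1]]  # hand-picked vertical-edge detector

fmap = relu(conv2d(image, edge_kernel))
pooled = max_pool2x2(fmap)
print(pooled)  # [[2, 0], [2, 0]]
```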
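Image analysis - Wikipedia
Image analysis itself was performed even before computers were used, for example character reading by optical computation and image-quality improvement of astronomical and satellite photographs. Optical character recognition (OCR), which recognizes characters in image data and converts them into text data, is also a kind of image analysis, and its range of applications has been expanding in recent years thanks to advances in machine learning[1]. It used to require dedicated hardware and specialist knowledge, but image analysis services built on cloud computing that can be used without specialist knowledge, such as Google Cloud Vision API, are now also offered[2][3][4].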
nabinno 2019/12/27
image-analysis

image-processing

machine-learning
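The challenges India's OYO Hotel problem, called "fraudulent", poses for the lodging industry [Hisanori Nagayama's Lodging Industry Insider] - TRAICY
A contract dispute between OYO Hotel Japan, the Japanese arm of OYO Hotels & Homes set up by the India-born hotel chain, and small and medium-sized hotels that signed franchise (FC) contracts for its "OYO Hotel" brand has stirred controversy. The Japan Lodging Facility Revitalization Organization foundation (JALF) has also taken the situation seriously, issuing an official statement and pledging support should a victims' association be formed. With the selling point that "you can join the world's second-largest hotel chain by room count as a franchisee, receive capital investment funds, and get a minimum income guarantee for a set period," the offer became a talking point among owners of small and medium-sized hotels and ryokan, and so-called early adopters jumped at it. However, there were reports of unilateral changes to contract terms and of unpaid or reduced revenue guarantees, which set off a series of public accusations online from multiple small and medium-sized hotels. According to one accusation, hotels that had received capital investment from OYO, upon termination of the contract,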
nabinno 2019/12/27
hisanori-nagayama

oyo

hotel

tourism

market-trend
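Part 1: Basics of encryption
Encryption technology is indispensable for protecting information and for computer security. This installment explains the basics of encryption technology: the fundamentals of encryption, the security of ciphers, and symmetric-key and public-key cryptography. Encryption technology is used in many situations: encrypting files and data, HTTPS, secure wireless LAN communication such as WEP/WPA/TKIP/AES, certificates and digital signatures, PKI, and more. Starting with this installment, this series will spend a while explaining the basics of encryption technology for beginning IT professionals, covering encryption fundamentals, symmetric-key cryptography, public-key cryptography, certificates, PKI, and so on. This time, we explain the basics of encryption. What is encryption: If you only want to protect data, there are methods other than encryption, such as "file permission attributes (e.g. read-prohibited attributes)" and "access control (ACL)". These, depending on the accessing user,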
nabinno 2019/12/27
atmarkit

cryptography

security-engineering

guide
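The difference between symmetric-key and public-key encryption can be shown with two deliberately insecure toy ciphers: repeating-key XOR, where one shared key both encrypts and decrypts, and textbook RSA with tiny primes, where the public key (e, n) encrypts and only the private exponent d decrypts. Neither is usable for real data; the parameters are the standard textbook values, chosen here for readability.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same shared key encrypts and decrypts.
    Repeating-key XOR is trivially breakable; for illustration only."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Toy RSA key pair with textbook-sized primes (real moduli are 2048+ bits).
p, q = 61, 53
n = p * q                # public modulus: 3233
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent, coprime with phi
d = pow(e, -1, phi)      # private exponent: 2753 (modular inverse, Python 3.8+)


def rsa_encrypt(m: int) -> int:
    # Anyone holding the public key (e, n) can encrypt.
    return pow(m, e, n)


def rsa_decrypt(c: int) -> int:
    # Only the holder of the private exponent d can decrypt.
    return pow(c, d, n)


secret = xor_cipher(b"hello", b"key")
print(xor_cipher(secret, b"key"))  # b'hello'
print(rsa_encrypt(65))  # 2790
print(rsa_decrypt(2790))  # 65
```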
False sharing - Wikipedia
In computer science, false sharing is a performance-degrading usage pattern that can arise in systems with distributed, coherent caches at the size of the smallest resource block managed by the caching mechanism. When a system participant attempts to periodically access data that is not being altered by another party, but that data shares a cache block with data that is being altered, the caching
nabinno 2019/12/27
false-sharing

cache-coherency

cache

memory-management

performance-engineering
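Whether two variables can falsely share depends only on whether their addresses fall in the same coherence block (cache line). Python cannot demonstrate the slowdown itself, but it can show the address arithmetic. A sketch assuming 64-byte lines (common on x86-64, but an assumption here), hypothetical line-aligned addresses, and hypothetical helper names; padding each per-thread counter to its own line is the usual fix.

```python
CACHE_LINE_SIZE = 64  # bytes; assumed line size


def cache_line(address: int, line_size: int = CACHE_LINE_SIZE) -> int:
    # Index of the cache line (coherence block) holding this byte address.
    return address // line_size


def falsely_shared(addr_a: int, addr_b: int, line_size: int = CACHE_LINE_SIZE) -> bool:
    """True if two distinct variables sit in the same cache line, so a write to
    one invalidates other cores' cached copies of the other."""
    return addr_a != addr_b and cache_line(addr_a, line_size) == cache_line(addr_b, line_size)


def padded_offsets(count: int, line_size: int = CACHE_LINE_SIZE) -> list:
    # Byte offsets that give each per-thread counter its own line
    # (assumes the base address is line-aligned).
    return [i * line_size for i in range(count)]


base = 0x1000  # hypothetical line-aligned base address
# Two 8-byte counters packed next to each other: same line, false sharing.
print(falsely_shared(base, base + 8))  # True
# The same counters padded to separate lines: no false sharing.
a, b = (base + off for off in padded_offsets(2))
print(falsely_shared(a, b))  # False
```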
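Cache coherency - Wikipedia
Conceptual diagram of multiple caches sharing a resource. Cache coherency refers to the consistency of multiple caches of a shared resource, and is a kind of memory consistency. When multiple clients hold caches of a shared memory resource, data can become inconsistent between the caches. This problem is especially pronounced between the CPUs of a multiprocessing system. In the figure on the right, if the upper client has previously read some part of memory and holds a copy in its cache, and the lower client then updates that same part of memory, the upper client's cached copy becomes invalid unless the update is communicated to it somehow. Cache coherency deals with such situations and keeps caches and memory consistent. To maintain consistency, the behavior of reads and writes to the same memory location is defined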
nabinno 2019/12/27
cache-coherency

cache

memory-coherency

memory-management
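The stale-copy problem described above is what write-invalidate protocols solve. Below is a toy simulation of MSI (Modified / Shared / Invalid), one common such protocol; the excerpt does not name a specific protocol, so MSI is a swapped-in example. It is heavily simplified: a single memory word, an atomic bus, no timing, and hypothetical class names.

```python
class Line:
    """One core's cached copy of the word."""

    def __init__(self):
        self.state = "I"  # M = modified, S = shared, I = invalid
        self.value = None


class Bus:
    """Toy snooping bus running a write-invalidate MSI protocol for one word."""

    def __init__(self, n_cores, initial=0):
        self.memory = initial
        self.caches = [Line() for _ in range(n_cores)]

    def read(self, core):
        line = self.caches[core]
        if line.state == "I":
            # Miss: a cache holding the word modified writes it back and drops to S.
            for other in self.caches:
                if other.state == "M":
                    self.memory = other.value
                    other.state = "S"
            line.value, line.state = self.memory, "S"
        return line.value

    def write(self, core, value):
        # Invalidate every other copy before writing.
        for i, other in enumerate(self.caches):
            if i != core and other.state != "I":
                if other.state == "M":
                    self.memory = other.value  # write back dirty data first
                other.state = "I"
        line = self.caches[core]
        line.value, line.state = value, "M"


bus = Bus(n_cores=2)
bus.read(0)         # core 0 caches the value 0 in state S
bus.write(1, 7)     # core 1 writes; core 0's copy is invalidated (I)
print(bus.read(0))  # 7: core 0 misses, core 1 writes back, both end in S
```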
Impostor syndrome - Wikipedia
Impostor syndrome is a tendency to be unable to internally accept one's own achievements and to feel like a fraud; it is generally common among socially successful people. It is also called charlatan syndrome, the impostor experience, or fraud syndrome. The term was coined in 1978 by psychologists Pauline R. Clance and Suzanne A. Imes[1]. People with this syndrome, despite external evidence of their competence, believe they are frauds who do not deserve their success. They dismiss their success as mere luck or timing, or as having made others believe they are more capable than they really are
nabinno 2019/12/27
impostor-syndrome

neuroscience

physiology

personal-development