Commit bf0c423

First version of Pregel Notebook.
1 parent 80d500b commit bf0c423

1 file changed

+72
-50
lines changed

notebooks/Pregel.ipynb

Lines changed: 72 additions & 50 deletions
Original file line number | Diff line number | Diff line change
@@ -36,39 +36,27 @@
3636
"\n",
3737
"*“Many practical computing problems concern large graphs.”*\n",
3838
"\n",
39-
"Distributed graph processing enables you to do online analytical processing directly on graphs stored in ArangoDB. This is intended to help you gain analytical insights on your data, without having to use external processing systems. Examples of algorithms to execute are PageRank, Vertex Centrality, Vertex Closeness, Connected Components, Community Detection.\n",
40-
"\n",
41-
"Check out the hands-on ArangoDB Pregel Tutorial to learn more.\n",
42-
"\n",
43-
"The processing system inside ArangoDB is based on: Pregel: A System for Large-Scale Graph Processing – Malewicz et al. (Google), 2010. This concept enables us to perform distributed graph processing, without the need for distributed global locking.\n",
44-
"\n",
45-
"This system is not useful for typical online queries, where you just work on a small set of vertices. These kind of tasks are better suited for AQL traversals.\n",
46-
"\n",
47-
"Pregel support since 2014\n",
48-
"Predefined algorithms\n",
49-
"Could be extended via C++\n",
50-
"\n",
51-
" https://www.arangodb.com/docs/stable/graphs-pregel.html\n",
52-
"\n",
53-
"https://www.arangodb.com/docs/stable/graphs-pregel.html#available-algorithms\n",
54-
"\n",
55-
"\n",
39+
"Distributed graph processing enables you to do online analytical processing directly on graphs stored in ArangoDB. This is intended to help you gain analytical insights on your data, without having to use external processing systems.\n",
40+
"[The processing system](https://www.arangodb.com/docs/stable/graphs-pregel.html) inside ArangoDB is based on Google's Pregel framework: [Pregel: A System for Large-Scale Graph Processing](http://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/sigmod10/p135-malewicz.pdf). This concept enables us to perform distributed graph processing, without the need for distributed global locking.\n",
5641
"\n",
42+
"Currently, ArangoDB supports the [following algorithms out of the box](https://www.arangodb.com/docs/stable/graphs-pregel.html#available-algorithms) (for custom algorithms, see the note about Custom Pregel below):\n",
5743
"* PageRank\n",
5844
"* Seeded PageRank\n",
5945
"* Single-Source Shortest Path\n",
6046
"* Connected Components:\n",
6147
" * WeaklyConnected\n",
6248
" * StronglyConnected\n",
63-
"*Hyperlink-Induced Topic Search (HITS)Permalink\n",
64-
"*Vertex Centrality\n",
49+
"* Hyperlink-Induced Topic Search (HITS)\n",
50+
"* Vertex Centrality\n",
6551
"* Effective Closeness\n",
6652
"* LineRank\n",
6753
"* Label Propagation\n",
6854
"* Speaker-Listener Label Propagation\n",
6955
"\n",
7056
"\n",
71-
"Best with SMART Graphs https://www.arangodb.com/enterprise-server/smartgraphs/\n"
57+
"Pregel is not useful for typical online queries, where you just work on a small set of vertices. These kinds of tasks are better suited to AQL traversals.\n",
58+
"\n",
59+
"Furthermore, for best performance, Pregel should be used in combination with [SMART Graphs (Enterprise feature)](https://www.arangodb.com/enterprise-server/smartgraphs/).\n"
7260
]
7361
},
7462
{
@@ -135,11 +123,7 @@
135123
"cell_type": "code",
136124
"execution_count": null,
137125
"metadata": {
138-
"colab": {
139-
"base_uri": "https://localhost:8080/"
140-
},
141-
"id": "c_6RZVqu3JW6",
142-
"outputId": "4b6b66d9-6f16-48e1-958e-fd4a425dbf1d"
126+
"id": "c_6RZVqu3JW6"
143127
},
144128
"outputs": [],
145129
"source": [
@@ -154,11 +138,7 @@
154138
"cell_type": "code",
155139
"execution_count": null,
156140
"metadata": {
157-
"colab": {
158-
"base_uri": "https://localhost:8080/"
159-
},
160-
"id": "uGptRNz93JW7",
161-
"outputId": "44b36e45-b5c0-429e-cf56-eef6e6d38d6e"
141+
"id": "uGptRNz93JW7"
162142
},
163143
"outputs": [],
164144
"source": [
@@ -186,15 +166,20 @@
186166
"## Import Data"
187167
]
188168
},
169+
{
170+
"cell_type": "markdown",
171+
"metadata": {
172+
"id": "5q-sYHDqvMAu"
173+
},
174+
"source": [
175+
"Let us start by creating an empty graph:"
176+
]
177+
},
189178
{
190179
"cell_type": "code",
191180
"execution_count": null,
192181
"metadata": {
193-
"colab": {
194-
"base_uri": "https://localhost:8080/"
195-
},
196-
"id": "FCTkPLNF9-vS",
197-
"outputId": "57a5178c-16ee-48cb-946c-da4995fd71e8"
182+
"id": "FCTkPLNF9-vS"
198183
},
199184
"outputs": [],
200185
"source": [
@@ -210,15 +195,20 @@
210195
"print(school.edge_definitions())"
211196
]
212197
},
198+
{
199+
"cell_type": "markdown",
200+
"metadata": {
201+
"id": "W5R-S4V7vR8z"
202+
},
203+
"source": [
204+
"Next, we create a Pregel job on an (empty) graph:"
205+
]
206+
},
213207
{
214208
"cell_type": "code",
215209
"execution_count": null,
216210
"metadata": {
217-
"colab": {
218-
"base_uri": "https://localhost:8080/"
219-
},
220-
"id": "siA4yJGd8HvE",
221-
"outputId": "d6c88ce4-bf27-4728-da41-1ab4f1d33269"
211+
"id": "siA4yJGd8HvE"
222212
},
223213
"outputs": [],
224214
"source": [
@@ -241,11 +231,18 @@
241231
"cell_type": "code",
242232
"execution_count": null,
243233
"metadata": {
244-
"colab": {
245-
"base_uri": "https://localhost:8080/"
246-
},
247-
"id": "vZSY1jrv-y6w",
248-
"outputId": "660efec7-2ab4-473c-c1b5-c51e302c0157"
234+
"id": "hzZ7u3s2vfE-"
235+
},
236+
"outputs": [],
237+
"source": [
238+
"Furthermore, we can observe the status of a given Pregel job."
239+
]
240+
},
241+
{
242+
"cell_type": "code",
243+
"execution_count": null,
244+
"metadata": {
245+
"id": "vZSY1jrv-y6w"
249246
},
250247
"outputs": [],
251248
"source": [
@@ -256,22 +253,38 @@
256253
"print(job)"
257254
]
258255
},
256+
{
257+
"cell_type": "markdown",
258+
"metadata": {
259+
"id": "AgSw2TWjvkZV"
260+
},
261+
"source": [
262+
"And even delete it:"
263+
]
264+
},
259265
{
260266
"cell_type": "code",
261267
"execution_count": null,
262268
"metadata": {
263-
"colab": {
264-
"base_uri": "https://localhost:8080/"
265-
},
266-
"id": "nygXo9HE-TOf",
267-
"outputId": "b35580ea-483e-4ab7-9686-7fad25c3cabc"
269+
"id": "nygXo9HE-TOf"
268270
},
269271
"outputs": [],
270272
"source": [
271273
" # Delete a Pregel job by ID.\n",
272274
" pregel.delete_job(job_id)"
273275
]
274276
},
277+
{
278+
"cell_type": "markdown",
279+
"metadata": {
280+
"id": "bhZ66A9dv5bR"
281+
},
282+
"source": [
283+
"# Custom Pregel\n",
284+
"\n",
285+
"So far we have looked at predefined algorithms. ArangoDB also offers a feature (experimental at the time of writing) that allows users to add and modify custom Pregel algorithms at runtime. Check out [this webinar](https://www.arangodb.com/events/arangodb-feature-preview-custom-pregel/) for more details."
286+
]
287+
},
275288
{
276289
"cell_type": "markdown",
277290
"metadata": {
@@ -281,6 +294,15 @@
281294
"# Next Steps"
282295
]
283296
},
297+
{
298+
"cell_type": "markdown",
299+
"metadata": {
300+
"id": "fIF4PVuluT9m"
301+
},
302+
"source": [
303+
"Check out the [community detection tutorial](https://www.arangodb.com/learn/graphs/pregel-community-detection/) to explore further applications of Pregel to social network analytics.\n"
304+
]
305+
},
284306
{
285307
"cell_type": "markdown",
286308
"metadata": {
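
The notebook text in this commit describes Pregel as distributed graph processing that needs no distributed global locking. As a rough illustration of why that can work, the sketch below simulates Pregel-style supersteps in plain Python for weakly connected components: each vertex reads only the messages sent to it in the previous superstep, updates only its own state, and a barrier separates supersteps. The function name `pregel_wcc`, the small example graph, and the min-label propagation scheme are assumptions chosen for illustration; this is not ArangoDB's implementation and it does not call the ArangoDB API.

```python
from collections import defaultdict


def pregel_wcc(vertices, edges, max_supersteps=50):
    """Weakly connected components via min-label propagation (illustrative only)."""
    # Weak connectivity ignores edge direction, so store edges both ways.
    neighbors = defaultdict(set)
    for src, dst in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    # Every vertex starts in its own component, labelled with its own id.
    label = {v: v for v in vertices}

    # Superstep 0: each vertex sends its label to all of its neighbours.
    inbox = defaultdict(list)
    for v in vertices:
        for n in neighbors[v]:
            inbox[n].append(label[v])

    for _ in range(max_supersteps):
        if not inbox:
            # No messages in flight: every vertex has effectively voted to halt.
            break
        outbox = defaultdict(list)
        for v, messages in inbox.items():
            smallest = min(messages)
            # A vertex touches only its own label, so no global lock is needed.
            if smallest < label[v]:
                label[v] = smallest
                for n in neighbors[v]:
                    outbox[n].append(smallest)
        # Barrier: messages sent now are only visible in the next superstep.
        inbox = outbox

    return label


if __name__ == "__main__":
    # Hypothetical graph with two components: {a, b, c} and {d, e}.
    components = pregel_wcc(
        ["a", "b", "c", "d", "e"],
        [("a", "b"), ("b", "c"), ("d", "e")],
    )
    print(components)
```

Because each vertex writes only its own label and communicates solely through messages delivered at the barrier, the per-vertex work inside a superstep could be spread across machines without shared locks; the single-process loop here only mimics that schedule.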

0 commit comments