This repository was archived by the owner on Feb 23, 2023. It is now read-only.

Commit cb197a6

jcx120 and Shaptic authored
Create Ingestion-filtering.mdx (#693)
* Create Ingestion-filtering.mdx
  Ingestion Filtering public doc draft
* Update Ingestion-filtering.mdx
  Updating spacing around <CodeExample>
* Update Ingestion-filtering.mdx
* Update Ingestion-filtering.mdx
  attempting to debug build failure
* Apply suggestions from code review
  Co-authored-by: George <Shaptic@users.noreply.github.com>
* Update Ingestion-filtering.mdx
  removed space prefix
* Update Ingestion-filtering.mdx
  Removed <CodeExample> tags, which seem unnecessary

Co-authored-by: George <Shaptic@users.noreply.github.com>
1 parent d17d178 commit cb197a6

1 file changed: +121 -0 lines changed

@@ -0,0 +1,121 @@
1+
---
2+
title: Ingestion Filtering
3+
order: 46
4+
---
5+
6+
The Ingestion Filtering feature is released for public beta testing and is available in Horizon [version 2.18.0](https://github.com/stellar/go/releases/tag/horizon-v2.18.0) and later.
7+
8+
## Overview
9+
10+
Ingestion Filtering enables Horizon operators to drastically reduce the storage footprint of their Horizon DB by whitelisting the Assets and/or Accounts that are relevant to their operations. The feature is best suited to private Horizon operators who do not need the full history of every asset and account on the Stellar network.
11+
12+
### Why is it useful:
13+
14+
Previously, the only way to limit data storage was to limit the amount of history Horizon ingests, either by configuring the starting ledger to be later than the genesis block or via rolling retention (e.g., the last 30 days). This feature instead allows users to store the full history of the assets and accounts (and related entities) that they care about.
15+
16+
For further context, running a full-history Horizon instance takes ~15 TB of disk space (as of June 2022), with storage growing at a rate of ~1 TB/month. As a benchmark, filtering by even 100 of the most active accounts and assets reduces storage by over 90%. For the majority of users, who care about an even more limited set of assets and accounts, storage savings should be well over 99%. Other benefits include lower operating costs for maintaining storage, improved DB health metrics, and better query performance.
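To make the figures above concrete, here is a small arithmetic sketch. It uses only the numbers quoted in the text (treating 1 TB as 1,000 GB) and assumes, purely for illustration, that monthly growth shrinks by the same ratio as total storage.

```python
def remaining_storage_gb(full_gb: int, reduction_pct: int) -> int:
    """Return the storage left after reducing `full_gb` by `reduction_pct` percent."""
    return full_gb * (100 - reduction_pct) // 100


FULL_HISTORY_GB = 15_000    # ~15 TB full history (June 2022 figure from the text)
MONTHLY_GROWTH_GB = 1_000   # ~1 TB/month growth (from the text)

# Upper bounds, since the text says "over 90%" and "well over 99%".
print(remaining_storage_gb(FULL_HISTORY_GB, 90))    # 1500
print(remaining_storage_gb(FULL_HISTORY_GB, 99))    # 150
# Assumption: growth is reduced by the same ratio as total storage.
print(remaining_storage_gb(MONTHLY_GROWTH_GB, 90))  # 100
```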
17+
18+
### How does it work:
19+
20+
This feature provides the ability to select which ledger transactions are accepted at ingestion time and stored in Horizon’s historical database. Filter whitelists are maintained via an admin REST API (and persisted in the DB). The ingestion process checks the whitelists and persists only the transactions related to whitelisted Accounts and Assets. Note that the feature does not filter the current state of the ledger and its related DB tables, only the history tables.
21+
22+
Whitelisting can include the following supported entities:
23+
24+
- Account id
25+
- Asset id (canonical)
26+
27+
Because all transactions related to the whitelisted entities are included, all historical time series data related to those transactions is saved in Horizon's history DB as well. For example, whitelisting an Asset will also persist all Accounts that interact with that Asset. Conversely, if an Account is whitelisted, all Assets held by that Account will also be included.
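The selection rule described above can be pictured with a minimal sketch. This is not Horizon's implementation: the `Transaction` class, the `should_persist` function, and the placeholder ids are hypothetical. It also assumes a transaction is kept when it matches either whitelist.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Transaction:
    """Hypothetical, simplified view of a ledger transaction."""
    tx_id: str
    accounts: Set[str] = field(default_factory=set)  # account ids the transaction touches
    assets: Set[str] = field(default_factory=set)    # canonical asset ids it touches


def should_persist(tx: Transaction, account_whitelist: Set[str], asset_whitelist: Set[str]) -> bool:
    """Keep a transaction's history if it touches any whitelisted account or asset."""
    return bool(tx.accounts & account_whitelist) or bool(tx.assets & asset_whitelist)


# Placeholder ids for illustration only; real values are Stellar account ids
# and canonical asset ids.
payment = Transaction("tx1", accounts={"ACCOUNT_A", "ACCOUNT_B"}, assets={"USD:ISSUER_X"})
unrelated = Transaction("tx2", accounts={"ACCOUNT_C"}, assets={"EUR:ISSUER_Y"})

asset_whitelist = {"USD:ISSUER_X"}
print(should_persist(payment, set(), asset_whitelist))    # True
print(should_persist(unrelated, set(), asset_whitelist))  # False
```

Because `payment` is kept as a whole, the history of `ACCOUNT_A` and `ACCOUNT_B` for that transaction is kept too, even though neither account is whitelisted.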
28+
29+
## Configuration:
30+
The filters and their configuration are optional features and must be enabled with Horizon command-line flags or environment variables. As command-line flags:
31+
32+
```
--admin-port=<port>
```
35+
36+
and
37+
38+
```
--exp-enable-ingestion-filtering=true
```
41+
42+
As environment variables:
43+
44+
```
ADMIN_PORT=<port>
```
47+
48+
and
49+
50+
```
EXP_ENABLE_INGESTION_FILTERING=true
```
53+
54+
These should be set in addition to the standard ingestion parameters that are required for the ingestion engine to run, such as `ingest=true`.
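The flag names and environment variable names above follow a simple pattern: upper-case the flag and replace hyphens with underscores, as in `admin-port` and `ADMIN_PORT`. The sketch below applies that pattern to build the environment for a filtering-enabled Horizon. `flag_to_env` and `filtering_env` are hypothetical helper names, and port `4200` is an arbitrary example value.

```python
from typing import Dict


def flag_to_env(flag: str) -> str:
    """Map a command-line flag name to its environment variable form."""
    return flag.lstrip("-").replace("-", "_").upper()


def filtering_env(admin_port: int) -> Dict[str, str]:
    """Environment variables that enable ingestion and ingestion filtering."""
    return {
        flag_to_env("ingest"): "true",
        flag_to_env("admin-port"): str(admin_port),
        flag_to_env("exp-enable-ingestion-filtering"): "true",
    }


for name, value in filtering_env(4200).items():
    print(f"{name}={value}")
```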
55+
Once these flags are included at Horizon runtime, the filter configurations and their rules are initially empty, and the filters are disabled by default. To enable the filters, update their configuration through the Admin API. The Admin API docs are published as an OpenAPI 3.0 document on the admin port at `http://localhost:<admin_port>/`. You can paste the contents from that URL into any OpenAPI tool, such as [Swagger](https://editor.swagger.io/), which will render a visual explorer of the API endpoints. Follow the details and examples there for these endpoints:
56+
57+
```
/ingestion/filters/account
/ingestion/filters/asset
```
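As a rough sketch, the full URLs for these endpoints can be assembled from the admin port as shown below. The helper names and the example port `4200` are assumptions for illustration. The code only builds request objects, and sending them requires a running Horizon with the admin port enabled.

```python
import urllib.request

FILTER_KINDS = ("account", "asset")


def admin_base_url(admin_port: int, host: str = "localhost") -> str:
    """Base URL of the admin server; the OpenAPI doc is served at its root."""
    return f"http://{host}:{admin_port}"


def filter_url(admin_port: int, kind: str, host: str = "localhost") -> str:
    """URL of the account or asset filter endpoint."""
    if kind not in FILTER_KINDS:
        raise ValueError(f"unknown filter kind: {kind!r}")
    return f"{admin_base_url(admin_port, host)}/ingestion/filters/{kind}"


def get_request(url: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET request."""
    return urllib.request.Request(url, method="GET")


ADMIN_PORT = 4200  # arbitrary example value
for kind in FILTER_KINDS:
    print(filter_url(ADMIN_PORT, kind))

# To actually read the current config from a running Horizon:
#   with urllib.request.urlopen(get_request(filter_url(ADMIN_PORT, "asset"))) as resp:
#       print(resp.read().decode("utf-8"))
```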
61+
62+
## Operation:
63+
64+
Entities are added and removed by submitting PUT requests to the Admin API served at `http://localhost:<admin_port>/`.
65+
66+
To add new filtered entities, submit an `HTTP PUT` request to the admin API endpoint for either the Asset or the Account filter. The PUT request body is JSON that expresses the filter rules. Currently the rules model is a whitelist, expressed as a JSON string array.
67+
To remove entities, submit an `HTTP PUT` request to update the list accordingly.
68+
To retrieve what is currently configured, submit an `HTTP GET` request.
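A minimal sketch of these requests follows. The body shape, with a `whitelist` string array and an `enabled` boolean, is an assumption based on the description in this document and should be checked against the published OpenAPI doc. The port `4200` and the asset ids are placeholders. The code builds the requests without sending them.

```python
import json
import urllib.request
from typing import Iterable, List


def put_filter_request(admin_port: int, kind: str, whitelist: Iterable[str],
                       enabled: bool = True) -> urllib.request.Request:
    """Build an HTTP PUT that replaces the whole filter config for `kind`."""
    # Assumed body shape: {"whitelist": [...], "enabled": true}
    body = {"whitelist": list(whitelist), "enabled": enabled}
    return urllib.request.Request(
        f"http://localhost:{admin_port}/ingestion/filters/{kind}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def without(whitelist: Iterable[str], entity: str) -> List[str]:
    """Return the whitelist with one entity removed."""
    return [item for item in whitelist if item != entity]


current = ["USD:ISSUER_X", "EUR:ISSUER_Y"]  # placeholder canonical asset ids

# Adding an entity: PUT the current list plus the new entry.
add_req = put_filter_request(4200, "asset", current + ["GBP:ISSUER_Z"])
# Removing an entity: PUT the current list minus that entry.
remove_req = put_filter_request(4200, "asset", without(current, "EUR:ISSUER_Y"))

print(add_req.get_method(), add_req.full_url)
print(remove_req.data.decode("utf-8"))
```

Since both adding and removing are PUT requests carrying the full list, it is safest to read the current configuration with a GET first and then send back the modified list.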
69+
70+
The OpenAPI doc published by the Admin Server can also be pulled directly from the GitHub repo [here](https://github.com/stellar/go/blob/horizon-v2.18.0/services/horizon/internal/httpx/static/admin_oapi.yml).
71+
72+
### Reverting Options:
73+
74+
1. Disable both the Asset and Account filter config rules via the [Admin API](https://github.com/stellar/go/blob/master/services/horizon/internal/httpx/static/admin_oapi.yml) by setting `enabled=false` in each filter rule, or set `--exp-enable-ingestion-filtering=false`. Either option opens up forward ingestion to include all data again. It is then your choice whether to run a re-ingestion to capture older data that was dropped by the filters but can now be re-imported with the filters off, e.g. `stellar-horizon db reingest range <from_ledger> <to_ledger>`.
75+
76+
2. If you have a DB backup:
77+
- restore the DB
78+
- run a Reingestion Gap Fill command to fill in the gaps to current tip of the chain
79+
- resume Ingestion Sync
80+
81+
3. Start over with a fresh DB (or see Patching Historical Data below)
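For option 1, disabling a filter through the Admin API can leave its whitelist in place, so the filter can be re-enabled later without rebuilding the list. The sketch below assumes a config shape with `whitelist` and `enabled` fields, which should be verified against the OpenAPI doc. `disabled_copy` is a hypothetical helper and the ids are placeholders.

```python
from typing import Any, Dict


def disabled_copy(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a filter config with the filter switched off."""
    updated = dict(config)
    updated["enabled"] = False
    return updated


# Stand-ins for the configs a GET on each filter endpoint would return.
current = {
    "account": {"whitelist": ["ACCOUNT_A"], "enabled": True},
    "asset": {"whitelist": ["USD:ISSUER_X"], "enabled": True},
}

# Bodies to PUT back to /ingestion/filters/account and /ingestion/filters/asset.
to_put = {kind: disabled_copy(config) for kind, config in current.items()}
print(to_put)
```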
82+
83+
### Patching Historical Data:
84+
If new Assets or Accounts are added to the whitelist and you would like to patch in their missing historical data, you can run re-ingestion. The re-ingestion process is idempotent: it re-ingests the data from the designated ledger range and overwrites existing data or inserts data that is not already in the current DB.
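A patch run boils down to one re-ingestion command over the ledger range of interest. The sketch below builds that command as an argument list. It assumes the `stellar-horizon db reingest range <from> <to>` form, which is worth confirming with `stellar-horizon db reingest --help` for your version. The ledger numbers are placeholders.

```python
from typing import List


def reingest_command(from_ledger: int, to_ledger: int) -> List[str]:
    """Build the argv for re-ingesting an inclusive ledger range."""
    if from_ledger < 1 or to_ledger < from_ledger:
        raise ValueError("expected 1 <= from_ledger <= to_ledger")
    # Assumed CLI form; verify against your Horizon version.
    return ["stellar-horizon", "db", "reingest", "range", str(from_ledger), str(to_ledger)]


cmd = reingest_command(1_000_000, 1_050_000)  # placeholder ledger numbers
print(" ".join(cmd))

# To execute on a host where Horizon is installed and configured:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Because re-ingestion is idempotent, re-running the same range after a failure is safe.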
85+
86+
## Sample Use Case:
87+
As an Asset Issuer, I have issued 4 assets and am interested in all transaction data related to those assets including customer Accounts that interact with those assets and the following:
88+
- Operations
89+
- Effects
90+
- Payments
91+
- Claimable balances
92+
- Trades
93+
94+
I would like to store the full history of all transactions related to those assets, starting from their genesis.
95+
96+
### Pre-requisites:
97+
You have an existing Horizon instance that is installed, configured, and has at least forward ingestion enabled, so that it can successfully sync to the current state of the Stellar network. It is a bonus if you are familiar with running re-ingestion.
98+
99+
Steps:
100+
101+
1. Configure the 4 whitelisted Assets via the Admin API. Also check `HISTORY_RETENTION_COUNT` and set it to `0` if you no longer want any history purged now that you are filtering; otherwise Horizon will continue to reap all data older than the retention count.
102+
103+
2. Decide whether you want to wipe the existing history data in the DB before filtering starts running. You can effectively clear the history by running
104+
105+
```
HISTORY_RETENTION_COUNT=1 stellar-horizon db reap
```
108+
109+
or by dropping and re-creating the DB and then running `stellar-horizon db init`.
110+
111+
Alternatively, if you do not need to free up the old history tables, you can effectively stop here. Whenever filter rules are changed or enabled, the history tables immediately reflect data filtered per the latest rules, from the time the filter config is updated onward.
112+
113+
3. If starting with a fresh DB, decide whether you want to re-run ingestion from the earliest ledger number related to the whitelisted entities, to populate history with just the data allowed by the filters.
114+
- Tip: To find this ledger number, you can check for the earliest transaction of the Account issuing that asset.
115+
- Also consider running parallel workers to speed up the process.
116+
117+
4. Optional: When re-ingestion is finished, run an ingestion gap fill with `stellar-horizon db fill-gaps` to fill any gaps that may have been missed.
118+
119+
5. Verify that your data is there
120+
- Do a spot check of Accounts that should have been automatically ingested, comparing against a full-history Horizon instance such as SDF Horizon.
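For step 3, running parallel workers means dividing the ledger range into chunks that can be re-ingested independently. Horizon's re-ingestion command has its own support for this, so the sketch below only illustrates the idea. `split_range` is a hypothetical helper and the ledger numbers are placeholders.

```python
from typing import List, Tuple


def split_range(from_ledger: int, to_ledger: int, workers: int) -> List[Tuple[int, int]]:
    """Split an inclusive ledger range into at most `workers` contiguous chunks."""
    if workers < 1 or to_ledger < from_ledger:
        raise ValueError("expected workers >= 1 and from_ledger <= to_ledger")
    total = to_ledger - from_ledger + 1
    size = -(-total // workers)  # ceiling division
    return [
        (start, min(start + size - 1, to_ledger))
        for start in range(from_ledger, to_ledger + 1, size)
    ]


print(split_range(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
print(split_range(1, 10, 3))   # [(1, 4), (5, 8), (9, 10)]
```

The chunks do not overlap and together cover the whole range, so each one can be handed to a separate re-ingestion run.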
121+
