Data Lakes Reports Documentation [DOC-359] #2136

Merged
merged 5 commits into from
Nov 15, 2021
2 changes: 2 additions & 0 deletions src/_data/sidenav/main.yml
@@ -188,6 +188,8 @@ sections:
title: Set Up Data Lakes
- path: /connections/storage/data-lakes/sync-reports
title: Sync Reports and Error Reporting
- path: /connections/storage/data-lakes/sync-history
title: Data Lakes Sync History and Health
- path: /connections/storage/data-lakes/comparison
title: Data Lakes vs. Warehouses
- section_title: Data Warehouses
51 changes: 51 additions & 0 deletions src/connections/storage/data-lakes/sync-history.md
@@ -0,0 +1,51 @@
---
title: Data Lakes Sync History and Health
---
{% include content/plan-grid.md name="data-lakes" %}

The Segment Data Lakes sync history and health tabs generate real-time information about data syncs so you can monitor the health and performance of your data lakes. These tools provide monitoring and debugging capabilities within the Data Lakes UI, so you can identify and proactively address data sync or data pipeline failures.

> info ""
> This feature is currently in Public Beta. Segment’s [First-Access and Beta terms](https://segment.com/legal/first-access-beta-preview/) govern the use of this feature.

## Sync History
The 'Sync History' table shows detailed information about the latest 100 syncs to the data lake. The table includes the following fields:
* **Sync status:** The status of the sync: 'Success' (all rows synced correctly), 'Partial Success' (some rows synced correctly), or 'Failed' (no rows synced correctly)
* **Start time:** The time the sync began
* **Duration:** How long the sync took to complete
* **Synced rows:** The number of rows that synced to the data lake
* **Notices:** Any notes or warnings about the sync
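
As a rough illustration of how the three sync statuses relate to row counts, here is a minimal Python sketch. The function name `classify_sync` and its parameters are hypothetical and not part of any Segment API. The sketch also assumes that a sync with zero attempted rows counts as 'Success', which the field description above does not specify.

```python
def classify_sync(synced_rows: int, attempted_rows: int) -> str:
    """Map row counts to the status labels shown in the Sync History table.

    Hypothetical helper for illustration only; Segment computes the
    status internally and may apply different rules.
    """
    if synced_rows < 0 or attempted_rows < 0 or synced_rows > attempted_rows:
        raise ValueError("row counts must satisfy 0 <= synced_rows <= attempted_rows")
    if synced_rows == attempted_rows:
        # All rows synced correctly (assumption: also covers an empty sync).
        return "Success"
    if synced_rows == 0:
        # No rows synced correctly.
        return "Failed"
    # Some, but not all, rows synced correctly.
    return "Partial Success"


if __name__ == "__main__":
    for synced, attempted in [(10, 10), (3, 10), (0, 10)]:
        print(synced, attempted, classify_sync(synced, attempted))
```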

Selecting a row in the Sync History table opens a sidebar showing the number of rows from each collection that synced.

To access the Sync History page from the Segment app, open the **My Destinations** page and select the data lake. On the data lakes Settings page, select the **Sync History** tab.

## Health
The Health tab provides an overview of the rows synced to your data lake, both today and on each of the last 30 days.

The bar chart, 'Daily Synced Rows,' shows an overview of the rows synced for each of the last 30 days. Hovering over a date shows the number of rows that were synced for that day. Selecting a date from the bar chart opens the Daily Row Volume table, which provides a breakdown of which collections synced, how many rows from each collection synced, and the percentage of all synced rows from each collection.

The Daily Row Volume table contains the following information:
* **Collections:** The name of each collection of properties synced to the data lake
* **Rows:** The number of rows synced from each collection
* **% of Total:** The percentage of the total number of rows synced that each collection represents
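
The **% of Total** column is simple arithmetic: each collection's row count divided by the day's total row count. The Python sketch below shows one way to reproduce it from per-collection counts. The function name `daily_row_volume`, the sample collection names, the sort order, and the rounding to one decimal place are all assumptions made for illustration; they are not taken from the Segment app.

```python
from __future__ import annotations


def daily_row_volume(rows_by_collection: dict[str, int]) -> list[tuple[str, int, float]]:
    """Build (collection, rows, percent_of_total) tuples for one day.

    Hypothetical helper: sorts by row count, largest first, and rounds
    percentages to one decimal place. Both choices are assumptions.
    """
    total = sum(rows_by_collection.values())
    table = []
    for name, rows in sorted(rows_by_collection.items(), key=lambda kv: kv[1], reverse=True):
        # Guard against division by zero on a day with no synced rows.
        pct = 100.0 * rows / total if total else 0.0
        table.append((name, rows, round(pct, 1)))
    return table


if __name__ == "__main__":
    # Made-up sample counts, for illustration only.
    sample = {"pages": 750, "tracks": 200, "identifies": 50}
    for name, rows, pct in daily_row_volume(sample):
        print(f"{name:<12}{rows:>8}{pct:>8}%")
```

With the sample counts above, the total is 1,000 rows, so `pages` accounts for 75% of the day's synced rows, `tracks` for 20%, and `identifies` for 5%.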

Above the Daily Row Volume table is an overview of the total syncs for the current day, showing the number of rows synced, the number of collections that synced, and the current date.

To access the Health page from the Segment app, open the **My Destinations** page and select the data lake. On the data lakes Settings page, select the **Health** tab.

## Data Lakes Reports FAQ
##### How long is a data point available?
The health tab shows an aggregate view of the last 30 days' worth of data, while the sync history retains the last 100 syncs.

##### How do sync history and health compare?
The sync history feature shows detailed information about the most recent 100 syncs to a data lake, while the health tab shows just the number of rows synced to the data lake over the last 30 days.

##### What time zone are the dates and times in?
All dates and times on the sync history and health pages are in the user's local time.

##### When does the data update?
The sync data for both reports updates in real time.

##### When do syncs occur?
Syncs occur approximately every two hours. Users cannot choose how frequently the data lake syncs.