Commit b38fe81

Merge pull request #93 from segmentio/DOC-334
Added warehouse Sync Duration and made changes to the docs for Warehouse Syncs and Selective Sync [DOC-334]
2 parents a53ee51 + 634e639 commit b38fe81

5 files changed, +112 -97 lines changed

src/_data/sidenav/main.yml

+2-2
@@ -197,8 +197,8 @@ sections:
197197
title: Warehouse Overview
198198
- path: /connections/storage/warehouses/schema
199199
title: Warehouse Schemas
200-
- path: /connections/storage/warehouses/selective-sync
201-
title: Warehouse Selective Sync
200+
- path: /connections/storage/warehouses/warehouse-syncs
201+
title: Warehouse Syncs
202202
- path: /connections/storage/warehouses/health
203203
title: Warehouse Health Dashboards
204204
- path: /connections/storage/warehouses/choose-warehouse

src/connections/storage/warehouses/faq.md

+6-25
@@ -5,14 +5,11 @@ redirect_from: '/connections/warehouses/faq/'
55

66
## Can I control what data is sent to my warehouse?
77

8-
Yes! For those of you who are on our [Business plan](https://segment.com/pricing), you can choose which sources, collections, and properties sync to your data warehouse.
8+
Yes. Customers on Segment's [Business plan](https://segment.com/pricing) can choose which sources, collections, and properties sync to their data warehouse using [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync).
99

10-
Selective Sync will help manage what data is sent to each individual warehouse, allowing you to sync different sets of data from the same source to different warehouses. Check out more information on how to use Selective Sync [here](https://segment.com/docs/guides/filtering-data/#warehouse-selective-sync).
11-
12-
Once a source, collection or property is disabled, we no longer sync data from that source. We will not, however, delete any historical data from your warehouse. When a source is re-enabled, we will sync all events since the last sync. Note: This does not apply when a collection or property is re-enabled - Only new data generated after re-enabling a collection or property will sync to your warehouse.
13-
14-
For Self-Serve and free customers, we do not currently support the ability to select which collections or properties sync to your warehouse.
10+
Selective Sync helps manage the data Segment sends to each warehouse, allowing you to sync different sets of data from the same source to different warehouses.
1511

12+
When you disable a source, collection, or property, Segment no longer syncs data for that source, collection, or property. Segment won't delete any historical data from your warehouse. When you re-enable a source, Segment syncs all events since the last sync. This doesn't apply when you re-enable a collection or property: only new data generated after re-enabling a collection or property syncs to your warehouse.
1613

1714
## Can we add, tweak, or delete some of the tables?
1815

@@ -47,27 +44,11 @@ Your warehouse id appears in the URL when you look at the [warehouse destination
4744

4845
## How fresh is the data in Segment Warehouses?
4946

50-
Your data will be available in Warehouses within 24-48 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience we need to balance all three of these.
51-
52-
Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, our guarantee is that your data will be available in Redshift within 24 hours.
53-
54-
As we improve and update our ETL processes and optimize for SQL query performance downstream, the actual load time will vary, but we'll ensure it's always within 24 hours.
47+
Data is available in Warehouses within 24-48 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these.
5548

56-
You can use the Sync History page to see the status and history of data updates in your warehouse. The Sync History page is available for every source connected to each warehouse. This page helps you answer questions like, "has the data from a specific source been updated recently?" "Did a sync completely fail, or only partially fail?" and "Why wasn't this sync successful?"
57-
58-
The Sync History includes the following information:
59-
- **Sync Status**: The possible statuses are:
60-
- _Success_: Sync run completed without any notices and all rows synced, OR no rows synced because no data was found.
61-
- _Partial_: Sync run completed with some notices and some rows synced.
62-
- _Failure_: Sync run with some notices and no rows synced.
63-
- **Start Time**: The time at which the sync began. Shown in your local timezone.
64-
- **Duration**: Length of time this sync took.
65-
- **Synced Rows**: Number of rows successfully synced from the sync run.
66-
- **Notices**: A list of errors or warnings found, which could indicate problems with the sync run. Click a notice message to show details about the result, and any errors or warnings for each collection included in the sync run.
67-
68-
> info ""
69-
> If a sync run shows a partial success or failure, the next sync attempts to syncing any data which was not successfully synced in the prior run.
49+
Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours.
7050

51+
As Segment improves and updates the ETL processes and optimizes for SQL query performance downstream, the actual load time will vary, but Segment ensures it's always within 24 hours.
7152

7253
## What if I want to add custom data to my warehouse?
7354

src/connections/storage/warehouses/selective-sync.md

-56
This file was deleted.
@@ -0,0 +1,101 @@
1+
---
2+
title: Warehouse Syncs
3+
redirect_from: '/connections/warehouses/selective-sync/'
4+
---
5+
6+
The Warehouse Sync process prepares the raw data coming from a source and loads it into a warehouse destination. There are two phases to the sync process:
7+
1. **Preparation phase**: This is where Segment prepares the data coming from a source so that it's in the right format for the loading phase.
8+
2. **Loading phase**: This is where Segment deduplicates the data and loads it into the warehouse destination. Any sync issues that occur in this phase can be traced back to your warehouse.
9+
10+
Instead of constantly streaming data to the warehouse destination, Segment loads data to the warehouse in bulk at regular intervals. Before the data loads, Segment inserts and updates events and objects, and automatically adjusts the schema to make sure the data in the warehouse is in line with the data in Segment.
11+
12+
Warehouses sync all data coming from your source, and your data is available in your warehouse within 24-48 hours. If you'd like to manage the data you send to your warehouse, use [Warehouse Selective Sync](#warehouse-selective-sync).
13+
14+
## Sync History
15+
You can use the Sync History page to see the status and history of data updates in your warehouse. The Sync History page is available for every source connected to each warehouse. This page helps you answer questions like, “Has the data from a specific source been updated recently?” “Did a sync completely fail, or only partially fail?” and “Why wasn’t this sync successful?”
16+
17+
The Sync History includes the following information:
18+
19+
* **Sync Status**: The possible statuses are:
20+
* *Success*: The sync run completed without any notices and all rows synced, OR no rows synced because no data was found.
21+
* *Partial*: The sync run completed with some notices and some rows synced.
22+
* *Failure*: The sync run completed with some notices and no rows synced.
23+
* **Start Time**: The time at which the sync began. This is shown in your local timezone.
24+
* **Duration**: The length of time the sync took.
25+
* **Synced Rows**: Number of rows successfully synced from the sync run.
26+
* **Notices**: A list of errors or warnings found, which could indicate problems with the sync run. Click a notice message to show details about the result, and any errors or warnings for each collection included in the sync run.
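The three statuses above can be read as a small decision rule. The following is a minimal sketch of that rule, not Segment's implementation: the function name `classify_sync_status` and its parameters are hypothetical, and it assumes a sync run can be summarized by a notice count and a synced-row count.

```python
def classify_sync_status(notice_count: int, rows_synced: int) -> str:
    """Map a sync run summary to a Sync History status (hypothetical helper)."""
    if notice_count < 0 or rows_synced < 0:
        raise ValueError("counts must be non-negative")
    if notice_count == 0:
        # No notices: either all rows synced, or no data was found.
        return "Success"
    if rows_synced > 0:
        # Some notices, but some rows still synced.
        return "Partial"
    # Some notices and no rows synced.
    return "Failure"


if __name__ == "__main__":
    for notices, rows in [(0, 1200), (0, 0), (2, 300), (3, 0)]:
        print(notices, rows, classify_sync_status(notices, rows))
```

A run with no notices and zero rows still counts as a success here, matching the "no data was found" case in the list.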
27+
28+
> info ""
29+
> If a sync run shows a partial success or failure, the next sync attempts to sync any data that was not successfully synced in the prior run.
30+
31+
### View the Sync History
32+
33+
To view the Sync History:
34+
1. Go to **Connections > Destinations** and choose the warehouse destination you want to view the sync history for.
35+
2. Click the source you want to view the sync history for.
36+
3. *(Optional)* Click on any of the rows in the Sync History table to see additional details related to that sync. You can view:
37+
* The **Results** of your sync which shows the number of rows synced for each collection.
38+
* The **Sync Duration** which shows the **Preparation** and **Loading** times of your sync.
39+
40+
## Warehouse Selective Sync
41+
42+
Warehouse Selective Sync allows you to manage the data that you send to your warehouses. You can use this feature to stop syncing specific events (also known as collections) or properties that aren’t relevant and that may slow down your warehouse syncs.
43+
44+
> info ""
45+
> This feature is only available to Business Tier customers. <br><br>You must be a Workspace Owner to change Selective Sync settings.
46+
47+
With Selective Sync, you can customize which collections and properties from a source are sent to each warehouse. This helps you control the data that is sent to each warehouse, allowing you to sync different sets of data from the same source to different warehouses.
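To illustrate how per-warehouse settings let the same source feed different data to different warehouses, here is a minimal sketch. The `SelectiveSyncSettings` class, its fields, and the sample collection and property names are all hypothetical and do not reflect Segment's API or storage format. The sketch assumes settings are stored as sets of disabled items, so that an empty configuration syncs everything by default.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class SelectiveSyncSettings:
    """Hypothetical per-warehouse settings; empty sets mean everything syncs."""
    disabled_collections: set[str] = field(default_factory=set)
    disabled_properties: dict[str, set[str]] = field(default_factory=dict)

    def apply(self, collection: str, row: dict[str, Any]) -> Optional[dict[str, Any]]:
        """Return the row as it would reach this warehouse, or None if skipped."""
        if collection in self.disabled_collections:
            return None
        blocked = self.disabled_properties.get(collection, set())
        return {key: value for key, value in row.items() if key not in blocked}


if __name__ == "__main__":
    row = {"id": "evt_1", "plan": "pro", "debug_blob": "..."}
    analytics = SelectiveSyncSettings(disabled_properties={"order_completed": {"debug_blob"}})
    archive = SelectiveSyncSettings()  # default: nothing disabled
    print(analytics.apply("order_completed", row))
    print(archive.apply("order_completed", row))
```

Each warehouse holds its own settings object, so changing one warehouse's configuration leaves the others untouched.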
48+
49+
> note ""
50+
> **NOTE:** This feature only affects [warehouses](/docs/connections/storage/warehouses/), and doesn't prevent data from going to any other [destinations](/docs/connections/destinations/).
51+
52+
When you disable a source, collection, or property, Segment no longer syncs data for that source, collection, or property. Segment won't delete any historical data from your warehouse. When you re-enable a source, Segment syncs all events since the last sync. This doesn't apply when you re-enable a collection or property: only new data generated after re-enabling a collection or property syncs to your warehouse.
53+
54+
> warning ""
55+
> For each warehouse only the first 5,000 collections per source and 5,000 properties per collection are visible in the Selective Sync user interface. [Learn more about the limits](#selective-sync-user-interface-limits).
56+
57+
### When to use Selective Sync
58+
59+
By default, all sources and their collections and properties are sent, and no data is prevented from reaching warehouses.
60+
61+
When you disable sources, collections, or properties using Selective Sync, Segment stops sending new data for these sources, collections, or properties to your warehouse. It doesn’t delete any existing data in the warehouse.
62+
63+
If you choose to re-enable a source to begin syncing again, Segment loads all data that arrived since the last sync into the warehouse, but doesn’t backfill data that was omitted while these were disabled. When a collection or property is re-enabled, data only syncs going forward. It will not be loaded from the last sync.
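The difference between re-enabling a source and re-enabling a collection or property comes down to where syncing resumes. The sketch below expresses that rule under stated assumptions: `sync_resume_point` and its arguments are hypothetical names, and it models the resume point as a single timestamp.

```python
from datetime import datetime


def sync_resume_point(kind: str, last_sync: datetime, reenabled_at: datetime) -> datetime:
    """Return the timestamp from which data syncs after re-enabling (hypothetical)."""
    if kind == "source":
        # A re-enabled source syncs all events since the last sync.
        return last_sync
    if kind in ("collection", "property"):
        # A re-enabled collection or property only syncs data going forward.
        return reenabled_at
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    last = datetime(2021, 3, 1, 12, 0)
    now = datetime(2021, 3, 10, 9, 30)
    print(sync_resume_point("source", last, now))
    print(sync_resume_point("property", last, now))
```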
64+
65+
### Enable Selective Sync
66+
67+
To use Selective Sync:
68+
1. Go to **Connections > Destinations** and select the warehouse you want to enable Selective Sync for.
69+
2. Click the **Settings** tab and click **Selective Sync** in the left menu.
70+
3. Select which sources, collections, and properties to sync. Anything you don't select won't sync to your warehouse.
71+
4. Click **Save Changes**.
72+
73+
### Change sync settings to a single warehouse from multiple sources
74+
75+
To change the sync settings to a single warehouse from multiple sources, follow the same steps as [above](#enable-selective-sync).
76+
77+
This may be valuable if you’re looking to make changes in bulk, such as when setting up a new warehouse.
78+
79+
80+
### Change sync settings on a specific Warehouse to Source connection
81+
82+
To manage data from one specific source to an individual warehouse:
83+
1. Go to **Connections > Destinations** and select the warehouse you want to change the sync settings for.
84+
2. On the **Warehouse Overview** page, click the **Schema** you want to change the sync settings for.
85+
3. On the **Settings** tab of the **Sync History** page for that source, select the data you want synced to your warehouse, or deselect the data you don't want synced.
86+
87+
This may be valuable when you're making smaller changes, for example, disabling all properties from one unnecessary collection.
88+
89+
> info ""
90+
> All changes made through Selective Sync only impact an individual warehouse. They don't impact multiple warehouses at once. To make changes to multiple warehouses, you need to enable/disable data for each individual warehouse.
91+
92+
### Selective Sync User Interface Limits
93+
94+
Regardless of schema size, for each warehouse only the first 5,000 collections per source and 5,000 properties per collection can be managed using the Selective Sync user interface. After you hit any of these limits, all future data is still tracked and sent to your warehouse. New collections created after hitting this limit are not displayed in the Selective Sync table.
95+
96+
You will see a warning in the Selective Sync user interface when the warehouse schema has reached 80% of the limit for collections and/or properties. An error message will appear when you've reached the limit.
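As a rough illustration of the thresholds described above, the following sketch maps a collection or property count to the message a user would expect to see. The function `limit_state` and its return values are hypothetical. Only the 5,000 limit and the 80% warning threshold come from this page.

```python
UI_LIMIT = 5000      # collections per source, or properties per collection
WARN_PERCENT = 80    # the warning appears at 80% of the limit


def limit_state(count: int, limit: int = UI_LIMIT) -> str:
    """Return "ok", "warning", or "error" for a given count (hypothetical helper)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count >= limit:
        return "error"
    # Integer arithmetic avoids floating-point rounding at the threshold.
    if count * 100 >= limit * WARN_PERCENT:
        return "warning"
    return "ok"


if __name__ == "__main__":
    for count in (3999, 4000, 4999, 5000):
        print(count, limit_state(count))
```

With the default limit, the warning state starts at 4,000 items and the error state at 5,000.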
97+
98+
Contact [Support](https://app.segment.com/help/contact/) to edit Selective Sync settings for any collections and/or properties which exceed the limit.
99+
100+
> warning ""
101+
> Only Workspace Owners can change Selective Sync settings.
