Creating a Flow
Defining the pipeline between Salesforce & S3 with copy to Redshift
Follow the steps below to create a flow with Salesforce as the Input and S3 as the Output, along with a copy option to Redshift
Give a friendly name to the flow
Accept the default flow code, which is generated from the name
Uncheck "Publish Transformation" if you want to perform some transformations on the flow events
Uncheck "Publish Mapping" if you want to manually perform some complex mapping between input & output
Uncheck this if you want to allow events that don't comply with the input schema
Choose the Input connection
Check to copy the file uploaded to S3 into Redshift
Choose the Redshift connection to be used for the copy command
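Behind the scenes, this copy option typically amounts to a Redshift COPY command that loads the staged S3 file into the target table. A minimal sketch, assuming psycopg2 and using hypothetical cluster, bucket, table, and IAM role names (the flow generates the real command for you):

```python
import psycopg2

# All connection details and object names below are hypothetical placeholders.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="loader",
    password="********",
)

copy_sql = """
    COPY salesforce.account
    FROM 's3://my-flow-bucket/salesforce/account/part-0000.csv.gz'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS CSV
    GZIP
    TIMEFORMAT 'auto';
"""

# COPY loads the whole S3 file into the target table in one bulk operation.
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```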
Enter the Salesforce object name to be replicated from Input to Output
Specify the fetch size to be used while performing the JDBC query
Specify the batch size used to determine the topic partition. Specify a large number if you want the data to load in the order in which it was queried.
Specify a partition size greater than 1 if you want data to be imported in parallel across nodes - you must run the Input service on multiple nodes for this to be effective.
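The exact mechanics are internal to the Input service, but as a rough illustration of what a partition size greater than 1 means, the export can be split into ID ranges that are fetched in parallel. The bounds, function, and worker count below are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

PARTITIONS = 4                    # the "partition size" chosen in the flow
MIN_ID, MAX_ID = 1, 1_000_000     # hypothetical numeric ID bounds of the object

def fetch_range(lo, hi):
    # Placeholder for the real query, e.g. a JDBC fetch of
    # SELECT ... FROM Account WHERE NumericId >= lo AND NumericId < hi
    print(f"fetching rows where {lo} <= id < {hi}")

step = (MAX_ID - MIN_ID + 1) // PARTITIONS
ranges = [
    (MIN_ID + i * step,
     MAX_ID + 1 if i == PARTITIONS - 1 else MIN_ID + (i + 1) * step)
    for i in range(PARTITIONS)
]

# Each range can be handled by a separate worker (or node) in parallel.
with ThreadPoolExecutor(max_workers=PARTITIONS) as pool:
    list(pool.map(lambda r: fetch_range(*r), ranges))
```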
Specify a comma-separated list of columns to be included in the query; if left blank, all columns will be fetched
Specify a comma-separated list of columns to be excluded. This will be ignored if an include list is provided.
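In other words, the include list takes precedence. A small sketch of that resolution rule, using hypothetical column names:

```python
def resolve_columns(all_columns, include=None, exclude=None):
    """Return the columns that end up in the generated query."""
    if include:                                   # include list wins when provided
        return [c for c in all_columns if c in include]
    if exclude:                                   # otherwise drop the excluded columns
        return [c for c in all_columns if c not in exclude]
    return list(all_columns)                      # both blank: fetch every column

cols = ["Id", "Name", "CreatedDate", "LastModifiedDate", "Description"]
print(resolve_columns(cols, exclude=["Description"]))
print(resolve_columns(cols, include=["Id", "Name"], exclude=["Description"]))  # exclude ignored
```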
Specify a local directory path to be used to temporarily store the Salesforce bulk export files for the initial run.
Choose an Incremental Policy to be used for this flow. Refer to Step 7
Choose the creation date column if available. It will be used to determine whether the event is newly inserted or updated
Specify the last update date column to be used to track the updates
Full dump and load - deletes all the rows from the target table and reloads the full data set on every run
Incremental Using Numeric ID Column - uses a numeric ID column to incrementally load newly added rows with an ID higher than the max ID of the previous run, e.g. inventory or ledger transactions where there are only inserts (no updates) and a running sequential ID is used as the transaction ID column.
Incremental Using Last Update Date Column - will use a timestamp column to fetch incremental data with the timestamp value greater than the max value of the previous run
One Time Load - loads the data only once and the flow will not be scheduled again. The flow status is changed to Complete after the one-time load.
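To make the two incremental policies concrete, the queries they issue look roughly like the sketch below; the object names, column names, and high-water-mark values are hypothetical, and the real queries are generated by the flow:

```python
# Hypothetical high-water marks remembered from the previous run.
last_max_id = 120_045
last_max_update = "2024-01-01T00:00:00Z"

# Incremental Using Numeric ID Column: fetch only rows inserted after the last run.
id_query = (
    "SELECT Id, Name FROM LedgerTransaction__c "
    f"WHERE TransactionId__c > {last_max_id} "
    "ORDER BY TransactionId__c"
)

# Incremental Using Last Update Date Column: fetch inserts and updates since the last run.
ts_query = (
    "SELECT Id, Name, LastModifiedDate FROM Account "
    f"WHERE LastModifiedDate > {last_max_update} "
    "ORDER BY LastModifiedDate"
)

print(id_query)
print(ts_query)
```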
Cron Expression - use this to schedule the flow at a specific time of day, or on a specific day of the week/month, using a cron expression (see the examples after this list)
Fixed Interval - use this to schedule every x minutes
After Parent Flow - use this to define dependency between flows. You can choose to run this flow soon after the parent flow runs irrespective of the status.
After Parent Flow Success - similar to the above, but only runs if the parent flow completed without any errors.
After Parent Flow Failure - similar to the above, but only runs when the parent flow completed with errors.
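A few standard cron expressions are shown below as examples only; the exact cron dialect accepted by the scheduler may vary:

```python
# field order: minute  hour  day-of-month  month  day-of-week
DAILY_AT_2AM    = "0 2 * * *"     # every day at 02:00
EVERY_15_MIN    = "*/15 * * * *"  # every 15 minutes
MONDAY_MORNINGS = "0 6 * * 1"     # every Monday at 06:00
```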
Congratulations, you have just created your first flow