FAQ

Frequently asked questions about FLOW

What is FLOW?

FLOW is a modern, scalable, real-time data pipeline. It allows you to stream data from various sources, validate, transform, and enrich that data, and then load it into your data lake or data warehouse.

Does CloudIO offer an on-premises version of FLOW?

Yes, FLOW can be installed either on-premises or in the cloud.

Do I really need an out-of-the-box data pipeline solution? Can't I just build my own pipeline?

That’s a good question. Most of our customers end up deciding based on these factors:

  • How quickly do I want to get set up? Days or months?

  • How complex are my data needs? One source with a schema that never changes, or many sources, with more to come, whose schemas change often?

  • How fast am I growing? Can I handle my current and future scale or will any growth exceed my capabilities?

  • How important is data integrity? Do I need all of my data no matter what, or is data loss acceptable?

  • How fast do I want the data? Within minutes, or is a delay of a few hours or days acceptable?

  • What kind of transformation or enrichment do I need? Is this just a dump and load, or do I need to make minor or even major changes along the way?

  • Do I need the ability to monitor the pipeline in real time, with audit logs?

Is CloudIO resilient to changes in my data?

Yes, you have two choices. You can let FLOW handle schema changes automatically by adding any newly created fields to the output schema, or you can have all data with schema changes routed to an error topic.
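
The two choices can be pictured with a small sketch. This is not FLOW's API: the function name, the auto_evolve flag, and the "output"/"error" labels are hypothetical, and the schema is reduced to a set of field names.

```python
from __future__ import annotations

from typing import Any


def apply_schema_policy(
    record: dict[str, Any], schema: set[str], auto_evolve: bool
) -> tuple[str, set[str]]:
    """Return the destination ("output" or "error") and the resulting schema.

    Hypothetical helper for illustration only; it is not part of FLOW.
    """
    new_fields = set(record) - schema
    if not new_fields:
        # No schema change: the record goes straight to the output.
        return "output", schema
    if auto_evolve:
        # Choice 1: widen the output schema with the newly seen fields.
        return "output", schema | new_fields
    # Choice 2: leave the schema untouched and route the record to an error topic.
    return "error", schema


record = {"id": 1, "email": "ann@example.com"}
evolved_dest, evolved_schema = apply_schema_policy(record, {"id"}, auto_evolve=True)
strict_dest, strict_schema = apply_schema_policy(record, {"id"}, auto_evolve=False)
```

With auto_evolve enabled the record is accepted and the schema gains the email field; with it disabled the schema stays as it was and the record is diverted.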

Does FLOW store my data?

No, FLOW is installed on your systems and hence we have no access to your data.

Moreover, FLOW is a data pipeline, and your data in FLOW is considered "data in motion", which allows us to work within many financial services and health care regulations.

Under the hood, FLOW uses Apache Kafka, and you can configure how long data is allowed to stay in Kafka topics.
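
In Apache Kafka itself, time-based retention is a topic-level setting named retention.ms, expressed in milliseconds. How FLOW exposes this setting is not described here, so the sketch below only shows how a retention period maps onto that Kafka property; the helper name retention_config is hypothetical.

```python
from __future__ import annotations

from datetime import timedelta


def retention_config(keep_for: timedelta) -> dict[str, str]:
    """Build a Kafka topic config override for time-based retention.

    "retention.ms" is a standard Kafka topic-level property; this helper
    is only an illustration and is not part of FLOW.
    """
    millis = int(keep_for.total_seconds() * 1000)
    if millis <= 0:
        raise ValueError("retention period must be positive")
    # Kafka config values are passed as strings.
    return {"retention.ms": str(millis)}


# Keep data in the topic for three days.
config = retention_config(timedelta(days=3))
```

A dictionary like this is the shape Kafka admin tooling generally expects for topic configuration overrides.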

If something prevents data from loading into your data destination, the data that has not yet been replicated is temporarily stored in an error topic until the errors are rectified and the data is re-processed.

Does FLOW require a database?

Yes, we store our metadata in a database. We support Oracle, MySQL, SQL Server, and PostgreSQL.

How long does it take to install and set up FLOW?

Once the infrastructure is ready for the installation, it usually takes anywhere from a few minutes to less than a day to install and set up FLOW.

Can I selectively extract only a few columns from a given source table?

Yes, while defining a flow, you have the option to either include or exclude columns from the source table.
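
As a rough illustration of what include and exclude lists do to a row, assuming a row is represented as a dictionary keyed by column name (the function and parameter names are hypothetical, not FLOW configuration keys):

```python
from __future__ import annotations

from typing import Any, Iterable, Optional


def select_columns(
    row: dict[str, Any],
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> dict[str, Any]:
    """Keep only the included columns, or drop the excluded ones.

    Hypothetical helper for illustration; exactly one of the two lists
    may be given, mirroring the "either include or exclude" choice.
    """
    if include is not None and exclude is not None:
        raise ValueError("specify include or exclude, not both")
    if include is not None:
        wanted = set(include)
        return {name: value for name, value in row.items() if name in wanted}
    if exclude is not None:
        unwanted = set(exclude)
        return {name: value for name, value in row.items() if name not in unwanted}
    # No filter given: pass the row through unchanged (as a copy).
    return dict(row)


row = {"id": 7, "name": "Ann", "notes": "internal only"}
included = select_columns(row, include=["id", "name"])
excluded = select_columns(row, exclude=["notes"])
```

Both calls produce the same result here; an include list is usually safer when new source columns should stay out by default, while an exclude list lets new columns flow through.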

Can I mask or encrypt parts of my incoming data?

Yes, you can mask, encrypt, or hash incoming data based on matching table and/or column names. For example, you can hash all values in columns whose names contain SSN.
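
The SSN example can be sketched as follows. The matching rule (a case-insensitive substring match on the column name), the choice of SHA-256, and all function names are assumptions made for illustration; they do not describe how FLOW implements hashing or masking.

```python
from __future__ import annotations

import hashlib
from typing import Any, Callable


def hash_value(value: str) -> str:
    """Replace a value with its SHA-256 hex digest (illustrative choice of hash)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_value(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters behind asterisks."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]


def protect(
    row: dict[str, Any],
    pattern: str = "SSN",
    action: Callable[[str], str] = hash_value,
) -> dict[str, Any]:
    """Apply `action` to every non-null column whose name contains `pattern`."""
    needle = pattern.lower()
    return {
        name: action(str(value)) if needle in name.lower() and value is not None else value
        for name, value in row.items()
    }


row = {"name": "Ann", "CUSTOMER_SSN": "123-45-6789"}
hashed = protect(row)
masked = protect(row, action=mask_value)
```

For low-entropy values such as SSNs, a plain hash can be reversed by enumerating all possible inputs, so a keyed hash (for example HMAC with a secret key) is generally the safer option.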

Can I prevent invalid data from loading into my data destination?

Yes, you can define validation rules such as Not Null, Min Length, or Custom. All data that fails validation is routed to an error queue for re-processing. You can also use a transformation to either skip or fix invalid data.
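
A minimal sketch of how such rules might separate valid rows from rows bound for an error queue. The rule constructors, error messages, and the route function are hypothetical stand-ins for the Not Null, Min Length, and Custom rule types; they are not FLOW's rule syntax.

```python
from __future__ import annotations

from typing import Any, Callable, Optional

Row = dict[str, Any]
# A rule returns None when the row passes, or an error message when it fails.
Rule = Callable[[Row], Optional[str]]


def not_null(field: str) -> Rule:
    return lambda row: None if row.get(field) is not None else f"{field} is null"


def min_length(field: str, length: int) -> Rule:
    def check(row: Row) -> Optional[str]:
        value = row.get(field)
        # Null values are left to not_null; only present values are measured.
        if value is None or len(str(value)) >= length:
            return None
        return f"{field} shorter than {length}"
    return check


def custom(predicate: Callable[[Row], bool], message: str) -> Rule:
    return lambda row: None if predicate(row) else message


def route(rows: list[Row], rules: list[Rule]) -> tuple[list[Row], list[tuple[Row, list[str]]]]:
    """Split rows into valid rows and (row, failures) pairs for the error queue."""
    valid, errors = [], []
    for row in rows:
        failures = [msg for rule in rules if (msg := rule(row)) is not None]
        if failures:
            errors.append((row, failures))
        else:
            valid.append(row)
    return valid, errors


rules = [
    not_null("id"),
    min_length("name", 2),
    custom(lambda r: r.get("age", 0) >= 0, "age must be non-negative"),
]
rows = [
    {"id": 1, "name": "Ann", "age": 30},
    {"id": None, "name": "B", "age": -1},
]
valid, errors = route(rows, rules)
```

Keeping the failure messages alongside each rejected row makes it easier to fix the data and re-process it later.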

Other Questions?

We're always happy to help with any other questions you might have! Send us an email.
