Overview of workflows

This document helps you understand the architecture and execution of workflows in Dataform.

You can use Dataform to develop, test, and version control workflows that you can run in BigQuery to transform data for analytics purposes. You can develop workflows with Dataform core, using SQLX files and optionally JavaScript files, or with JavaScript.

A workflow can consist of the following objects:

Data source declarations
Declarations of BigQuery data sources that let you reference these data sources in Dataform table definitions and SQL operations.
Tables
Tables that you create in Dataform based on the declared data sources or other tables in your workflow. Dataform supports the following table types: table, incremental table, view, and materialized view.
Assertions
Data quality test queries that you can use to validate table data. Dataform runs assertions every time it updates your workflow and it alerts you if any assertions fail.
Custom SQL operations
SQL statements that Dataform runs in BigQuery as they are, without modification.
Includes
JavaScript files with definitions of variables and functions that you can reuse across your workflow.

Visualisation of a workflow

You can view your workflow visualized in the form of a Directed Acyclic Graph (DAG). The DAG displays all objects of the workflow defined in your workspace, and the relationships between them. You can zoom in and out and use drag-and-drop navigation in the DAG. If there are compilation errors in your workflow, Dataform displays an error message instead of the DAG.

To view the DAG of your workflow, in your workspace, click Compiled graph.

Execution of a workflow

In your development workspace, you can manually trigger a run of your entire workflow, a selection of actions, or a selection of tags.

You can schedule runs with Dataform release configurations and workflow configurations. First, create a release configuration to create compilation results of your repository. Then, create a workflow configuration, select a release configuration, select workflow actions that you want to run, and set the run schedule.

Alternatively, you can schedule runs either with Cloud Composer, or with Workflows and Cloud Scheduler.

During execution, Dataform runs SQL queries in BigQuery, following the order of object dependencies in your workflow. After execution, you can use your defined tables and views for all your analytics purposes in BigQuery.

Execution configuration options

To run a specific group of your workflow actions, you can add Dataform execution tags to the files you select. You can then run only the files with a selected tag when you manually trigger a run.

By default, Dataform runs your workflow with the execution settings defined in the dataform.json file. You can override these execution settings with compilation overrides.

With workspace compilation overrides, you can turn workspaces into isolated execution environments. This means that when you manually trigger execution in a workspace, Dataform runs the output in an isolated location in BigQuery.

To create and run a single compilation result with compilation overrides you can pass requests with the Dataform API.

With release configurations, you can configure compilation overrides for your whole repository, as well as the frequency of creating compilation results with the applied settings.

To learn more about ways to configure compilation and code lifecycle in Dataform, see Introduction to code lifecycle in Dataform.

What's next