Pipelines

This section describes the data pipelines in this project, their data sources, and how to manage data refresh patterns.

Data Sources

Data from the following personal systems is ingested into a BigQuery warehouse for automation and analysis:

Notion
HubSpot
Fitbit

Pipeline Refresh Patterns

The pipelines support two refresh modes for data loading:

Incremental (default): Loads only data that is new or changed since the last run.
Full refresh: Reloads all data from scratch; useful after data quality issues or schema changes.
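
As a rough illustration of the difference between the two modes, the sketch below filters a batch of records against a last-run timestamp. The names `select_rows`, `updated_at`, and `last_run` are hypothetical and not taken from this project; the actual pipelines may detect changes differently.

```python
from datetime import datetime, timezone
from typing import Optional


def select_rows(rows: list[dict], is_incremental: bool,
                last_run: Optional[datetime]) -> list[dict]:
    """Return the rows a run would load (hypothetical helper)."""
    if not is_incremental or last_run is None:
        # Full refresh, or no previous run recorded: reload everything.
        return list(rows)
    # Incremental: keep only rows changed since the last run.
    return [r for r in rows if r["updated_at"] > last_run]


rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "updated_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]
last_run = datetime(2024, 2, 1, tzinfo=timezone.utc)

incremental = select_rows(rows, is_incremental=True, last_run=last_run)
full = select_rows(rows, is_incremental=False, last_run=last_run)
```

Here `incremental` contains only the record changed after `last_run`, while `full` contains both records.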

How to Trigger Full Refresh

Method 1: Environment Variable Override (Global)

# Force full refresh for every pipeline run from this shell
export FORCE_FULL_REFRESH=true
pipenv run python -m pipelines.hubspot

Method 2: Pipeline-Specific Override

# Force full refresh for HubSpot only
export PIPELINE_NAME=HUBSPOT
export HUBSPOT_FULL_REFRESH=true
pipenv run python -m pipelines.hubspot

Method 3: Direct Function Parameter

from pipelines.hubspot import refresh_hubspot

# Force full refresh
refresh_hubspot(is_incremental=False)

# Use environment-based detection (default)
refresh_hubspot()  # or refresh_hubspot(is_incremental=None)

Environment Variables Reference

Variable	Description	Example
`FORCE_FULL_REFRESH`	Global override for all pipelines	`export FORCE_FULL_REFRESH=true`
`PIPELINE_NAME`	Pipeline identifier for specific overrides	`export PIPELINE_NAME=HUBSPOT`
`{PIPELINE_NAME}_FULL_REFRESH`	Pipeline-specific full refresh flag	`export HUBSPOT_FULL_REFRESH=true`
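
One way the environment-based detection could combine these variables with the `is_incremental` parameter is sketched below. This is an assumption, not the project's actual logic: the function `resolve_is_incremental`, the set of accepted truthy values, and the precedence order (explicit argument first, then the global flag, then the pipeline-specific flag) are all hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumed set of values treated as "true"; the real pipelines may differ.
TRUTHY = {"1", "true", "yes"}


def resolve_is_incremental(is_incremental: Optional[bool] = None,
                           pipeline: Optional[str] = None,
                           env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide the refresh mode (hypothetical helper, assumed precedence)."""
    env = os.environ if env is None else env

    # An explicit argument wins over any environment variable.
    if is_incremental is not None:
        return is_incremental

    def flag(name: str) -> bool:
        return env.get(name, "").strip().lower() in TRUTHY

    # Global override applies to all pipelines.
    if flag("FORCE_FULL_REFRESH"):
        return False

    # Pipeline-specific override, e.g. HUBSPOT_FULL_REFRESH.
    pipeline = pipeline or env.get("PIPELINE_NAME", "")
    if pipeline and flag(f"{pipeline.upper()}_FULL_REFRESH"):
        return False

    # Default: incremental load.
    return True
```

Passing `env` as a plain dictionary keeps the sketch easy to exercise without modifying the real process environment, for example `resolve_is_incremental(env={"FORCE_FULL_REFRESH": "true"})`.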