Project Overview

This repository contains the infrastructure and workflows for a personal data platform. It runs on Google Cloud Platform, using BigQuery for data warehousing and Secret Manager for credential management, and is orchestrated with GitHub Actions.

Architecture

Data Pipeline Stack

dlt (dltHub) - Extract source data and load it into the BigQuery raw layer
dbt Core - Transform raw data into analytics-ready models and views
BigQuery - Cloud data warehouse for storage and analysis
GCP Secret Manager - Secure credential management for API keys and connections
GitHub Actions - Automated orchestration and scheduling of data pipelines
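As a rough illustration of the first layer, a dlt pipeline that loads records into a BigQuery raw dataset might look like the sketch below. The helper name `load_contacts`, the dataset name `raw_hubspot`, the table name, and the write disposition are assumptions for illustration and are not taken from this repository. Running the load requires `dlt[bigquery]` to be installed and BigQuery credentials to be configured.

```python
from typing import Any, Dict, Iterable

# Pipeline name in the {source}__{entity} form described under Naming Conventions.
PIPELINE_NAME = "hubspot__contacts"


def load_contacts(records: Iterable[Dict[str, Any]]):
    """Load an iterable of dict records into BigQuery (hypothetical helper)."""
    # Imported lazily so this module can be imported without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name=PIPELINE_NAME,
        destination="bigquery",
        dataset_name="raw_hubspot",  # assumed name for the raw-layer dataset
    )
    # "replace" overwrites the target table on every run; "append" or "merge"
    # may suit incremental sources better.
    return pipeline.run(
        list(records),
        table_name="contacts",
        write_disposition="replace",
    )
```

dbt models would then read from the raw dataset and build the staging and analytics layers on top of it.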

Project Structure

The repository keeps extraction, transformation, orchestration, utility scripts, and configuration in separate top-level directories:

├── pipelines/          # dlt data extraction pipelines
│   ├── hubspot.py      # HubSpot CRM data pipeline
│   ├── fitbit.py       # Fitbit health data pipeline
│   ├── notion.py       # Notion habits data pipeline
│   └── common/         # Shared utilities and helpers
├── dbt/                # dbt transformation models
│   └── michael/        # Personal dbt project
├── .github/            # GitHub Actions workflows
│   └── workflows/      # CI/CD and orchestration
├── scripts/            # Utility scripts and helpers
└── config/             # Configuration files and templates

Naming Conventions

dlt pipelines: {source}__{entity} (e.g., hubspot__contacts, fitbit__sleep)
dbt models: {layer}_{source}__{entity} (e.g., staging_hubspot__contacts, contacts)
GitHub Actions workflows: {action}-{frequency} (e.g., dlt-daily)
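The conventions above could be checked mechanically, for example in CI. The following is a minimal sketch, assuming lowercase snake_case parts, a double underscore between source and entity, and a single-word layer prefix. The regular expressions and the `parse_name` helper are hypothetical and not part of this repository.

```python
import re
from typing import Dict, Optional

# One snake_case part: lowercase words joined by single underscores (assumed).
PART = r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*"

PATTERNS = {
    # {source}__{entity}
    "dlt": re.compile(rf"(?P<source>{PART})__(?P<entity>{PART})"),
    # {layer}_{source}__{entity}
    "dbt": re.compile(rf"(?P<layer>[a-z]+)_(?P<source>{PART})__(?P<entity>{PART})"),
    # {action}-{frequency}
    "workflow": re.compile(r"(?P<action>[a-z0-9]+)-(?P<frequency>[a-z]+)"),
}


def parse_name(kind: str, name: str) -> Optional[Dict[str, str]]:
    """Return the named parts of `name`, or None if it breaks the convention."""
    match = PATTERNS[kind].fullmatch(name)
    return match.groupdict() if match else None


print(parse_name("dlt", "fitbit__sleep"))
print(parse_name("dbt", "staging_hubspot__contacts"))
print(parse_name("workflow", "dlt-daily"))
```

Under these assumed patterns a bare model name such as contacts does not match the dbt pattern, so a real check would need an explicit allowance for models that are named by entity alone.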