Building a modern analytics pipeline for Tableau Public with dlt, dbt, and DuckDB
At the outset of the pandemic, I first wrote about another crisis in motion: the cratering of newspaper jobs in Maine, where I spent most of my journalism career.
Since that writing, whenever a story pops up about Maine’s newspaper industry and its challenges, I’ll get an email about those numbers and where the most recent figures might be found.
This is probably the most limited use case for building a robust and lightweight analysis pipeline for data from the U.S. Bureau of Labor Statistics’ Quarterly Census of Employment and Wages, but it is my immediate use case and a good sample problem.
The result is a pipeline that demonstrates how modern data tools can work together to serve both rigorous data engineering practices and public-facing visualizations through Tableau Public.
To start, I’ll focus on the visualizations themselves. Skip ahead for details on the how-to (which, full disclosure, Claude helped me write based on the GitHub README that I wrote with my own brain).
A dramatic drop in newspaper jobs
It’s not surprising that newspaper jobs have been on a steady decline, but the extent of that decline, set against all jobs, is more striking.
The map below allows you to explore these trends by individual state or region (lasso select) to see the raw change in jobs, the percentage change relative to all jobs, and the percentage change in average pay compared to all jobs.
Diving more into pay, it’s interesting that there are many markets where newspaper publishing jobs do pay more than the average job in the area. You could perhaps guess at most of the places where this is the case, and it does support the perception of places associated with the “media elite.”
Of course, this doesn’t include Maine or most of the country.
The Challenge: Making Government Data Accessible
The U.S. Bureau of Labor Statistics publishes rich employment data through the Quarterly Census of Employment and Wages (QCEW), but working with this data presents several challenges:
- Scale: The complete dataset is massive, requiring selective extraction
- Format: Full historical data comes in ZIP archives that need processing
- Evolution: Industry codes (NAICS) change over time, requiring careful handling
- Publishing constraints: Tableau Public has specific requirements for data source connections in workbooks
My goal was to create a pipeline that could handle these complexities while producing publication-ready visualizations showing the dramatic decline in newspaper publishing jobs from 1990 to 2024.
The Solution: A Modern Data Stack
The architecture combines three tools I’ve settled on for robust data projects running locally:
dlt (data load tool): Robust Data Extraction
dlt handles the heavy lifting of data extraction from the BLS ZIP archives. What I love about dlt is its reliability and flexibility – the same pipeline code could easily target different databases or cloud warehouses, and it provides robust metadata about each load.
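For a sense of what that looks like in practice, here is a minimal sketch of a dlt resource that streams rows out of a BLS QCEW ZIP archive into DuckDB. The URL, field names, and NAICS filter are illustrative placeholders, not the exact values used in my pipeline.

# Minimal sketch: extract QCEW rows from a BLS ZIP archive into DuckDB with dlt.
# The URL, column names, and industry codes below are illustrative only.
import csv
import io
import zipfile

import dlt
import requests

QCEW_ZIP_URL = "https://data.bls.gov/cew/data/files/2023/csv/2023_annual_singlefile.zip"  # placeholder year/file


@dlt.resource(name="qcew_annual", write_disposition="replace")
def qcew_rows():
    """Stream rows out of the ZIP archive without unpacking it to disk."""
    response = requests.get(QCEW_ZIP_URL, timeout=120)
    response.raise_for_status()
    with zipfile.ZipFile(io.BytesIO(response.content)) as archive:
        csv_name = archive.namelist()[0]
        with archive.open(csv_name) as handle:
            reader = csv.DictReader(io.TextIOWrapper(handle, encoding="utf-8"))
            for row in reader:
                # Keep only newspaper publishing and all-industry totals (codes are illustrative).
                if row["industry_code"] in ("511110", "10"):
                    yield row


pipeline = dlt.pipeline(
    pipeline_name="newspaper_jobs",
    destination="duckdb",
    dataset_name="qcew_raw",
)
load_info = pipeline.run(qcew_rows())
print(load_info)  # dlt's load metadata: load id, row counts, schema changes

Because the resource is a generator, dlt streams and normalizes rows without ever unpacking the archive to disk, and the load info it returns is the per-load metadata mentioned above.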
DuckDB: Local Analytics Powerhouse
DuckDB serves as both the analytical database and the transformation engine. Its ability to handle complex analytical queries on local data makes it perfect for this use case. As an added benefit, DuckDB databases can be attached over HTTPS, meaning others can directly access and query a DuckDB database stored alongside a project (most likely using Git Large File Storage).
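As a rough illustration (the URL here is a placeholder, not this project’s actual database location), attaching a published DuckDB file over HTTPS looks like this:

# Sketch of attaching a DuckDB database published over HTTPS; URL is a placeholder.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")  # httpfs extension enables http(s) access
con.execute("LOAD httpfs")

# Remote databases attach read-only, so others can query without downloading the file.
con.execute("ATTACH 'https://example.com/newspaper_jobs.duckdb' AS remote (READ_ONLY)")
print(con.execute("SHOW ALL TABLES").fetchall())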
dbt: Transformation with Documentation
dbt provides the transformation layer, but more importantly, it brings software engineering best practices like testing, documentation, and modular code to the analytics workflow.
Customizing dbt for Tableau Public: Schema-Based CSV Export
A key part of this pipeline is how it handles the final mile to Tableau Public. Since Tableau Public doesn’t support direct DuckDB connections, I sought a way to automatically generate CSV sources for the visualizations while maintaining the full analytical capabilities of the warehouse.
To simplify this, I created a dedicated tableau_public schema in dbt that contains models specifically designed for public consumption. This separation serves several purposes:
- Clean interfaces: Public-facing models can be simplified and optimized for visualization
- Security: Sensitive or incomplete data stays in other schemas
- Performance: Export only what’s needed for the dashboards
- Flexibility: The main warehouse can contain much more complex transformations
Automated CSV Export with Post-Hooks
The magic happens in the dbt_project.yml configuration:
models:
  newspaper_jobs:
    tableau_public:
      +materialized: view
      # the export path is relative to where dbt runs; adjust for your project layout
      +post-hook: "COPY (SELECT * FROM {{ this }}) TO 'tableau_public/{{ this.name }}.csv' (HEADER, DELIMITER ',')"
This configuration automatically exports every model in the tableau_public schema to a CSV file after it’s built. The post-hook runs a DuckDB COPY command (sketched in standalone form after this list) that:
- Exports the complete model results
- Includes headers for easy Tableau consumption
- Places files in a predictable location
- Maintains the same naming convention as the dbt models
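For clarity, here is roughly the same export written as a standalone DuckDB call. The database path, schema, and model name are hypothetical stand-ins for whatever the hook renders against.

# Roughly what the post-hook does for a single model, as a standalone DuckDB call.
# The database path and model name are hypothetical placeholders.
import duckdb

con = duckdb.connect("newspaper_jobs.duckdb")
model = "state_newspaper_trends"  # a hypothetical model in the tableau_public schema
con.execute(
    f"COPY (SELECT * FROM tableau_public.{model}) "
    f"TO 'tableau_public/{model}.csv' (HEADER, DELIMITER ',')"
)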
Why This Pattern Works
This approach solves several problems I’ve encountered in analytics projects:
1. Best of Both Worlds
You get the full power of a modern data warehouse for development and exploration, plus the accessibility of CSV files for tools like Tableau Public that have connection limitations.
2. Reproducible Publishing
Every time you run dbt run, your public-facing datasets are automatically refreshed. No more manual exports or stale data in dashboards.
3. Scalable Architecture
This pattern works whether you’re building a simple dashboard or a comprehensive analytics platform. Additional schemas can serve different purposes (internal reporting, data science, etc.) while the tableau_public schema remains focused on public consumption.
4. Version Control for Visualizations
Since the CSV files are generated automatically and can be committed to git, you have a complete audit trail of what data was used in each version of your visualizations.
Implementation Tips
If you’re building something similar, here are key considerations:
Database Choice
DuckDB excels for local analytics, but this same pattern works with PostgreSQL, BigQuery, or Snowflake. The post-hook syntax would need adjustment for different SQL dialects.
File Organization
I recommend a clear folder structure:
project/
├── dbt_models/
│   ├── staging/
│   ├── marts/
│   └── tableau_public/
├── tableau_public/     # Auto-generated CSVs
└── visualizations/     # Tableau workbooks
Beyond This Use Case
While this example focuses on employment data and Tableau Public, the pattern generalizes well:
- Academic research: Export cleaned datasets alongside analysis code
- Client deliverables: Automated report generation with data exports
- Multi-tool workflows: Bridge different analytics tools with standardized outputs
- Data sharing: Create public datasets while maintaining private analytical capabilities
Conclusion
Modern data tools have made it possible to maintain rigorous engineering practices while still producing accessible, public-facing analytics. By combining dlt’s extraction capabilities, dbt’s transformation framework, and DuckDB’s analytical power with a schema-based export strategy, you can build pipelines that serve both internal analytics teams and public visualization needs.
The key insight is treating your public-facing data as a product with its own schema, transformations, and quality requirements, rather than an afterthought export from your main analytical work.
The complete pipeline code and Tableau workbooks are available at github.com/darrenfishell/newspaper-jobs.