lefttexas.blogg.se

Airflow etl redshift












How we migrated our data warehouse from Redshift to BigQuery | Miles 2 code

On Data-Engineering, Datawarehouse, Bigquery, and Redshift

We recently migrated our data warehouse at Omio from AWS Redshift to Google BigQuery. The rest of our company’s infrastructure runs on Google Cloud. Our BI data pipelines were traditionally run on AWS for legacy reasons ¹ and it was time for us to align with the rest of the organization.

Redshift requires a non-trivial amount of effort to keep running. Although Redshift has improved quite a lot in this area (with concurrency scaling, elastic resize, etc.), it is still not as hands-free as BigQuery.

We found BigQuery performance to be orders of magnitude better than Redshift for our use cases. It is also much more forgiving of poorly written legacy code.

… Marketing) were already using BigQuery due to its deeper integration with Google Analytics and AdWords. Hence, it made sense to store all data in a single data warehouse. There were significant cost savings in using a single data warehouse instead of two.

Our data pipelines have grown significantly in the last few years. But this has also created an unwanted amount of legacy code throughout our data infrastructure.

Attempting a warehouse migration in this environment was tricky because:

  • There were a large number of tables and datasets to be migrated.
  • We had over 150 tables across multiple schemas and almost an equal number of downstream systems and processes.
  • Several core legacy pipelines (mostly batch jobs) were written purely in SQL over Redshift.
  • These are poorly documented and very difficult to understand.
  • Many internal teams consume the data from the warehouse using various tools (plain SQL, Tableau, Redash and many more).
  • It is difficult to keep track of all these users and the way they use data. A migration project meant communicating with and getting support from all these teams, and making it easy for them to migrate quickly.
  • At the time we started this project, we were forced to go fully remote due to the Covid-19 situation, with the entire team on reduced working hours. This made it difficult to coordinate the various activities around this project.

Challenges notwithstanding, we were able to successfully complete the migration in two months. The remainder of this post describes our approach and learnings in detail.











