Back in 2022, as a BI developer, I found myself in a situation where our corporate functions couldn't serve us the necessary data in a timely manner. Unfortunately, our organization was not their priority, so we were continuously suffering from a lack of data.
This situation motivated me to start learning data engineering and to build our own data environment, independent from our corporate function. After some trials we finally chose Azure Synapse Analytics. The reason is that Synapse Analytics combines several different components in one place: pipelines like in Data Factory, notebooks like in Databricks, a UI for SQL queries, monitoring, and so on. As we are a small team, this all-in-one approach made the relatively high cost of the system worthwhile compared to using the above-mentioned services one by one.
In this section you can find some thoughts on how I set up our environment, some tips, and some approaches that are not necessarily industry standards, but which I consider the best options for a relatively small organization.
Time Travel with Data – Use cases and techniques
Data historization is a pretty frequent need in data analytics that most companies have to address at some point. Let's consider the following use case: a company keeps its sales order backlog in its ERP system. Management is interested in the backlog, but not only how much it is…
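To give a flavour of the simplest technique, here is a minimal PySpark sketch of snapshot-based historization: each run appends the current state of the backlog with a snapshot date, so earlier states stay queryable. The storage paths and the sales_backlog / snapshot_date names are illustrative assumptions, not taken from the post.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read the current state of the backlog from the staging area (illustrative path).
backlog_today = spark.read.parquet(
    "abfss://staging@mydatalake.dfs.core.windows.net/erp/sales_backlog"
)

# Stamp every row with the load date so this state is preserved as a snapshot.
snapshot = backlog_today.withColumn("snapshot_date", F.current_date())

# Append to the historized table; each scheduled run adds one more point in time.
(snapshot.write
    .mode("append")
    .partitionBy("snapshot_date")
    .parquet("abfss://curated@mydatalake.dfs.core.windows.net/history/sales_backlog"))
```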
Upgrade Pipeline to New Salesforce Connector
Over the last couple of days I was working on migrating my Synapse Analytics pipelines from one tenant to another. One of my challenges was the Salesforce data ingestion. As can be found on Microsoft's site, the Salesforce connector has changed to V2, and all pipelines should be updated at the latest by Oct…
Get notification on Synapse Pipeline failures
I do love working in Synapse, as I think it is a highly integrated orchestration tool that contains almost everything a data engineer needs. However, just like any tool, it also has some missing pieces. One of those pieces is alerting, at least a native, built-in alerting functionality. This…
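One common workaround, sketched below in Python, is to add a failure path that posts the failed run's details to an incoming webhook (for example a Teams channel or a Logic App). The webhook URL and payload fields here are illustrative assumptions, not the exact setup described in the post.

```python
import json
import urllib.request

def notify_failure(pipeline_name: str, run_id: str, error_message: str) -> None:
    """Post a short failure message to an incoming webhook (e.g. Teams or a Logic App)."""
    # Illustrative URL; in practice this would come from Key Vault or a pipeline parameter.
    webhook_url = "https://example.webhook.office.com/webhookb2/your-webhook-id"
    payload = {
        "text": f"Synapse pipeline '{pipeline_name}' failed (run {run_id}): {error_message}"
    }
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```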
Dynamic Schema Adjustment in Synapse Pipeline
In my previous blog posts I've explained how to set up a dynamic pipeline with a control table. With that, the ingestion job can be fully automated and scheduled at the appropriate frequency. In this post I'll write a bit about how we can manage schema changes in the source system. Schema change is dependent…
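As a rough illustration of the idea, the sketch below compares the source schema with the target schema and generates ALTER TABLE statements for any new columns. The table and column names are hypothetical, and the actual handling in the pipeline may differ.

```python
from typing import Dict, List

def schema_adjustments(source_cols: Dict[str, str],
                       target_cols: Dict[str, str],
                       table: str) -> List[str]:
    """Return ALTER TABLE statements for columns present in the source but missing in the target.

    Both dictionaries map column name -> SQL type, e.g. {"order_id": "INT"}.
    """
    statements = []
    for column, sql_type in source_cols.items():
        if column not in target_cols:
            # A new column appeared in the source: add it so the copy activity keeps working.
            statements.append(f"ALTER TABLE {table} ADD {column} {sql_type} NULL;")
    return statements

# Example: the source gained a 'discount' column since the last load.
print(schema_adjustments(
    {"order_id": "INT", "amount": "DECIMAL(18,2)", "discount": "DECIMAL(18,2)"},
    {"order_id": "INT", "amount": "DECIMAL(18,2)"},
    "dbo.sales_orders",
))
```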
Setting up a Dynamic Pipeline from Control Table
In the previous post I explained how a Control Table should be set up to handle the necessary sources and processing behavior. In this post I'll explain how to start creating a dynamic pipeline and add customized actions based on what each table needs. When I'm setting up a dynamic pipeline I'm considering what actions should be…
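Conceptually this is a Lookup over the control table feeding a ForEach with a Switch on the load type. The Python sketch below only mimics that branching; the row fields and the copy_full / copy_incremental placeholders are my own illustrative names, not the activities from the post.

```python
def copy_full(schema: str, table: str) -> None:
    # Placeholder for a truncate-and-load style copy activity.
    print(f"Full load of {schema}.{table}")

def copy_incremental(schema: str, table: str, watermark_column: str) -> None:
    # Placeholder for a copy activity filtered on the watermark column.
    print(f"Incremental load of {schema}.{table} where {watermark_column} > last watermark")

def run_for_row(row: dict) -> None:
    """One iteration of the ForEach: pick the action this control-table row asks for."""
    if not row["is_active"]:
        return
    if row["load_type"] == "incremental":
        copy_incremental(row["source_schema"], row["source_table"], row["watermark_column"])
    else:
        copy_full(row["source_schema"], row["source_table"])

run_for_row({"source_schema": "sales", "source_table": "orders",
             "load_type": "incremental", "watermark_column": "modified_at", "is_active": True})
```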
Create and Manage the Control Table for Dynamic Pipelines
As introduced in my other article, I'm setting up dynamic pipelines to create a sustainable and scalable flow for data integration. In this article I'm going to show you the first step: setting up the Control Table. The Control Table is the place where you can manage which tables should be collected…
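To illustrate the kind of metadata such a table holds, here is a small hypothetical sample: one row per source table, with columns like source_schema, load_type and is_active (my own assumed names, not necessarily the ones used in the post).

```python
# Each row describes one source table and how the dynamic pipeline should ingest it.
control_table_rows = [
    {"source_schema": "sales",   "source_table": "orders",
     "load_type": "incremental", "watermark_column": "modified_at", "is_active": True},
    {"source_schema": "sales",   "source_table": "customers",
     "load_type": "full",        "watermark_column": None,          "is_active": True},
    {"source_schema": "finance", "source_table": "invoices",
     "load_type": "incremental", "watermark_column": "updated_on",  "is_active": False},
]

def active_sources(rows):
    """Keep only the entries the pipeline should pick up on its next run."""
    return [row for row in rows if row["is_active"]]

for entry in active_sources(control_table_rows):
    print(f"{entry['source_schema']}.{entry['source_table']} -> {entry['load_type']} load")
```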
The “Data Ferrari” vs “Data Toyota”
When I was learning about data engineering and started to set up our own system, at first I struggled with how to do it the best way. What should the architecture look like for our exact needs? Those questions became even harder after some consultations with other experts and completing more and more training materials. Ultimately the…
