AI as a Catalyst: Can Intelligent Tools Transform Data Engineering in the Zürich Rental Market?

In our data engineering work on the Zürich rental market, we are now examining additional datasets to broaden the analysis. These datasets cover crime statistics, housing information, economic indicators, and demographic data. Of these, 21 datasets stand out because each one carries a district number and name, a spatial attribute that serves as our anchor for integrating them into a unified analysis.
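
To make the idea of a shared spatial key concrete, here is a minimal sketch of joining two datasets on the district number. The column names (district_number, district_name) and all values are invented for illustration; the real datasets will use their own headers and figures.

```python
from collections import defaultdict

# Invented sample rows for illustration only; not real Zürich figures.
crime_rows = [
    {"district_number": 1, "district_name": "Kreis 1", "offences": 120},
    {"district_number": 2, "district_name": "Kreis 2", "offences": 45},
]
housing_rows = [
    {"district_number": 1, "district_name": "Kreis 1", "median_rent_chf": 2100},
    {"district_number": 3, "district_name": "Kreis 3", "median_rent_chf": 1800},
]


def merge_by_district(*datasets):
    """Outer-join lists of row dicts on the assumed key 'district_number'."""
    merged = defaultdict(dict)
    for rows in datasets:
        for row in rows:
            # Rows sharing a district number are folded into one record.
            merged[row["district_number"]].update(row)
    return [merged[key] for key in sorted(merged)]


unified = merge_by_district(crime_rows, housing_rows)
for record in unified:
    print(record)
```

Districts that appear in only one dataset are kept with the columns they have, which makes gaps visible instead of silently dropping rows.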

Google Docs Image - Structured Prompt for Translating Attributes and CSV Headers

Prompt Engineering

Because I work most comfortably in English, a significant step is translating the dataset headers and attribute descriptions found on the Stadt Zürich Open Data portal. Getting this right matters for the accuracy and coherence of our analysis. To do this, I give ChatGPT a structured input, loosely modeled on a COSTAR prompt, to translate and interpret these data elements. For readers unfamiliar with the term, COSTAR (often written CO-STAR) is a prompt template that organizes a query into Context, Objective, Style, Tone, Audience, and Response sections, so that the model has what it needs to produce precise, contextually relevant output. The example above does not follow the full COSTAR framework; instead, the request is stated first and the context is provided after it. It works, though the results warrant careful review because naming conventions need to be consistent across the datasets. So far, ChatGPT has provided more or less consistent results.
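
The request-first, context-second layout described here can be sketched as a small prompt builder. This is not the exact prompt from the image: the wording of the request, the snake_case convention, and the sample German headers are assumptions made for illustration.

```python
def build_translation_prompt(dataset_title, attributes):
    """Return a prompt that states the request first and the context after it.

    attributes is a list of (source_header, source_description) pairs.
    """
    # Assumed request wording; adapt it to your own naming conventions.
    request = (
        "Translate the following dataset attributes from German to English. "
        "For each attribute, return the English header in snake_case, "
        "an English description, and a data type."
    )
    context_lines = [f"Dataset: {dataset_title}", "Attributes (header: description):"]
    for header, description in attributes:
        context_lines.append(f"- {header}: {description}")
    return request + "\n\n" + "\n".join(context_lines)


# Illustrative headers only; not copied from a specific dataset.
prompt = build_translation_prompt(
    "Beispiel-Datensatz",
    [("KreisCd", "Nummer des Stadtkreises"), ("KreisLang", "Name des Stadtkreises")],
)
print(prompt)
```

Keeping the request text fixed across all datasets is one way to encourage consistent naming, though the output still needs a manual review.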

ChatGPT Output - Translated Attribute List, including English headers, descriptions, and data types

The flexibility and efficiency of ChatGPT have made it an indispensable tool at this stage. My workflow uses Notion for project management and Google Docs for prompt creation and documentation. The AI-generated translations are recorded in a Google Docs page for each dataset, with supplementary notes in Notion for tracking.
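
Once the translations are recorded per dataset, the consistency review can be partly automated. The sketch below assumes the translations have been copied into a dictionary keyed by dataset name; that structure, the function name, and the sample headers are hypothetical.

```python
from collections import defaultdict


def find_inconsistent_translations(translations_by_dataset):
    """Return source headers that received more than one English name.

    translations_by_dataset maps a dataset name to a dict of
    {source_header: english_header}.
    """
    seen = defaultdict(set)
    for mapping in translations_by_dataset.values():
        for source_header, english_header in mapping.items():
            seen[source_header].add(english_header)
    return {src: sorted(names) for src, names in seen.items() if len(names) > 1}


# Illustrative record of two translated datasets (hypothetical values).
translations = {
    "crime": {"KreisCd": "district_number", "KreisLang": "district_name"},
    "housing": {"KreisCd": "district_code", "KreisLang": "district_name"},
}
conflicts = find_inconsistent_translations(translations)
print(conflicts)
```

Any header listed in the result was translated differently in at least two datasets and should be harmonized before the datasets are joined.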

ChatGPT's contribution extends beyond translation: it also structures the data headers into a concise, comma-separated format that is copied directly into our project documentation. This streamlines the documentation process and sets a solid foundation for the subsequent data transformation tasks in Azure Databricks.
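
The comma-separated header format is simple to reproduce or verify locally. A minimal sketch using Python's standard csv module follows; the header names are assumed examples, not the project's actual columns.

```python
import csv
import io


def headers_to_csv_line(headers):
    """Join headers into one CSV line, quoting any header that contains a comma."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(headers)
    # The writer appends a line terminator; strip it to get a bare header line.
    return buffer.getvalue().rstrip("\r\n")


# Assumed English headers for illustration.
header_line = headers_to_csv_line(["district_number", "district_name", "year"])
print(header_line)
```

Using the csv module rather than a plain string join keeps the line valid even if a translated header happens to contain a comma.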

The Right Tool for the Job

For extensive datasets or complex translation needs, a dedicated service such as Microsoft Azure Translator is an alternative worth considering. OpenAI's models (including ChatGPT) and Azure Translator are both efficient, but their cost implications need to be acknowledged. Unlike some basic services, these tools typically require a subscription or usage-based billing, so choosing between them calls for a careful cost-benefit analysis against our project needs.
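
A rough volume estimate is a useful first input to that cost-benefit analysis. The sketch below assumes billing by character count, which is common for translation APIs but should be checked against the provider's current pricing. The rate used here is a placeholder, not a real price.

```python
def estimate_translation_cost(texts, price_per_million_chars):
    """Return (total_characters, estimated_cost) for a list of strings.

    price_per_million_chars is a placeholder the caller must supply
    from the provider's current price list.
    """
    total_chars = sum(len(text) for text in texts)
    estimated_cost = total_chars / 1_000_000 * price_per_million_chars
    return total_chars, estimated_cost


# Illustrative headers and descriptions (hypothetical values).
sample_texts = ["KreisCd", "Nummer des Stadtkreises", "KreisLang", "Name des Stadtkreises"]
PLACEHOLDER_RATE = 10.0  # hypothetical price per million characters
chars, cost = estimate_translation_cost(sample_texts, PLACEHOLDER_RATE)
print(f"{chars} characters, estimated cost {cost:.6f}")
```

For header and description text across a few dozen datasets, the volume is likely to be small, so the choice of tool may depend more on translation quality and workflow fit than on price.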

The transformation tasks in Azure Databricks mentioned earlier deserve a brief explanation. They typically involve data cleaning, normalization, and aggregation, the steps that prepare our datasets for in-depth analysis. For instance, renaming headers so that they are consistent across datasets allows the datasets to be integrated and analyzed together.
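
The header-renaming step can be illustrated in plain Python. This sketch shows the logic only, not the project's Databricks code; in a Spark DataFrame the equivalent would likely use column renaming (for example, PySpark's withColumnRenamed). The header map and sample row are hypothetical.

```python
def rename_headers(rows, header_map):
    """Return new row dicts with keys renamed via header_map.

    Keys that are not in header_map are kept unchanged.
    """
    return [
        {header_map.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical mapping from source headers to harmonized English headers.
HEADER_MAP = {"KreisCd": "district_number", "KreisLang": "district_name"}

raw_rows = [{"KreisCd": 1, "KreisLang": "Kreis 1", "year": 2023}]
clean_rows = rename_headers(raw_rows, HEADER_MAP)
print(clean_rows)
```

Applying one shared header map to every dataset gives all of them the same key columns, which is what makes the later join on district possible.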

This look at the data engineering process behind analyzing the Zürich rental market is not just a technical exercise but a narrative that blends precision, efficiency, and strategic planning. Our aim is to demystify the processes involved and make them accessible and engaging for blog readers, peers, and fellow data engineers. Along the way, we hope to show the value of using advanced tools and methodologies to understand the dynamics of the rental market in Zürich.


