The Schema Wears No Tables

A Dad to Data Engineer’s blog.

Anthony Calek

AI as a Catalyst: Can Intelligent Tools Transform Data Engineering in the Zürich Rental Market?


In our journey through the data engineering landscape of the Zürich rental market, we are now examining additional datasets to broaden the analysis. These datasets cover crime statistics, housing information, economic indicators, and demographic data. Of these, 21 stand out because they share a district number and name, a spatial attribute that will serve as the anchor for integrating them into a unified analysis.
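To make the idea of a shared district key concrete, here is a minimal sketch of joining two datasets on the district number. The column names and all values are made up for illustration; the real datasets will have their own headers and figures.

```python
# Hypothetical sample rows; column names and values are illustrative only.
rents = [
    {"district_number": 1, "district_name": "Kreis 1", "median_rent_chf": 2000},
    {"district_number": 2, "district_name": "Kreis 2", "median_rent_chf": 1500},
]
population = [
    {"district_number": 1, "district_name": "Kreis 1", "population": 6000},
    {"district_number": 2, "district_name": "Kreis 2", "population": 35000},
]


def join_on_district(left, right, key="district_number"):
    """Inner-join two lists of row dicts on the shared district key."""
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Columns from the right side win on a name clash; here the
            # shared columns hold the same values on both sides.
            joined.append({**row, **match})
    return joined


combined = join_on_district(rents, population)
for row in combined:
    print(row)
```

Districts that appear in only one of the two inputs are dropped by this inner join, which is worth checking before trusting a combined table.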



Google Docs Image - Structured Prompt for Translating Attributes and CSV Headers

Prompt Engineering

Because I work most comfortably in English, a significant step is translating the dataset headers and attribute descriptions published on the Stadt Zürich Open Data portal. Getting this right matters for the accuracy and coherence of the analysis. To do it, I give ChatGPT a structured input, similar in spirit to a COSTAR prompt. For readers new to the term, a COSTAR prompt organizes a request into Context, Objective, Style, Tone, Audience, and Response so that the model returns precise, contextually relevant output. In the example above, the traditional COSTAR framework is not used; instead, the request is made first and the context is provided after it. It works, though the results warrant careful review because naming conventions need to be consistent across the various datasets. So far, ChatGPT has provided more or less consistent results.
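The request-first layout described here can be sketched as a small helper that assembles the prompt text. This is not the exact prompt from the Google Doc; the wording of the request, the function name, and the sample headers and descriptions are all assumptions made for illustration.

```python
def build_translation_prompt(dataset_title, headers, descriptions):
    """Assemble a request-first prompt: the task, then the supporting context."""
    request = (
        "Translate the following CSV headers and attribute descriptions "
        "from German to English. Return one line per attribute with the "
        "English header in snake_case, an English description, and a data type."
    )
    context_lines = [f"Dataset: {dataset_title}", "Attributes:"]
    for header in headers:
        # Fall back to a visible marker when no description is available.
        description = descriptions.get(header, "(no description)")
        context_lines.append(f"- {header}: {description}")
    return request + "\n\n" + "\n".join(context_lines)


prompt = build_translation_prompt(
    "Mietpreise",  # hypothetical dataset title
    ["KreisCd", "KreisLang"],  # hypothetical source headers
    {"KreisCd": "Nummer des Stadtkreises", "KreisLang": "Name des Stadtkreises"},
)
print(prompt)
```

Reusing one fixed request across all datasets is one way to nudge the model toward the same naming convention each time, though the output still needs a manual review.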

ChatGPT Output - Translated Attribute List, including English headers, descriptions, and data types

The flexibility and efficiency of ChatGPT have made it an indispensable tool at this stage. My workflow integrates the use of Notion for project management and Google Docs for prompt creation and documentation. This system allows for an organized record of the AI-generated translations, which are then carefully documented in a Google Docs page for each dataset, with supplementary notes in Notion for comprehensive tracking.

ChatGPT's contribution extends beyond mere translation; it assists in structuring the data headers into a concise, comma-separated format that is directly implemented into our project documentation. This streamlined approach significantly enhances our documentation process, setting a solid foundation for the subsequent data transformation tasks in Azure Databricks.
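As a rough illustration of the comma-separated header format, the sketch below turns a translated attribute list into a single header line. The attribute names, descriptions, and data types are hypothetical stand-ins for what ChatGPT returns.

```python
import csv
import io

# Hypothetical translated attributes: (english_header, description, data_type).
translated = [
    ("district_number", "Number of the city district", "integer"),
    ("district_name", "Name of the city district", "string"),
    ("median_rent_chf", "Median net rent in Swiss francs", "decimal"),
]


def header_line(attributes):
    """Return the English headers as one comma-separated line."""
    buffer = io.StringIO()
    # csv.writer quotes any header that itself contains a comma.
    writer = csv.writer(buffer, lineterminator="")
    writer.writerow([name for name, _description, _dtype in attributes])
    return buffer.getvalue()


line = header_line(translated)
print(line)
```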

The Right Tool for the Job

It's also worth clarifying when a dedicated service such as Microsoft Azure Translator makes sense, for example for extensive datasets or complex translation needs. OpenAI's models (including ChatGPT) and Azure Translator are efficient, but they come at a cost. Unlike some basic services, these advanced tools require a subscription, so choosing between them calls for a careful cost-benefit analysis against our project needs.

The transformation tasks in Azure Databricks mentioned above typically involve data cleaning, normalization, and aggregation, the steps that prepare our datasets for in-depth analysis. For instance, renaming headers so that the same attribute carries the same name in every dataset lets the datasets be integrated and analyzed together.
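The header-renaming step can be sketched in plain Python as follows. In Databricks the same idea would normally be expressed with Spark DataFrame operations; the standard library is used here only to keep the example self-contained. The mapping and the sample CSV are hypothetical.

```python
import csv
import io

# Hypothetical mapping from source headers to the agreed English names.
HEADER_MAP = {"KreisCd": "district_number", "KreisLang": "district_name"}


def rename_headers(csv_text, header_map):
    """Return the CSV text with its header row renamed via header_map."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV text has no header row")
    header, body = rows[0], rows[1:]
    # Fail loudly instead of letting an untranslated header slip through.
    unmapped = [name for name in header if name not in header_map]
    if unmapped:
        raise KeyError(f"No English name agreed for: {unmapped}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header_map[name] for name in header])
    writer.writerows(body)
    return out.getvalue()


source = "KreisCd,KreisLang\n1,Kreis 1\n2,Kreis 2\n"
renamed = rename_headers(source, HEADER_MAP)
print(renamed)
```

Keeping one shared mapping for all datasets means a district column cannot end up as `district_number` in one table and something else in another.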

This detailed exploration into the data engineering process behind analyzing the Zürich rental market is not just a technical endeavor but a narrative that blends precision, efficiency, and strategic planning. Our aim is to demystify the complex processes involved, making them accessible and engaging for our audience of blog readers, peers, and fellow data engineers. Through this journey, we not only navigate the intricacies of data engineering but also underscore the value of leveraging advanced tools and methodologies to unravel the dynamics of the rental market in Zürich.

Anthony Calek

Diving into the Depths of Zurich's Rental Market: A Data Engineering Perspective


Zurich, a city with a rich tapestry of history and culture, has seen its rental market become a topic of intense discussion and analysis. Since 1893, when the city was first unified, the evolution of rental prices has been meticulously documented, revealing trends that mirror the city's growth and transformation. The 2022 analysis of rental prices, a continuation of this long-standing tradition, offers a comprehensive view of the current state of the market, leveraging the wealth of data collected over the years.



A Deep Dive into the 2022 Analysis

The latest rental prices analysis reveals that Zurich had 182,900 apartments at the end of 2021, with 161,769 of these being available for a deeper dive after excluding special cases. This extensive database, covering two-thirds of the apartments in the non-profit sector and about 20 percent in the private sector, allows for detailed evaluations across all urban districts and some of the 34 statistical districts. The findings highlight a citywide median net rental price of 1,336 francs per month, with a noticeable increase for larger apartments, reflecting a nuanced rental landscape that varies significantly across different parts of the city.

The Role of Open Data and AI

The city of Zurich's open data initiative provides a treasure trove of information for data engineers and analysts. By making this data publicly available, it opens up opportunities for innovative analyses and deeper insights into the factors driving rental prices. The application of artificial intelligence and machine learning to this dataset enables the prediction of trends, the identification of affordability challenges, and the exploration of the impact of various factors on rental costs. This approach not only enhances our understanding of the rental market but also contributes to the development of more informed housing policies and decisions.

Insights from the Data

One of the key findings from the 2022 analysis is the significant increase in rental prices over the years. Since the 2000 census, apartment rents have risen by approximately 40 percent across all room categories. However, when looking at prices per square meter, the increase is somewhat less sharp, ranging from 20 to 35 percent. This discrepancy highlights the impact of modernization and replacement construction on the housing stock, leading to a proliferation of larger, more expensive apartments.
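The gap between the two growth figures can be turned into a back-of-the-envelope estimate. Monthly rent equals rent per square meter times floor area, so the growth factors multiply. The sketch below pairs the 40 percent rise with each end of the 20 to 35 percent range; this is a simplification, since the report gives these figures across room categories rather than as matched pairs.

```python
def implied_area_growth(rent_growth, rent_per_sqm_growth):
    """Growth in average floor area implied by the two growth rates.

    Since rent = rent_per_sqm * area, the growth factors multiply:
    (1 + rent_growth) = (1 + rent_per_sqm_growth) * (1 + area_growth).
    """
    return (1 + rent_growth) / (1 + rent_per_sqm_growth) - 1


# 40 percent is the reported rise in rents; 20 and 35 percent bound the
# reported rise per square meter.
for sqm_growth in (0.20, 0.35):
    area_growth = implied_area_growth(0.40, sqm_growth)
    print(f"per-sqm +{sqm_growth:.0%} -> implied area +{area_growth:.1%}")
```

Under these assumptions the average apartment would have grown by roughly 4 to 17 percent in floor area, which fits the point about larger apartments entering the housing stock.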

Moreover, the analysis sheds light on the disparity between non-profit and private sector rents. While non-profit housing developers, operating on the principle of cost rent, have seen a comparatively moderate increase in square meter rents, private rents have surged by 25 to 38 percent. This divergence underscores the broader market dynamics at play, influenced by factors such as construction standards, condition, exposure, and the noise situation, which were not directly captured in the survey.

Encouraging Exploration and Engagement

The comprehensive dataset provided by Zurich's open data initiative is not just a resource for data engineers; it's an invitation to all interested parties to explore, analyze, and engage with the data. By delving into the 2022 rental prices analysis, individuals can gain a deeper understanding of the market's complexities and contribute to the ongoing dialogue about housing affordability and policy.

In conclusion, the exploration of Zurich's rental market through data engineering and AI offers profound insights into the city's living conditions and economic trends. As we continue to unravel the intricacies of the rental landscape, the collaboration between data professionals, policymakers, and the public will be crucial in shaping a more equitable and sustainable housing market for all Zurich residents.

Link to report: Rental Prices in the City of Zürich / Mietpreise in der Stadt Zürich

At the time of writing: 1 Swiss Franc (CHF) = $1.14 and €1.05
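For readers who think in dollars or euros, here is a small sketch that applies the rates quoted above to the citywide median rent. The function name is made up, and the rates are only those stated at the time of writing.

```python
from decimal import Decimal

# Rates quoted in the post at the time of writing.
USD_PER_CHF = Decimal("1.14")
EUR_PER_CHF = Decimal("1.05")


def convert_chf(amount_chf):
    """Convert a CHF amount to USD and EUR, rounded to cents."""
    amount = Decimal(str(amount_chf))
    cents = Decimal("0.01")
    return {
        "USD": (amount * USD_PER_CHF).quantize(cents),
        "EUR": (amount * EUR_PER_CHF).quantize(cents),
    }


# 1,336 CHF is the citywide median net rent reported in the 2022 analysis.
median_rent = convert_chf(1336)
print(median_rent)
```

At these rates the median of 1,336 francs works out to about 1,523 US dollars or 1,403 euros per month.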

Anthony Calek

Data Engineering's New Frontier: LLMs and the Leap Beyond


In the early stages of our project analysing Zurich's rental prices, I've turned to Large Language Models (LLMs) such as those from OpenAI, not as a crutch but as a catalyst. Here's why: these models are reshaping the terrain of data engineering, nudging us toward data analysis and machine learning. Even for someone like me, who previously eyed these fields with a mix of respect and hesitation, LLMs have been a revelation.



But let's set the record straight: embracing LLMs is not about sidelining the rigorous path of education. On the contrary, these technologies serve as an adjunct, enhancing the learning curve, enriching the experience, and paving the way for better outcomes. They're like the seasoned guide in a climber's ascent, offering new pathways to those willing to explore.

Acknowledging the potential of LLMs also means tempering our expectations. The journey into AI-assisted analysis is charged with promise but punctuated by reality checks. Not every experiment will redefine the field, and not every analysis will uncover groundbreaking insights. Yet, it's precisely this technology that empowers us, granting access to new realms of knowledge and challenging us to stretch our capabilities.

In sum, this project, still in its proof-of-concept phase, is not just an exploration of Zurich's rental market. It's a testament to the evolving role of data engineers, encouraged by LLMs to venture into new territories. It's an affirmation that while technology like OpenAI's models can augment our skills and broaden our horizons, it doesn't replace the foundational value of education and the human curiosity that drives us forward.

So, as we delve deeper into this analysis, let's keep our minds open and our spirits willing to embrace the challenges and opportunities that lie ahead. This isn't an advertisement for AI—it's a celebration of the potential within us all to grow, learn, and innovate, with LLMs as our companions on this exciting journey.

GitHub Page & Wiki

Anthony Calek

Not the Header: Blending Data Engineering and Dad Duties in Zurich's Rent Analysis


Hello and welcome to "Not the Header," the inaugural foray into our journey dissecting Zurich's 2022 rental market. This venture is where the analytical firepower of Azure's Data Factory, Databricks, and Power BI meets OpenAI's sharp cognitive skills, all while juggling the joys and juggernauts of fatherhood. Why mix data engineering with dad life, you ask? Because in the quest for work-life balance, what's better than having an AI assistant to tackle the complexities of rental prices, leaving more diaper-free quality time to spend with the family?



Here, we're not just mapping out numbers; we're weaving a narrative around Zurich's living costs, neighborhood by neighborhood, using technology as our lens and family as our compass. This blog promises a deep dive into the data that shapes our understanding of home and hearth in one of the world's most livable cities, with a touch of humor and humanity only a dad could add.

Expect insights into Zurich's rent dynamics, analyses enriched by cutting-edge tech, and reflections on balancing professional ambitions with personal commitments. This is your guide to the intricacies of the rental market, served with a side of dad humor and real-life anecdotes.

So, gear up for a series of posts that promise not just to inform but also entertain, making the vast world of data engineering a little more relatable. Join me on this adventure, where every graph tells a story, and every data point brings us closer to understanding the nuances of life in Zurich. Together, we'll uncover the secrets behind the numbers, all while keeping family time sacred. Welcome to the intersection of data, technology, and dad life.

GitHub Page & Wiki
