What the wording reveals about the roles, and why some companies should rethink their approach to, and expectations of, data projects

Photo by ThisisEngineering RAEng on Unsplash

The role of a “data scientist” has existed for about 10 years now, and soon afterwards it became clear that an additional role, the “data engineer”, was needed to support steady progress. Finally, “data architects” were required to choreograph the interactions between multiple teams and systems. But what are…

A simple statistical test shows that the rise in average temperatures is very unlikely to be due to “bad luck”.

Photo by Lacie Slezak on Unsplash


We are currently experiencing an increase in the average temperature from one year to the next. But is this really statistically significant? The average temperature varies a lot over time, so it might well be the case that the current situation is just “bad luck” and that we will…

Create your own insights on global warming using publicly available data.

Photo by Karsten Würth on Unsplash

You are now reading the third and last part of my mini-series on analyzing publicly available data to research climate change. As before, the idea is not to become a real expert in meteorology but to apply common sense and appropriate tooling to derive some insights, such that everyone…

Photo by NOAA on Unsplash

This is the second part of a small series dedicated to performing some analytics on publicly available weather data in order to find significant indicators of climate change, specifically of global warming.

I am by no means an expert in meteorology or in climate models, but I have…

Many companies follow the hype of big data without understanding the implications of the technology.

Photo by Jan Antonin Kolar on Unsplash

I call myself a “Big Data Expert”. I have tamed many animals in the ever-growing Hadoop zoo, like HBase, Hive, Oozie, Spark, Kafka, etc. I have helped companies build and structure their data lakes using appropriate subsets of these technologies. I like to wrangle data from multiple sources…

Why neither Spark nor Pandas is better than the other. Or: Always choose the right tool for the right job.

Photo by Cesar Carlevarino Aragon on Unsplash

Originally I wanted to write a single article offering a fair comparison of Pandas and Spark, but it kept growing until I decided to split it up. This is the second part of the small series.

Kaya Kupferschmidt

Freelance Big Data and Machine Learning expert at dimajix.

