Kaya Kupferschmidt
220 Followers

Published in Towards Data Science

·Jan 20, 2021

Rethinking the Roles of Data Scientists, Engineers and Architects

What the wording tells about the roles — and why some companies should rethink their approach to and expectations of data projects — The role of a “data scientist” has now existed for about 10 years, and it was soon understood that an additional role of a “data engineer” was needed to support steady progress. And finally, “data architects” were required to choreograph the interactions between multiple teams and systems. But what are…

Data Science

18 min read

Published in Towards Data Science

·Dec 23, 2020

Using Permutation Tests to proof the Climate Change

A simple statistical test shows that the increase in average temperatures is very unlikely to be due to “bad luck”. — Introduction We are currently experiencing an increase in the average temperature from one year to the next. But is this really statistically significant? The average temperature varies a lot over time, so it might well be the case that the current situation is just “bad luck” and that we will…

Climate Change

9 min read

Published in Towards Data Science

·Dec 17, 2020

Data Engineering at Scale

How to speed up building your Big Data ETL pipelines and getting them into production — We have been hearing the slogan “Data is the new Gold” for a couple of years now, and many companies are investing heavily to follow this route. Initially most companies believed that it was enough to hire a bunch of expensive data scientists to become a leader in…

Big Data

7 min read

Published in Towards Data Science

·Dec 15, 2020

Investigating the Climate Change with Python and Spark, Part 3

Create your own Insights on Global Warming using publicly available Data. — You are now reading the third and last part of my mini series on analyzing publicly available data to research climate change. Still, the idea is not to become a real expert in meteorology but to apply common sense with appropriate tooling to derive some insights, such that everyone…

Climate Change

12 min read

Published in Towards Data Science

·Dec 11, 2020

Using Python and Spark to research the Climate Change, Part 2

Create your own Insights on Global Warming using publicly available Data. — This is the second part of a small series dedicated to performing some analytics on publicly available weather data in order to find significant indicators of climate change, specifically of global warming. I am by no means an expert in meteorology or in climate models, but I have…

Pyspark

13 min read

Published in Towards Data Science

·Dec 8, 2020

Using Python and Spark to research the Climate Change, Part 1

Create your own Insights on Global Warming using publicly available Data — Climate change is currently a hot topic, with many experts claiming a significant increase in the average temperature across the whole world. …

Climate Change

11 min read

Published in Towards Data Science

·Nov 16, 2020

Do I need Big Data? And if so, how much?

Many companies follow the hype of big data without understanding the implications of the technology. — I call myself a “Big Data Expert”. I have tamed many animals in the ever-growing Hadoop zoo, like HBase, Hive, Oozie, Spark, Kafka, etc… I have helped companies build and structure their Data Lake using appropriate subsets of these technologies. I like to wrangle data from multiple sources…

Big Data

9 min read

Published in Towards Data Science

·Nov 14, 2020

Spark vs Pandas, part 4— Recommendations

Why neither Spark nor Pandas is better than the other. Or: Always choose the right tool for the right job. — Originally I wanted to write a single article for a fair comparison of Pandas and Spark, but it continued to grow until I decided to split this up. This is the second part of the small series. Spark vs Pandas, part 1 — Pandas Spark vs Pandas, part 2 —…

Spark

6 min read

Published in Towards Data Science

·Oct 26, 2020

Spark vs Pandas, part 3 — Scala vs Python

Why programming languages matter — In this third installment of the series “Pandas vs Spark” we will have a closer look at the programming languages and the implications of choosing one. Originally I wanted to write a single article for a fair comparison of Pandas and Spark, but it continued to grow until I decided…

Big Data Analytics

11 min read

Published in Towards Data Science

·Oct 22, 2020

Spark vs Pandas, part 2 — Spark

Pushing the limits by scaling with Spark — Originally I wanted to write a single article for a fair comparison of Pandas and Spark, but it continued to grow until I decided to split this up. This is the second part of the small series. Spark vs Pandas, part 1 — Pandas Spark vs Pandas, part 2 —…

Big Data

13 min read

Freelance Big Data and Machine Learning expert at dimajix.
