Understanding programming and where to start

Why learn Python?

As the year winds down and a new one starts, you may be setting a New Year’s resolution to learn a new skill like programming. It’s not a bad idea: programming is vital to our digital economy, with “Software Engineer” holding the number 9 spot in Glassdoor’s top 25 jobs, and other programming-related jobs like Software Engineering Manager holding further top spots. As for which language to learn, Python tops the list as the most important and fastest-growing language.

If you haven’t programmed before, however, you may be wondering where to start. Perhaps more importantly than…


Accuracy/ Recall/ Precision/ Confusion Matrix/ ROC Curve/ AUC

Github Repository

In roughly half of all Data Science interviews around the world, the interviewee is asked to build and assess a binary classification model. This means classifying an observation as either positive (normally denoted as 1) or negative (denoted as 0), given a set of features. A common mistake that interviewees make is to spend too much time building and tuning an overly sophisticated model and not enough time elaborating on which classification evaluation metric is appropriate for the problem. …
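As a minimal sketch of the metrics named in the title, the following pure-Python snippet counts the confusion-matrix cells for a binary classifier and derives accuracy, precision, and recall from them. The function names and the toy labels are hypothetical and chosen only for illustration; they are not taken from the linked repository.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Derive accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: share of predicted positives that are truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels, invented for this example.
y_true = [1, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this toy data accuracy is higher than precision and recall, which hints at why the choice of metric matters when the negative class dominates.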


Why geographically correct maps show election results inaccurately

Github Repository

The presidential election of 2020 was said by many to be the most important election ever held. Voter turnout was higher than ever before, and arguably the most controversial president in history, Donald Trump, lost to his Democratic opponent Joe Biden.

The unprecedented number of mail-in ballots took its toll on the patience of the nation and the world. On election day, the 3rd of November, it was anything but clear who had won the election. The polls before the election turned out not to be very representative of true sentiments…


A practical introduction to machine learning model deployment in Python

What is deployment, and why do I need to do it?

Congratulations, you’ve trained a machine learning model! You’ve worked hard for the past few months exploring the data, completing analysis, creating training features, and finally training models. You used cross-validation and optimized hyperparameters, and when you show your boss and other stakeholders from the business, they agree with your choice of performance metric. You excitedly show them charts and visualizations of the test score of your model, and they agree that your model is making good predictions for the task at hand.

Then they ask how the model…


Visualizing 30 years of economic data between East and West Germany

The year 2020 is certainly a special year for many people. A global pandemic and the election of the next US president are just two out of many examples. For Germany, the year 2020 is historically important for another reason. Thirty years earlier, on October 3rd, 1990, the reunification of Germany took place. With five “new federal states” and the reunited city-state of Berlin, the territory of what had been Communist East Germany, the German Democratic Republic, became part of the Federal Republic of Germany (West Germany).

It…


This blogpost elaborates on how to implement a reinforcement learning algorithm that not only masters the game “Snake” but also outperforms any human in a two-player game on a shared playing field.

The structure of the blogpost is as follows: First, the theory of a special type of Reinforcement Learning is discussed, namely Q-Tables. This approach is then benchmarked against a pure Deep Learning approach to show the superiority of Reinforcement Learning for these kinds of problems. Second, we explore how to add a human-controlled snake next to the machine-controlled snake in the same playing field.
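To make the Q-Table idea concrete, here is a minimal sketch of the standard tabular Q-learning update. It is not the post's actual implementation: the state labels, the action names, the reward, and the values of the learning rate `alpha` and discount factor `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a snake-like grid game.
ACTIONS = ["up", "down", "left", "right"]


def make_q_table():
    """Create a Q-table that maps each unseen state to zero-valued actions."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step and return the updated Q-value.

    Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q_table[next_state].values())
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Toy transition, invented for this example: moving right from "s0"
# reaches "s1" and earns a reward of 1.0.
q = make_q_table()
new_value = q_update(q, state="s0", action="right", reward=1.0, next_state="s1")
print(new_value)
```

Repeating such updates over many game steps lets the table approximate how valuable each action is in each state, which the agent can then exploit by picking the highest-valued action.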

Python Code →…


Using the STL Forecasting Method with an ARIMA model, which is parameterized through the Box-Jenkins Method.

This post builds on our first blogpost, which dealt with the initial data transformation of the exogenous variables. Now we build a first model using only the target variable itself. This is done with the STL Forecasting method, which allows us to model time series that are affected by seasonal effects by first removing the specified seasonality through an STL decomposition and then modeling the deseasonalized time series with a model of our choice, in this case an Autoregressive Integrated Moving Average (ARIMA) model.

For…


One of the most powerful visualization tools for regional panel data.

Much of the data that we interact with in our daily lives has a geographical component. Google Maps records our favorite places, we calculate how many customers frequent each location of a given brand’s shops, and we compare each other by regional differences. Beyond bar charts and line charts, are there better ways to visualize geospatial data? A geographical heatmap can be a powerful tool for displaying temporal changes over geographies.

The question is how to build such a visualization. This tutorial explains how to build a…


Exploring bias in AI systems, and what we can do to prevent it.

About the AI Clarified series

For business and non-profit leaders trying to understand AI, it can be surprisingly difficult to find good information in the sweet spot between high-level overview and technical jargon. The AI Clarified series attempts to fill this void and answer some of the most commonly asked AI questions with practical, easy-to-follow explanations.

To Clarify: The question of bias in AI

Question: Is AI more biased than humans, or less? I’ve heard both and am not sure which side to believe.

Indeed it’s hard to know what to believe about bias in Artificial Intelligence (AI) systems when…


Github Repository

One of the biggest data challenges on DrivenData, with more than 9000 participants, is the DengAI challenge. The objective of this challenge is to predict the number of dengue fever cases in two different cities.

This blogpost series covers our journey of tackling this problem, starting from initial data analysis, imputation, and stationarity problems up to the different forecasting attempts. This first post covers the imputation and stationarity checks for both cities in the challenge, before moving on to trying different forecasting methodologies.

Throughout this post, code-snippets are shown in order to give an understanding of how the…

data4help

Helping non-profits and NGOs harness the power of their data. data4help.org
