Accuracy/ Recall/ Precision/ Confusion Matrix/ ROC Curve/ AUC

Github Repository

In roughly half of all Data Science interviews around the world, the interviewee is asked to build and assess a binary classification model. This means classifying an observation as either positive (normally denoted by the number 1) or negative (denoted by 0), given a set of features. A common mistake interviewees make is to spend too much time building and tuning an overly sophisticated model and not enough time elaborating on which classification evaluation metric is appropriate for the problem. …
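To make the metrics named in the title concrete, here is a minimal sketch that derives accuracy, precision and recall from the four confusion matrix counts. The function names and the toy labels are illustrative assumptions and are not taken from the linked repository.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical imbalanced toy data: 2 positives among 10 observations,
# scored against a "model" that always predicts the negative class.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the always-negative predictor reaches a high accuracy while finding none of the positives, which is the kind of gap that makes the choice of metric worth discussing.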


Why geographically correct maps show election results inaccurately

Github Repository

The presidential election of 2020 was said by many to be the most important election ever held. Voter turnout was higher than ever before, and arguably the most controversial president in history, Donald Trump, lost to his Democratic opponent Joe Biden.

The unprecedented number of mail-in ballots took its toll on the patience of the nation and the world. On election day, the 3rd of November, it was anything but clear who had won the election. The polls before the election turned out not to be very representative of true sentiments, a flaw similar to the one seen in 2016. …


A practical introduction to machine learning model deployment in Python

What is deployment, and why do I need to do it?

Congratulations, you’ve trained a machine learning model! You’ve worked hard for the past few months exploring the data, completing analysis, creating training features, and finally training models. You used cross-validation and optimized hyperparameters, and when you show your boss and other stakeholders from the business, they agree with your choice of performance metric. You excitedly show them charts and visualizations of the test score of your model, and they agree that your model is making good predictions for the task at hand.

Then they ask how the model will perform on real data, in real time. How will it contribute to business value? How will the predictions be delivered? At the moment, your newly-trained model lives on your local machine inside the Jupyter Notebook used to train it. As a Data Scientist, your focus has been on tweaking your model and improving its performance. You haven’t thought about how your model will move from the development stage to the operational stage. …


Visualizing 30 years of economic data between East and West Germany

The year 2020 is certainly a special year for many people. A global pandemic and the election of the next US president are just two of many examples. For Germany, the year 2020 is historically important for another reason. Thirty years earlier, on October 3rd, 1990, the reunification of Germany took place. With five “new federal states” and the reunited city state of Berlin, the territory of what had been Communist East Germany, the German Democratic Republic, became part of the Federal Republic of Germany (West Germany).

It is important to stress that the historical context of the split and reunification of Germany fills several textbooks. This blogpost tries to provide the most notable historical pieces so that the reader gains an overall understanding of the situation and can put the numbers into better context. …


This blogpost elaborates on how to implement a reinforcement learning algorithm that not only masters the game “Snake” but also outperforms any human in a game with two players in one playing field.

The structure of the blogpost is as follows: First, the theory of a special type of Reinforcement Learning is discussed, namely Q-Tables. This approach is then benchmarked against a pure Deep Learning approach to show the superiority of Reinforcement Learning for these kinds of problems. Secondly, we explore how to add a second, human-controlled snake alongside the machine-controlled snake in the same playing field.
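As a rough illustration of what a Q-Table stores, the sketch below applies the standard tabular Q-learning update to a single transition. The state names, the action set, the reward and the values of alpha and gamma are hypothetical placeholders; the state encoding and hyperparameters used in the blogpost are not shown in this excerpt.

```python
from collections import defaultdict

# Hypothetical action set for a Snake-like game.
ACTIONS = ["up", "down", "left", "right"]


def make_q_table():
    """Q-table mapping each state to a dict of action values, initialised to 0."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# One illustrative transition with made-up state labels and reward.
q = make_q_table()
new_value = q_update(
    q, state="food_right", action="right", reward=10.0, next_state="food_eaten"
)
print(new_value)
```

Repeating this update over many played games is what gradually fills the table with useful action values.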

Python Code →…


Using the STL Forecasting Method with an ARIMA model, which is parameterized through the Box-Jenkins Method.

This post builds on our first blogpost, which dealt with the initial data transformation of the exogenous variables. Now we build a first model using only the target variable itself. This is done using the STL Forecasting method, which allows us to model time series that are affected by seasonal effects: the specified seasonality is first removed through an STL decomposition, and the deseasonalized time series is then modeled with a model of our choice, in this case an Autoregressive Integrated Moving Average (ARIMA) model.

For those not familiar with the forecasting challenge, this competition deals with the prediction of dengue fever in two cities, or in the words from DrivenData…


One of the most powerful visualization tools there is for regional panel data.

Much of the data that we interact with in our daily lives has a geographical component. Google Maps records our favorite places, we calculate how many customers frequent each location of a given brand's shops, and we compare each other by regional differences. Beyond bar charts and line charts, are there better ways to visualize geospatial data? A geographical heatmap can be a powerful tool for displaying temporal changes over geographies.

The question is how to build such a visualization. This tutorial explains how to build a usable GIF from panel data, that is, time series for multiple entities, in this case German states. The visualization problem to which we apply this heatmap GIF is displaying data about the economic development (change in GDP per capita) of Germany between 1991 and 2019.


Exploring bias in AI systems, and what we can do to prevent it.

About the AI Clarified series

For business and non-profit leaders trying to understand AI, it can be surprisingly difficult to find good information in the sweet spot between high-level overview and technical jargon. The AI Clarified series attempts to fill this void and answer some of the most commonly asked AI questions with practical, easy-to-follow explanations.

To Clarify: The question of bias in AI

Question: Is AI more biased than humans, or less? I’ve heard both and am not sure which side to believe.

Indeed, it’s hard to know what to believe about bias in Artificial Intelligence (AI) systems when just reading articles online: there is plenty of support in both directions. With the growth of AI and the widespread adoption of AI models, there is a lot of noise on both sides, especially for high-stakes use cases like those affecting humans. …


Github Repository

One of the biggest data challenges on DrivenData, with more than 9000 participants, is the DengAI challenge. The objective of this challenge is to predict the number of dengue fever cases in two different cities.

This blogpost series covers our journey of tackling this problem, starting from initial data analysis, imputation and stationarity problems up to the different forecasting attempts. This first post covers the imputation and stationarity checks for both cities in the challenge, before moving on to trying different forecasting methodologies.

Throughout this post, code snippets are shown in order to give an understanding of how the concepts discussed are implemented in code. The entire Github repository for the imputation and stationarity adjustment can be found here. …


Polarization is increasing. How does this play out in the Senate?

Code for this project can be found on Github

In our last article, we showed empirically that polarization in the American Congress is increasing. In this article, we dive deeper and explore these trends in the Senate specifically. We specify another metric, party loyalty, as a proxy for polarization. This metric helps us further confirm our empirical findings on the trend of polarization and answer the question: are individual Senators becoming more loyal to their party? …

About

data4help

Helping non-profits and NGOs harness the power of their data. data4help.org
