A beginner’s guide to deployment with Dash: Deploying a model that classifies paintings from the Metropolitan Museum of Art in New York
In Part 1 of this series, we showed how we collected and analyzed open-source image data from the Metropolitan Museum of Art to build an image classifier that predicts a painting’s country of origin. In this post, we go one step further and show how to take this trained model and bring it to life with an interactive front-end dashboard.
This post is intended as a tutorial for Data Scientists interested in deploying their models and in…
Exploring the Metropolitan Museum of Art’s treasure trove of art images, and predicting where a painting comes from.
Due to the Covid-19 pandemic, many museums around the world have shut their doors, and the Metropolitan Museum of Art in New York is no exception. “The Met”, as the museum is commonly called, presents an extensive collection spanning 5,000 years of art history across its two locations, on 5th Avenue on the Upper East Side and at the Met Cloisters. …
As the year winds down and a new year starts, you may be setting a New Year’s resolution to learn a new skill like programming. It’s not a bad idea: programming is vital to our digital economy, with “Software Engineer” holding the number 9 spot in the top 25 jobs from Glassdoor, and many more programming-related jobs like Software Engineering Manager holding other top spots. As for which language to learn, Python tops the list as the most important and fastest growing.
If you haven’t programmed before, however, you may be wondering where to start. Perhaps more importantly than…
Accuracy / Recall / Precision / Confusion Matrix / ROC Curve / AUC
In roughly half of all data science interviews around the world, the interviewee is asked to build and assess a binary classification model. This means classifying an observation as either positive (normally denoted as the number 1) or negative (denoted as 0), given a set of features. A common mistake that interviewees make is to spend too much time building and tuning an overly sophisticated model and not enough time explaining which classification evaluation metric is appropriate for the problem. …
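To make the point about metric choice concrete, here is a minimal sketch in plain Python that derives accuracy, precision, and recall from the confusion-matrix counts. The function names and the toy labels are made up for illustration and do not come from the post itself.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = Counter(zip(y_true, y_pred))
    return {
        "tp": counts[(1, 1)],  # actual 1, predicted 1
        "fp": counts[(0, 1)],  # actual 0, predicted 1
        "fn": counts[(1, 0)],  # actual 1, predicted 0
        "tn": counts[(0, 0)],  # actual 0, predicted 0
    }


def classification_metrics(y_true, y_pred):
    """Return accuracy, precision and recall; 0.0 when a ratio is undefined."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Made-up, imbalanced example: 8 negatives and 2 positives.
y_true = [0] * 8 + [1] * 2
# A "model" that always predicts the negative class.
y_pred = [0] * 10

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the always-negative model reaches 80% accuracy while its recall is zero, which is why the choice of metric deserves as much attention as the model.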
Why geographically correct maps show election results inaccurately
The presidential election of 2020 was said by many to be the most important election ever held. Voter turnout was higher than ever before, and arguably the most controversial president in history, Donald Trump, lost to his Democratic opponent Joe Biden.
The unprecedented number of mail-in ballots took its toll on the patience of the nation and the world. On election day, the 3rd of November, it was anything but clear who had won the election. The polls before the election turned out not to be very representative of true sentiments…
A practical introduction to machine learning model deployment in Python
Congratulations, you’ve trained a machine learning model! You’ve worked hard for the past few months exploring the data, completing analysis, creating training features, and finally training models. You used cross validation and optimized hyperparameters, and when you show your boss and other stakeholders from the business, they agree with your choice of performance metric. You excitedly show them charts and visualizations of the test score of your model, and they agree that your model is making good predictions for the task at hand.
Then they ask how the model…
Visualizing 30 years of economic data between East and West Germany
The year 2020 is certainly a special year for many people. A global pandemic and the election of the next US president are just two out of many examples. For Germany, the year 2020 is historically important for another reason. Thirty years earlier, on October 3rd, 1990, the reunification of Germany took place. With five “new federal states” and the reunited city state of Berlin, the territory of what had been Communist East Germany, the German Democratic Republic, became part of the Federal Republic of Germany (West Germany).
This blogpost elaborates on how to implement a reinforcement learning algorithm that not only masters the game “Snake” but also outperforms any human in a game with two players on one playing field.
The structure of the blogpost is as follows: First, the theory of a special type of Reinforcement Learning is discussed, namely Q-Tables. This approach is then benchmarked against a pure Deep Learning approach to show the superiority of Reinforcement Learning for these kinds of problems. Second, we explore how to add a human-controlled snake next to the machine-controlled snake on the same playing field.
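For readers new to Q-Tables, the core idea fits in a few lines. The sketch below shows the standard tabular Q-learning update; the state names, the reward, and the values of `alpha` and `gamma` are assumptions chosen for illustration, and the post itself may encode the Snake game differently.

```python
from collections import defaultdict

# Hypothetical action set for a Snake-like grid game.
ACTIONS = ("up", "down", "left", "right")


def make_q_table():
    """Create a Q-table that maps each unseen state to zero-valued actions."""
    return defaultdict(lambda: {action: 0.0 for action in ACTIONS})


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


q = make_q_table()
# Illustrative transition: moving right from a made-up state earns a reward of 1.
new_value = q_update(
    q, state="food_right", action="right", reward=1.0, next_state="food_eaten"
)
print(new_value)
```

Each update nudges the stored value for a state-action pair toward the observed reward plus the discounted best value of the next state, so the table gradually learns which move pays off in which situation.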
Using the STL Forecasting Method with an ARIMA model, which is parameterized through the Box-Jenkins Method.
This post builds on our first blogpost, which dealt with the initial data transformation of the exogenous variables. Now we build a first model using only the target variable itself. This is done by using the STL Forecasting method. It allows us to model time series that are affected by seasonal effects by first removing the specified seasonality through an STL decomposition and then modeling the deseasonalized time series with a model of our choice, in this case an Autoregressive Integrated Moving Average Model (ARIMA).
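The three steps (remove the seasonality, model what is left, add the seasonality back) can be sketched without any libraries. Note that this is a deliberately simplified stand-in: it uses per-position seasonal means instead of an STL decomposition and a constant-level forecast instead of ARIMA, and the function names and toy series are made up for illustration.

```python
def seasonal_means(series, period):
    """Estimate a seasonal offset per position in the cycle (not real STL)."""
    overall = sum(series) / len(series)
    offsets = []
    for pos in range(period):
        values = series[pos::period]  # all observations at this cycle position
        offsets.append(sum(values) / len(values) - overall)
    return offsets


def deseasonalized_forecast(series, period, steps):
    """Deseasonalize, forecast the remainder, then re-add the seasonality."""
    offsets = seasonal_means(series, period)
    # Step 1: remove the seasonal component.
    remainder = [y - offsets[i % period] for i, y in enumerate(series)]
    # Step 2: model the deseasonalized series. A constant level is used here
    # as a placeholder for the ARIMA model described in the text.
    level = sum(remainder) / len(remainder)
    # Step 3: add the seasonal offsets back onto the forecast.
    n = len(series)
    return [level + offsets[(n + h) % period] for h in range(steps)]


# Made-up series: level 10 with a repeating pattern of period 4.
series = [12, 9, 10, 9] * 3
forecast = deseasonalized_forecast(series, period=4, steps=4)
print(forecast)
```

Swapping the seasonal means for a proper STL decomposition and the constant level for a fitted ARIMA model keeps the same overall structure.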
One of the most powerful visualization tools there is for regional panel data.
Much of the data that we interact with in our daily lives has a geographical component. Google Maps records our favorite places, we calculate how many customers frequent each location of a given brand’s shops, and we compare each other by regional differences. Beyond bar charts and line charts, are there better ways to visualize geospatial data? A geographical heatmap can be a powerful tool for displaying temporal changes over geographies.