Hunches & Crunches

Posts


fastText: Words, subwords & n-grams

Published on September 1, 2020 · 10 min read

FastText is an algorithm developed by Facebook that can be used for both word representation and text classification. This post goes through some of the inner workings of the text classifier, in particular how predictions are obtained for a trained model with both n-grams and subword features. The purpose is not to find an optimal model but to bring some clarity to how the algorithm works behind the scenes....

Custom data visualisation with d3.js

Published on October 21, 2019 · 8 min read

When it comes to creating good-looking visualisations fast, there's no better package than ggplot (in my personal opinion). Implementing the grammar of graphics, it's concise and intuitive, allowing you to produce advanced plots in only a few lines of code. This is extremely helpful when performing EDA, where I tend to produce a large number of visualisations in order to familiarise myself with the data. If you want something interactive, though, you have to turn elsewhere....

XGBoost: prediction contributions

Published on March 10, 2019 · 9 min read

In my most recent post, I had a look at the XGBoost model object. I went through the calculations behind Quality and Cover with the purpose of gaining a better intuition for how the algorithm works, but also to set the stage for how prediction contributions are calculated. Since November 2018, this has been implemented as a feature in the R interface. By setting predcontrib = TRUE, the predict function returns a table containing each feature's contribution to the final prediction....