OPENAI'S WHISPER IS SO GOOD IT CAN TRANSCRIBE ANY SONG'S LYRICS

Though a quick Google search confirms that the first attempts at having computers identify and extract spoken language date back to the early 1950s, voice recognition technology has only recently been made accessible to the general public. Like most of my friends, I own a small Alexa device at home. And like them, I never use it. Yet I was very excited when I discovered OpenAI’s Whisper on Hacker News a little while ago.

TOPIC MODELLING VISUALISATION WITH ANYCHART.JS

An example of what we’ll be doing in this article.

Foreword: this post is dedicated to my workmate and friend Martin, who recently showed me some pretty cool stuff he has been doing with Sankey charts. Back in the early days of the 2020 pandemic, I got a bit bored at home and started thinking about creating a website. I remember that my first idea was to write a couple of articles dedicated to topic modelling, and see where that would take me.

ARQUERO: A GREAT DATAFRAME TOOLKIT FOR JAVASCRIPT

An example of what we’ll be doing in this article.

Most open positions for data-related jobs on any popular employment website will likely list Python or R as the languages that applicants must be skilled in. But hey, nobody leaves JavaScript in the corner! Data manipulation packages for the Node ecosystem have grown a lot over the past three or four years, to the point where they have become a credible alternative to more popular Python- or R-based libraries such as Pandas or Dplyr.

EXPLAINING SENTIMENT SCORES WITH TRANSFORMERS AND SHAP

An example of what we’ll be doing in this article.

Wouldn’t sentiment analysis be easier if we could show which terms or chunks of terms within a given corpus contribute to the overall sentiment score of the corpus or some of its parts? I recently came across a pretty neat library named SHAP, which amongst many other things provides some useful tools for explaining sentiment scores.

EXPLORING POS TAGS CO-OCCURRENCE WITH WINKNLP AND HIGHCHARTS.JS

An example of what we’ll be doing in this article.

I’ve been playing around a lot with NeuralCoref lately, a pipeline extension for spaCy developed by Hugging Face. If you’re interested in coreference resolution, this article from Hugging Face’s Thomas Wolf seems like a great place to start. Are we going to discuss neural coreferencing today? Absolutely not. If you head over to NeuralCoref’s GitHub page, your eyes will probably feel drawn immediately towards this very fancy visualisation that maps the semantic relationships between the terms within a short sentence:

CREATE A SIMPLE IN-BROWSER SQL PLAYGROUND WITH PYSCRIPT

An example of what we’ll be doing in this article.

Finding an online SQL playground that’s both free and user-friendly can be a little bit challenging. Most platforms, such as StrataScratch for instance, restrict what free-tier users can do, while others hide the querying interface under layers of ads and pop-ups. That being said, it’s still possible to find a couple of high-quality solutions, and I personally really like CoderPad.

GOING BEYOND THE SENTIMENT SCORE, PART 1: SENTIMENT.JS

An example of what we’ll be doing in this article.

A good few years back, I used to work for a bank where part of my daily job was to monitor and evaluate the “happiness score” of our customers across several social media platforms, using a tool called Brandwatch. Amongst many other things, this platform offered its customers the ability to define a set of rules and add a corresponding sentiment tag to each and every mention of their brand or of any of their competitors.

TIME SERIES FORECASTING WITH META'S PROPHET

An example of what we’ll be doing in this article.

Please note that though I am currently employed by Meta, this article expresses my own views and wasn’t endorsed by my employer.

The past few years have seen a rise in the popularity of new libraries that focus on ease of use and automation. If, like me, you have always been fascinated by time series forecasting, you must be familiar with packages like Darts or PyCaret.

VISUALIZING TEXT DATA HIERARCHY WITH WORD TREES

An example of what we’ll be doing in this article.

Over the past few weeks, I have been looking for a quick and effective way of representing the structural differences within a set of similar-looking short sentences. To provide a bit of context, as we approached the end of 2022, my workmates and I got heavily involved in a planning phase for the year ahead. More specifically, we were asked to write a set of objectives and key results that would help drive a common strategy across our supported programs and pillars over the months to come.

NETWORK GRAPHS PART I: PYTHON AND JAVASCRIPT

An example of what we’ll be doing in this article.

A quick note before we start. The purpose of today’s article isn’t to show how network graphs work or to discuss their underlying mathematical structure. Instead, we’re going to focus on practical applications and easy-to-reproduce examples, using two of the most popular programming languages of the early 2020s: Python and JavaScript. Typically, a network graph will allow us to visualise the various entities that live within a complex network structure, and see how densely its nodes are connected.