EASY IN-BROWSER EXPLORATION OF SMALL CSV FILES WITH WEBDATAROCKS

I’ve been meaning to share some thoughts on WebDataRocks for a while now, as it helped me solve a fairly minor technical challenge I stumbled upon a few months ago. To add a bit of context, exploring the vast online world from a corporate device can be a bit of a hit-and-miss experience. Take the official Irish Data Portal for instance.

BOOK REVIEW: JAVASCRIPT FOR DATA SCIENCE (2020)

Disclaimer: all screenshots were produced from the digital version of the book, but I do own a physical copy of it. Though most entries on this website usually consist of hands-on guides and random programming tutorials, I will from now on try to share some books or articles that I have read and found interesting. I remember first hearing about JavaScript for Data Science on Twitter, as I was looking for a Pandas-like package for the Node.

DICTIONARY APIS FOR THE ENTHUSIASTIC LINGUIST: AN OVERVIEW

For anybody who’s ever worked with textual data, the past 4 or 5 years have been an absolute blast. Since the publication of Attention Is All You Need in 2017, the field of natural language processing has seen new frameworks, libraries, and concepts coming up on a regular basis. Take sentiment analysis for instance. For years, available solutions were limited to rule-based models like VADER or AFINN.

NEW YEAR, ALMOST NEW ME

New Year’s resolutions and I haven’t always been the best of friends. For a long time, the concept of committing to doing something for a whole year, while being totally clueless about what I’d be doing even 2 months later, felt like peak stupidity. I mean, the whole thing just seemed absurd, as I knew perfectly well that the aspirations and centres of interest of my future self would simply no longer match those that I had at the time of making that decision.

ADVENT OF CODE 2023: DAYS 1 AND 2

Yes, it’s that time of the year again! While we’re all enjoying a few weeks of festive activities and a bit of well-deserved quality time with our loved ones, some of us are deliberately choosing to spend this time solving some random programming challenges. Now if you ask me, what I really like about Advent of Code is the creative and fun ways in which some programmers approach each new puzzle.

TEXT SUMMARISATION IN TYPESCRIPT WITH TRANSFORMERS.JS

If you’re a long-time follower of this website, you probably know by now how much I’ve been advocating for the use of JavaScript (and TypeScript) as a second language for any data practitioner who might want to broaden their horizons and learn some new and useful skills. I was therefore very excited when Hugging Face recently announced that they would soon be porting their state-of-the-art transformers libraries to the JavaScript ecosystem.

SIMPLIFY WEBSITE SCRAPING WITH TRAFILATURA

In early 2022, I wrote a very basic Python program to scrape some articles from an Irish website named The Journal. Long story short, all I needed at that time was to capture the content of Covid-related articles as well as their attached user comments, and attempt to train a model on that data. A bit less than six months later, that simple .

STARBOARD.GG, AND OTHER NOTEBOOK ENVIRONMENTS FOR NON-PYTHON DATA SCIENCE: PART I

Have you ever wondered what makes a language a good fit for a particular space? Its design choices, overall syntax, and to a lesser extent speed and performance are arguably some of the first elements that you’ll hear about when you ask this question around. I personally think that tooling and the landscape of existing dependencies also play a huge role in the adoption of a given language by a specific community.

AUTOMATING PIPELINES WITH AIRFLOW'S TASKGROUP

Earlier this year, I got involved in a fun little side project at work. As part of our “Build Great Teams” initiative, I was tasked with providing a simple tech newsletter for my co-workers. We didn’t need anything fancy really: just a bi-weekly curation of articles that I could find about data science, programming, and the tech industry in general. As the project grew in scope, I decided to refactor the couple of Python scripts I had written so that I could get the whole pipeline to run from a Raspberry Pi at home.

THE POLARS DATAFRAME LIBRARY, BUT FOR RUBY

I was reading some random conversation threads on Hacker News the other day when I came across an article which announced that Polars had just been ported to the Ruby programming language. Now, unless you have been living under a rock for the past year or so, you probably know that Polars is a data manipulation library that was written entirely in Rust.