Julien's data blog

MINIMALIST CSS FRAMEWORKS FOR YOUR DATA SCIENCE PROJECTS

Disclaimer: I am absolutely not a web developer, which is actually why I wanted to write this article! One of the biggest challenges for data practitioners isn’t exploring or processing data, but finding effective ways to showcase their work. Of course, we can always build dashboards, share some Jupyter notebooks with our workmates, or paste a few charts and a couple of tables into a Microsoft Word document.

Thu, Nov 17, 2022

BM25: AN ALTERNATIVE TO TF-IDF IN JAVASCRIPT USING WINKNLP

The JavaScript ecosystem frequently receives a fair amount of criticism for being a bit of a mess, with poorly maintained packages and new front-end or back-end frameworks appearing almost every month. However, if you’re a data science practitioner, the past few years have seen JavaScript slowly rise as a perfectly viable second programming language to learn.

Tue, Nov 1, 2022

PYSCRIPT, AKA PYTHON IN THE BROWSER

This article is going to be slightly shorter than the ones I usually post on my blog. As PyScript is, after all, still fairly new, I’ll need to get a bit more familiar with what this library has to offer before I can start working on more interesting projects and sharing them here. Why write an article about PyScript, then, if I’m not yet comfortable with this framework, you might wonder?

Sat, Oct 22, 2022

DANFO.JS AND DNOTEBOOK, A PANDAS / JUPYTER COMBO FOR JAVASCRIPT

If I had the money to make a very opinionated John Carmack-like bet, I would probably wager that the future language of choice for data science is going to be none other than JavaScript.

Jeff Atwood’s Law

If you’re a frequent reader of this blog, you are probably quite familiar with at least one of Python, R, Matlab, or Julia. And yet, as mentioned earlier, I think that the programming language you should really start investing some time in is JavaScript.

Fri, Sep 30, 2022

BERTOPIC, OR HOW TO COMBINE TRANSFORMERS AND TF-IDF FOR TOPIC MODELLING

If you follow this blog, you are probably aware of my interest in natural language processing, and more specifically in topic modelling. As a matter of fact, some of the first articles that I wrote back in 2020 focused on TF-IDF and popular text clustering models. Anyway, if you also happen to share my passion for this niche field, it is quite likely that you have already worked with some or all of the following models:

Sun, Sep 4, 2022

PANDAS-BOKEH: THE SIMPLICITY OF PANDAS PLOTS, THE INTERACTIVITY OF BOKEH

Sometimes, all we want is to be able to use a framework or a library that we’re not too familiar with, without spending too much time learning its syntax in depth. Personally, though I have extensively used visualisation packages such as Matplotlib, Seaborn, Plotly, or Altair, I must confess that Bokeh is one of those tools I have never given much attention to.

Mon, Aug 29, 2022

CHOOSING A MODEL FOR TIME SERIES REGRESSION WITH PYCARET

I have always found working with time series extremely interesting. I especially like the fact that it often starts with a single succession of data points, aka a “line”, and that understanding the dynamics behind this “line” usually involves deriving new series from that initial set of values, such as calculating a moving average or looking for signs of seasonality. PyCaret is a low-code machine learning framework for the Python programming language.
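As a rough illustration of what “deriving a new series” from the initial line can mean, here is a minimal plain-Python sketch of a trailing simple moving average. It does not use PyCaret, and the function name `moving_average` and the sample values are hypothetical, chosen for illustration only.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Trailing simple moving average (hypothetical helper, not a PyCaret API).

    Returns len(values) - window + 1 points: the mean of each run of
    `window` consecutive values.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # Sum of the first window, then slide it one step at a time:
    # add the value entering the window, subtract the one leaving it.
    running_sum = sum(values[:window])
    averages = [running_sum / window]
    for i in range(window, len(values)):
        running_sum += values[i] - values[i - window]
        averages.append(running_sum / window)
    return averages


# Made-up sample data, for illustration only
series = [10, 12, 14, 13, 15, 18]
print(moving_average(series, 3))
```

Keeping a running sum means each new point costs one addition and one subtraction instead of re-summing the whole window. In practice, pandas users would more likely reach for `Series.rolling(window).mean()`.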

Mon, Jul 11, 2022

USING THE COMMAND LINE TO EXTRACT CONTEXTUAL WORDS FROM TEXTUAL DATA

September 2008 was my first encounter with computational linguistics, as it used to be called back then. I was starting the second (and last) year of my MA at the University of Bordeaux, and as I began researching my dissertation, I encountered a problem that I thought a computer would be able to help with. What I was trying to do was to obtain, within a single text corpus, the n preceding and following words for any given word.
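The article itself works from the command line. To illustrate the task only, here is a minimal Python sketch of the same idea (often called a keyword-in-context view): for every occurrence of a target word, it collects up to `n` words on each side. The tokenisation (lowercasing, then splitting on runs of word characters and apostrophes) and the helper name `context_windows` are assumptions made for this example.

```python
import re
from typing import List, Tuple


def context_windows(text: str, target: str, n: int = 3) -> List[Tuple[List[str], str, List[str]]]:
    """Return (left context, match, right context) for each occurrence of target.

    Hypothetical helper: naive tokenisation, lowercased, where a token is a
    run of word characters (letters, digits, underscore) and apostrophes.
    """
    tokens = re.findall(r"[\w']+", text.lower())
    target = target.lower()
    windows = []
    for i, token in enumerate(tokens):
        if token == target:
            # Up to n tokens before and after; slicing handles corpus edges
            left = tokens[max(0, i - n):i]
            right = tokens[i + 1:i + 1 + n]
            windows.append((left, token, right))
    return windows


corpus = "The cat sat on the mat. The dog chased the cat around the garden."
for left, word, right in context_windows(corpus, "cat", n=2):
    print(" ".join(left), f"[{word}]", " ".join(right))
# the [cat] sat on
# chased the [cat] around the
```

Note that this naive version lets windows cross sentence boundaries. Whether that is desirable depends on the analysis.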

Thu, Jun 2, 2022

BEAUTIFUL CHARTS WITH VEGALITE.JS

I’m always a bit surprised when I read negative comments about JavaScript. My take might not be a very popular one, but I quite like the language it has evolved into, and its ecosystem isn’t as bad as some people make it out to be. Though there have been some very interesting initiatives over the past few years, such as DataForge or, more recently, Arquero, JavaScript still lacks a strong library for querying and manipulating data tables.

Tue, Apr 12, 2022

BASIC IN-BROWSER TEXT PROCESSING USING COMPROMISE.JS

Though JavaScript might not be as obvious a choice as Python when it comes to Natural Language Processing libraries, its ecosystem actually features some high-performing text processing packages. This makes perfect sense, as such dependencies are needed to build mobile or web-based applications such as chatbots.

Finding the right tool for the job

Over the past few months, I have experimented a bit with the following Node packages:

Mon, Mar 7, 2022