Another Year in France (Consulting Again)
It’s that time of year - time for a recap of what I’ve been up to!
NLP and Toxic Speech
I spent a year after I left my teaching gig doing remote consulting for a London-based startup. I was the lead data scientist doing NLP, primarily working on toxic speech detection in game chat. We used a mix of keyword-based approaches, spaCy models, and neural nets (PyTorch and later TensorFlow, for speed). I wrote a lot of Spark code. In the course of this work, I labeled a lot of chat data myself and became convinced this is an almost unsolvable problem that will always require human-in-the-loop moderation.
Talks I Gave / Personal Projects
Every time someone invites me to speak, I use it as an opportunity to finish a personal project and talk about it. Sometimes it's a learning project (like "learn about the state of the art for summarization") and sometimes it's an artistic or data vis project. So, invite me at your own risk :)
- EuroPython 2019 invited keynote and PyData London 2019 keynote: I gave the same talk at both because they were less than a week apart. I did a data vis personal project and showed some text vis/poetry generation apps. Lots of people said they enjoyed them tons. Slides here.
- PyData Warsaw invited keynote: I talked about summarization. Slides here.
- EMAEE 19: I was an invited panelist on data vis and spoke about big data and EDA (exploratory data analysis). Slides here.
- Micro Macro Mesa Conf in Lyon (invited): I spoke about visualizing and generating poetry with VAEs (variational autoencoders), based on a project by Allison Parrish. My slides (which need to be written up) are here.
Example generation of poem lines (red) from a VAE, using a t-SNE layout of training lines as a guide.
Reboot of the TinyLetter "Things I Think Are Awesome"
I didn't feel very awesome during a lot of the toxic speech consulting, but I revived the newsletter this fall! I added a poem, recipes, and TV shows to the latest edition. It's all about recommendations. My goal is to keep it positive, short, and tech-arty. Join here.
Current Consulting
I took a month off between gigs and primarily worked on text generation with VAEs. I've started work again, splitting my time among three clients: Google Arts and Culture in Paris (a possibly short-term contract on data analysis, NLP, and vis of museum assets), FlowingData (writing Python charting tutorials), and the UK Pavilion at Expo 2020 Dubai (generating poetry, working with Kyle McDonald).
Design by Es Devlin (source link)
Next year, I will be a judge and speaker at the data visualization conference Malofiej 28 in March. Come see me in Pamplona, Spain?
Happy holidays, and a great 2020 full of inspiring creative tech and datasets to all!