2022 Year in Review
Слава Україні (Glory to Ukraine)! Support our Ukrainian friends.
It’s been a long year and only now do I realize how much has changed in just 363 days (it’s Dec 29, ok?). Here’s a list of the things I’ve learned (or was reminded of), in no particular order. Some are accompanied by links to the books I’ve read that touch on that topic; I recommend all of them, to varying degrees.
People and presentation skills
-
Ask for the things you want. The worst that can happen is you don’t get it. But you don’t know it until you’ve asked.
-
Business wants to talk about money. Saying “between the control variant and Z, the increase in CVR is statistically significant” is ok; “we’re likely to gain $X over Y if we do Z” is better, even if it’s a very ballpark figure. It’s always a fine line between being very accurate and very boring to the higher-ups. If you’re not convincing enough, you risk having your hard work overridden by a hippo’s opinion. Stakeholder management is part of the job, for better or worse.
-
Don’t try to point out what questions people have before they ask them. It’s annoying. Pointing those out is different from anticipating them, btw; anticipation is done silently in the background. It’s the difference between “I thought you might ask that” and “I think you want to ask about X”. It’s also very much against how things are often done in QA, so it takes some effort to unlearn.
-
Don’t use pie charts. My love for pie charts comes (I think) from primary school Maths classes and the time I learned about Pi. But Storytelling with Data has indeed hammered home the fact that they’re not too readable.
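To make the money-talk point above a bit more concrete, here’s a minimal sketch in Python of turning a CVR difference into a ballpark dollar figure. Everything in it is made up for illustration: the `Variant` class, the `ballpark_gain` helper, the traffic numbers and the value per conversion are all assumptions, and it deliberately ignores confidence intervals, seasonality and everything else a statistician would rightly ask about.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical container for one arm of an A/B test."""
    visitors: int
    conversions: int

    @property
    def cvr(self) -> float:
        # Conversion rate: share of visitors who converted.
        return self.conversions / self.visitors


def ballpark_gain(control: Variant, variant: Variant,
                  monthly_visitors: int, value_per_conversion: float) -> float:
    """Very rough monthly gain if the variant's CVR held for all traffic."""
    uplift = variant.cvr - control.cvr  # absolute difference in CVR
    return uplift * monthly_visitors * value_per_conversion


# Invented numbers, purely for illustration.
control = Variant(visitors=10_000, conversions=200)
variant_z = Variant(visitors=10_000, conversions=230)

gain = ballpark_gain(control, variant_z,
                     monthly_visitors=50_000, value_per_conversion=40.0)
print(f"We're likely to gain roughly ${gain:,.0f} per month if we do Z")
```

The number that comes out is only as good as the assumptions going in, but it is the kind of sentence the higher-ups actually respond to.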
Nails and hammers
-
Start the discussion from the requirements, not the tools; otherwise you end up in a pile of dung. Or, in the wise words of Laura B. Madsen, “Because data work can be tech-heavy, and the ‘tech’ part of the work is easier to tangibly define, we tend to prematurely invest money in software”.
-
Working with Tableau is a nightmare to me; working with Streamlit was n times better. Because I liked Streamlit so much, I might prepare some blueprints with proper OOP and tests in my spare time. (When I see scripting with multiple data sources etc., I see circular dependencies approaching.)
-
SQL stands for Spaghetti Query Language. Don’t get me wrong, I really like using it, but for anything more complex my brain immediately goes to Pandas or PySpark, and I’ve never even liked Pandas that much. (This book makes it more worth your while, though.)
-
Matillion is as useful as it is hard to navigate. Again, I often find working with GUI tools confusing. And it’s all the small things: the SQL is unreadable there (have a longer query? Better copy it into DataGrip so your eyes don’t bleed!) and the search results are hard to navigate (very little info provided for the space it takes on the screen). As a data consumer I don’t like dealing with it; for now I can only imagine how frustrating it must be to set up pipelines there. (I hope to be proven wrong.)
My reading list for 2023
Here are the books waiting on my desk. I would like to get through them all. (These days I take my free time very seriously, so no promises.)
-
Practical Statistics for Data Scientists. Remember me saying I don’t want to be the 4th statistician on the team? I still want to be able to help out with the coding part. I started reading this book online but, as always, I prefer paper for this stuff, so it’s waiting for its turn.
-
Fundamentals of Data Engineering. I want to dip my toes in a bit more to (1) get those sweet sweet DE skills and (2) explore this as a career option.
-
The Data Warehouse Toolkit. THE book for DEs to read and get very familiar with. If option (2) above sits well with me, I’ll dive into it.
-
Data Quality Fundamentals. I definitely do not want to be the “QA but for data”; however, data quality is an important topic for DAs, DEs, DSes and everyone else, and I’d like to get a better understanding of how I can help out with it.
Looking forward to all the learning in 2023! Hopefully I find the time to do at least some updates here ;)