Rumor has it that some people living among us have all the free time they need; if you’re one of those four lucky individuals, you’re probably meditating blissfully in a misty forest rather than reading these words.
The rest of us have a more complex relationship with time, efficiency, and the tension of balancing competing needs. A few hours can make the difference between shipping a machine learning project on time and missing a deadline. They can determine whether you get to spend an afternoon with friends, bake a cheesecake from scratch, rewatch the Succession finale, or… not do any of those things.
The internet is full of productivity hacks and time-saving tricks; we’re sure you can find them on your own. Instead, what we offer you this week are pragmatic, hands-on approaches for speeding up workflows that data professionals execute every day. From choosing the right tools to streamlining your data-cleaning process, you stand to gain some precious minutes by learning from our authors, so we hope you use them well—or not! It’s your time.
- Analyzing geospatial data can be a slow and elaborate process. For his TDS debut, Markus Hubrich highlights R-trees’ power to dramatically improve the speed of spatial searches, and uses the (delightful) example of tree-mapping to illustrate his point. (A small R-tree query sketch follows this list.)
- Many libraries and tools promise to improve Python’s famously sluggish performance. Theophano Mitsa benchmarks a recent arrival—Pandas 2.0—against contenders like Polars and Dask, so you can make the most informed decision when designing your next project. (A rough pandas-backend timing sketch also appears after the list.)
- Still on the topic of speeding up Python-centric workflows, Isaac Harris-Holt’s latest tutorial shows how to leverage the nimbleness of Rust by embedding it into your Python code with tools like PyO3 and maturin. (Staying true to our theme, it’s also a quick and concise post.)
- How effective are large language models when it comes to executing complicated, nuanced processes? The jury might still be out, but one area where they’re already showing promise is text summarization—and Andrea Valenzuela’s recent guide explains how you can use them to generate high-quality summaries quickly and consistently. (A bare-bones summarization call is sketched below the list.)
- For Vicky Yu—and, we suspect, many of you as well—data cleaning can get tedious, fast. To breeze through this stage of your project, Vicky recommends creating user-defined functions (UDFs), which make it possible to simplify your SQL queries and avoid having to code the same logic over multiple columns in a table. (A toy UDF sketch appears after this list.)
- CPUs or GPUs? If you work with massive amounts of data, you likely already know that your choice of hardware setup can have an outsized effect on the time and resources you’ll need. Robert Kwiatkowski’s helpful primer covers several use cases and maps the pros and cons of both processor options. (A quick CPU-versus-GPU timing sketch rounds out the examples below.)
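If you’d like a taste of what an R-tree query looks like before diving into Markus’s article, here is a minimal sketch using the rtree package (a Python wrapper around libspatialindex). The coordinates and IDs are invented, and this isn’t necessarily the tooling the article itself uses.

```python
from rtree import index

# Hypothetical tree locations: (id, (longitude, latitude)).
trees = [
    (0, (13.40, 52.52)),
    (1, (13.41, 52.53)),
    (2, (13.45, 52.50)),
]

# Build the R-tree; points go in as degenerate bounding boxes (minx, miny, maxx, maxy).
idx = index.Index()
for tree_id, (lon, lat) in trees:
    idx.insert(tree_id, (lon, lat, lon, lat))

# Bounding-box query: which trees fall inside this window?
window = (13.39, 52.51, 13.42, 52.54)
hits = list(idx.intersection(window))

# Nearest-neighbour query: the two trees closest to a query point.
query_point = (13.42, 52.52, 13.42, 52.52)
nearest = list(idx.nearest(query_point, 2))

print("in window:", hits)
print("nearest:", nearest)
```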
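To get a feel for the kind of comparison Theophano runs, here’s a rough timing sketch that pits pandas’ default NumPy backend against the Arrow backend introduced in pandas 2.0. The dataset is synthetic, the numbers will vary by machine, and her article goes further by bringing Polars and Dask into the mix.

```python
import time

import numpy as np
import pandas as pd

# Synthetic dataset: 5 million rows of categories and values.
n = 5_000_000
rng = np.random.default_rng(0)
data = {"category": rng.integers(0, 100, n), "value": rng.random(n)}

# Default NumPy-backed DataFrame vs. the pyarrow backend added in pandas 2.0.
df_numpy = pd.DataFrame(data)
df_arrow = pd.DataFrame(data).convert_dtypes(dtype_backend="pyarrow")

for label, df in [("numpy", df_numpy), ("pyarrow", df_arrow)]:
    start = time.perf_counter()
    df.groupby("category")["value"].mean()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
```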
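Andrea’s guide goes much deeper on prompting and evaluation; purely as a bare-bones illustration of the pattern, here is a hypothetical summarization call using the openai Python client. The model name and prompt are placeholders, and you’d need an API key in your environment.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

article_text = "..."  # the document you want condensed

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat-capable model works
    messages=[
        {"role": "system", "content": "Summarize the user's text in three bullet points."},
        {"role": "user", "content": article_text},
    ],
    temperature=0.2,  # lower temperature keeps summaries more consistent across runs
)

print(response.choices[0].message.content)
```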
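Vicky’s examples live in the data warehouse’s own SQL dialect; as a stand-in illustration of the same idea, here is a Python function registered as a UDF with DuckDB and reused inside a query. The table, column, and function names are all invented.

```python
import duckdb
from duckdb.typing import VARCHAR

# A hypothetical cleaning rule we'd otherwise repeat for every messy column.
def strip_and_title(s: str) -> str:
    return s.strip().title() if s is not None else None

con = duckdb.connect()
con.create_function("clean_name", strip_and_title, [VARCHAR], VARCHAR)

# A tiny made-up table with inconsistent name formatting.
con.execute("""
    CREATE TABLE customers AS
    SELECT * FROM (VALUES ('  alice  '), ('BOB '), (NULL)) AS t(raw_name)
""")

# The UDF keeps the query itself short and readable.
print(con.execute("SELECT clean_name(raw_name) FROM customers").fetchall())
```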
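Finally, if you want to see the CPU-versus-GPU gap on your own hardware, here is a small PyTorch timing sketch. It assumes PyTorch is installed, the matrix size is arbitrary, and it skips the GPU measurement when no CUDA device is present.

```python
import time

import torch


def time_matmul(device: str, size: int = 4096) -> float:
    """Time a single large matrix multiplication on the given device."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup work has finished
    start = time.perf_counter()
    _ = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the kernel before stopping the clock
    return time.perf_counter() - start


print(f"CPU: {time_matmul('cpu'):.3f}s")
if torch.cuda.is_available():
    print(f"GPU: {time_matmul('cuda'):.3f}s")
else:
    print("No CUDA device available; GPU timing skipped.")
```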