Luke Posey, Product Manager
Below is a compiled list of the best data science tools from every category in 2024.
Think we're missing a tool or category? Feel free to let us know.
Last updated: September 25, 2024
Table of contents:
- Dashboards
- Notebooks
- Spreadsheets
- Databases
- Data warehouses, data lakes, data lakehouses, etc.
- Programming languages
- Workflow orchestration
- Large language models (LLMs)
Dashboards
Dashboarding tools are popular across industries because they make it easy to create and share interactive visualizations and reports. Dashboards are differentiated on performance, ease of charting, sharing, and integrations. This list also includes dashboards built with code rather than drag and drop alone.
- Metabase
- Tableau
- Power BI
- Streamlit
- Panel
- Looker
- Sigma
- Plotly Dash
- Superset
- Redash
- Domo
- Kibana
- Mode
- Qlik Sense
- Chartio
- Klipfolio
- Yellowfin
- Zoho Analytics
- Cluvio
- Periscope Data
Notebooks
Notebooks are popular tools for data teams to collaborate on code and run a wide range of experiments and analytics tasks. Notebooks are structured as a series of cells, each of which can contain code, text, and visualizations. Notebooks are differentiated on performance, stability, and features like support for different programming languages, charting, and integrations.
- Jupyter & JupyterLab
- Colab
- Observable
- Hex
- Deepnote
- Zeppelin
- Polynote
- nteract
- SageMaker Studio Lab
- RStudio
- Wolfram Notebooks
- Papermill
- Voilà
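To make the "series of cells" structure concrete, here is a minimal sketch that builds a two-cell notebook as plain Python dictionaries. The layout is loosely modeled on the JSON format Jupyter uses for .ipynb files; the helper names (`make_notebook`, `markdown_cell`, `code_cell`) and the sample cell contents are hypothetical, and other notebook tools in this list may store cells differently.

```python
import json


def markdown_cell(text):
    """Return a text cell (assumed layout, modeled on Jupyter's format)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Return a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,  # filled in once the cell is run
        "metadata": {},
        "outputs": [],            # charts and printed results would go here
        "source": source,
    }


def make_notebook(cells):
    """Wrap an ordered list of cells in a minimal notebook document."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


# Hypothetical notebook: one text cell followed by one code cell.
notebook = make_notebook([
    markdown_cell("# Sales summary"),
    code_cell("total = sum([120, 80, 45])\nprint(total)"),
])

notebook_json = json.dumps(notebook, indent=2)
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)
```

The point of the sketch is only that a notebook is an ordered list of typed cells, which is what lets tools render, execute, and share them one cell at a time.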
Spreadsheets
Things always seem to come back to spreadsheets. Fortunately, modern spreadsheets have come a long way: they support modern workflows with programming languages, direct connections to data sources, and deeply integrated AI capabilities.
Databases
Databases are the backbone of any data infrastructure. They are responsible for storing, retrieving, and managing data in a structured format. Databases are differentiated on use cases, spanning everything from simple key-value stores to complex graph databases, and covering real-time analytics, structured data, unstructured data, and everything in between.
- Postgres
- MySQL
- MongoDB
- Redis
- SQL Server
- SQLite
- DuckDB
- Cassandra
- MariaDB
- Elasticsearch
- Neo4j
- Couchbase
- InfluxDB
- CockroachDB
- Firebird
- RavenDB
- ArangoDB
- TimescaleDB
- Dgraph
- Fauna
- CrateDB
- TiDB
- Druid
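As a small illustration of storing, retrieving, and managing structured data, the sketch below uses SQLite through Python's built-in `sqlite3` module with an in-memory database. The `tools` table, its columns, and the sample rows are hypothetical and exist only for this example; other databases in the list use different drivers and, in some cases, different query languages.

```python
import sqlite3

# An in-memory database keeps the example self-contained (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# Store: define a structured table and insert a few rows.
conn.execute("CREATE TABLE tools (name TEXT PRIMARY KEY, category TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO tools (name, category) VALUES (?, ?)",
    [("Postgres", "relational"), ("Redis", "key-value"), ("Neo4j", "graph")],
)
conn.commit()

# Retrieve: query rows with a parameterized filter.
rows = conn.execute(
    "SELECT name FROM tools WHERE category = ?", ("graph",)
).fetchall()

# Manage: update an existing row, then read it back.
conn.execute(
    "UPDATE tools SET category = ? WHERE name = ?",
    ("in-memory key-value", "Redis"),
)
redis_category = conn.execute(
    "SELECT category FROM tools WHERE name = ?", ("Redis",)
).fetchone()[0]

conn.close()
print(rows, redis_category)
```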
Data warehouses, data lakes, data lakehouses, etc.
This category is rapidly evolving and receiving new entrants all the time. It is worth noting the difference between the three categories, as it can get a little blurry.
- Data warehouses: optimized for structured data, frequently used for reporting and analytics workflows.
- Data lakes: optimized for unstructured data, frequently used for storing large amounts of data that is not yet structured.
- Data lakehouses: attempt to combine the best of both worlds, seeking to offer data lake flexibility with warehouse capabilities.
Programming languages
Career listings from the top employers for data roles show that SQL and Python remain the most common programming languages. Technologies like Spark and dbt are also layering opinionated frameworks on top of existing technologies.
Others, more niche
Workflow orchestration
Workflow orchestration tools are used to manage and automate complex workflows and tasks. They manage data pipelines and ensure tasks are executed in the correct order and at the right time across data workflows.
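The "correct order" part can be sketched as a dependency graph that is resolved before anything runs. The example below uses Python's standard-library `graphlib` module (Python 3.9+); the task names, the `pipeline` mapping, and the `run_pipeline` helper are hypothetical, and the sketch leaves out the scheduling, retries, and monitoring that real orchestration tools provide.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_pipeline(dependencies, actions):
    """Run every task after all of its dependencies and return the run order."""
    order = []
    for task in TopologicalSorter(dependencies).static_order():
        actions[task]()  # call the function registered for this task
        order.append(task)
    return order


# Stand-in actions that only record that they ran.
log = []
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in pipeline}

order = run_pipeline(pipeline, actions)
print(order)
```

A cycle in the dependencies (for example, `extract` depending on `report`) would make `static_order()` raise `graphlib.CycleError`, which is one reason to model pipelines as directed acyclic graphs.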
Large language models (LLMs)
LLMs are spreading quickly, and it is no longer uncommon to see experience with LLMs tacked onto the "nice to haves" in job listings.