Apache Spark vs. dbt

Apache Spark

163 Reviews and Ratings

dbt

62 Reviews and Ratings

Overview
Product	Rating	Most Used By	Product Summary	Starting Price
Apache Spark	Score 9.1 out of 10	N/A	Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.	N/A
dbt	Score 9.0 out of 10	N/A	dbt is an SQL development environment, developed by Fishtown Analytics, now known as dbt Labs. The vendor states that with dbt, analysts take ownership of the entire analytics engineering workflow, from writing data transformation code to deployment and documentation. dbt Core is distributed under the Apache 2.0 license, and paid Teams and Enterprise editions are available.	$0 per month per seat

Pricing

	Apache Spark	dbt
Editions & Modules	No answers on this topic	No answers on this topic

Offerings

Pricing Offerings
	Apache Spark	dbt
Free Trial	No	Yes
Free/Freemium Version	No	Yes
Premium Consulting/Integration Services	No	Yes
Entry-level Setup Fee	No setup fee	No setup fee
Additional Details	-	-

Features

Data Transformations

	Apache Spark	dbt
Overall	- (0 ratings)	9.7 (8 ratings), 18% above category average
Simple transformations	- (0 ratings)	10.0 (8 ratings)
Complex transformations	- (0 ratings)	9.4 (8 ratings)

Data Modeling

	Apache Spark	dbt
Overall	- (0 ratings)	9.1 (8 ratings), 15% above category average
Data model creation	- (0 ratings)	9.7 (8 ratings)
Metadata management	- (0 ratings)	8.7 (8 ratings)
Business rules and workflow	- (0 ratings)	9.0 (8 ratings)
Collaboration	- (0 ratings)	10.0 (6 ratings)
Testing and debugging	- (0 ratings)	8.0 (8 ratings)

Best Alternatives
	Apache Spark	dbt
Small Businesses	No answers on this topic	Skyvia Score 10.0 out of 10
Medium-sized Companies	Cloudera Manager Score 9.9 out of 10	IBM InfoSphere Information Server Score 8.0 out of 10
Enterprises	IBM Analytics Engine Score 7.3 out of 10	IBM InfoSphere Information Server Score 8.0 out of 10

User Ratings
	Apache Spark	dbt
Likelihood to Recommend	9.0 (24 ratings)	10.0 (10 ratings)
Likelihood to Renew	10.0 (1 rating)	- (0 ratings)
Usability	8.1 (4 ratings)	9.7 (3 ratings)
Support Rating	8.7 (4 ratings)	- (0 ratings)

User Testimonials
	Apache Spark	dbt
Likelihood to Recommend	Well suited: most local runs of datasets and non-prod systems, where scalability is not a problem at all. Including data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: we had to work on a RecSys where the music dataset we used was 300+ GB in size, and we faced memory-based issues. A few times we also got memory errors. Also, MLlib does not support advanced analytics or deep-learning frameworks. Understanding the internals of Apache Spark is very difficult for beginners. [Ananth Gouri, Assistant Professor]	The prerequisite is that you have a supported database/data warehouse and have already found a way to ingest your raw data. Then dbt is very well suited to manage your transformation logic if the people using it are familiar with SQL. If you want to benefit from bringing engineering practices to data, dbt is a great fit. It can bring CI/CD practices, version control, automated testing, documentation generation, etc. It is not so well suited if the people managing the transformation logic do not like to code (in SQL) but prefer graphical user interfaces. [Verified User, Anonymous]
Pros	Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues. Faster execution times compared to Hadoop and Pig Latin. Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner. Interoperability between SQL and Scala/Python styles of munging data. [Nitin Pasumarthy, Software Engineer]	dbt supports version control through Git, which allows teams to collaborate and track the data transformation logic. dbt allows us to build data models, which helps to break complex transformation logic into simpler, smaller pieces. dbt is completely based on SQL, which allows data analysts and data engineers to build the transformation logic. dbt can be easily integrated with Snowflake. [Sahil Khan, Data Analyst]
Cons	Memory management is very weak. PySpark is not as robust as Scala with Spark. Spark master HA is needed; it is not as highly available as it should be. Locality should not be a necessity, though it does help performance; I would prefer no locality requirement. [Anson Abraham, Data Czar]	Field-level lineage (currently at table level). Documentation inheritance: if a field is documented, the downstream field of the same name could inherit the doc info. Adding Python model support (in beta now). [Judy Campion, Data Architect]
Likelihood to Renew	Capacity for computing data in a cluster, and fast speed. [Steven Li, Senior Software Developer (Consultant)]	No answers on this topic
Usability	If the team looking to use Apache Spark is not used to debugging and tweaking job settings to ensure maximum optimization, it can be frustrating. However, the documentation and the support of the community on the internet can help resolve most issues. Moreover, it is highly configurable and it integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used. [Verified User, Anonymous]	dbt is very easy to use. Basically, if you can write SQL, you will be able to use dbt to get what you need done. Of course, more advanced users with more technical skills can do more things. [Sid Iyer, BizOps]
Support Rating	1. It integrates very well with Scala or Python. 2. Its SQL interoperability is very easy to understand. 3. Apache Spark is much faster than competing technologies. 4. The support from the Apache community for Spark is huge. 5. Execution times are faster compared to others. 6. There are a large number of forums available for Apache Spark. 7. The code for Apache Spark is simple and easy to gain access to. 8. Many organizations use Apache Spark, so many solutions are available for existing applications. [Yogesh Mhasde, Technical Manager]	No answers on this topic
Alternatives Considered	Apache Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing. [Verified User, Anonymous]	I actually don't know what the alternative to dbt is. I'm sure one must exist other than more 'roll your own' options like Apache Airflow, say, but in terms of super easy managed/cloud data transforms, dbt really does seem to be THE tool to use. It's $50/month per dev, BUT there's a FREE version for 1 dev seat with no read-only access for anyone else, so you can always start with that and then buy yourself a seat later. [Verified User, Anonymous]
Return on Investment	Business leaders are able to make data-driven decisions. Business users are now able to access data in near real time; before using Spark, they had to wait at least 24 hours for data to be available. The business is able to come up with new product ideas. [Surendranatha Reddy Chappidi, Senior Data Engineer]	Simplified our BI layer for faster load times. Increased the quality of data reaching our end users. Makes complex transformations manageable. [Verified User, Anonymous]