Apache Spark vs. dbt

Overview
Product: Apache Spark
Rating: Score 8.9 out of 10
Most Used By: N/A
Product Summary: N/A
Starting Price: N/A

Product: dbt
Rating: Score 9.0 out of 10
Most Used By: N/A
Product Summary: dbt is an SQL development environment, developed by Fishtown Analytics, now known as dbt Labs. The vendor states that with dbt, analysts take ownership of the entire analytics engineering workflow, from writing data transformation code to deployment and documentation. dbt Core is distributed under the Apache 2.0 license, and paid Teams and Enterprise editions are available.
Starting Price: $0 per month per seat
Pricing
Editions & Modules
Apache Spark: No answers on this topic
dbt: No answers on this topic
Offerings
Pricing Offerings (Apache Spark / dbt)
Free Trial: No / Yes
Free/Freemium Version: No / Yes
Premium Consulting/Integration Services: No / Yes
Entry-level Setup Fee: No setup fee / No setup fee
Additional Details: N/A / N/A
Features
Data Transformations
Comparison of Data Transformations features of Apache Spark and dbt
Apache Spark: no ratings
dbt: 9.5 (7 ratings), 16% above category average
Simple transformations: Apache Spark no ratings; dbt 10.0 (7 ratings)
Complex transformations: Apache Spark no ratings; dbt 9.1 (7 ratings)
Data Modeling
Comparison of Data Modeling features of Apache Spark and dbt
Apache Spark: no ratings
dbt: 9.0 (7 ratings), 12% above category average
Data model creation: Apache Spark no ratings; dbt 9.5 (7 ratings)
Metadata management: Apache Spark no ratings; dbt 8.5 (7 ratings)
Business rules and workflow: Apache Spark no ratings; dbt 8.9 (7 ratings)
Collaboration: Apache Spark no ratings; dbt 10.0 (5 ratings)
Testing and debugging: Apache Spark no ratings; dbt 8.1 (7 ratings)
Best Alternatives
Small Businesses
Apache Spark: No answers on this topic
dbt: Skyvia (Score 9.8 out of 10)
Medium-sized Companies
Apache Spark: Cloudera Manager (Score 9.9 out of 10)
dbt: IBM InfoSphere Information Server (Score 8.0 out of 10)
Enterprises
Apache Spark: IBM Analytics Engine (Score 7.8 out of 10)
dbt: IBM InfoSphere Information Server (Score 8.0 out of 10)
User Ratings (Apache Spark / dbt)
Likelihood to Recommend: 9.4 (24 ratings) / 10.0 (9 ratings)
Likelihood to Renew: 10.0 (1 rating) / N/A (0 ratings)
Usability: 8.7 (4 ratings) / 9.5 (2 ratings)
Support Rating: 8.7 (4 ratings) / N/A (0 ratings)
User Testimonials
Likelihood to Recommend
Apache
Well suited: Most local runs on datasets and non-production systems; scalability is not a problem at all. Being able to include data from multiple types of data sources is an added advantage. MLlib is a decent built-in library that can be used for most ML tasks. Less appropriate: We worked on a recommender system (RecSys) where the music dataset we used was 300+ GB in size. We faced memory-related issues, and a few times we also got memory errors. Also, the MLlib library lacks support for advanced analytics and deep-learning frameworks. For beginners, understanding the internals of Apache Spark is very difficult.
dbt Labs
The prerequisite is that you have a supported database/data warehouse and have already found a way to ingest your raw data. Then dbt is very well suited to manage your transformation logic if the people using it are familiar with SQL. If you want to benefit from bringing engineering practices to data, dbt is a great fit. It can bring CI/CD practices, version control, automated testing, documentation generation, etc. It is not so well suited if the people managing the transformation logic do not like to code (in SQL) but prefer graphical user interfaces.
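The "automated testing" mentioned above refers to dbt's data tests. dbt ships generic tests such as `not_null` and `unique`, which it compiles into SQL queries that select failing rows; a test passes when its query returns no rows. The Python sketch below only illustrates that pass/fail logic on an in-memory list of rows. The function names and the sample `orders` data are hypothetical, and this is not how dbt itself executes tests.

```python
from typing import Any, Iterable

# A row of a model's output table, keyed by column name (illustrative only).
Row = dict[str, Any]


def not_null_failures(rows: Iterable[Row], column: str) -> list[Row]:
    """Rows where `column` is missing or None: what a not_null test flags."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows: Iterable[Row], column: str) -> list[Any]:
    """Non-null values of `column` that occur more than once: what a unique test flags."""
    counts: dict[Any, int] = {}
    for row in rows:
        value = row.get(column)
        if value is not None:  # nulls are skipped, as dbt's built-in unique test does
            counts[value] = counts.get(value, 0) + 1
    return [value for value, n in counts.items() if n > 1]


# Hypothetical sample data standing in for a model such as `orders`.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]

results = {
    "not_null(customer_id)": not_null_failures(orders, "customer_id"),
    "unique(order_id)": unique_failures(orders, "order_id"),
}
for test_name, failing in results.items():
    # Like a dbt test, a check passes only when it finds zero failing rows.
    print(test_name, "PASS" if not failing else f"FAIL ({len(failing)} failing)")
```

In a real dbt project these checks are declared in a model's YAML properties rather than written by hand; the sketch only shows why a test "fails" when offending rows exist.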
Pros
Apache
  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala/Python-style data munging
dbt Labs
  • dbt supports version control through Git, which allows teams to collaborate and track the data transformation logic.
  • dbt lets us build data models, which helps break complex transformation logic into simpler, smaller pieces.
  • dbt is completely based on SQL, which allows data analysts and data engineers to build the transformation logic.
  • dbt can be easily integrated with Snowflake.
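Breaking complex logic into smaller models works because of dbt's `ref()` function: a model selects from other models via `{{ ref('model_name') }}`, and dbt uses those references to build a dependency graph (DAG) and run models in dependency order. The sketch below mimics only that ordering step with Python's standard-library `graphlib`. The model names are hypothetical, and this is not dbt's internal code.

```python
from graphlib import TopologicalSorter

# Hypothetical dbt-style project: each model maps to the models it ref()s.
# For example, orders_enriched.sql might select from {{ ref('stg_orders') }}
# and {{ ref('stg_customers') }}.
model_refs: dict[str, set[str]] = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders_enriched": {"stg_orders", "stg_customers"},
    "customer_lifetime_value": {"orders_enriched"},
}


def build_order(refs: dict[str, set[str]]) -> list[str]:
    """Order models so each one runs after every model it depends on."""
    # TopologicalSorter expects {node: predecessors}; it raises
    # graphlib.CycleError if the references form a cycle.
    return list(TopologicalSorter(refs).static_order())


order = build_order(model_refs)
print(" -> ".join(order))
```

Because each small model only declares what it reads from, adding a step means adding one node to the graph rather than editing one large query; the relative order of independent models (here the two `stg_` models) is not fixed.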
Cons
Apache
  • Memory management: very weak.
  • PySpark is not as robust as Scala with Spark.
  • High availability (HA) for the Spark master is needed; it is not as highly available as it should be.
  • Data locality should not be a necessity, though it does help performance. I would prefer not to depend on locality.
dbt Labs
  • Field-level lineage (currently at table level)
  • Documentation inheritance - if a field is documented the downstream field of the same name could inherit the doc info
  • Adding Python model support (in beta now)
Likelihood to Renew
Apache
Its capacity for computing data in a cluster, and its fast speed.
dbt Labs
No answers on this topic
Usability
Apache
If the team looking to use Apache Spark is not used to debugging and tweaking job settings to ensure maximum optimization, it can be frustrating. However, the documentation and community support on the internet can help resolve most issues. Moreover, it is highly configurable and integrates with different tools (e.g., it can be used by dbt Core), which increases the scenarios where it can be used.
dbt Labs
It requires proficiency with SQL coding and with Git practices, but with these prerequisites, it is easy to use. Especially with dbt Cloud, you get a nice interface that makes administrative tasks like scheduling jobs quite easy. I also like the built-in SQL editor with syntax highlighting and auto-completion.
Support Rating
Apache
1. It integrates very well with Scala or Python.
2. Its SQL interoperability is very easy to understand.
3. Apache Spark is much faster than competing technologies.
4. The support from the Apache community for Spark is huge.
5. Execution times are faster compared to others.
6. There are a large number of forums available for Apache Spark.
7. The Apache Spark code is simple and easy to access.
8. Many organizations use Apache Spark, so many solutions are available for existing applications.
dbt Labs
No answers on this topic
Alternatives Considered
Apache
Compared to similar technologies, Spark ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choice.
dbt Labs
I actually don't know what the alternative to dbt is. I'm sure one must exist other than more 'roll your own' options like, say, Apache Airflow, but in terms of super easy managed/cloud data transforms, dbt really does seem to be THE tool to use. It's $50/month per dev, BUT there's a FREE version for 1 dev seat with no read-only access for anyone else, so you can always start with that and then buy yourself a seat later.
Return on Investment
Apache
  • Business leaders are able to make data-driven decisions
  • Business users now have access to data in near real time. Before using Spark, they had to wait at least 24 hours for data to be available
  • The business is able to come up with new product ideas
dbt Labs
  • Simplified our BI layer for faster load times
  • Increased the quality of data reaching our end users
  • Makes complex transformations manageable