What users are saying about
109 Ratings
Score 8.4 out of 10
183 Ratings
Score 8 out of 10

Likelihood to Recommend

Apache Spark

Apache Spark has rich APIs for regular data transformations, ML workloads, and graph workloads, whereas other systems may not have such a wide range of support. Choose it when you need to perform data transformations on big data as offline jobs, and use MongoDB-like distributed database systems for more real-time queries.
Nitin Pasumarthy

SSIS

SQL Server Integration Services is extremely well built for creating packages to run ETL operations in environments where the structure of the source and/or destination data rarely or never changes. However, it tends to be difficult to maintain packages in production environments where the structure of the data changes frequently.

Feature Rating Comparison

Feature                                  Apache Spark   SSIS

Data Source Connection                   -              8.3
  Connect to traditional data sources    -              9.3
  Connect to Big Data and NoSQL          -              7.2

Data Transformations                     -              8.8
  Simple transformations                 -              9.8
  Complex transformations                -              7.9

Data Modeling                            -              7.2
  Data model creation                    -              8.1
  Metadata management                    -              6.8
  Business rules and workflow            -              8.1
  Collaboration                          -              5.7
  Testing and debugging                  -              7.0

Data Governance                          -              7.8
  Integration with data quality tools    -              8.1
  Integration with MDM tools             -              7.5

Pros

Apache Spark

  • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
  • Faster execution times compared to Hadoop and Pig Latin
  • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
  • Interoperability between SQL and Scala / Python style of munging data
Nitin Pasumarthy

SSIS

  • It handles SQL Server databases flawlessly
  • It provides a robust developer interface
  • It allows a developer to encapsulate complex scripts directly within an SSIS project or reuse scripts across projects
  • It interfaces quite well with a large number of available libraries

Cons

Apache Spark

  • Resource heavy: jobs, in general, can be very memory intensive, and you will want the nodes in your cluster to reflect that.
  • Debugging: it has gotten better with every release, but it can sometimes be difficult to debug an error due to ambiguous or misleading exceptions and stack traces.

SSIS

  • Integration with Access/Excel should be more seamless and less problematic
  • CASS certified address standardization
  • Higher performing Slowly Changing Dimension functionality
  • SFTP
  • Incremental loading (deletion, upsert, etc.)
  • Power BI integration. I really, really, really want to be able to refresh reports via IS packages
  • More Azure administration tasks
  • Office 365 and SharePoint integration
David Milillo

Likelihood to Renew

Apache Spark

No score
No answers on this topic

SSIS

SSIS 6.0
Based on 2 answers
A bit outdated compared to competitors, especially in the open-source community

Usability

Apache Spark

No score
No answers on this topic

SSIS

SSIS 8.2
Based on 6 answers
SQL Server Integration Services is a relatively nice tool but is simply not the ETL for a global, large-scale organization. With developing requirements such as NoSQL data, cloud-based tools, and extraordinarily large databases, SSIS is no longer our tool of choice.

Performance

Apache Spark

No score
No answers on this topic

SSIS

SSIS 8.3
Based on 4 answers
SQL Server Integration Services performance depends directly on the resources provided to the system. In our environment, we allocated 6 nodes of 4 CPUs and 64 GB each, running in parallel. Unfortunately, we had to ramp up to such a robust environment to get the performance to where we needed it. Most of the reports are completed in a reasonable time frame. However, in the case of slow-running reports, it is often difficult, if not impossible, to cancel the report without killing the report instance or stopping the service.

Support

Apache Spark

No score
No answers on this topic

SSIS

SSIS 9.0
Based on 3 answers
The support, when necessary, is excellent. Beyond that, it is very rarely necessary, because the user community is so large, vibrant, and knowledgeable that a simple Google query or forum question can answer almost everything you want to know. You can also get prewritten script tasks with a variety of functionality, which saves a lot of time.
Chris Morgan

Implementation

Apache Spark

No score
No answers on this topic

SSIS

SSIS 10.0
Based on 1 answer
The implementation may be different in each case. It is important to properly analyze all the existing infrastructure to understand the kind of work needed, the type of software used and the compatibility between them, and the features that you want to exploit, so that you understand what is possible and which features require integration with third-party tools.
Luca Campanelli

Alternatives Considered

Apache Spark

Spark, in comparison to similar technologies, ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave together multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.

SSIS

I've used several solutions, from simple ETL processes of manually run stored procedures to Informatica. SSIS is a good in-between solution. It has sufficient maturity to be preferable to scheduled procs in SQL Server Agent jobs, but it does not have the sophistication and full feature set of Informatica. It is, however, significantly cheaper than Informatica. Plus, since it's bundled with SQL Server, all SQL Server shops already have it.
David Milillo

Return on Investment

Apache Spark

  • The ability to program and run Spark programs makes consulting companies more attractive to clients. Clients like hearing about new technology being leveraged and fancy terms.
  • Projects can be completed faster because the programs run faster.

SSIS

  • Faster implementations
  • Integrated reporting environments
  • Cost savings via automation
Samir Patel, PMP

Pricing Details

Apache Spark

General

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

SSIS

General

Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details
