What users are saying about
102 Ratings
Score 8.5 out of 10
159 Ratings
Score 7.8 out of 10
<a href='https://www.trustradius.com/static/about-trustradius-scoring' target='_blank' rel='nofollow'>trScore algorithm: Learn more.</a>


Likelihood to Recommend

Apache Spark

It is suitable for processing large amounts of data, as it is very easy to use and its syntax is simple and understandable. I also find it useful to use in a variety of applications without the need to integrate many other processing technologies, and it is very fast and has many machine learning algorithms that can be used for data problems. I find it less appropriate for data that is not so large, as it uses too many resources.
Carla Borges

SSIS

SSIS is well suited for any process that can be automated to move data from a source to a destination. However, I don't think SSIS can work directly with REST APIs during its processing. If that is required, then you would need to build your own custom SSIS component to enable this functionality. Extending SSIS to permit this is possible.
Eddie Brady

Feature Rating Comparison

Feature                                  Apache Spark   SSIS

Data Source Connection                   No score       8.0
  Connect to traditional data sources    No score       9.1
  Connect to Big Data and NoSQL          No score       6.8

Data Transformations                     No score       8.7
  Simple transformations                 No score       9.6
  Complex transformations                No score       7.9

Data Modeling                            No score       7.2
  Data model creation                    No score       7.5
  Metadata management                    No score       6.8
  Business rules and workflow            No score       8.5
  Collaboration                          No score       6.0
  Testing and debugging                  No score       7.3

Data Governance                          No score       7.6
  Integration with data quality tools    No score       7.7
  Integration with MDM tools             No score       7.4

Pros

  • It falls back to conventional disk-based processing when data sets are too large to fit into memory, which is very useful because, regardless of the size of the data, it can always be stored and processed.
  • It is very fast and can join multiple types of databases and run different types of analysis applications. This functionality is very useful because it reduces processing time.
  • Apache Spark uses the data storage model of Hadoop and can be integrated with other big data frameworks such as HBase, MongoDB, and Cassandra. This is very useful because it is compatible with the multiple frameworks our company uses, which allows us to unify all our processes.
Carla Borges
  • SSIS allows you to run many processes in parallel. Thus, you can run multiple data flows simultaneously to increase the throughput of the migration process.
  • SSIS provides many tools for transforming data during the migration process.
Eddie Brady
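
The point about running multiple data flows in parallel can be illustrated with a small sketch. This is not SSIS itself: it is a plain Python analogy using the standard library's ThreadPoolExecutor, and the source names, the `run_data_flow` helper, and the trimming/lowercasing transformation are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real migration would read files or databases.
SOURCES = {
    "customers": [" alice ", "BOB"],
    "orders": ["o-1", "o-2", "o-3"],
    "products": ["Widget"],
}

def run_data_flow(name):
    """Move one source to its destination, applying a simple transformation."""
    rows = SOURCES[name]
    destination = [row.strip().lower() for row in rows]
    return name, destination

def run_all(names, max_workers=3):
    """Run independent data flows concurrently and collect the results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_data_flow, names))

results = run_all(list(SOURCES))
```

Because the flows do not depend on each other, throughput is bounded by the slowest flow rather than by the sum of all of them, which is the benefit the review describes.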

Cons

  • More information and training material should come with the application, especially for debugging, since the process is difficult to understand.
  • It should be more attentive to users and provide tutorials to reduce the learning curve.
  • There should be more grouping algorithms.
Carla Borges
  • The one issue that I have with SSIS is that sometimes the business logic gets baked into the SSIS package, which can make it harder to debug. When the source and destination are not databases, putting the business logic in SSIS makes sense. However, when using a database as a source, I prefer to manipulate and transform the data via SQL and then simply expose the dataset to SSIS after the data has been prepared. I find it easier to write and debug SQL directly than to work in SSIS.
Eddie Brady
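
The "prepare the data in SQL, then expose it to the ETL tool" approach described above can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than SQL Server and SSIS; the table, the view, and the `extract_prepared` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, ' Alice ', 1250, 'shipped'),
        (2, 'bob',      800, 'cancelled'),
        (3, 'Alice',    450, 'shipped');

    -- Business logic lives in SQL, where it can be run and debugged directly.
    CREATE VIEW prepared_orders AS
    SELECT LOWER(TRIM(customer)) AS customer,
           SUM(amount_cents) / 100.0 AS total_amount
    FROM raw_orders
    WHERE status = 'shipped'
    GROUP BY LOWER(TRIM(customer));
""")

def extract_prepared(connection):
    """The ETL step only reads the prepared dataset; it applies no logic itself."""
    return connection.execute(
        "SELECT customer, total_amount FROM prepared_orders ORDER BY customer"
    ).fetchall()

rows = extract_prepared(conn)
```

With this split, a wrong total can be investigated by querying the view on its own, without opening the package that moves the data.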

Likelihood to Renew

Apache Spark: No score (no answers on this topic)
SSIS: 6.0 (based on 2 answers)
A bit outdated compared to competitors, especially in the open-source community.

Usability

Apache Spark: No score (no answers on this topic)
SSIS: 7.0 (based on 3 answers)
Easy to use; however, there are functional limits.

Performance

Apache Spark: No score (no answers on this topic)
SSIS: 6.0 (based on 1 answer)
SQL Server Integration Services performance depends directly on the resources provided to the system. In our environment, we allocated 6 nodes with 4 CPUs and 64GB each, running in parallel. Unfortunately, we had to ramp up to such a robust environment to get the performance we needed. Most of the reports are completed in a reasonable timeframe. However, slow-running reports are often difficult, if not impossible, to cancel without killing the report instance or stopping the service.

Support

Apache Spark: No score (no answers on this topic)
SSIS: 9.0 (based on 3 answers)
The support, when necessary, is excellent. Beyond that, it is very rarely necessary, because the user community is so large, vibrant, and knowledgeable that a simple Google query or forum question can answer almost everything you want to know. You can also get prewritten script tasks with a variety of functionality, which saves a lot of time.
Chris Morgan

Implementation

Apache Spark: No score (no answers on this topic)
SSIS: 10.0 (based on 1 answer)
The implementation may be different in each case. It is important to properly analyze all the existing infrastructure to understand the kind of work needed, the type of software used and how compatible the pieces are, and the features that you want to exploit, so that you know what is possible and which features require integration with third-party tools.
Luca Campanelli

Alternatives Considered

All the above systems work quite well on big data transformations, but Spark really shines with its broader API support and its ability to read from and write to multiple data sources. With Spark, one can easily switch between declarative, imperative, and functional programming styles depending on the situation. It also doesn't need special data ingestion or indexing pre-processing like Presto. Combined with Jupyter Notebooks (https://github.com/jupyter-incubator/sparkmagic), it lets one develop Spark code interactively in Scala or Python.
Nitin Pasumarthy
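
To make the declarative versus imperative versus functional distinction concrete, here is a small sketch that computes the same per-key total in each style. It deliberately uses only the Python standard library, not Spark, so it shows the styles rather than any Spark API; the `events` data and variable names are made up for the example.

```python
import sqlite3
from collections import defaultdict
from functools import reduce

events = [("click", 2), ("view", 5), ("click", 3)]

# Declarative: state what result is wanted and let the engine decide how.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (kind TEXT, n INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", events)
declarative = dict(conn.execute("SELECT kind, SUM(n) FROM events GROUP BY kind"))

# Imperative: spell out each step and mutate state along the way.
imperative = defaultdict(int)
for kind, n in events:
    imperative[kind] += n

# Functional: fold over the data with a pure function that builds a new dict each step.
functional = reduce(
    lambda acc, e: {**acc, e[0]: acc.get(e[0], 0) + e[1]},
    events,
    {},
)
```

All three produce the same totals; which one reads best depends on the task, which is presumably why the reviewer values being able to switch between them.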
If you are in a SQL Server environment, I really don't know why you wouldn't use SSIS, since it is free with SQL Server and I don't know of any standalone tool that can match it. Redgate makes some great add-on tools for SQL Server that integrate with it to make it more powerful, versatile, and easy to use.
Chris Morgan

Return on Investment

  • By learning Spark, we can become certified and/or provide proper recommendations or implementations on Spark solutions.
  • With a background in Hadoop distributed processes, it has been easy to understand and diagnose how Spark handles the transfer of data within a cluster, especially when using YARN as the resource manager and HDFS as the data source.
  • Staying up to date with the latest changes to Spark has become a repetitive task. Most Hadoop distributions only support Spark 1.6 at the moment; Spark 2.0 has introduced some useful features, but those require a rewrite of existing applications.
Jordan Moore
  • Various data requests have been fulfilled without much hassle using SSIS (data processing, ETL jobs, schedules, etc.).
  • Data-related standards have been enforced in the SSIS development methodology, which has improved data quality.
  • Advanced features have been implemented using SSIS (e.g. FTP file notification) to automate some of the manual work.

Pricing Details

Apache Spark

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details

SSIS

General
Free Trial
Free/Freemium Version
Premium Consulting/Integration Services
Entry-level setup fee? No
Additional Pricing Details