What users are saying about
147 Ratings
trScore algorithm: Learn more (https://www.trustradius.com/static/about-trustradius-scoring)
Score 8.7 out of 10
23 Ratings
Score 8 out of 10

Feature Set Ratings

    Data Source Connection

    Apache Spark: Feature Set Not Supported (N/A)
    IBM InfoSphere DataStage: 8.7 (87%)
    IBM InfoSphere DataStage ranks higher in 2/2 features

    Connect to traditional data sources
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 9.4 (94%, 8 Ratings)

    Connect to Big Data and NoSQL
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 7.9 (79%, 7 Ratings)

    Data Transformations

    Apache Spark: Feature Set Not Supported (N/A)
    IBM InfoSphere DataStage: 9.3 (93%)
    IBM InfoSphere DataStage ranks higher in 2/2 features

    Simple transformations
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 9.7 (97%, 8 Ratings)

    Complex transformations
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 9.0 (90%, 8 Ratings)

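    Data Modeling

    Apache Spark: Feature Set Not Supported (N/A)
    IBM InfoSphere DataStage: 8.3 (83%)
    IBM InfoSphere DataStage ranks higher in 6/6 features

    Data model creation
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 8.4 (84%, 5 Ratings)

    Metadata management
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 7.7 (77%, 7 Ratings)

    Business rules and workflow
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 7.5 (75%, 7 Ratings)

    Collaboration
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 8.2 (82%, 8 Ratings)

    Testing and debugging
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 9.3 (93%, 8 Ratings)

    feature 1
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 8.7 (87%, 3 Ratings)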

    Data Governance

    Apache Spark: Feature Set Not Supported (N/A)
    IBM InfoSphere DataStage: 8.3 (83%)
    IBM InfoSphere DataStage ranks higher in 2/2 features

    Integration with data quality tools
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 8.4 (84%, 7 Ratings)

    Integration with MDM tools
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 8.2 (82%, 7 Ratings)

    Attribute Ratings

    • Apache Spark is rated higher in 2 areas: Likelihood to Recommend, Usability
    • IBM InfoSphere DataStage is rated higher in 1 area: Support Rating

    Likelihood to Recommend
    Apache Spark: 9.2 (92%, 22 Ratings)
    IBM InfoSphere DataStage: 8.2 (82%, 8 Ratings)

    Likelihood to Renew
    Apache Spark: 10.0 (100%, 1 Rating)
    IBM InfoSphere DataStage: N/A (0 Ratings)

    Usability
    Apache Spark: 9.4 (94%, 2 Ratings)
    IBM InfoSphere DataStage: 9.0 (90%, 2 Ratings)

    Performance
    Apache Spark: N/A (0 Ratings)
    IBM InfoSphere DataStage: 9.0 (90%, 2 Ratings)

    Support Rating
    Apache Spark: 8.7 (87%, 6 Ratings)
    IBM InfoSphere DataStage: 8.9 (89%, 5 Ratings)

    Likelihood to Recommend

    Apache Spark

    The software appears to run more efficiently than other big data tools, such as Hadoop. Given that, Apache Spark is well-suited for querying and trying to make sense of very large data sets. The software offers many advanced machine learning and econometrics tools, although these tools are used only partially because they require too much time when the data sets get very large. The software is not well-suited for projects that are not big data in size. The graphics and analytical output are subpar compared to other tools.
    Thomas Young | TrustRadius Reviewer

    IBM InfoSphere DataStage

    An excellent cloud data mapping tool. It makes it easy to create data analytics for multiple projects in real time, and report distribution through this IBM product is excellent. It is an easy tool for data visualization, and its integration is effective and helpful for migrating huge amounts of data across other platforms and for gathering insights from different websites.
    Edger Loredo | TrustRadius Reviewer

    Pros

    Apache Spark

    • Rich APIs for data transformation, making it very easy to transform and prepare data in a distributed environment without worrying about memory issues
    • Faster execution times compared to Hadoop and Pig Latin
    • Easy SQL interface to the same data set for people who are comfortable exploring data in a declarative manner
    • Interoperability between SQL and Scala / Python style of munging data
    Nitin Pasumarthy | TrustRadius Reviewer

    IBM InfoSphere DataStage

    • Data movement
    • Seamless integration of scripts and ETL jobs
    • Descriptive logging
    • Ability to work with a myriad of data assets
    • Direct integration for Governance catalog
    Anonymous | TrustRadius Reviewer

    Cons

    Apache Spark

    • Memory management. Very weak on that.
    • PySpark is not as robust as Scala with Spark.
    • Spark master HA is needed. It is not as highly available as it should be.
    • Locality should not be a necessity, although it does help. I would prefer no locality requirement.
    Anson Abraham | TrustRadius Reviewer

    IBM InfoSphere DataStage

    • Connector Stages to Snowflake on the cloud. We had some issues initially, but they have since been corrected.
    • Accessing the tool from a browser (zero footprint). Currently we need to either install locally or connect to a server to do ETL work.
    • Diversify ways of authenticating users.
    Herber Gonzalez | TrustRadius Reviewer

    Pricing Details

    Apache Spark

    General

    Free Trial
    Free/Freemium Version
    Premium Consulting/Integration Services
    Entry-level set up fee?
    No

    Starting Price

    IBM InfoSphere DataStage

    General

    Free Trial
    Free/Freemium Version
    Premium Consulting/Integration Services
    Entry-level set up fee?
    No

    Starting Price

    Likelihood to Renew

    Apache Spark

    Apache Spark 10.0
    Based on 1 answer
    Capacity of computing data in cluster and fast speed.
    Steven Li | TrustRadius Reviewer

    IBM InfoSphere DataStage

    No score
    No answers yet
    No answers on this topic

    Usability

    Apache Spark

    Apache Spark 9.4
    Based on 2 answers
    The only thing I dislike about Spark's usability is the learning curve: there are many actions and transformations. However, its wide range of uses for ETL processing, its ease of integration, and its multi-language support make this library a powerhouse for your data science solutions. It has especially aided us with its lightning-fast processing times.
    Anonymous | TrustRadius Reviewer

    IBM InfoSphere DataStage

    IBM InfoSphere DataStage 9.0
    Based on 2 answers
    Because it is robust, and it is being continuously improved. DS is one of the most used and recognized tools in the market. Large companies have implemented it in the first instance to develop their DW but, having found its advantages, they could use it for other types of projects such as migrations, application feeding, etc.
    Anonymous | TrustRadius Reviewer

    Performance

    Apache Spark

    No score
    No answers yet
    No answers on this topic

    IBM InfoSphere DataStage

    IBM InfoSphere DataStage 9.0
    Based on 2 answers
    It could load thousands of records in seconds. But in the Parallel version, you need to understand how to partition the data. If you use the algorithms erroneously, or the functionalities that it gives for the parsing of data, the performance can fall drastically, even with few records. It is necessary to have people with experience to be able to determine which algorithm to use and understand why.
    Anonymous | TrustRadius Reviewer

    Support Rating

    Apache Spark

    Apache Spark 8.7
    Based on 6 answers
    1. It integrates very well with Scala or Python.
    2. It's very easy to understand SQL interoperability.
    3. Apache is way faster than the other competitive technologies.
    4. The support from the Apache community is very huge for Spark.
    5. Execution times are faster as compared to others.
    6. There are a large number of forums available for Apache Spark.
    7. The code availability for Apache Spark is simpler and easy to gain access to.
    8. Many organizations use Apache Spark, so many solutions are available for existing applications.
    Yogesh Mhasde | TrustRadius Reviewer

    IBM InfoSphere DataStage

    IBM InfoSphere DataStage 8.9
    Based on 5 answers
    I believe that IBM generally has one of the worst and most complex assistance systems (physical and online) that exists.
    Filippo Orlando | TrustRadius Reviewer

    Alternatives Considered

    Apache Spark

    Spark in comparison to similar technologies ends up being a one-stop shop. You can achieve so much with this one framework instead of having to stitch and weave multiple technologies from the Hadoop stack, all while getting incredible performance, minimal boilerplate, and the ability to write your application in the language of your choosing.
    Anonymous | TrustRadius Reviewer

    IBM InfoSphere DataStage

    It's obvious since they both are from the same vendor, which makes it easier and can get better rates for licensing. Also, sales reps are very helpful in case of escalations and critical issues.
    Anonymous | TrustRadius Reviewer

    Return on Investment

    Apache Spark

    • Business leaders are able to make data-driven decisions
    • Business users are able to access data in near real time now. Before using Spark, they had to wait for at least 24 hours for data to be available
    • Business is able to come up with new product ideas
    Surendranatha Reddy Chappidi | TrustRadius Reviewer

    IBM InfoSphere DataStage

    • Reduce development time by 65% compared with hand coding.
    • Reduces ETL process maintenance times.
    • Better data governance for technical and non-technical people.
    • Improve time to market for initiatives that require data integration.
    Gonzalo Angeleri | TrustRadius Reviewer
