TrustRadius
Apache Spark

Product Demos

  • Spark Project | Spark Tutorial | Online Spark Training | Intellipaat (YouTube)
  • Spark SQL Tutorial | Spark SQL Using Scala | Apache Spark Tutorial For Beginners | Simplilearn (YouTube)
  • Apache Spark Full Course | Apache Spark Tutorial For Beginners | Learn Spark In 7 Hours | Simplilearn (YouTube)
  • Apache Spark Architecture | Spark Cluster Architecture Explained | Spark Training | Edureka (YouTube)
  • Introduction to Databricks [New demo linked in description] (YouTube)
  • Apache Spark Tutorial | Spark Tutorial for Beginners | Spark Big Data | Intellipaat (YouTube)

Product Details

What is Apache Spark?

Apache Spark Technical Details

Operating Systems: Unspecified
Mobile Application: No

Reviews From Top Reviewers

(1-5 of 24)

Apache Spark is still a valid DE tool

Rating: 9 out of 10
December 28, 2024
Vetted Review
Verified User
Apache Spark
6 years of experience
Pros
  • Parallel processing
  • Configurability
  • Usage with other tools
Cons
  • Needs more ready-to-use solutions for tuning the Apache Spark configs
  • PySpark should require fewer UDFs; more transformations could be implemented directly

Apache Spark: Lightning-Fast Distributed Computing with a Learning Curve

Rating: 10 out of 10
August 18, 2023
AG
Vetted Review
Verified User
Apache Spark
4 years of experience
Pros
  • Fault-tolerant: in most cases no node fails, and if one does, processing still continues.
  • Scalable to any extent.
  • Has a built-in machine learning library, MLlib.
  • Very flexible: data from various data sources can be used, and usage with HDFS is very easy.
Cons
  • It is not fully backward compatible.
  • It is memory-consuming for heavy and large workloads and datasets.
  • Support for advanced analytics is not available; MLlib offers only minimal analytics.
  • Deployment is a complex task for beginners.

Apache Spark is the next generation of big data computing.

Rating: 9 out of 10
April 18, 2022
SL
Vetted Review
Verified User
Apache Spark
1 year of experience
Pros
  • DataFrame as a distributed collection of data: makes it easy for developers to implement algorithms and formulas.
  • In-memory calculation.
  • Clusters to distribute calculations over large data.
Cons
  • It would be great if Apache Spark could provide a native database to manage all file information for saved Parquet files.

Apache Spark - your go-to technology for distributed data processing

Rating: 9 out of 10
May 03, 2021
Pros
  • Spark is very fast compared to other frameworks because it works in cluster mode and uses distributed processing and computation frameworks internally
  • Robust and fault tolerant
  • Open source
  • Can source data from multiple data sources
Cons
  • No Dataset API support in the Python version of Spark
  • The Apache Spark job run UI could show more meaningful information
  • Spark errors could provide more meaningful information when a job fails