What is Google Cloud Dataproc?
Dataproc is a fully managed, scalable Google Cloud service for running Apache Hadoop, Apache Spark, Apache Flink, Presto, and more than 30 other open source tools and frameworks. It is used for data lake modernization, ETL, and secure data science at scale, and it integrates with the rest of Google Cloud.
Key features
Fully managed and automated big data open source software
Serverless deployment, logging, and monitoring let users focus on data and analytics rather than on infrastructure. Dataproc reduces the total cost of ownership (TCO) of managing Apache Spark and, through integration with Vertex AI Workbench, helps data scientists and engineers build and train models faster than with traditional notebooks. The Dataproc Jobs API makes it easy to incorporate big data processing into custom applications, while Dataproc Metastore eliminates the need to run a self-managed Hive metastore or catalog service.
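To make the Jobs API point concrete, here is a minimal sketch of the request a custom application might build to submit a PySpark job through the Dataproc v1 REST endpoint (`jobs:submit`). The project, region, cluster, and bucket names are hypothetical placeholders, and `build_submit_request` is an illustrative helper, not part of any Google library. The sketch only constructs the URL and JSON body; sending it would additionally require an OAuth 2.0 access token, which is left out here.

```python
import json

# Hypothetical placeholder values; replace with your own project, region, and cluster.
PROJECT_ID = "example-project"
REGION = "us-central1"
CLUSTER_NAME = "example-cluster"


def build_submit_request(project_id, region, cluster_name, main_py_uri, args=None):
    """Return (url, body) for a Dataproc v1 jobs:submit REST call.

    Illustrative helper: it only assembles the request, it does not send it.
    """
    url = (
        "https://dataproc.googleapis.com/v1/"
        f"projects/{project_id}/regions/{region}/jobs:submit"
    )
    body = {
        "job": {
            # Which existing cluster should run the job.
            "placement": {"clusterName": cluster_name},
            # PySpark job definition: driver script plus its arguments.
            "pysparkJob": {
                "mainPythonFileUri": main_py_uri,
                "args": list(args or []),
            },
        }
    }
    return url, body


url, body = build_submit_request(
    PROJECT_ID,
    REGION,
    CLUSTER_NAME,
    "gs://example-bucket/jobs/wordcount.py",  # hypothetical script location
    args=["gs://example-bucket/input/"],      # hypothetical input path
)
print(url)
print(json.dumps(body, indent=2))
```

In practice an application would more likely use an official client library or the `gcloud` CLI rather than hand-building requests; the sketch is only meant to show how small the job description is.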
Containerize Apache Spark jobs with Kubernetes
Dataproc on Kubernetes lets Apache Spark jobs run on Google Kubernetes Engine (GKE), which provides job portability and isolation.
Enterprise security integrated with Google Cloud
When creating a Dataproc cluster, users can enable Hadoop Secure Mode via Kerberos by adding a Security Configuration. Commonly used Google Cloud security features for Dataproc also include default at-rest encryption, OS Login, VPC Service Controls, and customer-managed encryption keys (CMEK).
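As a rough illustration of the Security Configuration described above, the following sketch assembles a cluster definition in the shape used by the Dataproc v1 REST API, with Kerberos enabled and a CMEK key applied to persistent disks. All resource names are hypothetical placeholders, `build_secure_cluster` is an illustrative helper rather than a library function, and a real cluster would need further settings (machine types, network, and so on) that are omitted here.

```python
import json


def build_secure_cluster(project_id, cluster_name, kerberos_kms_key,
                         root_password_uri, disk_kms_key):
    """Return a Dataproc v1 cluster resource with Kerberos and CMEK enabled.

    Illustrative helper: it only builds the JSON-compatible dict.
    """
    return {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "securityConfig": {
                "kerberosConfig": {
                    # Turns on Hadoop Secure Mode via Kerberos.
                    "enableKerberos": True,
                    # KMS key that decrypts the root principal password.
                    "kmsKeyUri": kerberos_kms_key,
                    # Cloud Storage URI of the KMS-encrypted password file.
                    "rootPrincipalPasswordUri": root_password_uri,
                }
            },
            # Customer-managed key for persistent disk encryption.
            "encryptionConfig": {"gcePdKmsKeyName": disk_kms_key},
        },
    }


# Hypothetical key and bucket names, for illustration only.
KEY = "projects/example-project/locations/us-central1/keyRings/example-ring/cryptoKeys/example-key"

cluster = build_secure_cluster(
    "example-project",
    "secure-cluster",
    kerberos_kms_key=KEY,
    root_password_uri="gs://example-bucket/secrets/root-password.encrypted",
    disk_kms_key=KEY,
)
print(json.dumps(cluster, indent=2))
```

Default at-rest encryption needs no configuration, and OS Login and VPC Service Controls are set up outside the cluster definition, which is why only Kerberos and CMEK appear in this sketch.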
The best of open source with the best of Google Cloud
Dataproc lets users keep the open source tools, algorithms, and programming languages they prefer and makes it easy to apply them to cloud-scale datasets. At the same time, Dataproc integrates out of the box with the rest of the Google Cloud analytics, database, and AI ecosystem. Data scientists and engineers can quickly access data and build data applications that connect Dataproc to BigQuery, Vertex AI, Cloud Spanner, Pub/Sub, or Data Fusion.
Technical Details
| Mobile Application | No |
|---|---|