Pachyderm
Key Features
Automated Data Versioning — Pachyderm’s Data Versioning gives teams an automated and performant way to keep track of all data changes
Uses a Git-like structure of commits and branches that supports team collaboration and rollbacks
Content-based deduplication stores identical data only once, reducing the cost of storing and accessing large data sets
File-based versioning provides a complete audit trail for all data and artifacts across pipeline stages, including intermediate results
Data is stored as native objects (not metadata pointers), so versioning is automated and guaranteed
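To make the versioning ideas above concrete, here is a toy content-addressed repository in Python. It is a minimal sketch under stated assumptions, not Pachyderm's storage engine: it deduplicates whole files by their SHA-256 hash (real systems often split files into smaller chunks), and the class and method names (ToyRepo, commit, read) are hypothetical. Each commit records a snapshot of path-to-hash mappings, so older versions stay readable, which is the basis for rollbacks and an audit trail.

```python
import hashlib


class ToyRepo:
    """Hypothetical content-addressed repo; illustration only, not Pachyderm's format."""

    def __init__(self):
        self.objects = {}  # content hash -> bytes; each unique blob is stored once
        self.commits = []  # list of (commit_id, parent_id, {path: content hash})

    def commit(self, files):
        """Snapshot a dict of path -> bytes as a new commit and return its ID."""
        tree = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.objects.setdefault(digest, data)  # dedup: identical content stored once
            tree[path] = digest
        parent = self.commits[-1][0] if self.commits else None
        # Commit ID derived from the parent ID and the snapshot contents.
        commit_id = hashlib.sha256(repr((parent, sorted(tree.items()))).encode()).hexdigest()[:12]
        self.commits.append((commit_id, parent, tree))
        return commit_id

    def read(self, commit_id, path):
        """Return a file's content as it existed in the given commit."""
        for cid, _, tree in self.commits:
            if cid == commit_id:
                return self.objects[tree[path]]
        raise KeyError(commit_id)


repo = ToyRepo()
c1 = repo.commit({"/a.csv": b"x,y\n1,2\n", "/b.csv": b"x,y\n1,2\n"})
c2 = repo.commit({"/a.csv": b"x,y\n1,2\n", "/b.csv": b"x,y\n3,4\n"})
print(len(repo.objects))        # 2: four file versions, only two unique blobs
print(repo.read(c1, "/b.csv"))  # the older version of /b.csv is still readable
```

The design choice to address data by content hash is what makes deduplication automatic: storing the same bytes twice resolves to the same key, so no extra copy is kept.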
Data-Driven Pipelines — Pachyderm’s Containerized Pipelines speed data processing while lowering compute costs
Kubernetes-native, containerized pipeline steps support any library or language
Autoscale and parallelize data processing without writing additional code
Automated pipelines execute whenever new data is committed
Incremental processing saves compute by processing only the differences and automatically skipping duplicate data
Pipeline steps declare their inputs and outputs in JSON or YAML specifications, which eases debugging
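As an illustration of a declarative pipeline step, the Python sketch below builds a pipeline specification as a dictionary and prints it as JSON. The field layout (pipeline.name, description, transform.image, transform.cmd, input.pfs.repo, input.pfs.glob) follows the structure of Pachyderm's published pipeline examples, but the pipeline name "edges", the repo "images", the container image, and the script path are hypothetical placeholders, and check_spec is a minimal sanity check written for this sketch, not Pachyderm's own validator.

```python
import json

# Hypothetical pipeline: watches the "images" repo and runs a container
# command on each datum. All names and the image below are placeholders.
spec = {
    "pipeline": {"name": "edges"},
    "description": "Detect edges in every image committed to the images repo.",
    "transform": {
        "image": "example.com/edges:1.0",  # hypothetical container image
        "cmd": ["python3", "/edges.py"],   # hypothetical script inside the image
    },
    "input": {
        "pfs": {
            "repo": "images",  # input repository whose new commits trigger the pipeline
            "glob": "/*",      # each top-level file becomes a separate datum
        }
    },
}


def check_spec(s):
    """Sanity-check the fields this sketch relies on (not Pachyderm's validator)."""
    missing = [k for k in ("pipeline", "transform", "input") if k not in s]
    if missing:
        raise ValueError(f"missing top-level keys: {missing}")
    if not s["pipeline"].get("name"):
        raise ValueError("pipeline.name is required")
    return s


print(json.dumps(check_spec(spec), indent=2))
```

Saved to a file such as edges.json, a spec like this would typically be submitted with pachctl create pipeline -f edges.json. Because the glob splits the input into independent datums, the same spec describes both the parallel work units and the inputs to inspect when a step fails.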
Immutable Data Lineage — Pachyderm’s Data Lineage provides an immutable record for all activities and assets in the ML lifecycle
Track every version of your code, models, and data
Maintain reproducibility of data and code for compliance
Manage relationships between historical data states
Pachyderm’s Global IDs make it easy for teams to track any result all the way back to its raw input, including all analysis, parameters, code, and intermediate results.
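The lineage idea above amounts to walking a provenance graph from a result back to its raw inputs. The Python sketch below is a simplified, hypothetical model, not Pachyderm's API: every commit produced from one upstream change carries the same ID ("g1"), loosely mirroring the role a global ID plays, and each record notes its upstream inputs and the (placeholder) code version that produced it. The trace function does a breadth-first walk upstream and reports every ancestor plus the raw inputs.

```python
from collections import deque

# Hypothetical provenance records: (repo, commit_id) -> metadata.
# In this toy, one shared ID ("g1") labels every commit derived from a single change.
PROVENANCE = {
    ("images", "g1"): {"inputs": [], "code": None},
    ("edges", "g1"): {"inputs": [("images", "g1")], "code": "edges:1.0"},
    ("montage", "g1"): {"inputs": [("images", "g1"), ("edges", "g1")], "code": "montage:1.0"},
}


def trace(node, provenance=PROVENANCE):
    """Breadth-first walk upstream from a result; return (all ancestors, raw inputs)."""
    seen, raw = [], []
    visited = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        seen.append(current)
        inputs = provenance[current]["inputs"]
        if not inputs:
            raw.append(current)  # no upstream commits: this is raw input data
        for parent in inputs:
            if parent not in visited:
                visited.add(parent)
                queue.append(parent)
    return seen, raw


ancestors, raw_inputs = trace(("montage", "g1"))
for repo_name, commit_id in ancestors:
    print(repo_name, commit_id, PROVENANCE[(repo_name, commit_id)]["code"])
print("raw inputs:", raw_inputs)
```

Tracking visited nodes matters because lineage graphs are DAGs, not trees: here both montage and edges depend on images, and the walk should still report it once.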
Console — The Pachyderm Console provides an intuitive visualization of your DAG (directed acyclic graph) and aids in reproducibility
See the overall structure and flow of all your pipelines
Ease pipeline and workflow design
Facilitate collaboration across teams on shared DAGs
Drill into pipeline and job details for easier debugging
Notebooks — Pachyderm’s JupyterLab Mount Extension provides a point-and-click interface to Pachyderm versioned data
Accelerate experimentation with easy and intuitive access to versioned data
Mount any Pachyderm data repository locally for convenient access
Work with versioned data as if it were on your own file system; no Pachyderm knowledge required
Explore data with a built-in file browser
Collaborate across teams with a single source of truth for your data
Enterprise Administration — Pachyderm provides robust tools for deploying and administering Pachyderm at scale across different teams in your organization
Helm 3 provides standards-based deployment on any public or private cloud
Enterprise Server provides centralized licensing and administration for all Pachyderm clusters and workspaces
Use any identity provider with Pachyderm’s pluggable authentication
Role-Based Access Control (RBAC) allows fine-grained control over access to clusters and data