Pachyderm

Pachyderm is the leader in data versioning and pipelines for MLOps. We provide the data foundation that allows data science teams to automate and scale their machine learning lifecycle while guaranteeing reproducibility. With investment from Benchmark, Microsoft M12, and others, Pachyderm, Inc. offers a commercial Pachyderm Enterprise Edition and an open-source Pachyderm Community Edition. Pachyderm helps customers get their ML and AI projects to market faster, lower data processing and storage costs, and support strict data governance requirements.

Key Features

Automated Data Versioning — Pachyderm’s Data Versioning gives teams an automated and performant way to keep track of all data changes

Utilizes a Git-like structure that enables effective team collaboration through commits, branches and rollbacks
Powerful content-based deduplication reduces the cost of storing and accessing large data sets
File-based versioning provides a complete audit trail for all data and artifacts across pipeline stages including intermediate results
Data is stored as native objects (not metadata pointers), so versioning is automated and guaranteed
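
To make the ideas of commits and content-based deduplication concrete, here is a minimal Python sketch of a content-addressed store. It is a toy illustration under stated assumptions, not Pachyderm's actual storage layer: the class `ToyVersionedStore` and its methods are hypothetical names, and SHA-256 is simply a convenient choice of content hash.

```python
import hashlib


class ToyVersionedStore:
    """Toy content-addressed store; NOT Pachyderm's actual storage layer."""

    def __init__(self):
        self.objects = {}   # content hash -> bytes, kept once per unique content
        self.commits = []   # each commit is a snapshot: {path: content hash}

    def commit(self, files):
        """Record a snapshot of `files` ({path: bytes}) and return its index."""
        snapshot = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Identical content hashes to the same key, so it is stored only once.
            self.objects.setdefault(digest, data)
            snapshot[path] = digest
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def read(self, commit_id, path):
        """Return the bytes of `path` as they were at `commit_id`."""
        return self.objects[self.commits[commit_id][path]]


store = ToyVersionedStore()
first = store.commit({"a.csv": b"1,2,3", "b.csv": b"1,2,3"})
second = store.commit({"a.csv": b"1,2,3", "b.csv": b"4,5,6"})

print(len(store.objects))            # 2 unique contents across both commits
print(store.read(first, "b.csv"))    # b'1,2,3' (the earlier version is still readable)
print(store.read(second, "b.csv"))   # b'4,5,6'
```

Four file entries across two commits occupy only two stored objects, and reading from the earlier commit acts as a simple rollback.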

Data-Driven Pipelines — Pachyderm’s Containerized Pipelines speed data processing while lowering compute costs

Kubernetes-native approach supports any library or language
Autoscale with parallel processing of data without writing additional code
Automated pipelines execute whenever new data is committed
Incremental processing saves compute by processing only the differences and automatically skipping duplicate data
Pipeline steps have inputs and outputs defined in JSON or YAML, which eases debugging
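
The incremental processing idea above can be sketched in a few lines of Python. This is an illustrative assumption of how skipping unchanged or duplicate data might work in general, not Pachyderm's implementation: `run_incrementally`, the `cache` dictionary, and the `count_bytes` transform are all hypothetical names introduced for the example.

```python
import hashlib


def run_incrementally(files, transform, cache):
    """Apply `transform` only to file contents that have not been seen before.

    files:     {path: bytes} for the current input commit
    transform: function bytes -> result (hypothetical per-file processing step)
    cache:     {content hash: result}, reused across runs
    Returns ({path: result}, list of paths that were actually processed).
    """
    results, processed = {}, []
    for path, data in sorted(files.items()):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in cache:
            # New or changed content: do the work and remember the result.
            cache[digest] = transform(data)
            processed.append(path)
        results[path] = cache[digest]
    return results, processed


cache = {}
count_bytes = len  # stand-in for a real processing step

_, first_run = run_incrementally(
    {"a.txt": b"hello", "b.txt": b"world!"}, count_bytes, cache
)
results, second_run = run_incrementally(
    {"a.txt": b"hello", "b.txt": b"world!", "c.txt": b"new data"}, count_bytes, cache
)

print(first_run)   # ['a.txt', 'b.txt']
print(second_run)  # ['c.txt']
print(results)     # {'a.txt': 5, 'b.txt': 6, 'c.txt': 8}
```

On the second run only the newly committed file is processed; results for the unchanged files come from the cache.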

Immutable Data Lineage — Pachyderm’s Data Lineage provides an immutable record for all activities and assets in the ML lifecycle

Track every version of your code, models, and data
Maintain reproducibility of data and code for compliance
Manage relationships between historical data states
Pachyderm’s Global IDs make it easy for teams to track any result all the way back to its raw input, including all analysis, parameters, code, and intermediate results.
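
As a rough illustration of tracing a result back to its raw inputs, the following Python sketch walks a small provenance graph. The artifact names and the `provenance` mapping are invented for the example, and the functions `trace_lineage` and `raw_inputs` are hypothetical; this does not represent how Pachyderm's Global IDs are implemented.

```python
from collections import deque

# Hypothetical provenance records: each artifact lists what it was derived from.
provenance = {
    "raw_images": [],
    "labels": [],
    "features": ["raw_images"],
    "model": ["features", "labels"],
    "report": ["model"],
}


def trace_lineage(artifact, provenance):
    """Return every upstream artifact of `artifact`, nearest ancestors first."""
    seen, order = set(), []
    queue = deque(provenance[artifact])
    while queue:
        parent = queue.popleft()
        if parent in seen:
            continue
        seen.add(parent)
        order.append(parent)
        queue.extend(provenance[parent])
    return order


def raw_inputs(artifact, provenance):
    """Return the upstream artifacts that have no parents of their own."""
    return sorted(a for a in trace_lineage(artifact, provenance) if not provenance[a])


print(trace_lineage("report", provenance))  # ['model', 'features', 'labels', 'raw_images']
print(raw_inputs("report", provenance))     # ['labels', 'raw_images']
```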

Console — The Pachyderm Console provides an intuitive visualization of your DAG (directed acyclic graph) and aids in reproducibility

See the overall structure and flow of all your pipelines
Ease pipeline and workflow design
Facilitate collaboration across teams on shared DAGs
Drill into pipelines and job details for easy debugging

Notebooks — Pachyderm’s JupyterLab Mount Extension provides a point-and-click interface to Pachyderm versioned data

Accelerate experimentation with easy and intuitive access to versioned data
Mount any Pachyderm data repository locally for convenient access
Work with versioned data like it’s on your own file system. No Pachyderm knowledge required
Explore data with a built-in file browser
Collaborate across teams with a single source of truth for your data

Enterprise Administration — Pachyderm provides robust tools for deploying and administering Pachyderm at scale across different teams in your organization

Helm 3 provides robust and standards-based deployment on any public or private cloud
Enterprise Server provides easy centralized licensing and administration of all Pachyderm clusters / workspaces
Use any identity provider with Pachyderm’s pluggable authentication
Role-Based Access Control (RBAC) allows fine-grained control over access to clusters and data
