Docker for research reproducibility
March 09, 2016

Anonymous | TrustRadius Reviewer
Score 9 out of 10

Overall Satisfaction with Docker

As a research organization, we use and develop bioinformatics tools as our contribution to the research community. In particular, we address research reproducibility by wrapping our research pipelines in Docker containers. We currently use GitHub and DockerHub as public repositories for distributing them. Since our tools target a wide range of users (not necessarily tech savvy), a graphical user interface is essential. One of the challenges Docker users currently face is delivering a graphical user interface from a container to the user.
  • Lightweight and portable.
  • Easy to share (either as a Dockerfile or as a prebuilt image on DockerHub).
  • Same environment regardless of the user's operating system.
  • Docker is mainly a command line tool; delivering a graphical user interface out of a container is still a problem.
  • When Docker runs within a VM, as it does for Mac and Windows users, transferring files in and out of Docker is challenging.
  • Because Docker runs within a VM for Mac and Windows users, there is substantial overhead that needs careful consideration.
  • It increases the reproducibility in our research, as many more people are using the methods and tools we're developing.
  • As we develop more new tools, Docker has increased productivity in our research group.
Compared to virtualization, Docker is a more efficient approach for us to deliver tools and pipelines as containers. With less overhead, and with automated rebuilds of the Docker images on DockerHub whenever changes are made in the GitHub repository, deployment and testing are more convenient.
In our case, Docker is a great tool for shipping research pipelines to other scientists, as it minimizes the hassle of compilation and dependency issues. For data analytics pipelines, it is especially well suited to running in the cloud, where the data sits.
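
To give a rough idea of what this looks like for an end user, here is a minimal sketch of running a containerized pipeline against a local data directory. The image name `example-lab/pipeline:latest`, the `/data` mount point, and the `--input` argument are hypothetical placeholders and not our actual pipeline; the sketch only builds the `docker run` command, using the standard `--rm` and `-v` options, and prints it.

```python
import shlex
from pathlib import Path


def build_run_command(image, host_data_dir, container_data_dir="/data", pipeline_args=()):
    """Build the argument list for a one-off `docker run` with a bind mount.

    All names passed in are illustrative; nothing here is specific to a real image.
    """
    # Docker expects an absolute host path for a bind mount.
    host_path = Path(host_data_dir).resolve()
    return [
        "docker", "run",
        "--rm",                                     # remove the container when it exits
        "-v", f"{host_path}:{container_data_dir}",  # share host data with the container
        image,
        *pipeline_args,                             # passed through to the pipeline itself
    ]


if __name__ == "__main__":
    # Hypothetical image name and pipeline arguments, for illustration only.
    cmd = build_run_command(
        "example-lab/pipeline:latest",
        "./data",
        pipeline_args=["--input", "/data/sample.fastq"],
    )
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a machine with Docker installed. The bind mount is the piece that moves files in and out of the container, which is the step we found difficult when Docker itself runs inside a VM.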