Tools
The following sections introduce each of these tools in more detail:
📄️ OBP
Process flow
📄️ Pachyderm
If you are familiar with Git, a version control system for code, you will find many similarities between the core concepts of Git and Pachyderm. Version control systems such as Git, together with hosting services such as GitHub, have become an industry standard for developers worldwide. Git lets you keep a history of changes to your code and go back to an earlier state when needed. Data scientists deserve a platform that lets them track the versions of their experiments, reproduce results when needed, and investigate and correct bias that might creep into any stage of the data science life cycle. Pachyderm provides benefits similar to Git: it enables data scientists to reproduce their experiments and manage the complete life cycle of a data science workflow.
📄️ PySpark
📄️ River
River is a Python library for online machine learning. It is the result of a merger between creme and scikit-multiflow. River's ambition is to be the go-to library for doing machine learning on streaming data.
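To make "online" concrete, here is a minimal sketch in plain Python of the idea behind learning from streaming data: the state is updated one observation at a time and the full dataset is never held in memory. This is not River's API. The `RunningStats` class and the `stream` generator are hypothetical names made up for this illustration, and Welford's algorithm for a running mean and variance stands in for a real model.

```python
class RunningStats:
    """Tracks mean and sample variance incrementally (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        # Fold a single observation into the running state.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self):
        # Sample variance; defined as 0.0 until two observations have arrived.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def stream():
    # Hypothetical data source: yields values one by one, like a live feed.
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for x in stream():
    stats.update(x)  # each value is seen exactly once, then discarded

print(stats.n, stats.mean, stats.variance)
```

The same pattern, updating on one sample and then moving on, is what distinguishes online learning from batch training, where the whole dataset must be available up front.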