Fixing Fragmentation – The Journey To Cleaner Machine Learning Pipelines

2022-10-17 | By Barton Zhang

[Photo: Workers construct pipe sections in the production hall at the Nord Stream 2 facility at Mukran on Ruegen Island on October 19, 2017 in Sassnitz, Germany. Photo by Carsten Koall/Getty Images]

Fragmentation kills software. Cracks can appear in established blocks of enterprise software, whole applications or entire software suites. Equally, fissures and disconnects occur in smaller software components or services. When fragmentation occurs, pipes leak.

Those of us who worry about software fragmentation have long made friends with the engineers who work along the code pipeline. Ours is something of a Sisyphean task (once one crack is fixed, another inevitably appears), and the rise of Artificial Intelligence (AI) and Machine Learning (ML) has given us a new reason to be concerned about holes in our data DNA threads.

Fragmented ML means poor AI, less-than-smart intelligence and dumbed-down applications.

In the fight against fragmentation we come across a popular industry term. Technology companies use the label end-to-end too cheaply, slapping it on every product, app and toolset to denote some notion of robust scalability. What end-to-end really means is software code and data structures that start when they are architected at one end, work through the toolchain they need to execute inside… and then deliver to the machine (and usually human) endpoint they were created to serve.

In real terms, solid end-to-end systems are the opposite of fragmented ones.

This all brings us to ClearML, an open source company that offers an MLOps platform designed to help data science, MLOps and DevOps teams develop, orchestrate and automate ML workflows at scale. It is designed as an end-to-end MLOps suite allowing users and customers to focus on developing their ML code and automation, ensuring their work is reproducible and scalable.

To clarify the term, MLOps is not Machine Learning applied to Ops (operations teams, database administrators and so on); it is operations for ML, ensuring the ML team can execute, manage, monitor, audit and analyze the entire ML process from a single, fully integrated platform. In ClearML's case, that hook-in takes just two lines of code. Paradoxically, MLOps could ultimately become Machine Learning applied to Ops once operations teams use ML tools themselves, but in the first instance it is all about getting ML right and avoiding fragmented technology, regardless of where it is used.
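For the curious, those two lines are the ClearML Python SDK's standard bootstrap: an import plus a Task.init() call that attaches experiment capture to an existing script. A minimal sketch follows; the project and task names are illustrative placeholders, not values from the article.

    # Minimal sketch, assuming the pip-installable `clearml` package is
    # installed and configured; names below are placeholders.
    from clearml import Task

    task = Task.init(project_name="demo-project", task_name="baseline-run")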

ClearML started life in a select deployment group on an invite-only basis. The company has now made its technology generally available and envisages application use cases across industries such as healthcare, healthtech, retailtech, adtech, martech and manufacturing.

"ClearML is proud to be the only unified, end-to-end, frictionless MLOps platform supporting enterprises," said Moses Guttmann, CEO and co-founder of ClearML. "In a category dominated by closed point solutions and fragmented semi-platforms, ClearML delivers an open-sourced, comprehensive offering that enables companies to scale their MLOps while successfully bridging the innovation and revenue gaps with our unified end-to-end platform."

Key features include ClearML Experiment, a tool that allows data scientists to track every part of the ML experimentation process and automate tasks. With it, users can log, share and version all experiments and instantly orchestrate pipelines. With ClearML Orchestrate, DevOps engineers and data scientists get autonomy and control over compute resources. The cloud-native solution also enables Kubernetes and bare-metal resource scheduling through a simple, unified interface to control costs and workloads.
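To make the experiment-tracking side concrete, here is a hedged sketch using the ClearML SDK; the project name, hyperparameter values and the toy training loop are illustrative assumptions, not code taken from ClearML's documentation.

    # Illustrative sketch of ClearML experiment tracking; names and values
    # are placeholders, and the loop is a stand-in for real training code.
    from clearml import Task

    task = Task.init(project_name="demo-project", task_name="tracked-experiment")

    # Hyperparameters connected to the task are recorded with the experiment
    # and can be overridden when the run is cloned from the ClearML UI.
    params = {"learning_rate": 0.01, "epochs": 5}
    task.connect(params)

    logger = task.get_logger()
    for epoch in range(params["epochs"]):
        loss = 1.0 / (epoch + 1)  # stand-in for an actual training loss
        logger.report_scalar(title="loss", series="train", value=loss, iteration=epoch)

On the orchestration side, the documented pattern is that the same script can be queued for remote execution (for example via task.execute_remotely(queue_name="default")) and picked up by a clearml-agent daemon listening on that queue, which is how work gets scheduled onto Kubernetes or bare-metal resources.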

"Many machine learning projects fail because of closed-off, point tools that lead to an inability to collaborate and scale," said Guttmann. "Customers are forced to invest in multiple tools to accomplish their MLOps goals, creating a fragmented experience for data scientists and ML engineers. Through our offerings, customers experience the full potential and business impact of machine learning."

Every component of ClearML integrates with the others, the promise being cross-department visibility across research, development and production.

As we strive to build smarter software systems today and tomorrow, knowing a little more (scratch that, make it a lot more) about the integrity of the pipeline feeding our machines’ brains will be critical. Does anyone have a spanner?