
The Stellus Data Platform Speeds the Cryo-EM Pipeline

A revolution in high-speed microscopy is paving the way for new drug delivery mechanisms, using custom therapies aimed at sub-molecular targets. Powered by the latest cryogenic electron microscopy (Cryo-EM) tools, these techniques give clinicians a new arsenal of microscopic weapons to fight the deadliest diseases. To get there, though, research organizations must transform millions of high-resolution Cryo-EM images into useful 3D models. To date, that process has taken months, sometimes a year or longer, but that time can be reduced dramatically.

Stellus is breaking those traditional speed limits for life sciences computing with the Stellus Data Platform (SDP) file system. Explicitly built to meet modern data storage needs using the fastest native flash storage, SDP enables organizations to dramatically accelerate the Cryo-EM data pipeline. Imagine the possibilities when researchers can process many more workloads in far less time. Their research advances can increase exponentially, and they can take concrete steps to enable the personalized medicine therapies of the future.

Powering the Resolution Revolution

Cryo-EM samples encompass thousands of high-resolution (4K) two-dimensional images, which must be transformed into 3D models and motion clips. At each stage in this process, systems work with large image and video files, performing computationally intensive tasks like blur removal, motion correction, 3D image classification, and more. Even running in an HPC environment 24/7, processing a single sample can take three months, and sometimes this is an iterative process.

Unsurprisingly, laboratories face long backlogs and high costs. A typical detector and camera system can produce 4TB of unstructured data every day and cost $10 million annually to operate—even when sitting idle. Organizations use a variety of techniques (specialized RELION systems, multi-GPU and FPGA approaches, etc.) to accelerate the computational workflows. Since so many stages are read/write-intensive, however, these accelerators end up shifting the bottleneck to the storage systems feeding data to the CPU complex. Effectively, they change Cryo-EM workloads from being CPU-bound to being I/O-bound.
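The shift from CPU-bound to I/O-bound can be illustrated with a small back-of-the-envelope model. This is only a sketch: it assumes compute and I/O do not overlap within a stage, and the 90-second and 10-second figures are hypothetical, not measurements from any Cryo-EM pipeline.

```python
def stage_time(compute_s: float, io_s: float, compute_speedup: float) -> float:
    """Wall-clock time of one stage when only the compute part is accelerated.

    Assumes compute and I/O run back to back (no overlap).
    """
    return compute_s / compute_speedup + io_s


def io_fraction(compute_s: float, io_s: float, compute_speedup: float) -> float:
    """Share of the stage's wall-clock time that is spent waiting on storage."""
    return io_s / stage_time(compute_s, io_s, compute_speedup)


# Hypothetical stage: 90 s of compute and 10 s of I/O before acceleration.
for speedup in (1, 10, 100):
    total = stage_time(90.0, 10.0, speedup)
    share = io_fraction(90.0, 10.0, speedup)
    print(f"{speedup:>4}x compute: total {total:6.1f} s, I/O share {share:.0%}")
```

Under these assumed numbers, the I/O time never changes, so each additional increment of compute acceleration buys less and storage ends up dominating the stage.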

Legacy Storage Architectures Can’t Keep Up

The barrier to faster performance is antiquated hardware and legacy file systems. Modern components like NVMe interconnects and NAND flash media are capable of order-of-magnitude I/O improvements, yet the decades-old software and file systems run by even the newest storage systems still cannot exploit them. These aging architectures waste significant resources on processes like the following:

• Converting data between file and block I/O, which gets more resource-intensive as data grows
• Maintaining global data maps at scale as the number of files grows exponentially
• Ensuring global cache coherence across multiple nodes in a large cluster

SOLUTION BRIEF: CRYO-EM Break the Backlog for High-Speed Microscopy

Life Sciences


These processes were useful when storage primarily meant working with HDDs and block I/O on structured data. They’re irrelevant for the transformation and manipulation of unstructured image data in Cryo-EM workloads, diverting resources that could be used to service I/O requests.

The New Standard in Life Sciences Performance: Stellus Data Platform

Stellus created the SDP to address the problems of legacy storage architectures. With a new software stack built from the ground up to exploit the latest compute, network, and memory infrastructures, the platform sets a new standard in I/O performance for life sciences applications and unstructured genomics data.

The Stellus SDP replaces block stores, data maps, and data caches with high-performance Key-Value Stores, Key-Value-over-NVMe, and algorithmic data placement. At the same time, it provides file access across standard protocols like NFS and SMB, as well as newer object storage access methods like S3, all in a scalable, enterprise-ready storage system.
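To make the idea of algorithmic data placement concrete, the sketch below computes a key's location with a hash function instead of looking it up in a global data map. It uses rendezvous (highest-random-weight) hashing as a stand-in technique; the brief does not describe the SDP's actual placement algorithm, and the node names and file names here are hypothetical.

```python
import hashlib


def place(key: str, nodes: list[str]) -> str:
    """Return the node that owns `key`, computed rather than looked up.

    Rendezvous hashing: every node gets a pseudo-random score for the key,
    and the highest score wins. No central map has to be stored or kept
    coherent across the cluster.
    """
    def score(node: str) -> int:
        digest = hashlib.sha256(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)


# Hypothetical key-value store nodes and object keys.
nodes = ["kvs-0", "kvs-1", "kvs-2", "kvs-3"]
keys = [f"micrograph-{i:05d}.tiff" for i in range(1000)]

before = {k: place(k, nodes) for k in keys}
after = {k: place(k, [n for n in nodes if n != "kvs-3"]) for k in keys}

# Only keys that lived on the removed node are relocated.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after removing kvs-3")
```

Because any client can recompute the owner of a key from the key and the node list alone, this style of placement avoids the per-file map maintenance that the legacy architectures described above have to perform.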

The Stellus SDP delivers these key benefits:

• Composable platform flexibility: In most life sciences organizations, HPC/multi-GPU clusters are a shared resource supporting a range of applications, with requirements that change all the time. Today, if organizations want to add more capacity or performance, they have to buy a new node, paying for more compute, network, and storage even if they need to increase only one of those dimensions. With the SDP, organizations can add performance (throughput) and capacity independently to cost-effectively scale as requirements evolve. Increase throughput by adding new Data Managers (DMs). Add capacity by adding to the Key-Value Store (KVS) layer. As labs scale up to processing multiple terabytes’ worth of samples per day, that flexibility will be essential to keeping costs predictable and compute investments aligned with actual needs.

• Software-defined storage: To keep budgets under control, organizations need to get maximum use out of their IT resources. That means augmenting rather than replacing storage hardware whenever possible. The smartest way to do that is with software-defined storage, where most system intelligence resides in software, which can be changed with relative ease, rather than locked within the hardware. Unfortunately, most storage vendors still rely on legacy hardware-based models to deliver high performance, often requiring forklift overhauls to take advantage of new capabilities. The Stellus SDP is a software-based file system, independent of any particular hardware stack, that can deliver performance in cloud, core, and edge environments.

• User-mode file system: Life sciences organizations have many more options today to achieve high-performance storage. To deliver it, though, most solutions require custom client software, specialized controllers, or Linux kernel customizations. Those strategies can work on smaller scales, but in large HPC environments with hundreds of machines and thousands of cores, they just can’t scale and quickly become a nightmare to maintain. The Stellus SDP runs as strictly user-mode software on top of standard Linux, with no kernel hacks, special client software, or custom controllers required.
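The composable-scaling benefit in the list above can be sketched as a simple sizing calculation. All unit sizes and requirements below are hypothetical and chosen only for illustration; they are not Stellus specifications, and `coupled_nodes` and `independent_units` are illustrative helper names.

```python
import math


def coupled_nodes(need_gbps: float, need_tb: float,
                  node_gbps: float, node_tb: float) -> int:
    """Nodes to buy when every node bundles throughput and capacity together."""
    return max(math.ceil(need_gbps / node_gbps), math.ceil(need_tb / node_tb))


def independent_units(need_gbps: float, need_tb: float,
                      dm_gbps: float, kvs_tb: float) -> tuple[int, int]:
    """(data managers, KVS units) when throughput and capacity scale separately."""
    return math.ceil(need_gbps / dm_gbps), math.ceil(need_tb / kvs_tb)


# Hypothetical requirement: 20 GB/s of throughput and 2,000 TB of capacity.
# Hypothetical building blocks: 5 GB/s per node or DM, 100 TB per node or KVS unit.
nodes = coupled_nodes(20, 2000, node_gbps=5, node_tb=100)
dms, kvs = independent_units(20, 2000, dm_gbps=5, kvs_tb=100)

print(f"coupled: {nodes} nodes, {nodes * 5} GB/s purchased for a 20 GB/s need")
print(f"independent: {dms} data managers + {kvs} KVS units")
```

In this assumed scenario the capacity requirement drives the coupled purchase, so the buyer also pays for throughput that is never needed. Scaling the two dimensions separately sizes each to its own requirement.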


Turbocharge the Data Pipeline for Cryo-EM Applications

It’s time for laboratories to break the backlog for Cryo-EM analysis. By eliminating the artifacts of yesterday’s storage technologies from today’s life sciences computing, Stellus is making it possible. Even processing the highest-resolution images and 3D animations, the Stellus Data Platform can accelerate data storage, transformation, and analysis across every stage of the Cryo-EM data pipeline.

Armed with these capabilities, researchers can apply advanced genomic analysis and the latest AI and machine learning techniques to develop a broader range of novel molecular delivery medicines, more quickly, for more patients. They can unleash a new generation of precision therapies specifically tuned to attack disease in the most effective way for each individual.

Solution Brief: CRYO-EM | Version 1

Stellus Technologies 3833 North First Street San Jose, CA 95134

www.stellus.com

©2020 Stellus Technologies. Stellus Technologies is a leading data systems company that delivers high-performance Key-Value Store technology to solve fast-growing unstructured data challenges in cloud, core, and edge infrastructures.

Stellus Data Platform for the Life Sciences Workflow