Posted by in Solutions

We just reduced image propagation time on FutureGrid’s Sierra cloud at UCSD from hours to minutes! This magic comes courtesy of Nimbus LANTorrent.

We blogged about LANTorrent before: it can distribute the same file among many nodes using peer-to-peer techniques. It has been available in Nimbus since version 2.6 and allows users to efficiently deploy a cluster of virtual machines based on the same image. Installing and configuring LANTorrent on the Nimbus nodes (both service and hypervisor nodes) is easy; it took only a couple of hours on Sierra (all the details are explained in the LANTorrent Configuration section of the Nimbus documentation).

Granted, Sierra’s configuration helped make this speedup spectacular. Sierra is backed by an NFS server connected to the cluster via Gigabit Ethernet. To transfer a virtual machine to a hypervisor node, a copy of the virtual machine image is made using scp. While copying a single virtual machine image can be done in less than one minute, deploying a large virtual cluster takes much more time because all file transfers originate from the centralized NFS server. With LANTorrent, however, it is dramatically faster! Instead of forcing 100 copies of the same file through the NFS server’s single NIC, LANTorrent uses the collective power of every receiving node’s NIC to transmit the data, bringing us approximately a 3x speedup! I made a graph that compares deployment time of a 4 GB virtual machine image (SCP in red, LANTorrent in blue). Faster is not all – the growth rate is where we really reap the benefits!
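To see why centralizing the transfers hurts so much, here is a back-of-the-envelope sketch in Python. The node count, image size, and link speed are illustrative assumptions, and the pipelined bound ignores protocol and disk overhead – which is why the measured speedup on Sierra is about 3x rather than the theoretical maximum:

```python
# Back-of-the-envelope comparison of centralized scp vs. pipelined
# peer-to-peer distribution. All numbers are illustrative assumptions,
# not measurements from Sierra.

GBIT = 1e9 / 8          # bytes per second on a 1 Gb/s link
image_bytes = 4e9       # 4 GB virtual machine image
nodes = 100             # size of the virtual cluster

# Centralized: every copy leaves through the server's single NIC,
# so the transfers serialize on that one link.
scp_seconds = nodes * image_bytes / GBIT

# Peer-to-peer pipeline (LANTorrent-style): each node forwards data as
# it receives it, so total time approaches a single transfer regardless
# of cluster size (ignoring overhead).
p2p_seconds = image_bytes / GBIT

print(f"centralized: {scp_seconds / 60:.0f} min")  # ~53 min
print(f"pipelined:   {p2p_seconds / 60:.1f} min")  # ~0.5 min
```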

LANTorrent performance on Sierra

Posted by in General

A few weeks ago (May 23rd-26th) I traveled to sunny Newport Beach, California to present a paper, Improving Utilization of Infrastructure Clouds, at CCGrid 2011. Our paper addresses one of the main challenges faced by infrastructure cloud providers: ensuring that resources are utilized efficiently while still providing resources on-demand. To solve this catch-22, we deployed backfill VMs on idle VMM nodes. For evaluation, we deployed Condor in the backfill VMs and demonstrated an increase to 100% utilization of the infrastructure resources. All of the details are in the paper, so I won’t elaborate here. You can also try backfill for yourself with Nimbus 2.7.

While at the conference I also attended a number of excellent presentations. In the interest of brevity, I’ll highlight only a few of them here. The keynote on the second day, Maximizing Profit and Pricing in Cloud Environments, given by Albert Zomaya of the University of Sydney, discussed the challenges of pay-as-you-go cloud computing and the often conflicting objectives of cloud service providers (maximize profit) and cloud users (minimize expenses). Dr. Zomaya proposed numerous models and algorithms for profit-driven scheduling of resources. He also proposed an application profiling technique that detects application patterns (e.g. IO intensive, memory intensive, or CPU intensive) and then applies a prediction model to achieve optimal VM placement in a cloud infrastructure.

Another interesting discussion at CCGrid was a panel on autonomic cloud computing. The panel emphasized the difficulties of managing large-scale cloud infrastructures. Much of the panel focused on the need for tools and techniques to manage these resources in an automatic and efficient manner, where resources dynamically scale (up or down) to match demand. The Chief of Research from Rackspace, Adrian Otto, highlighted the efforts at Rackspace to address this challenge. They analyzed various characteristics (e.g. identifying distribution models to describe bursts in website activity) to predict cloud resource needs.

There were many other excellent presentations; however, I’ll leave it to you to read the papers from the conference (the full program and paper listing is available here). Next year CCGrid is heading to Ottawa, Canada – perhaps I’ll see you there!

Sunset dinner cruise in Newport Bay

Posted by in General

I had the privilege of presenting Cumulus: Open Source Storage Cloud for Science at the Science Cloud 2011 workshop yesterday. While my talk focused on our open source S3 implementation, ideal for extensibility and scientific experimentation, many other interesting topics were presented. Shane Canon presented a very interesting look at common misconceptions about the cloud in scientific circles. In it he exposed some truths about what ‘on demand’ ultimately means to a data center. He illustrated where on the hype curve the cloud currently sits, which features work for science, and what is missing. Elasticity for bursty applications is a clear win, but one glaring gap he cited is the lack of a shared file system. A shared file system is an assumed service for most scientific users coming from the grid and most other HPC platforms. This need for a shared file system struck a chord with me, and it seemed to be a common theme at the workshop. Lavanya Ramakrishnan gave a talk on Magellan: Experiences from a Science Cloud. In it she mentioned the struggles scientific users had with their applications inside of VMs. One was the difficulty of staging data into the VM’s space. A couple of other talks discussed the huge volumes of data created by scientific applications. All of this discussion made me wonder whether a cloud-agnostic shared file system service could be created, and whether such a thing could solve these problems.

The full program is available here.

Posted by in Applications

When we think of science we don’t immediately think of quality assurance and yet… scientists have to run their codes somewhere, they need hardware, the hardware needs software, and the software needs to be operated reliably and efficiently – enter the Quality Assurance (QA) team of TeraGrid: the most powerful open science resource.

Shava Smallen, the co-lead of the TeraGrid QA team, told me recently of their first venture into infrastructure clouds. The TeraGrid Science Gateway projects have been experiencing scalability problems with grid infrastructure. A potential solution arrived in the form of GRAM 5; the scientists developed scalability tests to see whether it solved their problem — but where could they run them? They tried Ranger, a top-of-the-line TeraGrid resource at the Texas Advanced Computing Center (TACC). But Ranger is a powerful resource, very much in demand for large scientific computations that cannot run elsewhere — and thus the QA team found itself with tests all ready to run but no resources to run them on.

Fortunately, the FutureGrid – which includes several IaaS clouds – had just become available. The QA team quickly put together a virtual cluster, similar to a typical TeraGrid configuration but using GRAM5, and deployed it on the FutureGrid Nimbus cloud at the University of Florida. They ran, they scaled, they reported… While they were at it, they tested GridFTP for scalability as well: running data transfers between two Nimbus clouds at Florida and San Diego.

For the QA team the ability to find resources on-demand – and find them without disrupting the production scientific runs – was the key motivator for turning to cloud computing. It is not hard to imagine many other use cases in TeraGrid with similar requirements. For example, users could use virtual clusters, configured to represent the environment deployed on Ranger or other TeraGrid resources, as a development or debugging platform. In addition to providing on-demand availability, such resources would also give them root access – often important for debugging, but not available on many TeraGrid resources. To leverage cloud computing for science we don’t have to wait to first decide if it will support all of the HPC applications: there are plenty of places where it can be useful now.

Posted by in Solutions

Cloud computing users think on-demand availability is the best thing since sliced bread: it enables elastic computing, supports outsourcing for applications requiring urgent or interactive response, and reduces wait times in batch queues. But if you are a cloud provider you might not think so… In order to ensure on-demand availability you need to overprovision: keep a lot of nodes idle so that they can be used to service an on-demand request, which could come at any time. This means low utilization. The only way to improve it is to keep fewer nodes idle. But this means rejecting more requests – at which point you’re not really on-demand… a veritable catch-22.

Low utilization is particularly hard to swallow in the scientific community where utilization rates are high – why configure a cloud if the good old batch scheduler will amortize your resource so much better?

This gave us an idea. What if we deployed a VM on every idle cloud node and joined it to something that is used to operating in an environment where resources are coming and going, something that makes use of screensaver time on people’s desktops, something like SETI@home or a Condor pool? We went to work and extended Nimbus so that an administrator can configure a “backfill VM” that gets deployed on cloud nodes by default. When an on-demand request comes in, enough backfill VMs get terminated to service the request; when the on-demand request finishes, the backfill VMs get deployed again.
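The lifecycle described above can be sketched in a few lines. This is an illustrative model of the backfill policy, with made-up names and structure – not the actual Nimbus implementation:

```python
# Minimal sketch of the backfill policy: idle nodes run a preconfigured
# "backfill" VM, incoming on-demand requests preempt just enough of
# them, and backfill VMs are redeployed when the request finishes.

class CloudPool:
    def __init__(self, nodes):
        self.nodes = nodes          # total hypervisor nodes
        self.on_demand = 0          # nodes serving on-demand VMs
        self.backfill = nodes       # every idle node starts a backfill VM

    def request_on_demand(self, n):
        if n > self.nodes - self.on_demand:
            raise RuntimeError("request exceeds cloud capacity")
        idle = self.nodes - self.on_demand - self.backfill
        self.backfill -= max(0, n - idle)   # terminate only as many as needed
        self.on_demand += n

    def release_on_demand(self, n):
        self.on_demand -= n
        self.backfill = self.nodes - self.on_demand  # redeploy backfill VMs

    @property
    def utilization(self):
        return (self.on_demand + self.backfill) / self.nodes

pool = CloudPool(nodes=8)
pool.request_on_demand(3)       # preempts 3 backfill VMs
assert pool.utilization == 1.0  # every node is still doing useful work
pool.release_on_demand(3)       # the backfill VMs come back
```

The point of the sketch is the invariant in the assert: with backfill, utilization stays at 100% whether or not on-demand requests are present.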

Cloud utilization: without backfill (top) and with backfill (bottom)

For the user, this solution means that they can now choose from two types of instances: on-demand instances — what you get from a typical EC2-style cloud — and opportunistic instances – a pre-configured VM joining a volunteer computing pool. To find out what this means for the provider, Paul Marshall ran some experiments with backfill VMs configured to join a Condor pool and came up with the graphs on the right: a purely on-demand cloud is cold – add backfill VMs and the system heats up… you can get 100% utilization! More details in Paul’s CCGrid paper.

But now we had more questions: who can configure the backfill VM: does it have to be the administrator or could it be the user? And: can we have more than one type of backfill VM? One simple refinement would be for the admin to use multiple VMs and have policies on what percentage of available cycles should be devoted to each. But if we are going to allow the user to submit such VMs, how is this percentage set? Somebody has already figured this out before: why not auction it off – and provide spot instances. Both the backfill VMs and spot pricing are special cases of a more general mechanism… that we released today in Nimbus 2.7.

Nimbus 2.7 contains both the backfill and spot pricing implementation — different configurations of roughly the same thing. This makes Nimbus the only EC2-compatible open source IaaS implementation with support for spot pricing. Backfill instances may be more relevant to scientific settings where the concept of payment is not explicit, and simulating it with auctions has been known to cause “inflation”. We hope it will allow providers to leverage their cloud cycles better. And we also hope that it will provide a flexible tool for all of you investigating and fine-tuning the relationships between various aspects of resource utilization, energy savings, cost, and pricing.

Best for last — this is probably the first time that content is not the most important feature of a Nimbus release. You’ve seen us mention the hard work of our open source community contributors before – but this is the first time that open source contributions are primarily responsible for a Nimbus release. The spot pricing implementation is the work of our brilliant Brazilian contributor Paulo Ricardo Motta Gomes (just one person despite appearances to the contrary… 😉), sponsored by the Google Summer of Code (GSoC) last summer. And incidentally, there may be new opportunities this year — watch the Nimbus news feed for details.

Posted by in General

I often get asked if there is any published work evaluating performance and cost of scientific applications on IaaS clouds and comparing them to clusters — and I always say LOTS! …and then can’t remember more than a few off the top of my head ;-). So I recently put together a list — included below — of the various evaluation and comparison efforts I’ve been able to find. They look at all sorts of aspects of performance — from low-level benchmarks to applications of various types, from reliability to cost. They all tend to focus on somewhat different aspects of the issue and collectively paint a picture of the blessings and challenges of cloud computing for science.

My personal favorite is “Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud” — on top of the list since, having just come out at CloudCom 2010 last December, it is the most recent. The authors evaluate the AWS IaaS offering based on the NERSC benchmarks framework — a comprehensive set of benchmarks capturing the typical workload in a scientific datacenter. They report not only the performance characteristics of scientific applications on virtual clusters created in the cloud but also note the mean time between failures (MTBF) of a virtual cluster deployed on cloud resources — the consequences of which I (coincidentally) blogged about around the time this paper was presented.

And finally, I have a favor to ask — if you know of papers evaluating various aspects of scientific applications on IaaS clouds, have favorites in the field, or have opinions on what you would like to see evaluated — please tell us about it. I will do a post on lessons learned.

Evaluation of IaaS clouds for scientific applications:

Posted by in Applications

Babar of the children’s books is a young elephant who comes to a big city and brings the benefits of civilization back to the other elephants in the jungle. He also happens to be a very apt mascot for a high-energy physics project.

The name BaBar actually derives from the B/B-bar subatomic particles produced at the SLAC collider in Stanford, California during electron-positron collisions. These experiments help us achieve a better understanding of the relationship between matter and anti-matter and ultimately answer questions about the nature of the universe. This groundbreaking research is moving forward at a brisk pace: the BaBar scientists have petabytes of data, a plethora of insight, and a recent Nobel prize to show for it.

Sifting through petabytes of data requires petaflops of computation. And here the BaBar scientists faced what is a common problem in science — the software required to process the data is complex: hard to maintain, impossible to rewrite. To illustrate: we are talking about roughly 9 million lines of C++ and Fortran code developed by hundreds of scientists spread across 75 institutions in 10 different countries… Such software requires significant effort to port to new platforms and operating systems. This, in practice, limits the scientists’ access to resources since it takes a lot of effort to port the code to, e.g., a new grid site.

And this is also where cloud computing comes in.  I was recently talking to Ian Gable from  the University of Victoria (UVIC) who told me of their efforts to put BaBar in the clouds. The UVIC team’s idea was to put BaBar software on virtual machine (VM) images that could be run on any Infrastructure-as-a-Service (IaaS) cloud. They configured a base image which could be customized to support specific BaBar applications – this smoothed the way for the scientists to try cloud computing because it took the time-consuming configuration step out of the equation.

Three clouds used by the BaBar experiment; figure provided by the UVIC team

Running BaBar in the clouds proved to be a great success: so much so that the demand soon outstripped the capabilities of the Nimbus cloud at UVIC (named, appropriately enough, “Elephant” ;-). Fortunately, the UVIC team was able to provision additional cloud resources from the Alto Nimbus cloud at the National Research Council in Ottawa and from EC2 East. But another problem then arose: how can you present resources provisioned over distributed clouds to the users in a uniform fashion? Rather than sending people from cloud to cloud shopping for cycles, the UVIC team automated the process. They did so via an infrastructure called the Cloud Scheduler – a combination of cloud provisioning and the Condor scheduler typically used to submit BaBar jobs. Cloud Scheduler works roughly like this: a user submits a job to a Condor queue (as always) and the Cloud Scheduler monitors the queue. If there are jobs in the queue, Cloud Scheduler stands up VMs on the various available clouds and makes them available to the Condor installation; if there are no jobs in the queue, the VMs are terminated. With the aid of this infrastructure, BaBar computations have been happily churning away on the distributed clouds, demonstrating how provisioning issues in distributed environments can be effectively overcome.
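The queue-watching loop at the heart of this pattern can be sketched as follows. The cloud names, capacities, and function signatures are placeholders for illustration, not the real Cloud Scheduler API:

```python
# Rough sketch of the Cloud Scheduler pattern: watch the Condor queue
# and grow or shrink the set of VMs across several clouds to match it.
# Cloud names and function signatures are illustrative placeholders.

CLOUDS = ["uvic-elephant", "nrc-alto", "ec2-east"]  # in order of preference

def reconcile(idle_jobs, running_vms, boot_vm, terminate_vm):
    """Boot one VM per idle job, preferring earlier clouds and spilling
    over when a cloud is full; tear everything down when the queue drains."""
    while idle_jobs > len(running_vms):
        for cloud in CLOUDS:
            vm = boot_vm(cloud)      # returns None when the cloud is full
            if vm is not None:
                running_vms.append(vm)
                break
        else:
            break                    # every cloud is full; stop trying
    if idle_jobs == 0:
        while running_vms:
            terminate_vm(running_vms.pop())
    return running_vms
```

In the real system the queue length comes from polling Condor and the boot/terminate callbacks talk to the individual cloud APIs; the sketch just captures the reconcile-to-demand design.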

The BaBar story highlights two trends that increasingly come up in the context of cloud computing for science. One is the need to provide appliance management for scientists who do not necessarily have the skills — or frankly — the interest in working on configuring and maintaining an operating system installation. This is something that previously came “bundled” with resource management: a system administrator would both keep the machines running and try to provide a software configuration reconciling the needs of different application groups. The latter became harder as more application groups coming from remote sites tried to use the resource since there were more potentially conflicting requirements to resolve. Infrastructure-as-a-Service cuts the system administrator’s job in half. The cloud of course still needs to be operated — but the software it is running is now application-independent. The application software still has to be configured — but this is now done by configuring community-specific appliances developed by community experts such as the UVIC team who stepped up as appliance providers for their community.

The second trend is using multiple clouds – what we at some point called “sky computing”. In provisioning resources for BaBar, the UVIC team managed to successfully combine two distributed community clouds and one commercial cloud, demonstrating that resources can indeed be provisioned from multiple sources with minimum inconvenience to the user. Furthermore, they provided a mechanism to do it based on need. This pattern — elastically provisioning over multiple clouds — is becoming increasingly important as communities explore using multiple resources. We used it in our work with the ALICE experiment at CERN and in the ElasticSite project, which elastically adds resources provisioned in the cloud to a local scheduler, and we are implementing it in the infrastructure for the Ocean Observatories Initiative (OOI) – something to watch.

All in all an amazing effort by the UVIC team in making cloud computing easier to use for their community. When we think of elephants we tend to think of heavy, plodding animals that are hard to move… Not this one ;-).

Posted by in Applications

HPC in the Cloud posted a nice article describing how scientists from the Canadian CANFAR project are using cloud computing to deal with their data problem. Quoting from a recent white paper by Nicholas Ball and David Schade:

“in the past two decades, astronomy has gone from being starved for data to being flooded by it. This onslaught has now reached the stage where the exploitation of these data has become a named discipline in its own right”

This is yet another scientific discipline that is turning to cloud computing to deal with its expanding need for data analysis. The HPC in the Cloud article has a nice summary of the researchers’ needs and initial exploration in this space.

Posted by in News

Happy New Year!

To get it off to a good start, check out the call for papers for the ScienceCloud 2011 workshop – announced right before Christmas!

Last year’s Science Cloud workshop was a great venue for anybody interested in cloud computing for science. The program covered everything from scientific cloud platforms (and how to set them up), through standards and middleware, to case studies of scientific applications on commercial cloud platforms such as Amazon and Azure. The latter were perhaps the most interesting part of the workshop – and in fact one of them, a performance study of a cosmology application on Amazon from Lawrence Berkeley National Lab, won the best paper award. The slides and papers can be viewed online – still a great reference to see what’s happening in cloud computing for science. Some of the papers will also appear in the Scientific Programming Journal’s special issue on science-driven cloud computing.

This year’s Science Cloud workshop will again be collocated with HPDC and will solicit papers addressing similar issues. If you are using clouds for science and have a thing or two to say about it, this may be a good opportunity.

Posted by in General

Will the mountain come to Mohammad, or will Mohammad go to the mountain?

When we consider whether clouds can provide a suitable platform for high performance computing (HPC), we always talk about how cloud computing needs to evolve to suit the needs of HPC – in other words, will the mountain come to Mohammad. But there are signs that there may also be movement in the other direction – transforming HPC so that it works better in the cloud paradigm. Mohammad may have to go.

Discussions around this issue typically focus on performance: how the existing cloud hardware and software has to change. But those are not the only issues. I recently listened to a talk given by a colleague from the Joint Lab for Petascale Computing, Franck Cappello, who considered an often overlooked aspect of HPC – fault management. As it turns out, the way fault tolerance for HPC applications is handled is dramatically different from other applications and can have enormous influence over both its performance and the cost.

HPC applications are typically single program multiple data (SPMD) — tightly-coupled codes executing in lockstep and running on thousands or hundreds of thousands of processors. The assumption is that if just one node in the whole computation fails, the whole computation has to be repeated. To make such failures less catastrophic – potentially throwing out many weeks’ worth of computation — we use global restart based on checkpointing: application state is periodically saved, and when a failure occurs the application is restarted from the last checkpoint data. How often do we checkpoint? The answer depends most on a quantity called mean time between failures (MTBF) – if your checkpointing interval is greater than the MTBF, you’d have to be lucky for your computation to make much progress. As architectures evolved to support computations running on ever more nodes, the probability of at least one of those nodes failing during the computation increased, pushing the MTBF down. To compensate, MTBF became an increasingly important factor in the design of both HPC hardware and the software that executes on it.
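The trade-off between checkpoint cost and MTBF has a classic back-of-the-envelope answer: Young's approximation for the optimal checkpoint interval. The numbers below are illustrative, not taken from the talk:

```python
# Young's approximation for the optimal checkpoint interval:
#   T_opt ~ sqrt(2 * C * MTBF)
# where C is the time to write one checkpoint. The example numbers are
# illustrative assumptions.

import math

def optimal_interval(checkpoint_cost_s, mtbf_s):
    """Checkpoint interval that roughly minimizes expected lost work."""
    return math.sqrt(2 * checkpoint_cost_s * mtbf_s)

# A 10-minute checkpoint on a system with a 24-hour MTBF:
t = optimal_interval(600, 24 * 3600)
print(f"checkpoint every {t / 3600:.1f} hours")  # ~2.8 hours

# Halve the MTBF (e.g. a flakier virtual cluster) and the optimal
# interval shrinks, so a larger share of wall-clock time goes to
# writing checkpoints and re-executing lost work.
```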

Before we go on, let’s pause and reflect: when did we last even hear of the MTBF of a cloud? Or the MTBF of a virtual cluster deployed on that cloud, for that matter? Likely never, because so far these systems tend to support applications that are more loosely coupled, where the failure of one component does not affect all.

But here is the issue: global restart is expensive. You spend a lot of time saving state, and occasionally you also have to read it and redo part of your calculation. This affects both the overall time of your computation (when your code finishes in practice) and the cost of that computation. In fact, Franck and his colleagues estimate that global restart can range from 20% of the total HPC computation cost to as much as 50% in extreme cases – and will of course go up as the MTBF goes down. In other words, if the MTBF of a virtual cluster is low — as it is likely to be — HPC on a cloud will not only drag down the execution time but also be prohibitively expensive due to the more frequent need for restarts. These factors combined could easily keep HPC out of clouds no matter how good their benchmark results are.

But do we really need global restart if only one component fails? Franck and his colleagues investigated this question and found that in most cases we do not. They are now working on leveraging this finding: formulating protocols that log less data and restart fewer nodes thus significantly reducing the cost of providing fault tolerance for SPMD style applications. The MTBF of clouds, while still an important factor may not be a deal-breaker after all.

It seems that the pay-per-use model of cloud computing sent us all on a global efficiency drive. Before it emerged, optimizing qualities such as fault-tolerance and the resulting power usage and cost was largely a global concern driven by the resource owner. The individual users had little incentive to optimize the cost of their specific run. For this reason, progress happened largely on a global scope, e.g., by driving architecture evolution. Pay-per-use changes this point of view: it now becomes important to individual users to ensure that their run costs as little as possible. It is therefore likely that the next wave of progress will arise out of optimizing individual runs.

It will be fascinating to watch as Mohammad and the mountain maneuver around each other during the next few years ;-).

You can find more information about this and related issues on the Joint Lab publications page.