Last updated July 15, 2008

Grids Promise to Move Beyond Analytics

March 29, 2004 - Pete Johnson has seen grids come a long way over the past few years. They've been used for space exploration and to map the human genome. Biotech companies use them for designing drugs, for running experiments with massive amounts of data and for handling databases that organize complex images, not just sets of data points.

Johnson sits on IBM's external computing board, the part of IBM that developed Blue Gene, a DNA-processing supercomputer.

But the grids he's most interested in are those running Wall Street applications, because Johnson is also SVP for strategic technology at Pittsburgh-based Mellon Financial Corp.

Grids are large groups of small computers tied together by special software that lets them run compute-intensive applications. Grids work by chopping an application into small pieces and giving each piece to a different computer to run.

The technology is an inexpensive alternative to supercomputers, and has long been popular with universities and UFO hunters. But as the software for setting up grids matures, grids have started to be used for mission-critical Wall Street applications as well.

For example, portfolio analysis is well suited to grid computing, since it can easily be divided into small, independent problems that can be solved by individual computers.
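A minimal Python sketch of that scatter/gather pattern, using portfolio valuation as the workload. The `(quantity, price)` position format and the chunking scheme are illustrative assumptions, and a thread pool stands in for grid nodes: a real grid would ship each slice to a separate machine through its management software, so this shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical position record: (quantity, price). Real positions would carry
# instrument terms, market data and risk factors.
Position = tuple[float, float]


def value_chunk(chunk: list[Position]) -> float:
    """Value one independent slice of the portfolio (one 'small piece')."""
    return sum(qty * price for qty, price in chunk)


def split(items: list[Position], n_pieces: int) -> list[list[Position]]:
    """Chop the portfolio into at most n_pieces roughly equal slices."""
    size = max(1, -(-len(items) // n_pieces))  # ceiling division, never zero
    return [items[i:i + size] for i in range(0, len(items), size)]


def value_portfolio(positions: list[Position], n_nodes: int = 4) -> float:
    # Scatter: hand each slice to a different worker. Threads stand in for
    # grid nodes here; a real grid would send each slice to another machine.
    pieces = split(positions, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(value_chunk, pieces))
    # Gather: the slices are independent, so combining them is a plain sum.
    return sum(partials)


positions = [(100, 10.0), (50, 20.0), (10, 5.0)]
print(value_portfolio(positions, n_nodes=2))  # 2050.0
```

Because no slice needs results from any other, the only coordination is the final sum, which is what makes this kind of problem a natural fit for a grid.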

Some firms have also been using grids to optimize highly complex derivatives that need to be traded on a near-real-time basis, Johnson said.

"Companies like Morgan Stanley and others are known for this type of trading and they leverage grid computing," he said. "The challenge here isn't so much vast quantities of historical data, but whatever it is they're analyzing, they have to do it quickly."

Mellon itself is exploring using grids in asset management. Mellon has between $650 billion and $700 billion in assets under management, and has quite a bit of expertise in quantitative analysis, said Johnson.

Mellon has been doing some laboratory testing of grids recently, he said. "If we were able to apply massive amounts of horsepower, would it be able to yield better results?" he asked. "Would we be able to do more modeling or have longer time series or cover a broader set of securities? These are the questions we're asking ourselves."

Johnson is also keeping an eye on the software that's becoming available for grid computing. "The major database vendors are extending their products to take advantage of computer grids," he said. "Oracle is doing this, and that's going to enable database performance that will hopefully be much, much better than you can get today."

This is of particular interest at Mellon, where most of the back-office transaction applications run on Oracle databases.

"As Oracle begins to embrace this technology, is this a bigger and faster way of processing massive amounts of transactions?" he asked.

In fact, Oracle recently addressed this very question with a test of Oracle Database 10g on a cluster of 16 Hewlett-Packard Intel-based servers under Linux. The system set a new world record of more than 1 million transactions per minute, the company said.
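For a rough sense of scale, and assuming for illustration only that the load was spread evenly across the 16 servers (the article does not say how it was distributed), the reported figure works out to at least:

```latex
\frac{10^{6}\ \text{transactions/min}}{16\ \text{servers}}
  = 62{,}500\ \text{transactions/min per server}
  \approx 1{,}042\ \text{transactions/s per server},
\qquad
\frac{10^{6}\ \text{transactions/min}}{60\ \text{s/min}} \approx 16{,}667\ \text{transactions/s in total}
```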

Oracle Database 10g, introduced last fall, was specifically designed for grid computing.

"Grid computing represents a significant new technology direction for the IT industry," said Charles Rozwat, Oracle's EVP of server technologies.

IDC analyst Carl Olofson called it a great stride forward, in particular noting the product's self-managing features.

Grids are also likely to affect software vendors in other ways, he added: licensing issues will either force vendors to come up with new pricing models or push more customers toward open-source software.

Open-source software is already used for much of grid computing, because grids were originally developed to be cheap and flexible, which are major strengths of the Linux operating system. If the open-source community stays ahead of proprietary vendors, it could have an impact on the development of grid technology, Johnson said.

To the extent that grids are built around Linux and other open-source software, licensing is not an issue. If users combine their own, in-house analytical software and open-source tools, then they don't have to worry about the licensing fees racking up when they distribute the applications to hundreds or thousands of machines.

But Johnson added that some proprietary software vendors have already begun to address the licensing problem.

"Oracle, for instance, has come up with a licensing scheme where you can have a large array running on virtual machines," he said. "Oracle is definitely seeing this as a real opportunity for them to combat the growth of Microsoft's SQL Server."

PC or Not PC?
The computers that make up a grid can be dedicated, inexpensive Linux servers stacked in blade racks. Grids can also be made up of individual employee desktops: even for the most demanding user, a typical desktop's central processing unit (CPU) spends most of its time idle.

One company that has done both is London-based Abbey National Plc.

"One of the divisions we have supports the market risk associated with derivatives trading, and the computation associated with that risk is highly intensive," said John Hasson, Abbey's relationship director for central services. "The nature of our trading and our desire to run more and more complex risk models meant that we were consuming a lot of processor time."

Abbey tested a grid of five laboratory PCs running DataSynapse management software and immediately saw a reduction in run time. The test was extended to 10 PCs, then 20, and then the system was put into production.

At first, the application ran on dedicated computers in a separate room.

"I can't just put grid computing out on every single desktop in the organization because, potentially, I could have a catastrophic event and lose it," he said.

Setting aside dedicated machines didn't maximize the possibilities of grid, he admitted. "Every processing cycle that goes by unconsumed is a processing cycle lost to me," he said. "But what I was doing was reducing the operational risk associated with support, which was becoming onerous to me."

Later, a second application, for pricing, was implemented on the grid; it spilled out beyond the base set of computers to employee desktops, where it runs in the background.

"The reason for having a tightly defined number of computers [on the grid] is that I need to be in a place where I know I can get work done," he said. "There are some tasks that I want to get done, and I can't rely on spare capacity to get it done. But I've got other tasks where I have reasonable expectations that there is enough spare capacity."

That's where the grid was extended to include other PCs available in the firm.

"The benefit to me was that I could extract idle CPU time without having any adverse impact on the users of the computers concerned," he said. "They effectively wouldn't know that I was using it. And it meant that I didn't have to continue to buy processing power for my computer room; I could use the processing power that currently exists. And I could move into different applications."

For example, the grid cut the run time of a pricing model from 10 minutes on a single PC to between 12 and 20 seconds.

"The means that there's far more responsiveness from the trader," Hasson said. "It was far easier to get to a decision point on price."

Now that Abbey has proven the grid concept for computation-intensive applications, the company is thinking of moving into high-volume transaction processing, Hasson said. The company will also continue to use grids for more compute-centric applications, such as actuarial applications in the life business that run large-scale econometric models.

The risk modeling and pricing applications have both used the same grid, but the actuarial application will run on a different grid because it's in a different geographical location, he said. Transaction processing will also be on a completely new grid, specially architected to handle transactions, Hasson said, adding that the evaluation process will begin in the next month, with something rolled out into production in the third quarter.

DataSynapse is getting ready for customers that want to branch out from analytics.

"We've announced a grid server that handles multiple types of applications, compute type, and data types," said Frank Cicio, chief marketing and strategy officer at DataSynapse.

One key area is transactional optimization, he added. "We're now getting down to not just the multi-hour [jobs], from 20 hours to 20 minutes, but we're also taking computations that are being done in a minute or less and getting them done in sub-second time," he said. "You'll find that embedded in transactional applications."

DataSynapse customers like Wachovia have already begun using grids in the area of retail transaction processing, he said. Other applications include business intelligence, anti-money laundering and compliance.

"We're currently working with Veritas to automate compliance issues for a top-three investment bank," he said. "Certain information is required, especially from the risk side, so that they can reduce their risk exposure. To have that kind of work done within the time frame that's required requires an extraordinary amount of computing resources."

Grids and Databases
Germany's Landesbank Baden-Württemberg originally decided to deploy grids to save money when running Monte Carlo simulations for risk management.

"If you have an equity or interest rates, you look in the future and you simulate if the interest rate goes up or the interest rate goes down," said Peter Oellers, the bank's head of portfolio management. "You run such scenarios a million times and, for each scenario, you reevaluate your portfolio. If the portfolio is very big, and you need 1 million valuations, then it takes a lot of time to do each portfolio for each scenario."

Using grid tools from Platform Computing, Oellers set up a 20-computer grid using normal Windows PCs, in combination with applications that the bank developed itself.

Now the bank can measure risk several times during the day, instead of once at the end of the day, as before.

The grid is also used to generate cash flows to calculate, say, the income from future bond payments or even retail loans. That application requires a database, whereas the Monte Carlo simulations run essentially unmodified on the grid.
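A sketch of the cash-flow generation step, assuming a fixed-coupon bullet bond paying semiannually and a level-payment retail loan paying monthly; the `CashFlow` record and both functions are hypothetical illustrations. Each row produced here is the kind of output that has to be written to a database, which is what makes this application harder to distribute than the Monte Carlo runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CashFlow:
    period: int       # payment number, starting at 1
    interest: float   # interest (or coupon) portion of the payment
    principal: float  # principal repaid in this payment


def bond_cash_flows(face: float, annual_coupon_rate: float, years: int, freq: int = 2) -> list[CashFlow]:
    """Fixed-coupon bullet bond: equal coupons, full face value repaid at maturity."""
    n = years * freq
    coupon = face * annual_coupon_rate / freq
    return [CashFlow(k, coupon, face if k == n else 0.0) for k in range(1, n + 1)]


def loan_cash_flows(principal: float, annual_rate: float, months: int) -> list[CashFlow]:
    """Level-payment retail loan: each monthly payment splits into interest and principal."""
    r = annual_rate / 12
    # Standard annuity payment; fall back to straight-line repayment at a zero rate.
    payment = principal / months if r == 0 else principal * r / (1 - (1 + r) ** -months)
    balance = principal
    flows = []
    for k in range(1, months + 1):
        interest = balance * r
        repaid = payment - interest
        balance -= repaid
        flows.append(CashFlow(k, interest, repaid))
    return flows
```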

So the bank built a distributed database system, Oellers said. It was a tricky problem.

"If you have 100 notes in the grid, and every note is writing to the database table, you have 100 computers writing massively in the database and there could be problems with the database to get all the figures in in a short time," he said.

To solve this problem, Oellers said, he built a database cluster with two database servers working in parallel, using Microsoft SQL Server. Once the database was distributed, it was able to keep up with the distributed application, he said.
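A hedged sketch of one way to spread that write load across two database servers: each grid node is pinned to one server, and rows are buffered and flushed in batches rather than written one at a time. The article does not describe how the bank's cluster divided the work, so the modulo partitioning, the batch size and the `BatchedRouter` class are all assumptions; the `sent` lists stand in for bulk inserts against SQL Server.

```python
from collections import defaultdict


class BatchedRouter:
    """Hypothetical write router: results from grid nodes are spread across
    database servers and flushed in batches instead of row by row."""

    def __init__(self, n_servers: int = 2, batch_size: int = 500):
        self.n_servers = n_servers
        self.batch_size = batch_size
        self.buffers = defaultdict(list)  # server index -> rows waiting to be written
        self.sent = defaultdict(list)     # server index -> flushed batches (stand-in for bulk INSERTs)

    def server_for(self, node_id: int) -> int:
        # Static partitioning: each grid node always writes to the same server.
        return node_id % self.n_servers

    def write(self, node_id: int, row: tuple) -> None:
        s = self.server_for(node_id)
        self.buffers[s].append(row)
        if len(self.buffers[s]) >= self.batch_size:
            self.flush(s)

    def flush(self, server=None) -> None:
        # Flush one server's buffer, or every buffer when no server is given.
        servers = [server] if server is not None else list(self.buffers)
        for s in servers:
            if self.buffers[s]:
                # In a real system this would be one bulk insert on server s.
                self.sent[s].append(self.buffers[s])
                self.buffers[s] = []
```

With 100 nodes each writing three rows and a batch size of 10, the two servers each receive 15 batches instead of 150 single-row writes.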

Maria Trombly can be reached at 011-86-21-6387-7243 or by email at maria@trombly.com