Meta leaps into the supercomputer game with its AI Research SuperCluster

There’s a global competition to build the biggest, most powerful computers on the planet, and Meta (AKA Facebook) is about to jump into the melee with the “AI Research SuperCluster,” or RSC. Once fully operational, it may well rank among the 10 fastest supercomputers in the world, and Meta will use it for the massive number crunching needed for language and computer vision modeling.

Large AI models, of which OpenAI’s GPT-3 is probably the best known, don’t get put together on laptops and desktops; they’re the final product of weeks and months of sustained calculations by high-performance computing systems that dwarf even the most cutting-edge gaming rig. And the faster you can complete the training process for a model, the faster you can test it and produce a new and better one. When training times are measured in months, that really matters.

RSC is up and running, and the company’s researchers are already putting it to work… with user-generated data, it must be said, though Meta was careful to say that the data is encrypted until training time and that the whole facility is isolated from the wider internet.

The team that put RSC together is rightly proud of having pulled this off almost entirely remotely: supercomputers are surprisingly physical constructions, with base considerations like heat, cabling and interconnect affecting performance and design. Exabytes of storage sound big enough digitally, but they actually need to exist somewhere too, on site and accessible at a microsecond’s notice. (Pure Storage is also proud of the setup they put together for this.)

RSC is currently 760 Nvidia DGX A100 systems with a total 6,080 GPUs, which Meta claims should put it approximately in competition with Perlmutter at Lawrence Berkeley National Lab. That’s the fifth most powerful supercomputer in operation right now, according to longtime ranking site Top 500. (No. 1 is Fugaku in Japan by a long shot, in case you’re wondering.)

That could change as the company continues building out the system. Ultimately they plan for it to be about three times more powerful, which would in theory put it in the running for third place.

There’s arguably a caveat in there. Systems like second-place Summit at Oak Ridge National Laboratory are employed for research purposes, where precision is at a premium. If you’re simulating the molecules in a region of the Earth’s atmosphere at unprecedented levels of detail, you need to take every calculation out to a whole lot of decimal places. And that means those calculations are more computationally expensive.

Meta explained that AI applications don’t require a similar degree of precision, since the results don’t hinge on that thousandth of a percent. Inference operations end up producing things like “90% certainty this is a cat,” and if that number were 89% or 91%, it wouldn’t make a big difference. The difficulty is more about achieving 90% certainty for a million objects or phrases rather than a hundred.
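To make that trade-off concrete, here is a minimal Python sketch that rounds a confidence score down to a 10-bit mantissa, the width used by Nvidia's TensorFloat-32 format. The helper name `round_to_mantissa_bits` is hypothetical, and the rounding is a simplification (round half up, no special handling of NaN, infinity or overflow); it illustrates the loss of precision, not how the GPU hardware actually does it.

```python
import struct


def round_to_mantissa_bits(x: float, keep_bits: int = 10) -> float:
    """Round x to float32, then keep only `keep_bits` of its 23 mantissa bits.

    Hypothetical helper for illustration only. Assumes 0 < keep_bits < 23
    and a finite input that is not close to the float32 maximum.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - keep_bits
    # Add half of the dropped range (round half up), then clear the dropped bits.
    bits = (bits + (1 << (drop - 1))) & ~((1 << drop) - 1) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


score = 0.9  # "90% certainty this is a cat"
reduced = round_to_mantissa_bits(score)
print(reduced)               # 0.89990234375
print(abs(reduced - score))  # roughly 1e-4, far below a 1% swing
```

The score moves by about a hundredth of a percentage point, which is the kind of error the article describes as harmless for inference.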

It’s an oversimplification, but the result is that RSC, running TensorFloat-32 math mode, can get more FLOP/s (floating point operations per second) per core than other, more precision-oriented systems. In this case it’s up to 1,895,000 teraFLOP/s, or 1.9 exaFLOP/s, more than 4x Fugaku’s. Does that matter? And if so, to whom? If anyone, it might matter to the Top 500 folks, so I’ve asked if they have any input on it. But it doesn’t change the fact that RSC will be among the fastest computers in the world, perhaps the fastest to be operated by a private company for its own purposes.
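The unit conversions behind those figures can be checked in a few lines of Python. The inputs are the numbers quoted in this article; the variable names are only illustrative, and the Fugaku value is just the upper bound implied by “more than 4x,” not a measured benchmark score.

```python
TERA = 10**12
PETA = 10**15
EXA = 10**18

# Figures quoted in the article.
dgx_systems = 760
gpus = 6_080
rsc_teraflops = 1_895_000  # claimed peak in TensorFloat-32 mode

gpus_per_system = gpus // dgx_systems  # 8 GPUs per DGX A100 system
rsc_flops = rsc_teraflops * TERA
rsc_exaflops = rsc_flops / EXA         # 1.895, i.e. about 1.9 exaFLOP/s
per_gpu_teraflops = rsc_teraflops / gpus

# "More than 4x Fugaku's" implies Fugaku sits below one quarter of RSC's figure.
fugaku_upper_bound_petaflops = rsc_flops / 4 / PETA

print(f"{gpus_per_system} GPUs per system")
print(f"{rsc_exaflops:.3f} exaFLOP/s total")
print(f"{per_gpu_teraflops:.1f} teraFLOP/s per GPU")
print(f"Fugaku implied below {fugaku_upper_bound_petaflops:.2f} petaFLOP/s")
```

Note that the two sides of that comparison are not measured the same way: the RSC number is a reduced-precision peak, which is exactly the caveat raised above.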
