MobiusHorizons 11 hours ago

> El Capitan, housed at Lawrence Livermore National Laboratory in Livermore, Calif., can perform over 2700 quadrillion operations per second at its peak. The previous record holder, Frontier, could do just over 2000 quadrillion peak operations per second.

> El Capitan uses AMD’s MI300a chip, dubbed an accelerated processing unit, which combines a CPU and GPU in one package. In total, the system boasts 44,544 MI300As, connected together by HPE’s Slingshot interconnects.

Seems like a nice win for AMD.

  • alephnerd 11 hours ago

    > Seems like a nice win for AMD

    Yep! They've been part of the Exascale project for a long time, and it's good to see their commitment to HPC actually succeed, unlike Intel's during the same time period.

teleforce 8 hours ago

Fun fact: the FFT was discovered back in 1965 out of the urgent need to detect illegal nuclear tests, just two years after the Partial Test Ban Treaty (PTBT) was signed in 1963 [1].

The article's first sentence, which says that the United States and other nuclear powers committed to the Comprehensive Nuclear-Test-Ban Treaty in 1965, is wrong: the treaty was only signed in 1996, not in 1965 [2].

[1] The Algorithm That Almost Stopped The Development Of Nuclear Weapons:

https://www.iflscience.com/the-algorithm-that-almost-stopped...

[2] The Comprehensive Nuclear-Test-Ban Treaty:

https://www.ctbto.org/our-mission/the-treaty

  • sliken 7 hours ago

    Another fun fact: the priority given to detecting nuclear tests led to seismometers being installed all over the planet, so detection of the exact position and nature of any disturbance on the planet became radically better. This was quite the boon to anyone interested in earthquakes: not only can an earthquake be located in 2D, it can be located accurately in 3D. The number and accuracy of the sensors is enough that you can see where each fault is, the thickness of the crust, and the outline of subduction zones in 3D. Pretty crazy to have enough detail to see where plates enter the mantle and melt.

    Said sensors can also track sonic booms from secret supersonic planes, but governments don't like to talk about that.

    • johnisgood 6 hours ago

      Cool factoids! If you have any more to share, please do so.

cryptozeus 11 hours ago

This is great, but I absolutely love that poster of El Capitan on the supercomputer racks! Also TIL there is a list of the top 500 at https://www.top500.org/lists/top500/2024/11/

  • qingcharles 8 hours ago

    I've always loved these charts. The Numerical Wind Tunnel, #1 in 1993, achieved 124.2 gigaflops on the Linpack benchmark.

    In comparison, the iPhone 15 Pro Max cellphone, released in 2023, delivers approximately 2150 gigaflops.

    I once drew the chart backwards. I think my PC in 2013 would have been the fastest on Earth in 1990. And faster than every computer combined in about 1982.[0]

    [0] might not be accurate

  • theideaofcoffee 11 hours ago

    That's a pretty standard Cray feature for systems larger than a few cabinets. El Capitan has the landscape, Hopper at NERSC had a photo of Grace Hopper, Aurora at ANL has a creamy gradient reminiscent of the Borealis, and on and on. Gives them a bit of character beyond the bad-ass Cray label on the doors.

balia 11 hours ago

Some may not want to hear this, but these “fastest supercomputer” lists are now meaningless because all the Chinese labs have started obfuscating their progress.

A while ago there were a few labs in China in the top 10, and they all attracted sanctions / bad attention. Now no Chinese lab reports any data.

  • cameron_b 9 hours ago

    They are in good company, with X, Meta, Microsoft and others not reporting theirs either.

    The basis for the ranking was a cumulative tracking of benchmark results that were required as part of commissioning bespoke computers. A contract would be written to buy a computer that could achieve a certain performance in operations per second, and in order to satisfy that the benchmarks were agreed to and codified in the contracts. Government contracts are to a certain extent public information so the goals and clout of successive performance were tracked in this way.

    If you don’t need to satisfy a government contract, or don’t need the clout to attract engineers or funding, submitting results draws unwanted attention to what you’re cooking up.

    • sliken 8 hours ago

      Microsoft has the #4 cluster on the top 500 list. Sure not everyone reports, still seems like a useful list to watch the trends in computing and in particular HPC.

      Keep in mind the average hyperscaler's cloud is not a particularly good setup for the top500. HPC tends towards more bandwidth, lower latency, and no virtualization.

  • pknomad 11 hours ago

    I wouldn't say meaningless... just incomplete.

  • leptons 10 hours ago

    I doubt the US Government is telling everyone about their fastest computer.

    • buildbot 8 hours ago

      Unless it's, like, air-gapped and powered by a naval nuclear reactor, I feel like someone would question why a random US gov building is drawing 20-30MW of power, and exhausting most of that as heat...

      • ethbr1 6 hours ago

        Aside from the utility, who would know? There's a lot of land out there, relative to the size of even a very large building.

      • remram 6 hours ago

        Not disagreeing with you but how could any building draw power without radiating most of it as heat?

    • grapesodaaaaa 8 hours ago

      The DOE has entered the chat.

      (after the nuclear test ban treaty, they run a LOT of simulations)

      • buildbot 8 hours ago

        Isn't that the open secret for El Cap? "Classified workloads" aka weapons sims.

        • sliken 8 hours ago

          Not a secret, from IEEE:

          The NNSA—which oversees Lawrence Livermore as well as Los Alamos National Laboratory and Sandia National Laboratories—plans to use El Capitan to “model and predict nuclear weapon performance, aging effects, and safety,”

olao99 11 hours ago

I fail to understand how these nuclear bomb simulations require so much compute power.

Are they trying to model every single atom?

Is this a case where the physicists in charge get away with programming the most inefficient models possible and then the administration simply replies "oh I guess we'll need a bigger supercomputer"

  • p_l 11 hours ago

    It literally requires simulating each subatomic particle, individually. The increases in compute power have been used for the twin goals of reducing simulation time (letting you run more simulations) and increasing the size and resolution.

    The alternative is to literally build and detonate a bomb to get empirical data on a given design, which might have problems with replicability (important when applying the results to the rest of the stockpile) or with how exact the data is.

    And remember that there is more than one user of every supercomputer deployed at such labs, whether it be multiple "paying" jobs like research simulations, smaller jobs run to educate, test, and optimize before running full scale work, etc.

    AFAIK supercomputers have also been running more than one job at a time for a considerable amount of time.

    • Jabbles 10 hours ago

      > It literally requires simulating each subatomic particle, individually.

      Citation needed.

      1 gram of Uranium 235 contains about 2.6e21 atoms, which would take roughly 15 minutes for this supercomputer just to count.

      "nuclear bomb simulations" do not need to simulate every atom.

      I speculate that there will be some simulations at the subatomic scale, and they will be used to inform other simulations of larger quantities at lower resolutions.

      https://www.wolframalpha.com/input?i=atoms+in+1+gram+of+uran...
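
      The counting estimate above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes one operation per atom (far fewer than any real physics would need), uses the article's peak figure of 2700 quadrillion operations per second, and takes an approximate molar mass for U-235. The function names are made up for illustration.

```python
AVOGADRO = 6.02214076e23     # atoms per mole (exact SI definition)
U235_MOLAR_MASS_G = 235.04   # g/mol, approximate
PEAK_OPS_PER_SEC = 2.7e18    # "over 2700 quadrillion" from the article


def atoms_in_grams(grams: float) -> float:
    """Number of U-235 atoms in the given mass."""
    return grams / U235_MOLAR_MASS_G * AVOGADRO


def seconds_to_touch(atoms: float, ops_per_atom: float = 1.0) -> float:
    """Time to spend `ops_per_atom` operations on every atom at peak rate."""
    return atoms * ops_per_atom / PEAK_OPS_PER_SEC


atoms = atoms_in_grams(1.0)
minutes = seconds_to_touch(atoms) / 60
print(f"{atoms:.2e} atoms in 1 g, {minutes:.1f} minutes to count them once")
```

      Scaling `grams` up to kilograms, or `ops_per_atom` up to anything realistic, shows why per-atom simulation of a whole device is out of reach.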

      • p_l 10 hours ago

        Subatomic scale is the perfect option, but we tend to not have time for that, so we sample and average and do other things. At least that's the situation within aerospace's hunger for CFD, I figure nuclear has similar approaches.

        • Jabbles 10 hours ago

          I would like a citation for anyone in aerospace using (or even realistically proposing) subatomic fluid dynamics.

          • p_l 10 hours ago

            Ok, that misreading is on me - in aerospace generally you care down to the level of molecules, and I've met many people who would love to be just able to brute force it this way. Hypersonics do however end up dealing with simulating subatomic particle behaviours (because of things like air turning into plasma)

            • Jabbles 9 hours ago

              > in aerospace generally you care down to the level of molecules

              I would like a citation for this.

              > Hypersonics do however end up dealing with simulating subatomic particle behaviours

              And this.

              ---

              For example, you could choose to cite "A Study on Plasma Formation on Hypersonic Vehicles using Computational Fluid Dynamics" DOI: 10.13009/EUCASS2023-492 Aerospace Europe Conference 2023 – 10ᵀᴴ EUCASS – 9ᵀᴴ CEAS

              At sub-orbital altitudes, air can be modelled as a continuous flow governed by the Navier-Stokes equations for a multicomponent gas mixture. At hypersonic speeds, however, this physical model must account for various non-equilibrium phenomena, including vibrational and electronic energy relaxation, dissociation and ionization.

              https://www.eucass.eu/doi/EUCASS2023-492.pdf

              • p_l 9 hours ago

                "I wish I could give the finger to Navier-Stokes and brute force every molecule's kinematics" does not make for a paper that will get to publication if not accompanied by actually doing that at a speed and scale that makes it usable, no matter how many tenured professors dream of it. So instead they just ramp up resolution whenever you give them access to more compute

                (younger generations are worse at it, because the problems that forced elder ones into more complex approaches can now be an overnight job on their laptop in ANSYS CFX)

                So unfortunately my only source on that is the bitching of post-docs and professors, with and without tenure (or rather its equivalent here), at the premier such institutions in Poland.

    • pkaye 11 hours ago

      Are they always designing new nuclear bombs? Why the ongoing work to simulate?

      • AlotOfReading 11 hours ago

        Multiple birds with one stone.

        * It's a jobs program to avoid the knowledge loss created by the end of the cold war. The US government poured a lot of money into recreating the institutional knowledge needed to build weapons (e.g. materials like FOGBANK) and it's preferred to maintain that knowledge by having people work on nuclear programs that aren't quite so objectionable as weapon design.

        * It helps you better understand the existing weapons stockpiles and how they're aging.

        * It's an obvious demonstration of your capabilities and funding for deterrence purposes.

        * It's political posturing to have a big supercomputer and the DoE is one of the few agencies with both the means and the motivation to do so publicly. This has supposedly been a major motivator for the Chinese supercomputers.

        There's all sorts of minor ancillary benefits that come out of these efforts too.

      • p_l 11 hours ago

        Because even normal explosives degrade over time, and fissile material in nuclear devices is even worse about it - remember that unstable elements are constantly decaying (with some spontaneous fission along the way); critical mass is just the point where fissions trigger each other fast enough for a runaway process.

        So in order to verify that the weapons are still useful and won't fail in random ways, you have to test them.

        Which either involves actually exploding them (banned by various treaties that have enough weight that even the USA doesn't break them), or numerical simulations.

      • colonCapitalDee 10 hours ago

        Basically yes, we are always designing new nuclear bombs. This isn't done to increase yield, we've actually been moving towards lower yield nuclear bombs ever since the mid Cold War. In the 60s the US deployed the B41 bomb with a maximum yield of 25 megatons, making it the most powerful bomb ever deployed by the US. When the B41 was retired in the late 70s, the most powerful bomb in the US arsenal was the B53 with a yield of 9 megatons. The B53 was retired in 2011, leaving the B83 as the most powerful bomb in the US arsenal with a yield of only 1.2 megatons.

        There are two kinds of targeting that can be employed in a nuclear war: counterforce and countervalue. Counterforce is targeting enemy military installations, and especially enemy nuclear installations. Countervalue is targeting civilian targets like cities and infrastructure. In an all out nuclear war counterforce targets are saturated with nuclear weapons, with each target receiving multiple strikes to hedge against the risks of weapon failure, weapon interception, and general target survival due to being in fortified underground positions. Any weapons that are not needed for counterforce saturation strike countervalue targets. It turns out that having a yield greater than a megaton is basically just overkill for both counterforce and countervalue. If you're striking an underground military target (like a missile silo) protected by air defenses, your odds of destroying that target are higher if you use three one megaton yield weapons than if you use a single 20 megaton yield weapon. If you're striking a countervalue target, the devastation caused by a single nuclear detonation will be catastrophic enough to make optimizing for maximum damage pointless.

        Thus, weapons designers started to optimize for things other than yield. Safety is a big one: an American nuclear weapon going off on US soil would have far reaching political effects and would likely cause the president to resign. Weapons must fail safely when the bomber carrying them bursts into flames on the tarmac, or when the rail carrying the bomb breaks unexpectedly. They must be resilient against both operator error and malicious sabotage. Oh, and none of these safety considerations are allowed to get in the way of the weapon detonating when it is supposed to. This is really hard to get right!

        Another consideration is cost. Nuclear weapons are expensive to make, so a design that can get a high yield out of a small amount of fissile material is preferred. Maintenance, and the cost of maintenance, is also relevant. Will the weapon still work in 30 years, and how much money is required to ensure that?

        The final consideration is flexibility and effectiveness. Using a megaton yield weapon on the battlefield to destroy enemy troop concentrations is not a viable tactic because your own troops would likely get caught in the strike. But lower yield weapons suitable for battlefield use (often referred to as tactical nuclear weapons) aren't useful for striking counterforce targets like missile silos. Thus, modern weapon designs are variable yield. The B83 mentioned above can be configured to detonate with a yield in the low kilotons, or up to 1.2 megatons. Thus a single B83 weapon in the US arsenal can cover multiple contingencies, making it cheaper and more effective than maintaining a larger arsenal of single yield weapons. This is in addition to special purpose weapons designed to penetrate underground bunkers or destroy satellites via EMP, which have their own design considerations.

        • dekhn 9 hours ago

          Great comment - I have only one thing to add. Many people will enjoy reading "Command and Control", which covers the history of nuclear weapons accidents in the US and how they were managed/mitigated. It's always interesting to learn that a missile silo can explode, popping the warhead up and out (but without it exploding due to fission/fusion), and that from the perspective of the nuclear warhead, the safety controls worked.

        • SoftTalker 7 hours ago

          > Another consideration is cost. Nuclear weapons are expensive to make, so a design that can get a high yield out of a small amount of fissile material is preferred. Maintenance, and the cost of maintenance, is also relevant. Will the weapon still work in 30 years, and how much money is required to ensure that?

          I've seen speculation that Russia's (former Soviet) nuclear weapons are so old and poorly maintained that they probably wouldn't work. Not that anyone wants to find out.

        • ethbr1 6 hours ago

          Small addition: weapon precision has drastically increased since the days of the monster bombs

          Less need of 9 megatons against a hardened silo if you have a 1.2 megaton weapon with a 120m CEP.

      • dekhn 11 hours ago

        The euphemistic term used in the field is "stockpile stewardship", which is a catch-all term involving a wide range of activities, some of them forward-looking.

      • danhon 11 hours ago

        It's also to check that the ones they have will still work, now that there are test bans.

  • sliken 7 hours ago

    Well there's a fair bit of chemistry related to the explosions that bring the sub-critical bits together. Time scales are in the nanosecond range. Then as the subcritical bits get closer, obviously the nuclear effects start to dominate. Things like beryllium are used to reflect neutrons and intensify the chain reaction. All of that is basically just a starter for the fusion reaction. That often involves uranium, lithium deuteride, and more plutonium.

    So it involves very small time scales, chemistry, fission, fusion, creating and channeling plasmas, high neutron fluxes, extremely high pressures, and of course the exponential release of amazing amounts of energy as matter is literally converted to energy and temperatures exceeding those in the sun.

    Then add to all of that the reality of aging. Explosives can degrade, the structure can weaken (age and radiation), radioactive materials have half lives, etc. What should the replacement rate be? What kind of maintenance would lengthen the useful lives of the weapons? What fraction of the arsenal should work at any given time? How will vibration during delivery impact the above?

    Seems like plenty to keep a supercomputer busy.

    • ethbr1 6 hours ago

      I'd never considered this, but do the high temperatures impose additional computational requirements on the chemical portions?

      I'd assume computing atomic behavior at 0K is a lot simpler than at 800,000,000K, over the same time step. ;)

  • JumpCrisscross 11 hours ago

    > Are they trying to model every single atom?

    Given all nuclear physics happens inside atoms, I'd hope they're being more precise.

    Note that a frontier of fusion physics is characterising plasma flows. So even at the atom-by-atom level, we're nowhere close to a solved problem.

    • amelius 11 hours ago

      Or maybe it suffices to model the whole thing as a gas. It all depends on what they're trying to compute.

      • JumpCrisscross 11 hours ago

        > maybe it suffices to model the whole thing as a gas

        What are you basing this on? Plasmas don't flow like gases even absent a magnetic field. They're self interacting, even in supersonic modes. This is like saying you can just model gases like liquids when trying to describe a plane--they're different states of matter.

  • GemesAS 6 hours ago

    Modern weapon codes couple computationally heavy physics like radiation & neutron transport, hydrodynamics, plasma, and chemical physics. While a 1-D or 2-D simulation might not be too heavy in compute, often large ensembles of simulations are done for UQ (uncertainty quantification) or sensitivity analysis in design work.

  • rcxdude 9 hours ago

    >Are they trying to model every single atom?

    Modelling a single nucleus, even one much lighter than uranium, is a capital-H Hard Problem involving many subject matter experts and a lot of optimisation work far beyond 'just throw it on a GPU'. Quantum systems very quickly become intractable without very clever approximations and a lot of compute, and quantum chromodynamics is by far the worst at this. Look up lattice QCD for a relevant keyword.

  • piombisallow 8 hours ago

    These usually get split into nodes and scientists can access some nodes at a time. The whole thing isn't working on a single problem.

  • CapitalistCartr 11 hours ago

    It's because of the way the weapons are designed, which requires a CNWDI clearance to know, so your curiosity is not likely to be sated.

  • TeMPOraL 11 hours ago

    Pot, meet kettle? It's usually the industry that's leading with "write inefficient code, hardware is cheaper than dev time" approach. If anything, I'd expect a long-running physics research project to have well-optimized code. After all, that's where all the optimized math routines come from.

    • glial 3 hours ago

      I bet the bulk of it is still super-fast Fortran code.

  • alephnerd 11 hours ago

    > I fail to understand how these nuclear bomb simulations require so much compute power

    I wrote a previous HN comment explaining this:

    Tl;dr - Monte Carlo simulations are hard and the test ban treaties prevent live testing similar to Bikini Atoll or Semipalatinsk-21

    https://news.ycombinator.com/item?id=39515697

  • bongodongobob 11 hours ago

    My brother in Christ, it's a supercomputer. What an odd question.

tuananh 4 hours ago

This is a major milestone for Oxide Computer team. Congrats

  • sargun an hour ago

    Is El Capitan entirely made of Oxide components?

declan_roberts 11 hours ago

Do supercomputers need proximity to other compute nodes in order to perform these kinds of computations?

I wonder what would happen if Apple offered people something like iCloud+ in exchange for using their idle M4 compute at night time for a distributed super computer.

  • theideaofcoffee 11 hours ago

    The thing that sets these machines apart from something that you could set up in AWS (to some degree), or in a distributed sense like you're suggesting, is the interconnect: how the compute nodes communicate. For a large system like El Capitan, a large chunk of the cost goes into connecting the nodes together with low latency and interesting topologies that neither Ethernet nor even InfiniBand can get close to. Code that requires a lot of DMA or message passing really will take up all of the bandwidth that's available, and that becomes the primary bottleneck in these systems.

    The interconnect has been Cray's bread and butter for multiple decades: Slingshot, Dragonfly, Aries, Gemini, SeaStar, numalink via sgi, etc. and those for the less massively parallel systems before those.

    • sliken 7 hours ago

      I've seen nothing showing that Slingshot has any particular advantage over IB for HPC. Sure, HPE pushes Slingshot (an HPE interconnect) over giving bags of money to Nvidia, but that's a business decision. Eagle (the #4 cluster on the list) is InfiniBand NDR.

      I believe 306 of the top 500 clusters used InfiniBand. Pretty sure the advanced topologies like dragonfly are supported on IB as well as Slingshot. From what I can tell Slingshot is much like Ultra Ethernet, trying to take the best of IB and Ethernet and make a new standard. From what I can tell Slingshot 11 latency is much like what I got with Omni-Path/PathScale way back when dual core Opterons were the cutting edge.

  • philipkglass 11 hours ago

    Yes, supercomputers need low-latency communication between nodes. If a problem is "embarrassingly parallel" (like folding@home, mentioned by sibling comment) then you can use loosely coordinated nodes. Those sorts of problems usually don't run on supercomputers in the first place, since there are cheaper ways to solve them.

einpoklum 11 hours ago

So, they built this supercomputer to test new and more deadly nuclear weapons. That makes me so "happy". I am absolutely not worried about two nuclear powers being close to the brink of direct war, even as we speak; nor about the abandonment of the course of nuclear disarmament treaty; nor about the repeated talk of a coming war against certain Asian powers. Everything is great and I'll just fawn over the colorful livery and the petaflops figure.

  • JumpCrisscross 11 hours ago

    > they built this supercomputer to test new and more deadly nuclear weapons

    If you are afraid of nuclear war, the thing to fear is a nuclear state's capacity to retaliate being questioned. These supercomputers are the alternative to live tests. Taking them away doesn't poof nuclear weapons, it means you are left with a half-assed deterrent or must resume live tests.

    > the abandonment of the course of nuclear disarmament treaty

    North Korea, the American interventions in the Middle East and Ukraine set the precedent that nuclear sovereignty is in a separate category from the treaty-enforced kind. Non-proliferation won't be made or broken on the back of aging, degrading weapons.

    > repeated talk of a coming war against certain Asian powers

    One invites war by refusing to prepare for it.

  • rbanffy 11 hours ago

    The whole point of testing (and making) deadly nuclear weapons is to ensure they are never used again. The Mutually Assured Destruction doctrine has kept us alive through the darkest days of the Cold War (also keeping the Cold War cold). The only way to credibly threaten anyone who tries to annihilate you with certain annihilation is with lots of such doomsday weapons. We have lived in this Mexican standoff for longer than we remember.

    • postalrat 11 hours ago

      We are living in the darkest days of the cold war right now.

  • shagie 11 hours ago

    I would reference an older article on super computers and the nuclear weapon arsenal.

    https://www.techtarget.com/searchdatacenter/news/252468294/C...

    > "The Russians are fielding brand new nuclear weapons and bombs," said Lisa Gordon-Hagerty, undersecretary for nuclear security at the DOE. She said "a very large portion of their military is focused on their nuclear weapons complex."

    > It's the same for China, which is building new nuclear weapons, Gordon-Hagerty said, "as opposed to the United States, where we are not fielding or designing new nuclear weapons. We are actually extending the life of our current nuclear weapons systems." She made the remarks yesterday in a webcast press conference.

    > ...

    > Businesses use 3D simulation to design and test new products in high performance computing. That is not a unique capability. But nuclear weapon development, particularly when it involves maintaining older weapons, is extraordinarily complex, Goldstein said.

    > The DOE is redesigning both the warhead and nuclear delivery system, which requires researchers to simulate the interaction between the physics of the nuclear system and the engineering features of the delivery system, Goldstein said. He characterized the interaction as a new kind of problem for researchers and said 2D development doesn't go far enough. "We simply can't rely on two-dimensional simulations -- 3D is required," he said.

    > Nuclear weapons require investigation of physics and chemistry problems in a multidimensional space, Goldstein said. The work is a very complex statistical problem, and Cray's El Capitan system, which can couple this computation with machine learning, is ideally suited for it, he said.

    ---

    This isn't designing new ones. Or blowing things up ( https://www.reuters.com/article/us-usa-china-nuclear/china-m... ) to see if they still work. It is simulating them to have the confidence that they still work - and that the adversaries of the US know that the scientists are confident that they still work without having to blow things up.

    • JumpCrisscross 11 hours ago

      > to see if they still work. It is simulating them to have the confidence that they still work

      The Armageddon scenario is some nuclear states conduct stockpile stewardship, some don’t, and those who do discover that warheads come with a use-by date.

  • comboy 11 hours ago

    I'd guess it's unlikely to be the real use case. The real one is classified. Plus it's not like more deadly nuclear weapons would change anything, we can do bad enough with what we already have.

    • JumpCrisscross 11 hours ago

      > it's unlikely to be the real use case. The real one is classified.

      What are you basing this on?

      > it's not like more deadly nuclear weapons would change anything

      We haven't been chasing yield in nuclear weapons since the 60s.

      Our oldest warheads date from the 60s [1]. For obvious reasons, the experimental track record on half-century old pits is scarce. We don't know if novel physics or chemistry is going on in there, and we don't want to be the second ones to find out.

      [1] https://en.wikipedia.org/wiki/B61_nuclear_bomb

    • realo 11 hours ago

      Maybe there is research not on bigger bangs, but on smaller packages?

      Think about a baseball-size device able to take out a city block.

      Then think about a squadron of drones able to transport those baseballs to very precise city blocks...

    • alephnerd 11 hours ago

      > I'd guess it's unlikely to be the real use case

      I can safely say that nuclear simulations are one of the major drivers for HPC research globally.

      It is not the only one (genomics, simulations, fundamental research are also major drivers) but it is a fairly prominent one.

  • theideaofcoffee 11 hours ago

    I'd rather have a few supercomputers doing stockpile stewardship over being tested live. As much as I hate it personally, these weapons are a part of our society for better or for worse until we (as in the people) decide they won't be by electing those that will help dismantle the programs. They should be maintained and these tools help in that.

  • freeone3000 11 hours ago

    Eh, we have all the nukes we need and we already know how to build them. This is going to help more with fusion power than fusion explosives.

pama 11 hours ago

Noting here that 2700 quadrillion operations per second is less than the estimated sustained throughput of productive bfloat16 compute during the training of the large Llama 3 models, which IIRC was about 45% of 16,000 quadrillion operations per second, i.e. 16k H100s in parallel at about 0.45 MFU. The compute power of national labs has fallen far behind industry in recent years.
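
For anyone who wants to check the arithmetic, here is a rough sketch of that comparison. The per-GPU figure of about 1 PFLOP/s dense bf16 is an assumption chosen to match the "16,000 quadrillion" total above, and the two sides are at different precisions (bf16 vs. the FP64 Linpack-style figure), so the ratio is a raw operation count and not a like-for-like ranking.

```python
H100_COUNT = 16_000             # "16k H100" from the comment
H100_PEAK_BF16_FLOPS = 1.0e15   # assumption: ~1 PFLOP/s dense bf16 per GPU
MFU = 0.45                      # model FLOPs utilization quoted above
EL_CAPITAN_PEAK_FLOPS = 2.7e18  # "2700 quadrillion" (FP64 peak, per the article)

cluster_peak = H100_COUNT * H100_PEAK_BF16_FLOPS  # theoretical bf16 peak
sustained = cluster_peak * MFU                    # productive training throughput
ratio = sustained / EL_CAPITAN_PEAK_FLOPS         # mixed-precision comparison!

print(f"cluster peak: {cluster_peak / 1e15:,.0f} PFLOP/s (bf16)")
print(f"sustained:    {sustained / 1e15:,.0f} PFLOP/s (bf16)")
print(f"ratio to El Capitan peak: {ratio:.2f}x")
```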

  • bryanlarsen 11 hours ago

    A 64 bit float operation is >4X as expensive as a 16 bit float operation.

    • pama 11 hours ago

      Agreed. However, also note that if it was only matrix multiplies and no full transformer training, the performance of that Meta cluster would be closer to 16k PFLOPS, still much faster than the El Capitan performance measured on Linpack and multiplied by 4. Other companies presumably cabled 100k H100s together, but they don't yet publish training data for their LLMs. It is good to have competition, I just didn't expect the tables to flip so dramatically over the last two decades, from a time when governments still ruled the top spots in computer centers with ease to nowadays, where the assumption is that there are at least ten companies with larger clusters than the most powerful governments.

      • sliken 8 hours ago

        I'd expect Linpack to be much closer to a user research application than training LLMs. My understanding of LLM training is that it's more about throughput and has very predictable communication patterns: not latency sensitive, and bandwidth intensive.

        Most parallel research, especially at this scale is more about different balance of operations to memory bandwidth, and much more worried about interconnect latency.

        I wouldn't assume that just because various corporations have large training clusters that they could dominate HPC if they wanted to. Hyperscalers have dominated throughput for many years now, but HPC is a different beast.

        • pama 7 hours ago

          All HPC and LLM workloads tend to get fully optimized to their hardware specs. When you train models with over 405B parameters and process about 2 million tokens per second, calculating derivatives on all these parameters every few seconds, you do end up at the boundary of latency and bandwidth at all scales (from host to host, host to device, and the multiple rates within each device). Typical LLM training at these scales multiplexes three or more different types of parallelism to avoid keeping the devices idle, and of course it also has to deal with redundancy and frequent failures of this erratic hardware (if a single H100 fails once every five years, 100K of them would have more than two failures per hour.)
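
          The failure-rate figure in that parenthetical is easy to check. A minimal sketch, assuming failures are independent and happen at a constant rate (the helper name is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24


def expected_failures_per_hour(num_devices: int, years_between_failures: float) -> float:
    """Fleet-wide failure rate if each device fails once per `years_between_failures`."""
    per_device_rate = 1.0 / (years_between_failures * HOURS_PER_YEAR)
    return num_devices * per_device_rate


rate = expected_failures_per_hour(100_000, 5.0)
print(f"{rate:.2f} expected failures per hour")
```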

    • Koshkin 11 hours ago

      In terms of heat dissipation, maybe, yes. But not necessarily in time.

  • handfuloflight 11 hours ago

    Any idea how that stacks up with GPT-4?

    • pama 7 hours ago

      If I knew, I wouldn’t be able to disclose it :-)

  • alephnerd 11 hours ago

    Training an LLM (basically Transformers) is a different workflow from nuclear simulations (basically Monte Carlo simulations).

    There are a lot of intricacies, but at a high level they require different compute approaches.
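
    To give a flavor of the Monte Carlo side, here is a deliberately tiny toy: estimating the fraction of particles that cross a purely absorbing 1-D slab without colliding. It is a sketch under strong assumptions (no scattering, no fission, no energy dependence, made-up parameter names) and has nothing to do with any real weapons code, but it shows the basic pattern of sampling random particle histories and averaging.

```python
import math
import random


def transmitted_fraction(thickness: float, sigma_total: float,
                         n_particles: int, seed: int = 0) -> float:
    """Monte Carlo estimate of uncollided transmission through a slab.

    thickness   -- slab thickness (arbitrary length units)
    sigma_total -- collision probability per unit length (toy cross-section)
    """
    rng = random.Random(seed)
    escaped = 0
    for _ in range(n_particles):
        # Distance to the first collision is exponentially distributed.
        free_path = rng.expovariate(sigma_total)
        if free_path > thickness:
            escaped += 1
    return escaped / n_particles


estimate = transmitted_fraction(thickness=2.0, sigma_total=1.0, n_particles=100_000)
exact = math.exp(-2.0)  # analytic answer for this toy case
print(f"Monte Carlo: {estimate:.4f}, analytic: {exact:.4f}")
```

    The statistical error only shrinks like 1/sqrt(N), so every extra digit of accuracy costs about 100x more particle histories, and real problems add 3-D geometry, scattering, and coupled physics on top.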

    • pama 11 hours ago

      Absolutely. Though the performance of El Capitan is only measured by a Linpack benchmark, not the actual application.

    • Koshkin 11 hours ago

      This is about the raw compute, no matter the workflow.

    • handfuloflight 11 hours ago

      Can you expand on why the operations per second is not an apt comparison?

      • pertymcpert 11 hours ago

        When you're doing scientific simulations, you're generally a lot more sensitive to FP precision than ML training which is very, very tolerant of reduced precision. So while FP8 might be fine for transformer networks, it would likely be unacceptably inaccurate/unusable for simulations.
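
        A small illustration of that sensitivity, using only the standard library: repeatedly adding a small step in half precision versus double precision. This is a sketch, not a claim about how any particular simulation or training stack accumulates values (real ML code typically accumulates in a wider format); `to_fp16` and `accumulate` are hypothetical helpers that emulate fp16 by rounding through `struct`'s IEEE 754 half-precision format.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def accumulate(step: float, n: int, round_fn=lambda v: v) -> float:
    """Add `step` to a running total `n` times, rounding after every addition."""
    total = 0.0
    step = round_fn(step)
    for _ in range(n):
        total = round_fn(total + step)
    return total


n, step = 10_000, 0.001
fp64_sum = accumulate(step, n)           # plain double precision
fp16_sum = accumulate(step, n, to_fp16)  # emulated half precision

print(f"fp64 running sum: {fp64_sum:.6f}")
print(f"fp16 running sum: {fp16_sum:.6f}")
```

        In double precision the total comes out very close to 10, while the half-precision total stalls well short of that once the step becomes smaller than half the spacing between representable values. A time-stepping simulation that loses small increments like this drifts away from the physics.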