https://cacm.acm.org/news/how-nasa-built-artemis-iis-fault-tolerant-computer/

. . .

To ensure those wrong answers never reach the spacecraft’s thrusters, NASA moved beyond the triple redundancy of traditional systems. Orion utilizes two Vehicle Management Computers, each containing two Flight Control Modules, for a total of four FCMs. But the redundancy goes even deeper: each FCM consists of a self-checking pair of processors.

Effectively, eight CPUs run the flight software in parallel. The engineering philosophy hinges on a “fail-silent” design. The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds.

“A faulty computer will fail silent, rather than transmit the ‘wrong answer,’” Uitenbroek explained.

This approach simplifies the complex task of the triplex “voting” mechanism that compares results. Instead of comparing three answers to find a majority, the system uses a priority-ordered source selection algorithm among healthy channels that haven’t failed-silent. It picks the output from the first available FCM in the priority list; if that module has gone silent due to a fault, it moves to the second, third, or fourth.

This level of redundancy is specifically scaled for the rigors of deep space. NASA anticipates transient failures during the Artemis II mission’s transit through the high-radiation Van Allen Belts.

“We can lose three FCMs in 22 seconds and still ride through safely on the last FCM,” said Uitenbroek.

A silenced FCM doesn’t become dead weight, however; the system is designed to reset, re-synchronize its state with the operating modules, and re-join the group mid-flight.

. . .
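
For anyone who wants the selection logic spelled out, here is a rough Python sketch of the idea as the excerpt describes it: each FCM is a self-checking pair that goes silent on disagreement, and the selector takes the first non-silent FCM in priority order. This is a toy model, not NASA's flight software. The names `SelfCheckingPair`, `select_output`, `nominal`, and `upset` are made up for illustration, and the exact-equality comparison between the two processors is an assumption; the article does not say how the real pair is compared.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class SelfCheckingPair:
    """Hypothetical model of one FCM: two processors running the same computation."""
    name: str
    cpu_a: Callable[[float], float]
    cpu_b: Callable[[float], float]

    def output(self, sensor: float) -> Optional[float]:
        a = self.cpu_a(sensor)
        b = self.cpu_b(sensor)
        # Fail silent: if the two results disagree, emit nothing at all
        # (assumed exact comparison; the real mechanism is not described).
        return a if a == b else None


def select_output(
    fcms: Sequence[SelfCheckingPair], sensor: float
) -> Optional[Tuple[str, float]]:
    """Return (name, value) from the first FCM in priority order that is not silent."""
    for fcm in fcms:
        value = fcm.output(sensor)
        if value is not None:
            return fcm.name, value
    return None  # every channel has gone silent


def nominal(x: float) -> float:
    return 2.0 * x


def upset(x: float) -> float:
    # Simulated radiation-induced wrong answer on one processor.
    return 2.0 * x + 1.0


fcms = [
    SelfCheckingPair("FCM-1", nominal, upset),    # pair disagrees, goes silent
    SelfCheckingPair("FCM-2", nominal, nominal),  # healthy
    SelfCheckingPair("FCM-3", nominal, nominal),
    SelfCheckingPair("FCM-4", nominal, nominal),
]

print(select_output(fcms, 1.5))  # ('FCM-2', 3.0)
```

Note that no majority vote happens anywhere: the selector never compares outputs across FCMs, it only checks whether a channel is talking. That is the simplification the article is pointing at. The sketch leaves out the reset, re-synchronize, and re-join step entirely.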