NASA accelerates science with gen AI-powered search
When you generate and collect as much data as the US National Aeronautics and Space Administration (NASA) does, finding just the right dataset for a research project can be a problem.
With seven operating centers, nine research facilities, and more than 18,000 staff, the agency continually generates an overwhelming amount of data, which it stores in more than 30 science data repositories across five topical areas: astrophysics, heliophysics, biological and physical sciences, earth science, and planetary science. Overall, the agency houses more than 88,000 datasets and 715,000 documents across 128 data sources. Its earth science data alone is expected to hit 250 petabytes by 2025. Given that scale and complexity, scientists need more than just domain expertise to navigate it all.
“It requires researchers to know which repository to go to and what that repository has,” says Kaylin Bugbee, NASA data scientist at Marshall Space Flight Center in Huntsville, Ala. “You have to be both science literate and data literate.”
In 2019, NASA’s Science Mission Directorate (SMD) released a report, based on a series of interviews with scientists, that made it clear they needed a centralized search capability to find relevant data. The SMD’s mission is to engage with the US science community, sponsor scientific research, and use aircraft, balloon, and spaceflight programs for investigations in Earth orbit, in the Solar System, and beyond. Recognizing that giving scientists and researchers access to its data was fundamental to that purpose, SMD responded to the report by developing its Open Source Science Initiative (OSSI), an effort to make publicly funded scientific research transparent, inclusive, accessible, and reproducible. The OSSI commits to the open sharing of software, data, and knowledge (including algorithms, papers, documents, and ancillary information) as early as possible in the scientific process.
“It really came from the scientists and scientific community, and it also aligns with our broader SMD priority of enabling interdisciplinary science,” Bugbee says. “That’s where new discoveries are made.”
To facilitate that mission, the agency is now turning to a combination of neural nets and generative AI to put those vast amounts of data at scientists’ fingertips.