April 21, 2024

New horizons of science

The term "foundation model" currently holds as much promise in science as ChatGPT has generated hype around the world. That world-famous natural language processing application, used millions of times every day, is also based on a foundation model. But the Helmholtz Association is after more than a text question-and-answer machine. Foundation models aim to link large amounts of data and recognize connections or patterns, so that central questions in science might suddenly become solvable.

To achieve this, a foundation model is trained in several stages. First, it is fed huge quantities of well-prepared scientific data. At this stage it has no specific task. What matters is that the system independently acquires a very strong knowledge base (the foundation). In the next stage it can then be trained on the target tasks, called downstream tasks, with relatively little effort. These are the tasks science actually cares about. “With regard to plankton research, we hope to use the knowledge gained to better understand nutrient and carbon dynamics in the oceans and thus be able to improve our climate models,” says Kaenmuller.

The planned foundation model for plankton data (not yet developed, though work on it has already begun) will be fed with several billion images that the four Helmholtz Centers continuously produce in their research projects. Several hundred thousand of these images have been characterized and classified by scientists. This is called annotated or labeled data.

For pretraining, for example, parts of the image are cut out of some of the original images, and the model learns to fill in the missing pieces. Once the system can accomplish this task after countless iterations, it is trained on different tasks, such as identifying different types of plankton and highlighting them visually, distinguishing them from other species and organisms, or estimating how much carbon an organism or flake contains or how quickly marine snow sinks to the seabed.
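The fill-in-the-missing-pieces step can be illustrated with a toy sketch. It is not the Helmholtz implementation, which the article says does not exist yet; the function names `mask_patches` and `reconstruction_error`, the patch size, and the masking ratio are all assumptions made for illustration. The sketch hides a random share of square patches in a tiny "image" and scores a prediction only on the hidden pixels, which is the general idea behind this kind of pretraining.

```python
import random


def mask_patches(image, patch_size, mask_ratio, seed=0):
    """Return (masked_image, masked_cells) with a share of patches set to 0.0.

    Hypothetical helper: `image` is a list of rows of floats.
    """
    height, width = len(image), len(image[0])
    # Top-left corner of every non-overlapping patch.
    corners = [(r, c)
               for r in range(0, height, patch_size)
               for c in range(0, width, patch_size)]
    rng = random.Random(seed)
    hidden = rng.sample(corners, int(len(corners) * mask_ratio))

    masked = [row[:] for row in image]  # copy, leave the original intact
    masked_cells = set()
    for r0, c0 in hidden:
        for r in range(r0, min(r0 + patch_size, height)):
            for c in range(c0, min(c0 + patch_size, width)):
                masked[r][c] = 0.0
                masked_cells.add((r, c))
    return masked, masked_cells


def reconstruction_error(original, predicted, masked_cells):
    """Mean squared error, computed over the hidden pixels only."""
    if not masked_cells:
        return 0.0
    total = sum((original[r][c] - predicted[r][c]) ** 2
                for r, c in masked_cells)
    return total / len(masked_cells)


# Toy 4x4 "image" with pixel values 1.0 .. 16.0 (made-up data).
image = [[float(r * 4 + c + 1) for c in range(4)] for r in range(4)]
masked, cells = mask_patches(image, patch_size=2, mask_ratio=0.5)

# A model that left the holes empty scores badly; a perfect one scores 0.
error_untrained = reconstruction_error(image, masked, cells)
error_perfect = reconstruction_error(image, image, cells)
```

In a real system the prediction would come from a neural network, and the error on the hidden pixels would be used to adjust its parameters over many iterations.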

To handle such a target task, the pretrained foundation model is extended, for example, with an additional layer containing only a few parameters. Unlike in pretraining, the training input now comes primarily from annotated data. Step by step, the model output is compared with the annotations and the model is refined until, at best, output and annotation are identical.
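As a rough illustration of adding a small layer on top of a pretrained model, the sketch below keeps a stand-in "backbone" fixed and trains only a tiny logistic-regression head on a handful of labeled examples. Everything here is assumed for the example: `frozen_backbone` is a hypothetical placeholder for a real pretrained network, the four 2x2 "images" and their labels are made up, and logistic regression is just one simple choice of head.

```python
import math


def frozen_backbone(image):
    """Hypothetical stand-in for a pretrained model; it is never updated."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    spread = max(pixels) - min(pixels)
    return [mean, spread]


def train_head(features, labels, epochs=500, lr=0.5):
    """Fit a small head (one weight per feature plus a bias) by gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the head's score is positive, else class 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z > 0 else 0


# Made-up annotated data: (image, label), 0 = dark/flat, 1 = bright/contrasty.
annotated = [
    ([[0.1, 0.1], [0.2, 0.1]], 0),
    ([[0.2, 0.1], [0.1, 0.2]], 0),
    ([[0.9, 0.1], [0.8, 0.9]], 1),
    ([[0.8, 0.9], [0.2, 0.9]], 1),
]

features = [frozen_backbone(img) for img, _ in annotated]
labels = [label for _, label in annotated]
weights, bias = train_head(features, labels)

# Compare the head's output with the annotations, as described above.
predictions = [predict(weights, bias, x) for x in features]
```

Only three numbers are learned here (two weights and a bias), which mirrors the point that the second stage needs comparatively little effort once the foundation exists.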

Foundation models have enormous potential in all areas of research. In medicine, some systems are already established in everyday clinical practice. Experts from the Center for Medical Image Computing at University College London and the NIH London Biomedical Research Center at Moorfields Eye Hospital NHS Foundation Trust trained the model RETFound on 1.6 million retinal images. It can now help diagnose conditions that are visible in the eye but whose symptoms occur elsewhere in the body, such as the risk of heart failure or myocardial infarction.

Helmholtz also wants to develop such models: from science, for science, says Florian Grotsch, senior speaker in the field of information and data science. Helmholtz has everything needed to ensure that developments in artificial intelligence for science and research are not left to commercial players alone: data, the specialist knowledge of its researchers, expertise in artificial intelligence, sufficient computing power and the know-how of its computing experts.