The previous chapter covered the basic concepts of regular expressions. This chapter takes you beyond classic search and replace and introduces some of the extended possibilities.

One feature of regular expressions that intrigued me early on was group indexing. Suppose you receive a CSV file in which each row corresponds to a record, with semicolons separating the attributes (Listing 1).

Aspect;Alain;Frankreich;Physik;2022
Clauser;John;Vereinigte Staaten;Physik;2022
Zeilinger;Anton;Österreich;Physik;2022
Agostini;Pierre;Frankreich;Physik;2023
Krausz;Ferenc;Ungarn;Physik;2023
L'Huillier;Anne;Frankreich;Physik;2023

With the help of groups, I prepare the individual scientists in the list for further processing. The expression in the first line of Listing 2 serves as the regex for this. Figure 1 shows the results when the Python function findall() is used to list all matches. The regex is not very easy to read, so I refine it a bit (Listing 2, line 2).

^(.*?);(.*?);(.*?);(.*?);(.*?)$
^(.*?);(.*?);(.*?);(Physik|Chemie|Medizin|Literatur|Wirtschaftswissenschaften|Frieden);(\d{4})$
^(?<Name>.*?);(?<Vorname>.*?);(?<Land>.*?);(Physik|Chemie|Medizin|Literatur|Wirtschaftswissenschaften|Frieden);(\d{4})$
^(\w+(?# Name));(\w+(?# Vorname));(\w+(?# Land));(\w+(?# Fach));(\d+(?# Jahr))$
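As a minimal sketch, the first two expressions in Listing 2 can be tried against the records from Listing 1 with Python's re module. The names DATA, GENERIC and STRICT are purely illustrative, and the re.MULTILINE flag is an assumption: the text does not say which flags were used, but without it ^ and $ would only anchor at the start and end of the whole string.

```python
import re

# Records from Listing 1, one per line.
DATA = """\
Aspect;Alain;Frankreich;Physik;2022
Clauser;John;Vereinigte Staaten;Physik;2022
Zeilinger;Anton;Österreich;Physik;2022
Agostini;Pierre;Frankreich;Physik;2023
Krausz;Ferenc;Ungarn;Physik;2023
L'Huillier;Anne;Frankreich;Physik;2023
"""

# Listing 2, line 1: five lazy groups separated by semicolons.
# re.MULTILINE (assumed here) lets ^ and $ match at every line boundary.
GENERIC = re.compile(r"^(.*?);(.*?);(.*?);(.*?);(.*?)$", re.MULTILINE)

# Listing 2, line 2: group 4 is limited to known subjects,
# group 5 to exactly four digits.
STRICT = re.compile(
    r"^(.*?);(.*?);(.*?);"
    r"(Physik|Chemie|Medizin|Literatur|Wirtschaftswissenschaften|Frieden);"
    r"(\d{4})$",
    re.MULTILINE,
)

# With several groups in the pattern, findall() returns one tuple per
# match; tuple index 0 holds group 1, index 1 holds group 2, and so on.
rows = GENERIC.findall(DATA)
strict_rows = STRICT.findall(DATA)

for name, vorname, land, fach, jahr in strict_rows:
    print(f"{vorname} {name} ({land}): {fach}, {jahr}")
```

On this data both patterns yield the same six tuples. The difference only shows on malformed input: a row whose last field is not a four-digit number still matches the first expression but not the second.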

Now it is clear what to expect in groups 4 and 5. Options like this…

