AI assistants

(Photo: Sam Rentmeester)

Mathematics occupies a unique position amongst academic disciplines: mathematicians can prove themselves right by demonstrating that they are right. But every advantage has a disadvantage: a theorem is only a theorem if you have a completely rigorous proof, and that sets the bar high. If your proof is incorrect – and mathematicians do sometimes overlook things – this will unavoidably be exposed by attentive colleagues. Your only option is to honourably acknowledge that you were wrong.

Occasionally, proofs are so complicated that hardly anyone can oversee all the details. How do you know if such a proof is correct? For want of anything better, until recently the mathematical community accepted it if several leading mathematicians stated that they believed everything was sound – proof by authority, so to speak. This changed with the emergence of formal proof checking. With the help of special programming languages designed for this purpose, such as Lean, it is possible to formalise proofs (which are usually written in informal English) and then completely reduce them to the axioms upon which mathematics is based.

Well-known examples of theorems that have been checked in this way are the prime number theorem (the number of prime numbers below a given number N scales as N/ln(N)); the four colour problem (every map can be coloured using four colours so that that countries with a common border have different colours); and, the Kepler conjecture (the stacking of oranges at the market stall is the most efficient). They may be seemingly simple assertions, at least for a mathematician, but all three are very difficult to prove. A nice quiz question: which of the three is the most difficult?

A nice quiz question: which of the three is the most difficult?

Formal proof checking is still the exclusive domain of a small group of aficionados, but with the emergence of generative AI this is beginning to change: you can try to generate the code for a formal proof check by using large language models (LLMs) such as ChatGPT. This is merely one of the many fascinating state-of-the-art applications of generative AI.

Many scientists now use commercial LLMs as personal assistants in their research. I too have built up some experience with this: it feels like collaborating with a PhD student who does sometimes make mistakes, but knows an incredible amount and is remarkably fast. As a lecturer, you can also benefit from LLMs. They can, for example, generate model solutions to problems in no time. Many students use an LLM as a personal teaching assistant – you can ask for hints and have your answers checked and commented on.

Fascinating possibilities loom, but also numerous ethical questions. May we collaborate with an LLM in research? May we delegate the writing of an article introduction to an LLM? May we deploy LLMs in writing grant proposals?

I suspect that many, including the undersigned, see few problems with the first scenario, provided it’s carefully applied – it won’t be long before LLMs are simply seen as a tool. The second question already begins to chafe, and with the third we are clearly on a slippery slope. Similar questions arise in education, where it will be difficult, for example, to draw a clear line with fraud.

Students have embraced AI en masse. In my scientific environment I see a number of enthusiastic early adopters, many people with a wait-and-see attitude or cautious interest, and the occasional person who is decidedly sceptical. The wider debate has yet to begin.

Jan van Neerven is Antoni van Leeuwenhoek Professor of Mathematics at the Delft Institute of Applied Mathematics (EEMCS), where he leads the Analysis section. He is the author of several books in his field and has received both Vidi and Vici grants from the Dutch Research Council (NWO). He is a member of the Royal Netherlands Academy of Arts and Sciences and President of the Royal Dutch Mathematical Society.

AI assistants

Read more

Luxury

Knowing everything

Interests