Large Language Models and Software Quality Assurance
One of the comparisons I've seen in defense of using LLM technology to generate code is that we, after all, use compilers to generate code from high-level languages. But compilers are by design predictable, and when they are not predictable, that is a bug; LLMs are stochastic by nature, and there is so far no reliable way to tell when they are in error short of careful examination of their output. Currently no one knows how to make LLMs predictable.

There is a second problem: if your coder never tells you, "You're wrong, that doesn't work," or, "I don't think that's what you want," you will never find errors in the specification; that is as true of a human coder as of an LLM.

There seems to be no way to bound the errors of LLM-generated code. I'm not even sure how to measure the errors; testing does not, and cannot, do this. As Dijkstra famously observed, "Testing shows the presence, not the absence, of bugs." The problem shifts from "Is your code corr...
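Dijkstra's point is easy to demonstrate concretely. Here is a minimal, hypothetical sketch (the leap-year function and its tests are my own illustration, not from any real codebase): every test in the suite passes, yet the function is wrong, because the suite never exercises the one class of input the implementation mishandles.

```python
def is_leap(year: int) -> bool:
    # BUG: implements only two of the three Gregorian rules.
    # The "divisible by 400" exception is missing entirely.
    return year % 4 == 0 and year % 100 != 0

# A plausible-looking test suite. All of these assertions pass.
assert is_leap(2024) is True    # ordinary leap year
assert is_leap(2023) is False   # ordinary common year
assert is_leap(1900) is False   # century year, correctly rejected

# The suite shows no bugs are *present in the tested cases*,
# but it cannot show bugs are *absent*:
print(is_leap(2000))  # prints False; the correct answer is True
```

The green test run proves nothing about the untested input space; only examining the code against the specification (here, the Gregorian calendar rules) reveals the defect.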