Posts

Showing posts from 2026

Large Language Models and Software Quality Assurance

One of the comparisons I've seen offered in defense of using LLM technology to generate code is that we, after all, use compilers to generate code from high-level languages. But compilers are by design predictable, and if they are not predictable, that is an error; LLMs are stochastic in nature, and there is so far no reliable way to tell when they are in error without careful examination of their output. Currently no one knows how to make LLMs predictable. There is a second problem: if your coder never tells you, "You're wrong, that doesn't work," or, "I don't think that's what you want," you will never find errors in specification; that is as true of a human coder as of an LLM. There seems to be no way to bound the errors of LLM-generated code. I'm not even sure how to measure the errors; testing does not, and cannot, do this. As Dijkstra famously observed, "Testing shows the presence, not the absence, of bugs." The problem shifts from "Is your code corr...
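Dijkstra's point can be made concrete with a minimal sketch. The function and its tests below are hypothetical, invented for illustration: the test suite passes, yet the code is wrong, because the bug lives in an input the tests never exercise.

```python
# A deliberately buggy leap-year check: it ignores the century rule
# (years divisible by 100 but not by 400, such as 1900, are NOT leap years).
def is_leap_year(year: int) -> bool:
    return year % 4 == 0

# This entire suite passes...
assert is_leap_year(2024) is True
assert is_leap_year(2023) is False
assert is_leap_year(2000) is True   # right answer, but for the wrong reason

# ...yet the function is incorrect: 1900 was not a leap year.
print(is_leap_year(1900))  # prints True; the bug survives the test suite
```

Green tests show only that no bug was present on the inputs tried; they say nothing about the inputs that were not.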

"AI" and Financial Software

Today, in a blog post, Anthropic published a claim that “Tools like Claude Code can automate the exploration and analysis phases that consume most of the effort in COBOL modernization.” They claim that Claude can be used to analyze and document existing accounting software so that it can be more quickly modernized. Has anyone else had the disorienting experience of reading an “AI” summary and realizing that it is wrong, wrong, wrong, but very neatly worded? Why does anyone think the summaries and documentation Claude generates will be any better? And there will be no way to check them…

The Errors of "AI"

We know that "AI" will create superficially valid natural language texts in authoritative, persuasive language. They are right a fair amount of the time, which persuades us they are reliable, but often they are wrong, and sometimes people are endangered by the errors. What will it do with programming-language texts? I would expect "AI" to create programs that appear to be right most of the time and sometimes just fail. A superficial examination of the generated code will not catch the problem any more than a superficial reading of the natural language texts uncovers their errors. I feel queasy. It is certainly going to happen that an "AI"-generated program will have a subtle bug that does a lot of harm, and that will occur even without malicious intent on the part of the users of the technology, let alone deliberate malice on the part of the owners. No one should trust a system managed by Sam Altman to produce honest answers, or take safety into account...

"AI" and Productivity

[I keep bringing these up on Bluesky, so I think it’s time to gather them up and make a post out of them.] This is a collection of articles on the problems of “AI.” “AI”—really, various sorts of generative machine learning models, including generative large language models (gLLMs) and generative stable diffusion models (gSDMs)—so far does not live up to the promises of its marketers. A computer you can talk to is one of the great dreams of computing, and the initial releases of transformer-model-based chatbots seemed to live up to this. There have been long-standing qualms about this idea, most notably Dijkstra’s argument that the imprecision of natural language was an impediment to correct thinking about computation and to accurate computing. 1 Unfortunately, so far it appears that Dijkstra was correct; gLLMs and gSDMs are notorious for errors, and they are not currently designed to indicate uncertainty to their users, so people confidently rely on their erroneous output. There ar...