"AI" and Productivity

[I keep bringing these up on Bluesky, so I think it’s time to gather them up and make a post out of them.]

This is a collection of articles on the problems of “AI”: really, various sorts of generative machine learning models, including generative large language models (gLLMs) and generative diffusion models (gDMs), which so far do not live up to the promises of their marketers. A computer you can talk to is one of the great dreams of computing, and the initial releases of transformer-based chatbots seemed to fulfill it. There have been long-standing qualms about the idea, most notably Dijkstra’s argument that the imprecision of natural language is an impediment to correct thinking about computation and to accurate computing.1 Unfortunately, so far it appears that Dijkstra was right: gLLMs and gDMs are notorious for errors, and because they are not currently designed to indicate uncertainty to their users, people confidently rely on their erroneous output. There are experimental gLLMs that do register uncertainty,2 but so far these have not been widely deployed.

Something Dijkstra did not foresee was the human tendency to accept glib confidence as a sign of knowledge and intelligence. Unfortunately, as author Karawynn Long observed, Language Is a Poor Heuristic for Intelligence, and verbally oriented managers, executives, and lawyers take the glib confidence of gLLM-generated language as a sign of intelligence on the part of gLLMs. As a result, the technology is being widely deployed without testing or study of its impact on users.

There is an additional problem of quasi-religious belief in the technology: the scaling hypothesis3 argues that a sufficiently large gLLM will eventually spontaneously achieve sapience. This does not appear to be the case, but the hypothesis’s proponents, under the influence of large amounts of money and drugs, ask for ever-increasing amounts of money and drugs.

So this is a collection of articles that document some of the problems of “AI.” I’ve linked the pieces and added a summary after the links. If the summary is in quotes, it’s pulled from the article, possibly with some elisions.

Overviews

Management Studies

Large Language Models, Small Labor Market Effects
“[In Denmark] AI chatbots are now widespread—most employers encourage their use, many deploy in-house models, and training initiatives are common. Despite substantial investments, economic impacts remain minimal. AI chatbots have had no significant impact on earnings or recorded hours in any occupation.”

AI-Generated “Workslop” Is Destroying Productivity
“Despite a surge in generative AI use across workplaces, most companies are seeing little measurable ROI. One possible reason is because AI tools are being used to produce ‘workslop’—content that appears polished but lacks real substance, offloading cognitive labor onto coworkers. Research from BetterUp Labs and Stanford found that 41% of workers have encountered such AI-generated output, costing nearly two hours of rework per instance and creating downstream productivity, trust, and collaboration issues.”

Education

Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task
“While LLMs offer immediate convenience, our findings highlight potential cognitive costs. Over four months, LLM users consistently underperformed at neural, linguistic, and behavioral levels. These results raise concerns about the long-term educational implications of LLM reliance and underscore the need for deeper inquiry into AI’s role in learning.”

They Have Their Doubts.
“‘What I found was that everything that ChatGPT returned about my piece was incorrect,’ Perry said. ‘The composer was right, but the composition date and other facts about the piece were either half-truths or not accurate at all. So in my summary, I was like, This was not useful, and in fact, it wasted my time.’ And yet, Perry finds herself holding her own against an institute that’s ready to promote the use of A.I. everywhere, no matter the field.”

Specific Disciplines

Next are some articles that cover the use of various sorts of generative machine learning technologies in different disciplines.

Software Development

After months of coding with LLMs, I’m going back to using my brain.
“LLMs are okay at coding, but at scale they build jumbled messes. I’ve scaled back my use of AI when coding and gone back to using my brain and pen and paper.”

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity.
“Surprisingly, we find that when developers use AI tools, they take 19% longer than without—AI makes them slower.”

Where’s the Shovelware? Why AI Coding Claims Don’t Add Up.
“I was an early adopter of AI coding and a fan until maybe two months ago, when I read the [study cited above]. In that study, the authors discovered that developers were unreliable narrators of their own productivity. So, I started testing my own productivity using a modified methodology from that study. It turns out it doesn’t just not work for me, it doesn’t work for anyone.”

Physics

I got fooled by AI-for-science hype—here’s what it taught me.
“Nick used to be optimistic that AI could accelerate physics research. But when he tried to apply AI techniques to real physics problems the results were disappointing.”

Materials Science

Whoa Now: Cautionary Tales from Materials Science.
Derek Lowe’s summary of a fraudulent paper, followed by a discussion of the actual state of the art, which is poor.

Biomedicine

The End of Disease.
Derek Lowe: “We don’t have enough pieces on the table to solve this puzzle. We don’t even have enough in most of these areas to know quite what kind of puzzle we’re even working on. AI/ML can be really, really good at rearranging the pieces we do have, in the limited little areas where we have some ground-truth knowledge about the real-world effects when you do that. But it will not just start filling in all those blank spots.”

FDA’s AI tool for medical devices struggles with simple tasks.
“The agency announced Monday that an AI tool, called Elsa, had been rolled out to all FDA employees. Sources say it also has issues.”

Conclusions

“AI” is error-prone, vastly oversold, and popular. There is a cotton-candy quality to the technology as it currently exists: sweet, but nothing to bite into. One issue that Derek Lowe touches on is worth emphasizing: as a matter of marketing, many machine learning (ML) technologies are being rebranded as “AI,” which both casts undeserved doubt on those ML technologies and steals the credibility of reliable ML to prop up unreliable “AI.”

Many of the issues of “AI” have to do with its effects on human cognition. We are accustomed to treating computer output as reliable, and this is reinforced by the design of these systems: current gLLMs sound authoritative, even as they blither, and many people in positions of authority trust that language, even after watching a gLLM produce egregious errors. In this piece I have discussed productivity, and how “AI” provides an illusion of improved productivity while actually impeding work, but there are other, more dangerous illusions: reports of people driven to madness, and reports of gLLMs assisting in suicides. There are not yet enough published articles on this for me to write about it, but it may ultimately prove to be the greater issue.


  1. On the Foolishness of Natural Language Programming

  2. OpenAI Has a Fix For Hallucinations, But You Really Won’t Like It

  3. first proposed in science fiction, though not under that name
