Joel Spolsky wrote an interesting post more than a decade ago about how Trello and Excel are horizontal products: horizontal in the sense that they serve not only different industries but also different use cases.
Excel is a good-enough tool for accounting, project management, financial forecasting, data visualization, … The list is long.
LLMs are very similar in that sense: first they blow your mind because you can use natural language with them, then they seem relevant across many domains and uses.
But while Excel can be used as a project management system, it lacks the polish of a dedicated tool: the feature coverage isn't there, and it's ugly and fragile. This makes Excel both the software equivalent of a Swiss Army knife and a poor man's tool.
I have the feeling that LLMs suffer a similar fate. Besides short text translations and general concept explanations, on which their performance is impressive, the rest feels shallow: nothing seems to reach a satisfactory, productive depth of expertise and reliability past the initial "wow effect".
You can use them for almost anything, and they will fall short on almost everything. Akin to Swiss Army knives, you seldom use them because they are rarely the best tool for the job.
They are great for basic support on almost any topic you can think of: medicine, math, history, economics… You can build prototypes for many use cases: virtual friend, code analysis, information retrieval, creative writing, customer support… And the prototypes are compelling; the flaws are easy to ignore because we are still in awe of the natural interaction, and because these are prototypes (so we hope to fix the flaws later).
But each fix is superficial: the band-aid is there, but water is still coming into the boat:
- LLMs are slow and memory-hungry? Quantization (rounding the weights) helps a lot, but it also undoes previous alignment efforts.
- Your model lacks knowledge? Build your own custom GPTs! I have tried that many times; I have never seen the approach work.
- LoRA enables cheap fine-tuning, but it is very restrictive.
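To make the quantization trade-off concrete, here is a minimal sketch of symmetric int8 quantization (a generic illustration of the idea, not any particular inference library's scheme): each float32 weight is scaled into [-127, 127] and rounded to one byte, which is where both the space savings and the precision loss come from.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric int8 quantization: scale into [-127, 127], then round.

    Rounding is what saves space (1 byte per weight instead of 4) --
    and also what erases small weight adjustments, such as those
    produced by alignment fine-tuning.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 codes back to approximate float32 weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Reconstruction error is bounded by half a quantization step (s / 2):
print(np.abs(w - w_hat).max())
```

The per-weight error is at most half a quantization step, which sounds negligible until you remember that fine-tuning deltas can be that small.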
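And to show why LoRA is restrictive, here is a sketch of the core idea (shapes and rank are illustrative assumptions, not any specific library's API): the frozen weight matrix W is adapted as W + B·A, where B and A are small trained matrices of rank r. The update can only ever be rank-r, which is exactly what makes it cheap and what limits it.

```python
import numpy as np

d, k, r = 512, 512, 8  # layer dimensions and a typical small adapter rank
rng = np.random.default_rng(0)

W = rng.standard_normal((d, k)).astype(np.float32)  # frozen base weights
B = np.zeros((d, r), dtype=np.float32)              # trained adapter, init to 0
A = rng.standard_normal((r, k)).astype(np.float32)  # trained adapter

# The forward pass uses the adapted weights; at init, B @ A == 0,
# so the adapted model starts out identical to the base model.
W_eff = W + B @ A

# Trainable parameters: r * (d + k) instead of d * k.
full = d * k
lora = r * (d + k)
print(full, lora)  # 262144 vs 8192 -- about 3% of the full matrix
```

Training roughly 3% of the parameters is the appeal; the low-rank constraint on what the model can learn is the restriction.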
You can pull the blanket whichever way you want; when it's too short, it's too short: LLMs are shallow, and most patches around that are hacks.
Those are not fixes or solutions; they are hacks. A lot of good things come out of engineering hacks, but they are often a sign that some limit has been reached.
To be fair, most forward leaps in neural networks have been hacks: momentum, quantization, and the most successful of them all, RLHF… Maybe this shallowness is simply where the current frontier lies.