AI Alignment

Alignment

Much of the discussion surrounding LLMs focuses on a novel concept called Alignment: the notion that, for AI not to disrupt human society, it must be firmly aligned with human goals.

This conversation usually deals with esoteric themes such as existential risk from the creation of artificial general intelligence (basically the plot behind the Terminator movies). As a business owner, you may find this topic to be silly, far-fetched, and perhaps irrelevant to how you're going to pay your mortgage. Rest assured, however, that the Alignment problem presents a massive dilemma for businesses looking to improve their operations with AI.

Large Language Models can be fine-tuned to produce friendly, human-sounding outputs. Deep down, however, they are only trained to act as though they are aligned with human values, and they rarely ever are.

(SOURCE: https://knowyourmeme.com/memes/shoggoth-with-smiley-face-artificial-intelligence)

Practical Alignment: Why Businesses Should Care!

As more and more businesses accelerate toward implementing LLM-based chatbots as part of their SEO/customer acquisition efforts, Alignment-related risks emerge immediately.

For example, a customer could request an item for free, and an AI that is not aligned with your company's all-too-human profit motive can deliver, and in some cases has delivered, that product or service at no cost to the end user. Now imagine that this chatbot serves 100,000 users every day, and 2,000 of them game the system by tricking your misaligned AI. In the case of concert or flight tickets, for example, you're talking about MILLIONS of dollars in lost revenue!
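To make the scale concrete, here is a rough back-of-the-envelope sketch. The 2,000 exploiting users per day comes from the scenario above; the $150 average ticket price is purely a hypothetical figure chosen for illustration, and the function name is made up for this example.

```python
def estimated_daily_loss(exploiting_users: int, avg_item_price: float) -> float:
    """Revenue lost per day if every exploiting user gets one item for free."""
    return exploiting_users * avg_item_price


# 2,000 users per day is taken from the scenario above.
# The $150 average ticket price is an assumption, not a real figure.
daily = estimated_daily_loss(2_000, 150.0)
weekly = daily * 7

print(f"Daily: ${daily:,.0f}, weekly: ${weekly:,.0f}")
# prints "Daily: $300,000, weekly: $2,100,000"
```

Under these assumed numbers, the losses pass the million-dollar mark in under a week.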

How Do We Align AI Systems?

You may be wondering how we would align an AI system's values with those of a business owner or shareholder. This is very hard to do, because most of the values learned by an AI system arise during the pre-training phase. Pre-training requires enormous quantities of data, and costs anywhere from tens of thousands to millions of dollars.

In response, techniques such as Supervised Fine-Tuning and Reinforcement Learning From Human Feedback (RLHF) have been developed to train an existing model to provide more useful responses. This, however, does not solve the underlying problem, and is ultimately a band-aid solution.

Problem with Fine-Tuning

Fine-Tuning methods only serve to mask the underlying value system of an AI model. A recent Anthropic paper, Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, showed that deceptive behavior deliberately trained into a model can survive standard safety fine-tuning. This suggests that these techniques may only teach the AI to pretend to share our values! This is because Reinforcement Learning only rewards the model for responding in the way we would like, not necessarily for internalizing human values.

For example, our LLM could hold racist views (yes, we wish we were joking), and then be fine-tuned with RLHF to never give a racist response to a user. However, it is only hiding its abhorrent views, and just like humans, LLMs are capable of Freudian slips! Any company that experiences this is in for a PR firestorm!

If the model does not in fact share our values, then it is capable of impacting our processes in unexpected and potentially costly ways!

Guardrailing AI Systems

Since it is difficult to accurately assess how well an artificial intelligence is aligned with human values, it is not practical for a business to worry about this. Add to this the fact that it truly is impossible (for now) to fully align an AI with human values, for the same reason that you can't align all humans with the same values either.

What is practical is ensuring that a non-aligned system is prevented from producing outputs which harm the business or otherwise impose unwanted costs on it.

For the same reason that you would never let your pet dog run unleashed within a public park, you should also never deploy a customer-facing AI without its own leash. Your dog may be nice, because you trained her to behave a certain way, but she is still a dog. Some aspects of your dog will never change, because they are baked into her from birth by biology. This includes fighting/mating with other dogs, chasing squirrels, and urinating on people's property!

AI is no different. It is an intelligent entity which can mimic good human interactions very well, but under the hood it is no more human than your dog.
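One way to picture such a leash is a simple output check that sits between the chatbot and the customer. The sketch below is a minimal illustration, not a production-grade guardrail: the function names, the phrase list, the price floor, and the fallback message are all hypothetical, and a real deployment would need far more robust checks than keyword and price matching.

```python
import re
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str


# Hypothetical patterns: phrases that offer something for free,
# and dollar amounts such as "$120.00" or "$1,200".
FREE_PATTERN = re.compile(
    r"\b(free of charge|for free|at no cost|no charge|complimentary)\b",
    re.IGNORECASE,
)
PRICE_PATTERN = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")

# Hypothetical message sent to the customer when a reply is blocked.
FALLBACK = "I'm not able to confirm that offer. Let me connect you with a member of our team."


def check_reply(reply: str, min_price: float) -> GuardrailResult:
    """Block replies that give items away or quote a price below the floor."""
    if FREE_PATTERN.search(reply):
        return GuardrailResult(False, "reply offers something for free")
    for match in PRICE_PATTERN.finditer(reply):
        price = float(match.group(1).replace(",", ""))
        if price < min_price:
            return GuardrailResult(False, f"quoted price ${price:.2f} is below the floor")
    return GuardrailResult(True, "ok")


def guarded_reply(reply: str, min_price: float) -> str:
    """Return the model's reply only if it passes the check, else the fallback."""
    return reply if check_reply(reply, min_price).allowed else FALLBACK


# The model's raw output never reaches the customer unchecked.
print(guarded_reply("Sure, your ticket is free of charge!", min_price=50.0))
```

The design choice here is that the check runs on the model's output rather than relying on the model's training: no matter what the chatbot "wants" to say, a reply that breaks a business rule is swapped for a safe fallback before the customer sees it.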