Hitting the off-switch problem

No, no, I’m not hitting the off switch, although you could be forgiven for thinking so given my frequency of blogs. This is in response to a recent lecture by Stuart Russell I attended at the 2019 DX Expo.

In this interesting talk, one of the topics was the off-switch problem, described on Wikipedia and no doubt in his latest book. This problem can be summarised as follows:

“A robot with a fixed objective has an incentive to disable its own off-switch.”

This is about who, or what, has control. Are humans able to turn the robot off if its objective does not align with ours?

The theory goes that you give the robot a positive incentive to turn itself off in situations where it determines that the outcome of its actions is uncertain.
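To make the incentive concrete, here is a toy sketch of how I understand the idea. It is my own illustration, not code from the talk. The assumptions are all mine: the robot's uncertainty is a list of equally likely utility values for its proposed action, switching off is worth zero, and a human who is asked first will only let the action go ahead when its true utility is positive. The names option_values and best_option are made up for this example.

```python
from statistics import mean


def option_values(utility_samples):
    """Expected value of each option under the robot's belief.

    utility_samples: equally likely guesses at how much the human
    actually values the robot's proposed action (an assumption of
    this toy model).
    """
    # Act without asking: the robot gets whatever the utility turns out to be.
    act = mean(utility_samples)
    # Switch itself off: nothing happens, so the value is zero.
    switch_off = 0.0
    # Leave the switch with the human: bad outcomes (negative utility)
    # are vetoed and become zero, good ones go ahead.
    defer = mean([max(u, 0.0) for u in utility_samples])
    return {"act": act, "switch_off": switch_off, "defer": defer}


def best_option(utility_samples):
    """Name of the option with the highest expected value (ties go to the first listed)."""
    values = option_values(utility_samples)
    return max(values, key=values.get)


uncertain_belief = [-4.0, -1.0, 2.0, 5.0]  # robot is unsure the action is wanted
certain_belief = [3.0, 3.0, 3.0, 3.0]      # robot is sure the action is wanted

print(best_option(uncertain_belief))  # defer
print(best_option(certain_belief))    # act
```

With the uncertain belief, leaving the switch in human hands is worth 1.75 against 0.5 for acting alone, so the robot has a reason to keep the switch working. With the certain belief, deferring gains nothing, which is the fixed-objective case in the quote above. Note that the whole thing depends on the robot running this particular calculation.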

I have two problems with this theory.

The first is that a physical “robot” acting in the real world is equated with the AI. This can be misleading. Robots and AI are two different concepts. It’s true of course that some or most robots will run AI software, but it’s not true that all AI needs direct control of a physical actor to achieve its goals. That can be done by manipulation of data and “human engineering”. The physical presence is almost irrelevant. The problem we are trying to solve is one of control. And there’s no off switch for the internet.

That leads me to the second objection. Implementing an algorithm that makes the robot turn itself off just moves the problem from the physical switch to the controlling algorithm. It assumes that we control the algorithm and the code. The real danger of losing control of AI comes when AI software becomes intelligent enough to write itself. All the theory does is move the problem, arguably into a space where it is harder to solve. At best, probabilistic programming is a short-term solution which only lasts as long as we control the code.