when you’re a collection of organic molecules named carl sagan
You may or may not have heard about Eugene Goostman, the computer program created by Vladimir Veselov that has been lauded as achieving an important milestone in artificial intelligence by passing the Turing Test. The claim has been met with controversy: some say other chat bots have already passed the test, others think it’s not a big deal, and then there are those who say that this doesn’t even qualify as a pass at all. This stuff really interests me, so I spent some time looking into it.
Let me back up a bit and provide some super BRIEF reference for those who didn’t take Comp Sci 101 or read Neal Stephenson or William Gibson or otherwise don’t know about Alan Turing and the Turing Test (just in case). Alan Turing (1912 – 1954) was a mathematician widely considered the father of artificial intelligence and computer science. From him we get the Turing Test: in 1950, Turing proposed that a computer could be said to “think” if a human interrogator could not tell it apart, through conversation, from a human being. http://eprints.soton.ac.uk/262954/
Onwards to the test...
Here are some specifics for the testing Eugene Goostman was involved in that took place at the University of Reading (from their site):
- Simultaneous tests as specified by Alan Turing
- Each judge was involved in five parallel tests - so 10 conversations
- 30 judges took part
- In total 300 conversations
- In each five minutes a judge was communicating with both a human and a machine
- Each of the five machines took part in 30 tests
- To ensure accuracy of results, Test was independently adjudicated by Professor John Barnden, University of Birmingham, formerly head of British AI Society
The Eugene Goostman program itself was designed to portray a 13-year-old Ukrainian boy for whom English is a second language.
The program would qualify as passing the test if it was mistaken for a human more than 30% of the time during a series of five-minute keyboard conversations (although Turing himself didn’t actually specify the 30% criterion or the five-minute time limit).
Eugene convinced 33% of the judges.
Here’s a snippet of a conversation from a 2012 competition (Eugene’s been in development since 2001):
[16:11:25] Judge: what music do you like?
[16:11:32] Eugene: I like Eminem – because he doesn’t have a thing to do with “polite-correctness”.
[16:11:46] Judge: OMG i love eminem, he’s my favourite artist!
[16:12:00] Eugene: I wonder what Eminem’s nick mean. I associate it only with a word “enema”. But he is a cool guy, I like his “The Real Slim Shady”.
[16:12:55] Judge: i know what his nick name means, his real name is Marshall Mathers. From the initials M&M phynetically sounding like Eminem.
[16:13:06] Eugene: Eminem is a famous crapper. No, raper… ripper… Yeah, I mean - rapper. I like his “Stan” and “The Real Slim Shady”.
I think the first place where the controversy starts is simply with the Turing Test itself. When Turing proposed it, people thought of computers differently than we do today. PEOPLE were the computers: the ones tasked with adding, subtracting, multiplying, and dividing. The thought that machines could ever THINK like a human being at all was utterly foreign and nonsensical to the majority of people. Turing was in essence proposing a thought experiment.
This in turn makes the test itself rather arbitrary. There’s a difference between true cognition and a simulation of human conversation aided by voluminous vocabulary variations and scripts. Some might interpret the test as an indicator of sentience, when it really isn’t an appropriate test for that. It is still certainly a stepping stone, though.
In addition, it’s pretty easy to game the requirements while still technically fulfilling them. Choosing the persona of a 13-year-old Ukrainian boy for whom English is a second language obviously lowers the barriers to passing. Who your judges are matters as well. The sex chatbot Jenny18 (a modified version of ELIZA) managed to fool a whole lot of chat room users. With sexy results :P
Even so, I wouldn’t discount this event, both because of the formal test environment and because the test basically shows how gullible humans can be in a purely text-based situation. Chat bots that are convincing enough to steal people’s sensitive information are possible right now, whereas Skynet, not so much.
The iconic Turing Test itself, though, seems outdated, or at least insufficient, for testing what I think people in general are really thinking of when they hear about an event like this: sentience. And of course that strikes at the root of something we’ve been striving to understand about ourselves for just about as long as we’ve existed. Step by step, we’ve made some progress in unraveling our own secrets. But as our understanding of human thought processes changes, so too should the ways we test them.
"At first we say that 'if a computer could play chess that it would think like us', and then we get a computer to play chess and we say 'well that's really not thinking', and the answer is that we don’t really know what thinking is. I would argue that machines do a pretty good job right now.. at thinking.
And they don’t do as good a job at creating, although we don’t really know what creating is. And they don’t do a very good job at having a soul, but we don’t really know what a soul is.
But when we can define it, computers do a pretty good job doing it.”
-Joseph M. Rosen