Yes, yes, of course a calculator won at a math competition. That's not the point. This story, which concerns a rather amazing program called GeoS from the Allen Institute for Artificial Intelligence (AI2), is about the ability of AI to usefully engage with the world. To a computer, with a brain literally structured for these sorts of operations, the math SAT is not a test of math, but of reading comprehension. That's why this story is so interesting: GeoS isn't as good as the average American at geometry, it's as good as the average American at the SAT itself.

Specifically, this AI program was able to score 49% accuracy on official SAT geometry questions, and 61% on practice questions. The 49% figure is basically identical to the average for real human test-takers. The program was not given digitized or specially labeled versions of the test, but looked at the exact same question layout as real students. It read the writing. It interpreted the diagrams. It figured out what the question was asking, and then it solved the problem. It only got the answer about half the time, which makes it roughly as fallible as a human being.

Of course, GeoS makes errors for different reasons than high-schoolers. A human being might correctly interpret the question, and then apply the wrong formula, or muck up the calculation. GeoS, being a computer, will nearly always get the correct answer so long as it truly understands the question. It might not be able to read a word correctly, or the grammar of a question might be too confusing for the computer to parse. Regardless, what we're really measuring here is the computer's ability to understand human communication in a form that's deliberately (pardon the pun) obtuse.

To do this, the researchers had to nail together a whole assortment of different software technologies. GeoS uses optical character recognition (OCR) algorithms to read the text, and custom language processing to try to understand what it reads. Geometry questions are structured to be difficult to parse, hiding important information as inferences and implications.

The other side of the coin is that though geometry questions are dense and hard to tease apart, they're also extremely uniform in structure and subject matter. The AI's programmers can plan for the strict design principles that go into writing the questions. It couldn't take this same programming and directly apply it to calculus problems, for example, because they use somewhat different language and mathematical symbols to describe the problem. But a good GeometryBot would also be relatively easy to adapt to those few distinguishing rules. Each successive new area of competence would make the next one easier to acquire.

One intriguing implication of this research is that someday, we might have algorithms quality-checking SAT questions. We could have different AI programs intended to achieve different levels of success on average questions, perhaps even for different reasons. Run proposed new questions through them, and their relative performance could not only weed out bad questions but point to the source of the problem. BadAtReadingAI and BadAtLogicAI did as expected on the question, but BadAtDiagramsAI did terribly; maybe the drawing simply needs to be a little clearer.
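To make that idea concrete, here is a minimal sketch of what such a check might look like. Everything in it is hypothetical: the three AI names are just the joke labels from the paragraph above, the accuracy figures are invented placeholders rather than results from the GeoS research, and the `flag_underperformers` function and its `tolerance` cutoff are assumptions made up for illustration.

```python
from typing import Dict, List


def flag_underperformers(
    expected: Dict[str, float],
    observed: Dict[str, float],
    tolerance: float = 0.15,
) -> List[str]:
    """Return the solvers that fell short of their usual accuracy.

    expected:  each solver's typical success rate on an average question
    observed:  each solver's success rate on the candidate question
    tolerance: how large a shortfall we ignore as noise (an arbitrary choice)
    """
    # Positive shortfall means the solver did worse than usual.
    shortfalls = {name: expected[name] - observed[name] for name in expected}
    flagged = [name for name, gap in shortfalls.items() if gap > tolerance]
    # Worst offenders first, since they hint at where the question is unclear.
    return sorted(flagged, key=lambda name: shortfalls[name], reverse=True)


# Illustrative numbers only; none of these come from a real experiment.
expected = {"BadAtReadingAI": 0.40, "BadAtLogicAI": 0.45, "BadAtDiagramsAI": 0.50}
observed = {"BadAtReadingAI": 0.42, "BadAtLogicAI": 0.44, "BadAtDiagramsAI": 0.10}

suspects = flag_underperformers(expected, observed)
print(suspects)  # ['BadAtDiagramsAI']
```

With these made-up figures, only the diagram-weak solver is flagged, which would suggest a second look at the drawing rather than at the wording or the logic of the question.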

This isn't a sign of the coming AI-pocalypse, or at least not a particularly immediate sign; as dense as geometry questions might be, they're homogeneous and nowhere near as complex as something like conversational speech. But this study shows how the individual tools available to AI researchers can be assembled to create rather full-featured artificial intelligences. Things will really take off when those same researchers start snapping together those amalgamations into something far more versatile and full-featured, something not entirely unlike a real biological mind.