Skip to main content

This CalypsoAI thought piece digs into the ‘why now’ question behind the AI explosion.

Researchers, academics, and philosophers have been trying to endow machines with human-like capabilities since antiquity, but most historians would trace the beginning of AI as we know it today to the Dartmouth Summer Research Project on Artificial Intelligence over the summer of 1956 (“Dartmouth workshop,” 2020). In the 60+ years since that kickoff, researchers have tried many techniques to model intelligence, with two major schools of thought emerging – with “Classical AI” as distinct in theory and practice from “Modern AI”.

Classical AI relies on machines operating autonomously within a set of logical rules. Building AI meant representing the world in a set of data structures (such as trees or lists or sets) and then using rules (such as and, or, if-then-else, and so on) to reason within that knowledge space. For example, one can create rules-based translation software by representing language as a set of words, and to then perform machine translation by translating those words from one language to another, and then reordering the words within defined rules regarding the order of nouns and verbs and adjectives. Likewise, an individual can try to solve computer vision recognition problems by first establishing the rules of what makes certain objects, themselves. For example, one can start an animal classification program by first describing cats as “four legged animals with whiskers” and then decompose that definition into a set of sub problems ( e.g. – find legs, find whiskers) and continue breaking these problems into more detailed problems (e.g. – ‘find edges’, separate foreground and background).

Although the Classical school enjoyed certain successes (in 1967 Marvin Minsky famously said that “within a generation…the problem of creating ‘artificial intelligence’ will substantially be solved.” (“History of artificial intelligence,” 2020)), eventually the efforts ran into a wall. Although rules-based systems offer an easily-understood mechanism to build autonomy, researchers soon discovered that the world is too complex and messy to be fully captured into a closed representation system. This meant that many early AI experiments succeeded in carefully-controlled environments but failed to generalize to real-world situations. This led to a number of “AI Winters”, when funding and research on autonomous systems came to a halt.

In contrast to Classical AI, Modern AI relies on letting the computer derive its own rules and logic about the world, by feeding it labeled (and sometimes unlabeled) data. Using this method, instead of attempting to describe to the computer what features a cat has, an AI developer will instead feed it thousands of pictures of cats. With a large-n amount of cats, the machine is (often but not always) able to extract the relevant features autonomously, without a human’s explicit programming. Another name for Modern AI is Machine Learning or “ML”, which is also the term we will be using throughout this report.

As a separate field, ML started to flourish in the 1990s. The field changed its goal from achieving artificial intelligence to tackling solvable problems of a practical nature. It shifted focus away from the symbolic approaches it had inherited from AI, and toward methods and models borrowed from statistics and probability theory. But the real inflection point for ML came in 2012 with the ImageNet competition, when an Artificial Neural Network or “ANN”-based submission called AlexNet outclassed all other competitors by 10.8 %. It was a historic moment because up to that point, ANNs (computing systems vaguely inspired by the biological neural networks that constitute animal brains) were considered nothing more than a research tool. Their success in this competition changed that perception.

So what exactly was the cause of ML development and applicational use between the 1990s and 2012? Contrary to popular belief, ANNs themselves were not invented in that period. In fact they were first described by Warren McCulloch and Walter Pitts in 1943 (Palm, 1986) and research on them continued throughout the second half of the century. Instead, two major developments occurred which allowed ML development to succeed:

An increase of labeled data. By design, ML systems learn from data. Therefore, having access to abundant data to feed the algorithm is paramount in achieving meaningful results. Moreover, for most ML applications the data must be labeled (e.g. a picture of a cat has to have the appropriate “cat” label associated). In the years leading up to 2012, large amounts of labeled data were generated, powered in large part due to the growth of digital technology and the internet.

Access to more powerful and cheaper computing capabilities. Beyond data, ML algorithms need many compute cycles to “look” at thousands (and often millions) of examples before they can learn the right features. Leading up to 2012, computing power became more abundant, cheaper, while at the same time more effective implementations of algorithms arrived for GPUs (which are better suited for ML, compared to CPUs).

Put together, data, compute and ANNs started a new era in ML known as Deep Learning or “DL”.

Diagram of three nested circles. Outer circle is Artificial Intelligence. Middle circle is Machine Learning. Inner circle is Deep Learning
 DL is a subset of ML, which in turn is a subset of AI, the broadest category of techniques aimed at modeling intelligence.

The reason for the name ‘deep learning’ is that ANNs can be made many hundreds of layers “deep” (figure [2]). This property of deeply layered processes is also what gives ANNs a unique property compared to other ML algorithms – they don’t saturate when fed increased amounts of data, and instead keep learning and improving (figure [X]). With the labeled data explosion and increased compute power reference prior, ANNs have become a critical AI/ML methodology.

Deep learning’s primary advantage over other ML methods is its constant unsatiety with more training data.

A graphical representation of an Artificial Neural Network with 3 layers - an input layer, a hidden layer, and an output layer.
A graphical representation of an Artificial Neural Network with 3 layers – an input layer, a hidden layer, and an output layer.


Put together, the explosion in compute power at low cost and labeled data has created an environment ripe for AI capabilities to expand. Learn more about our AI Validation Solution VESPR Validate here.