"Unified Dog Theory I: Understanding Pattern Recognition."
Is Learning Shaped By Consequences or By Detecting Environmental Changes?
Originally published in slightly different form on September 16, 2010 at PsychologyToday.com.
I’ve spent a good deal of time here emphasizing the differences between the two most common forms of pet dog training—the pack leader and behavioral science models—and contrasting them with the approach I use, which is more closely allied with the way working dogs are trained (primarily through stimulating and then satisfying their prey drive). With that in mind, I’m now proposing a “unified theory of dog training,” which will hopefully show how all three models are related, why some methods work better in some training situations than in others, and why each model sometimes fails.
First stop on our journey: understanding pattern recognition.
There’s a tendency among +R trainers to believe that their method is based on “the science of how animals learn,” when in fact there are still many gaps in our knowledge about how learning actually takes place. For instance, the idea that dogs learn by making associations between a behavior (“I sit”) and its consequences (“I get a treat”) may not actually be the case; a growing body of clinical research, particularly in neuroscience, strongly suggests that learning may follow a very different set of rules than the ones we’ve previously been taught.
Dr. Ian Dunbar, one of the main figureheads in the +R movement, recently wrote on his blog that, “The first gift that we can give to all animal owners, parents and teachers is to simplify the ridiculously ambiguous and unnecessarily complicated and confusing [behavioral science] terminology. Second, let’s simplify the underlying theory by going back to Thorndike’s original premise—that behavior is influenced by [its] consequences.”
As I’ve pointed out before, this idea of how pleasant or unpleasant outcomes shape behavior can be traced directly back to Freud’s “pleasure principle”—we tend to be attracted to things that increase pleasure (or decrease internal tension) and to avoid things that do the opposite. However, new research suggests that both Dunbar and I may be wrong, that behavior is not learned via its consequences.
I think one of the biggest misunderstandings about positive reinforcement is the idea that animals learn new behaviors primarily because a neurotransmitter called dopamine creates a feeling of well-being in connection with an external reward, and that even the anticipation of a reward releases dopamine.
Here’s what Wikipedia has to say: “Dopamine is commonly associated with the reward system of the brain, providing feelings of enjoyment and reinforcement to motivate a person to perform certain activities.”
That sounds about right, doesn’t it?
Yes, but here’s the problem. In testing this idea directly on the brains of certain animals (mainly rats, mice, and monkeys), some researchers have found an interesting set of anomalies. For instance, in his paper “Dopamine and Reward: Comment on Hernandez et al. (2006),” neuroscientist Randy Gallistel of Rutgers writes, “In the monkey, dopamine neurons do not fire in response to an expected reward, only in response to an unexpected or uncertain one, and, most distressingly of all, to the omission of an expected one.”
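In computational terms, this pattern is often described as a “prediction error”: the signal tracks the mismatch between what was expected and what actually happened, not the reward itself. Here’s a tiny sketch of the idea; the function name and numbers are mine, purely for illustration:

```python
def surprise_signal(expected_reward, actual_reward):
    """Illustrative 'dopamine-like' signal: the magnitude of the
    mismatch between expectation and outcome, not the reward itself."""
    return abs(actual_reward - expected_reward)

# A fully expected reward produces no signal...
print(surprise_signal(expected_reward=1.0, actual_reward=1.0))  # 0.0

# ...an unexpected reward produces one...
print(surprise_signal(expected_reward=0.0, actual_reward=1.0))  # 1.0

# ...and so does the omission of an expected reward.
print(surprise_signal(expected_reward=1.0, actual_reward=0.0))  # 1.0
```

On this toy account, the monkey’s response to a missing reward isn’t a puzzle at all: the omission is just as big a departure from expectation as a surprise reward is.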
So missing out on a reward is pleasurable? How could that be?
Feedback or Feedforward?
In another article, “Deconstructing the Law of Effect,” Gallistel poses the problem of learning from an information theory perspective, contrasting Edward Thorndike’s model, which operates as a feedback system, with a feedforward model based on Claude Shannon’s information theory.
It’s well known that shaping animal behavior via operant or classical conditioning requires a certain amount of time and repetition. But in the feedforward model, learning can take place instantly, in real time.
Why the difference? And is it important?
I think so. Which is more adaptive, being able to learn a new behavior on the fly, in the heat of the moment, or waiting for more and more repetitions of the exact same experience to set a new behavior in place?
In Thorndike’s model, the main focus is on targeting which events in a stream of events seem to create changes in behavior. But according to information theory, the intervals between events, when nothing is happening, also carry information, sometimes even more than is carried by the unconditioned stimulus itself. This would explain why the monkeys’ brains were producing dopamine when they detected a big change in the pattern of reward, i.e., no reward at all!
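Shannon’s framework even lets us put a number on this. The rarer an event, the more information (measured in bits) its occurrence carries. A quick calculation with toy probabilities of my own choosing:

```python
import math

def surprisal_bits(probability):
    """Shannon surprisal: the information carried by an event,
    given its probability. Rarer events carry more bits."""
    return -math.log2(probability)

# Suppose reward follows 90% of trials. Then the *absence* of
# reward (a 10% event) is by far the more informative outcome:
print(surprisal_bits(0.9))  # ~0.15 bits
print(surprisal_bits(0.1))  # ~3.32 bits
```

In other words, if rewards are the norm, a quiet interval where the expected reward never arrives is exactly the kind of event an information-driven learner should sit up and notice.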
Changes in One’s Environment
We’re now discovering that the real purpose of dopamine is to help motivate us to gather new information about the outside world quickly and efficiently. In fact, dopamine is released during negative experiences as well as positive ones. (The puppy who gets his nose scratched by the cat doesn’t need further lessons to reinforce the “no-chasing-the-cat” rule; he learns that instantaneously, with a single swipe of the cat’s paw.)
This adds further importance to the idea that learning is not as much about pairing behaviors with their consequences as it is about paying close attention to salient changes in our environment: the bigger the changes, the more dopamine is released, and, therefore, the deeper the learning.
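If you wanted to model that idea (bigger change detected, deeper learning), the simplest sketch lets the size of the change set the size of the learning step. Again, the function and constants here are my own and purely illustrative:

```python
def update_estimate(estimate, observation, base_rate=0.5):
    """Update a running estimate of the environment, letting the size
    of the surprise (the change detected) drive how much is learned."""
    surprise = abs(observation - estimate)   # bigger change detected...
    step = base_rate * surprise              # ...bigger learning step
    return estimate + step * (observation - estimate)

# Starting from the same estimate, a small change barely moves it,
# while a big change moves it a lot.
small_shift = update_estimate(0.5, 0.6)   # tiny surprise, tiny update
large_shift = update_estimate(0.5, 1.5)   # big surprise, big update
```

The asymmetry is the point: a learner built this way doesn’t need dozens of identical repetitions when the world changes dramatically; one swipe of the cat’s paw does the job.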
Randy Gallistel again: “...behavior is not the result of a learning process that selects behaviors on the basis of their consequences ... both the appearance of ‘conditioned’ responses and their relative strengths may depend simply on perceived patterns of reward without regard to the behavior that produced those rewards.” (“The Rat Approximates an Ideal Detector of Changes in Rates of Reward: Implications for the Law of Effect,” Journal of Experimental Psychology: Animal Behavior Processes, 2001, 27, 354–372.)
Temple Grandin, the subject of a recent award-winning HBO film starring Claire Danes, always provides us with keen insights into animal behavior, and more particularly, into animals’ thought processes. I think she hits the nail on the head when she says that the animal mind is geared toward perceiving vivid sensory details about its environment, while the human brain tends to gather these details into conceptual chunks. In general terms: the animal mind is, in most cases, a difference detector, while the human mind is a similarity detector. (Dogs seem to fall somewhere in between.)
So if learning takes place through recognizing changes in the environment—an instantaneous process that releases dopamine—and not through the slow, random, trial-and-error recognition of connections between behaviors and their consequences—which sometimes does and sometimes doesn’t release dopamine—this would indicate that while Ian Dunbar’s model of learning may have flaws, perhaps so does mine! 
See? This is why we need a “unified dog theory!” 
1) While I was putting the finishing touches on this article I learned that Cesar Millan has invited Ian Dunbar to contribute a chapter to his next book, and Dunbar agreed. So there may already be some movement toward a “unified dog theory” taking place, having nothing to do with me.
2) However, using my model of tension and release, any reduction of tension or stress would hypothetically end up acting as a double reward.
3) Any unified dog theory has to include the one model of dog training that, more than any other, relies on teaching behaviors through pattern recognition, and that’s the model used to train working dogs: sheepdogs, cattle dogs, detection dogs, police dogs, etc. This may be one reason these are among the best-trained dogs on the planet.