When Positive Reinforcement Isn’t

If you’ve been following us, even for a short time, you’ve probably heard us say, “Only the learner decides what’s reinforcing.” I talked a bit about this in a blog post last month with regard to high-value foods, but I wanted to expand on it and share, from a different angle, one of the reasons we say it so frequently. To do that, we need to rope in a common phrase heard in the dog training world: positive reinforcement.

Now, because this blog is meant for pet parents, I try not to use too much terminology. But because I hear many pet parents use the term positive reinforcement themselves, I think it’s safe to talk a bit about it here. I’ll try to break it down as much as possible while still being accurate (because inaccuracies from watering down science can get us into trouble!). The simplest definition of positive reinforcement is:

A consequence [to a behavior] that is added to the environment and maintains or increases that behavior in the future.

Positive = added (think math, not morals). Reinforcement = any consequence that maintains the behavior or makes it more likely to occur in a similar context in the future.

I get it; this can be a really confusing topic. That can be especially true because the way in which this phrase is used in reference to dog training or animal behavior has been pretty bastardized as it’s made its way into mainstream vernacular. The diluted definition that I most frequently see is something along the lines of, “I’m giving him treats that he likes so he does the behavior more in the future.”

But… Allie… isn’t that the exact same thing? Nope. There are two big differences between the real definition and the diluted one:

  1. Positive means added, period, end of story. It does not mean that we’re adding something that’s necessarily desirable or that the animal likes (even though in most cases it is!). There are times when adding something undesirable increases behavior. (I know; it’s weird and gets confusing when we throw things like high-value vs. low-value treats into the mix. For now, just trust me.)
  2. The real definition relies on observation to determine whether the consequence was, indeed, reinforcing. We can only say that a consequence was reinforcing when the past consequence strengthened future behavior. The diluted definition makes an assumption that it will be reinforcing, but doesn’t follow through with the critical component of making sure that the assumption is true. 
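
For those who think better in code, here is a rough sketch of that second point in Python. It assumes you’ve simply counted how often the behavior showed up in similar situations before and after you started adding the consequence. The function name and the counts are made up for illustration; this is not a formal assessment tool.

```python
def label_added_consequence(times_before: int, times_after: int) -> str:
    """Label a consequence that was ADDED, using only observed behavior.

    times_before / times_after are hypothetical counts of how often the
    behavior occurred in similar contexts before and after the consequence
    started being added. What we hoped or assumed is not an input.
    """
    # Behavior maintained or increased: it fits the reinforcement definition.
    if times_after >= times_before:
        return "positive reinforcement"
    # Behavior decreased or stopped: it was punishment, whatever we intended.
    return "punishment"


# Hypothetical observations, not real data.
print(label_added_consequence(times_before=5, times_after=9))
print(label_added_consequence(times_before=4, times_after=0))
```

Notice that nothing in the function asks whether the thing we added was a treat, a cuddle, or a scolding. The label comes only from what the learner did afterward.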

Let’s break that down further with a few examples:

Example 1: Positive means added, whether it’s desirable or undesirable.

A dog is barking at a stranger. The dog has a history of fearful body language around strangers, and this is no exception. The stranger yells at the dog to stop barking (adding: yelling). The dog cowers further and starts barking louder. The next time the dog encounters a stranger, he barks loudly at them. (People frequently call this punishment, but it’s actually reinforcement: the behavior was strengthened in future scenarios, and the barking increased in volume.)

Example 2a: The only way to determine if a consequence was reinforcing is to observe the behavior in similar context(s) after the consequence was presented.

A cat head butts you for attention. You pick him up, believing that you’re reinforcing the behavior (adding: physical contact). The cat does not head butt you in the future. (This tells us it wasn’t reinforcement, but punishment: the behavior decreased or stopped). 

Example 2b: The only way to determine if a consequence was reinforcing is to observe the behavior in similar context(s) after the consequence was presented.

Millie is fearful of people reaching toward her, as evidenced by her body language. To help her feel more comfortable with hands in general, we start working on a “hand targeting” exercise (touch your nose to the hand). I lay my hand flat on the floor; when she investigates, I mark with a “yes,” then place a treat a few inches in front of her (added: treat). Millie investigates my hand 3-4 times, then stops investigating. (Punishment: the behavior decreased or stopped.)

Example of hand targeting
Picture by Nick Djalila on Unsplash

Why is this distinction between definitions important?

As much as I try to avoid terminology in this blog, I see enough well-meaning people talking about “positive reinforcement” incorrectly that I decided we needed to hash this out. When we think we’re doing a technique correctly but in fact aren’t, we conclude that the methodology doesn’t work. We get the mindset that “using positive reinforcement” doesn’t work in some scenarios, or with some animals, or even at all. In reality, we can’t actually “use” positive reinforcement. We can only deliver a consequence and then see whether it fits the positive reinforcement criteria by observing future behavior.

We need a descriptive observation, not a prescriptive assumption. This is why your consultant will ask you what you’ve tried in the past to modify a behavior and compare that to what’s happening now. They’re figuring out whether that approach actually works the way that they or you assumed it would. In the Millie example, I assumed that she would continue investigating my hand because I gave her a treat for doing so. But Millie’s behavior said that my assumption was wrong: she stopped investigating my hand. (For those curious, we tweaked how we were delivering the food. Instead of placing the treat in front of her, we tossed it behind her, and that fixed it. It was then truly positive reinforcement [added: treat; reinforcement: she more frequently investigated the hand when it was presented to her].)

This is what we mean by “only the learner decides what’s reinforcing.” We need to observe their behavior, determine whether the behavior is being strengthened, and only then can we decide if the exercise we’re doing is reinforcing. We don’t get to make an assumption. We don’t get to decide that it’s going to be. Only the learner can tell us that it is or isn’t through their future behavior. 

Now what?

  • Think about a behavior that you’ve been working on for a while where there hasn’t been much long-term progress (even baby steps of progress). Is your assumption matching your observation? 
  • If you’ve been working on something for a while without any progress, it’s time to tweak! Your trainer or consultant can help you figure out what component is not working as assumed and how to troubleshoot. 
  • Need professional help with troubleshooting? That’s literally our job. Email us at [email protected]

Happy training!

Allie

P.S. Since we’re talking about consequences, I feel obligated to mention in this post that a consequence needs to happen within 3 seconds of a behavior to be associated with it (technically up to 5, but it’s much less effective after 3), and that the consequence is the first thing that happens after a behavior. This is why scolding a dog who potties in the house while you’re gone isn’t effective: way too much time has elapsed, and a lot of other things have happened in between. We’ll delve more into this in a future post.