Why Average? Alternatives to Averaging Grades

(Part 3 of the “Why Average?” trilogy from the week of Aug 7-14. Here’s Part 1. Here’s Part 2.)

Over the past week, the topic of averaging grades has risen to the forefront of the twitter-verse.  Posts abound around the issues that professional educators have with lumping several disparate values together in the hopes of describing a student’s level of competence or understanding.  (For a reminder of these posts, see Why Average?, xkcd’s TornadoGuard, David Wees’ A Problem with Averages, and Frank Noschese’s Grading and xkcd.)

[Image: math cartoon via http://kishmath421.pbworks.com/w/page/7782913/Math-Cartoons]

After seeing so many (including myself) highlight the inadequacy of averaged grades, the words of our county’s assistant superintendent come to mind: “If you offer a problem, you’d better be ready to suggest a solution.”  That being said, here are a few alternatives to sole reliance on averaging student data to describe their competence, organized by the issues described in Part 2 of this “Why Average?” trilogy.

Issue 1: Averages of data that do not match intended outcomes do not suddenly describe outcome achievement.

The xkcd comic (along with the correlation to education on Frank’s blog) ties in most closely to this issue.  So often, we as educators assign points (and therefore value) to things that do not necessarily relate to outcome achievement.  Assigning grades for homework completion, timeliness- even extra credit for class supplies- and combining them with outcome achievement data introduces a high level of “grade fog”, where anyone looking at the final grade would have a high degree of difficulty in parsing out the components that led to a student’s grade.

In his article, “Zero Alternatives”, Thomas Guskey lays out the six overall purposes that most educators have for assigning grades:

  1. To communicate the achievement status of students to parents and others.
  2. To provide information students can use for self-evaluation.
  3. To select, identify, or group students for specific educational paths or programs.
  4. To provide incentives for students to learn.
  5. To evaluate the effectiveness of instructional programs.
  6. To provide evidence of a student’s lack of effort or inability to accept responsibility for inappropriate behavior.

Frank Noschese’s blog post highlights these cross-purposes: in the image paired with the xkcd comic, the student’s grade of B seems to come from averaging grades meant to provide motivation (“I do my homework”, “I participate in class”), document responsibility (“I organize my binder”), and convey information on achievement (“I still don’t know anything”).

The simple answer to this issue would be to stop averaging grades for things like homework completion, class participation, and responsibility together with values for student achievement.  Instead, make grades specifically tied to meeting standards and course objectives.  Of course, if it were that easy, we would all be doing it, right?  I guess the bigger question is, How do we provide the desired motivation and accountability without tying it to a student’s grade?  Guskey’s article suggests several ideas for how one might differentiate these cross-purposes (e.g. a grade of “Incomplete” with explicit requirements for completion, separate reports for behaviors, etc).  Other alternatives from my own practice:

  • Report non-academic factors separate from a student’s grade. Character education is an important part of a student’s profile, though it does not necessarily need to be tied to the student’s academic success.  One way of separating the two is simply to report them separately.  I had a category in my gradebook specifically for these kinds of data, though the category itself had no weight relative to the overall grade.  Providing specific feedback to students (and their parents) on topics of organization and timeliness separate from achievement grades can go a long way toward getting behaviors to change.
  • Set “class goals” for homework and class participation.  Sometimes, there is no better motivator than positive “peer pressure”.  One of my bulletin boards in my classroom had a huge graph set up, labeled, “Homework completion as a function of time”.  Each day, we would take our class’ average homework completion, and put a sticker on the graph that corresponded to that day’s completion rate for the class.  We set the class goal as 85% completion every day, and drew that level as the “standard” to be met.  As a class, if we consistently met that standard over the nine-week term, there was a class reward.  One unintended consequence: each class not only held themselves to the standard, but also “competed” with other class periods for homework supremacy!  (Of course, there was that one class that made it their mission to be the worst at completing homework…goes to show that not every carrot works for every mule.)
  • Make homework completion an ‘entry ticket’ for mastery-style retests. If homework’s general purpose is to promote understanding, one would assume a correlation between homework completion and achievement.  While I ‘checked’ for homework completion on a daily basis and recorded student scores under a “Homework” category, that category had no weight in the student’s overall grade.  Instead, once the summative assessment came up, those students who did not reach the sufficient level of mastery needed to show adequate attempts on their previously assigned work before we could set a plan for their re-assessment.  You may think that students would “blow off” their homework assignments in this situation- and some did, initially.  However, once they engaged in the process, students did what was expected of them.  Over time, there was no issue with students being unmotivated to do their homework as necessary.

Issue 2: Averages of long-term data over time do not suddenly describe current state understanding.

This issue is a little trickier to manage.  On his blog Point of Inflection, Riley Lark summed up his thinking on how best to describe current understanding from a collection of long-term data in a post entitled Letting Go of the Past.  In the post, he compares straight averages to several other alternatives, including using maximums and the “Power Rule” (or decaying average).  I strongly suggest that anyone interested in this topic read Riley’s post.  Riley has since created ActiveGrade, a standards-based gradebook on the web that “[makes] feedback the start of the conversation- instead of the end.”
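To make the comparison concrete, here is a minimal sketch (in Python) of the three approaches Riley contrasts.  These are not his or ActiveGrade’s actual formulas; in particular, the weight placed on the most recent score in the decaying average is an assumed value chosen only for illustration.

```python
# A sketch of three ways to summarize one student's scores on a single standard.
# Not Riley's or ActiveGrade's actual formulas; the 0.6 weight on the most
# recent score is an assumed value used only for illustration.

def straight_average(scores):
    """Every score counts equally, no matter how old it is."""
    return sum(scores) / len(scores)

def maximum(scores):
    """The best performance ever shown stands as the summary."""
    return max(scores)

def decaying_average(scores, recent_weight=0.6):
    """Each new score takes `recent_weight`; all prior history shares the rest."""
    summary = scores[0]
    for score in scores[1:]:
        summary = recent_weight * score + (1 - recent_weight) * summary
    return summary

# A student who struggled early but demonstrates mastery now (4-point scale):
scores = [2, 2, 3, 4, 4]
print(straight_average(scores))   # 3.0   -- dragged down by old data
print(maximum(scores))            # 4     -- ignores any later slide
print(decaying_average(scores))   # ~3.78 -- weighted toward current state
```

For the same record, the straight average stays anchored to old struggles, the maximum ignores any later slide, and the decaying average tracks the student’s current state most closely.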

For some other resources and ideas:

– – – – – – – – – –

At the heart of the question “Why Average?” is a push to purpose.  While none of the ideas described in this trilogy of posts is inherently the “right” one, I hope, at the very least, that they have given readers some “jumping-off points” for ensuring that their methods match their intended purpose.  We owe at least that much to our students.  If you have other resources, ideas, or questions that would extend the conversation further, by all means, please share them.

Why Average? on the Minds of Many

(Part 2 of the “Why Average?” trilogy from the week of Aug 7-14. See Part 1 here. See Part 3 here.)

So on Sunday, I posted a comic about how goofy it can be to average long-term data to describe current state measurements.  Imagine my surprise this afternoon upon checking the RSS feed to see this new comic on xkcd:

[Image: xkcd #937, “TornadoGuard” (http://xkcd.com/937/)]

Earlier today, physics teacher and #sbar advocate Frank Noschese paired the xkcd image with an educational correlate on his Action-Reaction blog:

[Image: Frank Noschese’s grading adaptation of the xkcd comic (http://fnoschese.wordpress.com/2011/08/12/grading-and-xkcd/)]

While this comic tackles a different problem with averaging than does my own post, it seems like concerns with averaging as a description of data are on the minds of many.  (To get an idea of the scope of the discussion, check out the conversations happening in the comment boxes on posts by Frank Noschese and David Wees, respectively.)

Our comics highlight two different but very real issues with trying to describe such a complex thing as learning with such a simple thing as one averaged value:

  • When we take values that do not match intended outcomes (a student’s knowledge, understanding, and skills acquisition) and average them together, the new number does not somehow suddenly describe outcome achievement.
  • Even if we do happen to measure the outcomes described above, but those measures are taken over time and then averaged together, the new number does not somehow suddenly describe current state.

Have you seen any other visuals that help to describe these problems with averaging data?

Lesson Design using Wordle: A Pre/Post Class Assessment for Learning

I have run across many posts in the recent past explaining varied uses for Wordle in the classroom.  (See this post from the Tech Savvy Educator, and this one from Clif’s Notes for some examples that come to mind.)  While I appreciate the springboards that these many examples provide, I did notice that most posts collect many ideas together as opposed to describing the use within the context of a specific lesson design.  Below, I describe the process my students used as an assessment for whole-class learning in my physics classes, where Wordle played an integral part.  I hope that making my practice public can inspire each of you to improve on what I’ve tried- every time one of you shares how the lesson design works in your own classroom, we get a new opportunity to grow and learn from each other!

Pre-Assessment (The ‘Before’)

Before beginning our studies of magnetism, we had a quick class discussion around one question: “When you think of ‘magnetism,’ what comes to mind?”  Using a little “write-pair-share” strategy, we made a list- as they shared aloud, I collected their responses in a Word doc projected on the board.  After all three of my common preps completed this activity, we had three different classes’ “pre-assessed” knowledge around magnetism.  Copying all of that text into a Wordle, we could now find the commonalities in our ideas:

[Image: Wordle word cloud of the classes’ pre-assessment responses about magnetism]

This cloud gave each class a picture of what “we” thought in relation to magnetism.  Since that discussion had been a “class-ending” conversation the day before, the word cloud became a “class-starting” conversation the next day.  We began class by examining the word cloud, questioning what we would likely want to learn next about magnetism.

Learning Time (The ‘During’)

During this 2nd class period, several of the students who had experience in chemistry had a sneaking recollection that there was some relationship between electrons and magnetism, and they became the leaders in a short class discussion around the concept of magnetic fields and magnetic domains.  At that point in the lesson design, we had our “do some stuff with magnetic fields” time.  Around the room were several demo stations related to the relationship between electricity and magnetism, where students had two central questions to consider: “What Happens When I Do This?” and “Why Do I Think It Happens?”

Following these experiences- which led students in all sorts of WHWYDT kinds of directions (both expected and unexpected)- we came together as a class to discuss what we had seen at these stations, and what questions had developed from the experiences.  As a closing activity to the day, each student responded to a 1-question Google Form that asked the same question as their pre-assessment: “When you think of ‘magnetism,’ what comes to mind?”

Post-Assessment (The ‘After’)

The next class period, students entered class with this picture in front of them:

[Image: Wordle word cloud of the classes’ post-assessment responses about magnetism]

By pasting the student responses into a new Wordle, we were able to see what “we” now thought about magnetism.  As a class, we compared this new word cloud to the first one: by analyzing the similarities & differences between the two Wordles, the class examined what we had learned and how our thinking had changed.
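For those curious about what that comparison looks like underneath the pictures, here is a small sketch of the word-frequency comparison that a pair of Wordles makes visual.  The sample responses below are invented placeholders (not my students’ actual answers), and Wordle itself does this counting for you behind the scenes.

```python
# A minimal sketch of the frequency comparison a pair of Wordles makes visual.
# The sample responses are invented placeholders, not real student data.
from collections import Counter
import re

def word_counts(responses):
    # lowercase everything and pull out word tokens
    words = re.findall(r"[a-z']+", " ".join(responses).lower())
    return Counter(words)

pre = [
    "magnets stick to the refrigerator",
    "the north pole attracts and the south pole repels",
]
post = [
    "current in a coil induces a magnetic field",
    "moving electrons create a magnetic field around the wire",
]

before, after = word_counts(pre), word_counts(post)

# words whose counts changed between the two clouds
for word in sorted(set(before) | set(after)):
    if before[word] != after[word]:
        print(f"{word:>12}: {before[word]} -> {after[word]}")
```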

The unintended consequence: many students noted that our new responses went farther down the path of “induced” magnetism (that is, magnetism brought on by electric current), and farther away from the more typical concept of naturally magnetic materials.  They wondered how we would connect these two ideas, as they still seemed disconnected in our thinking.  That connection happened to be the planned topic of study for the day- not only because it was part of our original pacing guide, but because we had now noticed this trend in the “data” that the Wordle presented.  The students saw that the dots were not connected, and they wanted to connect them, which made the day’s learning much more authentic.  It was no longer just something I was supposed to teach them: it had become something they wanted to learn.

Generalizing for Lesson Design:

While not a flawless design, these six steps seemed central to increasing students’ desire to learn:

  • Students pre-assessing their own knowledge and understanding – “What does _insert topic here_ mean to me?”
  • Students using Wordle to analyze the pre-assessment responses
  • Students “doing stuff” to experience _insert topic here_ in real life – “What happens when I do this?”
  • Students responding to what they now know and understand – “What does _insert topic here_ mean to me today?”
  • Students comparing the Wordle of their current thinking to that of their pre-assessment responses
  • Students asking the question, “Given what I first thought, and what I now think, what do I think of next?”

Without the use of Wordle, we lose out on a central piece of this lesson design puzzle.

Have you used Wordle as a class assessment for learning with your students?  Please share ideas, questions, and suggestions in the comments.  If you decide to try out this lesson design with a topic in your class with your students, please consider sharing how it goes in the comments- learning from your experiences helps us all grow!

A Response to Data-Informed Decisions

Earlier this evening, I read a colleague’s blog post discussing the concept of making data-informed decisions as opposed to data-driven decisions.  It’s a thoughtful post, one I hope you will read in depth.

The post brought out a response in me that unearthed some Sherlock Holmes quotes I thought I had forgotten. (Read the comment, if you’re interested in the quotes themselves.)  There are a couple of images that seem to sync up well with the idea from the response, so I figured I’d put them up here for posterity’s sake:

[Images: three hand-drawn diagrams of the decision cycles described below]

These images are meant to be viewed in succession, almost as an evolution.  The 1st image depicts the concept of a data-driven decision as Steven describes it in his blog: data leads to our decision to act in a certain way, and those actions lead to new data.  What this idea is missing- and what Steven asserts- is the process of thoughtful reflection that occurs when you consider not just the data but also the perceived reasons for the data.  In the 2nd image, the data has informed those reasons, and those reasons then drive the decision on how to act.

The 3rd image adds a balancing loop to the system, drawn from similar diagrams in Senge’s The Fifth Discipline.  In this cycle, our decisions are still driven by the reasons for the data, but here the data is the perceived gap between the actual results and those we expected.  In other words, we’re not necessarily asking ourselves the question, “Why do we see the data we see?” but rather, “What is the reason for the difference between what we see and what we thought we’d see?”

Given time, there would probably be several more iterations of this image- I hope your thoughts will help to continue to shape it into something better than it is today.  Thanks again to Steven & Rich @ Teaching Underground for inspiring the response.

Standards-Based Grading, or How I Learned to Stop Worrying and Defuse the Bomb

When considering my contribution to this month’s SBG Gala, I remembered the power of stories, as they give us examples on how to act when we aren’t sure what to do next.  I decided to tell the story of how standards-based grading and I found each other, anchored by a group of quotes that served as springboards in putting the story together.  Hope you enjoy.

[Image: Whose Line Is It Anyway? title card]

“Welcome to Whose Line Is It Anyway?  The show where everything’s made up, and the points don’t matter.”

– Drew Carey

You may remember this line from the improv show adapted from the BBC: for years, you could have replaced the show’s title in the quote with my classroom number, and it would still have fit.

Starting out as a first-year physics teacher, I struggled conceptually with the larger meaning of points-based grading.  To me, the practice makes class seem like a game show, where kids try to collect all of the points in whatever form they exist, all for the purpose of “winning” a good grade and a pat on the back.  My belief has been that any grade worth its weight should serve more as a diagnosis than a “score,” one that helps all of us get a feel for our current status, and how to respond as a result.  That being said, I knew no other way to assess student work than to put some total out of 100 on everything, so that’s just what I did.

Thinking back to that first year, I remember making a distinction between the types of work that my students completed in class. There were those products that showed me and everyone else what the kids knew, understood and could do (tests, quizzes, projects, lab write-ups, exit slips, and even some ‘Do Nows’), and there were others that served more as scaffolds to get kids ready to know, understand, and do (homework, class exercises, “preview” questions, and other studying practices).  I decided to draw the line in the sand: the first group of products would make up students’ grades, while the second would have no part in them.  Unfortunately, I found no way to formalize that line within the maelstrom of point-collecting (aside from asserting at the outset that “these things get points,” while “these others do not”).  The lack of an external motivator led to less and less participation in these important learning activities: it made these practices seem like they were separate from the game.

[Image: report cards]

“NEEDS MORE PREP FOR TESTS”

– Countless teachers, to countless students and parents as feedback on a report card

Aside from my issues with how all of the assignments & assessments fit together, I also grappled with the lack of specificity in feedback that I was able to offer to students and parents using a cumulative, points-based grading system.  What does “B” actually tell you?  The comment above always stood out to me among the feedback options available beyond the grade itself on a report card.  The statement seems to say to kids and parents, “You’re not doing so great on the important work in this class.  Study harder, and you’ll do better.”  That advice is always technically true, and there is never a time when it is actually helpful.  How am I helping students to grow if this is my response?

To add insult to injury, another way of reading this comment would be, “You’re not doing so great, which means that what I’m planning & doing as a teacher must not be ‘prepping’ you very well.”  I’m as dedicated to the idea of kids taking ownership of their learning as the next guy, but by saying this to students, haven’t I put the onus back on me to change?  I sought methods of feedback that would help students know what they understood and what they could do next, while helping me to know how I could improve my own teaching practice.  I found the reporting practices I had to choose from to be less than ideal for these purposes.

[Image: forest path]

“As a single footstep will not make a path on the earth, so a single thought will not make a pathway in the mind.  To make a deep physical path, we walk again and again.  To make a deep mental path, we must think over and over the kind of thoughts we wish to dominate our lives.”

– Henry David Thoreau

Over the next two or three years, I continued to question my teaching practices, beliefs, and assumptions, hoping for a breakthrough.  Small, incremental changes in practice seemed to help periodically- changing how “points” were averaged together, or adjusting the weights of any given assignment- but nothing I did ever moved me into a system that really fit my beliefs.  I felt the need to unpack the entire practice to defuse its harmful power.  But how?  Which wire of the bomb do you cut first, if you haven’t necessarily seen any other schematics?

Four years ago, I happened upon a workshop facilitated by colleague Chad Sansing at an in-house summer PD institute.  During his presentation (and the ensuing discussion), I was introduced to standards-based grading for the first time, albeit through the lens of language arts and social studies.  All it took was seeing a table for recording a student’s scores for my brain to find a pathway.  In his demonstration, he put the topics where the assignments were “supposed” to be.

It had me at hello.

Why did I make the shift to standards-based assessment?

After learning about other philosophical tenets and lessons learned, I decided that day that I must try this ‘standards-based grading’ over the next year, for several reasons:

  • The promise of diagnosis versus point-collecting: Standards-based grading offered the opportunity I had been searching for to diagnose strengths and weaknesses, just by dropping the charade that we should organize gradebook information by quiz, test & homework (see the sketch after this list for one way that reorganization might look).  In this way, I found that the assessment became a vehicle to shed light on specific skills acquired & understanding gained (as opposed to a game that kids try to win).  I felt free to assess multiple skills & understanding in the same tool without worrying that those pieces of information would be lost in the combination to some “total test score.”  In the same way that people say, “I don’t teach physics, I teach kids,” I could now say, “I don’t check test papers, I check understanding.”
  • The promise of a separation between assessments and activities for learning: As an added benefit, I finally had a method to distinguish between that which showed what the kids could do versus that which prepared them to be able to do.  I had a new question: “Does this product serve as evidence of a student’s mastery of a standard?”  If yes: make it part of the grade under a corresponding topic/concept header.  If no: let’s keep records, and then look back to see how these assignments may have benefited (or not) in the assessment-related products.  (Ties to mastery learning also ensured the relevance of these assignments, but that is a different post for a different Gala.)
  • The promise of specific feedback versus broad validation: Instead of a project score being the primary feedback mechanism (e.g. “You got a 92 – good job!” vs. “You got a 78 – do better next time!”), each score could give kids specific feedback on where they succeeded and where they struggled.  In place of the 78, a student would receive information about their demonstrated knowledge and skills around their logic (how they arrived at an answer), their content knowledge (how well they use information in order to construct an answer), and their communication (how well they shared their answer with others).  In terms of knowing how to respond to this assessment, students would definitely have more opportunity to uncover next steps.
  • The promise of a unified curriculum versus distinct topics of study: Like many teachers, I noticed that many of my students “learned” material for the unit test, and then promptly forgot it.  That each unit seemed so distinct didn’t help matters at all.  While unintended, I noticed that this system would allow me to track students on some of those concepts and skills that run throughout the school year.  In the system I used, 3 strands – Communication Skills, Systems-Thinking, and Mathematical Skills – followed the student in every unit throughout the year, while each unit had its own content-specific topic.  In the ubiquity of these strands, I felt like I had an anchor in the curriculum that kept the kids seeing the conceptual connection between seemingly disparate units. (In retrospect, I really wish that I had thought to include an Investigation component as well, but alas.)
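As referenced above, here is a rough sketch of the gradebook reorganization I am describing: scores filed under topics and standards rather than under “Test 3” or “Homework 12.”  It is an illustration of the idea only, not the actual gradebook software or scale I used; the names and the “most recent score” summary are assumptions for the example.

```python
# A rough sketch of a gradebook organized by topic/standard instead of by
# assignment type. The topic names, 4-point scale, and "most recent score"
# summary are illustrative assumptions, not the author's actual system.
from collections import defaultdict

gradebook = defaultdict(lambda: defaultdict(list))  # student -> topic -> scores

def record(student, topic, score):
    """File a piece of evidence under the standard it demonstrates."""
    gradebook[student][topic].append(score)

# One "multi-standard" assessment yields several entries, not one total score:
record("A. Student", "Communication Skills", 3)
record("A. Student", "Systems-Thinking", 2)
record("A. Student", "Magnetism", 4)

def current_level(student, topic):
    """Summarize current standing; here, simply the most recent score."""
    scores = gradebook[student][topic]
    return scores[-1] if scores else None

print(current_level("A. Student", "Systems-Thinking"))  # 2
```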
Leaving Chad’s workshop, I felt renewed.  I had been walking this route long enough to have finally worn it down into a path – now, instead of looking down at each step, I was seeing my surroundings for the first time in a new light.  I knew that I needed to change my practice for the sake of my students, and seeing this presentation gave me an option to try something new.  After adjusting and tweaking practices to allow for collaboration with my teammates, I ended up with a standards-based system that gave me what I was looking for.

[Image: traditional red-and-blue 3D glasses]

“Tell me and I forget.  Show me and I remember.  Involve me and I understand.”

– Chinese proverb

What struck me over the ups and downs of my two years of classroom implementation was the student and parent response to the practice of standards-based assessment and reporting.  They were interested, no doubt – but I can’t say that I did a fabulous job of communicating it (given that I was, in all honesty, still trying to figure it out myself, especially the first year).  That all changed, however, on two specific dates.

For the students, that day was one in which we all put our SBG glasses on and scored a “multi-standard” assessment together.  From the beginning of the year, I had been having the class score smaller, single-standard assessments together as a class against a rubric, while also having each student keep an individual record of all of their assessment scores associated with any related standards (and their corresponding topics).  I felt like I was including the students in the process, keeping everyone informed and on top of their learning.  

I noticed a lot of push-back, however, on those cumulative assessments that measured standards across multiple topics.  Instead of the “92” they were used to seeing, they would receive upwards of 5 separate scores (depending on the number of topics represented by the assessed standards).  While they appreciated the targeted feedback, they kept asking me: “So, what’d I get?”  Finally, it hit me: just because the kids were keeping up with their own scores did not mean that they were involved in the process.  What if we were to score one of these larger assessments as a class, forcing them to adjust their minds to think about different standards related to the same piece of evidence?

I remember that day like it was yesterday.  Each student sat with pen in hand, ready to examine their own solutions to a set of problems.  I asked them first to focus on their communication: “Use this rubric to determine what you can learn about your Communication Skills.”  It was like closing one eye and watching a 3D movie through the red lens – some of the information gets blocked out, and the students could focus on the specific information coming through that mental filter.  Following this examination of Communication Skills, I asked them to switch to a different-colored pen and look at the same solution, but now focus solely on the evidence of their Systems-Thinking – without regard to their Communication Skills.  I highlighted sections on a sample solution that jumped out at me as specifically related to systems-thinking (as opposed to communication, content knowledge, or mathematical prowess).

As they changed their focus between topics, it was as if those watching the 3D movie were now closing the eye behind the red lens and opening the one behind the blue – the information that had previously been blocked out instantly sprang to the surface.  A roomful of teenagers’ minds opened at once: I watched their puzzled faces switch from scrunched to smiling as they opened their eyes to see the multi-dimensional image.  That, my friends, is rare.

We repeated this process for all of the topics related to the standards measured on this assessment, and recorded scores for each topic separately.  While doing this with the whole class took more time than I expected, this one strategy did wonders for helping students grasp the grading process.  Instead of asking what extra credit they could do to get better grades, they asked how they could learn more about those topics with which they had struggled – following up with questions on how they could show that they had now learned it.  They also seemed to have a much better grasp on the relationship between each smaller “one-standard” assessment and the larger “multi-standard” assessments.  It wasn’t until I had truly involved them in the process that they finally understood it.

For parents, it was usually at our first parent-teacher conference that this practice actually sank in.  Instead of breaking down their kid’s test scores (and telling them to make the kid study), I was able to share real information about their child’s communication skills, conceptual understanding, content knowledge, and mathematical prowess (along with trends that showed whether any one of these skills was changing in one direction or the other).  Since I kept examples of their assessments on hand, we could also look together at the student’s growth over time.  The ten-minute, one-on-one conversation did more for me than any larger parent meeting I tried to organize, and suddenly I had advocates who understood.

[Image: Austin Powers]

“But what does it all mean, Basil?” – Austin Powers

This quote is a shout-out to physics teacher Shawn Cornally – the first time I saw this quote in his blog, I literally lol’d, as I used to use it with my students as a prompt to wrap things up and find meaning in whatever we had just done.  (It obviously came with the requisitely bad impression – what fun would any Austin Powers quote be without a horrible British accent?)

To me, the practice of standards-based assessment and reporting brought meaning to grades.  It made my students better learners, in that they realized that the only way to improve their standing in the class was to learn the material.  It also helped them to know more specifically what it was they knew, and what would be helpful to learn.  

It made me a better teacher each day, as it made me a better learner.  I learned more about my students’ understanding, which gave me better insights as to how to plan quality learning opportunities for them.  Our division’s mission is to establish a community of learners: for me, standards-based grading played a large part in inspiring a community of learners in my classroom.