These two chapters share a common theme: both focus on the collection of data for the evaluation. The most important point they have in common is that the data collected, and the design of the method for collecting it, need to be aligned with the evaluation question to be answered. A design or collection method that is poorly matched to that question can waste resources and leave the evaluation question unanswered.
The basic rules for data validity compare closely with those I know from designing research and statistical projects. However, using data for evaluation allows more freedom in the selection of the data, the approach, and the method, because we are more concerned with answering the evaluation question for this particular instance of the program than with generalizing the result to all instances of the program. This lends itself more readily to qualitative and mixed-method planning, design, and data collection than the more stringent standard associated with straight research would allow.
I was grateful to be reminded that certain research principles and designs do not transfer well to education and other social evaluation projects. The authors' example of the experimental designs familiar to us from science and medicine, and the inability to carry those same designs over to social policy situations, is enlightening. The essence of the design and data collection choices we make is that each situation and evaluation is unique. We have to identify the evaluation question(s) as precisely as we can and then align the design and methods of collection so we can best answer the question(s).
One final highlight for me was that we do not need to choose methods and designs free of bias in order to have a meaningful evaluation. Rather, understanding the biases present in the various methods is critical to correctly answering the evaluation question. The best practice, as I understand it, is to mix designs and data collection methods with an eye toward their respective biases, so that the findings triangulate and support construct validity.
From these readings, my biggest take-away for our project is the importance of identifying the critical evaluation question(s) as precisely as we can. Randomizing the students into the groups for the evaluation will strengthen the validity of our observations, and the groups are small enough that we would not need to consider sampling for some of the analyses we may perform. We have talked about the possibility of an experimental design using pre- and post-tests to assess the impact of the game on learning the 12 times tables. My concern is that the groups we are using are already fairly experienced with both their times tables and the game. I still think the pre-post design will be a good approach, but the measurements of impact may be skewed by prior knowledge, and we will be starting from a substantially higher baseline than the one the game is geared toward. It is hard to show a difference when many of the subjects have already mastered a major portion of the skill being evaluated.
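To make the baseline concern concrete, here is a minimal sketch of how a high starting score compresses the raw pre-post difference. It is only an illustration: the 20-item test, the student labels, and the scores are all hypothetical, and the "normalized gain" (gain divided by the room left to improve) is one common way to adjust for a ceiling that I am swapping in here, not something taken from the readings or from our project plan.

```python
def raw_gain(pre, post):
    """Simple pre-post difference in test score."""
    return post - pre


def normalized_gain(pre, post, max_score):
    """Share of the available headroom that was gained.

    Returns None when the pre-test is already at the maximum,
    because there is no room left in which to show improvement.
    """
    headroom = max_score - pre
    if headroom <= 0:
        return None
    return (post - pre) / headroom


MAX_SCORE = 20  # hypothetical 20-item times-table test

# (label, pre-test score, post-test score): all values are made up
students = [
    ("novice", 6, 13),
    ("experienced", 18, 19),
    ("at ceiling", 20, 20),
]

for label, pre, post in students:
    print(label, raw_gain(pre, post), normalized_gain(pre, post, MAX_SCORE))
```

In this made-up data the novice gains 7 points and the experienced student gains only 1, yet both closed half of the gap that was open to them. The student who starts at the maximum cannot register any gain at all, which is the situation I am worried about with our groups.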