Monitoring and Evaluation in Collaborative Learning Environments

In Proceedings of the Computer Support for Collaborative Learning (CSCL) 1999 Conference, C. Hoadley & J. Roschelle (Eds.) Dec. 12-15, Stanford University, Palo Alto, California. Mahwah, NJ: Lawrence Erlbaum Associates.


Simeon J. Simoff

Key Centre of Design Computing and Cognition, University of Sydney.

Abstract: Web-based courseware requires revision of the desktop Computer-Aided Instruction/Computer-Aided Learning (CAI/CAL) paradigms. Networked computer media have brought significant changes to educational methods. Recently, virtual worlds, networked environments that mimic elements of the physical world and create a sense of place for the people inhabiting and acting in them, have been used to organise on-line learning and teaching activities. The new educational forms in these environments use extensive computer-mediated communication and collaboration during the learning process. Computer media provide the means for extensive and detailed documentation of activities in the learning environment. This information can be employed to assist student monitoring and evaluation. The proposed framework for analysis and evaluation of virtual seminars is based on quantitative analysis of participation, qualitative content analysis, and visualisation of collaborative activities. The methodology is based on research done in the Virtual Campus in the Architecture Faculty at the University of Sydney. The presented methods can become an intrinsic part of collaborative learning environments.

Keywords: text data mining, computer-mediated communication, project-based learning, qualitative methods

Computer-mediated education as communication

The extensive proliferation of computer media and networking has opened the gates to fundamental changes in the methods, models and techniques employed to educate and train design students and professionals. The new distributed media require revision of the desktop Computer-Aided Instruction/Computer-Aided Learning (CAI/CAL) paradigms. Analysing the role and impact of computers in the on-going changes in education, Lockard et al. (1994) specified two types of CAI/CAL applications. Applications classified as Type I employ computing resources to do things educators have previously done without computers. These applications improve such aspects of teaching as the preparation of teaching materials and student management. Generally, this approach does not change the teaching strategies and schemata; Type I automation reduces the technical effort but does not result in more effective teaching. The other group of applications, labelled Type II, is oriented towards teaching and learning methods and experiences that are impossible without computers. In the context of networked multimedia computing, these methodologies carry the potential for fundamental improvements in the efficiency of flexible on-line teaching and learning processes, though the initial costs and investment are much higher than in the development of a traditional curriculum.

Most of the early Web-mediated on-line courses were designed to complement conventional methodologies for dissemination of course materials, connecting students to various on-line multimedia learning materials. These courses did not utilise the communication potential of internetworked computers. There were hardly any changes in the teaching and learning methodologies, including student monitoring and evaluation techniques.

Further developments in computer-mediated teaching and learning environments were influenced by the view of education as communication and collaboration. Educational theorists recalled Vigotsky's concept of the Zone of Proximal Development, ZPD (Vigotsky, 1978, p.86). Tiffin and Rajasingham (1995, p.22) interpret ZPD as the difference between what people can do on their own and what they could do with help from people more experienced than themselves. The purpose of educational methodology is to provide that assistance to the learner. The purpose of the educational environment is to enable that provision. Consequently, the educational process can be reduced to a two-way interactive communication process between people who have roles as teachers and as learners. Such communication enables teachers to assist learners in solving problems that they would not be able to solve by themselves. This educational paradigm has been extended with the notion of teaching as a team activity and learning as a group activity (Tiffin and Rajasingham, 1995). They specified the extended ZPD as a four-component model: (i) someone in the role of learner; (ii) someone in the role of teacher; (iii) something that constitutes a problem which the learner is trying to solve with the help of the teacher; (iv) the knowledge needed to solve the problem.

Approaching education as communication and building a learning community (Palloff and Pratt, 1999) provides the basis for this research. One way to implement these principles in the design of virtual educational environments is to adopt the metaphor of a place dedicated to instruction and research (Maher et al., 1999). The organisation of education in such an environment is the result of adapting the educational model that corresponds to the metaphor. In this paper we refer to the integrated educational environment known as the Virtual Campus, University of Sydney (Maher, 1999). The Virtual Campus is implemented as a combination of a LambdaMOO object-oriented database server and a WebCT courseware server.

Maher (1999) presents in detail the architecture of the Virtual Campus and the teaching techniques used there, namely the virtual lecture and the virtual seminar. The Virtual Campus uses the room metaphor for the organisation of collaborative learning. From the point of view of information processing, a room is a bounded system that isolates a community of learners, bound together by the subject(s) they study. This bounded system provides the means for communication, knowledge sharing and storage. Each subject taught in the Virtual Campus has its own dedicated classroom. The education process makes extensive use of computer-mediated moderated discussions, held in both synchronous and asynchronous modes.

Synchronous seminars take place in the Virtual Campus classroom. Each seminar is devoted to a particular theme within the course and complements the lectures or course materials, which are available as virtual lectures. Figure 1 illustrates a synchronous seminar scenario in a classroom in the Virtual Campus. Each participant types his/her phrases, gestures, thoughts, etc. Discussions tend to split into several strands, and it can be more difficult to control the topic than in face-to-face seminars. The role of the moderator(s) is to maintain the focus of the discussion, guiding students through the knowledge construction process. All activities during the seminar are recorded and the transcripts are stored in the classroom. As a rule, discussions have an initial transition period, when participants are joining the group. During this period the transcripts contain a relatively high number of social and synchronisation utterances.

[Screenshot header: Virtual campus | Room: CMC in Design | Seminar B | Slide: 2]

Figure 1. A moment from a synchronous seminar in the Virtual Campus environment

Asynchronous seminars are held through computer-mediated bulletin boards. Figure 2 shows fragments from threads in an asynchronous seminar. Because each participant has a relatively large amount of time in which to answer, the issues discussed receive deeper treatment. The role of the moderator in the learning process is to guide discussions and provide competent advice.

Figure 2. Fragments from an asynchronous seminar in the Virtual Campus environment

Online seminars can form part of the student’s assessment by including the amount and content of the student's participation. The virtual campus provides a means for recording (see Figure 1) in explicit and descriptive form all activities during synchronous discussions in a format suitable for further quantitative and qualitative analysis. Bulletin boards preserve the threads and the content of each message. Thus, the analysis of these data sets can be used to evaluate students’ and instructors’ contribution, providing feedback in both directions. The use of appropriate visualisation techniques can provide quick feedback for monitoring students. This paper presents two techniques:

Text data mining and analysis of seminar transcripts for evaluation of personal contribution;
Visualisation of discussion threads for evaluation of the degree of collaboration.

Each of these techniques is applicable both to synchronous and asynchronous seminars.

Text data mining and analysis for evaluation of personal contribution to a seminar

Data mining (DM) deals with the examination of a data source for implicit information and recording this information in explicit form (Fayyad et al., 1996, Chapter 1). The process developed for mining of seminar data is shown in Figure 3. DM involves the identification of potentially useful and understandable patterns in these data.

Figure 3. Mining of seminar data.

At the stage of preprocessing and transformation the transcripts can be represented as a sequence of activities, as shown in Figure 4. Each activity is described by an expression. An expression consists of a subject, who performs the action or the utterance in the activity, a verb, which describes the action or utterance, an object towards which the action is directed and the content of the action. Using this formalism we can represent, analyse and compare synchronous seminars implemented through different underlying environments. In this paper we consider only text-based synchronous seminars.

Figure 4. Formal model for representing synchronous seminars.

Preprocessing and transformation include:

Cleaning headers, service information and spelling errors.
Formatting the transcript in a way that each line corresponds to a single activity.
Splitting each line (activity) into two parts: the subject, the verb and the object (see Figure 4) constitute the left-hand side and the content remains in the right-hand side.
Structure normalisation: converting where necessary the right-hand side of each expression to the structure "subject-verb-object". For example, Figure 5 a and b present the same type of action - "say". Figure 6 shows the result of the normalisation.

Jim says, "Yes, I guess so, but I'm not sure "
Jim [to Ron]: can we bring Charles in here?

Figure 5. The same type of action can have different representations in the transcript

Subject (Jim) - Verb (says) - Object (everyone) - Content (Yes, I guess so, but I'm not sure)
Subject (Jim) - Verb (says) - Object (Ron) - Content (can we bring Charles in here?)

Figure 6. Structure normalisation unifies action representation

Reference normalisation: each participant is represented by only one name spelling in the seminar.

Structure and reference normalisations depend on the environment. For example, structure normalisation may not be necessary if the activities in the original record of the seminar have a unified structure.

Role coding: the reference to each participant is coded with his/her formal role in the seminar.

The basic role categories for course seminars include student, moderator, instructor and expert. The last category accommodates external professionals and world-known experts, both from industry and academia that could participate in the virtual seminar.
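The preprocessing steps above can be sketched in Python. This is only a minimal illustration, assuming that the transcript contains just the two line formats shown in Figure 5. The regular expressions, the `Activity` tuple and the `ALIASES` and `ROLES` lookup tables are hypothetical names introduced here, not part of the Virtual Campus software.

```python
import re
from typing import NamedTuple, Optional


class Activity(NamedTuple):
    subject: str
    verb: str
    obj: str
    content: str
    role: str


# Hypothetical lookup tables; real ones would come from the course records.
ALIASES = {"jim": "Jim", "jimmy": "Jim", "ron": "Ron"}   # reference normalisation
ROLES = {"Jim": "student", "Ron": "moderator"}           # role coding

# Figure 5a style:  Jim says, "..."
SAY_ALL = re.compile(r'^(?P<subject>\w+) says, "(?P<content>.*)"\s*$')
# Figure 5b style:  Jim [to Ron]: ...
SAY_TO = re.compile(r'^(?P<subject>\w+) \[to (?P<obj>\w+)\]: (?P<content>.*)$')


def canonical(name: str) -> str:
    """Map every spelling of a participant's name to a single one."""
    return ALIASES.get(name.lower(), name)


def normalise(line: str) -> Optional[Activity]:
    """Turn one transcript line into subject-verb-object-content form.

    Returns None for lines that match neither assumed format
    (for example headers or service information).
    """
    line = line.strip()
    match = SAY_TO.match(line)
    if match:
        obj = canonical(match.group("obj"))
    else:
        match = SAY_ALL.match(line)
        obj = "everyone"
    if match is None:
        return None
    subject = canonical(match.group("subject"))
    content = match.group("content").strip()
    return Activity(subject, "says", obj, content, ROLES.get(subject, "unknown"))
```

Applied to the two lines of Figure 5, `normalise` yields the two expressions of Figure 6 with the role code attached; any other line is dropped, which stands in for the cleaning step.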

Evaluation is based on various data analysis procedures. There are a number of numerical characteristics, which describe some quantitative aspects of the seminar that can be easily computed from the preprocessed data. The assumption is that the level of activity of each participant is properly related to the number of utterances (action of type "say"). Consequently, the intensity of the seminar is reflected in the ratio

Similar statistics can be computed for each participant. Further refinement involves content analysis and the introduction of a coding schema for separating the utterances related to the subject of the seminar from the social and other utterances, which are not focused on the subject. Simoff and Maher (1999) proposed an open hierarchical coding schema, designed to conduct investigations at increasing levels of detail and to utilise the results obtained at previous levels. This technique operates over lines. For example, the seminar activities can be divided initially into two categories, related and unrelated to the discussion, and tagged according to this categorisation. Individual statistics are based only on the lines related to the seminar. Statistically normalised estimates of topic-related and social utterances, and the ratios between them, can give a quick picture of the seminar.
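As a rough sketch of such line-based statistics, the following Python fragment computes each participant's share of the topic-related utterances and the ratio of topic-related to social utterances. The input format, a sequence of `(participant, verb, is_topic_related)` triples, and both function names are assumptions made for illustration; the coding of lines as related or unrelated is taken as already done.

```python
from collections import Counter


def participation_shares(activities):
    """Share of topic-related 'says' lines contributed by each participant.

    activities: iterable of (participant, verb, is_topic_related) triples.
    """
    counts = Counter(
        who for who, verb, related in activities
        if verb == "says" and related
    )
    total = sum(counts.values())
    return {who: n / total for who, n in counts.items()} if total else {}


def topic_to_social_ratio(activities):
    """Ratio of topic-related to social utterances (inf if none are social)."""
    topic = social = 0
    for _, verb, related in activities:
        if verb != "says":
            continue
        if related:
            topic += 1
        else:
            social += 1
    return topic / social if social else float("inf")
```

The shares returned by `participation_shares` sum to one, so they can be plotted directly as a pie chart of individual participation.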

The "discussion pies" in Figure 7 and Figure 8 illustrate the individual and group participation, respectively. Estimates are based only on activities related to the topic of the seminar. The visual balance of Figure 7a indicates that the participation in the discussion by Student_1, Student_2 and the Moderator (Instructor_2) is nearly the same. As shown in Figure 8a, the participation of the other instructors is relatively small compared to the Moderator's contribution. Patterns in Figure 7b indicate the different character of the discussion during Seminar 2. Student_2 continues to be the most active among the students. Figure 8a shows that more than two thirds of the utterances came from the students; Figure 8b indicates that student participation was only a little more than 50%; and Figure 8c indicates that student participation was less than 50%. This could be a warning sign that student participation is dropping.


Figure 7 (panels a, b, c). Line-based estimate of individual activity, related to the virtual seminar.


Figure 8 (panels a, b, c). Line-based estimate of group activity, related to the virtual seminar.

Line-based estimators can be biased when participants use either very short expressions (one or two words) or very long ones (more than approximately 10-15 words), for example when defending a particular statement. The estimators can be corrected by introducing weights based on the average length of expressions and the length variance. Another solution is the use of word-based estimators of individual and group activities, derived from the total number of words, alphanumeric characters and other characters (Simoff and Maher, 1999), with similar visualisation. Visual patterns of discussion are applicable for assisting on-line student assessment and instructor/moderator adjustment. The number of other characters used in expressions gives an idea of the use of punctuation and other symbols for expressing additional cues in the text.
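A word-based estimator of this kind might look as follows in Python. This is a sketch under stated assumptions: words are taken to be whitespace-separated tokens, "other characters" are taken to be every character that is neither alphanumeric nor whitespace, and the function names are hypothetical.

```python
from collections import defaultdict


def word_based_estimates(utterances):
    """Total words, alphanumeric characters and other characters per participant.

    utterances: iterable of (participant, content) pairs.
    """
    totals = defaultdict(lambda: {"words": 0, "alnum": 0, "other": 0})
    for who, content in utterances:
        entry = totals[who]
        entry["words"] += len(content.split())
        for ch in content:
            if ch.isalnum():
                entry["alnum"] += 1
            elif not ch.isspace():
                entry["other"] += 1   # punctuation and other symbols
    return dict(totals)


def word_shares(totals):
    """Each participant's share of all words in the seminar."""
    all_words = sum(entry["words"] for entry in totals.values())
    if not all_words:
        return {}
    return {who: entry["words"] / all_words for who, entry in totals.items()}
```

Compared with the line-based count, a one-word reply and a long argument no longer carry the same weight.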

Text analysis is still a state-of-the-art activity, considerably improved by the overall progress in computing. In virtual seminars it provides a quantitative analysis of the content data (the "right-hand" side), offering objective results to supplement the instructor's subjective evaluation of the discussion. The content analysis of seminar transcripts starts with the specification of a list of words that distort the content of the transcripts. The list of words excluded from the analysis consists of prepositions, conjunctions and disjunctions, articles, pronouns, and other particular words. The last group is used for analysing the content. The content analysis module counts the occurrences of the remaining words and the co-occurrences of the most frequent words, on the assumption that a high co-occurrence means the terms constitute a complex term, and excludes infrequently occurring words. The minimum frequency threshold is defined entirely from experiential observations, as is a list of thematic key words whose occurrences have to be considered even when they fall below the threshold. This technique is used to build a seminar thesaurus and an individual thesaurus for each participant. The overlap between the individual and seminar thesauri provides an estimate of the relevance of an individual contribution to the seminar topic.
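The thesaurus construction can be illustrated with a short Python sketch. Several things here are assumptions rather than the method used in the study: the abbreviated `STOP_WORDS` list, the default threshold `min_freq=2`, the tokenising regular expression, and the particular overlap measure in `relevance` (the fraction of seminar terms that also occur in the individual thesaurus). Co-occurrence analysis for complex terms is omitted.

```python
import re
from collections import Counter

# Assumed, heavily abbreviated exclusion list (articles, prepositions, ...).
STOP_WORDS = {
    "the", "a", "an", "of", "in", "on", "and", "or", "but",
    "to", "is", "it", "i", "we", "you", "that", "this",
}


def thesaurus(texts, min_freq=2, key_words=frozenset()):
    """Word -> frequency, keeping frequent words and all thematic key words."""
    words = [
        word
        for text in texts
        for word in re.findall(r"[a-z']+", text.lower())
        if word not in STOP_WORDS
    ]
    counts = Counter(words)
    return {w: n for w, n in counts.items() if n >= min_freq or w in key_words}


def relevance(individual, seminar):
    """Fraction of seminar thesaurus terms found in the individual thesaurus."""
    if not seminar:
        return 0.0
    return len(set(individual) & set(seminar)) / len(seminar)
```

Because word frequencies in a single participant's contribution are low, an individual thesaurus would typically be built with a lower threshold, for example `min_freq=1`, than the seminar thesaurus.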

A step further is to compare ontological structures rather than thesauri. Ontology in this context is used in a wide sense, covering any structure that relates terms, including a taxonomy, a hierarchical categorisation and a semantic net. Figure 9 illustrates part of a semantic net derived from the content of a discussion about Virtual Design Studios (VDS). Note the numerical characteristics assigned to term nodes and to the links between them. The comparison of an individual graph with the seminar graph can in this case be done at the macro level, or for some specified key terms in the context of their usage. Such a step gives an idea of how relevantly the term is used.

Figure 9. Semantic net used to build seminar ontologies (generated by TextAnalyst, a product of Megaputer Intelligence, Inc., http://www.megaputer.com)

One of the problems in applying text mining and analysis methods to virtual seminar data is the relatively low word frequencies. These techniques are usually applied for the analysis of focused interviews. The word frequencies in such interviews are about 100-300. In a seminar transcript these frequencies are usually no greater than 10. Currently corrections and additions are done manually.

The above discussed techniques are applicable also to the asynchronous seminars on the bulletin board. In this case the preprocessing and transformation stage is much simpler, due to the clear separation of threads.

Visualisation of discussion threads for evaluation of the degree of collaboration

These techniques are presented in the context of asynchronous discussions used in peer-assisted learning (Chapman, 1998) and collaborative problem- and project-based learning. The assumption is that both the content and the pattern of the sequence of messages on the bulletin board reflect the degree of collaborative learning. In this section message patterns are considered, assuming that the content analysis has established a correspondence between the subject and the content of the message. The idea behind this visualisation technique is to get a quick picture of the collaboration without going into the content of the discussion. Another goal is to provide the means for quickly locating key (hot) points in the discussion and focusing on their content.

The messages on the board are grouped in threads. Berthold et al. (1997, 1998) propose a threefold split of the thread structure of e-mail messages in discussion archives in order to explore the interactive threads. It includes (i) reference-depth: how many references were found in a sequence before this message; (ii) reference-width: how many references were found that referred to this message; and (iii) reference-height: how many references were found in a sequence after this message. In addition to the threefold split, Sudweeks and Simoff (1999) included the time variable explicitly. This model, expressed graphically as a tree, allows the comparison of the structure of discussion threads both in a static mode (for example, their length and width at corresponding levels) and in a dynamic mode (for example, detecting moments in time when one thread dominates another in multi-thread discussions). Simoff and Maher (1999) present in detail an application of this model to collaborative design. This paper is limited to static visualisation.
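The three reference measures can be computed from a reply tree, as the Python sketch below suggests. It encodes one possible reading of the definitions, which is an assumption: depth is the number of messages in the reply chain before a message, width is the number of direct replies to it, and height is the length of the longest reply chain after it. The `parents` mapping and the function name are hypothetical, and every parent is assumed to be present in the mapping.

```python
def thread_metrics(parents):
    """Reference-depth, -width and -height for every message.

    parents: dict mapping message id -> id of the message it replies to,
             or None for the first message of a thread.
    """
    children = {msg: [] for msg in parents}
    for msg, parent in parents.items():
        if parent is not None:
            children[parent].append(msg)

    def depth(msg):
        # Number of messages in the reply chain before this one.
        steps = 0
        while parents[msg] is not None:
            msg = parents[msg]
            steps += 1
        return steps

    def height(msg):
        # Length of the longest reply chain after this message.
        return max((1 + height(child) for child in children[msg]), default=0)

    return {
        msg: {"depth": depth(msg), "width": len(children[msg]), "height": height(msg)}
        for msg in parents
    }
```

For a thread in which `m2` and `m3` reply to `m1` and `m4` replies to `m2`, the first message has depth 0, width 2 and height 2.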

In static analysis each message on the bulletin board, denoted by "M", is identified by two indices: one for its level and one for the thread it belongs to. In the example in Figure 2, message "M_3A" is the message on the third level in discussion thread "A". For the static visualisation, if there is more than one message on the same level, then a "+" is added to the message label.

Visualisation techniques based on the above-mentioned model are modified versions of the nested set visualisation of tree structures (Knuth, 1973, pp. 305-313). Figure 10 shows an example of such a visualisation technique applied to threads "A" and "B". The first message on each level is represented by a corresponding rectangle, labelled in this example to illustrate the message correspondence. Thus, there are four nested rectangles in Figure 10a. Each of the other messages on the same level adds 0.5 pt to the baseline thickness of the rectangle. In Figure 10b the baseline thickness is 1 pt, thus rectangle "M_2B" has a thickness of 2.5 pt.

a. Nested rectangles for single message per level.

b. Nested rectangles when there are multiple messages on some levels.

Figure 10. Visualisation of discussion threads
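The line thickness rule used for the nested rectangles can be written down as a small Python sketch. The function names and the input format (one level index per message in a thread) are assumptions for illustration; the 1 pt baseline and the 0.5 pt increment are the values quoted for Figure 10b.

```python
from collections import Counter


def rectangle_thickness(levels, base_pt=1.0, step_pt=0.5):
    """Line thickness of the rectangle drawn for each level of one thread.

    levels: the level index of every message in the thread.
    Each message beyond the first on a level adds step_pt to base_pt.
    """
    per_level = Counter(levels)
    return {
        level: base_pt + step_pt * (count - 1)
        for level, count in sorted(per_level.items())
    }


def message_label(level, thread, count):
    """Label such as 'M_3A'; a '+' marks levels holding several messages."""
    return f"M_{level}{thread}" + ("+" if count > 1 else "")
```

With these values, a level that holds four messages is drawn with a 2.5 pt line, since three additional messages each add 0.5 pt to the 1 pt baseline.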

Figure 11 illustrates the application of the technique for monitoring students working in collaborative design teams. Collaboration on a shared design task can be considered at different levels of abstraction and "degrees" of task sharing. Maher et al. (1997) identify two extreme approaches to sharing design tasks during collaboration: single task collaboration and multiple task collaboration. During single task collaboration the resultant design (or project development) is a product of a continued attempt to construct and maintain a shared conception of the design task. In other words, each of the participants has his/her own view of the whole design problem and the shared conception is developed during intensive discussions. An example of the visual pattern of this type of collaboration is presented in Figure 11a. It is characterised by a relatively large number of nested rectangles, usually also indicating several messages in response to a particular message. During multiple task collaboration the design problem is divided among the participants so that each person is responsible for a particular portion of the design. Multiple task collaborative design therefore does not necessarily require the creation of a single shared design conception, and messages are usually related to project management. Isolated messages and short threads dominate this collaboration style, as illustrated in Figure 11b.


a. Intensive collaboration for creating a joint understanding of the problem.

b. Collaboration connected more with coordinating project tasks and submissions.

Figure 11. Patterns of collaboration
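The two visual patterns can also be summarised numerically. The Python sketch below computes the share of isolated messages and the mean thread depth for a bulletin board. It is only a descriptive aid under assumed names and input format; it deliberately makes no classification, since the text gives no thresholds for separating the two collaboration styles.

```python
def collaboration_summary(threads):
    """Descriptive summary of the thread structure of a bulletin board.

    threads: dict mapping thread id -> list of level indices,
             one entry per message in that thread.
    """
    if not threads:
        return {"threads": 0, "isolated_share": 0.0, "mean_depth": 0.0}
    depths = [max(levels) for levels in threads.values()]
    isolated = sum(1 for levels in threads.values() if len(levels) == 1)
    return {
        "threads": len(threads),
        "isolated_share": isolated / len(threads),   # single-message threads
        "mean_depth": sum(depths) / len(depths),     # average nesting depth
    }
```

A high share of isolated messages together with a low mean depth would correspond to the pattern of Figure 11b, while deep nesting would correspond to that of Figure 11a.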

Conclusions

Approaching education as communication in computer-mediated environments requires the building of a learning community. Usually an instructor guides the development of this community. The techniques presented in this paper can assist instructors in monitoring and evaluating some of the processes in learning communities. The analysis of the record of on-line seminars can show who has participated and the extent of their participation. Evaluating individual participation can identify not only the amount of a contribution but also its content. The analysis and visualisation of the bulletin board discussion can provide indicators of the type of collaboration and the extent of the interaction on ideas and management. The proposed analysis and visualisation techniques can become an integral part of collaborative educational environments, complementing the means based on Web log statistics.

Acknowledgments

This research is part of the work carried out under the Internet Learning initiative at the Faculty of Architecture, University of Sydney. The author acknowledges Mary Lou Maher for her support of this research and her help in clarifying the ideas presented in this paper.

Bibliography

Berthold, M. R., Sudweeks, F., Newton, S. and Coyne, R. (1997). Clustering on the Net: Applying an autoassociative neural network to computer-mediated discussions. Journal of Computer Mediated Communication, 2(4), (http://www.ascusc.org/jcmc/vol2/issue4/bert-hold.html).

Berthold, M. R., Sudweeks, F., Newton, S. and Coyne, R. (1998). It makes sense: Using an autoassociative neural network to explore typicality in computer mediated discussions. In F. Sudweeks, M. McLaughlin and S. Rafaeli (Eds), Network and Netplay: Virtual Groups on the Internet, Menlo Park, CA: AAAI/MIT Press, pp.191-220.

Chapman, E. S. (1998). Key considerations in the design and implementation of effective peer-assisted learning programs. In K. Topping and S. Ehly (eds), Peer-assisted learning, Lawrence Erlbaum Associates, Mahwah, NJ.

Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. (1996). From data mining to knowledge discovery: An overview. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds), Advances in Knowledge Discovery and Data Mining, AAAI Press, Boston, MA, pp. 1-34.

Knuth, D. E. (1973). The Art of Computer Programming, Vol. 1: Fundamental Algorithms, Addison-Wesley, Reading, MA, pp. 311-312.

Lockard, J., Abrams, P. D. and Many, W. A. (1994). Microcomputers for Twenty-First Century Educators, Addison-Wesley, Reading, MA.

Maher, M. L. (1999). Designing the virtual campus as a virtual world. Submitted to CSCL'99.

Maher, M. L., Simoff, S. J. and Cicognani, A. (1997). Potentials and limitations of Virtual Design Studio. Interactive Construction On-line, January, a1.

Maher, M. L., Skow, B., and Cicognani, A. (1999) Designing the Virtual Campus, Design Studies.

Palloff, R. M. and Pratt, K. (1999). Building Learning Communities in Cyberspace: Effective Strategies for the On-Line Classroom. Jossey-Bass Publishers, San Francisco, CA.

Simoff, S.J. and Maher, M. L. (1999). Analysing participation in collaborative design environments. Design Studies (to appear).

Sudweeks, F. and Simoff, S. J. (1999). Complementary explorative data analysis: the reconciliation of quantitative and qualitative principles. In Jones, S. (ed.), Doing Internet Research, Sage Publications, Thousand Oaks, CA, pp 29-55.

Tiffin, J. and Rajasingham, L. (1995). In Search of the Virtual Class: Education in an Information Society, Routledge, London.

Vigotsky, L. S. (1978). Mind in Society: The Development of the Higher Psychological Processes, Harvard University Press, Cambridge, MA.

Author's address

Simeon J. Simoff (simeon@arch.usyd.edu.au)

Key Centre of Design Computing and Cognition (G04); University of Sydney; NSW 2006, Australia. Tel. (+61 2) 9351-3030. Fax (+61 2) 9351-3031.