Sunday, December 20, 2015

Testing Oral Test: Testing Extensive Speaking for SMA

Language Assessment (course: Evaluation of English Language Learning)
 
This paper discusses the testing of one English skill, speaking. Before the test itself is explored, the paper introduces the testing of oral ability and describes the background of the test: its purposes, the type of test, and some procedures that can be applied in making good tests. It then examines the validity, reliability, practicality, and washback of the test. These points are taken up in the sections that follow.

Testing speaking is one of the most important aspects of language testing. However, speaking is a difficult skill to test (Heaton, 1995), since it involves several elements: fluency, accuracy (grammar), pronunciation, vocabulary, appropriateness, and comprehension. In addition, students have to cope with their own lack of confidence and their nervousness when taking such a test.
There are several ways a teacher can test speaking. Hughes (2003) divides them into three formats. The first is the interview, which consists of questions and requests for information, pictures, role play, and interpreting. The second is interaction with fellow candidates, which comprises discussion and role play. The last is responses to audio or video recordings, which include describing a situation, responding to remarks in isolation, and retelling a story. In this test, the writer will implement the second format, interaction with fellow candidates, with discussion as the technique. In this technique, the teacher gives the students a topic, then lets them discuss it and reach a decision.
Interaction between classmates can also be used to elicit speaking data from the students. The advantage of this format is that students can be more confident and perform better than when they face an interviewer directly. However, an insensitive or assertive student may affect the other student by dominating the performance, so the teacher should match the students carefully. The scoring is done by the teacher alone. The rules explained in the test include a requirement that each student speak for at least one minute.
The test discussed in this paper is an achievement test in the form of a formative test. The test takers are senior high school students in the first semester of year eleven, in a class of 30 students. According to the English curriculum for senior high school grade XI, four types of text are taught: narrative, anecdote, exposition (analytical and hortatory), and discussion. The text type tested here is exposition, and the skill tested is speaking. Hughes (2003) states that assessment is formative when teachers use it to check on the progress of their students, to see how far they have mastered what they should have learned, and then use this information to modify their future teaching plans. This test therefore assumes that the students have already learned certain materials, and the formative test measures how far their ability has reached the learning goal.

Testing Description
  1. Students will perform in order of their student numbers (30 students overall).
  2. Students will work in pairs.
  3. Students will take one topic at random from those given by the teacher (there are 10 topics, which represent current issues and are within the reasoning of high school students).
  4. Students will be given at least 1 minute and at most 2 minutes to deliver their ideas.
  5. Students will be given 1 minute to brainstorm about the topic.
  6. Each student presents his/her arguments as clearly as possible for at least one minute and at most two minutes. This part is obligatory and is the main prerequisite for being assessed; any further talk after the main argument(s) is additional.
  7. Each student then asks his/her partner to give an opinion on the topic.

    Topics:
    1. Breakfast is the most important meal of the day.
    2. Smoking should be illegal.
    3. Television is the leading cause of violence in today's society.
    4. Video games have a negative effect on children.
    5. Students should not be allowed to bring gadgets to school.
    6. The government should ban all kinds of homework.
    7. The death penalty should not be applied in Indonesia.
    8. Home schooling does more good than harm.
    9. Schools should implement an English day.
    10. The morning ceremony is important for fostering students' discipline.
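
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source does not say how partners are chosen or whether a topic may repeat, so pairing consecutive student numbers and drawing one topic number per pair, with repetition allowed (15 pairs, 10 topics), are assumptions, as are the function names.

```python
import random

TOPIC_COUNT = 10  # the ten topics listed above, referred to by number


def make_pairs(student_numbers):
    """Pair students in student-number order: (1, 2), (3, 4), ...

    Assumption: consecutive numbers are paired. A teacher may prefer to
    match partners by hand so that one student does not dominate.
    """
    ordered = sorted(student_numbers)
    if len(ordered) % 2 != 0:
        raise ValueError("an even number of students is needed to form pairs")
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]


def draw_topics(pairs, seed=None):
    """Draw one topic number (1..TOPIC_COUNT) at random for each pair.

    Assumption: topics may repeat, since there are more pairs than topics.
    """
    rng = random.Random(seed)
    return {pair: rng.randint(1, TOPIC_COUNT) for pair in pairs}


pairs = make_pairs(range(1, 31))  # 30 students -> 15 pairs
schedule = draw_topics(pairs, seed=2015)
for pair, topic in schedule.items():
    print(f"Students {pair[0]} and {pair[1]}: topic {topic}")
```

Fixing the seed only makes the draw repeatable for record keeping; in class the same effect is achieved by drawing slips of paper.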
In practice, the most commonly used test methods for English majors are the interview, interaction with peers, and response to tape recordings. But as Hughes (1989, p. 104) has pointed out: “The relationship between the tester and the candidate is usually such that the candidate speaks as to a superior and is unwilling to take the initiative.” It is therefore important for teachers to realize that a task-oriented test approach may give students more opportunities to take the initiative than traditional methods do.
There are several ways to create a friendlier environment for the students, helping them to be confident and less nervous so that the general purposes of the test can be achieved. The tester can personalize the test for the students, avoid interrupting them too much, not discourage them from making a second attempt at a task they find difficult, and avoid being seen making notes on their performance, because this is highly stressful for them. The general purposes of testing speaking here are to find out the students’ ability to perform in a range of situations and to collect evidence in a systematic way (through elicitation techniques or tasks).
For these reasons, paired discussion is adopted in this test, so that the results are as good as possible and in line with the expectations and goals of the learning.

Validity
Validity in general refers to the appropriateness of a given test, or any of its component parts, as a measure of what it is supposed to measure. It is the quality that most affects the value of a test, prior to, though dependent on, reliability. Validity is related to the content and construct of a test, while reliability is related to the score. The validity of a test has traditionally been defined as “the point to which the test actually measures what is intended to measure” (Brown, 1996:231).
To achieve face validity, an oral test may use direct methods such as picture tasks, dialogues, group discussion, role play, interpreting, imitation, or pair work, which attempt to duplicate as closely as possible the setting and operation of real language use situations. Direct methods also make oral tests authentic, because they are reciprocal in nature and involve more interaction between the task and the test taker.
An oral test is said to have content validity only if it includes a proper sample of the relevant structures, whether in dialogue, discussion, role play, or pair work, and makes its purpose clear. As Hughes (2003) notes, the tasks should elicit behavior which truly represents the candidates’ ability and which can be scored validly and reliably. The test takers’ background knowledge and level of language should also be considered.
The construct validity of a language test is an indication of how representative it is of an underlying theory of language learning. According to Hughes (2003), “the word ‘construct’ refers to any underlying ability (or trait) which is hypothesized in a theory of language ability” (p. 33). An oral test is said to have high construct validity, Hughes argues, if the coefficients between the various traits of speaking ability are low but those between the total construct and each subconstruct (grammar, accuracy, fluency, etc.) are high. Meanwhile, McNamara & Roever (2006) argue that in an oral test the presence of an interlocutor in the test setting introduces an immediate and overt social context, which presents fundamental challenges for the existing individualistic theories of language proficiency. Thus, a paired discussion with a series of instructions that urge students to perform a communicative task is considered to have construct validity.
On the basis of the notions above, this test complies with face, content, and construct validity. It appears to measure the students’ oral ability through the discussion technique, and it certainly requires them to speak and deliver their ideas. In this sense, the test is a direct test. The test also has content validity, as it includes a proper sample of the relevant structures of discussion and has a clear purpose, namely that students are able to convey their own ideas.

Reliability
Reliability concerns how similar candidates’ scores would be if they took the same test on a different occasion: the more similar the scores would have been, the more reliable the test is said to be (Hughes, 2003). There are two components of reliability: the performance of candidates from occasion to occasion, and the reliability of the scoring. To make an oral test reliable, testers should therefore try to obtain consistent performances from candidates and to achieve scoring reliability.
There are several ways to help candidates perform consistently in an oral test. Firstly, use more items, for the more items a test has, the more reliable it will be. “It has been demonstrated empirically that the addition of further items will make a test more reliable” (Hughes, 2000, p.36). However, the additional items should be independent of each other and of existing items, and each additional item should as far as possible represent a fresh start for the candidate. In an interview used to test oral ability, the candidate should be given as many fresh starts as possible. In this way additional information on the candidates may be gained, which makes the results of the oral test more reliable. Secondly, provide clear and explicit instructions so that candidates are not confused. Thirdly, candidates should be familiar with the format and testing techniques, so efforts must be made to ensure that all candidates have the opportunity to learn just what will be required of them. Fourthly, uniform and non-distracting conditions of administration should be provided.
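
The effect of adding items can be illustrated numerically. The quotation above does not name a formula; the sketch below uses the Spearman-Brown prediction from classical test theory as one common way to express the idea. It assumes the added items are parallel to the existing ones, and the starting reliability of 0.6 is a made-up figure, not a result from this test.

```python
def spearman_brown(reliability, length_factor):
    """Predict reliability when a test is lengthened by length_factor.

    reliability   -- reliability of the current test, between 0 and 1
    length_factor -- new length divided by old length (2 = twice as many items)
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical example: a test with reliability 0.6 is doubled in length.
predicted = spearman_brown(0.6, 2)
print(f"Predicted reliability after doubling: {predicted:.2f}")
```

The prediction rises with the number of items but with diminishing returns, which is why extra tasks or fresh starts help most when a test is short.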
In the literature on discussion and role-play assessment, inter-rater reliability is the most frequently reported psychometric property (McNamara and Blumer, 2009). Inter-rater reliability communicates the degree to which different raters agree with each other in their evaluations of behavior in role-play assessments. To establish it, raters select a sample and independently score recorded role-play scenarios after familiarizing themselves with the operational definitions and criteria. The scores are then correlated to provide a measure of agreement. This test uses a single rater, which makes scoring the students simpler.
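
The correlation step can be illustrated with a small sketch. The source does not say which coefficient is used; Pearson's r is assumed here as one common choice, and the two score lists are made-up totals, not data from this test.

```python
from math import sqrt


def pearson_r(scores_a, scores_b):
    """Pearson correlation between two raters' scores for the same students."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least two scores")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    # Sums of cross-products and squared deviations from each rater's mean
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when a rater gives identical scores")
    return cov / sqrt(var_a * var_b)


# Hypothetical totals given by two raters to the same five students
rater_a = [18, 20, 15, 22, 12]
rater_b = [17, 21, 14, 22, 13]
print(f"Inter-rater correlation: {pearson_r(rater_a, rater_b):.2f}")
```

With only one rater, such a check would be possible only if a second rater scored recordings of the same performances.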
For the objectivity of the judgement, an evaluation sheet for this test will be distributed to the students before the test is held, so that they can learn the criteria by which they will be assessed. In assessing speaking, scoring can range from an impression mark to a fairly detailed mark in the form of a scale, for instance a 4-point scale. Below is the marking scheme, or evaluation sheet.
Aspects and criteria for each score (1 to 4)

Fluency
  1: Speaking with many pauses
  2: Speaking too slowly
  3: Speaking generally at normal speed
  4: Speaking fluently

Pronunciation
  1: Speaking words incomprehensibly
  2: Speaking with incorrect pronunciation but still understandable
  3: Speaking with several mispronunciations
  4: Speaking with correct pronunciation

Accuracy
  1: The serious errors present in the speech make the message difficult to understand
  2: The errors present in the speech would frequently create confusion
  3: The speech is still understood although it contains many errors
  4: The errors present in the speech are so minor that the message would be easily comprehended

Clarity
  1: Often mumbles or cannot be understood; more than one mispronounced word
  2: Speaks clearly and distinctly most of the time; no more than one mispronounced word
  3: Speaks clearly and distinctly nearly all the time; no more than one mispronounced word
  4: Speaks clearly and distinctly all the time; no mispronounced words

Performance skill
  1: Speaking at a volume which is almost inaudible; no facial expression and not communicative
  2: Mumbling; flat facial expression and less communicative
  3: Speaking in a soft voice but can be understood; good facial expression
  4: Speaking clearly and loudly; good facial expression and communicative

Content
  1: Very poor; does not show knowledge of subject; non-substantive; not pertinent
  2: Fair to poor; limited knowledge of subject; little substance; inadequate development of topic
  3: Good to average; some knowledge of subject; adequate range; limited development of thesis; mostly relevant to topic
  4: Topic elaboration, organization, coherence and cohesion, suitable linkers and connectors
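
A sheet like this can be turned into a single mark in several ways. The source does not state how the six aspect scores are combined, so the sketch below assumes they are simply added with equal weight (maximum 6 x 4 = 24) and also reported as a percentage; the function name and the sample marks are hypothetical.

```python
ASPECTS = (
    "Fluency",
    "Pronunciation",
    "Accuracy",
    "Clarity",
    "Performance skill",
    "Content",
)
MIN_SCORE, MAX_SCORE = 1, 4  # the 4-point scale of the evaluation sheet


def total_score(marks):
    """Return (total, percentage) for one student's marks.

    marks maps each aspect name to a score from 1 to 4.
    Assumption: all aspects carry equal weight.
    """
    missing = [aspect for aspect in ASPECTS if aspect not in marks]
    if missing:
        raise ValueError(f"missing aspects: {missing}")
    for aspect in ASPECTS:
        if not MIN_SCORE <= marks[aspect] <= MAX_SCORE:
            raise ValueError(f"{aspect} must be scored from {MIN_SCORE} to {MAX_SCORE}")
    total = sum(marks[aspect] for aspect in ASPECTS)
    percentage = round(100 * total / (len(ASPECTS) * MAX_SCORE), 1)
    return total, percentage


# Hypothetical marks for one student
sample = {
    "Fluency": 3,
    "Pronunciation": 3,
    "Accuracy": 2,
    "Clarity": 4,
    "Performance skill": 3,
    "Content": 3,
}
total, percentage = total_score(sample)
print(f"Total: {total} out of {len(ASPECTS) * MAX_SCORE} ({percentage}%)")
```

A teacher who values some aspects more than others could replace the plain sum with a weighted one; that choice is outside what the source specifies.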


In the context of Indonesia, the authors believe that the type of test discussed above could be used successfully to assess students who are learning English for certain micro and macro skills. These students are expected to be able to communicate naturally in English and to convey their ideas, notions, and arguments on a given issue. In short, the test being promoted is both valid and reliable.

Practicality
As Brown (2004) notes, practicality involves questions of economy, ease of administration, scoring, and interpretation of results. Economy refers to the money spent on producing and running the test. Some speaking tests, especially those in interview format, cost a lot of money, because raters have to be trained and also paid. This test, however, involves only pairs of students, and one pair does not take long: about ten minutes per pair, or 150 minutes for all 15 pairs in the class.
The only material used in this test is a few pieces of paper, which are easy to obtain. Thus this test is considered practical.

Washback
Speaking tests in general have the positive washback effect of encouraging students to study the language skills required for oral communication. As Weir (2004:103) pointed out: “If we wish to make statements about capacity for spoken interaction we are no longer interested in multiple-choice, pencil-and-paper tests, that is, indirect tests of speaking where spoken language is conspicuously absent.” In line with this perspective, the writer proposes that the task-oriented approach should be given special consideration in testing spoken English, as a type of performance test that assesses language proficiency by asking students to give a productive and interactive performance. It is believed that, if well designed according to the needs of various real situations, the task approach, as one option for a speaking test, not only increases test authenticity and validity but can also considerably reduce common test deficiencies such as the halo effect, interview bias, and poor intra-rater reliability.
The test designed here is expected to bring positive washback, as its formats and procedures stimulate more appropriate teaching practices; for example, an oral proficiency test is introduced because the teacher expects the students to be able to speak in a natural setting (Taylor, 2005).


References:
Brown, J. D. 1996. Testing in Language Programs. Upper Saddle River, New Jersey: Prentice-Hall Regents.
Heaton, J. B. 1995. Writing English Language Tests (New edition). United States of America: Longman.
Hughes, A. 1989. Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A. 2003. Testing for Language Teachers (Second edition). Cambridge: Cambridge University Press.
McNamara, T. 2009. Language Testing. London: Oxford University Press.
Weir, C., & Taylor, L. 2005. Language Testing and Validation: An Evidence-Based Approach. London: Palgrave Macmillan.
