Language Assessment (Course: Evaluation of English Language Learning)
This paper discusses the testing of one English skill, speaking. Before the test itself is explored, the paper introduces the testing of oral ability and describes the background of the test: its purposes, the type of test, and some procedures that can be applied in making good tests. It then examines the validity, reliability, practicality, and washback of the test. These points are presented in the following sections.
Testing speaking is one of the most important aspects of language testing. However, it is a difficult skill to test (Heaton, 1995), since several elements are involved: fluency, accuracy (grammar), pronunciation, vocabulary, appropriateness, and comprehension. In addition, students have to cope with their confidence and nervousness when taking the test.
There are several ways a teacher can test speaking. Hughes (2003) divides them into three formats. The first is the interview, which consists of questions and requests for information, pictures, role play, and interpreting. The second is interaction with fellow candidates, which comprises discussion and role play. The last is responses to audio or video recordings, which include describing situations, responding to remarks presented in isolation, and retelling a story. In this test, the writer implements the second format, interaction with fellow candidates, and the technique chosen is discussion. In this technique, the teacher gives the students a topic, lets them discuss it, and asks them to reach a decision.
Interaction with their friends can also be used to elicit speaking data from the students. The advantage of this format is that students can be more confident and perform better than when facing an interviewer directly. However, an insensitive and assertive student may affect the other student by dominating the performance, so the teacher should match the students carefully. The scoring remains the responsibility of the teacher. The rules of the test are explained to the students; one of them requires each student to speak for at least one minute.
The test discussed in this paper is an achievement test in the form of a formative test. The test takers are senior high school students in the first semester of year eleven, in a class of 30 students. According to the English curriculum for senior high school grade XI, four types of text are taught: narrative, anecdote, exposition (analytical and hortatory), and discussion. The text type to be tested is exposition, and the skill to be tested is speaking. As noted above, the test is a formative test. Hughes (2003) states that assessment is formative when teachers use it to check on the progress of their students, to see how far they have mastered what they should have learned, and then use this information to modify their future teaching plans. Thus, it is assumed that the students have already learned certain materials, so they are given a formative test to measure the extent to which they have achieved the learning goal.
Test Description
- Students perform in order of their student numbers (30 students overall).
- Students work in pairs.
- Students draw one topic at random from those provided by the teacher (there are 10 topics, which represent current issues and are within the reasoning ability of high school students).
- Students are given 1 minute to brainstorm about the topic.
- Each student then presents his or her arguments as clearly as possible, speaking for at least 1 minute and at most 2 minutes. This part is obligatory and is the main prerequisite for being assessed; any further talk after the main argument is additional.
- Each student asks his or her partner to give an opinion on the topic.
Topics:
1. Breakfast is the most important meal of the day.
2. Smoking should be illegal.
3. Television is the leading cause of violence in today's society.
4. Video games have a negative effect on children.
5. Students are not allowed to bring gadgets to school.
6. The government should ban any kind of homework.
7. The death penalty should not be applied in Indonesia.
8. Home schooling does more good than harm.
9. Schools should implement an English day.
10. The morning ceremony is important to foster students' discipline.
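The pairing and topic draw described above can be sketched in Python. This is only an illustration under stated assumptions: pairs are formed from consecutive student numbers (the paper says only that the teacher should match students carefully), one topic is drawn per pair, and topics may repeat because there are more pairs than topics. The function name `assign_pairs_and_topics` and the placeholder topic labels are hypothetical.

```python
import random

# Placeholder labels standing in for the ten discussion topics.
TOPICS = [f"Topic {n}" for n in range(1, 11)]


def assign_pairs_and_topics(student_numbers, topics, seed=None):
    """Pair students in student-number order and draw one random topic per pair.

    Assumption: consecutive student numbers form a pair, and topics are drawn
    with replacement, so two pairs may receive the same topic.
    """
    rng = random.Random(seed)
    ordered = sorted(student_numbers)
    if len(ordered) % 2 != 0:
        raise ValueError("an even number of students is needed to form pairs")
    pairs = [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered), 2)]
    return [(pair, rng.choice(topics)) for pair in pairs]


# A class of 30 students yields 15 pairs.
schedule = assign_pairs_and_topics(range(1, 31), TOPICS, seed=2024)
for pair, topic in schedule[:3]:
    print(pair, topic)
```

In practice the teacher would replace the consecutive pairing with a deliberate match, for example to avoid putting a very assertive student with a very quiet one.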
In practice, the most commonly used test methods for English majors are the interview, interaction with peers, and responses to tape recordings. But as Hughes (1989, p. 104) points out: “The relationship between the tester and the candidate is usually such that the candidate speaks as to a superior and is unwilling to take the initiative.” Therefore, it is important for teachers to realize that a task-oriented test approach may give students more opportunities to take the initiative than traditional methods.
In fact, there are several ways to create a friendlier environment for the students and to help them feel confident and less nervous, so that the general purposes of the test can be achieved. The tester should personalize the test for the students; should not interrupt them too much; should not discourage them from making a second attempt at a task they have difficulty with; and should avoid being seen making notes on their performance, because this is highly stressful for candidates. The general purposes of this speaking test are to find out the students’ ability to perform in a range of situations and to collect evidence in a systematic way (through elicitation techniques or tasks).
For these reasons, paired discussion is used in this test so that the results are optimal and in line with the expectations and goals of the learning.
Validity
Validity in general refers to the appropriateness of a given test, or any of its component parts, as a measure of what it is supposed to measure. It is the quality that most affects the value of a test, prior to, though dependent on, reliability. However, validity is related to the content and construct of a test, while reliability is related to the score. The validity of a test has traditionally been defined as the degree to which the test actually measures what it is intended to measure (Brown, 1996:231).
To achieve face validity, an oral test may use direct methods such as picture tasks, dialogues, group discussion, role play, interpreting, imitation, or pair work, in an attempt to duplicate as closely as possible the setting and operation of real language use situations. Direct methods also make oral tests authentic, because they are reciprocal in nature and involve more interaction between the task and the test taker.
An oral test is said to have content validity only if it includes a proper sample of the relevant structures, whether through dialogue, discussion, role play, or pair work, and its purpose has been made clear. As Hughes (2003) notes, the tasks should elicit behavior which truly represents the candidates’ ability and which can be scored validly and reliably. The test takers’ background knowledge and level of language should also be considered.
The construct validity of a language test is an indication of how representative it is of an underlying theory of language learning. According to Hughes (2003), “the word ‘construct’ refers to any underlying ability (or trait) which is hypothesized in a theory of language ability” (p. 33). Hughes argues that an oral test has high construct validity if the correlation coefficients between the various traits of speaking ability are low but those between the total construct and each subconstruct (grammar, accuracy, fluency, etc.) are high. Meanwhile, McNamara & Roever (2006) argue that in an oral test the presence of an interlocutor in the test setting introduces an immediate and overt social context, which presents fundamental challenges for the existing individualistic theories of language proficiency. Thus, a paired discussion with a series of instructions that urge students to perform a communicative task is considered to have construct validity.
Based on the notions above, this test satisfies face, content, and construct validity. It appears to measure the students’ oral ability through the discussion technique, and it certainly requires them to speak and deliver their ideas; in that sense it is a direct test. The test also has content validity, as it includes a proper sample of the relevant structures of discussion and has a clear purpose, namely that students are able to convey their own ideas.
Reliability
If the same candidates took the test on two different occasions, the more similar their scores would be, the more reliable the test is said to be (Hughes, 2003). There are two components of reliability: the consistency of candidates’ performance from occasion to occasion, and the reliability of the scoring. Accordingly, to make an oral test reliable, testers should try to achieve both consistent performances from candidates and scoring reliability.
There are several ways of helping candidates perform consistently in an oral test. Firstly, use more items, for the more items a test has, the more reliable it will be. “It has been demonstrated empirically that the addition of further items will make a test more reliable” (Hughes, 2000, p.36). However, the additional items should be independent of each other and of existing items, and each additional item should as far as possible represent a fresh start for the candidate. In an interview used to test oral ability, the candidate should be given as many fresh starts as possible. In this way, additional information on the candidates may be gained, which will make the results of the oral test more reliable. Secondly, provide clear and explicit instructions so that candidates are not confused. Thirdly, candidates should be familiar with the format and testing techniques, so efforts must be made to ensure that all candidates have the opportunity to learn just what will be required of them. Fourthly, uniform and non-distracting conditions of administration should be provided.
In the discussion and role-play assessment literature, inter-rater reliability is the most frequently reported psychometric property (McNamara and Blumer, 2009). Inter-rater reliability communicates the degree to which different raters agree with each other in their evaluations of behavior in role-play assessments. To establish inter-rater reliability, raters select a sample and independently score recorded role-play scenarios after familiarizing themselves with the operational definitions and criteria. The scores are then correlated to provide a measure of agreement. This test, however, has a single rater, which makes the scoring easier to carry out.
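Although this test uses a single rater, the correlation step described above can be sketched as follows, assuming a second rater were available to score the same recorded performances. The choice of the Pearson coefficient, the function name `pearson_r`, and the score lists are assumptions made for illustration; the scores are invented and are not results from this test.

```python
from math import sqrt


def pearson_r(scores_a, scores_b):
    """Pearson correlation between two raters' scores for the same candidates."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("both raters must score the same two or more candidates")
    n = len(scores_a)
    mean_a = sum(scores_a) / n
    mean_b = sum(scores_b) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(scores_a, scores_b))
    var_a = sum((a - mean_a) ** 2 for a in scores_a)
    var_b = sum((b - mean_b) ** 2 for b in scores_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("undefined when a rater gives every candidate the same score")
    return cov / sqrt(var_a * var_b)


# Hypothetical total scores given by two raters to the same six candidates.
rater_a = [18, 20, 15, 22, 12, 19]
rater_b = [17, 21, 14, 23, 13, 18]
agreement = pearson_r(rater_a, rater_b)
print(f"inter-rater correlation: {agreement:.2f}")
```

A coefficient close to 1 would indicate that the two raters rank and space the candidates in much the same way.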
For the objectivity of the judgement, an evaluation sheet for this test will be distributed to the students before the test is held, so that the students can learn the criteria on which they will be assessed. In assessing speaking, scoring can range from an impression mark to a fairly detailed mark in the form of a scale, for instance a 4-point scale.
The marking scheme (evaluation sheet) is given in the table below.
| Aspects | Criteria: 1 | Criteria: 2 | Criteria: 3 | Criteria: 4 | Score |
|---|---|---|---|---|---|
| Fluency | Speaking with many pauses | Speaking too slowly | Speaking generally at normal speed | Speaking fluently | |
| Pronunciation | Speaking words incomprehensibly | Speaking with incorrect pronunciation but still understandable | Speaking with several incorrect pronunciations | Speaking with correct pronunciation | |
| Accuracy | Serious errors in speech make the message difficult to understand | The errors in speech would frequently create confusion | The speech is still understood although it contains many errors | The errors in speech are so minor that the message would be easily comprehended | |
| Clarity | Often mumbles or cannot be understood, more than one mispronounced word | Speaks clearly and distinctly most of the time, no more than one mispronounced word | Speaks clearly and distinctly nearly all the time, no more than one mispronounced word | Speaks clearly and distinctly most of the time, and no mispronounced words | |
| Performance skill | Speaking at a volume which is almost inaudible, no facial expression, and not communicative | Mumbling, flat facial expression, and less communicative | Speaking in a soft voice, but can be understood, good facial expression | Speaking clearly and loudly, good facial expression, and communicative | |
| Content | Very poor; does not show knowledge of subject; non-substantive; not pertinent | Fair to poor; limited knowledge of subject; little substance; inadequate development of topic | Good to average; some knowledge of subject; adequate range; limited development of thesis; mostly relevant to topic | Topic elaboration, organization, coherence and cohesion, suitable linkers and connectors | |
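The evaluation sheet can be turned into a score as in the minimal sketch below. The paper does not state how the six marks are combined, so the simple unweighted sum (range 6 to 24) and the conversion to a percentage are assumptions, as are the names `total_score` and `as_percentage` and the example marks.

```python
# The six aspects of the evaluation sheet, each marked on a 4-point scale.
ASPECTS = (
    "Fluency",
    "Pronunciation",
    "Accuracy",
    "Clarity",
    "Performance skill",
    "Content",
)
MIN_MARK, MAX_MARK = 1, 4


def total_score(marks):
    """Add up the marks for all six aspects (assumed unweighted)."""
    missing = [aspect for aspect in ASPECTS if aspect not in marks]
    if missing:
        raise ValueError(f"no mark given for: {', '.join(missing)}")
    for aspect in ASPECTS:
        if not MIN_MARK <= marks[aspect] <= MAX_MARK:
            raise ValueError(f"{aspect} must be marked from {MIN_MARK} to {MAX_MARK}")
    return sum(marks[aspect] for aspect in ASPECTS)


def as_percentage(total):
    """Express a total as a percentage of the maximum possible total."""
    return 100 * total / (MAX_MARK * len(ASPECTS))


# Hypothetical marks for one student.
example_marks = {
    "Fluency": 3,
    "Pronunciation": 3,
    "Accuracy": 2,
    "Clarity": 4,
    "Performance skill": 3,
    "Content": 3,
}
total = total_score(example_marks)
print(total, as_percentage(total))  # 18 75.0
```

If some aspects matter more for the learning goal, for example content in an exposition task, the sum could be replaced by a weighted one; that would be a further design decision not made in the paper.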
In the context of Indonesia, the authors believe that the type of test discussed above could be used successfully to assess students who are learning English for certain micro and macro skills. These students are expected to be able to communicate naturally in English and to convey their ideas, notions, and arguments regarding certain issues. In short, the test being proposed is both valid and reliable.
Practicality
As Brown (2004) notes, practicality involves questions of economy, ease of administration, scoring, and interpretation of results. Economy refers to the money spent on making and running the test. Some speaking tests, especially those in interview format, cost a lot of money, because raters have to be trained and paid. This test, however, involves only pairs of students, and one pair does not take long: about ten minutes per pair, or 150 minutes for all 15 pairs in the class.
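The time estimate above is simple arithmetic, shown here as a small sketch so that it can be recomputed for other class sizes. The function name `total_testing_minutes` is hypothetical, and the sketch assumes that pairs are tested one after another with no gap between them.

```python
def total_testing_minutes(num_students, minutes_per_pair):
    """Total administration time when students are tested pair by pair."""
    if num_students % 2 != 0:
        raise ValueError("an even number of students is needed to form pairs")
    num_pairs = num_students // 2
    return num_pairs * minutes_per_pair


# 30 students form 15 pairs; at about 10 minutes per pair this is 150 minutes.
print(total_testing_minutes(30, 10))  # 150
```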
The materials used in this test are a few pieces of paper, which are easy to obtain. Thus, this test can be considered practical.
Washback
Speaking tests in general have the positive washback effect of encouraging students to study the language skills required for oral communication. As Weir (2004:103) pointed out: “If we wish to make statements about capacity for spoken interaction we are no longer interested in multiple-choice, pencil-and-paper tests, that is, indirect tests of speaking where spoken language is conspicuously absent.” In line with this perspective, the writer proposes that the task-oriented approach should be especially considered in testing spoken English, as a type of performance test that assesses language proficiency by asking students to give productive and interactive performances. It is believed that, if well designed according to the needs of various real situations, the task approach, as an option for speaking tests, not only increases test authenticity and validity but can also considerably reduce test deficiencies such as the halo effect, interview bias, problems of intra-rater reliability, and so on.
The test designed here is expected to bring positive washback, as its format and procedures stimulate more appropriate teaching practices; for example, an oral proficiency test is introduced because the teacher expects the students to be able to speak in a natural setting (Taylor, 2005). On that basis, the washback of the test designed here is expected to be mostly positive.
References:
Brown, J. D. 1996. Testing in Language Programs. Upper Saddle River, New Jersey: Prentice-Hall Regents.
Heaton, J. B. 1995. Writing English Language Tests, New Edition. United States of America: Longman.
Hughes, A. 1989. Testing for Language Teachers. Cambridge: Cambridge University Press.
Hughes, A. 2003. Testing for Language Teachers, Second Edition. Cambridge: Cambridge University Press.
McNamara, T. 2009. Language Testing. London: Oxford University Press.
Weir, C., & Taylor, L. 2005. Language Testing and Validation: An Evidence-Based Approach. London: Palgrave Macmillan.