The Institute plays an important role in measuring the efficacy of instruction and capturing the mission readiness of the force.
Most federal government agencies rely on the Defense Language Proficiency Test (DLPT) and the Oral Proficiency Interview (OPI), which are reliable, scientifically validated tools for testing the language ability of DOD personnel worldwide.
The Defense Language Proficiency Test guides are intended to provide information about the DLPT5 tests and include sample items with explanations.
Please note that all provided information is in Portable Document Format (PDF) and that you will need a compatible reader to open those files.
Q. Why did we change to DLPT5? What was wrong with DLPT IV?
A. Every 10 to 15 years, as our understanding of the Interagency Language Roundtable (ILR) scale, language testing, and the needs of the government change, the DLPTs have been updated; the DLPT5 is simply the latest version. The DLPT IV series tests were developed at a time when listening to authentic material was not considered as important as it is today (and authentic material was not as easy to obtain as it is now). The very short passages in DLPT IV also did not allow the test to cover as many aspects of the ILR skill level descriptions as was desired. The DLPT5 specifications address those issues.
Q. Why are scores lower on the DLPT5 than on older DLPTs?
A. No language test is 100% accurate. So when calibrating a language test, the question is always whether the error should be in the direction of being generous or strict. Older DLPTs were calibrated so that the error would be in the direction of being generous. With DLPT5, the post-9/11 emphasis on readiness resulted in a desire to have any error be in the direction of being strict.
Each new generation of DLPTs has incorporated innovations. Sometimes these innovations involve aspects of language proficiency that test-takers had not previously had to deal with on DLPTs, and that may not have been emphasized in training or self-study programs.
Q. If an older DLPT is replaced by a DLPT5, are old DLPT scores still valid? And how soon do people have to take the DLPT5?
A. Scores are valid for the same period as in the past, typically one year. If you took the DLPT IV in April of one year and the DLPT5 rolled out in May, your DLPT IV score is good until the following April, as usual; at that point you would take the DLPT5.
Q. Are there materials available to help the examinee prepare for the new tests?
A. Yes. Most languages have a language-specific guide that includes samples in the target language. In most cases, the language-specific guide will be available at the same time as, or some time before, the test itself is rolled out. Some languages do not have language-specific guides, but there are generic guides for multiple-choice (MC) DLPT5s and for constructed-response (CRT) DLPT5s; these guides contain the same information as the language-specific guides but have samples in English. In addition, there is an interactive demo multiple-choice test available on this site that you can use to familiarize yourself with the test’s appearance on the screen and how to work through the test (although this demo does not replicate the DLPT5 in terms of the number of questions or the distribution of questions by level). An interactive CRT demo is in the works; in the meantime, because the multiple-choice and CRT interfaces are very similar, those who will take the CRT DLPT5 may get some benefit from the demo as well.
Q. What range of ILR levels is covered by DLPT5?
A. DLPT5s are available either in a lower-range test that gives scores from 0+ to 3, or in an upper-range test that gives scores from 3 to 4. Some languages have only a lower-range test; some have only an upper-range test; and some have both.
Q. How do I become eligible to take the upper-range DLPT5?
A. To take an upper-range DLPT5, you must have a score of level 3 on the lower-range DLPT5. (Exception: if there is no lower-range DLPT5, a score of level 3 on a lower-range DLPT IV is used.) You do not need to have a score of level 3 on both skills. If you have a 2+ in listening and a 3 in reading, for example, you may take the upper-range test in reading (but not in listening).
Q. What if I take the upper-range test and do poorly on it?
A. The lowest possible score on an upper-range test is a level 3. Test-takers for upper-range tests are all certified to be at least level 3 by the lower-range test; therefore, doing poorly on the upper-range test simply means that you are not above level 3.
Q. Are questions and answers in the target language or in English? Why?
A. In English. The only part of a DLPT5 that is in the target language is the passage on which the questions are based.
For lower-range tests, we want to test examinees’ ability to understand the target-language passage; if we had test questions in the target language, some test-takers might understand the passage but not understand the question or answer choices, and they would get the question wrong unfairly. For upper-range tests, this is not a problem, but our testing experts do not have access to all the languages we test, so in order to make sure the test questions are working properly we need to use English. In addition, anyone taking the upper-range test is expected to understand the target language but produce reports in English, so we are assuming that all examinees have good English skills and the ability to move back and forth between the target language and English.
Q. Some high-level passages do not seem to have the elevated style I would expect. How do you account for this?
A. The test developers are trained to select passages based on text typology. They consider the text mode (for instance, is the purpose of the passage to persuade or to inform?), the language features (lexicon and syntax, for example), and the functions, content, and accuracy components of the skill level descriptions. At high levels there is a great diversity of text types and uses of language; for example, readers and listeners at level 4 should understand both formal academic language and slang or non-standard dialect.
Q. Who writes the tests?
A. Each DLPT5 is developed by a team consisting of two or more speakers of the target language (almost always native speakers of the target language), plus a project manager who is an experienced test developer. In addition, each test has several reviewers, at least one of whom must be a native speaker of English.
Q. Why do some languages have a CRT test and some MC tests? Will any have both?
A. The type of test available depends on the size of the test-taker population. Multiple-choice tests are preferable because they can be scored automatically; however, in order to generate the statistical information needed to calibrate these tests, we need a large number of people (at least 100, preferably 200 or more) to take the validation form of the test. For languages for which we cannot get this many people to take a validation form, constructed-response tests are developed. Constructed-response tests do not require large-scale statistical analysis; the disadvantage is that they must be scored by human raters and so require more time and personnel after administration. Some languages have a multiple-choice lower-range test and a constructed-response upper-range test; these are languages with plenty of potential test-takers at level 3 and below but very few above level 3. CRT and MC questions are never mixed on a single test: each form is either all CRT or all MC.
Q. Will all languages eventually have Computer Adaptive DLPT5s?
A. No. Computer adaptive tests are only possible for multiple-choice tests; at this time it is anticipated that we will have computer-adaptive tests only in the largest-population languages (probably Arabic-Modern Standard, Korean, Chinese-Mandarin, Spanish, and Russian).
Q. How many forms are available in each language? How often are they updated?
A. Most DLPT5s have two forms. Additional forms are under development for some languages. An update schedule for new forms has not been determined, but it is anticipated that forms will be updated more often for the languages with many linguists than for those with few linguists.
Q. Is the CRT an essay test?
A. No. At the lowest levels, examinees only need to type a few words or a sentence. At higher levels, they may need to type several sentences, but the questions are designed for short answers, not connected prose. The quality of examinees’ writing is not evaluated, only their ideas.
Q. Will people be penalized for poor English on the CRT responses?
A. No. As long as the scorers can understand the ideas being conveyed, it does not matter whether the examinee’s punctuation, spelling, or grammar is correct.
Q. Do I need to answer all the questions on a CRT?
A. ILR ratings on CRTs are determined by going up one level at a time, so if an examinee has level 2 proficiency, that examinee’s answers on the level 3 questions will not affect his/her score. However, the examinee should answer all the level 1 and 1+ questions. The bottom line is that you should answer all the questions you can, but if you reach a point in the test where you really have no idea what the answer is for several questions in a row, it won’t matter if you stop answering at that point.
Q. What are the procedures for grading CRTs?
A. Each question has a scoring protocol that indicates which ideas are to be given credit. Examinees do not have to match the wording in the protocol exactly, just the idea that is expressed there. Trained scorers work from the protocol to mark answers right or wrong; there is no partial credit. Each test is scored independently by two scorers. If the scorers end up disagreeing, an expert third rater scores the test to determine the level. Examinees’ level scores are determined by the number of questions they got right at each level; in general, they need to get 75% of the questions right at a level in order to be awarded that level.
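To make the level-by-level logic concrete, here is a minimal sketch in Python. It is an illustration only, assuming a uniform 75% threshold and a simplified label set; the function name and the floor score are invented for this example, and this is not the operational scoring algorithm.

    # Hypothetical sketch of CRT level assignment: walk up the ILR levels and
    # award the highest consecutive level at which the examinee answered at
    # least 75% of that level's questions correctly. Illustration only.
    LEVELS = ["0+", "1", "1+", "2", "2+", "3"]  # assumed label set

    def assign_level(correct, total, threshold=0.75):
        """correct and total map each ILR level to question counts."""
        awarded = "0"  # assumed floor score
        for level in LEVELS:
            if total.get(level, 0) == 0:
                break
            if correct.get(level, 0) / total[level] >= threshold:
                awarded = level  # examinee sustains this level
            else:
                break  # levels are checked bottom-up; stop at first failure
        return awarded

    # Example: 3/3 at 0+, 4/4 at 1, 3/4 at 1+, 1/4 at 2 -> awarded 1+
    print(assign_level({"0+": 3, "1": 4, "1+": 3, "2": 1},
                       {"0+": 3, "1": 4, "1+": 4, "2": 4}))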
Q. How are raters trained?
A. Raters are employees of the Evaluation and Standardization Directorate at DLIFLC or, for tests administered to civilians in the intelligence community, are trained directly by experts at the Evaluation and Standardization Directorate. All raters receive a one-day training session in which they learn how to read the scoring protocols and assign levels. Because questions and answers are in English, a rater, once trained, can score tests for any language. After training, new raters begin scoring; the second scorer for the tests they rate is an experienced scorer. Periodic analyses of rater tendencies are conducted, and support sessions are provided for raters whose accuracy appears to be slipping. Regular re-norming sessions for all raters are planned.
Q. How is subjectivity mitigated in CRT grading?
A. As described above, there is a protocol for each test specifying the range of acceptable answers for each question. Raters must follow the protocols, and are trained to do so. So for any given test-taker response, any given rater is likely to rate that response the same way. Also as described above, two raters rate independently, and statistics are kept of agreement and rater tendencies, so that raters who are inconsistent are retrained or removed from the rater pool.
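As one example of such an agreement statistic (simple percent agreement; the actual monitoring statistics are not specified here, so this choice is an assumption), a short Python sketch:

    # Hypothetical agreement check between two raters' right/wrong marks
    def percent_agreement(rater1, rater2):
        """rater1, rater2: parallel lists of right/wrong marks (True/False)."""
        matches = sum(a == b for a, b in zip(rater1, rater2))
        return matches / len(rater1)

    r1 = [True, True, False, True, False]
    r2 = [True, False, False, True, False]
    print(percent_agreement(r1, r2))  # 0.8 -> raters agree on 80% of answers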
Q. How long will it take to get score reports?
A. Scores are typically generated within one week of test administration. Time may vary depending on availability of raters.
Q. Does focusing my thoughts using the answer choices still work on the DLPT5?
A. Yes. Test-takers who prefer to read the question and all the answer choices before reading or listening to the passage may still do so, and those who found this useful on earlier versions of the DLPT will probably still find it useful on the DLPT5.
Q. Is there the possibility of several correct answers on the multiple-choice test?
A. No. Each question has just one correct answer. The statistical analysis performed during validation reveals any questions for which high-ability examinees are divided between two or more answers, and those questions are not used in the operational forms of the test. Examinees below the proficiency level targeted by the passage and question may see several answer choices as correct, because the questions are written with the idea that examinees below the level should not be able to guess the correct answer. But examinees at or above the level will be able to find the single correct answer.
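Purely as an illustration of the kind of check described above (the flagging rule and the 60% threshold are assumptions, not the actual validation criteria), a sketch in Python:

    # Hypothetical distractor check: among high-ability examinees, flag an
    # item if the keyed answer does not clearly dominate their choices.
    from collections import Counter

    def flag_split_item(high_ability_choices, key, min_share=0.6):
        """high_ability_choices: answer letters chosen by examinees already
        known to be at or above the level the item targets."""
        counts = Counter(high_ability_choices)
        share = counts[key] / len(high_ability_choices)
        return share < min_share  # True -> high scorers are split; drop item

    # Eight strong examinees split 4/4 between A and B on an A-keyed item
    print(flag_split_item(list("AABABBBA"), key="A"))  # True -> flagged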
Q. If there is a question that a majority of examinees get wrong, will it be thrown out?
A. Not necessarily. Statistical analysis involves more than difficulty. If 200 test-takers take the validation form, and only 20 of those 200 are level 3, then it is expected that any given level 3 item should only be answered correctly by those 20 people, plus 25% of the remaining people who would get it right by chance: that’s a total of about 65 out of 200, much less than a majority. However, a question at level 1 should be answered correctly by most test-takers; questions at the low levels that are much more difficult than expected are thrown out.
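To make that arithmetic explicit (the 25% chance figure corresponds to four answer choices):

    expected correct = 20 + 0.25 × (200 − 20) = 20 + 45 = 65 out of 200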
Q. How are the cut scores for multiple-choice tests determined?
A. In the judgment of ILR experts, a person at a given ILR level should be able to answer correctly at least 70% of the multiple-choice questions at that level. Using Item Response Theory (a statistical method typically used for high-stakes, large-scale tests), the DLIFLC psychometrician goes level by level for all acceptable questions in the validation pool and, for each level, calculates an ability indicator corresponding to the ability to answer 70% of the questions at that level. This computation is then applied to the questions on specific operational forms to generate cut scores.
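A rough sketch of that computation in Python, under an assumed three-parameter logistic (3PL) IRT model; the item parameter values, the model choice, and the bisection search are illustrative assumptions, not DLIFLC’s actual procedure:

    # Hypothetical cut-score search: find the ability (theta) at which the
    # expected proportion correct across a level's items reaches 70%.
    import math

    def p_correct(theta, a, b, c):
        """3PL probability of a correct answer for ability theta."""
        return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

    def cut_ability(items, target=0.70):
        lo, hi = -4.0, 4.0  # bisection over a plausible ability range
        for _ in range(60):
            mid = (lo + hi) / 2
            mean_p = sum(p_correct(mid, *it) for it in items) / len(items)
            lo, hi = (mid, hi) if mean_p < target else (lo, mid)
        return (lo + hi) / 2

    # Example: three items as (discrimination a, difficulty b, guessing c)
    level2_items = [(1.2, 0.0, 0.25), (0.9, 0.3, 0.25), (1.1, -0.2, 0.25)]
    print(round(cut_ability(level2_items), 2))  # ability needed for 70% correct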
Q. Does the Chinese-Mandarin DLPT5 have both traditional and simplified characters?
A. Yes. The proportion of passages using traditional characters is low at the lowest levels and increases to 50% at the highest levels.
Q. If there is a power outage or network failure, is the test data saved real time so that you can come back to the same test?
A. Yes. Every time an examinee clicks on the [Next] button to advance to the next passage, the examinee’s responses are saved, and when the examinee re-starts the test, the passage the examinee was working on at the time of the outage is displayed. The timer is set to where it was at the time the [Next] button was last clicked, so the examinee does not lose any time.
Q. Do I get a choice whether to take a test on computer or on paper?
A. No. DLPT5s do not have paper versions. As with earlier generations of DLPTs, once tests have been converted to computer delivery, the paper versions are retired from use.
Q. Do I have to go to a test center to take the test?
A. Yes. There are required software, hardware, and test security specifications that prevent tests from being given on private computers.
Q. Do I have to take the listening and reading tests back to back, on the same day?
A. No. The listening and reading tests are considered separate tests and can be administered completely separately.
Q. How much time is allowed for each test, and how was that amount of time determined?
A. Each test is allotted three hours. DLIFLC conducted several timing studies for both multiple-choice and constructed-response tests to see how long it took examinees to answer; based on that, we set initial test times, which we then refined after additional studies. The DLPT5 is designed so that examinees have enough time to answer all the questions, regardless of whether it’s a multiple-choice or constructed-response test. It is expected, however, that examinees will use good time management; examinees who try to translate entire passages in order to answer a constructed-response question will run out of time.
A 15-minute break is programmed into each test at approximately the halfway point; this 15 minutes does not count toward the three-hour time limit. Time spent reading the introductory screens and sample passages also does not count toward the three-hour time limit.
Note on older DLPTs converted to Web delivery: Even though these tests are scheduled for less time in their paper versions, examinees get the same three-hour time limit as for the DLPT5s. Examinees who finish early may leave the testing room.
Q. Is note-taking allowed?
A. Note-taking on paper is never allowed. On constructed-response tests, examinees type responses in text boxes on the computer. If they wish to type notes in these boxes before typing in their answers, they are free to do so.
Q. Is there a chance to go over one’s errors on the DLPT5?
A. As with all DLPTs, the DLPT5 does not allow examinees to see which questions they got right and which ones they got wrong. The reason for this is that the DLPTs are designed to test general proficiency, not specific learning. If examinees were allowed to know the content of the test, they would study for that specific content, and the test would no longer assess general proficiency.
Q. Can I go back and change answers? Can I review answers at the end of the test?
A. Yes. While you are still on a passage screen, you can modify your responses however you like (clicking different answer choice buttons, or deleting and adding text in the constructed-response text boxes). At the end of the test, a review screen allows you to go back to any passage and check, add, or change your answers. Note that this function does not allow you to hear the audio again on the listening test, although the reading passages remain visible on the reading test. Additionally, the reading test has a “Back” button that you can use in the middle of the test to return to a previous passage.
Q. What process is in place to make sure service members’ scores are included in their service record? Can test administrators log in and get scores from that web based system?
A. The process for getting scores into service members’ service records is the same as for earlier generations of the DLPT: scores are typically entered on a DA Form 330 and then input into the personnel database. Test administrators and Test Control Officers can get scores for linguists in the relevant branch directly from the DLPT5 Web-based system, even if the service record has not yet been updated.
Q. Have we done enough validation to trust this test?
A. All DLPT5 passages and questions are reviewed by several experts in testing and by experts in the ILR proficiency scale. In addition, for languages with large linguist populations, such as Russian, we administer multiple-choice items to a large number of examinees at varying levels of proficiency; we then analyze the response data and remove from the pool any questions that are not functioning appropriately. For constructed-response tests, which serve small linguist populations, this level of analysis is not possible, but these tests go through the same rigorous review process as the multiple-choice tests. The Defense Language Testing Advisory Project (DELTAP), a group of nationally renowned psychometricians and testing experts, has reviewed our procedures and endorsed them.
Q. How do you know that the scores won’t be affected by the change to the delivery platform?
A. Before implementing the test, DLIFLC conducted a comparability study in which examinees took one form of the Russian test on paper and another form on the computer. The study showed no significant differences in scores. As an additional check, we are monitoring the statistical information on the operational tests and will recalibrate if necessary.
Q. Does the DLPT5 address the problem of testing the memory rather than the listening skill?
A. The DLPT5 listening comprehension tests are designed to test general proficiency, not memory. The ability to process spoken language automatically, however, is an element of proficiency: an examinee whose listening proficiency is level 1 will probably be able to understand many words in a level 2 passage, but will not be able to process what he/she hears quickly enough to remember all the words and put them together in real time. A level 2 examinee will have the ability to do this. Although the passages are longer than for earlier generations of DLPTs, examinees may answer while the audio is still playing, and the questions (and answer choices, for multiple-choice tests) are visible the whole time. In addition, our statistical analysis shows that the questions are functioning as they should: a question that tested memory more than language proficiency would most likely have poor statistics and be removed from the pool before the test is actually assembled.
Q. How can a CRT score be equivalent to a MC score?
A. All DLPT5s are written according to the same standard, the Interagency Language Roundtable (ILR) Proficiency Skill Level Descriptions. All DLPT5s undergo the same review processes to ensure that passages and questions are at the level for which they are intended. The ways of translating what test-takers need to do on the test into an ILR score are different, but because both are valid ways of measuring ILR proficiency, both produce scores that can be relied on.