The University of Pittsburgh English Language Institute Corpus
PELIC Dataset repository on GitHub
Contact: Dr. Na-Rae Han (naraehan@pitt.edu)
Welcome to the homepage for the University of Pittsburgh English Language Institute Corpus (PELIC). Here you will find information about the people involved with PELIC and the research using these data.
If you are interested in learning more about PELIC or downloading the dataset, please visit the PELIC-dataset GitHub repository, which contains the publicly available data, a detailed description of the corpus, frequency statistics, and a set of lexical tools and tutorials.
The University of Pittsburgh English Language Institute Corpus (PELIC) is a large learner corpus of written and spoken texts. These texts were collected in an English for Academic Purposes (EAP) context over seven years in the University of Pittsburgh’s Intensive English Program, and were produced by students with a wide range of linguistic backgrounds and proficiency levels. PELIC is longitudinal, offering greater opportunities for tracking development in a natural classroom setting.
This webpage provides information about the people involved with and the research resulting from PELIC. Where possible, PDFs of the research are provided.
First and foremost, we wish to thank and acknowledge the students of the ELI for their spoken and written contributions to PELIC, and for graciously allowing us to publicly share this data. We hope that the research stemming from their work will improve the quality of learning and teaching for other students in similar contexts who are striving to attain academic readiness.
We also wish to thank the teachers and administrators of the ELI for their assistance in diligently collecting the data from the students.
The ELI Data Mining Group is a research group in the Department of Linguistics at the University of Pittsburgh. Its focus is on applying computational methods to the PELIC dataset.
We would like to thank the National Science Foundation for its grant via the Pittsburgh Science of Learning Center, award number SBE-0836012 (previously NSF award number SBE-0354420).
Numerous people have been involved in compiling and creating PELIC, assisting with data collection, coding, and other tasks. In particular, we wish to acknowledge the following programmers for their important roles: Ben Madore, Shanwen Yu, and Michael Nugent.
Naismith, B., Juffs, A., Han, N.-R., & Zheng, D. (2022). Handle it in-house? Learner corpora frequency lists and lexical sophistication. International Journal of Corpus Linguistics.
Naismith, B., Han, N.-R., & Juffs, A. (2022). The University of Pittsburgh English Language Institute Corpus (PELIC). International Journal of Learner Corpus Research, 8(1), pp. 121-138.
Naismith, B., & Juffs, A. (2021). Finding the sweet spot: Learners’ productive knowledge of mid-frequency lexical items. Language Teaching Research.
Vercellotti, M. L., Juffs, A., & Naismith, B. (2021). Multiword sequences in L2 English language learners’ speech: The relationship between trigrams and lexical variety across development. System.
Juffs, A. (2020). Aspects of Language Development in an Intensive English Program. New York: Routledge. Hardback: 9781138048362.
Naismith, B., & Kanwit, M. (2020). A Corpus Study of the English Suffixes -ness and -acy: Productivity, Genre, and Implications for L2 Learning. Canadian Journal of Applied Linguistics, 24(1), pp. 115-137.
Juffs, A. (2019). Lexical development in the writing of English Language Program Students. In R. M. DeKeyser & G. P. Botana (Eds.), Reconciling methodological demands with pedagogical applicability (pp. 179-200). Amsterdam: John Benjamins.
Naismith, B. (2019). Lexical Sophistication Measurements: Applications in Teaching and Assessment. TESOL 2019 International Convention, March 13, 2019, Atlanta, GA.
Juffs, A., & Han, N.-R. (2019). Combining Formal and Usage-Based Theories with Data Science Techniques in Measuring the Development of Syntactic Complexity in Written Production. American Association of Applied Linguistics, International Conference. March 12, 2019, Atlanta, GA.
Naismith, B., Han, N.-R., Juffs, A., Hill, B. L., & Zheng, D. (2018). Accurate Measurement of Lexical Sophistication with Reference to ESL Learner Data. Educational Data Mining Conference 2018, Buffalo, NY. July 15-18, 2018.
Juffs, A. (2017). Moving generative SLA from knowledge of constraints to production data in educational settings. Journal of the Japanese Second Language Association, 16, 19-38.
Vercellotti, M. L. (2017). The development of complexity, accuracy and fluency in second language performance. Applied Linguistics, 38, 90-111.
Vercellotti, M. L., & Packer, J. (2016). Shifting structural complexity: The production of clause types in speeches given by English for academic purposes students. Journal of English for Academic Purposes, 22, 179-190.
Li, N., & Juffs, A. (2015). The influence of moraic structure on English L2 syllable final consonants. 2014 Annual Meeting on Phonology.
McCormick, D. E., & Vercellotti, M. L. (2013). Examining the Impact of Self-Correction Notes on Grammatical Accuracy in Speaking. TESOL Quarterly, 47(2), 410-420.
Spinner, P. (2011). Second language assessment and morphosyntactic development. Studies in Second Language Acquisition, 33, 529-561.
In addition to publications based on PELIC data, a number of studies and datasets have been published based on other data from this same population, i.e., students at the ELI:
Eskenazi, M., & Juffs, A. (2012). Information retrieval for reading tutors. The Encyclopedia of Applied Linguistics.
Heilman, M., Collins-Thompson, K., Callan, J., Eskenazi, M., Juffs, A., & Wilson, L. (2010). Personalization of reading passages improves vocabulary acquisition. International Journal of Artificial Intelligence in Education, 20(1), 73-98.
Heilman, M., Juffs, A., & Eskenazi, M. (2007). Choosing Reading Passages for Vocabulary Learning by Topic to Increase Intrinsic Motivation. Frontiers in Artificial Intelligence and Applications, 157, 566-568.
Juffs, A., & Friedline, B. E. (2014). Sociocultural influences on the use of a web-based tool for learning English vocabulary. System, 42(2), 137-166.
PELIC data can also be found in the following locations: