If you have any suggestions for additions to the lists below, please reach out to the organizers.
Publications
2023
- Ivan Habernal, Daniel Faber, Nicola Recchia, Sebastian Bretthauer, Iryna Gurevych, Indra Spiecker genannt Döhmann, Christoph Burchard: Mining Legal Arguments in Court Decisions. Artificial Intelligence and Law 2023
- Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho: MultiLegalPile: A 689GB Multilingual Legal Corpus. DMLR@ICLR 2023
- Neel Guha, Julian Nyarko, Daniel E. Ho, Christopher Ré, Joel Niklaus, Megan Ma, Michael Livermore, Peter Henderson, Sean Rehaag, Sharad Goel, Shang Gao, Spencer Williams, Sunny Gandhi, Tom Zur, Varun Iyer, Zehua Li: LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Ilias Chalkidis*, Nicolas Garneau*, Catalina Goanta, Daniel Katz, Anders Søgaard: LeXFiles and LegalLAMA: Facilitating English Multinational Legal Language Model Development. ACL 2023
2022
- Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho: Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset. arXiv 2022
- Yixiao Ma, Qingyao Ai, Yueyue Wu, Yunqiu Shao, Yiqun Liu, Min Zhang, Shaoping Ma: Incorporating Retrieval Information into the Truncation of Ranking Lists for Better Legal Search. SIGIR 2022
- Weijie Yu, Zhongxiang Sun, Jun Xu, Zhenhua Dong, Xu Chen, Hongteng Xu, Ji-Rong Wen: Explainable Legal Case Matching via Inverse Optimal Transport-based Rationale Extraction. SIGIR 2022
- Yi Feng, Chuanyi Li, Vincent Ng: Legal Judgment Prediction via Event Extraction with Constraints. ACL 2022
- Antoine Louis, Gerasimos Spanakis: A Statutory Article Retrieval Dataset in French. ACL 2022
- Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras: LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. ACL 2022
- Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Schwemer, Anders Søgaard: FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing. ACL 2022
- Arnav Kapoor, Mudit Dhawan, Anmol Goel, Arjun T H, Akshala Bhatnagar, Vibhu Agrawal, Amul Agrawal, Arnab Bhattacharya, Ponnurangam Kumaraguru, Ashutosh Modi: HLDC: Hindi Legal Documents Corpus. ACL Findings 2022
- Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun: LEVEN: A Large-Scale Chinese Legal Event Detection Dataset. ACL Findings 2022
- Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury: PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval. ECIR 2022
2021
- Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos: MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. EMNLP 2021
- Vijit Malik, Rishabh Sanjay, Shubham Kumar Nigam, Kripabandhu Ghosh, Shouvik Kumar Guha, Arnab Bhattacharya, Ashutosh Modi: ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation. ACL 2021
- Abhilasha Ravichander, Alan W Black, Thomas Norton, Shomir Wilson, Norman Sadeh: Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy? ACL 2021
- Mukund Srinath, Shomir Wilson, C Lee Giles: Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. ACL 2021
- Proceedings of the 8th Competition on Legal Information Extraction and Entailment (COLIEE-2021)
- Josef Valvoda, Tiago Pimentel, Niklas Stoehr, Ryan Cotterell, Simone Teufel: What About the Precedent: An Information-Theoretic Analysis of Common Law. NAACL 2021
- Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis: Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases. NAACL 2021
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
- Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, Daniel E. Ho: When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings
- Julian Nyarko: Stickiness and Incomplete Contracts
- Noam Kolt: Predicting Consumer Contracts
- David A. Hoffman, Anton Strezhnev: Leases as Forms
2020
- NLLP 2020 Proceedings
- Andrea Galassi, Kasper Drazewski, Marco Lippi, Paolo Torroni: Cross-lingual Annotation Projection in Legal Texts. COLING 2020
- Phi Manh Kien, Ha-Thanh Nguyen, Ngo Xuan Bach, Vu Tran, Minh Le Nguyen, Tu Minh Phuong: Answering Legal Questions by Learning Neural Attentive Text Representation. COLING 2020
- Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi, Lusheng Wang: Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism. COLING 2020
- Yanguang Chen, Yuanyuan Sun, Zhihao Yang, Hongfei Lin: Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement. COLING 2020
- Prakash Poudyal, Jaromir Savelka, Aagje Ieven, Marie Francine Moens, Teresa Goncalves, Paulo Quaresma: ECHR: Legal Corpus for Argument Mining. COLING 2020, Workshop on Argument Mining
- Yiquan Wu, Kun Kuang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Jun Xiao, Yueting Zhuang, Luo Si, Fei Wu: De-Biased Court’s View Generation with Causality. EMNLP 2020
- Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos: LEGAL-BERT: The Muppets straight out of Law School. Findings of EMNLP 2020
- Łukasz Borchmann, Dawid Wisniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński: Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines. Findings of EMNLP 2020
- Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun: JEC-QA: A Legal-Domain Question Answering Dataset. AAAI 2020
- Haoxi Zhong, Yuzhong Wang, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun: Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction. AAAI 2020
- Nuo Xu, Pinghui Wang, Long Chen, Li Pan, Xiaoyan Wang, Junzhou Zhao: Distinguish Confusing Law Articles for Legal Judgment Prediction. ACL 2020
- Elliott Ash, Sam Asher, Aditi Bhowmick, Daniel Chen, Tanaya Devi, Christoph Goessmann, Paul Novosad, Bilal Siddiqi: Measuring Gender and Religious Bias in the Indian Judiciary
2019
- NLLP 2019 Proceedings
- Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh: Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. EMNLP 2019
- Paheli Bhattacharya, Kaustubh Hiware, Subham Rajgaria, Nilay Pochhi, Kripabandhu Ghosh, Saptarshi Ghosh: A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments. ECIR 2019
2018
- Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Chaojun Xiao, Zhiyuan Liu, Maosong Sun: Legal Judgment Prediction via Topological Learning. EMNLP 2018
Data sets
- CaseHOLD data set
- Open Legal Data project (German legal data)
- EURLEX57K dataset [Chalkidis et al, 2019]
- Datasets from Lynx project
- Contract Discovery corpus [Borchmann et al, 2020]
- Named Entity Recognition (NER) data set for German [Leitner et al, 2020]
- Cases from the European Court of Human Rights and their outcomes [Chalkidis et al, 2019]
- Cases from the European Court of Human Rights and their outcomes (smaller data set) [Aletras et al, 2016]
- The Free Law Project
- US Case Law data and API access (Caselaw Access Project)
- Exchanges between speakers in U.S. Supreme Court Oral Argument
- Corpus of US Supreme Court Opinions (BYU)
- The Supreme Court Database (Washington University)
- US Supreme Court Cases (Oyez)
- Proceedings of the Old Bailey [Huber, 2007]
- Financial Statements and Notes data (Edgar filings 2009-2018)
- US Patent Citations Data
- UK Parliamentary debates
- UK Parliamentary debates (code and data) [Odell, 2017]
- European Parliament Proceedings 1996-2011 (Europarl)
- Canadian Parliament Proceedings
- Polish Parliamentary Corpus 1919-2018 [Ogrodniczuk, 2018]
- Website privacy policies annotated for data practices (Usable Privacy Policy Project)
- US Congressional Bill Corpus 1993-2010 [Yano et al, 2012]
- Securities Class Action Clearinghouse (Stanford Law)
- FairLex: Multilingual Legal Fairness Benchmark [Chalkidis et al, 2022]
- LexFiles: Multinational Legal Corpora [Chalkidis*, Garneau* et al, 2023]
- LegalLAMA: Legal NLU Benchmark [Chalkidis*, Garneau* et al, 2023]
- MultiLegalPile: Multilingual Legal Corpora [Niklaus et al, 2023]
- LegalBench: Legal Benchmark for LLMs [Guha et al, 2023]
Models
- Legal BERT [Chalkidis et al, 2020]
- CaseLaw BERT [Zheng et al, 2021 - using Harvard Law case corpus]
- NER Models for legal entities in German [Leitner et al, 2020]
- Legal GPT-1 and GPT-2 [Borchmann et al, 2020]
- Legal RoBERTa and Longformer [Chalkidis*, Garneau* et al, 2023]
- Legal XLM-R [Niklaus et al, 2023]
Related events and workshops
- Online Workshop on the Computational Analysis of Law (OWCAL 2021)
- Eighth Competition on Legal Information Extraction and Entailment (COLIEE 2021)
- Artificial Intelligence for Legal Assistance Shared Tasks 2021
- International Conference on Artificial Intelligence and Law 2021 (ICAIL 2021)
- NLLP 2019 Workshop (NAACL 2019)
- AI4LEGAL Workshop (ISWC 2019)
- International Conference on Legal Knowledge and Information Systems 2018 (JURIX 2018)
- Workshop on Language Resources and Technologies for the Legal Knowledge Graph (@LREC 2018)
- Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL 2017)
- Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts (ICAIL 2015)
- TREC Legal Track
- Artificial Intelligence and Law - Springer