If you have any suggestions for additions to the lists below, please reach out to the organizers.
Publications
2022
- Peter Henderson, Mark S. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho: Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset. arXiv 2022
- Yixiao Ma, Qingyao Ai, Yueyue Wu, Yunqiu Shao, Yiqun Liu, Min Zhang, Shaoping Ma: Incorporating Retrieval Information into the Truncation of Ranking Lists for Better Legal Search. SIGIR 2022
- Weijie Yu, Zhongxiang Sun, Jun Xu, Zhenhua Dong, Xu Chen, Hongteng Xu, Ji-Rong Wen: Explainable Legal Case Matching via Inverse Optimal Transport-based Rationale Extraction. SIGIR 2022
- Yi Feng, Chuanyi Li, Vincent Ng: Legal Judgment Prediction via Event Extraction with Constraints. ACL 2022
- Antoine Louis, Gerasimos Spanakis: A Statutory Article Retrieval Dataset in French. ACL 2022
- Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, Nikolaos Aletras: LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. ACL 2022
- Arnav Kapoor, Mudit Dhawan, Anmol Goel, Arjun T H, Akshala Bhatnagar, Vibhu Agrawal, Amul Agrawal, Arnab Bhattacharya, Ponnurangam Kumaraguru, Ashutosh Modi: HLDC: Hindi Legal Documents Corpus. ACL Findings 2022
- Feng Yao, Chaojun Xiao, Xiaozhi Wang, Zhiyuan Liu, Lei Hou, Cunchao Tu, Juanzi Li, Yun Liu, Weixing Shen, Maosong Sun: LEVEN: A Large-Scale Chinese Legal Event Detection Dataset. ACL Findings 2022
- Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, Suzan Verberne, Allan Hanbury: PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval. ECIR 2022
2021
- Ilias Chalkidis, Manos Fergadiotis, Ion Androutsopoulos: MultiEURLEX -- A multi-lingual and multi-label legal document classification dataset for zero-shot cross-lingual transfer. EMNLP 2021
- Vijit Malik, Rishabh Sanjay, Shubham Kumar Nigam, Kripabandhu Ghosh, Shouvik Kumar Guha, Arnab Bhattacharya, Ashutosh Modi: ILDC for CJPE: Indian Legal Documents Corpus for Court Judgment Prediction and Explanation. ACL 2021
- Abhilasha Ravichander, Alan W Black, Thomas Norton, Shomir Wilson, Norman Sadeh: Breaking Down Walls of Text: How Can NLP Benefit Consumer Privacy? ACL 2021
- Mukund Srinath, Shomir Wilson, C Lee Giles: Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies. ACL 2021
- Proceedings of the 8th Competition on Legal Information Extraction and Entailment (COLIEE-2021)
- Josef Valvoda, Tiago Pimentel, Niklas Stoehr, Ryan Cotterell, Simone Teufel: What About the Precedent: An Information-Theoretic Analysis of Common Law. NAACL 2021
- Ilias Chalkidis, Manos Fergadiotis, Dimitrios Tsarapatsanis, Nikolaos Aletras, Ion Androutsopoulos, Prodromos Malakasiotis: Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases. NAACL 2021
- Dan Hendrycks, Collin Burns, Anya Chen, Spencer Ball: CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
- Lucia Zheng, Neel Guha, Brandon R. Anderson, Peter Henderson, Daniel E. Ho: When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset of 53,000+ Legal Holdings
- Julian Nyarko: Stickiness and Incomplete Contracts
- Noam Kolt: Predicting Consumer Contracts
- David A. Hoffman, Anton Strezhnev: Leases as Forms
2020
- NLLP 2020 Proceedings
- Andrea Galassi, Kasper Drazewski, Marco Lippi, Paolo Torroni: Cross-lingual Annotation Projection in Legal Texts. COLING 2020
- Phi Manh Kien, Ha-Thanh Nguyen, Ngo Xuan Bach, Vu Tran, Minh Le Nguyen, Tu Minh Phuong: Answering Legal Questions by Learning Neural Attentive Text Representation. COLING 2020
- Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi, Lusheng Wang: Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism. COLING 2020
- Yanguang Chen, Yuanyuan Sun, Zhihao Yang, Hongfei Lin: Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement. COLING 2020
- Prakash Poudyal, Jaromir Savelka, Aagje Ieven, Marie Francine Moens, Teresa Goncalves, Paulo Quaresma: ECHR: Legal Corpus for Argument Mining. COLING 2020, Workshop on Argument Mining
- Yiquan Wu, Kun Kuang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Jun Xiao, Yueting Zhuang, Luo Si, Fei Wu: De-Biased Court’s View Generation with Causality. EMNLP 2020
- Ilias Chalkidis, Manos Fergadiotis, Prodromos Malakasiotis, Nikolaos Aletras, Ion Androutsopoulos: LEGAL-BERT: The Muppets straight out of Law School. Findings of EMNLP 2020
- Łukasz Borchmann, Dawid Wisniewski, Andrzej Gretkowski, Izabela Kosmala, Dawid Jurkiewicz, Łukasz Szałkiewicz, Gabriela Pałka, Karol Kaczmarek, Agnieszka Kaliska, Filip Graliński: Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines. Findings of EMNLP 2020
- Haoxi Zhong, Chaojun Xiao, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun: JEC-QA: A Legal-Domain Question Answering Dataset. AAAI 2020
- Haoxi Zhong, Yuzhong Wang, Cunchao Tu, Tianyang Zhang, Zhiyuan Liu, Maosong Sun: Iteratively Questioning and Answering for Interpretable Legal Judgment Prediction. AAAI 2020
- Nuo Xu, Pinghui Wang, Long Chen, Li Pan, Xiaoyan Wang, Junzhou Zhao: Distinguish Confusing Law Articles for Legal Judgment Prediction. ACL 2020
- Elliott Ash, Sam Asher, Aditi Bhowmick, Daniel Chen, Tanaya Devi, Christoph Goessmann, Paul Novosad, Bilal Siddiqi: Measuring Gender and Religious Bias in the Indian Judiciary
2019
- NLLP 2019 Proceedings
- Abhilasha Ravichander, Alan W Black, Shomir Wilson, Thomas Norton, Norman Sadeh: Question Answering for Privacy Policies: Combining Computational and Legal Perspectives. EMNLP 2019
- Paheli Bhattacharya, Kaustubh Hiware, Subham Rajgaria, Nilay Pochhi, Kripabandhu Ghosh, Saptarshi Ghosh: A Comparative Study of Summarization Algorithms Applied to Legal Case Judgments. ECIR 2019
2018
- Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Chaojun Xiao, Zhiyuan Liu, Maosong Sun: Legal Judgment Prediction via Topological Learning. EMNLP 2018
Data sets
- CaseHOLD Data set
- Open Legal Data project (German legal data)
- EURLEX57K dataset [Chalkidis et al, 2019]
- Datasets from Lynx project
- Contract Discovery corpus [Borchmann et al, 2020]
- Named Entity Recognition (NER) data set for German [Leitner et al, 2020]
- Cases from the European Court of Human Rights and their outcomes [Chalkidis et al, 2019]
- Cases from the European Court of Human Rights and their outcomes (smaller data set) [Aletras et al, 2016]
- The Free Law Project
- US Case Law data and API access (Caselaw Access Project)
- Exchanges between speakers in U.S. Supreme Court Oral Argument
- Corpus of US Supreme Court Opinions (BYU)
- The Supreme Court Database (Washington University)
- US Supreme Court Cases (Oyez)
- Proceedings of the Old Bailey [Huber, 2007]
- Financial Statements and Notes data (Edgar filings 2009-2018)
- US Patent Citations Data
- UK Parliamentary debates
- UK Parliamentary debates (code and data) [Odell, 2017]
- European Parliament Proceedings 1996-2011 (Europarl)
- Canadian Parliament Proceedings
- Polish Parliamentary Corpus 1919-2018 [Ogrodniczuk, 2018]
- Website privacy policies annotated for data practices (Usable Privacy Policy Project)
- US Congressional Bill Corpus 1993-2010 [Yano et al, 2012]
- Securities Class Action Clearinghouse (Stanford Law)
Models
- Legal BERT [Chalkidis et al, 2020]
- Legal BERT [Zheng et al, 2021 - using Harvard Law case corpus]
- NER Models for legal entities in German [Leitner et al, 2020]
- Legal GPT-1 and GPT-2 [Borchmann et al, 2020]
Related events and workshops
- Online Workshop on the Computational Analysis of Law (OWCAL 2021)
- Eighth Competition on Legal Information Extraction and Entailment (COLIEE 2021)
- Artificial Intelligence for Legal Assistance Shared Tasks 2021
- International Conference on Artificial Intelligence and Law 2021 (ICAIL 2021)
- NLLP 2019 Workshop (NAACL 2019)
- AI4LEGAL Workshop (ISWC 2019)
- International Conference on Legal Knowledge and Information Systems 2018 (JURIX 2018)
- Workshop on Language Resources and Technologies for the Legal Knowledge Graph (LREC 2018)
- Workshop on Automated Semantic Analysis of Information in Legal Texts (ASAIL 2017)
- Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts (ICAIL 2015)
- TREC Legal Track
- Artificial Intelligence and the Law - Springer