About NLLP

Information about the NLLP community


The NLLP community aims to bring together researchers and practitioners from Natural Language Processing (NLP) and the legal domain who develop NLP methods and applications for legal text and text with legal significance.

Legal text has distinct characteristics, such as specialised vocabulary, particularly formal syntax and domain-specific semantics, to the extent that legal language is often classified as a 'sublanguage'. This makes it challenging for generic NLP tools to work accurately.

We consider legal text to include not only litigation-related corpora such as dockets, opinions and court transcripts, but also corpora based on patents, briefs, public financial filings, civil code, local ordinances, privacy policies, law enforcement records, congressional records and speeches.

Developing novel NLP methods for legal text holds the promise of transforming the legal services sector, with the US Legal Services market alone valued at $211 billion according to US government price indices.

At the same time, the Internet is full of text with legal significance (e.g. advertising language, dark patterns), and this text has gained importance as national and supranational regulators pave the way to a new market focused on public interest technology such as consumer forensics.

The interest of public authorities in developing market monitoring tools is at an all-time high, yet relevant multidisciplinary research expertise remains scattered.


As electronic information becomes increasingly available around the world, automated tools for processing that information have grown apace. These tools can be especially effective and time-saving on text, where information can be distilled in interesting ways through auto-summarization, named-entity extraction, machine translation, sentiment analysis, topic classification and other techniques. As a result, natural language processing (NLP) applications are popular in important commercial contexts such as finance and healthcare.

The legal domain, however, is still largely underrepresented in the NLP literature despite its enormous potential for generating interesting research problems on a par with other important commercial areas.

In the past, limited access to legal texts, in the US in particular, prevented some researchers from working on legal NLP problems. Over the last few years, however, more legal corpora have come online at low or no cost, including the BYU Corpus, the Free Law Project and the expansion of resources published by the Library of Congress through Law.gov. A growing variety of electronic legal resources is already available free of charge for countries in Europe and Asia. Thus, we feel that the timing is excellent to bring together researchers from around the world to focus on NLP problems in this area.


The NLLP workshop was held for the first time in 2019, collocated with the Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL). The first edition consisted of 12 original papers, 9 of which are archived in the workshop proceedings.

Building on the success of the first edition, the second edition of the NLLP workshop was held online and collocated with KDD 2020.

In 2021, the NLLP monthly talks series was started to provide an open transdisciplinary forum for discussing the most recent research on topics of relevance to the community.

The NLLP 2021 workshop will be collocated with EMNLP and will take place in a hybrid format, allowing both fully virtual participation and in-person attendance in Punta Cana, Dominican Republic.