Charities Speak

Using data science to map the vibrant and diverse arts and cultural charitable sector in England and Wales

31st March 2020

Charities are core to arts and culture in the UK

Why do charities matter?

The annual income of the entire charities sector in England and Wales was an estimated £113.1 billion, making it an important part of the economy. Almost 3% of the total UK workforce work in the voluntary sector and over one in five people volunteer at least once a month.

In particular, charitable organisations are a key part of arts and culture in the UK. Many arts and cultural venues, groups, and institutions are organised on a charitable basis.

Goals of our research

In official charities classifications, there is an umbrella term called ‘arts, culture, heritage, or science’ (ACHS), with roots in legislation, but this term does not break down any further, so it can be difficult to understand what it covers.

Currently, at least 30,000 charities in England and Wales are actively advancing ACHS. Yet, we only have a very high-level understanding of what they do, who they engage with, or why.

Anecdotally, we know that charities engage with many types of people and interests. Their aims range from general goals, like promoting artistic excellence, serving as a musical venue, or providing financial support, to more specific missions engaging particular audiences, like promoting dance among the elderly, promoting Chinese calligraphy, or acting as an LGBTQ+ choir. Thus far, official data and statistics on these organisations have not allowed us to find this more granular information.

Therefore, we did this research to:

  1. Produce a more detailed picture of the charities in the arts and cultural sector with automated techniques, going beyond existing static classifications.
  2. Better understand what charities advancing ACHS (say they) do and what they are trying to achieve.

Looking under the umbrella

Reading words programmatically gives us new, useful information

Using web-scraping, we collected the details of over 359,000 charities ever registered in England and Wales. We analysed the words used to describe their work when they registered (their ‘aims and activities’ and ‘charitable objects’).

Our research uses common techniques in computational linguistics to turn words into a mathematical representation (vectors). This allows us to parse large amounts of text and gain new insights into how we understand arts and cultural charities.
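To make the idea of ‘words as vectors’ concrete, here is a minimal sketch in Python. It is an illustration only: it uses simple word counts rather than the trained word embeddings used in the research, and the three charity descriptions are invented for the example.

```python
import math
import re
from collections import Counter


def to_vector(text):
    """Turn a description into a bag-of-words vector (word -> count)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_u * norm_v)


# Invented descriptions, not taken from the charity register.
choir = to_vector("To promote choral music through public concerts")
orchestra = to_vector("To promote orchestral music through public concerts")
shelter = to_vector("To provide shelter for homeless animals")

print(cosine_similarity(choir, orchestra))  # high: most words are shared
print(cosine_similarity(choir, shelter))    # low: only one word is shared
```

Word counts only capture exact word overlap. Word embeddings go further by placing related words (say, ‘choir’ and ‘orchestra’) close together even when the spelling differs.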

Compared to manual reading, the computational method has challenges. For example, a model trained on English text misses out on Welsh words, and models differ in how well they recognise context. But programmatic text parsing also has advantages, like speed, uncovering new perspectives, and prompting new questions. By using language from charities themselves, we also reflect a more ground-up view than existing classifications, which tend to be written from a top-down perspective.

A ‘taxonomy’ of arts and cultural charitable keywords

There are clusters of terms around types of arts and culture

There are clusters of terms about the types of arts and culture, like performing arts, music and sound, fine art, and cultural heritage.


For the technically inclined

Using natural language processing (NLP) models, we extracted the keywords used by ACHS charities with part-of-speech tagging. We converted the noun phrases to word embeddings, in which semantically similar phrases are closer to each other. The embeddings were successively clustered until they were represented in a four-tier structure, which we loosely call a ‘taxonomy’ for ease. (At the risk of being pedantic, it’s closer to being a hybrid folk taxonomy or ‘folksonomy’.)

In the visualisation, each cluster of circles is a group of noun phrases. If $A$ is the percentage of ACHS charities that used at least one term in the cluster to self-describe, and $B$ is the same metric for all charities (including non-ACHS), the size of each circle is simply $\frac{A}{B}$. This surfaces keywords that ACHS charities are more likely than charities in general to use to describe their activities and goals.

An advantage of using part-of-speech tagging is that the subclusters of terms can be automatically assigned ‘labels’ by prepending adjectives or appending nouns. We found 2,747 such auto-labelled groups of terms, and they mostly make grammatical sense!
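The circle-size ratio described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the code used in the research: the function names, the example descriptions and the cluster terms are all made up, and terms are matched with plain substring search rather than extracted noun phrases.

```python
def cluster_usage(descriptions, cluster_terms):
    """Percentage of descriptions containing at least one cluster term."""
    if not descriptions:
        return 0.0
    hits = sum(
        any(term in text.lower() for term in cluster_terms)
        for text in descriptions
    )
    return 100.0 * hits / len(descriptions)


def circle_size(achs_descriptions, all_descriptions, cluster_terms):
    """Ratio A / B: how over-represented a cluster is among ACHS charities."""
    a = cluster_usage(achs_descriptions, cluster_terms)  # A: ACHS charities
    b = cluster_usage(all_descriptions, cluster_terms)   # B: all charities
    return a / b if b else 0.0


# Invented data: three ACHS charities and three other charities.
achs = [
    "We run a community choir",
    "A youth theatre company",
    "Village history museum",
]
others = [
    "Food bank for local families",
    "Animal rescue centre",
    "Hospice care for adults",
]
performing_arts = ["choir", "theatre"]  # hypothetical cluster of terms

print(circle_size(achs, achs + others, performing_arts))
```

A ratio above 1 means the cluster is used more often by ACHS charities than by charities overall, and a ratio below 1 means the opposite.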


... around the demographics of people who charities engage with

Charities, when describing their aims and activities, often also include the groups of people that they are trying to benefit or engage with.

Our NLP analysis shows that there are clusters of terms relating to young people, the elderly, women, LGBTQ+, refugees, specific ethnic groups, etc. (The Charity Commission does not currently provide data on many of these groups.) This allows us to identify charities whose work likely involves engaging particular demographic groups.
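As a simplified illustration of how term clusters can flag charities that likely engage particular groups, the sketch below checks a description against a few hand-written term lists. The cluster names and terms here are hypothetical stand-ins. In the research, the clusters came out of the NLP analysis rather than being written by hand.

```python
import re

# Hypothetical clusters, written by hand for this example only.
DEMOGRAPHIC_CLUSTERS = {
    "young people": ["young people", "youth", "children"],
    "older people": ["elderly", "older people", "pensioners"],
    "refugees": ["refugees", "asylum seekers"],
}


def demographic_labels(description, clusters=DEMOGRAPHIC_CLUSTERS):
    """Return the clusters whose terms appear as whole words in the text."""
    text = description.lower()
    labels = []
    for label, terms in clusters.items():
        # \b keeps matches to whole words and phrases.
        pattern = r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b"
        if re.search(pattern, text):
            labels.append(label)
    return labels


print(demographic_labels("To promote dance among the elderly and young people"))
print(demographic_labels("To promote Chinese calligraphy"))
```

A match only suggests that a charity's work likely involves a group. A description can mention a group without the charity working with it, so results like these are best treated as a starting point for closer reading.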

... as well as domains like health, education, and the environment.

There are clusters of terms about adjacent domains, including health, religion, and miscellaneous descriptions of places or activities. Arts charities carry out a range of activities that are not solely focused on the arts. In fact, only 11% of active ACHS charities work on that purpose alone, while 40% of active ACHS charities work on one or two additional charitable purposes.

This adds to existing evidence about how the arts play an important part in addressing some of the biggest social issues of our time.

Identifying artforms, groups of communities and trends over time

Our analysis has generated a ‘folksonomy’ which summarises, at a high level, different types of activities in the sector, while reflecting the natural language from charities themselves. This can be usefully applied to provide interesting insights, including:

  • What demographics do arts and cultural charities engage with? Using what art forms?
  • How has the arts and cultural charitable sector changed from the 1960s till now?
  • How does the digital presence of these organisations vary?

These are all answered in greater depth in the ‘Charities Speak’ report. In the longer term, a data science approach as outlined can be applied to:

  • Build a recommendation engine to search for similar charities or charitable causes.
  • Evidence how well-addressed certain goals are by charities, or how crowded certain areas are, by linking to other data sources like funding.
  • Make the creation and maintenance of taxonomies of sector activity easier to help improve understanding of what the sector is doing.

The banner image at the top is ‘Verblist’ (1967–68) by Richard Serra with image distortion added. My approach to separate verb and noun phrases was partially inspired by this artwork!

To read the full analysis and more details about this work, visit https://www.pec.ac.uk/research-reports/charities-speak. Data collection, analysis and visualisation by the author.