The annual income of the entire charities sector in England and Wales was an estimated £113.1 billion, making it an important part of the economy. Almost 3% of the total UK workforce work in the voluntary sector and over one in five people volunteer at least once a month.
In particular, charitable organisations are a key part of arts and culture in the UK. Many arts and cultural venues, groups, and institutions are organised on a charitable basis.
In official charities classifications, there is an umbrella term called ‘arts, culture, heritage, or science’ (ACHS), with roots in legislation, but this term does not break down any further, so it can be difficult to understand what it covers.
Currently, at least 30,000 charities in England and Wales are actively advancing ACHS. Yet, we only have a very high-level understanding of what they do, who they engage with, or why.
Anecdotally, we may know of charities that engage with various types of people and interests. Their goals range from the general, like promoting artistic excellence, serving as a musical venue, or providing financial support, to more specific missions aimed at particular audiences, like promoting dance among the elderly, promoting Chinese calligraphy, or acting as an LGBTQ+ choir. Thus far, official data and statistics on these organisations have not allowed us to find more granular information.
Therefore, we did this research to:
Using web-scraping, we collected the details of over 359,000 charities ever registered in England and Wales. We analysed the words used to describe their work when they registered (their ‘aims and activities’ and ‘charitable objects’).
Our research uses common techniques in computational linguistics to turn words into a mathematical representation (vectors). This allows us to parse large amounts of text and gain new insights into how we understand arts and cultural charities.
Compared to manual reading, the computational method has challenges. For example, a model trained on English text misses Welsh words, and models differ in how well they recognise context. But programmatic text parsing also has advantages, like speed, uncovering new perspectives, and prompting new questions. By using language from charities themselves, we also reflect a more ground-up view than existing classifications, which tend to be written from a top-down perspective.
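To make the idea of turning words into vectors concrete, here is a minimal sketch. It uses plain word counts (a bag-of-words representation) rather than the learned embeddings used in the research, and the three descriptions are invented for illustration. The function names `tokenize`, `to_vector` and `cosine_similarity` are hypothetical helpers, not part of the original analysis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of the letters a-z as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text, vocabulary):
    """Represent a text as word counts over a fixed, ordered vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical descriptions, invented for illustration only.
descriptions = {
    "choir": "To advance choral music through public concerts",
    "orchestra": "To advance orchestral music through public concerts",
    "foodbank": "To relieve poverty by providing food parcels",
}

# Build one shared vocabulary so every text maps to a vector of equal length.
vocabulary = sorted(
    {tok for text in descriptions.values() for tok in tokenize(text)}
)
vectors = {name: to_vector(text, vocabulary) for name, text in descriptions.items()}

sim_music = cosine_similarity(vectors["choir"], vectors["orchestra"])
sim_other = cosine_similarity(vectors["choir"], vectors["foodbank"])
print(f"choir vs orchestra: {sim_music:.3f}")
print(f"choir vs foodbank:  {sim_other:.3f}")
```

The two music charities share most of their words, so their vectors point in a similar direction and score a higher similarity than the choir and the food bank. Learned embeddings go further by placing related words (such as "choral" and "orchestral") near each other even when the exact words differ.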
There are clusters of terms about the types of arts and culture, like performing arts, music and sound, fine art, and cultural heritage.
Using natural language processing (NLP) models, we extracted the keywords used by ACHS charities with part-of-speech tagging. We then converted the noun phrases to word embeddings, in which semantically similar phrases sit closer to each other. The embeddings were successively clustered until they formed a four-tier structure, which we loosely call a ‘taxonomy’ for ease. (At the risk of being pedantic, it’s closer to a hybrid folk taxonomy, or ‘folksonomy’.)
In the visualisation, each cluster of circles is a group of noun phrases. The size of each circle is based on two percentages: the percentage of ACHS charities that used at least one term in the cluster to describe themselves, and the same metric for all charities (including non-ACHS). This surfaces keywords that ACHS charities are more likely than charities in general to use when describing their activities and goals.
An advantage of using part-of-speech tagging is that the subclusters of terms can be automatically assigned ‘labels’ by prepending adjectives or appending nouns. We found 2,747 such auto-labelled groups of terms, and they mostly make grammatical sense!
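The two percentages behind each circle can be sketched as follows. This is an illustration under stated assumptions, not the code used in the research: the charity records and the term cluster are invented, `share_using_cluster` is a hypothetical helper, and terms are matched with a crude substring test rather than the noun-phrase extraction described above. The sketch stops at the two percentages and does not attempt to reproduce how they are combined into a circle size.

```python
def share_using_cluster(texts, cluster_terms):
    """Percentage of texts that contain at least one term from the cluster."""
    if not texts:
        return 0.0
    hits = sum(
        1
        for text in texts
        if any(term in text.lower() for term in cluster_terms)
    )
    return 100.0 * hits / len(texts)


# Hypothetical records, invented for illustration only.
charities = [
    {"text": "Promoting choral music and opera", "achs": True},
    {"text": "A community choir for older people", "achs": True},
    {"text": "Preserving a local museum collection", "achs": True},
    {"text": "Running a youth theatre company", "achs": True},
    {"text": "Music therapy for hospital patients", "achs": False},
    {"text": "Providing food parcels to families", "achs": False},
    {"text": "Supporting a village primary school", "achs": False},
    {"text": "Rescuing and rehoming animals", "achs": False},
]

# A hypothetical cluster of related terms (lowercase).
music_cluster = {"music", "opera", "choir"}

achs_texts = [c["text"] for c in charities if c["achs"]]
all_texts = [c["text"] for c in charities]

share_achs = share_using_cluster(achs_texts, music_cluster)
share_all = share_using_cluster(all_texts, music_cluster)
print(f"ACHS charities using the cluster: {share_achs:.1f}%")
print(f"All charities using the cluster:  {share_all:.1f}%")
```

In this toy data the cluster is used by a larger share of ACHS charities than of charities overall, which is the kind of gap that makes a cluster stand out as distinctive of the arts and culture sector.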
Charities, when describing their aims and activities, often also include the groups of people that they are trying to benefit or engage with.
Our NLP analysis shows that there are clusters of terms relating to young people, the elderly, women, LGBTQ+, refugees, specific ethnic groups, etc. (The Charity Commission does not currently provide data on many of these groups.) This allows us to identify charities whose work likely involves engaging particular demographic groups.
There are clusters of terms about adjacent domains, including health, religion, and miscellaneous descriptions of places or activities. Arts charities carry out a range of activities that are not solely focused on the arts. In fact, just 11% of active ACHS charities work on that single purpose alone, while 40% of active ACHS charities work on 1-2 additional charitable purposes.
This adds to existing evidence about how the arts play an important part in addressing some of the biggest social issues of our time.
Our analysis has generated a ‘folksonomy’ which summarises, at a high level, different types of activities in the sector, while reflecting the natural language from charities themselves. This can be usefully applied to provide interesting insights, including:
These are all answered in greater depth in the ‘Charities Speak’ report. In the longer term, a data science approach as outlined can be applied to:
The banner image at the top is ‘Verblist’ (1967–68) by Richard Serra with image distortion added. My approach to separating verb and noun phrases was partially inspired by this artwork!
To read the full analysis and more details about this work, visit https://www.pec.ac.uk/research-reports/charities-speak. Data collection, analysis and visualisation by the author.