Natural Language Classification of Development Applications Reveals a Divided Sydney

Sydney has experienced a number of housing bubbles since the early 2000s. Fuelled by a variety of factors, the booms and busts have pushed housing affordability to record lows. Problems with construction quality have also emerged in areas where builders rushed to cash in on the boom while sacrificing quality (and conscience?).

A few of us at Small Multiples have purchased or are looking to purchase a home, so our interest in housing issues led us to ask questions such as:

  • Where are the development hotspots in Sydney?
  • Do different parts of Sydney favour different types of development?

Using 10 years (2009-2019) of Open Australia’s Planning Alerts data, which covers most councils in Greater Sydney (councils such as Waverley and Woollahra are missing), we classified and visualised a total of ~260,000 records. The natural language classification technique we developed (see below) allowed us to classify applications as new residential builds, residential renovations, or commercial properties with reasonable accuracy.

Residential development

A number of interesting patterns within residential development emerged from the maps:

  • Renovations and extensions to existing residential properties took place across the city, with no observable differences between areas
  • Northern and inner suburbs had far fewer new residential builds, with the exception of the Inner West
  • Hotspots for new residential builds:
    • Inner West: Lilyfield and Leichhardt
    • South: Bexley, Sans Souci and Caringbah
    • West: Revesby, Yagoona and Edmondson Park
    • North West: Dundas and Kellyville
    • East: Malabar

Map: Residential Development Applications 2009-2019 (legend: new builds; renovations and extensions)
Commercial development

The maps show clearly that:

  • High volumes of commercial development applications appeared at known population and employment centres such as the City of Sydney, North Sydney and Parramatta.
  • Other centres of commercial activity were also visible from the volume of applications. Presumably, the level of activity at each centre was driven by the growing demographic it serves:
    • Randwick: university and health precinct
    • Campsie: recent immigrants
    • Bella Vista: young families
    • Manly: tourists and boomers

Map: Commercial Development Applications 2009-2019 (legend: commercial)

Map: All Development Applications (legend: new builds; renovations and extensions; commercial)

The technique

Our Natural Language Processing technique involved first analysing document keywords using “term frequency–inverse document frequency” (tf-idf), then clustering the documents into groups using the K-means algorithm. The keywords associated with each cluster allow the logic of the grouping to be interpreted, for example as commercial projects or renovations.
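
To make the two steps concrete, here is a minimal sketch in plain Python of tf-idf weighting followed by K-means clustering. It is an illustration only, not our production pipeline: the sample descriptions, the stop word list, the choice of three clusters, the farthest-first starting points and all function names are assumptions made up for this example. In practice a library such as scikit-learn (its TfidfVectorizer and KMeans classes) would usually do this work.

```python
import math
from collections import Counter

# Hypothetical stop word list, chosen only for this example.
STOP_WORDS = {"a", "and", "of", "to", "with", "the"}

def tfidf_vectors(docs):
    """Return (vocabulary, unit-length tf-idf vectors) for a list of strings."""
    tokenised = [[t for t in doc.lower().split() if t not in STOP_WORDS]
                 for doc in docs]
    vocab = sorted({term for tokens in tokenised for term in tokens})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed idf: rarer terms get a higher weight.
    idf = {term: math.log((1 + n_docs) / (1 + df[term])) + 1 for term in vocab}
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vec = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-means with a deterministic farthest-first initialisation."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next starting point: the vector farthest from all chosen centroids.
        centroids.append(max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        labels = [min(range(k), key=lambda j: squared_distance(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [v for v, label in zip(vectors, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=3):
    """The n highest-weighted terms of a centroid, used to interpret a cluster."""
    ranked = sorted(zip(centroid, vocab), reverse=True)
    return [term for _, term in ranked[:n]]

# Invented application descriptions, not real Planning Alerts records.
descriptions = [
    "construction of new dwelling",
    "demolition and construction of new dwelling",
    "alterations and additions to existing dwelling",
    "alterations and additions to existing garage",
    "change of use to cafe with signage",
    "change of use to restaurant with signage",
]

vocab, vectors = tfidf_vectors(descriptions)
labels, centroids = kmeans(vectors, k=3)
for j, centroid in enumerate(centroids):
    print(j, top_terms(vocab, centroid))
```

On this toy input the new builds, the renovations and the commercial applications land in three separate clusters, and the top terms printed for each cluster are what a person would read to decide which label to give it. Real descriptions are far messier, so the number of clusters and the stop word list would need tuning against the data.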

Other applications

A similar technique could be applied wherever there is a large volume of manually created documents. Natural Language Processing and machine learning can automate processes like classification, tagging and clustering, helping people make better, data-based decisions. We’re sure that both government and business could benefit from techniques like this, to gain further insights from their existing records.

If you need business or planning insights and you have a large amount of text-based information, feel free to chat to us about it, or check out our business and government pages for more information.
