Data preprocessing

The fixed_network data preprocessing pipeline reads raw data on the UK fixed broadband network, supplements it with data for areas where raw data is not available, and transforms the result into a set of shapefiles that the fixed_network model can interpret.

File structure

data/digital_comms/raw
Contains an archive of untouched incoming data
data/digital_comms/intermediate
Contains intermediate files, necessary to enable preprocessing data on a cluster
data/digital_comms/processed
Contains the final result that can be read by the fixed_network model
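
The layout above can be captured in a small helper, for example to check that all three folders exist before starting a run. This is a minimal sketch and not part of the pipeline: the function name stage_dirs and its base parameter are hypothetical, and only the three directory names come from the list above.

```python
from pathlib import Path

# Stage names taken from the file structure described above.
STAGES = ("raw", "intermediate", "processed")


def stage_dirs(base="data/digital_comms"):
    """Return a mapping from stage name to its directory under `base`."""
    root = Path(base)
    return {stage: root / stage for stage in STAGES}


if __name__ == "__main__":
    # Report which of the three folders are present in the current project.
    for stage, path in stage_dirs().items():
        status = "found" if path.is_dir() else "missing"
        print(f"{stage:<12} {path} ({status})")
```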

Preprocessing

Local machine option

Step 1

Generate the exchange areas. This step splits the problem into ~5895 units that can be run in a distributed environment. Note that it is a memory-intensive process and should be run on a high-memory machine (~120 GB of RAM required).

python scripts/network_cluster_input_files.py

If you do not have access to such a machine, you can instead copy the intermediate/exchange_areas folder from a previous job (on the cluster) into your local project.

Step 2

Run the preprocessing for each exchange area, passing the exchange area as an argument to the script.

python scripts/network_preprocess_input_files.py exchange_EACAM

This generates an intermediate file per exchange area in processed/exchange_EACAM
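
To process several exchange areas on a local machine, the Step 2 command can be repeated once per area. The following is a minimal sketch under stated assumptions: it assumes that each entry in the exchange areas folder is named after its exchange (for example exchange_EACAM, optionally with a file extension), which the text above does not confirm. The helper names and the dry_run flag are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

SCRIPT = "scripts/network_preprocess_input_files.py"  # script named in Step 2


def list_exchanges(areas_dir):
    """Return the sorted exchange names found in `areas_dir`."""
    # Assumption: entries are named like 'exchange_EACAM' (file or folder).
    # Using a set collapses several files that share one exchange name.
    return sorted({p.stem for p in Path(areas_dir).iterdir()
                   if p.name.startswith("exchange_")})


def build_command(exchange):
    """Build the Step 2 command for a single exchange area."""
    return [sys.executable, SCRIPT, exchange]


def run_all(areas_dir, dry_run=True):
    """Return the commands for every exchange; execute them unless dry_run."""
    commands = [build_command(name) for name in list_exchanges(areas_dir)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

With dry_run=True the function only returns the commands it would run, which makes it easy to inspect the list before committing to a long job.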

Cluster option

This single script generates intermediate/exchange_areas on the host node and then distributes the preprocessing jobs over the cluster using GNU Parallel. The exchange areas are not regenerated if they already exist; delete them if you need to regenerate them.

cd scripts

run_parallel.sh
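
The reuse rule for the cluster option (existing exchange areas are kept, and must be deleted to force regeneration) can be sketched as follows. This only illustrates the behaviour and is not the contents of run_parallel.sh. The function names are hypothetical, and treating an empty folder as "not yet generated" is an assumption.

```python
import shutil
from pathlib import Path


def needs_generation(areas_dir):
    """Return True when the exchange areas folder is missing or empty."""
    # Assumption: an empty folder counts as "not generated".
    path = Path(areas_dir)
    return not path.is_dir() or not any(path.iterdir())


def force_regeneration(areas_dir):
    """Delete existing exchange areas so that the next run regenerates them."""
    shutil.rmtree(areas_dir, ignore_errors=True)
```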

Results collection

Collect the intermediate results and combine them into a single result set in the processed directory. Without arguments, the script collects all the areas present in the intermediate folder. With an argument, it collects data for a specific subset, for example Cambridge, Oxford, Leeds or Newcastle.

python scripts/network_preprocess_collect_results.py

python scripts/network_preprocess_collect_results.py Cambridge
