The fixed_network data pre-processing pipeline reads raw data on the fixed broadband network in the UK, appends this with data for areas where data is not available and transforms this into a set of shapefiles that can be interpetated by the fixed_network model.
Contains an archive of untouched incoming data
Contains intermediate files, necessary to enable preprocessing data on a cluster
Contains the final result that can be read by the fixed_network model
Local machine option
Generate exchange areas, this is necessary to split up the problem in ~5895 units, to be run in a distributed environment. Note that this is a memory extensive process that should be run on a high-memory machine ~120GB of RAM required.
If you have no access to such a machine, you can also get the
intermediate/exchange_areas folder from a previous job (on the cluster) and put it in you local project.
Run pre-processing per exchange area, make sure to give the exchange area as an argument to the script.
python scripts/network_preprocess_input_files.py exchange_EACAM
This generate an intermediate file per exchange_area in
This single script generates intermediate/exchange_areas on the host node and then distributes pre-processing jobs over the cluster using GNU_parallel. The exchange areas are not re-generated if they already exist, delete them if you will need to regenerate these.
Collect the intermediate results and process this into a single results set in the
processed directory. Without arguments the script will collect all the areas that are present in the
intermediate folder. With an argument, it will collect data for a certain subset, for example Cambridge, Oxford, Leeds and Newcastle.
python scripts/network_preprocess_collect_results.py Cambridge