Summary of the data collected
Data collected as part of the project run by UK Power Networks:
Validation of Photovoltaic Connection Assessment Tool
https://www.ofgem.gov.uk/ofgem-publications/93938/pvtoolcdrfinal-pdf
The project collected a rich dataset at domestic sites with solar panels. The dataset comprises 25,775 days of data and over 171 million individual measurements.
Key stats about the dataset:
- 20 substations and 10 domestic premises
- 480 days of measurement: 27 July 2013 to 19 November 2014
- 10-minute intervals throughout the recording period, with 1-minute intervals in summer 2014
- 10-minute measurements prior to 10 June 2014, aggregated to hourly minima and maxima
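The 480-day figure is the plain difference between the first and last dates of measurement (an inclusive count of calendar days would give 481). A quick check in Python:

```python
from datetime import date

# Measurement period stated above: 27 July 2013 to 19 November 2014.
start = date(2013, 7, 27)
end = date(2014, 11, 19)

# Subtracting two dates gives a timedelta; .days is the whole-day difference.
span_days = (end - start).days
print(span_days)  # 480
```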
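Aggregating 10-minute readings to hourly minima and maxima can be illustrated with a minimal sketch. It assumes the readings for one monitor are held as hypothetical (timestamp, value) pairs; the function name and sample values are made up purely for illustration:

```python
from collections import defaultdict
from datetime import datetime

def hourly_min_max(readings):
    """Aggregate (timestamp, value) readings to one (min, max) pair per hour.

    `readings` is a hypothetical list of (datetime, float) tuples,
    e.g. 10-minute voltage samples from a single monitor.
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        # Floor each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    # Build the result in chronological order of the hour buckets.
    return {hour: (min(vals), max(vals)) for hour, vals in sorted(buckets.items())}

samples = [
    (datetime(2014, 6, 1, 12, 0), 241.2),
    (datetime(2014, 6, 1, 12, 10), 243.8),
    (datetime(2014, 6, 1, 12, 50), 239.9),
    (datetime(2014, 6, 1, 13, 0), 244.1),
]
print(hourly_min_max(samples))
```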
Several important findings are presented in the final project report (link above). For any further queries, please email
innovation@ukpowernetworks.co.uk
PV Tool Dataset – Explanatory Notes
Datasets have been cleansed and processed as follows:
- Aggregated the three datasets provided by Dan:
  - Dataset #1, covering 27/07/13 to 27/07/14 (this is the one that was corrupted)
  - Dataset #2, extracted 30/09/13 to 30/09/14
  - Dataset #3, extracted 30/09/14 to 19/11/14
- Deleted the corrupted rows from Dataset #1:
  - From 27/07/13 to 31/10/13 there were 696 corrupted rows and 143,795 good rows
  - The regular expression used to identify the corrupted rows is ^[^,T]*(,[^,]*){89,100}$
- Deleted duplicate rows where the datasets overlapped
- Deleted blank rows
- Split the timestamp into separate columns for year, month, day, day of week, hour and minute, to make it easier to group data by hour, day of week, month, etc.
- Filtered out any data that pre-dates the device’s installation date (e.g. data recorded during bench testing)
- For the monitors that were moved during the trial, assigned the data to the correct location on the correct dates. These rows have retained their original serial number (with an ‘X’ appended), so make sure you group by substation and feeder name, not by serial number.
- Filtered out devices/cards that didn’t have any current transformers (CTs) connected
- For customer endpoints, transformed the A/B/C channels into Generation/Import channels according to how the CTs are connected for each device, and transformed the polarity so that Generation is always positive, Import is always positive and Export is always negative.
- Filtered out erroneous voltage, current and thdV measurements:
  - All voltage measurements below 200 V
  - All current measurements above 500 A
  - All thdV measurements whenever the corresponding voltage measurement is below 200 V (this brings the maximum THD across the entire dataset down from 900% to about 4%)
- For endpoints, included the corresponding substation busbar voltages in the same row and calculated the voltage rise
- Fixed the problem where numbers were truncated (not rounded) to 2 decimal places. (Apparently this is a design “feature” of MS Access; to overcome it, you have to force all numbers into string format before exporting.)
- Removed commas from substation/feeder names, as these were causing issues when exporting to CSV (comma-separated values) format
- Corrected for the CT polarity that was changed at Alverstone close: swapped P_GEN_MIN with P_GEN_MAX and multiplied both by -1 for dates before 2014-01-31 12:00:00
- Deleted erroneous 1-minute data on 08/05/2014
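The corrupted-row filter for Dataset #1 can be sketched with Python's re module. The pattern matches a row whose first field contains neither a comma nor the letter T, followed by 89 to 100 further comma-separated fields (90 to 101 fields in total). The assumption that well-formed rows begin with an ISO 8601 timestamp containing a T is inferred from the pattern rather than stated in the notes, and the example rows are invented:

```python
import re

# Pattern quoted in the notes: first field without ',' or 'T',
# then 89 to 100 further comma-separated fields.
CORRUPTED_ROW = re.compile(r"^[^,T]*(,[^,]*){89,100}$")

def drop_corrupted(lines):
    """Return only the lines that do NOT match the corrupted-row pattern."""
    return [line for line in lines if not CORRUPTED_ROW.match(line)]

# Illustrative rows (assumption: good rows start with a timestamp containing 'T').
good = "2013-07-27T00:10:00" + ",1.0" * 89
corrupted = "#####" + ",x" * 89
print(len(drop_corrupted([good, corrupted])))  # 1
```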
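Removing blank and duplicate rows after merging the overlapping extracts might look like the following minimal sketch. It assumes each row is a line of text with the newline already stripped, treats whitespace-only lines as blank, and keeps the first copy of each exact duplicate; the function name is hypothetical:

```python
def drop_blank_and_duplicate_rows(lines):
    """Remove blank lines and exact duplicate lines, keeping first occurrences."""
    seen = set()
    result = []
    for line in lines:
        if not line.strip():
            continue  # blank (empty or whitespace-only) row
        if line in seen:
            continue  # exact duplicate, e.g. from overlapping extracts
        seen.add(line)
        result.append(line)
    return result

print(drop_blank_and_duplicate_rows(["a,1", "", "b,2", "a,1", "   "]))
```

If the overlapping extracts formatted the same reading differently (for example, with different numeric precision), exact text comparison would miss those duplicates, and a key such as serial number plus timestamp would be a safer basis for de-duplication.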
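Splitting a timestamp into grouping columns can be sketched as below. The column names are hypothetical, the input is assumed to use the YYYY-MM-DD HH:MM:SS form that appears elsewhere in these notes, and day of week is encoded as an ISO weekday number (1 = Monday, 7 = Sunday) so the result does not depend on the system locale:

```python
from datetime import datetime

def split_timestamp(ts_text):
    """Split a timestamp string into year/month/day/day-of-week/hour/minute columns."""
    ts = datetime.fromisoformat(ts_text)  # accepts "YYYY-MM-DD HH:MM:SS"
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "day_of_week": ts.isoweekday(),  # 1 = Monday ... 7 = Sunday
        "hour": ts.hour,
        "minute": ts.minute,
    }

print(split_timestamp("2014-01-31 12:00:00"))
```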
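Filtering out pre-installation data (such as bench-test readings) could be done along these lines. The row layout, serial numbers and dates are hypothetical; rows from devices without a recorded installation date are kept, which is a choice made for this sketch rather than something the notes specify:

```python
from datetime import datetime

def drop_pre_installation(rows, installed_on):
    """Keep only rows recorded on or after each device's installation date.

    rows: iterable of (serial, timestamp, value) tuples (hypothetical layout).
    installed_on: dict mapping serial -> installation datetime (hypothetical).
    """
    kept = []
    for serial, ts, value in rows:
        installed = installed_on.get(serial)
        # Devices with no known installation date are passed through unchanged.
        if installed is None or ts >= installed:
            kept.append((serial, ts, value))
    return kept

installed_on = {"1001": datetime(2013, 8, 1)}
rows = [
    ("1001", datetime(2013, 7, 28, 9, 0), 230.0),  # bench test, before installation
    ("1001", datetime(2013, 8, 2, 9, 0), 241.0),
]
print(drop_pre_installation(rows, installed_on))
```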
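The reason to group by location rather than by serial number can be shown with a small example: once a monitor has been moved, a single substation/feeder pair can hold data from more than one serial number (for instance, before and after the move). The field names, serials and values below are hypothetical:

```python
from collections import defaultdict

def mean_by_location(rows):
    """Average the "value" field per (substation, feeder) pair, ignoring serial numbers."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["substation"], row["feeder"])].append(row["value"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

rows = [
    {"serial": "1234", "substation": "Sub A", "feeder": "F1", "value": 240.0},
    # The same physical monitor after it was moved: serial carries the 'X' suffix.
    {"serial": "1234X", "substation": "Sub B", "feeder": "F2", "value": 244.0},
    {"serial": "5678", "substation": "Sub B", "feeder": "F2", "value": 242.0},
]
print(mean_by_location(rows))
```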
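The voltage, current and thdV filters translate directly into thresholds. In this sketch a discarded value is replaced with None; the function name is hypothetical, and discarding thdV when the voltage reading is missing is an assumption the notes do not address:

```python
V_MIN = 200.0  # volts: voltage readings below this are discarded
I_MAX = 500.0  # amps: current readings above this are discarded

def clean_reading(voltage, current, thd_v):
    """Apply the voltage/current/thdV filters, returning None for discarded values."""
    voltage_ok = voltage is not None and voltage >= V_MIN
    return (
        voltage if voltage_ok else None,
        current if current is not None and current <= I_MAX else None,
        # thdV is kept only when the matching voltage reading is itself valid.
        thd_v if voltage_ok else None,
    )

print(clean_reading(150.0, 10.0, 900.0))
print(clean_reading(240.0, 600.0, 2.5))
```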
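Attaching the busbar voltage to each endpoint row and computing the rise might look like this. The notes do not define voltage rise precisely; here it is assumed to be the endpoint voltage minus the substation busbar voltage at the same timestamp, and the field names are hypothetical:

```python
def add_voltage_rise(endpoint_rows, busbar_v_by_time):
    """Copy each endpoint row, adding busbar voltage and voltage rise.

    endpoint_rows: list of dicts with hypothetical keys "timestamp" and "voltage".
    busbar_v_by_time: dict mapping timestamp -> busbar voltage at the feeding substation.
    """
    out = []
    for row in endpoint_rows:
        busbar = busbar_v_by_time.get(row["timestamp"])
        new_row = dict(row, busbar_voltage=busbar)
        # Assumption: voltage rise = endpoint voltage - busbar voltage.
        new_row["voltage_rise"] = None if busbar is None else row["voltage"] - busbar
        out.append(new_row)
    return out

rows = add_voltage_rise(
    [{"timestamp": "2014-06-01 12:00", "voltage": 246.5},
     {"timestamp": "2014-06-01 12:10", "voltage": 245.0}],
    {"2014-06-01 12:00": 243.0},
)
print(rows)
```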
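The difference between truncating and rounding to 2 decimal places can be seen with Python's decimal module. This illustrates the arithmetic only, not MS Access's behaviour:

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")

value = Decimal("241.4789")
truncated = value.quantize(TWO_PLACES, rounding=ROUND_DOWN)    # 241.47 (truncation)
rounded = value.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)   # 241.48 (rounding)

# Truncation goes toward zero, so negative values are affected too.
neg = Decimal("-3.456")
neg_truncated = neg.quantize(TWO_PLACES, rounding=ROUND_DOWN)   # -3.45
neg_rounded = neg.quantize(TWO_PLACES, rounding=ROUND_HALF_UP)  # -3.46
print(truncated, rounded, neg_truncated, neg_rounded)
```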
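Commas inside a field break naive comma splitting. Python's csv module, for example, quotes such fields by default, whereas the approach described in the notes removes the commas altogether. Both are shown below with a hypothetical name:

```python
import csv
import io

def clean_name(name):
    """Remove commas from a substation/feeder name."""
    return name.replace(",", "")

buf = io.StringIO()
writer = csv.writer(buf)  # default QUOTE_MINIMAL quotes fields containing commas
writer.writerow(["Example Road, North", "Feeder 1"])               # quoted by csv
writer.writerow([clean_name("Example Road, North"), "Feeder 1"])   # commas removed
print(buf.getvalue())
```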
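The Alverstone close correction swaps and negates the minimum and maximum generation values together, because negating both values reverses their order (if min ≤ max, then -max ≤ -min). A minimal sketch, with a hypothetical function name:

```python
from datetime import datetime

CUTOFF = datetime(2014, 1, 31, 12, 0, 0)

def correct_polarity(ts, p_gen_min, p_gen_max):
    """Return (P_GEN_MIN, P_GEN_MAX), corrected for readings before the cutoff.

    Before the cutoff the CT polarity was reversed, so both values are
    negated and exchanged; from the cutoff onwards they are returned as-is.
    """
    if ts < CUTOFF:
        return -p_gen_max, -p_gen_min
    return p_gen_min, p_gen_max

print(correct_polarity(datetime(2014, 1, 1), -2.5, -0.5))
```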