Phase Identification of Smart Meters Using a
Fourier Series Compression and a Statistical
Clustering Algorithm
Jeremy Chiu
Mathematics and Statistics
Langara College
Vancouver, Canada
0000-0002-0737-9055

Albert Wong
Mathematics and Statistics
Langara College
Vancouver, Canada
0000-0002-0669-4352

James Park
Mathematics and Statistics
Langara College
Vancouver, Canada
0000-0002-3714-9138

Joe Mahony
Research and Development
Harris SmartWorks
Ottawa, Canada
JMahony@harriscomputer.com

Michael Ferri
Research and Development
Harris SmartWorks
Ottawa, Canada
mferri@harriscomputer.com

Tim Berson
Research and Development
Harris SmartWorks
Ottawa, Canada
TBerson@harrisutilities.com

Abstract—Accurate labeling of phase connectivity in distribution systems is important for maintenance and operations
but is often erroneous or missing. In this paper, we present
an algorithm to identify which smart meters must be in the
same phase using a hierarchical clustering method on voltage
time series data. Instead of working with the time series
directly, we apply the Fourier transform to represent time
series in their frequency domain, remove 98% of the Fourier
coefficients, then cluster the remaining coefficients to estimate
which meters belong in the same phase. We validate the results by
verifying that meters do not change phase over time and by comparing
the clustering to available network-distribution data.
Index Terms—Phase identification, clustering, Fourier series, Fourier series compression

I. INTRODUCTION
Managing an electricity distribution network efficiently
requires accurate phase connectivity models [18]. However,
electricity companies usually do not have accurate
information on phase connectivity and often have to rely on
measurement-based phase identification methods [8].
To deliver high-voltage power from the generation
station to customers, voltage in the primary distribution
circuit is stepped down at a distribution substation.
Electricity is then distributed through feeders to transformers.
In North America, power is stepped down again at the
transformers and distributed to customers using a
three-phase system [18]. The phase serving each customer
is often not recorded, which creates a phase
identification problem whenever phase connection information
is required for network management tasks.
We would like to acknowledge and thank the Post Degree Diploma
program, the Work on Campus program, and the Applied Research Centre
at Langara College for supporting our research.

Several approaches to this identification problem have been
proposed in the literature:
Micro-synchrophasors - One can use a micro-synchrophasor
to measure the voltage magnitude and phase
angle at a meter [19]. The higher the correlation between
the voltage magnitude of the substation and that of the smart
meters, the more accurate the phase labelling. To complete
the identification, signal generators are set up at the
substations and signal discriminators at the smart meters.
This method is quite accurate but expensive, as it requires
deployment and maintenance of additional equipment and
human resources.
Integer programming algorithms ([2], [3], [7], [22]) - The
phase connections of smart meters are represented as binary
variables, then integer linear programming methods are
used to determine the most likely phase network. However,
this approach requires a new variable for every new meter,
making the problem computationally intensive, especially
for feeders with thousands of meters.
Correlation-based method ([14]–[16]) - Data is first
collected over time from the smart meters to be identified.
The correlation coefficient between the voltage time series
of two smart meters is then calculated; the closer the
coefficient is to one, the more likely the pair of smart meters
have the same voltage pattern and therefore the same
phase. The correlation coefficients are then transformed into
a distance measure as input to a clustering algorithm. The
method is logical and seems promising. However, based
on results from unpublished research by a project team at
Langara College (personal communication), when applied
to the data set in this research, this method falls short on
a number of the performance criteria that we identify and
discuss below.
Constrained k-means clustering - Voltage time series
data is first normalized using standard deviation, then
principal component analysis is applied to reduce the
data’s dimension. A k-means clustering algorithm is then
used to cluster the smart meters. The phase of each cluster
is then identified by solving a minimization problem [9],
[14], [18].
Other proposed phase identification methods include
supervised learning models and different types of
clustering algorithms, such as spectral clustering [4]–[6],
[10], [11], [17], [20], [21].
In this research, we take a new approach to the
phase identification problem. The central idea is to extract
as much information as possible from the voltage time
series using a Fourier series compression process. A
hierarchical clustering routine is then applied to the compressed
data to produce an accurate identification.
II. RESEARCH DATA SET
For this research, we use a voltage data set
provided by a utility company in the United States, which
contains hourly voltage data for a number of smart meters
in the months of June and July 2021. The data set also
includes the linkage between the smart meters and their
associated transformers and feeders. This information is
critical for assessing the appropriateness and accuracy
of the clustering results.
We removed smart meters with any missing entries in
June and July 2021. We then normalized each smart meter’s
time series by dividing each voltage value by the mean of
that series. We chose two of the smaller feeders (Feeder F
with 26 smart meters and Feeder D with 55 smart meters)
to conduct our research so that we could easily visualize
and evaluate the results.
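The preprocessing described above can be sketched as follows. This is a minimal Python/NumPy illustration rather than the code used in this research; the array layout (one row per meter, one column per hourly reading, NaN marking a missing entry) and the function name `normalize_meters` are assumptions made for the example.

```python
import numpy as np

def normalize_meters(voltages):
    """Drop meters with missing readings, then scale each meter to mean 1.

    voltages: 2-D array, one row per meter, one column per hourly reading.
    A missing reading is assumed to be encoded as NaN.
    Returns (normalized, kept), where kept holds the original row indices.
    """
    voltages = np.asarray(voltages, dtype=float)
    complete = ~np.isnan(voltages).any(axis=1)   # meters with no gaps
    kept = np.flatnonzero(complete)
    clean = voltages[complete]
    # Divide every reading by that meter's own mean voltage.
    normalized = clean / clean.mean(axis=1, keepdims=True)
    return normalized, kept

# Hypothetical toy data: three meters, four hourly readings each.
raw = np.array([
    [240.0, 242.0, 238.0, 240.0],
    [120.0, np.nan, 121.0, 119.0],   # dropped: one missing entry
    [119.0, 121.0, 120.0, 120.0],
])
normalized, kept = normalize_meters(raw)
print(kept)
print(normalized.mean(axis=1))
```

After this step every retained series has mean exactly one, which is why the 0th Fourier harmonic carries no information for distinguishing meters.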
III. FOURIER COMPRESSION
Clustering the smart meters using their time series (voltage
vs time) is challenging because of the size of each series:
measurements are hourly, so in a month of 30 days, each
time series is in $\mathbb{R}^{720}$. We reduce the dimension
by using a compressed Fourier series, then cluster the smart
meters using the compressed Fourier series. Figure 1 shows
a high level overview of how we use Fourier series to reduce
the dimension.
The compression is done as follows. We represent each
smart meter in its frequency domain by applying the
Fourier transform to the normalized time series. Recall
the Fourier series (sine-cosine form) representation of a
periodic function f (t) is
$$f(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left[ a_n \cos\left(\frac{2\pi}{P} n t\right) + b_n \sin\left(\frac{2\pi}{P} n t\right) \right], \tag{1}$$

where $a_n$, $b_n$ are real coefficients and $P$ is the function’s
period. We then delete coefficients that are ‘small’ (either
by deleting coefficients whose magnitude falls below a
predetermined threshold, or by only keeping a
predetermined number of the largest terms), thus giving us
a compressed Fourier representation. We also delete the 0th
harmonic $a_0$ because it is constant across all smart meters
due to normalization. In practice, we used 12 Fourier
coefficients to represent a month of data, thus reducing the
dimension from $\mathbb{R}^{720}$ to $\mathbb{R}^{12}$ (a 98% reduction in size).

Fig. 1. A high level overview of how we use Fourier series to reduce the
dimension of a smart meter. We performed clustering on the compressed
Fourier series. The functions $f(t)$ and $\hat{f}(t)$ are time series, where
$\hat{f}(t) \approx f(t)$.
As demonstrated in Figure 2, most of the Fourier
coefficients are very small, which suggests the compressed
Fourier series could provide a high-accuracy, low-dimension
approximation of the time series. To verify the
accuracy of the compression, we obtain an approximate
time series by applying the inverse Fourier transform
to a compressed Fourier series, and then compare the
approximate time series to the original time series. Figure
3 shows approximate time series alongside the original
time series: the general trend of the time series is
captured, but the 12-coefficient approximation does poorly
at the spikes. As Figure 4 demonstrates, keeping more
coefficients yields better accuracy. Notice that with about
10% of the coefficients, we maintain about 90% accuracy
of the time series. Ultimately, the accuracy of the
reconstructed time series is not too important, so long as
the clustering results are sensible.
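The accuracy check described above can be illustrated with a short sketch. It is written in Python/NumPy rather than Matlab and is not the code used in this research. The function names, the synthetic series, and the choice to restore the 0th harmonic before inverting (so that the approximation sits at the same level as the original) are assumptions of the example; the error measure is the one quoted in the caption of Figure 4.

```python
import numpy as np

def compress_and_reconstruct(y, keep):
    """Zero every Fourier harmonic whose index is not in `keep`, then invert.

    y: real-valued time series already normalized to mean 1.
    keep: iterable of harmonic indices n >= 1 to retain.
    Assumption: the 0th harmonic (the mean) is kept for the reconstruction.
    """
    y = np.asarray(y, dtype=float)
    coeffs = np.fft.rfft(y)                    # harmonics 0 .. N/2
    mask = np.zeros(coeffs.shape, dtype=bool)
    mask[0] = True                             # restore the mean
    mask[list(keep)] = True
    return np.fft.irfft(np.where(mask, coeffs, 0.0), n=len(y))

def error_percent(y, y_hat):
    """100 * ||y - y_hat||_2 / mean(y)."""
    return 100.0 * np.linalg.norm(y - y_hat) / np.mean(y)

# Synthetic 30-day hourly series: a daily cycle, a half-daily cycle, and
# one slow component that is not among the retained harmonics.
t = np.arange(720)
y = (1.0
     + 0.010 * np.cos(2 * np.pi * t / 24)
     + 0.004 * np.sin(2 * np.pi * t / 12)
     + 0.002 * np.cos(2 * np.pi * 7 * t / 720))
y_hat = compress_and_reconstruct(y, range(30, 181, 30))
err = error_percent(y, y_hat)
print(f"error: {err:.2f}%")
```

In this toy case the whole error comes from the discarded slow component, which mirrors the observation that the retained harmonics capture the general trend while finer detail is lost.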
The compression was done in Matlab. Given a smart
meter’s time series, we use Matlab’s fft function, which
returns complex coefficients corresponding to the Fourier
series in exponential form. We convert the complex
coefficients into $a_n$ and $b_n$, the real coefficients of the Fourier
series in sinusoidal form (we used get_harmonics
[1]). In practice, a time series in June would be in $\mathbb{R}^{720}$,
corresponding to $0 \le t \le 720$ hours, and so $P = 720$.
Matlab’s fft would return the complex coefficients
$c_{-360}, \dots, c_{359}$, which we convert to real coefficients, then
only keep $a_1, \dots, a_{360}$ and $b_1, \dots, b_{360}$ (note $a_{360}$ and $b_{360}$
were computed from $a_{-360}$ and $b_{-360}$). We then compress
by using a mask to set most coefficients to zero. In practice,
we kept $a_n$ and $b_n$ where $n = 30, 60, \dots, 180$, a total of
12 coefficients. With $P = 720$ these indices correspond to the
frequencies 1/24, 2/24, ..., 6/24 per hour, which are the
large-amplitude frequencies in Figure 2.
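The conversion from complex FFT output to the retained sine-cosine coefficients can be sketched as follows. This is a Python/NumPy illustration under stated assumptions, not the Matlab code or the get_harmonics routine used in this research: the function name `fourier_features` is hypothetical, and the scaling assumes NumPy's unnormalized forward transform, for which $a_n = 2\,\mathrm{Re}(c_n)/N$ and $b_n = -2\,\mathrm{Im}(c_n)/N$ when $0 < n < N/2$.

```python
import numpy as np

def fourier_features(y, harmonics=range(30, 181, 30)):
    """Return [a_n..., b_n...] for the selected harmonics of a real series.

    With c = FFT(y) (unnormalized) and N = len(y), the sine-cosine
    coefficients for 0 < n < N/2 are
        a_n = 2 * Re(c_n) / N,   b_n = -2 * Im(c_n) / N.
    The 0th harmonic is deliberately left out.
    """
    y = np.asarray(y, dtype=float)
    n_samples = len(y)
    c = np.fft.rfft(y)
    idx = np.asarray(list(harmonics))
    a = 2.0 * c[idx].real / n_samples
    b = -2.0 * c[idx].imag / n_samples
    return np.concatenate([a, b])   # 12 numbers for the default harmonics

# Synthetic check: a daily cosine (n = 30) and a half-daily sine (n = 60).
t = np.arange(720)
y = (1.0
     + 0.010 * np.cos(2 * np.pi * 30 * t / 720)
     + 0.004 * np.sin(2 * np.pi * 60 * t / 720))
features = fourier_features(y)
print(features.shape)
```

Stacking one such 12-element vector per meter gives the matrix on which the clustering operates.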
IV. CLUSTERING OF SMART METERS
After the dimension of the data is reduced through
Fourier compression, the distance $D(X, Y)$ between the
Fourier coefficients of two smart meters is calculated using
the traditional Euclidean distance metric:

$$D(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (X_i - Y_i)^2, \tag{2}$$

where $X$ and $Y$ are the compressed coefficient vectors of the
two meters and $n$ is the number of retained coefficients.

Fig. 2. Combined magnitude of the Fourier coefficients ($|a_n| + |b_n|$) vs
frequency. Notice most of the coefficients are small. The largest amplitude
occurs at the frequency 1/24; this is unsurprising because energy usage
follows daily patterns.

Fig. 5. Visualizing the clustering of Feeder D using June 2021 data via
Matlab’s mdscale function.

Using this distance, we cluster the set of smart meters
in Feeder F (then repeat for Feeder D) using the Ward
hierarchical clustering algorithm in Matlab [12]. Since every
smart meter should be on one of the three phases, the
number of resulting clusters is set to three. Meters
clustered together are therefore taken to belong to the
same phase.
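A minimal sketch of this clustering step is given below. It uses SciPy's `linkage` and `fcluster` in place of Matlab's linkage function, so it should be read as an illustration rather than the implementation used in this research. SciPy's Ward linkage works from the ordinary Euclidean distance between rows, which is assumed here to play the role of $D(X, Y)$; the function name `cluster_three_phases` and the synthetic feature vectors are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_three_phases(features, n_clusters=3):
    """Ward-linkage hierarchical clustering cut into n_clusters groups.

    features: 2-D array, one row of compressed Fourier coefficients per meter.
    Returns integer labels 1..n_clusters; the numbering is arbitrary.
    """
    tree = linkage(features, method="ward")   # Euclidean distance between rows
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Synthetic data: three well-separated groups of five meters each,
# with two features per meter to keep the example small.
rng = np.random.default_rng(0)
centers = np.array([[0.010, 0.000], [-0.005, 0.008], [-0.005, -0.008]])
features = np.vstack(
    [c + 0.0005 * rng.standard_normal((5, 2)) for c in centers]
)
labels = cluster_three_phases(features)
print(labels)
```

Because the label numbering is arbitrary, the output identifies which meters share a phase but not which physical phase each cluster corresponds to.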
V. VALIDATION OF CLUSTERING RESULTS
A. Visualizing Clustering Results

Fig. 3. Original time series alongside approximate time series. The
domain was reduced to 3 days for a better viewing rectangle. The
12-coefficient approximation does poorly at the spikes, but captures the
general trend. The 144-coefficient approximation captures most spikes.

A useful way to visualize the result of clustering a
multi-dimensional data set is to “project” the
data set into a two-dimensional space. We can then
visualize clusters with a scatter diagram in the xy-plane.
Since we are using the Euclidean distance as the basis
for clustering, a natural way to achieve this is to use
Matlab’s multidimensional scaling function mdscale [13]. Given
the distances between points, mdscale reconstructs where
the points could be in 2D so that the distances are
roughly preserved. In Figure 5, we see a visualization of
the clustered meters from Feeder D. Notice that there are
clear boundaries between different clusters.
Moreover, a hierarchical clustering algorithm such as
Ward allows us to visualize the formation of the
clusters hierarchically via a dendrogram (Figure 6). However,
it is less useful here because the number of clusters
is required to be three.
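The idea of placing points in 2D so that pairwise distances are roughly preserved can be sketched with classical (Torgerson) multidimensional scaling. This technique is swapped in here for illustration only; Matlab's mdscale may optimize a different criterion, so the coordinates it produces need not match. The function name `classical_mds` and the test points are assumptions of the example.

```python
import numpy as np

def classical_mds(features, n_dims=2):
    """Embed the rows of `features` in n_dims dimensions via classical MDS."""
    x = np.asarray(features, dtype=float)
    # Squared Euclidean distance between every pair of rows.
    sq = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    n = len(x)
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ sq @ centering    # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]    # largest eigenvalues first
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical points in 12 dimensions that actually lie in a 2-D plane,
# so a 2-D embedding can reproduce their distances exactly.
rng = np.random.default_rng(1)
pts = np.zeros((6, 12))
pts[:, :2] = rng.standard_normal((6, 2))
emb = classical_mds(pts)
print(emb.shape)
```

Plotting the two columns of the embedding, coloured by cluster label, gives a scatter diagram of the kind described above.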
B. Same Transformer, Same Phase

Fig. 4. Error percentage vs. number of coefficients kept. Error percentage
is computed as $\|y - \hat{y}\|_2 / \bar{y}$, where $y$ is the original
time series, $\hat{y}$ is the approximate time series, and $\bar{y}$ is the
average of $y$ (note $\bar{y} = 1$ due to normalization). The 12-coefficient
approximation has 16% error.

Fig. 6. Dendrogram of clustering Feeder D using data from June 2021.
Dendrograms are useful to see how clusters are being formed.

Meters on the same transformer must be in the same
phase, and thus should be clustered together. We can use
this fact to see how well our method performs: after
we cluster the smart meters, each transformer should only
have meters of a single phase. As seen in Tables I and II,
the clustering of Feeder F is almost perfect while that for
Feeder D is perfect, giving us hope that this approach has
promise.
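The same-transformer test can be expressed as a short check. The sketch below is illustrative only; the function name `mixed_transformers` and the toy identifiers are hypothetical and do not come from the research data set.

```python
from collections import defaultdict

def mixed_transformers(transformer_ids, labels):
    """Return {transformer: sorted cluster labels} for every transformer
    whose meters were assigned to more than one cluster."""
    seen = defaultdict(set)
    for transformer, label in zip(transformer_ids, labels):
        seen[transformer].add(label)
    return {t: sorted(c) for t, c in seen.items() if len(c) > 1}

# Hypothetical toy data: five meters on three transformers.
transformers = ["T1", "T1", "T2", "T3", "T3"]
clusters = ["A", "A", "B", "A", "C"]
print(mixed_transformers(transformers, clusters))   # {'T3': ['A', 'C']}
```

An empty result means every transformer carries meters of a single cluster; each entry in a non-empty result points to a transformer worth inspecting.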

TABLE I
FEEDER F JUNE 2021 CLUSTER RESULTS GROUPED BY TRANSFORMERS.
[Rows: transformers 1-11; columns: clusters A, B, C. Column totals: A = 13, B = 8, C = 5.]

C. Stability Over Time
Physically, meters do not change phase over time.
Therefore, for the clustering (assignment of phase) to be
meaningful, the result should not change over time.
To evaluate results from this research, we performed
cluster analysis on two different time periods (June 2021
and July 2021) on Feeders F and D, then checked for
inconsistent results. Any meter that changed phase (cluster)
is considered time-unstable. Note that the labels from
the clustering (A, B, and C) are arbitrary, and so we use a
cross tabulation of the two clustering results to see how
meters are assigned in the clustering processes. Table III
shows that the clustering from June to July is stable. All
13 meters assigned to Cluster A in June are also assigned
to the same cluster in July; the same is true for Clusters
B and C.
The same can be said about the stability of clustering
Feeder D using our approach (Table IV).
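The cross tabulation and the stability check can be sketched as follows. This is an illustrative Python example, not the code used in this research; the function names and the toy labels are hypothetical. Because cluster labels are arbitrary, the check asks only whether the two clusterings agree up to a renaming of the labels.

```python
from collections import Counter

def cross_tabulate(labels_first, labels_second):
    """Count meters for every (first-period label, second-period label) pair."""
    return Counter(zip(labels_first, labels_second))

def is_time_stable(labels_first, labels_second):
    """True if each cluster in one period maps onto exactly one cluster in
    the other period, i.e. the clusterings agree up to relabelling."""
    pairs = set(zip(labels_first, labels_second))
    firsts = {a for a, _ in pairs}
    seconds = {b for _, b in pairs}
    return len(pairs) == len(firsts) == len(seconds)

# Hypothetical toy labels for four meters in two months.
june = ["A", "A", "B", "C"]
july_same = [2, 2, 3, 1]      # same grouping, different label names
july_moved = [2, 3, 3, 1]     # the second meter changed cluster
print(cross_tabulate(june, july_same))
print(is_time_stable(june, july_same), is_time_stable(june, july_moved))
```

A stable result corresponds to a cross tabulation with exactly one non-zero cell in every row and every column.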

TABLE II
FEEDER D JUNE 2021 CLUSTER RESULTS GROUPED BY TRANSFORMERS.
[Rows: transformers 1-39; columns: clusters A, B, C. Column totals: A = 39, B = 13, C = 3.]

TABLE III
CLUSTERING FEEDER F - JUNE AND JULY 2021

                  July
June       A     B     C   Total
A         13     0     0      13
B          0     8     0       8
C          0     0     5       5
Total     13     8     5      26

TABLE IV
CLUSTERING FEEDER D - JUNE AND JULY 2021

                  July
June       A     B     C   Total
A         39     0     0      39
B          0    13     0      13
C          0     0     3       3
Total     39    13     3      55

VI. FUTURE WORK
While the above results look very promising, we have
not applied this approach to a larger feeder (say with over
300 meters), or to a data set with multiple feeders. We
suspect, due to the increased likelihood of data-related
issues, that the results may not be as “perfect” as we have
seen so far.
To advance our research, the approach would be applied
to a larger data set with multiple feeders. The same
approach should also be applied to a data set spanning several
months; clustering could be done month by month, or with
several months combined. Consideration should also be
given to using this approach to cluster a subset of the data
set and, after the validation process outlined above,
using the cluster labels to develop a supervised
learning model for classifying other meters.
VII. CONCLUSION
In this research, we have applied a novel method of
approximating a time series with its Fourier series. We then
used hierarchical clustering methods on the
dimension-reduced data. The major application of this approach is
in the phase identification of smart meters in a network
environment.
Results from two small data sets using this approach
show significant promise as they passed two important
tests: same assignment for meters on the same transformer
and stability of assignment over time. The application of
this approach to a larger data set with multiple feeders
would therefore be a worthwhile exercise.
REFERENCES
[1] A. Adelmalek, Get harmoniques of a real signal, 2022. Last
accessed 13 October 2022.
[2] A. H. Akhijahani, S. Hojjatinejad, and A. Safdarian, A
MILP model for phase identification in LV distribution feeders using
smart meters data, in 2019 Smart Grid Conference (SGC), IEEE,
2019, pp. 1–6.
[3] V. Arya, D. Seetharam, S. Kalyanaraman, K. Dontas,
C. Pavlovski, S. Hoy, and J. R. Kalagnanam, Phase
identification in smart grids, in 2011 IEEE International Conference on
Smart Grid Communications (SmartGridComm), 2011, pp. 25–30.
[4] L. Blakely, M. J. Reno, and W.-c. Feng, Spectral clustering
for customer phase identification using AMI voltage timeseries, in
2019 IEEE Power and Energy Conference at Illinois (PECI), IEEE,
2019, pp. 1–7.
[5] B. Foggo and N. Yu, A comprehensive evaluation of supervised
machine learning for the phase identification problem, International
Journal of Computer and Systems Engineering, 12 (2018), pp. 419–427.
[6] B. Foggo and N. Yu, Improving supervised phase identification
through the theory of information losses, IEEE Transactions on
Smart Grid, 11 (2019), pp. 2337–2346.
[7] A. Heidari-Akhijahani, A. Safdarian, and F. Aminifar,
Phase identification of single-phase customers and PV panels via
smart meter data, IEEE Transactions on Smart Grid, 12 (2021),
pp. 4543–4552.
[8] A. Hoogsteyn, M. Vanin, A. Koirala, and D. Van Hertem,
Low voltage customer phase identification methods based on
smart meter data, Electric Power Systems Research, 212 (2022),
p. 108524.
[9] S. P. Jayadev, A. Rajeswaran, N. P. Bhatt, and R. Pasumarthy,
A novel approach for phase identification in smart
grids using graph theory and principal component analysis, in 2016
American Control Conference (ACC), IEEE, 2016, pp. 5026–5031.
[10] H. P. Lee, M. Zhang, M. Baran, N. Lu, P. Rehm, E. Miller,
and M. Makdad, A novel data segmentation method for
data-driven phase identification, arXiv preprint arXiv:2111.10500,
(2021).
[11] Y. Ma, X. Fan, R. Tang, P. Duan, Y. Sun, J. Du, and
Q. Duan, Phase identification of smart meters by spectral
clustering, in 2018 2nd IEEE Conference on Energy Internet and Energy
System Integration (EI2), IEEE, 2018, pp. 1–5.
[12] MathWorks, linkage, 2022. Last accessed 17 October 2022.
[13] MathWorks, mdscale, 2022. Last accessed 17 October 2022.
[14] F. Olivier, A. Sutera, P. Geurts, R. Fonteneau, and
D. Ernst, Phase identification of smart meters by clustering
voltage measurements, in 2018 Power Systems Computation
Conference (PSCC), IEEE, 2018, pp. 1–8.
[15] H. Pezeshki and P. Wolfs, Correlation based method for phase
identification in a three phase LV distribution network, in 2012 22nd
Australasian Universities Power Engineering Conference (AUPEC),
2012, pp. 1–7.
[16] T. A. Short, Advanced metering for phase identification,
transformer identification, and secondary modeling, IEEE Transactions
on Smart Grid, 4 (2012), pp. 651–658.
[17] W. Wang and N. Yu, Advanced metering infrastructure data
driven phase identification in smart grid, 07 2017.
[18] W. Wang, N. Yu, B. Foggo, J. Davis, and J. Li, Phase
identification in electric power distribution systems by clustering
of smart meter data, in 2016 15th IEEE International Conference
on Machine Learning and Applications (ICMLA), 2016, pp. 259–265.
[19] M. H. Wen, R. Arghandeh, A. von Meier, K. Poolla, and
V. O. Li, Phase identification in distribution networks with
micro-synchrophasors, in 2015 IEEE Power & Energy Society General
Meeting, IEEE, 2015, pp. 1–5.
[20] N. Zaragoza and V. Rao, Phase identification of power
distribution systems using hierarchical clustering methods, in 2021 North
American Power Symposium (NAPS), 2021, pp. 1–6.
[21] N. Zaragoza and V. Rao, Phase identification of power
distribution systems using hierarchical clustering methods, in 2021 North
American Power Symposium (NAPS), IEEE, 2021, pp. 1–6.
[22] J. Zhu, M.-Y. Chow, and F. Zhang, Phase balancing using
mixed-integer programming [distribution feeders], IEEE
Transactions on Power Systems, 13 (1998), pp. 1487–1492.