
Department of Electrical Engineering

Karina Bernardin Rosa

Resource Allocation in 5G Systems with Massive MIMO and NOMA Schemes

Final year project presented to the Electrical Engineering Department at Universidade Estadual de Londrina (UEL) as a requirement for the conclusion of the Bachelor of Electrical Engineering (BE) degree.

Londrina, PR

2018



Area: Telecommunication Systems

Supervisor:

Prof. Dr. Taufik Abrao




Examination Board

Prof. Dr. Taufik Abrao (Supervisor), Department of Electrical Engineering, Universidade Estadual de Londrina

Prof. Dr. Jose Carlos Marinello Filho, Department of Electrical Engineering, Universidade Estadual de Londrina

Prof. Msc. Jaime Laelson Jacob, Department of Electrical Engineering, Universidade Estadual de Londrina

November 17, 2018

Acknowledgments

First, I would like to thank my family for their support and encouragement during my studies, and especially my parents, for helping me through the difficult times before and during the engineering course. I also express my gratitude to my mentor during this work, Taufik Abrao. I thank him for his guidance, comments and advice, and for encouraging me. In addition, I am thankful to all the professors and friends who have helped me during the last five years.

Abstract

In this work, the downlink (DL) of a single-cell MIMO system was used to analyze two distinct scenarios. First, the concept of non-orthogonal multiple access (NOMA) was applied in a cluster-based MIMO-NOMA configuration, which was evaluated in terms of spectral efficiency (SE) and resource efficiency (RE) through user grouping and a two-step power allocation strategy. The performance of the system for different beamforming directivities and cluster-loading conditions was assessed by varying the number of both clusters and users per cluster. The results demonstrated that the best performance occurs for two users per cluster, where these users are multiplexed according to the principles of NOMA. The numerical simulations also verified that the SE and EE do not decrease monotonically as the number of users per cluster increases and the number of transmit antennas is simultaneously reduced. In the second part of this monograph, the sum-rate capacity achieved by three low-complexity linear precoders was analyzed for a Massive MIMO condition. The performance-complexity tradeoff of the conventional zero-forcing (ZF) beamforming and the regularized channel inversion (RCI) was compared to that of the innovative precoding scheme based on the iterative randomized Kaczmarz algorithm (rKA). A large-system limit analysis was also developed for both ZF and RCI precoding schemes. The rKA's formulation and convergence in the DL Massive MIMO were explored, while a comprehensive complexity analysis comparing the three precoding schemes indicated the rKA's performance-complexity tradeoff superiority.

Keywords: Channel Inversion, Kaczmarz Algorithm, Massive MIMO, NOMA, Precoding, Sum-Rate Capacity, Zero Forcing

List of Contents

Acronyms

Conventions and Notations

List of Symbols

1 Introduction

2 Results

3 Conclusions

Appendix A -- Resource Efficiency in Cluster-Based Downlink MIMO-NOMA Systems

Appendix B -- Sum-Rate Capacity in DL Massive MIMO with Partial CSI and Low-Complexity Linear Precoders

Bibliography

Acronyms

5G Fifth Generation

AWG Additive White Gaussian

BS Base Station

CSI Channel State Information

DL Downlink

DPC Dirty Paper Coding

EE Energy Efficiency

EPA Equal Power Allocation

IUI Inter-User Interference

KA Kaczmarz Algorithm

LB Lower Bound

MF Matched-Filtering

MIMO Multiple Input Multiple Output

MRC Maximum-Ratio Combining

MT Mobile Terminal

MU Multi-User

NOMA Non-Orthogonal Multiple Access

OD Overdetermined

OFDMA Orthogonal Frequency Division Multiple Access

OMA Orthogonal Multiple Access

QoS Quality of Service

RCI Regularized Channel Inversion

RE Resource Efficiency

rKA Randomized Kaczmarz Algorithm

SC Superposition Coding

SE Spectral Efficiency

SIC Successive Interference Cancellation

SINR Signal-to-Interference-plus-Noise Ratio

SISO Single Input Single Output

SLE Set of Linear Equations

SNR Signal-to-Noise Ratio

TDMA Time Division Multiple Access

UB Upper Bound

UD Underdetermined

UE User Equipment

UL Uplink

ULA Uniform Linear Array

UPA Uniform Planar Array

ZF Zero-Forcing

Conventions and Notations

The following mathematical notations were adopted in this work:

a Boldface lower case letters represent vectors;

A Boldface upper case letters denote matrices;

(·)−1 Inversion operator;

(·)H Hermitian operator (transposition and conjugation);

(·)T Transposition operator;

(·)⋆ Conjugation operator;

(·)∗ Optimal solution;

det(·) Determinant of a Square Matrix;

‖·‖n Norm of order n;

‖·‖F Frobenius norm;

tr (·) Trace operation;

diag(·) Diagonalization Operation;

Im Identity matrix of order m;

0m×n All zero matrix of size m× n;

CN{µ, σ²} Circularly-symmetric complex Gaussian random variable with mean µ and variance σ²;

O(·) Complexity order of an operation or algorithm;

E [·] Statistical Expectation;

Var [·] Variance;

C Complex numbers set;

ℜ{·} Real part of a complex number;

ℑ{·} Imaginary part of a complex number;

G t(·) Linear operator that depends on the internal randomization of KA until iteration t;

∈ Belongs to the set;

〈X,Y〉 Inner product between two matrices or vectors X and Y

List of Symbols

Appendix A

M Number of antennas at the BS

K Number of single-antenna users (receivers)

L Number of clusters

Km Number of receivers in the m-th cluster

K̄ Fixed number of receivers in each cluster

r Cell radius

r1 Smaller cell disk radius

s Symbol vector

x Transmit signal vector

A Path-loss coefficient matrix

b Path-loss exponent

H Channel matrix

H̄ Equivalent channel matrix

Hm Channel matrix of the m-th cluster

H̄m Equivalent channel vector of the m-th cluster

z AWG noise vector

y Received signal vector

σ2 Noise variance

G Linear precoding matrix

P Diagonal power matrix

Pt Total available transmit power at the BS

γ SNR at the transmitter side

SINRm,k SINR of the k-th user at the m-th cluster

gm,k Channel gain of the k-th user at the m-th cluster

Rm,k Capacity of the k-th user at the m-th cluster

α Power normalization constant

ξ Regularization parameter

SE Spectral Efficiency

EE Energy Efficiency

RE Resource Efficiency

Pc Constant circuit power consumption per antenna

P0 Basic power consumed at the BS

ϖ Constant accounting for the inefficiency of the power amplifier at the BS

Appendix B

M Number of antennas at the BS

K Number of single-antenna users (receivers)

β Cell-loading factor

s Symbol vector

x Transmit signal vector

A Path-loss coefficient matrix

rk Distance between the BS and the k-th user

b Path-loss exponent

H Channel matrix

Q Estimated channel matrix

τ Channel estimation quality parameter

n AWG noise vector

y Received signal vector

σ2n Noise variance

G Linear precoding matrix

P Diagonal power matrix

Pt Total available transmit power at the BS

γ SNR at the transmitter side

Φ Transmit correlation matrix

c Correlation parameter

Rk Capacity of the k-th user

RΣ Ergodic sum-rate capacity

G Precoding matrix

α Power normalization constant

SINRk SINR of the k-th user

ξ Regularization parameter

ρ Normalized regularization parameter in the Large System Limit

Sk Signal of the k-th user in the Large System Limit

Ik Interference of the k-th user in the Large System Limit

SINR∞ Limiting SINR of the k-th user

R∞k Limiting capacity of the k-th user

R∞Σ Limiting sum rate capacity

p Row probability distribution vector

T Number of rKA iterations

Tdl Interference matrix

R̲k Lower bound on the achievable ergodic rate of the k-th user

R̄k Upper bound on the achievable ergodic rate of the k-th user

Tfull convg Number of iterations for the full convergence of rKA


1 Introduction

Fifth generation (5G) communication systems are expected to bring several

advances and benefits, such as higher data transmission rates, high reliability

and latency reduction. The systems will also be more connected, using a massive

number of transmitters which seek to deliver a homogeneous connection to the

numerous users, regardless of their location or network traffic conditions.

Therefore, 5G wireless communications need to meet challenging requirements such as high spectral and energy efficiency, as well as massive connectivity. To meet these requirements, several technologies have been proposed and exploited, such as massive multiple-input-multiple-output (MIMO). In the majority of MIMO implementations, the base station (BS) employs a small number of antennas, and the corresponding improvement in spectral efficiency, while important, is still relatively modest. The use of MIMO systems with a huge number of antennas, which is called massive MIMO, addresses the need for high data rates by simultaneously serving many terminals in the same time-frequency resource and attaining the benefits of conventional MIMO, but on a larger scale (LU et al., 2014; G. et al., 2014).

In general, massive MIMO presents significant advantages in terms of energy efficiency, spectral efficiency, robustness and reliability. Nonetheless, problems such as pilot contamination and high computational complexity need to be addressed in order to fully realize the potential of the technology. In particular, the precoding and detection processes impose a significantly high computational complexity on massive MIMO implementations. In recent years, a significant research effort has been dedicated to developing low-complexity algorithms for the design of precoding/detection matrices. Most of these works, however, require statistical information about the users' channels to reduce the complexity, which is also a computationally demanding problem and may not be practical, especially in the massive MIMO regime. Hence, new methods, topologies, and/or algorithmic solutions for massive MIMO are expected to be efficient and to have the lowest possible computational complexity (G. et al., 2014; BOROUJERDI; HAGHIGHATSHOAR; CAIRE, 2018).

In 5G communication networks, new multiple access technologies are also

important for the massive connectivity of a large number of mobile users and

devices connected to the Internet of Things (IoT) with different quality of service

(QoS) requirements. A technology proposed for meeting these requirements is

the use of non-orthogonal multiple access (NOMA) schemes. Over the history of

wireless communications, the applied multiple access scheme has been a relevant

distinction between different systems. In many of the conventional multiple access

schemes, such as time division multiple access (TDMA) and orthogonal frequency

division multiple access (OFDMA), resource allocation is done orthogonally over

time or frequency so as to avoid interference between users. However, this type of scheme has drawbacks: for instance, resources occupied by users with poor channel conditions, and therefore not fully exploited, cannot be shared with other users (WONG et al., 2017).

In contrast to these conventional technologies of orthogonal multiple access (OMA), NOMA schemes can accommodate many more users through non-orthogonal resource allocation (DAI LINGLONG; WANG et al., 2015). NOMA resource allocation schemes allow users to share the same spectrum and time resources by allocating them to different codes or different power levels. NOMA techniques which exploit different power levels for user multiplexing, i.e., Power Domain NOMA, are able to apply the available resources in a more efficient way, capitalizing on the specific channel conditions of each user and serving multiple users with different QoS using the same spectrum, time and code resources (LIU et al., 2017; DING et al., 2017).


2 Results

The numerical simulation results, the associated analyses, as well as the analytical derivations related to both parts of this work are presented herein in the form of articles written throughout the development of the activities. These papers have been submitted to conferences and to a journal. In the sequel, these articles are briefly introduced and appended as part of this document.

[A] Resource Efficiency in Cluster-Based Downlink MIMO-NOMA Systems

Authors: Karina Bernardin Rosa, Taufik Abrao

Paper submitted to the conference IEEE WCNC 2019 (IEEE Wireless Communications and Networking Conference).

Status: Under review (Nov. 29th, 2018)

[B] Sum-Rate Capacity in DL Massive MIMO with Partial CSI and Low-Complexity Linear Precoders

Authors: Karina Bernardin Rosa, Joao Lucas Negrao, Jose Carlos Marinello, Taufik Abrao

Paper submitted to the journal Transactions on Emerging Telecommunications Technologies (ETT).


From [B], two short papers were also generated and submitted to conferences, as follows:

[C] Correlation and Partial CSI on the DL Massive MIMO Low-Complexity Kaczmarz Precoding

Authors: Karina Bernardin Rosa, Jose Carlos Marinello, Taufik Abrao

Paper submitted to the conference IEEE WCNC 2019 (IEEE Wireless Communications and Networking Conference).



[D] Low Complexity Kaczmarz Precoding for Massive MIMO Under Correlation and Partial CSI

Authors: Karina Bernardin Rosa, Jose Carlos Marinello, Guixian Xu, Taufik Abrao

Paper submitted to the conference IEEE ICC 2019 (IEEE International Conference on Communications).


In [A], a cluster-based MIMO-NOMA system configuration is analyzed in terms of spectral and resource efficiency through a two-step power allocation strategy. The performance of the system for different beamforming directivities and cluster-loading conditions is assessed by varying the number of both clusters and users per cluster. The work in [B] analyzes and compares the sum-rate capacity achieved by three low-complexity linear precoders in a single-cell Massive MIMO broadcast channel; these linear precoders are the conventional zero-forcing beamforming, the regularized channel inversion and a precoding scheme based on the iterative randomized Kaczmarz algorithm. A large-system limit analysis is presented for both ZF and RCI. The rKA's formulation and its convergence for the DL are explored, and a complexity analysis of the three precoding schemes is presented, indicating the rKA's performance-complexity tradeoff superiority compared to the conventional techniques when operating in Massive MIMO systems. As mentioned above, [C] and [D] were derived from [B], presenting a compact version of that article's results and some new discussions regarding the sum-rate capacity, convergence and complexity of the rKA precoder.


3 Conclusions

Through this final year project, several concepts and issues associated with DL massive MIMO scenarios were analyzed; specifically, we have focused on exploratory numerical analyses of the power domain applied to MIMO user multiplexing, and on the performance analysis of massive MIMO precoding schemes and their associated computational complexity.

Initially, the spectral and resource efficiencies of a cluster-based MIMO-NOMA system configuration were evaluated in order to determine the influence of the number of antennas, as well as of the number of users multiplexed in power, on the system performance. For assessment purposes, we have considered a simple user grouping and a two-step power allocation strategy; the best performance of the system was achieved with the highest number of antennas at the BS and only two users sharing the same power resources. The next-best performances, however, were not attained by the configurations with the next-fewest users per cluster. Instead, it was verified that in some configurations a higher number of users in a cluster yields a superior performance. Besides, the analysis of the system demonstrated, as expected, that an excessive number of users multiplexed in power can result in performance degradation.

The second part of this work focused on the analysis of the sum-rate capacity achieved by three low-complexity linear precoders in a massive MIMO scenario equipped with uniform linear array (ULA) antennas, considering the impact of channel correlation, as well as of imperfect channel state information, on the system performance-complexity tradeoff. The conventional ZF and RCI precoders were compared with the iterative randomized Kaczmarz algorithm (rKA) precoding scheme in terms of sum-rate capacity and complexity. Numerical results demonstrated that, although the number of iterations needed for the convergence of rKA is reduced for poor channel estimation quality, it increases with the SNR, spatial channel correlation and cell load. Even so, its performance-complexity tradeoff is still more favorable than that of the conventional RCI and ZF precoding when operating in realistic massive MIMO scenarios and conditions, which is largely due to the low complexity of the rKA precoding algorithm, which does not depend on channel inversion. The rKA precoder also proved to be as efficient as RCI under a simple equal power allocation policy and an adequate number of iterations. Moreover, this work showed that imperfect channel estimation caused a similar performance degradation for the three precoding techniques.


Appendix A -- Resource Efficiency in Cluster-Based Downlink MIMO-NOMA Systems

Resource Efficiency in Cluster-Based Downlink MIMO-NOMA Systems

Karina Bernardin Rosa, Taufik Abrão

Department of Electrical Engineering, State University of Londrina

P.O. Box 10.011, CEP: 86057-970, Londrina, PR

Abstract—This work discusses a cluster-based non-orthogonal multiple access (NOMA) structure aiming at improving the resource efficiency (RE), namely the energy and spectral efficiencies (EE and SE), in a cell through power allocation. In the considered scenario, the users are grouped into several clusters; hence, each beamformer serves the users located in a specific cluster. The RE in a cluster-based downlink (DL) MIMO-NOMA is attained through the application of a simple two-step power allocation strategy. The cluster head in each cluster is able to almost completely cancel the intra-cluster interference. Our results demonstrated that the best performance occurs for two users per cluster.

I. INTRODUCTION

In contrast to the conventional orthogonal multiple access (OMA), NOMA can fairly accommodate more than one user via non-orthogonal resource allocation, namely power-domain user multiplexing, exploiting the channel differences among users, a feature that was not exploited in current and past cellular systems. The large power difference between cell-center and cell-edge users facilitates the successful decoding of the signals designated for each user, enabling the use of relatively low-complexity receivers [1][2].

The NOMA schemes are expected to increase the system throughput. Beyond that, user fairness and spectral efficiency can be significantly improved under NOMA schemes, since the users, served by different power levels, are able to transmit at the same frequency and in the same time slot [1]. One possibility for improving spectral efficiency with NOMA is to also exploit the spatial degrees of freedom enabled by MIMO technologies, which is crucial for complying with the performance requirements of 5G networks [3].

Multiple-antenna-aided NOMA design is capable of using directional beamforming or spatial multiplexing in order to provide array gains or increase the system's throughput, respectively. Also, adopting an appropriate transmit precoding matrix design makes it possible to create user-specific channels. Exploring these configurations can lead to a generalized NOMA design for satisfying the heterogeneous QoS user requirements [4].

The fast growth of data traffic in recent systems has been accompanied by an increase in energy consumption. This situation, combined with limitations in battery capacity caused by the reduction in terminal size, makes energy efficiency (EE) a dominant target in the design of future wireless communication systems. As a consequence, in recent years numerous works have focused on analyzing and enhancing not only the spectral efficiency (SE), but also the EE of wireless communication systems.

Different from most works, which consider either SE or EE only, the authors in [5] present a more general and flexible measure, called the resource efficiency (RE). The RE concept has also been explored in [6] for massive MIMO applications; RE is defined as a weighted combination of EE and SE. This new performance measure has more flexibility in striking the balance between SE and EE, despite difficulties in solving the resulting optimization problem. Hence, the authors first investigate an uplink-downlink duality for the RE maximization purpose. The conventional UL-DL duality only applies in the SE or the capacity region, and has not yet been explicitly established for the EE or the RE. In [6], the authors prove with rigorous derivation that the duality has a more general form which can be directly used to tackle either the EE or the RE beamforming design problem.

According to [4], current downlink MIMO-NOMA applications can be broadly classified into two categories: the beamformer-based structure, in which one beamformer is used for each user, and the cluster-based structure, in which one beamformer serves multiple users. In this work we focus on the cluster-based structure, which is based on separating the users and grouping them into several clusters, while each beamformer serves all the users of a specific cluster. Then, by applying an appropriate detector design, the inter-cluster interference can be reduced substantially or even totally suppressed, depending on which type of beamformer is used.

Decomposition techniques capable of transforming MIMO-NOMA channels into multiple SISO-NOMA channels are discussed in [7], [8]. The authors consider a BS equipped with $M$ antennas, while the users, equipped with $N$ antennas each, are randomly partitioned into $M$ clusters. The detection technique used in these designs is based on zero-forcing (ZF) detection, and TPC designs can be applied in order to completely cancel the inter-cluster interference. Thus, this approach leads to a low-complexity SISO-NOMA model where the conventional NOMA technique can be applied. Despite the clear advantages of this design, the zero-forcing detection adopted with such techniques requires the number of antennas at each receiver ($N$) to be greater than or equal to the number of transmit antennas ($N \geq M$) [7], or at least greater than or equal to half the number of transmit antennas ($N \geq M/2$) [8].

Another cluster-based MIMO-NOMA design is proposed in [9]. In contrast to the ones mentioned before, this design allows the existence of inter-cluster interference, applying specific user grouping and power allocation schemes in order to reduce this kind of interference. Besides, a configuration with $N$ transmit antennas at the BS and $N$ clusters is assumed, with each cluster having two users for the sake of simplicity. Each user is also assumed to have a single antenna. Among the users in a specific cluster, the user having a larger (smaller) channel gain is defined as a strong (weak) user. As a simple and practical alternative, zero-forcing beamforming (ZF-BF) is used under perfect channel state information at the transmitter (CSIT), and the precoding is performed by considering the channel gain of a particular user of each cluster. The user using the BF vector based on its own channel does not receive any interference from the other beams, i.e., inter-cluster interference is virtually zero. On the other hand, the other users belonging to the same cluster suffer inter-cluster interference, which negatively affects the decoding of the received signal. Since this is unhelpful when the strong users perform successive interference cancellation (SIC), the authors generate the BF vectors based on the channels of the strong users of each cluster. In order to minimize the inter-cluster interference of the weaker-channel users and to maximize the sum capacity while guaranteeing the capacity requirement of the weak user, the authors propose a clustering algorithm and a suitable power allocation scheme, respectively.

Another cluster-based MIMO-NOMA design is proposed in [10]. This design also allows the existence of inter-cluster interference, applying a technique for reducing the interference and increasing the strength of the desired signals. The authors of [10] also utilize a ZF-BF technique, but in this approach the precoding is performed by considering the equivalent channel gain of each cluster, instead of any particular user channel gain.

Instead of using random clustering as in [7] and [8], this user-clustering approach is based on making the channel gains of users more distinctive and sorting the users in a specific cluster according to their equivalent normalized channel gains. This strategy enables the user with the highest channel gain, namely the cluster head, to have its gain very similar to the cluster equivalent channel gain, and thus to be able to almost completely cancel the inter-cluster interference and completely cancel the intra-cluster interference by invoking ideal SIC. Each of the other users in the cluster then efficiently suppresses the inter-cluster interference by estimating its own cluster's equivalent channel gain, which is done by multiplying its received signal by a user-specific decoding scaling weight factor sent by the base station (BS) prior to the data transmission process.

The main advantage of these two inter-cluster interference-tolerant designs is that they do not impose any constraint on the number of antennas at the BS or at the users, which implies that the number of antennas at each user is not tied to the number of antennas at the BS. Thus, these designs can be applied to scenarios where the BS is equipped with a large antenna array, such as massive-MIMO-NOMA or NOMA millimeter-wave communication scenarios.

In this paper, we consider the multiuser power allocation for a cluster-based downlink MIMO-NOMA system and aim at achieving the balance between the SE and the EE through the use of the RE as a more general and flexible measure.

The main contributions of this work are threefold. We have analyzed a clustering-based non-orthogonal multiple access (NOMA) structure under the resource efficiency (RE) perspective, which combines the energy and spectral efficiency (EE and SE) performance metrics; we have demonstrated that there is a preferable grouping strategy for achieving the RE optimality; finally, this work evaluates the computational complexity of the grouping-based NOMA-MIMO schemes from the perspective of the performance-complexity tradeoff.

Notation. In the following, boldface lower-case and upper-case characters denote vectors and matrices, respectively. The operators $(\cdot)^H$, $\mathrm{tr}(\cdot)$ and $\mathbb{E}[\cdot]$ denote conjugate transpose, trace and expectation, respectively. The $M \times M$ identity matrix is denoted by $\mathbf{I}_M$. A random vector $\mathbf{x} \sim \mathcal{CN}\{\mathbf{m}, \boldsymbol{\Theta}\}$ is complex Gaussian distributed with mean vector $\mathbf{m}$ and covariance matrix $\boldsymbol{\Theta}$.

The remainder of this paper is organized as follows. The system model is described in Section II. NOMA precoding designs and user grouping are discussed in Section III. In Section IV, we discuss the RE concept. Power allocation schemes are discussed in Section V. In Section VI, numerical results are provided, and the work is concluded in Section VII.

II. SYSTEM MODEL

In this work we consider a downlink multi-user (MU) MIMO-NOMA system with a single cell and one BS equipped with $M$ transmit antennas used for beamforming. The total number of user equipments (UEs) in the cell is $K$, where $K \geq M$, and each UE is considered to have one receive antenna. The receive antennas are grouped into $L$ clusters, and the users in each cluster are scheduled according to the NOMA principles.

In the assumed conditions $L = M$ and each BF vector serves an individual cluster, which means all clusters use the same spectrum resources. The $m$-th cluster consists of $K_m$ receive antennas such that $\sum_{m=1}^{M} K_m = K$.

In practice, it may not be realistic to schedule all the users in a cluster using NOMA. An alternative is to build a hybrid system, in which NOMA is combined with OMA techniques such as OFDMA. In this condition, $L > M$ and multiple clusters can utilize the same BF vector while using spectrum resources orthogonal to each other, while the users in each cluster are scheduled according to the NOMA principles.

For simplicity, in this analysis we assume the number of clusters equal to the number of transmit antennas, each UE equipped with a single antenna and a fixed number $\bar{K}$ of users in each cluster, i.e., $L = M$ and $K = \bar{K} \times M$.

The users are considered to be randomly distributed on a disk, and the BS is assumed to be located at its center. The radius of this disk is denoted by $r$. We then divide this disk into two parts: a smaller disk with radius $r_1$ ($r_1 < r$), also centered at the BS, and a ring, constructed from the larger disk by removing the smaller one. Fig. 1 illustrates a generic system model which follows the described assumptions.

Figure 1. Generic system model (cell radius $r$, inner disk radius $r_1$).

We consider a channel model with Rayleigh fading and large-scale path loss.
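As an illustration of the deployment and channel assumptions above, the following minimal sketch drops users uniformly on a disk of radius $r$ around the BS and draws Rayleigh-fading channel vectors scaled by a distance-based path loss. The path-loss law $1/(1+d^b)$, the function names and all numeric values are assumptions made only for this example; the text states just that Rayleigh fading and large-scale path loss are considered.

```python
import numpy as np

def drop_users(num_users, radius, rng):
    """Sample user positions uniformly over a disk centred at the BS."""
    # The square root on the radial draw gives uniform density over the area.
    d = radius * np.sqrt(rng.uniform(size=num_users))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=num_users)
    return d, theta

def rayleigh_channel(distances, num_antennas, path_loss_exp, rng):
    """Return a (num_users x num_antennas) channel matrix.

    Small-scale fading is CN(0, 1); the large-scale gain 1 / (1 + d^b)
    is an assumed path-loss law, not taken from the paper.
    """
    k = distances.size
    fading = (rng.standard_normal((k, num_antennas))
              + 1j * rng.standard_normal((k, num_antennas))) / np.sqrt(2.0)
    gain = 1.0 / (1.0 + distances ** path_loss_exp)
    return np.sqrt(gain)[:, None] * fading

rng = np.random.default_rng(0)
d, theta = drop_users(num_users=8, radius=100.0, rng=rng)
H = rayleigh_channel(d, num_antennas=4, path_loss_exp=3.0, rng=rng)
in_inner_disk = d < 30.0  # r1 = 30 is an illustrative inner radius
```

The boolean mask `in_inner_disk` separates users inside the smaller disk from those in the outer ring, which is one simple way to label cell-center and cell-edge users.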

The data stream for the $m$-th cluster, corresponding to the $m$-th element of the transmitted data vector, is defined as:

$$s_m = \sum_{k=1}^{\bar{K}} \sqrt{p_{m,k}}\, s_{m,k}, \qquad (1)$$

where $p_{m,k}$ and $s_{m,k} \sim \mathcal{CN}\{0, 1\}$ are the transmit power and symbol of the $k$-th user from the $m$-th cluster, and $\sum_{k=1}^{\bar{K}} p_{m,k} = 1$.
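The superposition in (1) can be sketched numerically as follows. The function name `superpose`, the two-user cluster and the 0.8/0.2 power split are illustrative assumptions; the only property taken from the text is that the intra-cluster power fractions sum to one, which keeps the superposed symbol at unit average power.

```python
import numpy as np

def superpose(symbols, power_fractions):
    """Superposition-code the symbols of one cluster: sum_k sqrt(p_k) * s_k."""
    power_fractions = np.asarray(power_fractions, dtype=float)
    if not np.isclose(power_fractions.sum(), 1.0):
        raise ValueError("intra-cluster power fractions must sum to 1")
    return np.sum(np.sqrt(power_fractions) * symbols, axis=-1)

rng = np.random.default_rng(1)
num_draws, users_per_cluster = 200_000, 2
# Unit-variance circularly-symmetric complex Gaussian symbols s_{m,k}
s_mk = (rng.standard_normal((num_draws, users_per_cluster))
        + 1j * rng.standard_normal((num_draws, users_per_cluster))) / np.sqrt(2.0)
p_mk = [0.8, 0.2]  # illustrative split between the two users of the cluster
s_m = superpose(s_mk, p_mk)
avg_power = np.mean(np.abs(s_m) ** 2)  # close to 1 because the fractions sum to 1
```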

The transmitted data vector $\mathbf{s} \in \mathbb{C}^{M \times 1}$ is then defined as:

$$\mathbf{s} = \left[ s_1 \; s_2 \; \dots \; s_m \; \dots \; s_M \right]^{T}. \qquad (2)$$

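Also, the data vector $\mathbf{s}$ is multiplied by a power matrix $\mathbf{P} \in \mathbb{R}^{M \times M}$ and a precoding matrix $\mathbf{G} \in \mathbb{C}^{M \times M}$ and then transmitted over a radio channel $\mathbf{H} \in \mathbb{C}^{K \times M}$. Each $\mathbf{H}_m \in \mathbb{C}^{\bar{K} \times M}$ represents the radio channel of all $\bar{K}$ users in the $m$-th cluster, and can be expressed as:

$$\mathbf{H}_m = \left[ \mathbf{h}_{m,1} \; \mathbf{h}_{m,2} \; \dots \; \mathbf{h}_{m,\bar{K}} \right]^{T}, \qquad (3)$$

with $\mathbf{h}_{m,k} \in \mathbb{C}^{1 \times M}$ as the radio channel gain vector of the $k$-th user in the $m$-th cluster.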

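Thus, the radio channel matrix $\mathbf{H} \in \mathbb{C}^{K \times M}$ is expressed as:

$$\mathbf{H} = \left[ \mathbf{H}_1 \; \mathbf{H}_2 \; \dots \; \mathbf{H}_m \; \dots \; \mathbf{H}_M \right]^{T}. \qquad (4)$$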

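The power matrix corresponding to the power allocated to each cluster is defined as $\mathbf{P} = \mathrm{diag}(p_1, \cdots, p_M) \in \mathbb{R}^{M \times M}$, with $\sum_{m=1}^{M} p_m = P_T$. The transmitted signal $\mathbf{x} \in \mathbb{C}^{M \times 1}$ obtained after the power and precoding multiplication can be written as:

$$\mathbf{x} = \mathbf{G}\mathbf{P}^{1/2}\mathbf{s} = \sum_{m=1}^{M} \sqrt{p_m}\, \mathbf{g}_m s_m. \qquad (5)$$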

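Also, the precoding vectors are normalized to satisfy the average power constraint, thus:

$$\mathbb{E}\left[ \|\mathbf{x}\|^2 \right] = \mathrm{tr}\left( \mathbf{P}\mathbf{G}^{H}\mathbf{G} \right) \leq P_T, \qquad (6)$$

where $P_T \geq 0$ is the total available transmit power at the BS. Hence, the SNR at the transmitter side is denoted as $\gamma = P_T / \sigma^2$.

The received signal $\mathbf{y} \in \mathbb{C}^{K \times 1}$ can be expressed as:

$$\mathbf{y} = \left[ \mathbf{y}_1 \; \mathbf{y}_2 \; \dots \; \mathbf{y}_m \; \dots \; \mathbf{y}_M \right]^{T}, \qquad (7)$$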

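where each $\mathbf{y}_m \in \mathbb{C}^{\bar{K} \times 1}$ is constituted by the signals received by each of the $\bar{K}$ users in the $m$-th cluster, that is:

$$\mathbf{y}_m = \left[ y_{m,1} \; y_{m,2} \; \dots \; y_{m,k} \; \dots \; y_{m,\bar{K}} \right]^{T}, \qquad (8)$$

where $y_{m,k} \in \mathbb{C}$ corresponds to the signal received by the $k$-th user of the $m$-th cluster.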

The array y P CKˆ1 can be written in terms of its componentsas:

y “ HGP1{2s` z, (9)

where z P CKˆ1 represents the complex gaussian noise vectorwith variance σ2, whose elements are represented as zm,k P C „CN t0, σ2u. The received signal for the k-th user in the m-th clustercan then be expressed as:

ym,k “ hm,kGP1{2s` zm,k. (10)

III. PRECODING DESIGN AND GROUPING STRATEGY

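In the MIMO-NOMA model under consideration $K > M$, and thus we utilize the precoding technique suggested in [10], based on [11], where the actual channel matrix $\mathbf{H}_m \in \mathbb{C}^{K \times M}$ corresponding to the $K$ users of the $m$-th cluster is transformed into an equivalent channel vector $\tilde{\mathbf{h}}_m \in \mathbb{C}^{1 \times M}$. This manipulation leads to a total channel matrix $\tilde{\mathbf{H}} \in \mathbb{C}^{M \times M}$ equivalent to $\mathbf{H}$, whose dimensions are compatible with the precoder. The radio channel matrix corresponding to the $m$-th cluster can be expressed as:

$$\mathbf{H}_m = [\, \mathbf{h}_{m,1} \;\; \mathbf{h}_{m,2} \;\; \dots \;\; \mathbf{h}_{m,k} \;\; \dots \;\; \mathbf{h}_{m,K} \,]^{T}. \tag{11}$$

Taking the singular value decomposition (SVD) of $\mathbf{H}_m$ we obtain:

$$\underbrace{\mathbf{H}_m}_{K \times M} = \underbrace{\mathbf{U}_m}_{K \times K}\, \underbrace{\boldsymbol{\Sigma}_m}_{K \times M}\, \underbrace{\mathbf{V}_m^{H}}_{M \times M}. \tag{12}$$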

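In the considered system, each beamforming vector is utilized by one cluster. According to this configuration, the equivalent radio channel vector $\tilde{\mathbf{h}}_m \in \mathbb{C}^{1 \times M}$ representing the equivalent channel of the $m$-th cluster can be obtained as:

$$\underbrace{\tilde{\mathbf{h}}_m}_{1 \times M} = \underbrace{\mathbf{u}_{m,1}^{H}}_{1 \times K}\, \underbrace{\mathbf{U}_m}_{K \times K}\, \underbrace{\boldsymbol{\Sigma}_m}_{K \times M}\, \underbrace{\mathbf{V}_m^{H}}_{M \times M}, \tag{13}$$

where $\mathbf{u}_{m,1}^{H}$ is the Hermitian transpose of the first column of $\mathbf{U}_m$ in (12).

An alternative for calculating the equivalent channel matrix is to take the channel of the user closest to the BS in each cluster as the cluster equivalent channel vector, as adopted in [9]. With either approach, the equivalent radio channel matrix $\tilde{\mathbf{H}} \in \mathbb{C}^{M \times M}$ formed by the equivalent channels of all $M$ clusters can be expressed as:

$$\tilde{\mathbf{H}} = [\, \tilde{\mathbf{h}}_1 \;\; \tilde{\mathbf{h}}_2 \;\; \dots \;\; \tilde{\mathbf{h}}_m \;\; \dots \;\; \tilde{\mathbf{h}}_M \,]^{T}. \tag{14}$$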
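As an illustrative sketch of the SVD-based equivalent channel in (12)-(13), the NumPy snippet below projects a randomly drawn cluster channel onto its first left singular vector. The function name `equivalent_channel_svd`, the dimensions and the random channel are assumptions made only for this example; they are not taken from the simulations reported in this work.

```python
import numpy as np

def equivalent_channel_svd(H_m):
    """Equivalent 1 x M channel of one cluster, following (12)-(13)."""
    # Full SVD: H_m = U @ Sigma @ Vh, with U of size K x K
    U, s, Vh = np.linalg.svd(H_m, full_matrices=True)
    u1 = U[:, 0]                 # first column of U_m
    # u_{m,1}^H U_m Sigma_m V_m^H is the same as u_{m,1}^H H_m
    return u1.conj() @ H_m       # shape (M,)

# Hypothetical example: K = 6 users in the cluster, M = 4 BS antennas
rng = np.random.default_rng(0)
K, M = 6, 4
H_m = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
h_eq = equivalent_channel_svd(H_m)
print(h_eq.shape)
```

Because the columns of $\mathbf{U}_m$ are orthonormal, the product in (13) reduces to the largest singular value times the first right singular vector, so the norm of the equivalent channel equals the largest singular value of $\mathbf{H}_m$.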

As our precoding technique we consider Regularized Zero-Forcing (RZF) precoding, which can be seen as a generalization of Zero-Forcing (ZF) precoding in which a regularization parameter is added to the pseudo-inverse matrix. Considering a system equipped with the RZF precoder and the equivalent channel matrix $\tilde{\mathbf{H}}$, the precoding matrix is given by [12], [13], [14]:

$$\mathbf{G}_{\mathrm{RZF}} = \alpha\, \tilde{\mathbf{H}}^{H} \left( \tilde{\mathbf{H}}\tilde{\mathbf{H}}^{H} + \xi \mathbf{I}_M \right)^{-1}, \tag{15}$$

or equivalently as

$$\mathbf{G}_{\mathrm{RZF}} = \alpha \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{H}}^{H}, \tag{16}$$

where the normalizing constant $\alpha$ is chosen to satisfy the power constraint (6), and $\xi > 0$ is the regularization parameter.

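As stated in [15], by assuming independence between the data symbols, the normalization constant for the RCI precoding is expressed as

$$\alpha = \sqrt{ \frac{P_T}{ \mathrm{tr}\left( \mathbf{P}\, \tilde{\mathbf{H}} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-2} \tilde{\mathbf{H}}^{H} \right) } }, \tag{17}$$

where the normalization factor $\alpha$ now depends on the channel realization $\tilde{\mathbf{H}}$, as well as on the regularization factor $\xi$.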
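The precoder in (15)-(16) and the normalization in (17) can be checked numerically. The following is a minimal sketch, assuming a randomly drawn square equivalent channel and equal power per cluster; the function name `rzf_precoder` and all numerical values are hypothetical and serve only to illustrate the formulas.

```python
import numpy as np

def rzf_precoder(H_eq, p, P_T, xi):
    """RZF precoder in the form (16), normalized as in (17)."""
    M = H_eq.shape[1]
    # Unnormalized precoder: (H^H H + xi I)^{-1} H^H
    F = np.linalg.solve(H_eq.conj().T @ H_eq + xi * np.eye(M), H_eq.conj().T)
    # alpha from (17): tr(P F^H F) = tr(P H (H^H H + xi I)^{-2} H^H)
    alpha = np.sqrt(P_T / np.real(np.trace(np.diag(p) @ F.conj().T @ F)))
    return alpha * F, alpha

# Hypothetical example: M = 4 clusters/antennas, equal power per cluster
rng = np.random.default_rng(1)
M, P_T, xi = 4, 1.0, 0.1
H_eq = (rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))) / np.sqrt(2)
p = np.full(M, P_T / M)
G, alpha = rzf_precoder(H_eq, p, P_T, xi)

# Value of tr(P G^H G), to be compared with the budget P_T in (6)
print(np.real(np.trace(np.diag(p) @ G.conj().T @ G)))
```

With this choice of $\alpha$ the constraint (6) is met with equality, and evaluating the form (15) with the same $\alpha$ yields the same matrix.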

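Using the RZF precoder (16), the received vector $\mathbf{y}$ can be expressed as

$$\mathbf{y} = \alpha\, \mathbf{H} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{H}}^{H} \mathbf{P}^{1/2}\mathbf{s} + \mathbf{z}. \tag{18}$$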

Since SIC is performed within each MIMO-NOMA cluster, the dynamic power allocation in each cluster is performed such that a user can decode and then suppress the intra-cluster interference from users with channel gains lower than its own. Thus, the signal received by the $k$-th user of the $m$-th cluster is given by:

$$
\begin{aligned}
y_{m,k} ={}& \alpha \sqrt{p_m}\, \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_m^{H} \sqrt{p_{m,k}}\, s_{m,k} \\
&+ \underbrace{ \alpha \sqrt{p_m}\, \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_m^{H} \sum_{j=1}^{k-1} \sqrt{p_{m,j}}\, s_{m,j} }_{\text{intra-cluster interference}} \\
&+ \underbrace{ \sum_{i=1, i \neq m}^{M} \alpha \sqrt{p_i}\, \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_i^{H} s_i }_{\text{inter-cluster interference}} + z_{m,k},
\end{aligned}
\tag{19}
$$

where the first term on the right-hand side of (19) is the desired signal for the $k$-th user of the $m$-th cluster, while the other terms are the interference introduced by the other users in the same cluster and in other clusters, plus the received thermal noise. In order to avoid excessive complexity, we advocate the use of single-user detection at the DL receiver side. Hence, the SINR for each user is expressed as in Eq. (21) [12], [16].

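The achievable throughput for the $k$-th user of the $m$-th cluster can then be expressed as:

$$R_{m,k} = \log_2 \left( 1 + \mathrm{SINR}_{m,k} \right), \tag{20}$$

$$
\mathrm{SINR}_{m,k} = \frac{ \alpha^2 \left| \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_m^{H} \right|^2 p_m\, p_{m,k} }{ \underbrace{ \alpha^2 \left| \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_m^{H} \right|^2 p_m \sum_{j=1}^{k-1} p_{m,j} }_{\text{intra-cluster interference}} + \underbrace{ \sum_{i=1, i \neq m}^{M} \alpha^2 \left| \mathbf{h}_{m,k} \left( \tilde{\mathbf{H}}^{H}\tilde{\mathbf{H}} + \xi \mathbf{I}_M \right)^{-1} \tilde{\mathbf{h}}_i^{H} \right|^2 p_i }_{\text{inter-cluster interference}} + \underbrace{ \sigma^2 }_{\text{noise}} }. \tag{21}
$$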
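To make the structure of (20)-(21) concrete, the sketch below evaluates the SINR and rate of one user. It is only an illustration under assumed inputs: the function name `noma_sinr`, the 0-based indices, the random channels and the fixed value of $\alpha$ are choices made for this example, and the intra-cluster term sums over the users indexed before $k$, mirroring the sum from $j = 1$ to $k - 1$ in (21).

```python
import numpy as np

def noma_sinr(h_mk, H_eq, m, k, p_cluster, p_user, alpha, xi, sigma2):
    """SINR of user k (0-based) in cluster m (0-based), following (21).

    h_mk      : (M,) actual channel of the user
    H_eq      : (Mc, M) equivalent channels, one row per cluster
    p_cluster : (Mc,) power per cluster, p_m
    p_user    : (K,) power fractions inside cluster m, p_{m,k}
    """
    M = H_eq.shape[1]
    W = np.linalg.inv(H_eq.conj().T @ H_eq + xi * np.eye(M))
    # gains[i] = alpha^2 |h_mk W h_i^H|^2 for every cluster i
    gains = alpha**2 * np.abs(h_mk @ W @ H_eq.conj().T) ** 2
    signal = gains[m] * p_cluster[m] * p_user[k]
    intra = gains[m] * p_cluster[m] * np.sum(p_user[:k])
    inter = gains @ p_cluster - gains[m] * p_cluster[m]
    return signal / (intra + inter + sigma2)

# Hypothetical example: 3 clusters, 3 antennas, 2 users per cluster
rng = np.random.default_rng(2)
Mc = M = 3
H_eq = (rng.standard_normal((Mc, M)) + 1j * rng.standard_normal((Mc, M))) / np.sqrt(2)
h_mk = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
p_cluster = np.full(Mc, 1.0 / Mc)
p_user = np.array([0.2, 0.8])
sinr = noma_sinr(h_mk, H_eq, m=0, k=1, p_cluster=p_cluster, p_user=p_user,
                 alpha=1.0, xi=0.1, sigma2=0.01)
rate = np.log2(1.0 + sinr)   # achievable rate as in (20)
print(sinr, rate)
```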

A. User Grouping Strategy

In a conventional NOMA system, it is preferable to pair users whose channel conditions are significantly different, which improves both the sum-rate and each individual user's rate [1]. Based on the work presented in [10], we utilize a sub-optimal user clustering scheme for downlink MIMO-NOMA systems, which exploits the channel gain differences among the users in a cell in order to allow an optimal power allocation scheme targeting an enhancement of the sum spectral efficiency in the cell.

According to the user-clustering scheme presented in this work, the cluster head in each downlink MIMO-NOMA cluster can almost completely cancel the intra-cluster interference and, thus, achieves the maximum throughput gain in comparison to the other users in the cluster. Based on this consideration, one strategy to maximize the overall system capacity in each cluster is to select the high channel gain users in the cell as the cluster heads of different MIMO-NOMA clusters. This consideration can also be used to determine the number of clusters in the cell, based on the number of high channel gain users available.

IV. RESOURCE EFFICIENCY

Instead of focusing on the SE or the EE separately, as in the traditional design, it is more effective to balance the attainable system SE and EE by adopting the resource efficiency (RE) metric, as presented in [5] and adopted in [6].

Conventional system designs usually focus on the SE, defined for a single-cell system with $K$ single-antenna users and $M$ BS antennas as $\mathrm{SE} = \sum_{i=1}^{K} R_i$, with $R_i = \log_2(1 + \mathrm{SINR}_i)$. On the other hand, the energy efficiency (EE) has become an important figure of merit for future wireless communications; the EE is defined as the ratio of the weighted sum-rate to the total power consumption. The EE (bits/Hz/Joule) for the same $K$-user single-cell system can be expressed as:

$$\mathrm{EE} = \frac{\mathrm{SE}}{\varpi \sum_{i=1}^{K} p_i + M P_c + P_0}, \tag{22}$$

where $P_c$ is the constant circuit power consumption per antenna, including power dissipation in the transmit filter, the mixer, the frequency synthesizer and the digital-to-analog converter; $P_0$ is the basic power consumed at the BS, which is independent of the number of transmit antennas; and $\varpi > 1$ is a constant accounting for the inefficiency of the power amplifier at the BS [6].

The resource efficiency is expressed as a weighted sum of the EE and the SE, and can be formulated as:

$$\mathrm{RE} = \mathrm{EE} + \eta\, \frac{\mathrm{SE}}{\varpi P_T + M P_c + P_0}, \tag{23}$$

where $\eta$ is the weighting factor, in $\left[\frac{\mathrm{Hz}}{\mathrm{W}}\right]$, controlling the weights of the EE and the SE in the design. Hence, when $\eta = 0$ the expression (23) reduces to the EE, and it tends to the SE for $\eta \gg 1$. The choice of this weighting factor is up to the system designer [6].
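The metrics (22) and (23) translate directly into a few lines of code. The sketch below is an illustration only: the helper names (`dbm_to_watt`, `energy_efficiency`, `resource_efficiency`) and the SE value of 10 bps/Hz are assumptions, while the power figures correspond to the 30 dBm and 40 dBm values used later in the numerical results, with the full transmit budget assumed to be used.

```python
def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

def energy_efficiency(se, p_tx, M, P_c, P_0, varpi):
    """EE as in (22): SE over the total consumed power."""
    return se / (varpi * p_tx + M * P_c + P_0)

def resource_efficiency(se, P_T, M, P_c, P_0, varpi, eta):
    """RE as in (23): EE plus the eta-weighted SE term."""
    ee = energy_efficiency(se, P_T, M, P_c, P_0, varpi)
    return ee + eta * se / (varpi * P_T + M * P_c + P_0)

# Hypothetical operating point: SE = 10 bps/Hz, M = 3 antennas
P_T = dbm_to_watt(30.0)   # transmit power budget
P_c = dbm_to_watt(30.0)   # circuit power per antenna
P_0 = dbm_to_watt(40.0)   # basic BS power consumption
se = 10.0
ee = energy_efficiency(se, P_T, 3, P_c, P_0, varpi=1.5)
re_eta0 = resource_efficiency(se, P_T, 3, P_c, P_0, varpi=1.5, eta=0.0)
re_eta10 = resource_efficiency(se, P_T, 3, P_c, P_0, varpi=1.5, eta=10.0)
print(ee, re_eta0, re_eta10)
```

Setting `eta=0.0` recovers the EE, in agreement with the remark following (23).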

A. RE Applied to Cluster-based MIMO-NOMA Design

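In order to apply the RE concept and analyze the performance of the defined cluster-based MIMO-NOMA system model, we define the system SE as:

$$\mathrm{SE}_{\mathrm{noma}} = \sum_{m=1}^{M} \sum_{k=1}^{K} R_{m,k}. \tag{24}$$

Furthermore, based on (22) we define the EE by:

$$\mathrm{EE}_{\mathrm{noma}} = \frac{ \sum_{m=1}^{M} \sum_{k=1}^{K} R_{m,k} }{ \varpi \sum_{m=1}^{M} p_m + M P_c + P_0 }. \tag{25}$$

Finally, the RE for MIMO-NOMA is defined by combining (24) and (25) into (23) as:

$$\mathrm{RE}_{\mathrm{noma}} = \mathrm{EE}_{\mathrm{noma}} + \eta \cdot \frac{ \mathrm{SE}_{\mathrm{noma}} }{ \varpi P_T + M P_c + P_0 }. \tag{26}$$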

V. POWER ALLOCATION

In this MIMO-NOMA system, the power allocation is performed in two steps, as proposed in [10]. First, the total BS transmit power is divided among the transmit beams, with the transmit power of each beam proportional to the number of users served by that beam/cluster, i.e., $p_m \propto K_m$, where $K_m$ is the number of users in cluster $m$.

Considering that all clusters have the same number of users, $K \equiv K_m, \forall m$, all transmit beams serve the same number of users and, thus, the BS transmit power $P_T$ is equally divided among the $M$ beams:

$$p_m = \frac{P_T}{M},$$

which satisfies $\sum_{m=1}^{M} p_m = P_T$.

The second step of the power allocation method consists of allocating power to the users in each cluster. The power allocation for the $K$ users of the $m$-th cluster follows the NOMA principle, and thus the intra-cluster dynamic power allocation is essential. In the sequel we examine the power allocation based on the inverse of the channel state.

A. CSI Based Power Allocation

In this strategy, the power coefficients are determined based on the channel state information experienced by each user of the cluster. The relation between the power share of a user and its channel state is assumed to be inversely proportional, given by [17]

$$p_{m,k} \propto \frac{1}{g_{m,k}}, \tag{27}$$

which means the BS assigns the highest power fractions to users that experience bad channel conditions, while assigning less power to the users with better channel conditions. The power fraction assigned to the $k$-th user of the $m$-th cluster is given by:

$$p_{m,k} = \frac{g_{m,i}}{g_{m,k}}\, p_{m,i}, \quad i \neq k. \tag{28}$$

Considering the above equation and the fact that $\sum_{k=1}^{K} p_{m,k} = 1$, we can write:

$$p_{m,k} \left( \sum_{i=1}^{K} \frac{g_{m,k}}{g_{m,i}} \right) = 1, \tag{29}$$

which leads to the following power allocation policy for user $k$ belonging to cluster $m$:

$$p_{m,k} = \left( g_{m,k} \sum_{i=1}^{K} \frac{1}{g_{m,i}} \right)^{-1}, \tag{30}$$

where $K$ is the number of users associated with cluster $m$.
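The allocation policy (30) can be illustrated with a short numerical example. The function name `csi_power_fractions` and the gain values are assumptions chosen for illustration; the snippet is a sketch of the formula, not the simulation code used in this work.

```python
import numpy as np

def csi_power_fractions(g):
    """Power fractions p_{m,k} of one cluster, following (30).

    g : channel gains g_{m,k} of the K users in the cluster (all > 0).
    """
    g = np.asarray(g, dtype=float)
    return 1.0 / (g * np.sum(1.0 / g))

# Hypothetical two-user cluster: strong user (gain 4) and weak user (gain 1)
p = csi_power_fractions([4.0, 1.0])
print(p)          # the weaker user receives the larger fraction
print(p.sum())    # the fractions add up to one
```

For these gains, $\sum_i 1/g_{m,i} = 1.25$, so the strong user receives $1/(4 \cdot 1.25) = 0.2$ and the weak user $1/(1 \cdot 1.25) = 0.8$ of the cluster power, consistent with the inverse proportionality in (27).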

VI. NUMERICAL RESULTS

In this section, we investigate the performance of the proposed user-clustering and power allocation schemes via numerical simulations. The main parameters deployed in the numerical simulations are listed in Table I. The radio channel was considered as the product of path loss and Rayleigh fading with zero mean and unit variance. In the considered path-loss model the transmitted signal power decays according to $d_k^{-b}$, where $d_k$ is the distance between the BS and user $k$, and $b$ is the path-loss exponent. A single cell was considered, with the single BS located at the center of the cell area. The cell radius is set to 500 m and the distance between a user and the BS is $50 < d_k < 500$ m. The circuit power per antenna is assumed to be $P_c = 30$ dBm, and the basic power consumed at the BS is $P_0 = 40$ dBm. The inefficiency factor of the power amplifier is set to $\varpi = 1.5$. The cluster heads are assumed to be distributed within a specific distance $r_1$ from the BS, as depicted in Fig. 2.

Figure 2. Illustration of the clusterized system model for $K = 12$ and $K = 4$. The BS is considered to have $M = 4$ antennas.
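The channel model described above (path loss times unit-variance Rayleigh fading, with users between 50 m and 500 m from the BS) can be sketched as follows. The function name `draw_user_channels` is hypothetical, and the uniform distribution of users over the area of the ring is an assumption of this example, since the text only states that users are randomly distributed.

```python
import numpy as np

def draw_user_channels(n_users, M, r_min=50.0, r_max=500.0, b=3.5, rng=None):
    """Draw user distances and channel vectors (path loss x Rayleigh fading)."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform over the ring area: the squared radius is uniformly distributed
    d = np.sqrt(rng.uniform(r_min**2, r_max**2, size=n_users))
    # Small-scale fading: i.i.d. CN(0, 1) entries
    fading = (rng.standard_normal((n_users, M))
              + 1j * rng.standard_normal((n_users, M))) / np.sqrt(2)
    # Power decays as d^{-b}, hence the amplitude scales as d^{-b/2}
    H = d[:, None] ** (-b / 2.0) * fading
    return d, H

# Example with the values of Table I: 30 users, M = 3 antennas, b = 3.5
d, H = draw_user_channels(30, 3, rng=np.random.default_rng(3))
print(d.min(), d.max(), H.shape)
```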

We also assume perfect CSIT. All the simulations are performed considering a single transmission time interval (TTI) with its instantaneous channel gains perfectly estimated. Our numerical results are shown in terms of system SE, EE and RE. In the following analysis it is assumed that the channel gains of the users in each cluster are uncorrelated and independent and identically distributed (i.i.d.). The results obtained by taking the channels of the cluster heads as the equivalent channel vectors are denoted MIMO-NOMA I, while the results associated with the equivalent channel calculated through Eq. (13) are referred to as MIMO-NOMA II.

Table I. MIMO-NOMA SIMULATION PARAMETERS.

| Parameter | Value |
|---|---|
| Cell radius | $R = 500$ [m] |
| Minimum distance UE-BS | $d_{\min} = 50$ [m] |
| Path loss exponent | $b = 3.5$ |
| # Mobile users | $K = 30$ |
| # BS antennas | $M \in [3; 15]$ |
| # UE antennas | 1 |
| Weighting factor SE-EE | $\eta \in [0.1; 100]$ |
| Inefficiency of PA | $\varpi = 1.5$ |
| BS transmit power budget | $P_T = 30$ dBm |
| Circuit power per antenna | $P_c = 30$ dBm |
| Basic power consumption BS | $P_0 = 40$ dBm |
| SNR | $\gamma \in [0; 20]$ [dB] |
| # Monte-Carlo trials | $T = 500$ |

Fig. 3 depicts the SE, EE and RE of MIMO-NOMA I for different values of $\eta$. As can be seen from (23), the RE attains higher values for larger $\eta$.

[Plot: SE [bps/Hz], EE and RE [bits/Hz/Joule] versus $\eta$ on a logarithmic axis from $10^{-1}$ to $10^{2}$; curves for SE, EE and RE with $M = 3$, $K = 30$.]

Figure 3. MIMO-NOMA Type I: SE, EE and RE vs $\eta$ for $M = 3$, $K = 30$.

Fig. 4 presents the RE of the MIMO-NOMA I system for different numbers of BS antennas and, consequently, different numbers of clusters in the cell. Comparing the system performance as the number of clusters changes, the best performance is obtained for $K = 2$, i.e., two users per cluster. This is an expected result, since a larger number of BS antennas implies a higher directivity in the transmission to each cluster, which reduces the inter-cluster interference. In addition, the resulting smaller number of users in each cluster leads to reduced intra-cluster interference. Interestingly, the second-highest RE in MIMO-NOMA I is not obtained with three users per cluster; instead, it corresponds to $M = 5$, which results in six users per cluster.

[Plot: resource efficiency [bits/Hz/Joule] versus SNR [dB] from 0 to 20 dB; curves for $M = 3$, 5, 6, 10 and 15, all with $K = 30$.]

Figure 4. RE for $M \in [3; 15]$, $K = 30$ and $\eta = 10$.

Finally, a comparison of the two MIMO-NOMA equivalent channel calculations is presented in Fig. 5 for different system dimensions $M \cdot K$. As one can infer from the figure, under the system conditions assumed in this paper, the MIMO-NOMA II approach is inferior to MIMO-NOMA I in terms of RE. This difference appears to shrink as the numbers of transmit antennas and users increase. Fig. 5 also shows a reduction in system performance for the larger number of users ($K = 60$). This indicates that there is a limit to the RE enhancement achieved by increasing $K$, beyond which the interference caused by the additional users degrades the RE. Moreover, when the number of BS antennas is not sufficient to form clusters with mitigated inter-cluster interference, which implies a high number of users per cluster, the RE degrades at high SNR.

[Plots: RE [bits/Hz/Joule] versus SNR [dB] from 0 to 20 dB for MIMO-NOMA I and MIMO-NOMA II with $M = 3$ and $M = 5$; panel a) $K = 30$ users, panel b) $K = 60$ users.]

Figure 5. RE of MIMO-NOMA I and II for $M \in [3, 5]$, $K \in [30, 60]$ and $\eta = 10$.

VII. CONCLUSION

In this paper, we have analyzed the resource efficiency of a cluster-based MIMO-NOMA system through the application of a simple two-step power allocation strategy. Following the user-clustering scheme presented in this work, the cluster head in each downlink MIMO-NOMA cluster can almost completely cancel the intra-cluster interference and, thus, achieves the maximum throughput gain in comparison to the other users in the cluster.

Analyzing the system performance in terms of RE, our results demonstrated that the best performance occurs for two users per cluster. However, the next-highest RE values are not always attained for the next-smallest numbers of users per cluster. Instead, there are conditions in which a higher number of users per cluster was shown to perform better.

A comparison of the two MIMO-NOMA equivalent channel calculations demonstrated that, for the assumed system conditions, the equivalent channel obtained through the SVD leads to inferior performance compared with the direct choice of the cluster head's channel. An observation of the system RE for an increasing number of users also showed that an excessive number of users can result in performance degradation.

REFERENCES

[1] V. Wong, R. Schober, D. Ng, and L. Wang, Key Technologies for 5G Wireless Systems. Cambridge University Press, 2017.

[2] F. Luo and C. Zhang, Signal Processing for 5G: Algorithms and Implementations, ser. Wiley - IEEE. Wiley, 2016.

[3] Z. Ding, X. Lei, G. K. Karagiannidis, R. Schober, J. Yuan, and V. K. Bhargava, "A survey on non-orthogonal multiple access for 5G networks: Research challenges and future trends," IEEE Journal on Selected Areas in Communications, vol. 35, no. 10, pp. 2181–2195, Oct 2017.

[4] Y. Liu, H. Xing, C. Pan, A. Nallanathan, M. Elkashlan, and L. Hanzo, "Multiple-antenna-assisted non-orthogonal multiple access," IEEE Wireless Communications, vol. 25, no. 2, pp. 17–23, April 2018.

[5] J. Tang, D. K. C. So, E. Alsusa, and K. A. Hamdi, "Resource efficiency: A new paradigm on energy efficiency and spectral efficiency tradeoff," IEEE Transactions on Wireless Communications, vol. 13, no. 8, pp. 4656–4669, Aug 2014.

[6] Y. Huang, S. He, W. J., and J. Zhu, "Spectral and energy efficiency tradeoff for massive MIMO," IEEE Transactions on Vehicular Technology, vol. XX, no. xx, pp. 1–13, Accepted 2018.

[7] Z. Ding, F. Adachi, and H. V. Poor, "The application of MIMO to non-orthogonal multiple access," IEEE Transactions on Wireless Communications, vol. 15, no. 1, pp. 537–552, Jan 2016.

[8] Z. Ding, R. Schober, and H. V. Poor, "A general MIMO framework for NOMA downlink and uplink transmission based on signal alignment," IEEE Transactions on Wireless Communications, vol. 15, no. 6, pp. 4438–4454, June 2016.

[9] B. Kim, S. Lim, H. Kim, S. Suh, J. Kwun, S. Choi, C. Lee, S. Lee, and D. Hong, "Non-orthogonal multiple access in a downlink multiuser beamforming system," in MILCOM 2013 - 2013 IEEE Military Communications Conference, Nov 2013, pp. 1278–1283.

[10] S. Ali, E. Hossain, and D. I. Kim, "Non-orthogonal multiple access (NOMA) for downlink multiuser MIMO systems: User clustering, beamforming, and power allocation," IEEE Access, vol. 5, pp. 565–577, 2017.

[11] Q. H. Spencer, A. L. Swindlehurst, and M. Haardt, "Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels," IEEE Transactions on Signal Processing, vol. 52, no. 2, pp. 461–471, Feb 2004.

[12] R. Muharar and J. Evans, "Optimal power allocation for multiuser transmit beamforming via regularized channel inversion," in 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Nov 2011, pp. 1393–1397.

[13] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, "A vector-perturbation technique for near-capacity multiantenna multiuser communication - Part I: Channel inversion and regularization," IEEE Transactions on Communications, vol. 53, no. 1, pp. 195–202, Jan 2005.

[14] J. Hoydis, S. ten Brink, and M. Debbah, "Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?" IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, February 2013.

[15] S. Wagner, R. Couillet, M. Debbah, and D. T. M. Slock, "Large system analysis of linear precoding in correlated MISO broadcast channels under limited feedback," IEEE Transactions on Information Theory, vol. 58, no. 7, pp. 4509–4537, July 2012.

[16] R. Muharar, "Multiuser precoding in wireless communication systems," Ph.D. dissertation, The University of Melbourne, Department of Electrical and Electronic Engineering, 2012.

[17] M. M. El-Sayed, A. S. Ibrahim, and M. M. Khairy, "Power allocation strategies for non-orthogonal multiple access," in 2016 International Conference on Selected Topics in Mobile Wireless Networking (MoWNeT), April 2016, pp. 1–6.


Appendix B -- Sum-Rate Capacity in DL Massive MIMO with Partial CSI and Low-Complexity Linear Precoders


Sum-Rate Capacity in DL Massive MIMO with Partial CSI and Low-Complexity Linear Precoders

Karina Bernardin Rosa, João Lucas Negrão, Jose Carlos Marinello, Taufik Abrão

Abstract: The sum-rate capacity achieved by three low-complexity precoders is compared considering a single-cell massive MIMO broadcast channel; these linear precoders include the conventional zero-forcing (ZF) beamforming, regularized channel inversion (RCI) precoding, and a precoding version that does not require a priori knowledge of the channel statistics, which is based on the iterative randomized Kaczmarz algorithm (rKA). The sum-rate capacity vs. complexity problem is analyzed in the downlink (DL) with a base station (BS) deploying a massive uniform linear array (ULA) of antennas, while the mobile terminals (MT) are equipped with a single antenna. We first derive the signal-to-interference-plus-noise ratio (SINR) for the selected precoders and transmission schemes. Numerical results demonstrate that the rKA's performance-complexity tradeoff is superior to that of conventional ZF and RCI precoding when operating in massive MIMO systems, since it is relatively robust to increasing system loading, requires a considerably reduced number of complex operations, and appears to be equally efficient under a simple equal power allocation policy.

Index Terms: Channel Inversion, Kaczmarz algorithm, Massive MIMO, Precoding, Sum-Rate capacity, Zero Forcing

I. INTRODUCTION

In the next generation of communication systems (5G), due to the increasing demand for higher data rates and the exponential growth in the number of users, interference has turned into one of the major limiting factors for the performance and capacity of wireless cellular systems. In the downlink (DL) transmission of a multiuser MIMO (MU-MIMO) scenario, interference between users is a major source of system errors; hence, schemes able to mitigate it without requiring excessive coordination and control information exchange are of great interest. These methods are commonly placed in the category of base station (BS) precoding and generally rely on channel state information (CSI) knowledge.

Beamforming is a widely known technique for interference reduction and directed transmission of energy in the presence of noise and interference. In MIMO systems, some beamforming techniques exploit channel knowledge at the transmitter side to maximize the signal-to-noise ratio (SNR) at the receiver, while others utilize the channel information to transmit in the direction of the eigenvector corresponding to the largest eigenvalue of the channel [1], while canceling the transmission in other directions. Furthermore, beamforming techniques can also be used in the DL of a multi-user system with the aim of maximizing the signal-to-interference-plus-noise ratio (SINR) of a particular user [2].

K. B. Rosa, J. L. Negrão and T. Abrão are with the Department of Electrical Engineering, State University of Londrina (DEEL-UEL). Rod. Celso Garcia Cid - PR445, Po.Box 10.011. CEP: 86057-970, Londrina, PR - Brazil.

From an information theory perspective, it has been proved that the sum capacity of the MIMO broadcast (spatial multiplexing mode) channel can be achieved through the technique known as dirty-paper coding (DPC) [3]. However, DPC is a nonlinear precoding scheme and, for most practical communication systems, it is not feasible due to its very high computational complexity. For this reason, research has focused on quasi-optimal low-complexity approaches. In contrast to DPC, it has been shown in [4], [5] that sub-optimal linear precoders, such as matched filtering (MF) precoding, also known as conjugate beamforming, and zero-forcing (ZF) beamforming, can be applied to ensure much lower computational complexity while still providing good performance in terms of achievable sum-rate in the massive MIMO context.

Specifically, ZF beamforming is a sub-optimal linear precoding or transmit beamforming strategy that is able to cancel the inter-user interference (IUI) simply by pre-multiplying the data symbols by the inverse of the channel matrix. However, it has been demonstrated in [6] that the sum-capacity of ZF does not grow linearly with the number of users, while channel inversion-based precoding strategies can become a serious concern when the channel becomes ill-conditioned. Hence, in order to tackle this problem, a regularization parameter is introduced in the channel inversion; with it, the corresponding sum-capacity scales linearly with the number of users, but at a slower rate than that achieved by the optimal DPC [6]. This beamforming technique is called regularized channel inversion (RCI). It does not cancel the inter-user interference completely, as ZF does, but it is able to control the amount of interference introduced to each user. This characteristic also contributes to its robustness against noise in comparison to ZF, which may enhance the noise power [7]. Therefore, the regularization parameter should be optimally chosen to maximize some performance indicator, such as the SINR. The optimal regularization parameter when the number of BS antennas is equal to the number of users was derived in [6], while in [8] a generic case was derived by using a large-system analysis.

An interesting investigation of precoding techniques for single-cell DL MU-MIMO systems is carried out in [4]. The authors compared MF precoding and ZF with respect to spectral efficiency and radiated energy efficiency in a single-cell scenario. It is shown that, for high spectral efficiency and low energy efficiency, ZF outperforms MF, while for low spectral efficiency and high energy efficiency the opposite is true. An equivalent result for the uplink (UL) can be found in [9]; the authors have demonstrated that, in the low-SNR regime, the simple maximum ratio combining (MRC) receiver outperforms the ZF receiver. This can be explained by the fact that, at low power levels, the inter-user interference introduced by the MRC receiver is occasionally less than the noise enhancement caused by the ZF detector; hence, the simple MRC detector becomes a better alternative. This result is analogous for the downlink.

In MIMO systems, the BS needs to compute precoding/receiving vectors in order to transmit/detect data to/from each user, which results in a complexity that grows proportionally to the number of antennas $M$ and the number of users $K$. Therefore, the increasing number of antennas in Massive MIMO brings benefits, but they come at the cost of a substantial increase in hardware and computational complexity. Different algorithms have lately been proposed to reduce such complexity. However, the fundamental assumption in these solutions is that the exact statistics of the channel vectors of the users are known a priori. This is not a practical assumption, especially since estimating the channel covariance matrices of the users in a large-system regime is also demanding in terms of both computational and storage requirements [12].

In the context of statistical channel covariance matrix and CSI uncertainty, a technique for computing the precoder/detector in a massive MIMO system that does not require a priori knowledge of the statistics of the users' channel vectors, based on the randomized Kaczmarz algorithm (rKA), was proposed recently in [12]. The KA was initially proposed by Kaczmarz as an iterative technique for solving an over-determined (OD) set of linear equations (SLE) [13]. Recently, a randomized version of the KA was devised and analyzed for solving consistent OD-SLE [14]. With the increasing popularity of stochastic gradient techniques and machine learning, the KA has been revived [15] and applied to other problems, such as solving quadratic equations [16]. Moreover, considering under-determined (UD) systems, an extended version of the rKA based on [14] is proposed in [12] to deal with UD-SLE. In the same work, the rKA methodology is applied to different massive MIMO scenarios, while the performance of both the rKA-based precoder and detector for Massive MIMO is analyzed theoretically. Numerical results in [12] indicate that the rKA precoder/detector schemes achieve a suitable performance while being very competitive from a computational viewpoint.

This work develops a comparative analysis between the RCI, ZF and rKA precoding schemes in terms of sum-rate capacity under an equal power allocation strategy and realistic massive MIMO channel scenarios, considering 5G target applications, imperfect channel estimation and antenna correlation. The contributions of this work include: a) a comparison of the performance-complexity tradeoff of the RCI, ZF and rKA-based linear precoders; b) an analysis of the impact of partial CSI and antenna correlation on the system capacity.

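Notation: In the following, boldface lower-case and upper-case characters denote vectors and matrices, respectively. The operators $(\cdot)^{H}$, $\mathrm{tr}(\cdot)$, $\mathbb{E}[\cdot]$ and $\mathbb{V}[\cdot]$ denote conjugate transpose, trace, expectation and variance, respectively. The $M \times M$ identity matrix is denoted by $\mathbf{I}_M$. A random vector $\mathbf{x} \sim \mathcal{CN}\{\mathbf{m}, \boldsymbol{\Theta}\}$ is complex Gaussian distributed with mean vector $\mathbf{m}$ and covariance matrix $\boldsymbol{\Theta}$. The inner product between two matrices or vectors $\mathbf{X}$ and $\mathbf{Y}$ is denoted by $\langle \mathbf{X}, \mathbf{Y} \rangle = \mathrm{tr}(\mathbf{X}^{H}\mathbf{Y})$. The notations $\|\mathbf{X}\|_F = \sqrt{\langle \mathbf{X}, \mathbf{X} \rangle}$ and $\|\mathbf{x}\|$ are respectively used for the Frobenius norm of a matrix $\mathbf{X}$ and the $\ell_2$-norm of a vector $\mathbf{x}$. For two $m \times n$ and $m' \times n$ matrices $\mathbf{X}$ and $\mathbf{Y}$, we denote by $[\mathbf{X}; \mathbf{Y}]$ the $(m + m') \times n$ matrix obtained by stacking the rows of $\mathbf{X}$ on top of the rows of $\mathbf{Y}$. $G^{t}(\cdot)$ is a linear operator that depends on the argument and on the internal randomization of the rKA algorithm until iteration $t$.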

II. SYSTEM MODEL

In this section, we consider the DL of a single-cell MU-MIMO broadcast channel, where the BS is equipped with $M$ antennas that transmit to $K$ single-antenna user terminals. The large-scale fading between the BS and the receiving user $k$ is denoted by $a_k$.

We also consider an unbounded path-loss model in which the transmitted signal power decays according to $1/r_k^{b}$, where $r_k$ is the distance between the BS and user $k$, and $b$ is the path-loss exponent, which is related to the signal decay behavior in the cell.

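With these considerations, the received signal vector $\mathbf{y} \in \mathbb{C}^{K \times 1}$ of a narrow-band communication is given by

$$\mathbf{y} = \mathbf{A}\mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{x} \in \mathbb{C}^{M \times 1}$ is the transmit vector, $\mathbf{A} \in \mathbb{R}^{K \times K} \triangleq \mathrm{diag}(a_1, \dots, a_K)$ is the path-loss diagonal matrix with $a_k^2 = 1/r_k^{b}$, $\mathbf{H} \in \mathbb{C}^{K \times M}$ is the channel matrix and $\mathbf{n} = [\, n_1 \; \dots \; n_k \; \dots \; n_K \,]^{T} \in \mathbb{C}^{K \times 1} \sim \mathcal{CN}\{\mathbf{0}, \sigma_n^2 \mathbf{I}_K\}$ is the i.i.d. additive white Gaussian (AWG) noise vector. The transmit signal vector $\mathbf{x}$ is obtained from the product of a linear precoder $\mathbf{G} \in \mathbb{C}^{M \times K} \triangleq [\, \mathbf{g}_1 \; \dots \; \mathbf{g}_K \,]$ with the symbol vector $\mathbf{s} = [\, s_1 \; s_2 \; \dots \; s_K \,]^{T} \in \mathbb{C}^{K \times 1}$, which is normalized in power, i.e., $\mathbb{E}[\mathbf{s}\mathbf{s}^{H}] = \mathbf{I}_K$. Hence, the linear precoder $\mathbf{G}$ and the power matrix $\mathbf{P} \in \mathbb{R}^{K \times K} = \mathrm{diag}(p_1, \dots, p_K)$ allocated to the $K$ users define the transmit vector:

$$\mathbf{x} = \mathbf{G}\mathbf{P}^{1/2}\mathbf{s} \equiv \sum_{k=1}^{K} \sqrt{p_k}\, \mathbf{g}_k s_k. \tag{2}$$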

Indeed, the transmit vector $\mathbf{x}$ can also be expressed as a linear combination of the independent user symbols $s_k$, where $\mathbf{g}_k \in \mathbb{C}^{M \times 1}$ and $p_k \geq 0$ are the precoding vector and the signal power of the $k$-th user, respectively. We work under the assumption of perfect channel state information at the transmitter (pCSIT), as well as under channel estimation errors (eCSIT).

Moreover, the precoding vectors are normalized to satisfy the average power constraint, thus:

$$\mathbb{E}\left[ \|\mathbf{x}\|^2 \right] = \mathrm{tr}\left( \mathbf{P}\mathbf{G}^{H}\mathbf{G} \right) \leq P_T, \tag{3}$$

where $P_T \geq 0$ is the total available transmit power at the BS.

Hence, the SNR at the transmitter side is denoted by $\gamma = P_T / \sigma_n^2$.


1) MIMO Correlation Channels: Correlated MIMO channel scenarios are considered, as in [18, eq. (6)-(7)]:

$$\mathbf{h}_k = \tilde{\mathbf{h}}_k \mathbf{\Phi}^{1/2}, \tag{4}$$

where $\mathbf{h}_k \in \mathbb{C}^{1\times M}$ is the $k$-th row of the channel matrix $\mathbf{H}$, and the channel covariance matrix is denoted by

$$\mathbf{\Phi} = \mathbb{E}\{\mathbf{h}_k^H\mathbf{h}_k\}, \tag{5}$$

where $\tilde{\mathbf{h}}_k \sim \mathcal{CN}(\mathbf{0}_{1\times M}, \mathbf{I}_M)$ is the uncorrelated small-scale channel fading vector for user $k$, and $M$ is the number of BS antennas. This is a general channel model family for massive MIMO channels, which can represent the following channel scenarios:

a) i.i.d. Rayleigh case (rich scattering environment), with $\mathbf{\Phi} = \mathbf{I}_M$.

b) spatially correlated channels under a uniform linear array (ULA), where the correlation model is defined by:

$$[\mathbf{\Phi}]_{i,j} = c^{|i-j|}, \quad \text{with the correlation parameter } c \in [0, 1]. \tag{6}$$

c) uniform planar array (UPA) correlation model. A multidimensional array correlation structure is constructed for the UPA based on a Kronecker product of two ULA correlation matrices, as in [19]. We assume an approximation in which the correlation between elements along the x coordinate does not depend on y and is given by the matrix $\mathbf{\Phi}_x$, and the correlation along the y coordinate does not depend on x and is given by the matrix $\mathbf{\Phi}_y$. The following Kronecker-type approximation of the UPA correlation matrix is proposed by [19]:

$$\mathbf{\Phi} \approx \mathbf{\Phi}_x \otimes \mathbf{\Phi}_y =
\begin{bmatrix}
\phi^{x}_{1,1}\mathbf{R}_y & \cdots & \phi^{x}_{1,n_T}\mathbf{R}_y \\
\vdots & \ddots & \vdots \\
\phi^{x}_{n_R,1}\mathbf{R}_y & \cdots & \phi^{x}_{n_R,n_T}\mathbf{R}_y
\end{bmatrix}, \tag{7}$$

where $\otimes$ denotes the Kronecker product and $\mathbf{R}_y \equiv \mathbf{\Phi}_y$. The UPA correlation matrix at the receiver, $\mathbf{\Phi}$, is the Kronecker product of two ULA correlation matrices $\mathbf{\Phi}_x$ and $\mathbf{\Phi}_y$, which are Toeplitz. Therefore, even though $\mathbf{\Phi}$ may not be a Toeplitz matrix, its approximation (7) has a block-Toeplitz structure.

According to [20], the approximation model for the UPA correlation matrix in (7) is reasonably accurate, allowing the use of the well-developed theory of Toeplitz matrices for the analysis of multidimensional antenna arrays.
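The three correlation scenarios can be illustrated with a short NumPy sketch. It builds the ULA matrix of (6), the Kronecker approximation of (7), and correlated channel rows following (4). The function names and the eigendecomposition-based matrix square root are assumptions made for this example only.

```python
import numpy as np

def ula_correlation(n, c):
    """ULA model of Eq. (6): [Phi]_{i,j} = c^{|i-j|}; c = 0 gives the identity."""
    idx = np.arange(n)
    return c ** np.abs(idx[:, None] - idx[None, :])

def upa_correlation(nx, ny, cx, cy):
    """Kronecker-type UPA approximation of Eq. (7)."""
    return np.kron(ula_correlation(nx, cx), ula_correlation(ny, cy))

def psd_sqrt(Phi):
    """Hermitian square root of a positive semidefinite matrix."""
    w, V = np.linalg.eigh(Phi)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T

def correlated_channel(K, Phi, rng):
    """Rows h_k = h~_k Phi^{1/2}, with h~_k ~ CN(0, I_M), as in Eq. (4)."""
    M = Phi.shape[0]
    H_tilde = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
    return H_tilde @ psd_sqrt(Phi)
```

For instance, `correlated_channel(32, ula_correlation(64, 0.5), np.random.default_rng(0))` would draw one channel realization for the medium-correlation ULA case.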

2) Noisy Version of the MIMO Channel Matrix: The channel matrix estimate $\mathbf{Q}$ available at the BS in each coherence time period $\Delta T_C$ is not perfect, i.e., $\mathbf{Q} \neq \mathbf{H}$. Indeed, the estimated channel matrix $\mathbf{Q}$ is a noisy correlated version of the true channel matrix $\mathbf{H}$, modelled as:

$$\mathbf{Q} = \sqrt{1-\tau^2}\,\mathbf{H} + \tau\mathbf{N}, \quad \text{where } [\mathbf{N}]_{k,m} \sim \mathcal{CN}(0, 1), \tag{8}$$

where $\tau \in [0, 1]$ is the estimation quality parameter, with $\tau = 0$ yielding perfect channel estimation, $\mathbf{Q} = \mathbf{H}$ [12].
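The estimation model in (8) translates directly into code. The sketch below is illustrative; the function name `noisy_estimate` is an assumption of this example.

```python
import numpy as np

def noisy_estimate(H, tau, rng):
    """Channel estimate of Eq. (8): Q = sqrt(1 - tau^2) H + tau N, [N]_{k,m} ~ CN(0, 1)."""
    N = (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)) / np.sqrt(2)
    return np.sqrt(1.0 - tau**2) * H + tau * N
```

When the entries of H have unit variance, the weights $\sqrt{1-\tau^2}$ and $\tau$ keep the entries of Q at unit variance as well, so τ only controls how much of Q is the true channel.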

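With the previous considerations, the received DL vector $\mathbf{y}$ from (1) can be rewritten as:

$$\mathbf{y} = \mathbf{A}\mathbf{H}\mathbf{x} + \mathbf{n} = \mathbf{A}\mathbf{H}\mathbf{G}\mathbf{P}^{1/2}\mathbf{s} + \mathbf{n} \tag{9}$$

and the received DL signal $y_k$ at the $k$-th UE is given as:

$$y_k = a_k\sqrt{p_k}\,\mathbf{h}_k\mathbf{g}_k s_k + \sum_{j=1, j\neq k}^{K} a_k\sqrt{p_j}\,\mathbf{h}_k\mathbf{g}_j s_j + n_k. \tag{10}$$

As a result, the SINR per user is defined as [10]:

$$\mathrm{SINR}_k = \frac{a_k^2 p_k |\mathbf{h}_k\mathbf{g}_k|^2}{\sum_{j=1, j\neq k}^{K} a_k^2 p_j |\mathbf{h}_k\mathbf{g}_j|^2 + \sigma_n^2}, \quad \forall k = 1, \ldots, K \tag{11}$$

and the normalized achievable rate of user $k$, in [bits/s/Hz], is given as:

$$R_k = \log_2(1 + \mathrm{SINR}_k). \tag{12}$$

Moreover, the ergodic sum-rate capacity is given by

$$R_\Sigma = \mathbb{E}\left[\sum_{k=1}^{K} \log_2(1 + \mathrm{SINR}_k)\right], \tag{13}$$

where the expectation is taken over a large number of channel realizations $\mathbf{h}_k$, $\forall k = 1, \ldots, K$.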
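Equations (11)-(13) can be evaluated for any linear precoder with a few array operations. The following sketch assumes hypothetical function names; the ergodic sum rate of (13) would then be estimated by averaging `sum_rate` over many channel realizations.

```python
import numpy as np

def sinr_per_user(H, G, a, p, sigma2):
    """SINR of Eq. (11) for all users; a and p are length-K arrays."""
    T = H @ G                                              # T[k, j] = h_k g_j
    gains = (a**2)[:, None] * p[None, :] * np.abs(T)**2    # a_k^2 p_j |h_k g_j|^2
    signal = np.diag(gains)                                # j = k terms
    interference = gains.sum(axis=1) - signal              # j != k terms
    return signal / (interference + sigma2)

def sum_rate(sinr):
    """Instantaneous sum rate: sum over users of Eq. (12), in bits/s/Hz."""
    return np.log2(1.0 + sinr).sum()
```

As a quick check, with H = G = I₂, unit path losses, p = (1, 3) and unit noise power, the SINRs are 1 and 3 and the sum rate is log₂2 + log₂4 = 3 bits/s/Hz.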

III. LINEAR BEAMFORMING SCHEMES

In this section we derive the sum-rate capacity for the conventional ZF and RCI linear precoders, as well as for an iterative low-complexity precoder based on the rKA [12].

A. Zero-Forcing Beamforming

The ZF precoding, also referred to as channel inversion (CI) precoding, eliminates all inter-user interference by performing an inversion of the channel matrix $\mathbf{H}$ at the transmitter side [21]. ZF beamforming is widely applied to MU-MIMO networks with single-antenna users [4], [10]. Besides the simplicity of designing the beamforming vectors $\mathbf{g}_k$, it allows the users to receive data free of interference, thanks to the orthogonality imposed by the beamforming vectors of different users. The precoding matrix for the ZF is given by [21]

$$\mathbf{G}^{\mathrm{ZF}} = \alpha\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H\right)^{-1} = \alpha\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H, \tag{14}$$

where the normalization constant $\alpha$ is chosen to satisfy the power constraint in (3). It depends only on the channel realization $\mathbf{H}$ and is determined by

$$\alpha = \sqrt{\frac{P_T}{\mathrm{tr}\left(\mathbf{P}\left[\mathbf{H}\mathbf{H}^H\right]^{-1}\right)}}. \tag{15}$$

The received vector can be represented as

$$\mathbf{y} = \alpha\mathbf{A}\mathbf{H}\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H\right)^{-1}\mathbf{P}^{1/2}\mathbf{s} + \mathbf{n} = \alpha\mathbf{A}\mathbf{P}^{1/2}\mathbf{s} + \mathbf{n}. \tag{16}$$

With these considerations, the SINR of user $k$ under ZF precoding is given as

$$\mathrm{SINR}_k^{\mathrm{ZF}} = \frac{\alpha^2 a_k^2 p_k}{\sigma_n^2}, \quad \text{with } K \leq M. \tag{17}$$
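A compact sketch of (14) and (15) for the case K ≤ M is given below. The function name is an assumption, and the explicit matrix inverse is used only for readability.

```python
import numpy as np

def zf_precoder(H, p, P_T):
    """ZF precoder of Eq. (14) with the normalization of Eq. (15); needs K <= M."""
    gram_inv = np.linalg.inv(H @ H.conj().T)                 # (H H^H)^{-1}, K x K
    alpha = np.sqrt(P_T / np.trace(np.diag(p) @ gram_inv).real)
    return alpha * H.conj().T @ gram_inv, alpha
```

With this precoder the effective channel `H @ G` equals `alpha * I_K`, which is exactly the interference-free structure seen in (16).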


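By using the ZF beamforming, the precoding vector is constructed to eliminate the interference that a particular user may cause to others, i.e., the IUI.

Notice that the ZF supports a limited number of users, which is bounded by the number of BS antennas, $K \leq M$. If the number of single-antenna users $K$ increases beyond $M$, the IUI becomes much stronger and the SINR described in (17) no longer holds. For cases where $K > M$, the received signal vector $\mathbf{y}$ and signal $y_k$ are given, respectively, by:

$$\begin{aligned}
\mathbf{y} &= \alpha\mathbf{A}\mathbf{H}\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H\mathbf{P}^{1/2}\mathbf{s} + \mathbf{n} \\
y_k &= \alpha a_k\sqrt{p_k}\,\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_k^H s_k + \alpha\sum_{j=1, j\neq k}^{K} a_k\sqrt{p_j}\,\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_j^H s_j + n_k,
\end{aligned} \tag{18}$$

leading to an SINR for the $k$-th user given by

$$\mathrm{SINR}_k^{\mathrm{ZF}} = \frac{\alpha^2 a_k^2 p_k \left|\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_k^H\right|^2}{\alpha^2 \sum_{j=1, j\neq k}^{K} a_k^2 p_j \left|\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_j^H\right|^2 + \sigma_n^2}, \tag{19}$$

with the term $\alpha^2 \sum_{j=1, j\neq k}^{K} a_k^2 p_j \left|\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_j^H\right|^2$ representing the IUI.

Since $\mathrm{SINR}_k^{\mathrm{ZF}}$ is proportional to $\alpha^2$, a rank deficiency of the channel correlation matrix $\mathbf{H}^H\mathbf{H}$ will lead to a penalty in $\alpha^2$ and, consequently, in the SINR. This motivates the addition of a regularization parameter, as discussed in the sequel.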

B. Regularized Channel Inversion

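The RCI precoding can be seen as a generalization of the ZF precoding, in which a regularization parameter is included in the pseudo-inverse matrix. Basically, to compensate for the possibility of an ill-conditioned channel matrix $\mathbf{H}$, for instance in the case of spatial channel correlation, the regularization term $\xi$ is added within the pseudo-inverse of the ZF precoding matrix in Eq. (14). Hence, considering a system equipped with the RCI precoder, the precoding matrix solution is given by [11], [22], [23]:

$$\mathbf{G}^{\mathrm{RCI}} = \alpha\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H + \xi\mathbf{I}_K\right)^{-1}, \tag{20}$$

or equivalently as

$$\mathbf{G}^{\mathrm{RCI}} = \alpha\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{H}^H, \tag{21}$$

where the normalizing constant $\alpha$ is chosen to satisfy the power constraint (3), and $\xi > 0$ is the regularization parameter.

As stated in [21], by assuming independence between the data symbols, similarly to (15), the normalization constant for the RCI precoding is expressed as

$$\alpha = \sqrt{\frac{P_T}{\mathrm{tr}\left(\mathbf{P}\mathbf{H}\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-2}\mathbf{H}^H\right)}}, \tag{22}$$

where now the normalization factor $\alpha$ depends on the channel realization $\mathbf{H}$ as well as on the regularization factor $\xi$.

Using the RCI precoder (21), the received vector $\mathbf{y}$ and signal $y_k$ for each user can be respectively expressed as

$$\mathbf{y} = \alpha\mathbf{A}\mathbf{H}\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{H}^H\mathbf{P}^{1/2}\mathbf{s} + \mathbf{n} \tag{23}$$

$$y_k = \alpha a_k\sqrt{p_k}\,\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_k^H s_k + \sum_{j=1, j\neq k}^{K} \alpha a_k\sqrt{p_j}\,\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_j^H s_j + n_k, \tag{24}$$

where the first term on the right side of (24) is the desired signal for each user $k$, the second term is the interference introduced by the other users and the last one is the received thermal noise.

In order to avoid excessive complexity, we advocate the use of single-user detection at the DL receiver side. Hence, the SINR for each user is expressed as follows [11], [24]:

$$\mathrm{SINR}_k^{\mathrm{RCI}} = \frac{\alpha^2 a_k^2 p_k \left|\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_k^H\right|^2}{\alpha^2 \sum_{j=1, j\neq k}^{K} a_k^2 p_j \left|\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_j^H\right|^2 + \sigma_n^2}. \tag{25}$$

Notice that when $\xi \to 0$, the RCI precoder converges to the ZF precoder. This relationship is direct, since the term $\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{H}^H$ tends to $\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H$ as $\xi$ tends to zero.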
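The RCI expressions (20)-(22) can be sketched as follows. The code uses the K × K form (20), which is cheaper when K < M, and the normalization is written through the unnormalized precoder, which is algebraically the same trace as in (22). The function name is an assumption of this example.

```python
import numpy as np

def rci_precoder(H, p, P_T, xi):
    """RCI precoder of Eq. (20), normalized to meet Eq. (3) with equality."""
    K = H.shape[0]
    F = H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(K))   # unnormalized G
    # tr(P F^H F) equals tr(P H (H^H H + xi I_M)^{-2} H^H), the trace in Eq. (22).
    alpha = np.sqrt(P_T / np.trace(np.diag(p) @ F.conj().T @ F).real)
    return alpha * F, alpha
```

For a very small ξ and K ≤ M, `H @ G` approaches `alpha * I_K`, which is the ZF behavior described above.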

C. rKA-based Precoding

The iterative rKA precoder, proposed in [12] as an extension of the KA [13] and the rKA [14], does not require a priori knowledge of the statistics of the users' channel vectors in a massive MIMO system. In order to explore the potential of rKA precoding, including complexity reduction¹, we first define a possibly quantized and noisy estimate of the true channel matrix $\mathbf{H}$ available at the BS by $\mathbf{Q} \in \mathbb{C}^{K\times M}$, as defined in (8). All computation performed at the BS is then assumed to be based on this estimate matrix $\mathbf{Q}$. We consider the RCI as the linear precoding scheme, with the transmit vector calculated as

$$\mathbf{x} = \mathbf{G}\mathbf{s} = \alpha\mathbf{Q}^H\left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)^{-1}\mathbf{P}^{1/2}\mathbf{s}. \tag{26}$$

Finding the suitable $M$-dimensional transmit vector to be transmitted to the users in the DL direction can be interpreted as finding the solution $\mathbf{w}^*$ of an SLE of the form $\mathbf{A}\mathbf{w} = \mathbf{b}$, where $\mathbf{A} \in \mathbb{C}^{m\times n}$ and the vectors $\mathbf{w} \in \mathbb{C}^{n\times 1}$ and $\mathbf{b} \in \mathbb{C}^{m\times 1}$ are, respectively, associated with the estimated channel matrix $\mathbf{Q}$, the transmit vector $\mathbf{x}$ and the symbol vector $\mathbf{s}$. Herein, the optimal solution is found via the rKA iterative procedure presented in Algorithm 1.

The rKA works by selecting one of the rows, say row number $r(t)$, of the matrix $\mathbf{A}$ at each iteration $t$. This selection is random with a specific probability. Then it calculates an estimate $\mathbf{w}^t$ of the optimal solution $\mathbf{w}^*$. If $\mathbf{w}^t$ satisfies the equation expressed by the conjugate of the $r(t)$-th row of $\mathbf{A}$, denoted by $\mathbf{a}_{r(t)}$, i.e., if $\langle\mathbf{a}_{r(t)}, \mathbf{w}^t\rangle = b_{r(t)}$, then $\mathbf{w}^t$ is kept. Otherwise, the algorithm updates $\mathbf{w}^t$ along $\mathbf{a}_{r(t)}$ to make the $r(t)$-th equation consistent. This update, made at each iteration $t$, can be written as

$$\mathbf{w}^{t+1} = \mathbf{w}^t + \frac{b_{r(t)} - \langle\mathbf{a}_{r(t)}, \mathbf{w}^t\rangle}{\left\|\mathbf{a}_{r(t)}\right\|^2}\,\mathbf{a}_{r(t)}. \tag{27}$$

¹ Since rKA-based precoding does not depend on matrix inversion.

At each iteration $t$ the algorithm selects a row of $\mathbf{A}$ randomly and independently from the previous iterations, according to a probability distribution $\mathbf{p} = (p_1, \ldots, p_m)^T$, where $p_i \in [0, 1]$ is the probability of selecting the $i$-th row of $\mathbf{A}$ and $\sum_{i=1}^{m} p_i = 1$.

The main parameter that controls the convergence speed of the rKA is the average gain $\kappa_\chi(\mathbf{A}, \mathbf{p})$ of the matrix $\mathbf{A}$ over the subspace $\chi \subseteq \mathbb{C}^n$ generated by the conjugate of the rows of $\mathbf{A}$, defined in [12, Definition 1], which depends both on the matrix $\mathbf{A}$ and the probability distribution $\mathbf{p}$. Indeed, the work in [12] also demonstrates that $\kappa_\chi(\mathbf{A}, \mathbf{p}) \in \left[0, \frac{1}{\min\{m,n\}}\right]$, and that the closer $\kappa_\chi(\mathbf{A}, \mathbf{p})$ is to $\frac{1}{\min\{m,n\}}$, the faster the rKA converges across the iterations. Thus, the optimal probability distribution vector $\mathbf{p}$ is the one that maximizes $\kappa_\chi(\mathbf{A}, \mathbf{p})$. However, although this problem can be solved via convex optimization techniques, doing so incurs the same order of complexity as directly computing the precoding vectors. It is therefore not suitable for the application analyzed in this section, which targets techniques with substantial complexity reduction.

Instead of using the optimal probability distribution mentioned above, we use the suboptimal distribution $\mathbf{p}$ proposed by [14], where $p_i = \frac{\|\mathbf{a}_i\|^2}{\|\mathbf{A}\|_F^2}$, i.e., the probability of selecting the $i$-th row is proportional to its squared $\ell_2$-norm $\|\mathbf{a}_i\|^2$. Computing this probability distribution incurs a complexity of $\mathcal{O}(mn)$, which is much lower than directly solving the maximization problem to find the optimal $\mathbf{p}$.
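The generic update (27) with the norm-proportional row selection can be sketched in a few lines. This is an illustrative implementation for a consistent system Aw = b, not the precoding variant of Algorithm 1; the function name and the zero initialization are assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters, rng):
    """rKA for a consistent SLE A w = b, rows drawn with p_i = ||a_i||^2 / ||A||_F^2."""
    m, n = A.shape
    row_norms = np.sum(np.abs(A)**2, axis=1)       # ||a_i||^2, O(mn) to compute
    probs = row_norms / row_norms.sum()
    w = np.zeros(n, dtype=complex)
    for r in rng.choice(m, size=iters, p=probs):
        residual = b[r] - A[r] @ w                 # b_r - <a_r, w^t>
        # a_r is the conjugate of the r-th row, so the step moves along conj(A[r]).
        w = w + (residual / row_norms[r]) * A[r].conj()
    return w
```

After each step the selected equation holds exactly, since `A[r] @ conj(A[r])` equals `row_norms[r]`.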

It is demonstrated in [12] that, for a fixed loading factor $\beta = \frac{K}{M}$, the matrix gain $\kappa_\chi(\mathbf{A}, \mathbf{p})$ gives an approximation of the best matrix gain $\kappa_\chi(\mathbf{A}, \mathbf{p}^*)$ up to a multiplicative factor. This also leads to the conclusion that the suboptimal rKA converges slower than the optimally-tuned rKA by at most the same multiplicative factor.

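Notice that the KAs proposed in [13] and [14] are applicable to consistent overdetermined (OD) cases. For the massive MIMO DL application, however, we need variants of KA for OD scenarios where the equations are almost always inconsistent. The work in [12] proposes a new KA variant which uses a step prior to the rKA to remove the inconsistency in the linear equations.

Based on (26), we deploy the new version of KA inspired by [12], which is able to find the signal $\mathbf{w} = \left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)^{-1}\mathbf{s}$, considering $\mathbf{P} = \mathbf{I}_K$ since uniform power allocation is assumed. From $\mathbf{w}$ one can obtain $\mathbf{x} = \alpha\mathbf{Q}^H\mathbf{w}$ and solve the massive MIMO precoding problem in a simple, low-complexity but effective way.

Considering $\mathbf{s} \in \mathbb{C}^{K\times 1}$ the symbol sequence to be sent from the BS to the mobile users (DL), and $\mathbf{x} = \alpha\mathbf{Q}^H\left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)^{-1}\mathbf{s}$ the $M$-dimensional precoded transmitted signal, we can apply the rKA in order to find the $K$-dimensional signal $\mathbf{w} = \left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)^{-1}\mathbf{s}$, from which we can write $\mathbf{x} = \alpha\mathbf{Q}^H\mathbf{w}$.

According to [12, Proposition 3], we can obtain $\mathbf{w} = \left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)^{-1}\mathbf{s}$ as the optimal solution of the problem $\min_{\mathbf{x}\in\mathbb{C}^K} \left\|\mathbf{Q}^H\mathbf{x}\right\|^2 + \xi\left\|\mathbf{x} - \frac{\mathbf{s}}{\xi}\right\|^2$. Hence, we can write this as the least-squares minimization problem $\|\mathbf{B}\mathbf{x} - \mathbf{b}\|^2$, where $\mathbf{B} = [\mathbf{Q}^H; \sqrt{\xi}\,\mathbf{I}_K]$ is an $(M+K)\times K$ matrix and $\mathbf{b} = [\mathbf{0}; \sqrt{\xi}\,\frac{\mathbf{s}}{\xi}]$ is an $(M+K)$-dimensional vector. This leads to the OD SLE:

$$\mathbf{B}\mathbf{w} = \mathbf{b}, \tag{28}$$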

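that must be solved for the vector $\mathbf{w}$, from which we can obtain the desired precoded signal $\mathbf{x} = \alpha\mathbf{Q}^H\mathbf{w}$ at the BS massive antennas side. However, the SLE in (28) is inconsistent unless $\mathbf{s} = \mathbf{0}$. Thus, in order to remove this inconsistency, we define $\mathbf{z} = \mathbf{B}\mathbf{w}$ and solve the UD SLE defined as

$$\mathbf{B}^H\mathbf{z} = \mathbf{B}^H\mathbf{b} = \sqrt{\xi}\,\frac{\mathbf{s}}{\sqrt{\xi}} = \mathbf{s} \tag{29}$$

for $\mathbf{z}$ through the application of the rKA to (29), from which we obtain the estimate $\hat{\mathbf{z}}$. Then, we use $\hat{\mathbf{z}}$ to solve the OD but consistent SLE

$$\mathbf{B}\mathbf{w} = \hat{\mathbf{z}}, \tag{30}$$

which leads us to find $\mathbf{w}$. Assuming $\mathbf{B}$ as defined previously, with $\mathbf{B}_{r(t)}$ being the conjugate of the $r(t)$-th row of $\mathbf{B}^H$ and denoting $\mathbf{z} = [\mathbf{u}; \sqrt{\xi}\,\mathbf{v}]$ for $\mathbf{u} \in \mathbb{C}^{M\times 1}$, $\mathbf{v} \in \mathbb{C}^{K\times 1}$, the suboptimal probability that the row $r(t)$ is selected in the rKA for solving (29) is obtained through:

$$p_{r(t)} = \frac{\left\|\mathbf{B}_{r(t)}\right\|^2}{\|\mathbf{B}^H\|_F^2} = \frac{\left\|\mathbf{q}_{r(t)}\right\|^2 + \xi}{\|\mathbf{Q}^H\|_F^2 + K\xi}, \tag{31}$$

for $\mathbf{q}_{r(t)}$ as the conjugate of the $r(t)$-th row of $\mathbf{Q}$. The update step of the rKA (27) applied for solving (29) can be written as:

$$\mathbf{z}^{t+1} = \mathbf{z}^t + \underbrace{\frac{s_{r(t)} - \langle\mathbf{B}_{r(t)}, \mathbf{z}^t\rangle}{\left\|\mathbf{B}_{r(t)}\right\|^2}}_{\wp^t}\,\mathbf{B}_{r(t)}. \tag{32}$$

Moreover, the residual term $\wp^t$ can be rewritten as:

$$\wp^t = \frac{s_{r(t)} - \langle\mathbf{q}_{r(t)}, \mathbf{u}^t\rangle - \xi v^t_{r(t)}}{\left\|\mathbf{q}_{r(t)}\right\|^2 + \xi}, \tag{33}$$

which leads to the components $\mathbf{u}$ and $\mathbf{v}$ of vector $\mathbf{z}$ being given respectively as:

$$\mathbf{u}^{t+1} = \mathbf{u}^t + \wp^t\mathbf{q}_{r(t)} \tag{34}$$

$$v^{t+1}_{r(t)} = v^t_{r(t)} + \wp^t, \quad \text{and } v^{t+1}_j = v^t_j \text{ for } j \neq r(t). \tag{35}$$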

Finally, the desired precoding vector $\mathbf{w}$ in (30) can be obtained as the rKA estimate of $\mathbf{v}$. The entire process is summarized by the pseudocode in Algorithm 1.
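The chain from (28) to (35) can be checked with a short worked derivation that uses only the definitions of $\mathbf{B}$, $\mathbf{b}$ and $\mathbf{z}$ given in this section. It is a sketch of the intermediate steps, not a replacement for the proofs in [12].

```latex
% Normal equations of the least-squares problem behind (28):
\mathbf{B}^H\mathbf{B} = \mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K, \qquad
\mathbf{B}^H\mathbf{b} = \mathbf{Q}\,\mathbf{0} + \sqrt{\xi}\,\frac{\sqrt{\xi}\,\mathbf{s}}{\xi} = \mathbf{s}
\;\Longrightarrow\;
\left(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K\right)\mathbf{w} = \mathbf{s}.

% Structure of z = Bw, which explains why v estimates w:
\mathbf{z} = \mathbf{B}\mathbf{w} =
\begin{bmatrix} \mathbf{Q}^H\mathbf{w} \\ \sqrt{\xi}\,\mathbf{w} \end{bmatrix}
= \begin{bmatrix} \mathbf{u} \\ \sqrt{\xi}\,\mathbf{v} \end{bmatrix}
\;\Longrightarrow\; \mathbf{u} = \mathbf{Q}^H\mathbf{w}, \quad \mathbf{v} = \mathbf{w}.

% Inner product and norm used to pass from (32) to (33):
\langle \mathbf{B}_{r}, \mathbf{z} \rangle = \langle \mathbf{q}_{r}, \mathbf{u} \rangle + \xi v_{r},
\qquad
\left\|\mathbf{B}_{r}\right\|^2 = \left\|\mathbf{q}_{r}\right\|^2 + \xi .
```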

Algorithm 1 is used to directly compute a suitable transmit vector for the DL of the massive MIMO system. However, computing the DL precoding matrix from the available estimate of the channel state can be useful when the channel coherence block is large: the precoding matrix corresponding to each block is then computed only once inside the block, which reduces the computational complexity [12].


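Algorithm 1 Kaczmarz Algorithm for DL (Precoding)
Input: $\mathbf{Q} \in \mathbb{C}^{K\times M}$, $\mathbf{s} \in \mathbb{C}^{K\times 1}$ and RCI parameter $\xi \geq 0$.
Define $\mathbf{u}^t$ and $\mathbf{v}^t$ and initialize $\mathbf{u}^0 = \mathbf{0}$, $\mathbf{v}^0 = \mathbf{0}$.
Apply KA to Eq. (29) through:
for $t = 0, 1, \ldots, T-1$ do
    Pick a row $r(t)$ of $\mathbf{Q}$ with prob.: $p_{r(t)} = \frac{\|\mathbf{q}_{r(t)}\|^2 + \xi}{\|\mathbf{Q}^H\|_F^2 + K\xi}$
    Compute the residual $\wp^t$ as in Eq. (33).
    Update $\mathbf{u}$, Eq. (34).
    Update $\mathbf{v}$, Eq. (35).
end for
Output: Set $\mathbf{w} = \mathbf{v}^{T-1}$, Eq. (30).
Output: Set $\mathbf{x} = \alpha\mathbf{Q}^H\mathbf{w}$.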
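Algorithm 1 can be sketched in NumPy as shown below. The function name, the pre-drawn row indices and the use of the last updated v as output are choices made for this illustration; the normalization constant α is left to the caller.

```python
import numpy as np

def rka_precoding_vector(Q, s, xi, T, rng):
    """Sketch of Algorithm 1: rKA estimate of w = (Q Q^H + xi I_K)^{-1} s."""
    K, M = Q.shape
    row_norms = np.sum(np.abs(Q)**2, axis=1)                  # ||q_r||^2
    probs = (row_norms + xi) / (row_norms.sum() + K * xi)     # Eq. (31)
    u = np.zeros(M, dtype=complex)
    v = np.zeros(K, dtype=complex)
    for r in rng.choice(K, size=T, p=probs):
        # <q_r, u> = Q[r] @ u, because q_r is the conjugate of the r-th row of Q.
        resid = (s[r] - Q[r] @ u - xi * v[r]) / (row_norms[r] + xi)   # Eq. (33)
        u += resid * Q[r].conj()                              # Eq. (34)
        v[r] += resid                                         # Eq. (35)
    return v                                                  # estimate of w
```

The transmit vector then follows as `x = alpha * Q.conj().T @ w`. No matrix inversion is performed, and each iteration costs on the order of M complex operations.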

1) Computing the Precoding Matrix on a Specific Coherence Block using rKA: With the goal of generalizing the rKA procedure to compute the precoding matrix directly, we focus on a specific coherence block. We then use Algorithm 1 and run $K$ KA procedures in parallel, where the input for the $i$-th KA is $\mathbf{s}_i = \mathbf{e}_i \in \mathbb{C}^{K\times 1}$, for $\mathbf{e}_i$ denoting the $i$-th canonical basis vector, with "1" in its $i$-th component and "0" elsewhere. The $K$ parallel KA processes may share the same random row index at each iteration or may use their independent randomness. After a suitable convergence, the output of the $i$-th KA using $\mathbf{s}_i = \mathbf{e}_i$ is denoted by $\mathbf{w}_i \in \mathbb{C}^{K\times 1}$, and the corresponding precoding vector would be $\mathbf{x}_i = \alpha\mathbf{Q}^H\mathbf{w}_i$. Due to the convergence of KA, $\mathbf{x}_i$ should be approximately equal to

$$\mathbf{x}_i \approx \alpha\mathbf{Q}^H(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K)^{-1}\mathbf{e}_i, \tag{36}$$

which corresponds to the $i$-th column of the precoding matrix $\alpha\mathbf{Q}^H(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K)^{-1}$. Since $\mathbf{Q}$ is assumed to have full column rank, one can conclude that $\mathbf{w}_i$, after a suitable convergence, corresponds to the $i$-th column of $(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K)^{-1}$.

Hence, $(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K)^{-1}$ can be approximated by the matrix $\mathbf{W} = [\mathbf{w}_1 \ldots \mathbf{w}_i \ldots \mathbf{w}_K] \in \mathbb{C}^{K\times K}$ consisting of the outputs of the $K$ parallel KAs with inputs $\mathbf{s}_i = \mathbf{e}_i$, $i \in \{1, 2, \ldots, K\}$. It also implies that the precoding matrix can be approximated as $\mathbf{G} = \alpha\mathbf{Q}^H\mathbf{W}$. As discussed in Section V-E, in terms of computational complexity it is beneficial to compute the precoding vector following the multiplication order:

$$\mathbf{x} = (\alpha\mathbf{Q}^H(\mathbf{W}\mathbf{s})), \tag{37}$$

where the parentheses indicate the order of multiplication.

With the purpose of summarizing the precoders presented in this work and providing a better comparison between them, the precoding matrix, SINR and computational complexity in terms of complex operations of each precoder structure are organized in Table I.
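The benefit of the multiplication order in (37) can be made concrete by counting complex multiplications of naive dense products. The counts below are a rough illustration under that assumption, with hypothetical function names and the scalar α ignored; they are not the complexity figures of Table I.

```python
def mults_vector_first(M, K):
    """x = alpha * Q^H (W s): K*K for W s, then M*K for Q^H times a vector."""
    return K * K + M * K

def mults_matrix_first(M, K):
    """x = (alpha * Q^H W) s: M*K*K for Q^H W, then M*K for the result times s."""
    return M * K * K + M * K

M, K = 128, 64
vector_first = mults_vector_first(M, K)    # 4096 + 8192
matrix_first = mults_matrix_first(M, K)    # 524288 + 8192
ratio = matrix_first / vector_first
```

For M = 128 and K = 64 the vector-first order needs 12288 multiplications against 532480 for the matrix-first order, roughly a factor of 43.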

IV. LARGE SYSTEM SINR ANALYSIS

In this section, a large-system limit analysis for the SINR and the achievable ergodic rate is provided. Specifically, the SINR of the RCI and the ZF precoding schemes are analyzed under a large-system limit where both the number of users $K$ and the number of transmit antennas $M$ approach infinity, with a fixed ratio $\beta = \frac{K}{M} < 1$. Large-system limit expressions for the asymptotic sum-rate capacity were previously studied in [21], [11] and [24]. Capacity bounds for the rKA precoder are provided based on [12] and [25]. Herein, the analysis is applied to the DL of a massive MIMO system equipped with an $M$-antenna ULA at the BS, and uniform power allocation is considered.

A. Sum-Rate under Transmit-Side Channel Correlation

In Appendix C the SINR expressions for the ZF and RCI precoders operating under the large-system regime are derived, aiming to determine the sum-rate capacity of such linear precoders. Hence, the achievable sum-rate capacity is

$$R_\Sigma^\infty = \sum_{k=1}^{K} \log_2\left(1 + \mathrm{SINR}_k^{\infty,\mathrm{prec}}\right), \tag{38}$$

where $\mathrm{SINR}_k^{\infty,\mathrm{prec}}$ is the SINR for the $k$-th user in RCI and ZF precoding defined by eq. (64) and (65), respectively. Notice that the asymptotic SINR expressions for both linear precoders are different for each user and depend on the path loss $a_k$.

B. Capacity Bounds for rKA Precoder

The lower and upper bounds for the achievable ergodic rate when rKA precoding is deployed are presented in Appendix A, based on [12], and originally developed in [25].

For the DL massive MIMO scenario, an upper bound on the achievable ergodic rate when the rKA precoding is run for a specific number of iterations $t$ can be obtained assuming a time-varying² interference matrix, by treating the interference as noise and by coding across several coherence blocks, being evaluated by:

$$\overline{R}_k = \mathbb{E}\left[\log_2\left(1 + \frac{|T^{\mathrm{dl}}_{k,k}|^2}{\sum_{k'\neq k} |T^{\mathrm{dl}}_{k,k'}|^2 + \sigma_k^2}\right)\right], \tag{39}$$

where

$$\mathbf{T}^{\mathrm{dl}} = \mathbf{H}\mathbf{G}^{t}(\mathbf{Q}) = \mathbf{H}\mathbf{G} \in \mathbb{C}^{K\times K}. \tag{40}$$

For simplicity, we dropped the explicit dependence of $\mathbf{T}^{\mathrm{dl}}$ on the iteration $t$. The expectation in (39) is taken over all the randomness of the channel state $\mathbf{H}$ and the precoding matrix $\mathbf{G}^{t}(\mathbf{Q})$, including the randomness due to KA. This yields the upper bound $\overline{R} = \sum_{k=1}^{K} \overline{R}_k$ on the achievable sum rate.

Notice that the upper bound $\overline{R}$ was obtained under the condition that the true channel state $\mathbf{H}$ is available at the BS, such that the channel interference matrix $\mathbf{T}^{\mathrm{dl}}$ is fully known to the BS. This yields an upper bound on the achievable ergodic rate in our scenario, where only a noisy version of $\mathbf{H}$ is available at the BS and thus the channel interference matrix $\mathbf{T}^{\mathrm{dl}}$ is not perfectly known at the BS (since $\mathbf{H}$ is not known).

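For the lower bound on the ergodic rate, and following the derivation from [25], we can write:

$$\underline{R}_k = \log_2\left(1 + \frac{\left|\mathbb{E}[T^{\mathrm{dl}}_{k,k}]\right|^2}{\mathbb{V}\left[T^{\mathrm{dl}}_{k,k}\right] + \sum_{k'\neq k} \mathbb{E}\left[|T^{\mathrm{dl}}_{k,k'}|^2\right] + \mathbb{E}\left[\sigma_k^2\right]}\right), \tag{41}$$

which leads to the lower bound $\underline{R} = \sum_{k=1}^{K} \underline{R}_k$ on the ergodic rate. Hence, the DL system ergodic rate with rKA precoding is bounded by:

$$R \in \left[\underline{R}, \overline{R}\right] \equiv \left[\sum_{k=1}^{K} \underline{R}_k, \sum_{k=1}^{K} \overline{R}_k\right]. \tag{42}$$

² Varying across coherence time intervals due to channel randomness, as well as varying across multiple slots inside a coherence block due to internal randomness of KA.

Table I. PRECODING STRUCTURE, MATRIX AND SINR RELATIONS.

| Precoding | Precoding Matrix, $\mathbf{G}$ | $\mathrm{SINR}_k$ | Complexity |
|---|---|---|---|
| ZF ($K \leq M$) | $\alpha\mathbf{H}^H\left(\mathbf{H}\mathbf{H}^H\right)^{-1}$ | $\dfrac{\alpha^2 a_k^2 p_k}{\sigma_n^2}$ | $\mathcal{O}(K^2M) + \mathcal{O}(K^3) + \mathcal{O}(MK)$ [26] |
| ZF ($K > M$) | $\alpha\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{H}^H$ | $\dfrac{\alpha^2 a_k^2 p_k \left\lvert\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_k^H\right\rvert^2}{\alpha^2 \sum_{j=1, j\neq k}^{K} a_k^2 p_j \left\lvert\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H}\right)^{-1}\mathbf{h}_j^H\right\rvert^2 + \sigma_n^2}$ | |
| RCI | $\alpha\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{H}^H$ | $\dfrac{\alpha^2 a_k^2 p_k \left\lvert\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_k^H\right\rvert^2}{\alpha^2 \sum_{j=1, j\neq k}^{K} a_k^2 p_j \left\lvert\mathbf{h}_k\left(\mathbf{H}^H\mathbf{H} + \xi\mathbf{I}_M\right)^{-1}\mathbf{h}_j^H\right\rvert^2 + \sigma_n^2}$ | |
| Iterative rKA | $\mathbf{Q}^H(\mathbf{Q}\mathbf{Q}^H + \xi\mathbf{I}_K)^{-1}$ | $\dfrac{\lvert T^{\mathrm{dl}}_{k,k}\rvert^2}{\sum_{k'\neq k} \lvert T^{\mathrm{dl}}_{k,k'}\rvert^2 + \sigma_k^2}$ (after $t$ iterations) | $\mathcal{O}\left(\dfrac{MK}{(1-\sqrt{\beta})^2}\right) + 2\cdot\mathcal{O}(MK)$ |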

In order to calculate $\mathbf{G}^{t}(\mathbf{Q})$ for each iteration $t$ explicitly, we can apply (54) in Appendix B. As an alternative, one can apply the technique described for computing the precoding matrix directly. As an example, in the DL scenario, we can run $K$ rKAs in parallel with the corresponding inputs $\mathbf{s}_i = \mathbf{e}_i$, $i = 1, \ldots, K$, for calculating the precoding matrix $\mathbf{G}$. Using these concepts, at each instance of the Monte-Carlo (MC) simulation it is possible to compute $\mathbf{G}^{t}(\mathbf{Q})$, producing independent realizations of the random variables needed for computing the upper and lower bounds in (39) and (41), respectively.
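Given a stack of MC realizations of the interference matrix, the bounds (39) and (41) reduce to sample averages. The sketch below assumes a constant noise power and a hypothetical array layout of shape (trials, K, K); the function name is also an assumption.

```python
import numpy as np

def rate_bounds(T_samples, sigma2):
    """Per-user lower (41) and upper (39) bounds from MC samples of T^dl = H G."""
    power = np.abs(T_samples)**2                     # |T_{k,k'}|^2 for each trial
    diag = np.einsum('nkk->nk', T_samples)           # T_{k,k} for each trial
    signal = np.einsum('nkk->nk', power)             # |T_{k,k}|^2
    interf = power.sum(axis=2) - signal              # sum over k' != k
    upper = np.mean(np.log2(1.0 + signal / (interf + sigma2)), axis=0)
    lower = np.log2(1.0 + np.abs(diag.mean(axis=0))**2 /
                    (diag.var(axis=0) + interf.mean(axis=0) + sigma2))
    return lower, upper
```

Summing each returned vector over the users gives the sample estimates of the two ends of the interval in (42).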

V. NUMERICAL RESULTS

This section provides a comparison of the sum-rate capacity of the ZF beamforming, the RCI and the rKA precoders for different system arrangements, including distinct numbers of antennas at the BS and of users in the cell, under perfect and imperfect CSI conditions. Besides, the rKA precoder convergence is evaluated taking into consideration the channel estimation and correlation effects. We evaluate and compare the proposed precoding schemes under equal power allocation (EPA). Moreover, the analytical large-system limit SINR and sum-rate capacity results are compared with the numerical simulation results, aiming to corroborate the precoders' behavior under different system configurations.

A. Simulation Setup

The main simulation parameters are presented in Table II. We have considered a uniformly distributed random position inside the cell for each user. As an example, Fig. 1 depicts an $M \times K \equiv 64 \times 64$ system in an $R = 500$ m macro-cell with uniformly distributed users. We considered an urban cellular radio environment, which leads to a path-loss exponent $b = 3.5$. For correlated channels we have deployed the simple yet effective ULA spatial correlation model of eq. (6) for low and medium spatial correlation, i.e., adopting $c = 0.2$ and $c = 0.5$, respectively. For channel estimation errors, we deployed the model given in eq. (8).

Figure 1. Sample of MT's uniformly random distribution in a 64×64 macro-cell system with Rmax = 500 m, a minimum Rmin = 50 m and an urban cellular radio environment (b = 3.5). [Plot; both axes: distance (m), from −500 to 500; legend: Users, BS, Cell Limits.]

B. rKA Precoder – Typical Operation

Fig. 2 depicts the upper and lower capacity bounds (UB and LB, respectively) for the rKA precoder, and gives an idea of how many iterations $T$ are necessary for the rKA precoder to attain the UB and LB capacity, respectively. Hence, considering typical system and channel parameters, including loading factor $\beta = 0.5$, spatial ULA correlation $c = 0.5$, estimation quality $\tau = 0.1$ and EPA, the number of iterations necessary for the rKA precoder to achieve full convergence of the UB and LB sum-rate capacities is about $T = 400$ iterations.

Table II. SIMULATION PARAMETERS.

| Parameter | Value |
|---|---|
| Cell radius | R = 500 [m] |
| # BS antennas | M ∈ [64, 128] |
| # UE antennas | 1 |
| # mobile users | K ∈ [32, 64] |
| Loading factor | β = 0.5 |
| Regularization parameter | ξ = M/SNR |
| Spatial ULA correlation | c ∈ {0.0; 0.2; 0.5} |
| Estimation quality parameter, (8) | τ ∈ [0, 0.95] |
| Path loss exponent | b = 3.5 |
| Minimum distance UE–BS | dmin = 50 m |
| SNR | γ ∈ [0, 20] [dB] |
| # rKA iterations | T ∈ [10, 900] |
| # trials | T = 500 |

Figure 2. Capacity upper and lower bounds of the rKA precoder for β = 32/64, τ = 0.1, c = 0.5 and T = [100, 250, 300, 350, 400]; equal power allocation. [Plot; x-axis: SNR [dB]; y-axis: rKA spectral efficiency [bps/Hz]; curves: analytical LB and UB, and LB and UB for each value of T.]

1) Numerical and Analytical Capacity for the ZF, RCI and rKA: Numerical and analytical ergodic capacity calculations for the ZF, RCI and rKA precoding schemes under perfect CSI knowledge at the transmitter side are depicted in Fig. 3. For this analysis we have considered various cell-loading ($\beta$) and SNR ($\gamma = \frac{P}{\sigma_n^2}$) conditions. The numerical results were analyzed considering $M \in [64, 128]$. The analytical capacity results for ZF and RCI were obtained through the expressions of the large-system analysis developed in Section IV, and were calculated using normalized equal power values.

Fig. 3 illustrates the numerical sum-rate results of the three analyzed precoding schemes, evaluated for different system dimensions and correlation parameters, i.e., no correlation ($c = 0$) and medium channel correlation ($c = 0.5$). It can be observed that the large-system analytical results fit well with those evaluated by Monte-Carlo simulations. It can also be seen that, when the number of iterations applied is $T_{\text{full convg}}$, the rKA has a performance similar to that of the RCI for all SNR conditions and for different system dimensions.

The ZF precoding, in comparison, presents a more degraded performance than the other two for low SNR values, but reaches similar performance for higher SNR conditions. Also, the difference between the sum-rate of ZF and RCI/rKA becomes larger for higher system dimensions.

Figure 3. Sum-Rate Capacity of the rKA, ZF and RCI precoders, considering EPA, M ∈ [64, 128], K ∈ [32, 64], τ = 0, T = 400 for K = 32 and T = 600 for K = 64. [Panels: a) c = 0; b) c = 0.5; x-axis: SNR [dB]; y-axis: capacity [bps/Hz]; curves: rKA, ZF and RZF for M = 64, K = 32 and for M = 128, K = 64.]

C. rKA Convergence

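In order to analyze the convergence speed of rKA, we compute the minimum gain $\kappa_\chi(\mathbf{B}^H)$ of matrix $\mathbf{B}^H$ along the subspace $\chi$ spanned by the conjugate of the rows of $\mathbf{B}^H$, which is given by:

$$\kappa_\chi(\mathbf{B}^H) = \frac{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H) + \xi}{\|\mathbf{Q}^H\|_F^2 + K\xi}. \tag{43}$$

Since for any $\xi \geq 0$ we have

$$\frac{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}{\|\mathbf{Q}^H\|_F^2} \leq \frac{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H) + \xi}{\|\mathbf{Q}^H\|_F^2 + K\xi}, \tag{44}$$

it is possible to verify that the slowest convergence occurs for $\xi = 0$. Thus, from [12, Proposition 2], this results in a theoretical upper bound on the number of iterations needed for the full convergence of the algorithm, i.e., $T_{\text{full convg}}$, which is given by:

$$T_{\text{full convg}} \leq \frac{\left\|\mathbf{Q}^H\right\|_F^2}{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}. \tag{45}$$

Hence, an order of $\mathcal{O}\left(\frac{\|\mathbf{Q}^H\|_F^2}{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}\right)$ iterations is sufficient for the rKA to attain full convergence, considering any regularization factor $\xi \geq 0$.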
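The inequality (44) follows from a one-line argument, sketched here for completeness. The eigenvalue notation $\lambda_i$ is introduced only for this derivation.

```latex
% Let \lambda_1 \ge \dots \ge \lambda_K = \lambda_{\min} be the eigenvalues of Q Q^H, so that
\|\mathbf{Q}^H\|_F^2 = \mathrm{tr}\left(\mathbf{Q}\mathbf{Q}^H\right) = \sum_{i=1}^{K} \lambda_i \;\ge\; K\lambda_{\min}.

% Subtracting the two sides of (44):
\frac{\lambda_{\min} + \xi}{\|\mathbf{Q}^H\|_F^2 + K\xi} - \frac{\lambda_{\min}}{\|\mathbf{Q}^H\|_F^2}
= \frac{\xi\left(\|\mathbf{Q}^H\|_F^2 - K\lambda_{\min}\right)}
       {\|\mathbf{Q}^H\|_F^2\left(\|\mathbf{Q}^H\|_F^2 + K\xi\right)} \;\ge\; 0
\quad \text{for all } \xi \ge 0 .
```

The gain in (43) is therefore smallest at ξ = 0, which is why (45) is stated without the regularization term.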


Also according to [12], when the channel vectors of the users have no spatial correlation, the full convergence upper bound can be approximated as:

$$T_{\text{full convg}} \leq \frac{K}{(1-\sqrt{\beta})^2}. \tag{46}$$
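Both bounds are cheap to evaluate numerically. The sketch below computes the exact expression (45) from a channel estimate and the approximation (46) from the system dimensions; the function names are assumptions of this example.

```python
import numpy as np

def t_full_exact(Q):
    """Upper bound of Eq. (45): ||Q^H||_F^2 / lambda_min(Q Q^H)."""
    gram = Q @ Q.conj().T
    lam_min = np.linalg.eigvalsh(gram)[0]      # eigenvalues come in ascending order
    return np.sum(np.abs(Q)**2) / lam_min

def t_full_approx(K, M):
    """Approximation of Eq. (46) for uncorrelated channels, with beta = K / M."""
    beta = K / M
    return K / (1.0 - np.sqrt(beta))**2
```

For β = 0.5, (46) evaluates to about 373 iterations for K = 32 and about 746 iterations for K = 64.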

1) rKA Convergence under Spatial Correlation, Channel Estimation and Increasing Number of Antennas: Fig. 4 shows the rKA's full convergence behavior w.r.t. the spatial antenna correlation ($c$) and the channel estimation error ($\tau$), evaluated through (45) for a medium-loading system in a massive MIMO configuration. The convergence directly impacts the algorithm's complexity. As expected, the number of iterations increases with the spatial correlation index but, interestingly, decreases with the channel estimation error. Moreover, as the number of BS antennas $M$ and/or the number of users $K$ increases, the signal processing burden grows and, as a consequence, the full convergence of the rKA requires a higher number of iterations.

Figure 4. Theoretical UB of $T_{\text{full convg}}$ × τ × c, evaluated through (45) for different numbers of antennas and users, but the same system loading β = 0.5. [Surface plots: (a) M = 64 and K = 32; (b) M = 128 and K = 64; axes: τ, c and $T_{\text{full convergence}}$.]

Fig. 5 presents the theoretical convergence results calculated both through the exact expression (45) and the approximation of (46). Fig. 5.a) shows that the theoretical $T_{\text{full convg}}$ values present a linear dependence on the number of BS antennas. However, despite following the same growth pattern as the exact results, the approximations imply a larger number of iterations needed for the full convergence of the rKA.

Furthermore, in Fig. 5.b) we see that the approximation of the theoretical results, evaluated by (46), indicates the need for a larger number of iterations than the exact calculations of (45) for channel correlation conditions in which c < 4. After that, the exact theoretical results of $T_{\text{full convg}}$ exceed the approximations. It is also worth noting that the exact results increase exponentially w.r.t. the spatial antenna correlation.

Figure 5. UBs of rKA's $T_{\text{full convg}}$, considering β = 0.5 and perfect channel estimation (τ = 0). [Panels: a) τ = 0 and c = 0, x-axis: M, curves: Theor. UB and Approx. UB; b) τ = 0, x-axis: c, curves: Theor. UB and Approx. UB for M = 128, K = 64 and for M = 64, K = 32; y-axis: $T_{\text{full cvg}}$.]

Fig. 6 demonstrates that the theoretical UBs on the number of iterations become larger as the loading factor $\beta$ increases. As in the previous figure, the approximated $T_{\text{full convg}}$ values are shown to be higher than the more accurate ones, since an uncorrelated scenario was considered. Also, the difference among the curves seems to become more significant for higher values of $\beta$.

Fig. 7 depicts the numerical system capacity results obtained for the rKA precoder vs. the number of rKA iterations $T$, for $M = 64$, $K = 32$, and different spatial ULA correlation $c$ and channel estimation conditions $\tau$. Following Eq. (45), the upper bound on the number of iterations needed for the convergence of rKA for $M = 64$, $K = 32$ and an uncorrelated scenario is about $T \approx 286$ iterations, while for (46) it is about 373 iterations. Analyzing the results from this figure, we can see that the convergence seems to occur between 250 and 300 iterations. A similar number of iterations is needed for the convergence under low channel correlation conditions, e.g., $c = 0.2$, and slightly higher values of $T$ are required for $c = 0.5$. Also, Fig. 7 indicates that the number of rKA iterations required for the full convergence of the rKA precoder, i.e., $T_{\text{full convg}}$, is progressively reduced as the channel estimation quality parameter becomes larger, with each increase of 0.2 in the value of $\tau$ corresponding to a reduction of about 50 iterations for the full convergence of the algorithm.

Figure 6. UBs of rKA's $T_{\text{full convg}}$ considering perfect channel estimation (τ = 0) and an uncorrelated channel (c = 0). [Plot; x-axis: β, from 0.2 to 0.7; y-axis: $T_{\text{full convergence}}$; curves: theoretical and approximated for M = 128 and for M = 64.]

Figure 7. Convergence of the rKA precoder for β = 32/64 and SNR = 15 dB. [Panels: (a) 0 ≤ τ ≤ 0.2 and 100 ≤ T ≤ 400; (b) 0.4 ≤ τ ≤ 0.95 and 10 ≤ T ≤ 100; each with sub-panels a) c = 0, b) c = 0.2 and c) c = 0.5; x-axis: T; y-axis: capacity [bps/Hz]; curves: A and rKA for each value of τ.]

The trend is the same when the number of antennas $M$ increases but the system loading is kept fixed ($\beta = 0.5$). In the case of $M = 128$, applying (45) under different correlation and channel estimation conditions, the upper bound on the number of iterations needed for the rKA to achieve full convergence in an uncorrelated scenario is about 628 iterations. Meanwhile, this number is about 746 iterations for the approximation of (46). The numerical results confirmed that, for an SNR of 15 dB, the convergence occurs in the range of 550 to 600 iterations. Furthermore, it is straightforward to verify numerically (in the same way as in Fig. 7) and analytically that the algorithm converges faster for higher values of $\tau$, i.e., for lower channel estimation quality. Even though the rKA's full convergence requires a higher number of iterations in this case, increasing the value of $\tau$ by 0.2 results in a reduction of $T_{\text{full convg}}$ of approximately 150 iterations.

Fig. 8 shows the number of iterations for the full convergence of the rKA obtained both theoretically, using (45), and through observation of the numerical results for SNR = 15 dB, such as those presented in Fig. 7. The figure provides a comparison of both results obtained for $\beta = 0.5$, $c = 0.5$ and system dimensions $M = 64$ and $M = 128$.

As can be verified through the figure, the $T_{\text{full convg}}$ values of both results are quite similar for lower $\tau$, i.e., for better channel estimation conditions. However, although both of them present a reduction in the number of iterations for higher values of $\tau$, this effect is much more evident in the numerical results. For the theoretical results, the reduction in the number of iterations for higher $\tau$ becomes more noticeable for $c \geq 0.5$.

[Figure 8: T_full convg (0 to 1000 iterations) versus τ; analytical and simulation curves for a) M = 64, K = 32 and b) M = 128, K = 64.]

Figure 8. T_full convg of the rKA precoder for M ∈ {64, 128}, β = 0.5 and c = 0.5.

2) rKA Convergence w.r.t. SNR: As can be seen from (43), the number of iterations needed for full convergence of the rKA precoder depends on the regularization parameter ξ. Since in this work this term is taken to be inversely proportional to the SNR, the SNR also affects T_full convg.

Table III indicates that T_full convg depends on the SNR, with a higher number of iterations required for higher SNR conditions. It can also be verified that the convergence conditions are highly sensitive to a change in SNR: an increase of 5 dB raises T_full convg by approximately 150 iterations for M = 64 and by approximately 300 iterations for M = 128.

Examining Table III, it is possible to verify that the exact theoretical results are generally in accordance with the numerically obtained values for an SNR of 15 dB. However, for higher SNR regimes, such as 20 dB, and low correlation conditions, these upper bounds fall below the numerically observed values, and the approximated convergence results seem to represent a more adequate upper bound. On the other hand, the approximation does not represent the convergence behavior of the rKA in medium to high correlation scenarios, in which the exact expression appears to be more suitable as an upper bound for T_full convg.

Moreover, the numerical convergence of the rKA occurs for a similar number of iterations under the three correlation conditions analyzed, i.e., c ∈ {0, 0.2, 0.5}. However, T_full convg appears to be slightly higher for c = 0.5, indicating an increase in the number of iterations for medium to high correlation conditions, i.e., for c ≥ 0.5. This result is also in accordance with the behavior of the analytical results evaluated through (45).

D. Capacity under Equal Power Allocation

The numerical results and bounds for the sum-rate capacity of the MIMO precoders are presented and compared under equal power allocation (EPA). The system analysis considers data transmission from a BS with M antennas to K single-antenna users (MTs), characterizing the DL of a single-cell massive MIMO system.

First, we compare the sum-rate capacity attained with the classical ZF and RCI precoders, as well as the rKA precoder, under perfect CSI at the transmitter in all cases. After that, we analyze the capacity degradation due to the noisy and possibly quantized version of the channel matrix, as described by Eq. (8), for different values of τ.

1) Sum-Rate under Perfect CSI: Fig. 9 depicts the sum-rate capacity of ZF, RCI and rKA as a function of the loading factor β for M = 128 antennas and SNR γ = 15 dB. Under these operating conditions, although the performance of RCI is always equal to or better than that obtained by rKA or ZF precoding, the three precoding schemes perform similarly for β ≤ 0.3. For higher values of the loading factor, however, ZF degrades progressively more than the other two schemes, which leads to a considerably inferior performance for β close to unity.

Regarding the behavior of the rKA for different numbers of iterations T, the results show that, for β ≤ 0.2, even T = 200 achieves full convergence of the algorithm. As β increases to 0.3 and 0.4, full convergence appears to occur at about T = 300 and T = 400, respectively. Therefore, a higher cell load requires a higher number of iterations to achieve full convergence of the rKA.
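The comparison between ZF and RCI under equal power allocation can be illustrated with a small Monte Carlo sketch. It is not the simulation code used for Fig. 9: the function names, the unit total transmit power, the i.i.d. Rayleigh channel (c = 0, τ = 0) and the choice ξ = K/SNR are assumptions made here for illustration only.

```python
import numpy as np

def zf_precoder(H):
    """Unnormalized ZF precoder H^H (H H^H)^{-1} for a K x M channel H."""
    return H.conj().T @ np.linalg.inv(H @ H.conj().T)

def rci_precoder(H, xi):
    """Unnormalized RCI precoder H^H (H H^H + xi I_K)^{-1}."""
    K = H.shape[0]
    return H.conj().T @ np.linalg.inv(H @ H.conj().T + xi * np.eye(K))

def sum_rate(H, G, snr_lin):
    """Sum rate [bps/Hz] with equal power per user and unit total transmit power."""
    sigma2 = 1.0 / snr_lin                          # noise power (assumed SNR definition)
    alpha2 = 1.0 / np.trace(G @ G.conj().T).real    # power normalization factor alpha^2
    gains = np.abs(H @ G) ** 2                      # |h_k g_j|^2 for all user pairs
    signal = np.diag(gains)
    interference = gains.sum(axis=1) - signal
    sinr = alpha2 * signal / (alpha2 * interference + sigma2)
    return np.log2(1.0 + sinr).sum()

def average_sum_rates(M, K, snr_db, trials=50, seed=0):
    """Average ZF and RCI sum rates over i.i.d. Rayleigh channel realizations."""
    rng = np.random.default_rng(seed)
    snr_lin = 10 ** (snr_db / 10)
    xi = K / snr_lin                                # assumed regularization, inversely proportional to SNR
    zf_total, rci_total = 0.0, 0.0
    for _ in range(trials):
        H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)
        zf_total += sum_rate(H, zf_precoder(H), snr_lin)
        rci_total += sum_rate(H, rci_precoder(H, xi), snr_lin)
    return zf_total / trials, rci_total / trials
```

Calling `average_sum_rates(128, 64, 15.0)` evaluates one loading point (β = 0.5); sweeping K for fixed M gives a curve of the same kind as those discussed above, under the stated assumptions.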

[Figure 9: capacity [bps/Hz] versus loading factor β (0.2 to 1); curves: rKA with T = 200, 300, 400, 500 and 600, ZF and RCI.]

Figure 9. Sum-rate capacity of the rKA, ZF and RCI precoders for EPA, M = 128, τ = 0, c = 0, and β ∈ [0, 1]

2) Sum-Rate under Imperfect CSI: In this subsection, we focus on the sum-rate capacity of the ZF, RCI and rKA precoders under different quality levels of CSI at the BS, i.e., assuming τ ∈ {0.0, 0.1, 0.2} in the channel model of Eq. (8).

Fig. 10 depicts results for M = 128, K = 64 and correlation parameters c = 0 and c = 0.5. It can be seen that, under low correlation and for the considered number of rKA iterations, both precoders (RCI and rKA) perform almost identically over the whole SNR range, for any τ. Furthermore, the RCI precoder exhibits a performance degradation similar to that experienced by the rKA, even under less accurate channel estimates (τ = 0.2). When the level of channel correlation increases, e.g., c = 0.5, the sum-rate capacity of both precoders is reduced accordingly. The capacities achieved by RCI and rKA are similar to those obtained under the low-correlation scenario. Also, the sum-rate capacity attained with RCI precoding degrades with increasing τ in a way similar to that seen with the rKA under the same channel estimation quality.

On the other hand, even though it yields lower sum-rate capacity than the other schemes at lower SNR values, ZF reaches the same sum-rate performance as RCI and rKA at higher SNR values. Moreover, under less accurate channel estimates (higher values of τ), the ZF and RCI precoders exhibit a performance degradation similar to that experienced by the rKA.

Fig. 11 gives a general idea of how the sum-rate capacity of the three precoding schemes degrades with increasing τ under low to high SNR regimes. Since the condition M = 128, β = 0.5 and SNR = 20 dB was also analyzed, and Table III shows that numerical convergence for this specific scenario occurs between 850 and 900 rKA iterations, T = 900 iterations were used for the rKA in obtaining the results of Fig. 11.

In the different system dimensions and channel conditions considered, the capacity results are reduced for more spatially correlated channels. This degradation is not significant from

Table III
FULL CONVERGENCE (T_full convg) OF RKA PRECODING W.R.T. SNR REGIME UNDER β = 0.5 AND τ = 0

| Antennas M | Correlation c | Simulation, SNR = 15 dB | Simulation, SNR = 20 dB | Theor. (UB) | Approx. (UB) |
|---|---|---|---|---|---|
| 64 | 0 | 250–300 iter. | 400–450 iter. | 286 iter. | 373 iter. |
| 64 | 0.2 | 250–300 iter. | 400–450 iter. | 306 iter. | 373 iter. |
| 64 | 0.5 | 300–350 iter. | 450–500 iter. | 420 iter. | 373 iter. |
| 128 | 0 | 550–600 iter. | 850–900 iter. | 630 iter. | 746 iter. |
| 128 | 0.2 | 550–600 iter. | 850–900 iter. | 667 iter. | 746 iter. |
| 128 | 0.5 | 600–650 iter. | 900–950 iter. | 930 iter. | 746 iter. |

[Figure 10: capacity [bps/Hz] versus SNR [dB] (0 to 20 dB); panels a) c = 0 and b) c = 0.5; legend entries: RCI τ = 0.0; rKA, ZF and RCI for τ = 0.1; rKA, ZF and RCI for τ = 0.2.]

Figure 10. Sum-rate capacity of the rKA, ZF and RCI precoders for EPA, M = 128, K = 64; τ ∈ {0.0, 0.1, 0.2}, c ∈ {0.0, 0.5}, T = 400 for K = 32 and T = 600 for K = 64.

[Figure 11: capacity [bps/Hz] of the rKA, ZF and RCI precoders, horizontal axis from 0 to 1; panels for SNR = 0 dB, SNR = 10 dB and SNR = 20 dB.]

Figure 11. Sum-rate degradation of the rKA, ZF and RCI for M = 128, K = 64, c = 0.5 and T = 900; equal power allocation.

zero to low channel correlation, i.e., from c = 0 to c = 0.2; the capacities for both conditions are quite close to one another. The performance reduction is, however, much more noticeable in medium correlation conditions, i.e., c = 0.5.

Another result verified in all the analyzed situations is the inferior performance of ZF in comparison to RCI and rKA, for all values of τ and SNR considered, and especially for lower channel estimation quality and poor channel conditions. Also, the RCI and rKA precoders perform very similarly in all the analyzed situations.

Regarding the degradation of rKA and RCI caused by the introduction of medium correlation, shown in the figure, these two precoding schemes present, at an SNR of 0 dB, a degradation of about 9.1%, 11.1% and 12.5% for τ = 0, τ = 0.4 and τ = 0.8, respectively. At SNR = 10 dB, the capacity reductions from zero/low to medium correlation conditions are about 11.1%, 17.1% and 19.1% for τ = 0, τ = 0.4 and τ = 0.8, respectively. Likewise, at an SNR of 20 dB the degradation is around 14.3%, 15.91% and 20%, respectively, for τ = 0, τ = 0.4 and τ = 0.8. Thus, in general, the capacity degradation of the rKA and RCI precoders w.r.t. an increase in correlation appears to be higher for better channel conditions, as well as for higher values of τ.

With regard to the degradation of ZF from zero/low to medium correlation conditions, at an SNR of 0 dB the capacity is reduced by 23.5%, 21.4% and 18.2% for τ = 0, τ = 0.4 and τ = 0.8, respectively. Under the same conditions and SNR = 10 dB, the degradations are over 15.4%, 14.3% and 14%. Besides, for SNR = 20 dB these capacity reductions are approximately 14.3%, 20.4% and 14.3% for τ = 0, τ = 0.4 and τ = 0.8, respectively. Therefore, the degradation in the performance of ZF precoding appears to be higher as the channel conditions become worse. With respect to the channel estimation quality, the performance reduction of ZF seems to be similar or lower as τ increases.

Thus, for worse channel conditions, the degradation caused by an increase in channel correlation seems to affect the ZF precoder more significantly. For better channel conditions, however, the rKA and RCI precoders suffer a more perceptible performance reduction under the more correlated conditions. Moreover, an increase in correlation appears to affect ZF more under high channel estimation quality, while the corresponding degradation of rKA and RCI seems to be more pronounced in low channel estimation quality scenarios.


E. Precoders Complexity

Linear precoding techniques are computationally more efficient than their non-linear counterparts. However, most linear precoding schemes, such as ZF and RCI, require a number of arithmetic operations of order $O(K^2 M)$. This leads to an intractable complexity for most linear precoding schemes in the massive MIMO regime [26].

Herein, we analyze the complexities of the RCI, ZF and rKA precoding schemes. These complexities are treated simply as the number of complex addition and multiplication operations needed for a given arithmetic operation [26]. Since the number of floating point operations (flops) needed to implement these complex operations varies according to the deployed hardware and complex number representation, we do not compare the number of flops of the implemented operations. Instead, we focus on the number of complex operations required for each linear precoder to execute its function.

One way of counting the complex operations needed to calculate (20) consists of the following steps [26]:

1) compute the matrix-matrix multiplication $\mathbf{H}\mathbf{H}^H$;
2) add the diagonal matrix $\xi\mathbf{I}_K$ to the result;
3) compute $\mathbf{H}^H(\mathbf{H}\mathbf{H}^H + \xi\mathbf{I}_K)^{-1}$.

The number of complex operations needed to execute these steps is $K^2(2M-1) + K + \frac{K^3}{3} + 2K^2M$. Using the Cholesky factorization to compute step 3), instead of completely inverting the matrix and then using matrix-matrix multiplication, leads to the total cost of

$$C_T = K^2(2M-1) + K + \frac{4K^3}{3} + MK(2K-1).$$

Furthermore, the RCI precoding matrix $\mathbf{G}_{RCI}$ is computed once per coherence time period. Once it has been obtained, the multiplication $\mathbf{G}_{RCI}\mathbf{s}$, considering $\mathbf{P} = \mathbf{I}_K$, which is computed every time the channel is used for DL transmission, requires an extra $M(2K-1)$ operations. Thus, the number of operations is of order $O(K^3) + O(K^2M) + O(MK)$, which is extremely computationally demanding in large MIMO systems. Besides, the known inversion algorithms are complicated to implement in hardware.

The above complexity analysis can also be applied to the ZF precoding scheme. However, in the ZF case, ξ = 0; hence, step 2) is not necessary, which reduces the ZF count by K complex operations compared to RCI.
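To get a feeling for the magnitudes involved, the operation counts stated above can be evaluated directly. The sketch below only transcribes the expressions for C_T, the ZF saving of K operations and the per-channel-use cost M(2K − 1); the function names are illustrative and not part of the original analysis.

```python
def rci_setup_ops(K, M):
    """Complex operations to build the RCI precoder once per coherence period (C_T)."""
    return K**2 * (2 * M - 1) + K + 4 * K**3 / 3 + M * K * (2 * K - 1)

def zf_setup_ops(K, M):
    """ZF skips the addition of xi*I_K (step 2), saving K operations."""
    return rci_setup_ops(K, M) - K

def per_use_ops(K, M):
    """Complex operations for the product G s at every channel use."""
    return M * (2 * K - 1)

if __name__ == "__main__":
    # Dimensions used in the numerical results of this work
    for M, K in [(64, 32), (128, 64)]:
        print(M, K, rci_setup_ops(K, M), zf_setup_ops(K, M), per_use_ops(K, M))
```

Doubling both M and K multiplies the setup cost by roughly eight, which is the cubic scaling that motivates the iterative alternative.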

Considering the rKA precoder, as presented in subsection V-C, a number of iterations of order $O\left(M \cdot \frac{\|\mathbf{Q}^H\|_F^2}{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}\right)$ is sufficient for the rKA computation with M antennas. Indeed, since each step of KA requires M multiplications, the computational complexity of KA under the described assumptions scales with M, where $O\left(\frac{\|\mathbf{Q}^H\|_F^2}{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}\right)$ is the complexity per antenna. Besides, $O(MK)$ operations must be added for computing the sampling probability distribution and $O(MK)$ operations are needed for calculating $\mathbf{Q}^H\mathbf{w}$ in Algorithm 1, resulting in a computational complexity of $O\left(M \cdot \frac{\|\mathbf{Q}^H\|_F^2}{\lambda_{\min}(\mathbf{Q}\mathbf{Q}^H)}\right) + 2\,O(MK)$ for every channel use for data transmission. If the matrix $\mathbf{W}$ is calculated, then it is beneficial to keep the precoding matrix $\mathbf{G}_{rKA}$ in the factorized form $\mathbf{G}_{rKA} = \alpha\mathbf{Q}^H\mathbf{W}$ and to compute first the product $\mathbf{W}\mathbf{s}$ in $\mathbf{x} = \alpha\mathbf{Q}^H\mathbf{W}\mathbf{s}$, which avoids an increase by a factor $O(K)$ in complexity caused by the direct computation of $\mathbf{G}_{rKA}$. Therefore, the rKA has an overall computational complexity much lower than that of the RCI and ZF precoders. The complexities of the analyzed schemes are included in the column “Complexity” of Table I.

VI. CONCLUSIONS

We have analyzed the sum-rate capacity achieved by three low-complexity linear precoders in a single-cell massive MIMO environment equipped with ULA antennas, considering the impact of correlated channels and imperfect channel state information. The ZF, RCI, and iterative randomized Kaczmarz algorithm (rKA) precoding schemes were compared in terms of sum-rate capacity versus complexity. Numerical results demonstrated that, although the number of iterations required for rKA convergence increases with the SNR, spatial channel correlation and cell load, its performance-complexity tradeoff is much more favorable than that of RCI and ZF precoding when operating in realistic massive MIMO scenarios and conditions. This is due to the relatively low complexity of the rKA, which does not depend on channel inversion. Besides, the rKA precoder proved to be much more robust than ZF to an increase in system loading (or interference), and as efficient as RCI under a simple equal power allocation policy. Moreover, imperfect channel estimates degrade the performance of the three precoding techniques similarly.

APPENDIX A
CAPACITY BOUNDS ACHIEVED WITH RKA PRECODING

Lower and upper bounds on the achievable ergodic rate of the rKA are presented in the following [12].

First, consider an arbitrary SLE $\mathbf{A}\mathbf{x} = \mathbf{b}$, and let $\mathbf{x}^t$ be the rKA estimate at iteration t, starting from the zero initialization $\mathbf{x}^0 = \mathbf{0}$. Then, it is demonstrated in [12, Proposition 4] that $\mathbf{x}^t = \mathcal{G}^t(\mathbf{A})\mathbf{b}$, where $\mathcal{G}^t(\mathbf{A})$ is a linear operator that depends on $\mathbf{A}$ and on the internal randomization of KA until iteration t, but does not depend on $\mathbf{b}$. The proof of this proposition is also presented in Appendix B.

The following lower and upper bounds on the achievable ergodic rate when the rKA is run for a specific number of iterations t are derived assuming this proposition.

In the DL, the signals received by the users are given by the vector $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$, as described in Section II. From the proposition described above, it results that the signal obtained by the rKA (Algorithm 1) at a specific iteration t is a random linear function of $\mathbf{s}$, such as:

$$\mathbf{z} = \mathcal{G}^t(\mathbf{Q})\mathbf{s}, \tag{47}$$

which directly leads to

$$\mathbf{w} = \mathcal{G}^t(\mathbf{Q})\mathbf{s}. \tag{48}$$

According to (26) we can write

$$\mathbf{x} = \mathbf{Q}^H\mathcal{G}^t(\mathbf{Q})\mathbf{s} = \mathbf{G}^t(\mathbf{Q})\mathbf{s}, \tag{49}$$

where the resultant $M \times K$ matrix $\mathbf{G}^t(\mathbf{Q})$ depends on the noisy channel state $\mathbf{Q}$ and the internal randomization of KA, but does not depend on the signal $\mathbf{s}$.

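Since we also have

$$\mathbf{x} = \mathbf{G}\mathbf{s} = \mathbf{Q}^H\mathbf{W}\mathbf{s}, \tag{50}$$

we obtain that $\mathbf{G}^t(\mathbf{Q}) = \mathbf{G} \in \mathbb{C}^{M\times K}$. Letting $\mathbf{G}^t(\mathbf{Q}) = [\mathbf{g}_1^t \ldots \mathbf{g}_K^t] \in \mathbb{C}^{M\times K}$, we can then interpret $\mathbf{g}_k^t \in \mathbb{C}^{M\times 1}$ as the precoding vector corresponding to the k-th user at the t-th iteration.

The time-varying³ interference matrix can be written as:

$$\mathbf{T}^{dl} = \mathbf{H}\mathbf{G}^t(\mathbf{Q}) = \mathbf{H}\mathbf{G} \in \mathbb{C}^{K\times K}, \tag{51}$$

where for simplicity we dropped the explicit dependence of $\mathbf{T}^{dl}$ on the iteration t.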

In the following, we use the upper and lower bounds on the capacity employed in [12], which were in fact developed in [25]. By treating the interference as noise and by coding across several coherence blocks, an upper bound on the ergodic capacity can be expressed as:

$$\bar{R}_k = \mathbb{E}\left[\log_2\left(1 + \frac{|T^{dl}_{k,k}|^2}{\sum_{k'\neq k}|T^{dl}_{k,k'}|^2 + \sigma_k^2}\right)\right], \tag{52}$$

where the expectation is taken over all the randomness of the channel state $\mathbf{H}$ and the precoding matrix $\mathbf{G}^t(\mathbf{Q})$, including the randomness due to KA. This yields the upper bound $\bar{R} = \sum_{k=1}^{K}\bar{R}_k$ on the achievable sum rate. For the lower bound on the ergodic rate, it follows from [25] that:

$$\underline{R}_k = \log_2\left(1 + \frac{|\mathbb{E}[T^{dl}_{k,k}]|^2}{\mathrm{Var}[T^{dl}_{k,k}] + \sum_{k'\neq k}\mathbb{E}[|T^{dl}_{k,k'}|^2] + \mathbb{E}[\sigma_k^2]}\right), \tag{53}$$

which leads to the lower bound $\underline{R} = \sum_{k=1}^{K}\underline{R}_k$ on the ergodic rate.
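Both bounds can be estimated from Monte Carlo samples of the interference matrix. The following is a minimal sketch under stated assumptions: the function name `rate_bounds` is hypothetical, the samples of $\mathbf{T}^{dl}$ are assumed to be stacked in an array of shape (N, K, K), and the noise variance is taken as a known constant, so that the expectation of $\sigma_k^2$ in (53) reduces to that constant.

```python
import numpy as np

def rate_bounds(T_samples, sigma2):
    """Per-user estimates of the upper bound (52) and lower bound (53).

    T_samples: complex array of shape (N, K, K), one interference matrix per realization.
    sigma2:    noise variance, assumed identical for all users and realizations.
    """
    gains = np.abs(T_samples) ** 2                    # |T_{k,k'}|^2 per realization
    signal = np.einsum('nkk->nk', gains)              # |T_{k,k}|^2
    interference = gains.sum(axis=2) - signal         # sum over k' != k

    # Eq. (52): expectation of the instantaneous rate
    upper = np.mean(np.log2(1.0 + signal / (interference + sigma2)), axis=0)

    # Eq. (53): statistics of the effective channel T_{k,k}
    diag = np.einsum('nkk->nk', T_samples)
    mean_diag = diag.mean(axis=0)
    var_diag = diag.var(axis=0)                       # mean of |x - mean|^2, real-valued
    lower = np.log2(1.0 + np.abs(mean_diag) ** 2
                    / (var_diag + interference.mean(axis=0) + sigma2))
    return upper, lower
```

Summing each returned vector over the K users gives the corresponding sum-rate estimates. When the interference matrix is deterministic and diagonal, the two expressions coincide, which is a convenient sanity check.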

APPENDIX B
PROOF OF [12, PROPOSITION 4]

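The proof is by induction on t. For t = 0, we have $\mathbf{x}^0 = \mathbf{0}$, which is the zero function of $\mathbf{b}$. Assume that $\mathbf{x}^t = \mathcal{G}^t(\mathbf{A})\mathbf{b}$ holds for the t-th iteration. Denoting by $r(t) \in \{1, \ldots, m\}$ the index of the row of $\mathbf{A}$ selected at iteration t, and $\mathbf{a}_{r(t)} = (\mathbf{A}_{r(t)})^H$, we have:

$$\begin{aligned}
\mathbf{x}^{t+1} &= \mathbf{x}^t + \frac{b_{r(t)} - \langle \mathbf{a}_{r(t)}, \mathbf{x}^t \rangle}{\|\mathbf{a}_{r(t)}\|^2}\,\mathbf{a}_{r(t)} \\
&\overset{(i)}{=} \left(\mathbf{I}_n - \frac{\mathbf{a}_{r(t)}\mathbf{a}_{r(t)}^H}{\|\mathbf{a}_{r(t)}\|^2}\right)\mathbf{x}^t + \frac{\mathbf{a}_{r(t)}\mathbf{e}_{r(t)}^H}{\|\mathbf{a}_{r(t)}\|^2}\,\mathbf{b} \\
&\overset{(ii)}{=} \left(\left(\mathbf{I}_n - \frac{\mathbf{a}_{r(t)}\mathbf{a}_{r(t)}^H}{\|\mathbf{a}_{r(t)}\|^2}\right)\mathcal{G}^t(\mathbf{A}) + \frac{\mathbf{a}_{r(t)}\mathbf{e}_{r(t)}^H}{\|\mathbf{a}_{r(t)}\|^2}\right)\mathbf{b} \\
&\triangleq \mathcal{G}^{t+1}(\mathbf{A})\,\mathbf{b},
\end{aligned} \tag{54}$$

where in (i), $\mathbf{e}_{r(t)} \in \mathbb{C}^{m\times 1}$ is the canonical vector with “1” at index r(t) and “0” elsewhere, leading to $b_{r(t)} = \mathbf{e}_{r(t)}^H\mathbf{b}$, and in (ii) the induction hypothesis $\mathbf{x}^t = \mathcal{G}^t(\mathbf{A})\mathbf{b}$ is used. This implies that $\mathbf{x}^{t+1} = \mathcal{G}^{t+1}(\mathbf{A})\mathbf{b}$, where the linear operator $\mathcal{G}^{t+1}(\mathbf{A})$ depends on $\mathbf{A}$ and on the internal randomization of KA until iteration t + 1 but does not depend on $\mathbf{b}$.

³ Varying across multiple coherence time intervals due to channel randomness, as well as varying across slots inside a coherence block due to internal randomness of KA.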
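The recursion in (54) can be checked numerically. The sketch below is not Algorithm 1 of this work: it is a generic randomized Kaczmarz iteration on an arbitrary system, with rows sampled with probability proportional to their squared norm (an assumption made here), which tracks the operator alongside the iterate. The function name is illustrative.

```python
import numpy as np

def kaczmarz_with_operator(A, b, num_iters, seed=0):
    """Run randomized Kaczmarz from x^0 = 0 and accumulate G^t such that x^t = G^t b."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)
    probs = row_norms2 / row_norms2.sum()      # assumed row-sampling distribution
    x = np.zeros(n, dtype=complex)
    G = np.zeros((n, m), dtype=complex)        # G^0 is the zero operator
    eye_n = np.eye(n)
    for _ in range(num_iters):
        r = rng.choice(m, p=probs)
        a = A[r].conj()                        # a_r = (A_r)^H as a column vector
        # Kaczmarz update: <a_r, x> = a_r^H x = A[r] @ x
        x = x + (b[r] - A[r] @ x) / row_norms2[r] * a
        # Operator recursion from (54): projector applied to G^t plus a_r e_r^H / ||a_r||^2
        projector = eye_n - np.outer(a, a.conj()) / row_norms2[r]
        e = np.zeros(m)
        e[r] = 1.0
        G = projector @ G + np.outer(a, e) / row_norms2[r]
    return x, G
```

Running the function twice with the same seed and two different right-hand sides returns the same operator, illustrating that it depends on the matrix and on the random row choices but not on the right-hand side.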

APPENDIX C
ZF AND RCI SUM-RATE UNDER LARGE SYSTEM REGIME

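Let us consider the SINR formulation for the RCI precoder in Eq. (25), with a finite-size system and path-loss coefficients for each user, so that each user has a specific path-loss value. We can then rewrite (25) as:

$$\mathrm{SINR}_k^{RCI} = \frac{\alpha^2 a_k^2 S_k}{\alpha^2 a_k^2 I_k + \sigma_n^2} = \frac{\alpha^2 a_k^2 \frac{|A_k|^2}{(1+A_k)^2}}{\alpha^2 a_k^2 I_k + \sigma_n^2}, \tag{55}$$

where $A_k = \mathbf{h}_k(\mathbf{H}_k^H\mathbf{H}_k + \xi\mathbf{I}_M)^{-1}\mathbf{h}_k^H$ and

$$I_k = \sum_{j\neq k}\left|\mathbf{h}_k(\mathbf{H}_k^H\mathbf{H}_k + \xi\mathbf{I}_M)^{-1}\mathbf{h}_j^H\right|^2,$$

and where $\alpha^2$, $A_k$ and $I_k$ are random quantities which depend on $\mathbf{H}$ [11].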

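Let us assume $\mathbf{H} = \frac{1}{\sqrt{M}}\tilde{\mathbf{H}}$ and $\mathbf{h}_k = \frac{1}{\sqrt{M}}\tilde{\mathbf{h}}_k$, where, as in (4), $\tilde{\mathbf{h}}_k \sim \mathcal{CN}(\mathbf{0}_{1\times M}, \mathbf{I}_M)$ is the uncorrelated small-scale channel fading vector for user k. Then $A_k$ and $I_k$ can be rewritten respectively as:

$$A_k = \tilde{\mathbf{h}}_k\left(\tilde{\mathbf{H}}_k^H\tilde{\mathbf{H}}_k + \rho\boldsymbol{\Phi}^{-1}\right)^{-1}\tilde{\mathbf{h}}_k^H \tag{56}$$

and

$$I_k = \sum_{j\neq k}\left|\tilde{\mathbf{h}}_k\left(\tilde{\mathbf{H}}_k^H\tilde{\mathbf{H}}_k + \rho\boldsymbol{\Phi}^{-1}\right)^{-1}\tilde{\mathbf{h}}_j^H\right|^2. \tag{57}$$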

In the large system regime, $A_k$ converges almost surely to the deterministic quantity ν [24], which can be found through the expression:

$$\nu = \rho(1+\nu)^2E_{12} + \beta E_{22} + \beta\nu E_{22}, \tag{58}$$

where ρ = ξ/M is the normalized regularization parameter and $E_{ij}$ is given by [24]

$$E_{ij} = \mathbb{E}\left[\frac{T^i}{(\rho(1+\nu) + \beta T)^j}\right], \tag{59}$$

where the expectation is taken over the random variable T, whose distribution function is the limiting eigenvalue distribution of the correlation matrix $\boldsymbol{\Phi}$.

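The interference term $I_k$ converges almost surely (a.s.) to:

$$I_k \xrightarrow{a.s.} \frac{\beta E_{22}}{(1-\beta E_{22})(1+\nu)^2}. \tag{60}$$

Besides, the term $\alpha^2$ can be shown to converge a.s. to:

$$\alpha^2 \xrightarrow{a.s.} \frac{P_T(1-\beta E_{22})}{\beta E_{12}}. \tag{61}$$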

Combining the large system limit results for $A_k$, $I_k$ and $\alpha^2$, the SINR of user k converges almost surely to the limiting deterministic SINR:

$$\mathrm{SINR}_k^{\infty,RCI} = \frac{\frac{\gamma_k}{\beta}\nu^2(1-\beta E_{22})}{\gamma_k E_{22} + (1+\nu)^2E_{12}}, \tag{62}$$

where $\gamma_k = \frac{P_T a_k^2}{\sigma_n^2}$ is the effective SNR.

According to the exponential correlation model presented in Section II, Eq. (6), we calculate $E_{12}$ and $E_{22}$ as in [24, eq. (4.20) and (4.21)]. Based on these considerations, ν can be obtained through:

$$\nu = \frac{1}{\sqrt{\left(\rho\frac{1+c}{1-c} + \frac{\beta}{1+\nu}\right)\left(\rho\frac{1-c}{1+c} + \frac{\beta}{1+\nu}\right)}}. \tag{63}$$

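Also, the limiting SINR in (62) can be rewritten as

$$\mathrm{SINR}_k^{\infty,RCI} = \nu\,\frac{\gamma_k\Theta_1 + \frac{\rho\gamma_k}{\beta}(1+\nu)^2\Theta_2}{\gamma_k\Theta_1 + (1+\nu)^2\Theta_2}, \tag{64}$$

where $\Theta_1 = \rho(1+\nu)(1+c^2) + \beta(1-c^2)$ and $\Theta_2 = \rho(1+\nu)(1-c^2) + \beta(1+c^2)$.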
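Since (63) defines ν only implicitly, it has to be solved numerically before (64) can be evaluated. The sketch below uses a plain fixed-point iteration started at ν = 0; this solver choice, the tolerance and the function names are assumptions for illustration, not the procedure used in [24].

```python
import math

def solve_nu(rho, beta, c, tol=1e-12, max_iter=100000):
    """Solve the fixed-point equation (63) for nu by successive substitution."""
    a = (1 + c) / (1 - c)
    nu = 0.0
    for _ in range(max_iter):
        t = beta / (1 + nu)
        new_nu = 1.0 / math.sqrt((rho * a + t) * (rho / a + t))
        if abs(new_nu - nu) < tol:
            return new_nu
        nu = new_nu
    return nu

def rci_limiting_sinr(gamma_k, rho, beta, c):
    """Limiting RCI SINR of Eq. (64) for effective SNR gamma_k."""
    nu = solve_nu(rho, beta, c)
    theta1 = rho * (1 + nu) * (1 + c**2) + beta * (1 - c**2)
    theta2 = rho * (1 + nu) * (1 - c**2) + beta * (1 + c**2)
    num = gamma_k * theta1 + (rho * gamma_k / beta) * (1 + nu) ** 2 * theta2
    den = gamma_k * theta1 + (1 + nu) ** 2 * theta2
    return nu * num / den
```

For c = 0, (63) reduces to the quadratic ρν² + (ρ + β − 1)ν − 1 = 0, whose positive root gives a closed-form value against which the iteration can be compared.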

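For the ZF precoder, the limiting SINR under the considered correlation model can be expressed as [24]:

$$\mathrm{SINR}_k^{\infty,ZF} = \begin{cases} \dfrac{\gamma_k(1-\beta^2)}{\beta^2\left(\mu + \sqrt{\mu^2 - 1 + 1/\beta^2}\right)}, & \beta < 1 \\[2ex] \dfrac{\gamma_k(\beta-1)}{\gamma_k(\beta-1)^2 + \mu\beta^2}, & \beta > 1, \end{cases} \tag{65}$$

where $\mu = \frac{1+c^2}{1-c^2}$.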

REFERENCES

[1] S. Moradi, R. Doostnejad, and G. Gulak, “Downlink beamforming for FDD systems with precoding and beam steering,” in 2011 IEEE Global Telecommunications Conference - GLOBECOM 2011, Dec 2011, pp. 1–6.

[2] M. Bengtsson and B. Ottersten, “Uplink and downlink beamforming for fading channels,” in 1999 2nd IEEE Workshop on Signal Processing Advances in Wireless Communications (Cat. No.99EX304), 1999, pp. 350–353.

[3] M. Costa, “Writing on dirty paper (corresp.),” IEEE Transactions on Information Theory, vol. 29, no. 3, pp. 439–441, May 1983.

[4] H. Yang and T. L. Marzetta, “Performance of conjugate and zero-forcing beamforming in large-scale antenna systems,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 172–179, February 2013.

[5] A. Wiesel, Y. C. Eldar, and S. Shamai, “Zero-forcing precoding and generalized inverses,” IEEE Transactions on Signal Processing, vol. 56, no. 9, pp. 4409–4418, Sept 2008.

[6] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication-part I: channel inversion and regularization,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 195–202, Jan 2005.

[7] Y. Jiang, X. Zhu, E. Lim, Y. Huang, and H. Lin, Semi-Blind Carrier Frequency Offset Estimation and Channel Equalization, ser. SpringerBriefs in Electrical and Computer Engineering. Springer International Publishing, 2015.

[8] V. K. Nguyen and J. S. Evans, “Multiuser transmit beamforming via regularized channel inversion: A large system analysis,” in IEEE GLOBECOM 2008 - 2008 IEEE Global Telecommunications Conference, Nov 2008, pp. 1–4.

[9] H. Q. Ngo, E. G. Larsson, and T. L. Marzetta, “Energy and spectral efficiency of very large multiuser MIMO systems,” IEEE Transactions on Communications, vol. 61, no. 4, pp. 1436–1449, April 2013.

[10] R. Couillet and M. Debbah, Random Matrix Methods for Wireless Communications. New York, NY, USA: Cambridge University Press, 2011.

[11] R. Muharar and J. Evans, “Optimal power allocation for multiuser transmit beamforming via regularized channel inversion,” in 2011 Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), Nov 2011, pp. 1393–1397.

[12] M. N. Boroujerdi, S. Haghighatshoar, and G. Caire, “Low-complexity statistically robust precoder/detector computation for massive MIMO systems,” arXiv: Cornell University Repository, vol. 1, no. 1, pp. 1–32, Jul 2018.

[13] S. Kaczmarz, “Angenäherte Auflösung von Systemen linearer Gleichungen,” Bulletin International de l’Académie Polonaise des Sciences et des Lettres. Classe des Sciences Mathématiques et Naturelles. Série A, Sciences Mathématiques, pp. 335–357, 1937.

[14] T. Strohmer and R. Vershynin, “A randomized solver for linear systems with exponential convergence,” in Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, J. Díaz, K. Jansen, J. D. P. Rolim, and U. Zwick, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006, pp. 499–507.

[15] D. Needell, R. Ward, and N. Srebro, “Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 1017–1025.

[16] Y. Chi and Y. M. Lu, “Kaczmarz method for solving quadratic equations,” IEEE Signal Processing Letters, vol. 23, no. 9, pp. 1183–1187, Sept 2016.

[17] M. A. Khalighi, J. M. Brossier, G. V. Jourdain, and K. Raoof, “Water filling capacity of Rayleigh MIMO channels,” in 12th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications. PIMRC 2001. Proceedings (Cat. No.01TH8598), vol. 1, Sep 2001, pp. A–155–A–158 vol.1.

[18] S. Qiu, D. Chen, D. Qu, K. Luo, and T. Jiang, “Downlink precoding with mixed statistical and imperfect instantaneous CSI for massive MIMO systems,” IEEE Transactions on Vehicular Technology, vol. 67, no. 4, pp. 3028–3041, April 2018.

[19] G. Levin and S. Loyka, “On capacity-maximizing angular densities of multipath in MIMO channels,” in 2010 IEEE 72nd Vehicular Technology Conference - Fall, Sept 2010, pp. 1–5.

[20] J. Li, X. Su, J. Zeng, Y. Zhao, S. Yu, L. Xiao, and X. Xu, “Codebook design for uniform rectangular arrays of massive antennas,” in 2013 IEEE 77th Vehicular Technology Conference (VTC Spring), June 2013, pp. 1–5.

[21] S. Wagner, R. Couillet, M. Debbah, and D. T. M. Slock, “Large system analysis of linear precoding in correlated MISO broadcast channels under limited feedback,” IEEE Transactions on Information Theory, vol. 58, no. 7, pp. 4509–4537, July 2012.

[22] C. B. Peel, B. M. Hochwald, and A. L. Swindlehurst, “A vector-perturbation technique for near-capacity multiantenna multiuser communication-part I: channel inversion and regularization,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 195–202, Jan 2005.

[23] J. Hoydis, S. ten Brink, and M. Debbah, “Massive MIMO in the UL/DL of cellular networks: How many antennas do we need?” IEEE Journal on Selected Areas in Communications, vol. 31, no. 2, pp. 160–171, February 2013.

[24] R. Muharar, “Multiuser precoding in wireless communication systems,” Ph.D. dissertation, The University of Melbourne, Department of Electrical and Electronic Engineering, 2012.

[25] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.

[26] A. Müller, A. Kammoun, E. Björnson, and M. Debbah, “Linear precoding based on truncated polynomial expansion - part I: large-scale single-cell systems,” CoRR, vol. abs/1310.1806, 2013. [Online]. Available: http://arxiv.org/abs/1310.1806


Bibliography

BOROUJERDI, M. N.; HAGHIGHATSHOAR, S.; CAIRE, G. Low-complexitystatistically robust precoder/detector computation for massive mimo systems.arXiv: Cornell University Repository, v. 1, n. 1, p. 1–32, Jul 2018.

DAI LINGLONG; WANG, B.; YUAN, Y.; HAN, S.; I, C.-L.; WANG, Z. Non-orthogonal multiple access for 5g: Solutions, challenges, opportunities, and futureresearch trends. IEEE Communications Magazine, September 2015.

DING, Z.; LEI, X.; KARAGIANNIDIS, G. K.; SCHOBER, R.; YUAN, J.;BHARGAVA, V. K. A survey on non-orthogonal multiple access for 5g networks:Research challenges and future trends. IEEE Journal on Selected Areas inCommunications, v. 35, n. 10, p. 2181–2195, Oct 2017. ISSN 0733-8716.

G., L. E.; EDFORS, S. O.; TUFVESSON, S. F.; MARZETTA, S. T. L. Massivemimo for next generation wireless systems. IEEE Communications Magazine,v. 52, p. 186–195, Feb 2014.

LIU, Y.; QIN, Z.; ELKASHLAN, M.; DING, Z.; NALLANATHAN, A.; HANZO,L. Non-orthogonal multiple access for 5g and beyond. Proceedings of theIEEE, 10 2017.

LU, L.; LI, G. Y.; SWINDLEHURST, A. L.; ASHIKHMIN, A.; ZHANG, R. Anoverview of massive mimo: Benefits and challenges. IEEE Journal of SelectedTopics in Signal Processing, v. 8, n. 5, p. 742–758, Oct 2014. ISSN 1932-4553.

WONG, V.; SCHOBER, R.; NG, D.; WANG, L. Key Technologies for 5GWireless Systems. : Cambridge University Press, 2017. ISBN 9781316781067.
