Discovering the Hidden Structure of Global Money Flows
Please cite the following paper if you feel that your research benefits from the codes and ideas in this chapter:
The World Bank data
The World Bank provides annual statistics of global migration and remittances in the Migration and Remittances Factbook. By analyzing this data set we are able to track the impact of globalization, in particular, the movement of labor and the flow of currency between countries.
I downloaded the file "Bilateral Remittance Matrix 2012" from the above hyperlink. This is a Microsoft Excel file, so we need the openpyxl package to handle it.
import sys, random
import numpy as np
import pylab as plt
import networkx as nx
import matplotlib.cm as cm
from collections import defaultdict
from openpyxl import load_workbook
wb = load_workbook(filename = '/Users/lingfeiw/Desktop/BilateralRemittanceMatrix2012.xlsx', read_only=True)
sheet = wb.get_sheet_by_name(wb.get_sheet_names()[0])
data=[]
sheet = wb.get_sheet_by_name('Bilateral Remittance Matrix2012')
for row in sheet.iter_rows():
data_row = []
for cell in row:
data_row += [cell.internal_value]
data += [data_row]
Construct money flow network
countries=[]
for i in range(2, sheet.max_column-1):
countries.append(sheet.cell(row=2, column=i).value.encode('utf-8').strip())
G=nx.DiGraph()
for i in range(len(countries)):
rowth = i+2
for j in range(len(countries)):
colth = j+1
w=data[rowth][colth]
if w:
G.add_edge(countries[i],countries[j],weight=float(w))
len(G.nodes()),len(G.edges())
The above codes construct a network in which nodes are countries and the directed edges are the money flows from one country to another. The weights on edges shows the amount of currency in million of U.S. dollars. From example, testing
G['Australia']['Zambia']
shows that in 2012 there are 2.2 million U.S. dollars transferred from Australia to Zambia in currency. There are 201 countries and 7660 edges in the network.
An algorithm to quantify the hierarchy of networks
The hierarchy of directed, weighted networks is widely observed in real-world systems, yet the methods to quantify it remains unclear. Jiang Zhang et al. proposed a Markov-chain based technique to calculate the "flow distance" of nodes, i.e., the average number of steps a random walker takes to reach the investigated node.
The above figure shows two examples of flow networks and the calculated flow distance of each node (marked in red). Two artificial nodes, source and sink, are added to balance the network such that weighted in-degree (the sum of weights over inbound edges) equals weighted out-degree (the sum of weights over outbound edges) on each node except for source and sink themselves. As shown in Panel A, this method gives a finite flow distance for nodes in loops.
This algorithm can be used to quantify various real-world patterns relevant to the hierarchical structure of networks, such as the trophic level of food chains, and the direction of influence between opinion leaders and their followers. Here we will use it to discover the direction of money flow between countries (from countries of lower flow distance to those of higher flow distance).
def flowDistanceFromSource(G): #input a balanced nx graph
R = G.reverse()
mapping = {'source':'sink','sink':'source'}
H = nx.relabel_nodes(R,mapping)
#---------initialize flow distance dict------
L = dict((i,1) for i in G.nodes())
#---------prepare weighted out-degree dict------
T = G.out_degree(weight='weight')
#---------iterate until converge------------
ls = np.array(L.values())
delta = len(L)*0.01 + 1
k=0
while delta > len(L)*0.01:
k+=1
if k>20:
break
for i in L:
l=1
for m,n in H.edges(i):
l+=L[n]*H[m][n].values()[0]/float(T[m])
L[i]=l
delta = sum(np.abs(np.array(L.values()) - ls))
ls = np.array(L.values())
#---------clean the result-------
del L['sink']
for i in L:
L[i]-=1
L['sink'] = L.pop('source')
return L
def flowBalancing(G):
RG = G.reverse()
E=defaultdict(lambda:0)
for x,y in G.edges():
E[(x,y)]+=G[x][y]['weight']
def nodeBalancing(node):
outw=0
for i in G.edges(node):
outw+=G[i[0]][i[1]].values()[0]
inw=0
for i in RG.edges(node):
inw+=RG[i[0]][i[1]].values()[0]
deltaflow=inw-outw
if deltaflow > 0:
E[(node,'sink')]+=deltaflow
#H.add_edge(node, "sink",weight=deltaflow)
elif deltaflow < 0:
E[('source',node)]+=abs(deltaflow)
#H.add_edge("source", node, weight=abs(deltaflow))
else:
pass
for i in G.nodes():
nodeBalancing(i)
H = nx.DiGraph()
for x,y in E:
H.add_edge(x,y, weight=E[(x,y)])
if ("source", "source") in H.edges(): H.remove_edge("source", "source")
if ("sink", "sink") in H.edges(): H.remove_edge("sink", "sink")
return H
The above codes would calculate the flow distance from "source" (which represent "the environment" of the economic system) to nodes.
Tree layout function
To generate a neat tree layout turns out to be more difficult that I thought. Thanks to Bill Mill, we can get nice Python codes from his blog. Putting all the relevant codes here would make this chapter a little bit longer than others. But I prefer to do this, so that you do not suffer from gathering the scattered pieces of codes from Bill's blog and make them work.
#Tree ploting functions
# from http://billmill.org/pymag-trees/
class Tree:
def __init__(self, node="", *children):
self.node = node
self.width = len(node)
if children: self.children = children
else: self.children = []
def __str__(self):
return "%s" % (self.node)
def __repr__(self):
return "%s" % (self.node)
def __getitem__(self, key):
if isinstance(key, int) or isinstance(key, slice):
return self.children[key]
if isinstance(key, str):
for child in self.children:
if child.node == key: return child
def __iter__(self): return self.children.__iter__()
def __len__(self): return len(self.children)
def addChild(self,nodeName): self.children.append(nodeName)
class DrawTree(object):
def __init__(self, tree, parent=None, depth=0, number=1):
self.x = -1.
self.y = depth
self.tree = tree
self.children = [DrawTree(c, self, depth+1, i+1)
for i, c
in enumerate(tree.children)]
self.parent = parent
self.thread = None
self.mod = 0
self.ancestor = self
self.change = self.shift = 0
self._lmost_sibling = None
#this is the number of the node in its group of siblings 1..n
self.number = number
def left(self):
return self.thread or len(self.children) and self.children[0]
def right(self):
return self.thread or len(self.children) and self.children[-1]
def lbrother(self):
n = None
if self.parent:
for node in self.parent.children:
if node == self: return n
else: n = node
return n
def get_lmost_sibling(self):
if not self._lmost_sibling and self.parent and self != \
self.parent.children[0]:
self._lmost_sibling = self.parent.children[0]
return self._lmost_sibling
lmost_sibling = property(get_lmost_sibling)
def __str__(self): return "%s: x=%s mod=%s" % (self.tree, self.x, self.mod)
def __repr__(self): return self.__str__()
def buchheim(tree):
dt = firstwalk(DrawTree(tree))
min = second_walk(dt)
if min < 0:
third_walk(dt, -min)
return dt
def third_walk(tree, n):
tree.x += n
for c in tree.children:
third_walk(c, n)
def firstwalk(v, distance=1.):
if len(v.children) == 0:
if v.lmost_sibling:
v.x = v.lbrother().x + distance
else:
v.x = 0.
else:
default_ancestor = v.children[0]
for w in v.children:
firstwalk(w)
default_ancestor = apportion(w, default_ancestor, distance)
#print "finished v =", v.tree, "children"
execute_shifts(v)
midpoint = (v.children[0].x + v.children[-1].x) / 2
ell = v.children[0]
arr = v.children[-1]
w = v.lbrother()
if w:
v.x = w.x + distance
v.mod = v.x - midpoint
else:
v.x = midpoint
return v
def apportion(v, default_ancestor, distance):
w = v.lbrother()
if w is not None:
#in buchheim notation:
#i == inner; o == outer; r == right; l == left; r = +; l = -
vir = vor = v
vil = w
vol = v.lmost_sibling
sir = sor = v.mod
sil = vil.mod
sol = vol.mod
while vil.right() and vir.left():
vil = vil.right()
vir = vir.left()
vol = vol.left()
vor = vor.right()
vor.ancestor = v
shift = (vil.x + sil) - (vir.x + sir) + distance
if shift > 0:
move_subtree(ancestor(vil, v, default_ancestor), v, shift)
sir = sir + shift
sor = sor + shift
sil += vil.mod
sir += vir.mod
sol += vol.mod
sor += vor.mod
if vil.right() and not vor.right():
vor.thread = vil.right()
vor.mod += sil - sor
else:
if vir.left() and not vol.left():
vol.thread = vir.left()
vol.mod += sir - sol
default_ancestor = v
return default_ancestor
def move_subtree(wl, wr, shift):
subtrees = wr.number - wl.number
#print wl.tree, "is conflicted with", wr.tree, 'moving', subtrees, 'shift', shift
#print wl, wr, wr.number, wl.number, shift, subtrees, shift/subtrees
wr.change -= shift / subtrees
wr.shift += shift
wl.change += shift / subtrees
wr.x += shift
wr.mod += shift
def execute_shifts(v):
shift = change = 0
for w in v.children[::-1]:
#print "shift:", w, shift, w.change
w.x += shift
w.mod += shift
change += w.change
shift += w.shift + change
def ancestor(vil, v, default_ancestor):
#the relevant text is at the bottom of page 7 of
#"Improving Walker's Algorithm to Run in Linear Time" by Buchheim et al, (2002)
#http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.16.8757&rep=rep1&type=pdf
if vil.ancestor in v.parent.children:
return vil.ancestor
else:
return default_ancestor
def second_walk(v, m=0, depth=0, min=None):
v.x += m
v.y = depth
if min is None or v.x < min:
min = v.x
for w in v.children:
min = second_walk(w, m + v.mod, depth+1, min)
return min
def generateTree(edgeDic):
allNodes={}
for k,v in edgeDic.items():
if k in allNodes:
n=allNodes[k]
else:
n=Tree(k,)
allNodes[k]=n
for s in v:
if s in allNodes:
cn=allNodes[s]
else:
cn=Tree(s,)
allNodes[s]=cn
allNodes[k].addChild(cn)
return allNodes
def getThreePos(root,pos={},edges=[]):
pos[str(root.tree)]=[root.x,root.y]
if root.children:
for child in root.children:
edges.append([str(root.tree),str(child.tree)])
getThreePos(child,pos,edges)
return pos,edges
Calculate the "trophic levels" of countries
With the codes of flow distance calculation and tree layout generation at hands, we can now calculate and visualize the "trophic levels" of countries in global money flow. To prepare data for tree layout, we trim the network into a tree such that each node (country) only connect to the largest upstream node (country). Note that this tree pruning only affects the visualization and does not have an impact on the calculation of flow distance. The flow distance is always calculated using the directed, weighted network.
H = flowBalancing(G)
L = flowDistanceFromSource(H)
T = H.out_degree(weight='weight')
R = H.reverse()
L['source']=0
S=defaultdict(lambda:[])
for i in H.nodes():
if i!='source':
es=R[i]
w,k=sorted([(es[j]['weight'],j) for j in es],reverse=True)[0]
S[k].append(i)
treeDic=generateTree(S)
tree=treeDic['source']
d = buchheim(tree)
pos,edges=getThreePos(d)
Plot the tree of money flow
The following codes compare two versions of visualization:
fig = plt.figure(figsize=(12, 6),facecolor='white')
ax = fig.add_subplot(121)
for i in pos:
x,y=pos[i]
plt.scatter(x,y,s=T[i]/1000.0,facecolor='RoyalBlue',alpha=0.9,edgecolor='',zorder=2)
for i,j in edges:
x1,y1=pos[i]
x2,y2=pos[j]
ax.annotate("",
xy=(x1, y1), xycoords='data',
xytext=(x2, y2), textcoords='data',
arrowprops=dict(arrowstyle="<-", #linestyle="dashed",
color="0.5",
alpha=0.2,
shrinkA=5, shrinkB=5,
patchA=None,
patchB=None,
connectionstyle='angle3,angleA=90,angleB=0',
),zorder=1
)
ax = fig.add_subplot(122)
for i in pos:
x,y=pos[i]
l=L[i]
plt.scatter(x,l,s=T[i]/1000.0,facecolor='Chocolate',alpha=0.9,edgecolor='',zorder=2)
for i,j in edges:
x1,y1=pos[i]
l1=L[i]
x2,y2=pos[j]
l2=L[j]
ax.annotate("",
xy=(x1, l1), xycoords='data',
xytext=(x2, l2), textcoords='data',
arrowprops=dict(arrowstyle="<-", #linestyle="dashed",
color="0.5",
alpha=0.2,
shrinkA=5, shrinkB=5,
patchA=None,
patchB=None,
connectionstyle='angle3,angleA=90,angleB=0',
),zorder=1
)
#plt.savefig('/Users/lingfeiw/Desktop/tree.png')
The data points (countries) in two panels share the same x-axis values, but they are different in y-axis. The points in the left panel simply takes the n-th branching level as the vertical value. This is very bias because a lot of links are removed in tree construction. The points in the right panel use the calculated "flow distance" as the y values. As the flow distance is calculated using the information of the entire directed, weighted network, it is a more general and valid measure. Still, it is interesting to observe that these two layouts are actually very similar to each other.
A better version
We can adjust the color scheme and change the shape and size of arrows and nodes to generate a more appealing and informative version as follows:
fig = plt.figure(figsize=(8, 10),facecolor='black')
ax = fig.add_subplot(111)
for i in pos:
x,y=pos[i]
l=L[i]
plt.scatter(l,x,s=T[i]/800.0,facecolor='#999933',alpha=0.9,edgecolor='',zorder=2)
if T[i]>3000:
plt.text(l,x,i,rotation=0,ha='left',va='bottom',fontsize=np.log(T[i]),color='#88CCEE')
for i,j in edges:
x1,y1=pos[i]
l1=L[i]
x2,y2=pos[j]
l2=L[j]
ax.annotate("",xy=(l1,x1), xycoords='data',xytext=(l2,x2), textcoords='data',
arrowprops=dict(arrowstyle="<-", color="0.5",alpha=0.5,shrinkA=5, shrinkB=5,
patchA=None,patchB=None,connectionstyle='angle3,angleA=90,angleB=0',),zorder=1)
plt.ylim(-5,130)
ax.set_axis_bgcolor('black')
#plt.savefig('/Users/lingfeiw/Desktop/nicetree.png')
Conclusion and Discussion
Form the above figure we find that money generally flows from developed countries to developing countries. It is probably because many immigrants from developing countries tends to find jobs in places with better economics and send money back to their mother countries.
The algorithm introduced in this chapter also provide an approach to investigate many other interesting social questions, such as the global capitalism, the different type of economics across countries. But I would stop here and leave these questions for the readers.