Discovering the Hidden Structure of Global Money Flows

Please cite the following paper if you feel that your research benefits from the codes and ideas in this chapter:

Wang, C. J., Wu, L., Zhang, J., & Janssen, M. A. (2016). The Collective Direction of Attention Diffusion. Scientific Reports, 6.

The World Bank data

The World Bank provides annual statistics of global migration and remittances in the Migration and Remittances Factbook. By analyzing this data set we are able to track the impact of globalization, in particular, the movement of labor and the flow of currency between countries.

I downloaded the file "Bilateral Remittance Matrix 2012" from the above hyperlink. This is a Microsoft Excel file, so we need the openpyxl package to handle it.

import sys, random
import numpy as np
import pylab as plt
import networkx as nx
import matplotlib.cm as cm
from collections import defaultdict
from openpyxl import load_workbook

wb = load_workbook(filename = '/Users/lingfeiw/Desktop/BilateralRemittanceMatrix2012.xlsx', read_only=True)
sheet = wb.get_sheet_by_name(wb.get_sheet_names()[0])
data=[]
sheet = wb.get_sheet_by_name('Bilateral Remittance Matrix2012')
for row in sheet.iter_rows():
    data_row = []
    for cell in row:
        data_row += [cell.internal_value]
    data += [data_row]

Construct money flow network


countries=[]
for i in range(2, sheet.max_column-1):
    countries.append(sheet.cell(row=2, column=i).value.encode('utf-8').strip())
G=nx.DiGraph()
for i in range(len(countries)):
    rowth = i+2
    for j in range(len(countries)):
        colth = j+1
        w=data[rowth][colth]
        if w:
            G.add_edge(countries[i],countries[j],weight=float(w))

len(G.nodes()),len(G.edges())

The above codes construct a network in which nodes are countries and the directed edges are the money flows from one country to another. The weights on edges shows the amount of currency in million of U.S. dollars. From example, testing

G['Australia']['Zambia']

shows that in 2012 there are 2.2 million U.S. dollars transferred from Australia to Zambia in currency. There are 201 countries and 7660 edges in the network.

An algorithm to quantify the hierarchy of networks

The hierarchy of directed, weighted networks is widely observed in real-world systems, yet the methods to quantify it remains unclear. Jiang Zhang et al. proposed a Markov-chain based technique to calculate the "flow distance" of nodes, i.e., the average number of steps a random walker takes to reach the investigated node.

Figure1_newflowNetworkxample.png Figure credit

The above figure shows two examples of flow networks and the calculated flow distance of each node (marked in red). Two artificial nodes, source and sink, are added to balance the network such that weighted in-degree (the sum of weights over inbound edges) equals weighted out-degree (the sum of weights over outbound edges) on each node except for source and sink themselves. As shown in Panel A, this method gives a finite flow distance for nodes in loops.

This algorithm can be used to quantify various real-world patterns relevant to the hierarchical structure of networks, such as the trophic level of food chains, and the direction of influence between opinion leaders and their followers. Here we will use it to discover the direction of money flow between countries (from countries of lower flow distance to those of higher flow distance).

def flowDistanceFromSource(G): #input a balanced nx graph
    R = G.reverse()
    mapping = {'source':'sink','sink':'source'} 
    H = nx.relabel_nodes(R,mapping)
    #---------initialize flow distance dict------
    L = dict((i,1) for i in G.nodes())
    #---------prepare weighted out-degree dict------
    T = G.out_degree(weight='weight')
    #---------iterate until converge------------
    ls = np.array(L.values())
    delta = len(L)*0.01 + 1
    k=0
    while delta > len(L)*0.01:
        k+=1
        if k>20:
            break
        for i in L:
            l=1
            for m,n in H.edges(i):
                l+=L[n]*H[m][n].values()[0]/float(T[m])
            L[i]=l
        delta = sum(np.abs(np.array(L.values()) - ls))
        ls = np.array(L.values())
    #---------clean the result-------
    del L['sink']
    for i in L:
        L[i]-=1
    L['sink'] = L.pop('source')
    return L

def flowBalancing(G):
    RG = G.reverse()
    E=defaultdict(lambda:0)
    for x,y in G.edges():
        E[(x,y)]+=G[x][y]['weight']
    def nodeBalancing(node):
        outw=0
        for i in G.edges(node):
            outw+=G[i[0]][i[1]].values()[0]
        inw=0
        for i in RG.edges(node):
            inw+=RG[i[0]][i[1]].values()[0]
        deltaflow=inw-outw
        if deltaflow > 0:
            E[(node,'sink')]+=deltaflow
            #H.add_edge(node, "sink",weight=deltaflow)
        elif deltaflow < 0:
            E[('source',node)]+=abs(deltaflow)
            #H.add_edge("source", node, weight=abs(deltaflow))
        else:
            pass
    for i in G.nodes():
        nodeBalancing(i)
    H = nx.DiGraph()
    for x,y in E:
        H.add_edge(x,y, weight=E[(x,y)])
    if ("source", "source") in H.edges():  H.remove_edge("source", "source")    
    if ("sink", "sink") in H.edges(): H.remove_edge("sink", "sink")
    return H

The above codes would calculate the flow distance from "source" (which represent "the environment" of the economic system) to nodes.

Tree layout function

To generate a neat tree layout turns out to be more difficult that I thought. Thanks to Bill Mill, we can get nice Python codes from his blog. Putting all the relevant codes here would make this chapter a little bit longer than others. But I prefer to do this, so that you do not suffer from gathering the scattered pieces of codes from Bill's blog and make them work.

#Tree ploting functions
# from http://billmill.org/pymag-trees/

class Tree:
    def __init__(self, node="", *children):
        self.node = node
        self.width = len(node)
        if children: self.children = children
        else:        self.children = []
    def __str__(self): 
        return "%s" % (self.node)
    def __repr__(self):
        return "%s" % (self.node)
    def __getitem__(self, key):
        if isinstance(key, int) or isinstance(key, slice): 
            return self.children[key]
        if isinstance(key, str):
            for child in self.children:
                if child.node == key: return child
    def __iter__(self): return self.children.__iter__()
    def __len__(self): return len(self.children)
    def addChild(self,nodeName): self.children.append(nodeName)

class DrawTree(object):
    def __init__(self, tree, parent=None, depth=0, number=1):
        self.x = -1.
        self.y = depth
        self.tree = tree
        self.children = [DrawTree(c, self, depth+1, i+1) 
                         for i, c
                         in enumerate(tree.children)]
        self.parent = parent
        self.thread = None
        self.mod = 0
        self.ancestor = self
        self.change = self.shift = 0
        self._lmost_sibling = None
        #this is the number of the node in its group of siblings 1..n
        self.number = number

    def left(self): 
        return self.thread or len(self.children) and self.children[0]

    def right(self):
        return self.thread or len(self.children) and self.children[-1]

    def lbrother(self):
        n = None
        if self.parent:
            for node in self.parent.children:
                if node == self: return n
                else:            n = node
        return n

    def get_lmost_sibling(self):
        if not self._lmost_sibling and self.parent and self != \
        self.parent.children[0]:
            self._lmost_sibling = self.parent.children[0]
        return self._lmost_sibling
    lmost_sibling = property(get_lmost_sibling)

    def __str__(self): return "%s: x=%s mod=%s" % (self.tree, self.x, self.mod)
    def __repr__(self): return self.__str__()        

def buchheim(tree):
    dt = firstwalk(DrawTree(tree))
    min = second_walk(dt)
    if min < 0:
        third_walk(dt, -min)
    return dt

def third_walk(tree, n):
    tree.x += n
    for c in tree.children:
        third_walk(c, n)

def firstwalk(v, distance=1.):
    if len(v.children) == 0:
        if v.lmost_sibling:
            v.x = v.lbrother().x + distance
        else:
            v.x = 0.
    else:
        default_ancestor = v.children[0]
        for w in v.children:
            firstwalk(w)
            default_ancestor = apportion(w, default_ancestor, distance)
        #print "finished v =", v.tree, "children"
        execute_shifts(v)

        midpoint = (v.children[0].x + v.children[-1].x) / 2

        ell = v.children[0]
        arr = v.children[-1]
        w = v.lbrother()
        if w:
            v.x = w.x + distance
            v.mod = v.x - midpoint
        else:
            v.x = midpoint
    return v

def apportion(v, default_ancestor, distance):
    w = v.lbrother()
    if w is not None:
        #in buchheim notation:
        #i == inner; o == outer; r == right; l == left; r = +; l = -
        vir = vor = v
        vil = w
        vol = v.lmost_sibling
        sir = sor = v.mod
        sil = vil.mod
        sol = vol.mod
        while vil.right() and vir.left():
            vil = vil.right()
            vir = vir.left()
            vol = vol.left()
            vor = vor.right()
            vor.ancestor = v
            shift = (vil.x + sil) - (vir.x + sir) + distance
            if shift > 0:
                move_subtree(ancestor(vil, v, default_ancestor), v, shift)
                sir = sir + shift
                sor = sor + shift
            sil += vil.mod
            sir += vir.mod
            sol += vol.mod
            sor += vor.mod
        if vil.right() and not vor.right():
            vor.thread = vil.right()
            vor.mod += sil - sor
        else:
            if vir.left() and not vol.left():
                vol.thread = vir.left()
                vol.mod += sir - sol
            default_ancestor = v
    return default_ancestor

def move_subtree(wl, wr, shift):
    subtrees = wr.number - wl.number
    #print wl.tree, "is conflicted with", wr.tree, 'moving', subtrees, 'shift', shift
    #print wl, wr, wr.number, wl.number, shift, subtrees, shift/subtrees
    wr.change -= shift / subtrees
    wr.shift += shift
    wl.change += shift / subtrees
    wr.x += shift
    wr.mod += shift

def execute_shifts(v):
    shift = change = 0
    for w in v.children[::-1]:
        #print "shift:", w, shift, w.change
        w.x += shift
        w.mod += shift
        change += w.change
        shift += w.shift + change

def ancestor(vil, v, default_ancestor):
    #the relevant text is at the bottom of page 7 of
    #"Improving Walker's Algorithm to Run in Linear Time" by Buchheim et al, (2002)
    #http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.16.8757&rep=rep1&type=pdf
    if vil.ancestor in v.parent.children:
        return vil.ancestor
    else:
        return default_ancestor

def second_walk(v, m=0, depth=0, min=None):
    v.x += m
    v.y = depth

    if min is None or v.x < min:
        min = v.x

    for w in v.children:
        min = second_walk(w, m + v.mod, depth+1, min)

    return min

def generateTree(edgeDic):
    allNodes={}
    for k,v in edgeDic.items():
        if k in allNodes:
            n=allNodes[k]
        else:
            n=Tree(k,)
            allNodes[k]=n
        for s in v:
            if s in allNodes:
                cn=allNodes[s]
            else:
                cn=Tree(s,)
                allNodes[s]=cn
            allNodes[k].addChild(cn)
    return allNodes

def getThreePos(root,pos={},edges=[]):
    pos[str(root.tree)]=[root.x,root.y]
    if root.children:
        for child in root.children:
            edges.append([str(root.tree),str(child.tree)])
            getThreePos(child,pos,edges)
    return pos,edges

Calculate the "trophic levels" of countries

With the codes of flow distance calculation and tree layout generation at hands, we can now calculate and visualize the "trophic levels" of countries in global money flow. To prepare data for tree layout, we trim the network into a tree such that each node (country) only connect to the largest upstream node (country). Note that this tree pruning only affects the visualization and does not have an impact on the calculation of flow distance. The flow distance is always calculated using the directed, weighted network.

H = flowBalancing(G)
L = flowDistanceFromSource(H)
T = H.out_degree(weight='weight')
R = H.reverse()
L['source']=0
S=defaultdict(lambda:[])
for i in H.nodes():
    if i!='source':
        es=R[i]
        w,k=sorted([(es[j]['weight'],j) for j in es],reverse=True)[0]
        S[k].append(i)
treeDic=generateTree(S)
tree=treeDic['source']
d = buchheim(tree)
pos,edges=getThreePos(d)

Plot the tree of money flow

The following codes compare two versions of visualization:

fig = plt.figure(figsize=(12, 6),facecolor='white')
ax = fig.add_subplot(121)
for i in pos:
    x,y=pos[i]
    plt.scatter(x,y,s=T[i]/1000.0,facecolor='RoyalBlue',alpha=0.9,edgecolor='',zorder=2)
for i,j in edges:
    x1,y1=pos[i]
    x2,y2=pos[j]
    ax.annotate("",
                xy=(x1, y1), xycoords='data',
                xytext=(x2, y2), textcoords='data',
                arrowprops=dict(arrowstyle="<-", #linestyle="dashed",
                                color="0.5",
                                alpha=0.2,
                                shrinkA=5, shrinkB=5,
                                patchA=None,
                                patchB=None,
                                connectionstyle='angle3,angleA=90,angleB=0',
                                ),zorder=1
                )
ax = fig.add_subplot(122)
for i in pos:
    x,y=pos[i]
    l=L[i]
    plt.scatter(x,l,s=T[i]/1000.0,facecolor='Chocolate',alpha=0.9,edgecolor='',zorder=2)
for i,j in edges:
    x1,y1=pos[i]
    l1=L[i]
    x2,y2=pos[j]
    l2=L[j]
    ax.annotate("",
                xy=(x1, l1), xycoords='data',
                xytext=(x2, l2), textcoords='data',
                arrowprops=dict(arrowstyle="<-", #linestyle="dashed",
                                color="0.5",
                                alpha=0.2,
                                shrinkA=5, shrinkB=5,
                                patchA=None,
                                patchB=None,
                                connectionstyle='angle3,angleA=90,angleB=0',
                                ),zorder=1
                )
#plt.savefig('/Users/lingfeiw/Desktop/tree.png')

tree.png

The data points (countries) in two panels share the same x-axis values, but they are different in y-axis. The points in the left panel simply takes the n-th branching level as the vertical value. This is very bias because a lot of links are removed in tree construction. The points in the right panel use the calculated "flow distance" as the y values. As the flow distance is calculated using the information of the entire directed, weighted network, it is a more general and valid measure. Still, it is interesting to observe that these two layouts are actually very similar to each other.

A better version

We can adjust the color scheme and change the shape and size of arrows and nodes to generate a more appealing and informative version as follows:

fig = plt.figure(figsize=(8, 10),facecolor='black')
ax = fig.add_subplot(111)
for i in pos:
    x,y=pos[i]
    l=L[i]
    plt.scatter(l,x,s=T[i]/800.0,facecolor='#999933',alpha=0.9,edgecolor='',zorder=2)
    if T[i]>3000:
        plt.text(l,x,i,rotation=0,ha='left',va='bottom',fontsize=np.log(T[i]),color='#88CCEE')
for i,j in edges:
    x1,y1=pos[i]
    l1=L[i]
    x2,y2=pos[j]
    l2=L[j]
    ax.annotate("",xy=(l1,x1), xycoords='data',xytext=(l2,x2), textcoords='data',
                arrowprops=dict(arrowstyle="<-", color="0.5",alpha=0.5,shrinkA=5, shrinkB=5,
                patchA=None,patchB=None,connectionstyle='angle3,angleA=90,angleB=0',),zorder=1)
plt.ylim(-5,130)
ax.set_axis_bgcolor('black')
#plt.savefig('/Users/lingfeiw/Desktop/nicetree.png')

nicetree.png

Conclusion and Discussion

Form the above figure we find that money generally flows from developed countries to developing countries. It is probably because many immigrants from developing countries tends to find jobs in places with better economics and send money back to their mother countries.

The algorithm introduced in this chapter also provide an approach to investigate many other interesting social questions, such as the global capitalism, the different type of economics across countries. But I would stop here and leave these questions for the readers.

results matching ""

    No results matching ""