Graph-tool tutorial

Steven Bergner

February 23th, 2023

[single page]

Sources

Lab setup

  • Activate the gt conda environment
    mkdir -p ~/.conda/envs
    ln -s /usr/shared/CMPT/big-data/condaenv/gt ~/.conda/envs
    source activate gt
    
  • And load the graph_tool module
    python
    >>> import graph_tool.all as gt
    >>> print(gt.__version__.split(' ')[0])
    2.46
    
  • Environment also contains pandas, jupyter (lab), ...

Outline

  • Create graph
  • Visualize graph
  • Analyze graph
  • Visualize again

Introducing: Graph-tool

  • Manipulate and analyse graphs
  • Fast! - based on Boost Graph in C++, OpenMP
    • Utilize multiple cores if available
  • Visualization
  • Lots of graph processing algorithms

How to start

  • Define your problem
  • Convert it into graph representation, i.e. determine what nodes and edges are

Example

  What Weight
Vertex Person #Knows
Edge Knows Count

Example from lecture

X Adjacent to X
Mary Peter, Albert, DavidF, Peter
Judy Bob, Alan
Peter Mary, DavidF, Jon
DavidF Albert, Joseph, Peter, Mary
Jon Peter, Joseph, DavidE
DavidE Jon, Joseph, Albert
Joseph DavidE, Jon, DavidF
Bob Judy, Alan
Alan Bob, Mary, Judy
Albert DavidF, Mary, DavidE

Create minimal graph

g = gt.Graph()
v1 = g.add_vertex()
v2 = g.add_vertex()
e = g.add_edge(v1, v2)

Properties

v_count_p = g.new_vertex_property('int')

# store it in our graph, optionally
g.vp['count'] = v_count_p

Faster import

from graph_tool import Graph

Counting

name_v_map = {}
for name in names:
    v = name_v_map.get(name)
    if v is None:
        v = g.add_vertex()
        v_count_p[v] = 0
        name_v_map[name] = v
    v_count_p[v] += 1

Visualize Graph

Draw to output format

gt.graph_draw(
    g,
    output_path = 'output.pdf',
)

gt.graph_draw(
    g,
    output_path = 'output.png'
)
  • Task 1: Try this for one of the example graphs in the collection

Use constants

SIZE     = 400
V_SIZE   = SIZE / 2
E_PWIDTH = V_SIZE / 4
gt.graph_draw(
    ...
    output_size = (SIZE, SIZE),
    vertex_size = V_SIZE,
    edge_pen_width = E_PWIDTH,
)

Use PROP_TO_SIZE

v_size_p = gt.prop_to_size(
    v_count_p,
    MI_V_SIZE,
    MA_V_SIZE,
)
...
gt.graph_draw(
    ...
    verted_size = v_size_p,
    edge_pen_width = e_pwidth_p,
)
  • Task 2: Produce a diagram of this graph
    • g = gt.load_graph("search_example.xml")
    • learn about its vertex and edge properties
    • have edge width scaled by weight

Use FILL_COLOR

gt.graph_draw(
    ...
    vertex_fill_color = v_size_p,
)

Analyze Graph

Choose an Algorithm

  • Search algorithms
    • BFS search
  • Graph topology
    • Shortest path, Connected components
  • Centrality measures
    • PageRank, Betweenness, Closeness
  • Maximum flow algorithms
  • Community structures
  • Clustering coefficients

Centrality measures

  • Degree centrality
    • Number of links incidend upon a node
    • “Immediate risk of taking a node out”
  • Closeness centrality
    • Sum of a node’s distances to all other nodes
    • “Cost to spread information to all other nodes”
  • Betweenness centrality
    • Number of times a node acts as a bridge
    • “Control of a node on the communication between other nodes”
  • Eigenvector centrality
    • Influence of a node in a network
    • Google’s PageRank is a variant of this measure

Example cont’d

Choice of measure:

  • Centrality measures - Closeness centrality

Goal: Identify the people with shortest connection to all other people

Calculate Closeness

e_icount_p = g.new_edge_property('int')
e_icount_p.a = e_count_p.a.max() - e_count_p.a

v_cl_p = closeness(g, weight=e_icount_p)

import numpy as np
v_cl_p_.a = np.nan_to_num(v_cl_p.a)

Draw Closeness

v_cl_size_p = gt.prop_to_size(
    v_cl_p,
    MI_V_SIZE,
    MA_V_SIZE,
)
...
gt.graph_draw(
    ...
    vertex_fill_color = v_cl_size_p,
)

On-the-fly Filtering

v_pck_p = g.new_vertex_property('bool')
v_pck_p.a = v_count_p.a > v_count_p.a.mean()

g.set_vertex_filter(v_pck_p)
# g.set_vertex_filter(None) # unset

Top N

t10_idxs = v_count_p.a.argsort()[-10][::-1]

t1_idx   = t10_idxs[0]
t1_v	 = g.vertex(t1_idx)
t1_name  = v_name_p[t1_v]
t1_count = v_count_p[t1_v]

SFDP Layout

  • Scalable Force Directed Placement
  • Fast, multilevel, force directed
    gt.graph_draw(
      ...
      pos = gt.sfdplayout(g),
    )
    
      ...
      pos = gt.sfdplayout(
          g, eweight=e_count_p
      ),
    
      ...
      pos = gt.sfdplayout(
          g,
          eweight=e_count_p,
          vweight=v_count_p
      ),
    

FR Layout

gt.graph_draw(
    ...
    pos = gt.fruchterman_reingold_layout(g),
)

gt.graph_draw(
    ...
    pos = gt.fruchterman_reingold_layout(
        g, weight=e_count_p
    ),
)

ARF Layout

gt.graph_draw(
    ...
    pos = gt.arf_layout(g),
)

gt.graph_draw(
    ...
    pos = gt.arf_layout(
        g, weight=e_count_p
    ),
)

Links

Graph analysis workflow

  • Define problem in graphic form
  • Clean the raw data
  • Visualize to understand
  • Choose proper algorithm for network feature extraction
  • Filter data that interests you
  • Visualize again to convince