Code documentation

PCPG class

class bipcpg.pcpg.PCPG(corr_matrix, variable_names=None)

Bases: object

Class to obtain a Partial Correlation Planar Graph (PCPG) network from a correlation matrix [1].

Parameters

corr_matrix (pandas.DataFrame/numpy.ndarray) – Correlation matrix displaying correlations among variables in the system.
variable_names (list) – Names of the variables in the system. The order of this list should coincide with the order of rows and columns in corr_matrix.

This class includes methods to perform the necessary computations and obtain a networkx.DiGraph network object. The PCPG algorithm consists of the following steps:

1. Compute the average influence (AI) between every ordered pair of variables in the system, i.e. the variables in the input corr_matrix. See compute_avg_influence_matrix().
2. Sort the AIs from largest to smallest.
3. Iterate through the sorted list and, for each AI value, add the directed edge corresponding to its ordered pair of variables if and only if (i) the reversed edge is not already in the network and (ii) adding the edge does not break the network's planarity. See create_network().
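The three steps can be sketched with networkx as follows. This is a minimal illustration, not the package's implementation: it assumes a NumPy array avg_influence in which avg_influence[a, b] is the AI associated with the directed edge a -> b (this orientation is an assumption), and the function name pcpg_edges_sketch is hypothetical. Planarity is tested with networkx.check_planarity on the undirected version of the graph.

```python
import itertools

import networkx as nx
import numpy as np


def pcpg_edges_sketch(avg_influence, names=None):
    """Greedy planar edge insertion (hypothetical helper, not the bipcpg API).

    Assumption: avg_influence[a, b] is the AI used for the directed edge a -> b.
    """
    avg_influence = np.asarray(avg_influence, dtype=float)
    n = avg_influence.shape[0]
    names = list(range(n)) if names is None else list(names)

    # Steps 1-2: every ordered pair (a, b) with a != b, sorted by AI, largest first.
    pairs = sorted(
        itertools.permutations(range(n), 2),
        key=lambda ab: avg_influence[ab],
        reverse=True,
    )

    G = nx.DiGraph()
    G.add_nodes_from(names)
    # Step 3: add a -> b unless b -> a is present or the edge breaks planarity.
    for a, b in pairs:
        u, v = names[a], names[b]
        if G.has_edge(v, u):
            continue
        G.add_edge(u, v)
        is_planar, _ = nx.check_planarity(G.to_undirected())
        if not is_planar:
            G.remove_edge(u, v)
    return G
```

Because every graph with five nodes and fewer than ten edges is planar while the complete graph K5 is not, this sketch keeps exactly nine edges when run on a random 5 x 5 matrix.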

See the tutorial for further information.

Variables

avg_influence_matrix – numpy.ndarray containing average influence values between pairs of variables.
avg_influence_df – pandas.DataFrame containing average influence values between pairs of variables.
influence_df – pandas.DataFrame containing influence values between pairs of variables.
partial_corr_df – Multi-index pandas.DataFrame containing partial correlation values between triples of variables.
network – the PCPG network generated (a networkx.DiGraph directed graph object).
nodes – Nodes in network.
edges – Edges in network.
dict_var_names – dict containing variable numbers as keys and variable names as values.

References

[1] Kenett DY, Tumminello M, Madi A, Gur-Gershgoren G, Mantegna RN, Ben-Jacob E (2010) Dominating Clasp of the Financial Sector Revealed by Partial Correlation Analysis of the Stock Market. PLoS ONE 5(12): e15032. <https://doi.org/10.1371/journal.pone.0015032>

add_edge_attribute(attr_data, attr_name)

Adds data as an attribute to edges in network.

Parameters

attr_data (dict/pandas.DataFrame) – pandas.DataFrame or dict containing edge attribute values.
attr_name (str) – Name of attribute to be added to edges.

Note

If attr_data is a pandas.DataFrame, its row index should contain the origin nodes and its column index the target nodes. If attr_data is a dictionary, its keys should be tuples of the form (origin_node, target_node).
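One way to map both accepted input types onto networkx edge attributes is sketched below. It relies only on what the Note states (row index = origin, column index = target, or (origin_node, target_node) dictionary keys); add_edge_attribute_sketch is a hypothetical stand-in that operates on a plain networkx.DiGraph rather than on a PCPG object.

```python
import networkx as nx
import pandas as pd


def add_edge_attribute_sketch(G, attr_data, attr_name):
    """Attach attr_data to the existing edges of G under attr_name (hypothetical helper)."""
    if isinstance(attr_data, pd.DataFrame):
        # One value per existing edge: row label = origin node, column label = target node.
        values = {(u, v): attr_data.loc[u, v] for u, v in G.edges}
    else:
        # A dict is already keyed by (origin_node, target_node) tuples.
        values = dict(attr_data)
    # networkx ignores keys that are not edges of G.
    nx.set_edge_attributes(G, values, name=attr_name)
```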

add_node_attribute(attr_data, attr_name)

Adds data as an attribute to nodes in network.

Parameters

attr_data (dict/pandas.Series) – pandas.Series or dict containing node attribute values.
attr_name (str) – Name of attribute to be added to nodes.

Note

If attr_data is a pandas.Series, its index should contain the nodes and its values the node data. If attr_data is a dict, its keys should be nodes and its values the node data.
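The node case can be sketched the same way: a Series is converted to a node-to-value dict and passed to networkx.set_node_attributes. The helper name is hypothetical and this is not necessarily how add_node_attribute is implemented.

```python
import networkx as nx
import pandas as pd


def add_node_attribute_sketch(G, attr_data, attr_name):
    """Attach attr_data to the nodes of G under attr_name (hypothetical helper)."""
    # A Series maps node -> value through its index, so convert it to a dict.
    values = attr_data.to_dict() if isinstance(attr_data, pd.Series) else dict(attr_data)
    nx.set_node_attributes(G, values, name=attr_name)
```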

compute_assortativity(node_attribute, attr_type)

Compute the assortativity of network with respect to the node attribute node_attribute.

Parameters

node_attribute (str) – Name of node attribute in network by which to compute assortativity.
attr_type (str) – Either “qual” or “quant”, indicating whether node_attribute is a qualitative or a quantitative characteristic.

Returns

Value of calculated assortativity.

Return type

float
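A plausible mapping of attr_type onto networkx is sketched below. Whether compute_assortativity uses these exact functions is an assumption: here “qual” calls networkx.attribute_assortativity_coefficient and “quant” calls networkx.numeric_assortativity_coefficient, and the helper name is hypothetical.

```python
import networkx as nx


def assortativity_sketch(G, node_attribute, attr_type):
    """Assortativity of G by node_attribute (hypothetical helper)."""
    if attr_type == "qual":
        # Categorical attribute: mixing between attribute classes.
        return nx.attribute_assortativity_coefficient(G, node_attribute)
    if attr_type == "quant":
        # Numeric attribute: correlation of values at the two ends of each edge.
        return nx.numeric_assortativity_coefficient(G, node_attribute)
    raise ValueError('attr_type must be "qual" or "quant"')
```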

compute_avg_influence_matrix()

Compute the average influence between every ordered pair of variables in the system and store the values in avg_influence_matrix.

Returns: None
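The quantities involved can be written out explicitly. Following Kenett et al. (2010), the partial correlation of i and k given j is rho(i,k|j) = (C(i,k) - C(i,j) C(k,j)) / sqrt((1 - C(i,j)^2)(1 - C(k,j)^2)), the influence of j on the pair (i, k) is d(i,k|j) = C(i,k) - rho(i,k|j), and the average influence of j on i averages d(i,k|j) over k. The sketch below assumes k ranges over all variables other than i and j and that rows index the influencing variable; both choices, and the function names, are assumptions rather than documented behavior of compute_avg_influence_matrix().

```python
import numpy as np


def partial_corr(C, i, k, j):
    """Partial correlation of variables i and k given variable j."""
    return (C[i, k] - C[i, j] * C[k, j]) / np.sqrt((1 - C[i, j] ** 2) * (1 - C[k, j] ** 2))


def avg_influence_matrix_sketch(C):
    """avg[j, i]: mean influence of j on the correlations of i with the other variables.

    Assumptions: k ranges over all variables except i and j, and rows index the
    influencing variable. Requires at least three variables.
    """
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    avg = np.zeros((n, n))
    for j in range(n):
        for i in range(n):
            if i == j:
                continue
            # d(i, k | j) = C[i, k] - rho(i, k | j) for every remaining k.
            influences = [C[i, k] - partial_corr(C, i, k, j) for k in range(n) if k not in (i, j)]
            avg[j, i] = np.mean(influences)
    return avg
```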

compute_influence_avg_influence_partial_corr_dfs()

Compute partial correlations, influences and average influences among all variables in the system and store them in partial_corr_df, influence_df and avg_influence_df, respectively.

Returns: None

create_network()

Create the PCPG network, a networkx.DiGraph object whose nodes correspond to the variables in the system and whose edges are found by following the PCPG algorithm.

Returns: None

find_edges()

Compute the edges in the PCPG network using the average influences in avg_influence_matrix.

Returns: List of edges in the PCPG network
Return type: list

Correlation functions

bipcpg.correlations.compute_corr_matrix(matrix, critical_value=None)

Obtain a correlation matrix among the variables (columns) in matrix. If critical_value is passed, the correlation matrix is filtered based on a statistical significance T-test with critical_value as the threshold value.

Parameters

matrix (numpy.ndarray) – numpy.ndarray containing time series for the values of interest with observations along axis 0 (rows) and variables along axis 1 (columns).
critical_value (float) – Boundary of the acceptance region of the T-test performed.

Returns

Correlation matrix displaying the correlation coefficients between the columns (axis 1) of the input matrix.

Return type

numpy.ndarray
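A minimal sketch of the filtering step, assuming the usual t-statistic for a Pearson coefficient r estimated from n observations, t = r * sqrt((n - 2) / (1 - r^2)), and assuming that "filtered" means coefficients with |t| below critical_value are set to 0. Neither assumption is confirmed by this documentation, and filtered_corr_matrix_sketch is a hypothetical name.

```python
import numpy as np


def filtered_corr_matrix_sketch(matrix, critical_value=None):
    """Correlations between columns, optionally zeroed when not significant (hypothetical)."""
    matrix = np.asarray(matrix, dtype=float)
    n_obs = matrix.shape[0]
    corr = np.corrcoef(matrix, rowvar=False)
    if critical_value is None:
        return corr
    # t-statistic of each coefficient; r = +/-1 gives an infinite t.
    with np.errstate(divide="ignore", invalid="ignore"):
        t_stat = corr * np.sqrt((n_obs - 2) / (1 - corr ** 2))
    # Assumption: keep only coefficients whose |t| reaches the critical value.
    filtered = np.where(np.abs(t_stat) >= critical_value, corr, 0.0)
    np.fill_diagonal(filtered, 1.0)
    return filtered
```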

bipcpg.correlations.corr_pvalue_matrices(matrix)

Obtain a correlation matrix and p-value matrix for a matrix containing variables and observations.

Parameters: matrix (numpy.ndarray) – 2-dimensional numpy.ndarray containing observations along axis 0 and variables along axis 1.
Returns: tuple containing correlation matrix showing correlation coefficients between columns of input matrix and p-value matrix showing statistical significance of correlations.
Return type: tuple
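A straightforward, unoptimized way to build both matrices is to run scipy.stats.pearsonr on every pair of columns. This sketch assumes SciPy is available and sets diagonal p-values to 0; both are illustrative choices, not necessarily what corr_pvalue_matrices does, and the function name is hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def corr_pvalue_matrices_sketch(matrix):
    """Return (correlation matrix, p-value matrix) for the columns of matrix."""
    matrix = np.asarray(matrix, dtype=float)
    n_vars = matrix.shape[1]
    corr = np.eye(n_vars)
    pvals = np.zeros((n_vars, n_vars))  # assumption: diagonal p-values set to 0
    for a in range(n_vars):
        for b in range(a + 1, n_vars):
            # pearsonr returns the coefficient and its two-sided p-value.
            r, p = pearsonr(matrix[:, a], matrix[:, b])
            corr[a, b] = corr[b, a] = r
            pvals[a, b] = pvals[b, a] = p
    return corr, pvals
```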

bipcpg.correlations.get_correlation_matrices_for_list_of_matrices(matrices, critical_value=None)

Obtain a correlation matrix and p-value matrix for each matrix (containing variables along the columns and observations along the rows) in matrices. If critical_value is passed, each correlation matrix is filtered based on a statistical significance T-test with critical_value as the threshold value.

Parameters

matrices (Iterable) – Iterable object containing 2-dimensional numpy.ndarray objects with observations along axis 0 (rows) and variables along axis 1 (columns).
critical_value (float) – Boundary of the acceptance region of the T-test performed.

Returns

list of length len(matrices) containing correlation matrices displaying the correlation coefficients between the columns (axis 1) of each input matrix.

Return type

list

Bootstrap functions

bipcpg.bootstrap.construct_corr_matrix_replicates_from_time_series_matrices(array_of_matrices, num_replicates, critical_value=None)

Performs a bootstrap procedure on time series matrices to obtain correlation matrix replicates. If critical_value is not None, the correlation matrices are filtered using a statistical significance T-test.

Parameters

array_of_matrices (numpy.ndarray) – 3-dimensional numpy.ndarray with axis 0 representing elements of one of the sets in the bipartite system, axis 1 representing time series observations and axis 2 representing elements of the remaining set in the bipartite system.
num_replicates (int) – Number of correlation matrix replicates to be constructed.
critical_value (float) – If passed, boundary of the acceptance region of the T-test performed.

Returns

Array containing the mean of the correlation matrix replicates in each batch.

Return type

numpy.ndarray
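The description above can be read as the following resampling scheme, sketched under explicit assumptions: each replicate draws time indices (axis 1) with replacement, the same indices are used for every element along axis 0, one correlation matrix is computed per axis-0 element, and these are averaged into the replicate's mean correlation matrix. The T-test filtering is omitted, and the seed parameter and function name are hypothetical.

```python
import numpy as np


def bootstrap_corr_replicates_sketch(array_of_matrices, num_replicates, seed=None):
    """Return an array of shape (num_replicates, n_vars, n_vars) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(array_of_matrices, dtype=float)
    n_sets, n_obs, n_vars = data.shape
    replicates = np.empty((num_replicates, n_vars, n_vars))
    for r in range(num_replicates):
        # Resample time points with replacement, shared across all axis-0 elements.
        idx = rng.integers(0, n_obs, size=n_obs)
        # One correlation matrix among the axis-2 variables per axis-0 element.
        corrs = [np.corrcoef(data[s, idx, :], rowvar=False) for s in range(n_sets)]
        # Average over axis-0 elements to obtain this replicate's mean matrix.
        replicates[r] = np.mean(corrs, axis=0)
    return replicates
```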

bipcpg.bootstrap.get_bootstrap_values(timeseries_matrices, variable_names=None, num_replicates=1000, critical_value=None)

Compute bootstrap values for edges in a PCPG network. This function takes a dataset in the form of a list or numpy array of matrices with time series in their columns (see Dataset structure), performs a bootstrap procedure that generates a total of num_replicates replicate PCPG networks, and finds the bootstrap value of each edge, i.e. the fraction of replicate networks in which the edge appears. If critical_value is not None, the replicate correlation matrices generated are filtered using a statistical significance T-test.

Parameters

timeseries_matrices (list/numpy.ndarray) – Iterable containing the dataset for which the PCPG network was generated. This should be a list containing 2-dimensional numpy.ndarray objects whose columns contain observations for one of the two sets of variables in a bipartite dataset.
variable_names (list) – Names of variables along columns of each matrix in timeseries_matrices
num_replicates (int) – Number of replicates to generate in the bootstrap procedure.
critical_value (float) – If passed, boundary of the acceptance region of the T-test performed.

Returns

pandas.DataFrame containing the bootstrap values of the directed edges in the PCPG network. Note that the source of an edge is its row index and the target of the edge is its column index.

Return type

pandas.DataFrame
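Once the replicate PCPG networks exist, the bootstrap value of an edge is a simple frequency. The sketch below assumes the replicate networks are available as lists of (source, target) tuples (a hypothetical input format) and returns a DataFrame with sources as rows and targets as columns, matching the convention described for get_bootstrap_values; the function name is hypothetical.

```python
from collections import Counter

import pandas as pd


def bootstrap_values_sketch(edge_lists, variable_names):
    """Fraction of replicate networks containing each directed edge (hypothetical helper)."""
    # set() guards against counting an edge twice within one replicate.
    counts = Counter(edge for edges in edge_lists for edge in set(edges))
    values = pd.DataFrame(0.0, index=variable_names, columns=variable_names)
    for (source, target), count in counts.items():
        # Row label = source node, column label = target node.
        values.loc[source, target] = count / len(edge_lists)
    return values
```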

Util functions

bipcpg.utils.utils.get_degrees_df(G)

Get a pandas.DataFrame containing the degree, in-degree and out-degree information of the nodes in G.

Parameters: G (networkx.DiGraph) – Directed network.
Returns: pandas.DataFrame containing degree information.
Return type: pandas.DataFrame
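The same information can be assembled directly from the networkx degree views. The column names and the function name used here are assumptions and may differ from those returned by get_degrees_df.

```python
import networkx as nx
import pandas as pd


def degrees_df_sketch(G):
    """One row per node with total, in- and out-degree (hypothetical column names)."""
    return pd.DataFrame({
        "degree": dict(G.degree()),
        "in_degree": dict(G.in_degree()),
        "out_degree": dict(G.out_degree()),
    })
```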

bipcpg.utils.utils.remove_reversed_duplicates(iterable)

For an iterable object containing other iterables, yield items which do not have a reversed duplicate in a position with a smaller index.

Parameters: iterable (Iterable) – An iterable object containing other iterables.
Returns: Inner iterables which do not have a reversed duplicate in a position with a smaller index.
Return type: Iterator[Iterable]
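A generator that follows this definition literally, remembering every item seen so far, might look like this. remove_reversed_duplicates_sketch is a hypothetical name, and the inner iterables are assumed to be finite and hashable once converted to tuples.

```python
def remove_reversed_duplicates_sketch(iterable):
    """Yield items whose reversal did not appear at an earlier position."""
    seen = set()
    for item in iterable:
        key = tuple(item)
        # Skip the item if its reversal already occurred earlier in the iterable.
        if tuple(reversed(key)) not in seen:
            yield item
        seen.add(key)
```

For example, applied to [(1, 2), (2, 1), (3, 4)] it yields (1, 2) and (3, 4), which is how the reversed pairs of a directed edge list can be collapsed to one entry per node pair.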

bipcpg.utils.utils.reshape_year_matrices_to_time_series_matrices(list_yearly_matrices)

For a list of numpy.ndarray objects, swap the list dimension (the list entries) with the first dimension (axis 0) of the matrices in the list.

Parameters: list_yearly_matrices (list) – list of 2-dimensional numpy.ndarray objects indexed over time. Each matrix has one set of variables of the bipartite dataset along axis 0 (rows) and the other set of variables in the bipartite dataset along axis 1 (columns).
Returns: list of 2-dimensional numpy.ndarray indexed over the elements in the rows of the matrices in list_yearly_matrices. Axis 0 (rows) of each matrix is now indexed over time, i.e. the dimension of the elements in list_yearly_matrices.
Return type: list
Example: This can be used to transform a list of matrices (one per year) into a list of time series matrices. Say we have a list my_list containing matrices (one per year) with the exports each country (rows) made of each product (columns). We can then transform this into a list of matrices (one per country) with time series observations along the rows and products along the columns.

>>> import numpy as np
>>> from bipcpg.utils.utils import reshape_year_matrices_to_time_series_matrices
>>> my_list = [np.array([[1,2],[3,4]]),
...            np.array([[5,6],[7,8]]),
...            np.array([[9,10],[11,12]])]
>>> my_list_transformed = reshape_year_matrices_to_time_series_matrices(my_list)
>>> my_list_transformed
[array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]]), array([[ 3,  4],
       [ 7,  8],
       [11, 12]])]
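For matrices of equal shape, the same reshaping can be expressed with numpy.stack. This is an equivalent formulation offered for illustration, not necessarily how reshape_year_matrices_to_time_series_matrices is implemented, and the function name is hypothetical.

```python
import numpy as np


def reshape_sketch(list_yearly_matrices):
    """Swap the list dimension with axis 0 of equally shaped matrices."""
    # Stacking along axis=1 gives shape (rows, years, columns); iterating over
    # axis 0 then yields one (years x columns) matrix per row element.
    return list(np.stack(list_yearly_matrices, axis=1))
```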

bipcpg.utils.utils.transform_3level_nested_dict_into_df(nested_dict)

Transform a nested dictionary with three levels into a pandas.DataFrame with a 2-level multi-index.

Parameters: nested_dict (dict) – Three level nested dictionary to be transformed.
Returns: pandas.DataFrame with 2-level multi-index. Multi-index level 0 corresponds to the outermost nested_dict keys, multi-index level 1 corresponds to the nested_dict middle-level keys and columns correspond to the nested_dict innermost keys.
Return type: pandas.DataFrame
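A minimal pandas sketch of this transformation, assuming every innermost dictionary maps column labels to scalar values (the function name is hypothetical):

```python
import pandas as pd


def nested_dict_to_df_sketch(nested_dict):
    """Rows indexed by (outer key, middle key); columns from the innermost keys."""
    keys = [(outer, middle) for outer, middle_dict in nested_dict.items() for middle in middle_dict]
    # Each innermost dict becomes one row; pandas aligns its keys as columns.
    rows = [nested_dict[outer][middle] for outer, middle in keys]
    return pd.DataFrame(rows, index=pd.MultiIndex.from_tuples(keys))
```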

bipcpg.utils.utils.transform_3level_nested_dict_into_stacked_df(nested_dict, name=None)

Transform a nested dictionary with three levels into a stacked pandas.DataFrame with a 3 level multi-index and a single column. If name is passed, set the name of the column to name.

Parameters

nested_dict (dict) – Three level nested dictionary to be transformed.
name (str) – Name of single column found in returned pandas.DataFrame

Returns

Stacked dataframe with multi-index level 0 corresponding to outermost nested_dict keys, multi-index level 1 corresponding to nested_dict middle level keys and multi-index level 2 corresponding to nested_dict innermost keys.

Return type

pandas.DataFrame
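A comparable sketch for the stacked variant collects one (outer, middle, inner) key triple per value. The default column label "value" used when name is None is an assumption, as is the function name.

```python
import pandas as pd


def nested_dict_to_stacked_df_sketch(nested_dict, name=None):
    """Single-column DataFrame with a 3-level multi-index (outer, middle, inner)."""
    keys, values = [], []
    for outer, middle_dict in nested_dict.items():
        for middle, inner_dict in middle_dict.items():
            for inner, value in inner_dict.items():
                keys.append((outer, middle, inner))
                values.append(value)
    column = name if name is not None else "value"  # assumed default label
    return pd.DataFrame({column: values}, index=pd.MultiIndex.from_tuples(keys))
```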

bipcpg.utils.communities_utils.communities_data(G, **la_kwds)

Perform a community detection procedure on graph G and return relevant results for plotting.

Parameters

G (networkx.Graph) – networkx graph on which to perform community detection.
la_kwds – keyword arguments passed on to leidenalg.find_partition().

Returns

G_igraph igraph.Graph - igraph graph object equivalent to G.
partition leidenalg.VertexPartition - Graph partition.
tup_nodes_num_nodes tuple - a tuple containing the list of nodes sorted by community and the list of the number of nodes in each community.

Return type

tuple
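The third returned element can be derived from a partition once node names are known. The sketch below assumes the partition is iterable over communities given as lists of vertex indices (as a leidenalg partition is) and that names maps vertex indices to node names; both the helper and the names argument are hypothetical, and the igraph conversion and Leiden step are not reproduced.

```python
def nodes_by_community_sketch(communities, names):
    """Return (nodes sorted by community, number of nodes per community).

    communities: iterable of lists of vertex indices, e.g. a partition object.
    names: sequence mapping vertex index -> node name (hypothetical input).
    """
    communities = [list(c) for c in communities]
    # Concatenate communities in order so members of the same community are contiguous.
    nodes_sorted = [names[v] for community in communities for v in community]
    num_nodes = [len(community) for community in communities]
    return nodes_sorted, num_nodes
```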

bipcpg.utils.communities_utils.get_igraph_network_and_partition(G, **la_kwds)

Obtain an igraph graph and a partition from a networkx graph.

Parameters

G (networkx.Graph) – networkx graph to be converted into igraph graph.
la_kwds – keyword arguments passed on to leidenalg.find_partition().

Returns

H igraph.Graph - igraph graph object.
partition leidenalg.VertexPartition - Graph partition.

Return type

tuple