Code documentation
PCPG class
- class bipcpg.pcpg.PCPG(corr_matrix, variable_names=None)
Bases: object
Class to obtain a Partial Correlation Planar Graph (PCPG) network from a correlation matrix. [1]
- Parameters
corr_matrix (pandas.DataFrame/numpy.ndarray) – Correlation matrix displaying correlations among variables in the system.
variable_names (list) – Names of the variables in the system. The order of this list should coincide with the order of rows and columns in corr_matrix.
This class includes methods to perform the necessary computations and obtain a networkx.Graph network object. The PCPG algorithm consists of the following steps:
1. Find the average influence (AI) between every ordered pair of variables in the system, i.e. those in the input corr_matrix. See compute_avg_influence_matrix().
2. List the AIs in order from largest to smallest.
3. Iterate through the list and add a directed edge corresponding to the pair of variables of the AI value in that position if and only if (i) the reversed edge is not already in the network and (ii) the network's planarity is not broken by adding the edge. See create_network().
See the tutorial for further information.
- Variables
avg_influence_matrix – numpy.ndarray containing average influence values between pairs of variables.
avg_influence_df – pandas.DataFrame containing average influence values between pairs of variables.
influence_df – pandas.DataFrame containing influence values between pairs of variables.
partial_corr_df – Multi-index pandas.DataFrame containing partial correlation values between triples of variables.
network – The PCPG network generated (a networkx.DiGraph directed graph object).
nodes – Nodes in network.
edges – Edges in network.
dict_var_names – dict containing variable numbers as keys and variable names as values.
References
- 1
Kenett DY, Tumminello M, Madi A, Gur-Gershgoren G, Mantegna RN, Ben-Jacob E (2010) Dominating Clasp of the Financial Sector Revealed by Partial Correlation Analysis of the Stock Market. PLoS ONE 5(12): e15032. <https://doi.org/10.1371/journal.pone.0015032>
- add_edge_attribute(attr_data, attr_name)
Adds data as an attribute to edges in network.
- Parameters
attr_data (dict/pandas.DataFrame) – pandas.DataFrame or dict containing edge attribute values.
attr_name (str) – Name of attribute to be added to edges.
Note
If attr_data is a pandas.DataFrame, the row indices should be the origin nodes and the column indices should be the target nodes. If attr_data is a dictionary, keys should be tuples of the form (origin_node, target_node).
- add_node_attribute(attr_data, attr_name)
Adds data as an attribute to nodes in network.
- Parameters
attr_data (dict/pandas.Series) – pandas.Series or dict containing node attribute values.
attr_name (str) – Name of attribute to be added to nodes.
Note
If attr_data is a pandas.Series, its index should contain the nodes and its values the node data. If attr_data is a dict, keys should be nodes and values should be node data.
- compute_assortativity(node_attribute, attr_type)
Compute node assortativity based on the node_attribute of nodes.
- Parameters
node_attribute (str) – Name of node attribute in network by which to compute assortativity.
attr_type (str) – Either "qual" or "quant". Indicates whether the node_attribute data is a qualitative or a quantitative characteristic.
- Returns
Value of calculated assortativity.
- Return type
float
- compute_avg_influence_matrix()
Compute average influences between every pair of variables in the system and put these in avg_influence_matrix.
- Returns
None
- compute_influence_avg_influence_partial_corr_dfs()
Compute partial correlations, influences and average influences between all variables in the system and put these in partial_corr_df, influence_df and avg_influence_df respectively.
- Returns
None
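The quantities named above can be illustrated with a short sketch. It assumes the definitions from the cited reference [1]: the first-order partial correlation of i and j given k, the influence of k on the pair (i, j) as the correlation minus that partial correlation, and the average influence as the mean of the influence over all j other than i and k. The function names and the nested-list input are hypothetical and do not reflect how the library stores or orients its results.

```python
import math


def partial_corr(c, i, j, k):
    """First-order partial correlation between i and j given k.

    c is a square correlation matrix given as a nested list (assumption).
    """
    numerator = c[i][j] - c[i][k] * c[k][j]
    denominator = math.sqrt((1 - c[i][k] ** 2) * (1 - c[k][j] ** 2))
    return numerator / denominator


def influence(c, i, j, k):
    """Influence of k on the correlation between i and j."""
    return c[i][j] - partial_corr(c, i, j, k)


def avg_influence(c, i, k):
    """Mean influence of k on the correlations of i with every other j."""
    others = [j for j in range(len(c)) if j not in (i, k)]
    return sum(influence(c, i, j, k) for j in others) / len(others)


corr = [[1.0, 0.5, 0.5],
        [0.5, 1.0, 0.5],
        [0.5, 0.5, 1.0]]
print(partial_corr(corr, 0, 1, 2))
print(avg_influence(corr, 0, 2))
```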
- create_network()
Create the PCPG network, a networkx.DiGraph object with nodes and edges found following the PCPG algorithm.
- Returns
None
- find_edges()
Compute the edges in the PCPG network using the average influences in avg_influence_matrix.
- Returns
List of edges in the PCPG network
- Return type
list
Correlation functions
- bipcpg.correlations.compute_corr_matrix(matrix, critical_value=None)
Obtain a correlation matrix among the variables in a matrix. If critical_value is passed, the correlation matrix is filtered based on a statistical significance T-test where critical_value is the threshold value.
- Parameters
matrix (numpy.ndarray) – numpy.ndarray containing time series for the values of interest with observations along axis 0 (rows) and variables along axis 1 (columns).
critical_value (float) – Boundary of the acceptance region of the T-test performed.
- Returns
Correlation matrix displaying correlation coefficients between the columns (axis 1) of the input matrix.
- Return type
numpy.ndarray
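The filtering idea can be sketched for a single pair of series. The sketch rests on assumptions that are not stated in this documentation: the test statistic is the usual t = r * sqrt((n - 2) / (1 - r^2)) for a Pearson coefficient r over n observations, the test is two-sided, and a correlation that falls inside the acceptance region is set to zero. All function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic of a correlation r computed from n observations."""
    if abs(r) >= 1.0:  # perfect correlation: avoid division by zero
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def filtered_correlation(x, y, critical_value=None):
    """Return r, or 0.0 if |t| does not exceed critical_value (assumed rule)."""
    r = pearson(x, y)
    if critical_value is None:
        return r
    return r if abs(t_statistic(r, len(x))) > critical_value else 0.0


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(filtered_correlation(x, y))          # unfiltered coefficient
print(filtered_correlation(x, y, 3.182))   # filtered with a strict threshold
```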
- bipcpg.correlations.corr_pvalue_matrices(matrix)
Obtain a correlation matrix and p-value matrix for a matrix containing variables and observations.
- Parameters
matrix (numpy.ndarray) – 2-dimensional numpy.ndarray containing observations along axis 0 and variables along axis 1.
- Returns
tuple containing correlation matrix showing correlation coefficients between columns of input matrix and p-value matrix showing statistical significance of correlations.
- Return type
tuple
- bipcpg.correlations.get_correlation_matrices_for_list_of_matrices(matrices, critical_value=None)
Obtain a correlation matrix and p-value matrix for each matrix (containing variables along the columns and observations along the rows) in matrices. If critical_value is passed, each correlation matrix is filtered based on a statistical significance T-test where critical_value is the threshold value.
- Parameters
matrices (Iterable) – Iterable object containing 2-dimensional numpy.ndarrays with observations along axis 0 (rows) and variables along axis 1 (columns).
critical_value (float) – Boundary of the acceptance region of the T-test performed.
- Returns
list of length len(matrices) containing correlation matrices displaying the correlation coefficients between the columns (axis 1) of each input matrix.
- Return type
list
Bootstrap functions
- bipcpg.bootstrap.construct_corr_matrix_replicates_from_time_series_matrices(array_of_matrices, num_replicates, critical_value=None)
Performs a bootstrap procedure on time series matrices to obtain correlation matrix replicates. If critical_value is not None, the correlation matrices are filtered using a statistical significance T-test.
- Parameters
array_of_matrices (numpy.ndarray) – 3-dimensional numpy.ndarray with axis 0 representing elements of one of the sets in the bipartite system, axis 1 representing time series observations and axis 2 representing elements of the remaining set in the bipartite system.
num_replicates (int) – Number of correlation matrix replicates to be constructed.
critical_value (float) – If passed, boundary of the acceptance region of the T-test performed.
- Returns
Array containing mean of correlation matrix replicates in each batch.
- Return type
numpy.ndarray
- bipcpg.bootstrap.get_bootstrap_values(timeseries_matrices, variable_names=None, num_replicates=1000, critical_value=None)
Compute bootstrap values for edges in a PCPG network. This function takes a dataset in the form of a list or numpy array of matrices with time series in its columns (see Dataset structure), performs a bootstrap procedure that generates a total of num_replicates replicate PCPG networks, and finds the bootstrap value of each edge, i.e. the fraction of times the edge appears in these networks. If critical_value is not None, the replicate correlation matrices generated are filtered using a statistical significance T-test.
- Parameters
timeseries_matrices (list/numpy.ndarray) – Iterable containing the dataset for which the PCPG network was generated. This should be a list containing 2-dimensional numpy.ndarrays whose columns contain observations for one of the two sets of variables in a bipartite dataset.
variable_names (list) – Names of variables along the columns of each matrix in timeseries_matrices.
num_replicates (int) – Number of replicates to generate in the bootstrap procedure.
critical_value (float) – If passed, boundary of the acceptance region of the T-test performed.
- Returns
pandas.DataFrame containing the bootstrap values of the directed edges in the PCPG network. Note that the source of an edge is its row index and the target of the edge is its column index.
- Return type
pandas.DataFrame
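The bootstrap value defined above, the fraction of replicate networks in which an edge appears, can be computed as in the following sketch. It assumes the replicate networks are already available as lists of (source, target) tuples. How the library resamples the data to build them is not covered here, and the function name is hypothetical.

```python
from collections import Counter


def bootstrap_values(replicate_edge_lists):
    """Fraction of replicates in which each directed edge appears.

    replicate_edge_lists: one list of (source, target) tuples per replicate.
    """
    # set() guards against counting an edge twice within one replicate.
    counts = Counter(
        edge for edges in replicate_edge_lists for edge in set(edges)
    )
    num_replicates = len(replicate_edge_lists)
    return {edge: count / num_replicates for edge, count in counts.items()}


replicates = [
    [("a", "b"), ("b", "c")],
    [("a", "b")],
    [("b", "a")],
    [("a", "b"), ("b", "c")],
]
values = bootstrap_values(replicates)
print(values)
```

Direction matters here: ("a", "b") and ("b", "a") are counted as different edges, in line with the row-as-source, column-as-target convention of the returned table.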
Util functions
- bipcpg.utils.utils.get_degrees_df(G)
Get a pandas.DataFrame containing the degree, in-degree and out-degree information of the nodes in G.
- Parameters
G (networkx.DiGraph) – Directed network.
- Returns
pandas.DataFrame containing degree information.
- Return type
pandas.DataFrame
- bipcpg.utils.utils.remove_reversed_duplicates(iterable)
For an iterable object containing other iterables, yield items which do not have a reversed duplicate in a position with a smaller index.
- Parameters
iterable (Iterable) – An iterable object containing other iterables.
- Returns
Inner iterables which do not have a reversed duplicate in a position with a smaller index.
- Return type
Iterator[Iterable]
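One way to obtain the behaviour described above is a generator that remembers every item seen so far. This is a sketch under that literal reading, not the library's source; the name drop_reversed_duplicates is hypothetical and inner iterables are assumed to contain hashable elements.

```python
def drop_reversed_duplicates(iterable):
    """Yield inner iterables whose reverse has not appeared earlier."""
    seen = set()
    for item in iterable:
        key = tuple(item)
        # Skip the item if its reverse occurred at a smaller index.
        if key[::-1] not in seen:
            yield item
        seen.add(key)


pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
unique_pairs = list(drop_reversed_duplicates(pairs))
print(unique_pairs)
```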
- bipcpg.utils.utils.reshape_year_matrices_to_time_series_matrices(list_yearly_matrices)
For a list of numpy.ndarrays, switch the first dimension (list entries) for the second dimension (axis 0) of matrices in the list.
- Parameters
list_yearly_matrices (list) – list of 2-dimensional numpy.ndarrays indexed over time. Each matrix has one set of variables of the bipartite dataset along axis 0 (rows) and the other set of variables in the bipartite dataset along axis 1 (columns).
- Returns
list of 2-dimensional numpy.ndarrays indexed over the elements in the rows of the matrices in list_yearly_matrices. Axis 0 (rows) of each matrix is now indexed over time, i.e. the dimension of the elements in list_yearly_matrices.
- Return type
list
- Example
This can be used to transform a list of matrices (one per year) into a list of time series matrices. Say we have a list my_list containing matrices (one per year) with the exports every country (rows) made for every product (columns). We can then transform this into a list of matrices (one per country) with time series observations along the rows and products along the columns.
>>> import numpy as np
>>> from bipcpg.utils.utils import reshape_year_matrices_to_time_series_matrices
>>> my_list = [np.array([[1, 2], [3, 4]]),
...            np.array([[5, 6], [7, 8]]),
...            np.array([[9, 10], [11, 12]])]
>>> my_list_transformed = reshape_year_matrices_to_time_series_matrices(my_list)
>>> my_list_transformed
[
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10]]),
array([[ 3,  4],
       [ 7,  8],
       [11, 12]])
]
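The reshaping in this example amounts to swapping the "year" index with the "row" index. A library-independent sketch using nested lists instead of numpy.ndarrays is shown here; the function name years_to_time_series is hypothetical and this is not the library's implementation.

```python
def years_to_time_series(yearly_matrices):
    """Turn a list of per-year matrices into a list of per-row time series.

    zip(*yearly_matrices) groups the i-th row of every yearly matrix, so
    each output matrix has one row per year.
    """
    return [list(rows) for rows in zip(*yearly_matrices)]


yearly = [
    [[1, 2], [3, 4]],      # year 1: rows are countries, columns are products
    [[5, 6], [7, 8]],      # year 2
    [[9, 10], [11, 12]],   # year 3
]
per_country = years_to_time_series(yearly)
print(per_country)
```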
- bipcpg.utils.utils.transform_3level_nested_dict_into_df(nested_dict)
Transform a nested dictionary with three levels into a stacked pandas.DataFrame with a 2-level multi-index.
- Parameters
nested_dict (dict) – Three level nested dictionary to be transformed.
- Returns
pandas.DataFrame with a 2-level multi-index. Multi-index level 0 corresponds to the outermost nested_dict keys, multi-index level 1 corresponds to the nested_dict middle level keys and columns correspond to the nested_dict innermost keys.
- Return type
pandas.DataFrame
- bipcpg.utils.utils.transform_3level_nested_dict_into_stacked_df(nested_dict, name=None)
Transform a nested dictionary with three levels into a stacked pandas.DataFrame with a 3-level multi-index and a single column. If name is passed, set the name of the column to name.
- Parameters
nested_dict (dict) – Three level nested dictionary to be transformed.
name (str) – Name of the single column found in the returned pandas.DataFrame.
- Returns
Stacked dataframe with multi-index level 0 corresponding to outermost nested_dict keys, multi-index level 1 corresponding to nested_dict middle level keys and multi-index level 2 corresponding to nested_dict innermost keys.
- Return type
pandas.DataFrame
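The correspondence between nesting levels and multi-index levels can be illustrated by flattening the dictionary into tuple keys. This is a plain-Python sketch with a hypothetical function name; it does not build a pandas.DataFrame and is not the library's implementation.

```python
def flatten_3level(nested_dict):
    """Map {outer: {middle: {inner: value}}} to {(outer, middle, inner): value}."""
    return {
        (outer, middle, inner): value
        for outer, middle_dict in nested_dict.items()
        for middle, inner_dict in middle_dict.items()
        for inner, value in inner_dict.items()
    }


nested = {
    "x": {"y": {"z": 0.1, "w": 0.2}},
    "u": {"v": {"z": 0.3}},
}
flat = flatten_3level(nested)
print(flat)
```

A tuple-keyed dictionary like this can be passed to pandas.Series, which builds a multi-index from the tuples. Whether the library takes this route internally is not stated in this documentation.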
- bipcpg.utils.communities_utils.communities_data(G, **la_kwds)
Perform a community detection procedure on graph G and return relevant results for plotting.
- Parameters
G (networkx.Graph) – networkx graph on which to perform community detection.
la_kwds – Keyword arguments passed on to leidenalg.find_partition().
- Returns
G_igraph (igraph.Graph) - igraph graph object equivalent to G.
partition (leidenalg.VertexPartition) - Graph partition.
tup_nodes_num_nodes (tuple) - A tuple containing the list of nodes sorted by community and the list of the number of nodes per community.
- Return type
tuple
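The third returned item, nodes sorted by community together with the community sizes, can be illustrated without any graph library. The sketch assumes the partition is available as a dict mapping each node to a community identifier and that communities are ordered by that identifier. Both are assumptions made for this example, as is the function name.

```python
from collections import defaultdict


def nodes_by_community(membership):
    """Return (nodes sorted by community, number of nodes per community).

    membership: dict mapping node -> community identifier (assumed input).
    """
    groups = defaultdict(list)
    for node, community in membership.items():
        groups[community].append(node)
    ordered_nodes, sizes = [], []
    for community in sorted(groups):  # communities in increasing id order
        ordered_nodes.extend(groups[community])
        sizes.append(len(groups[community]))
    return ordered_nodes, sizes


membership = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 2}
ordered_nodes, sizes = nodes_by_community(membership)
print(ordered_nodes, sizes)
```

An ordering like this is convenient for plotting a block-structured adjacency matrix, since the sizes give the block boundaries.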
- bipcpg.utils.communities_utils.get_igraph_network_and_partition(G, **la_kwds)
Obtain an igraph graph and a partition from a networkx graph.
- Parameters
G (networkx.Graph) – networkx graph to be converted into an igraph graph.
la_kwds – Keyword arguments passed on to leidenalg.find_partition().
- Returns
H (igraph.Graph) - igraph graph object.
partition (leidenalg.VertexPartition) - Graph partition.
- Return type
tuple