PyPI official download | Xnode2vec-0.11.4.tar.gz resource - CSDN Library

82 views · uploaded 2022-01-17 13:23:13 · 16KB · GZ

14 files in total

py: 6

txt: 4

pkg-info: 2

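**PyPI official download | Xnode2vec-0.11.4.tar.gz**

PyPI (Python Package Index) is the official software repository for the Python programming language and hosts a large number of third-party libraries that developers can download and use. Xnode2vec-0.11.4.tar.gz is a source archive published on PyPI. It contains a Python library for graph embedding, specifically for learning representations of the nodes in a network. This archive is release 0.11.4 of the package.

**Xnode2vec in detail**

Xnode2vec builds on node2vec, an algorithm that carries the word2vec model over to graph-structured data. Word2vec is a word-embedding technique widely used in natural language processing: it maps words to continuous vectors that capture semantic relationships between them. Node2vec applies the same idea to graphs, with the goal of capturing the topological relationships between nodes. According to the package README, Xnode2vec uses the FastNode2Vec implementation and applies it to datasets that have first been transformed into networks, in order to cluster the data.

1. **Node embedding** Node embedding maps the nodes of a network to low-dimensional vectors in a way that preserves structural information, so that the vectors can be used in downstream tasks such as node classification and link prediction. Xnode2vec learns these embeddings from both the local and the wider context of each node.
2. **Random walk** Random walks are the main tool for exploring the network structure. A walk simulates moving from node to node at random and thereby samples the neighborhood of each node. The walk length and the bias parameters can be tuned to suit different kinds of networks.
3. **Parameters** Two parameters, p and q, bias the random walk. p is the return parameter: a small p makes the walk more likely to step straight back to the node it just left (not to the starting node of the walk). q is the in-out parameter: q > 1 keeps the walk close to the previous node, while q < 1 pushes it outward to nodes farther away. The choice of p and q has a large effect on the resulting embeddings.
4. **Training** Xnode2vec first generates a set of random-walk paths, then treats each path as a "sentence" and each node as a "word", and trains a word2vec skip-gram model on them. The result is a low-dimensional vector for every node that retains features of the network structure.
5. **Applications** The learned node embeddings can be used for tasks such as community detection, node classification, and link prediction. Mapping nodes to a low-dimensional space also makes the network structure easier to inspect and lets standard machine-learning models work on the embeddings.
6. **Installation and use** To use Xnode2vec in a Python project, install it from PyPI, typically with `pip install Xnode2vec`. After installation, call the library's functions as described in its documentation.

Because it is published on PyPI, Xnode2vec is easy to add to a Python project. It may be of interest to data scientists and engineers who work with complex network data such as social networks, biological networks, and knowledge graphs.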
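The biased random walk described in items 2 and 3 can be sketched in a few lines of plain Python. This is a minimal illustration of the node2vec second-order walk on an unweighted graph, not the code used by Xnode2vec or FastNode2Vec; the function name `biased_walk` and the adjacency-dictionary input are assumptions made for the example.

```python
import random


def biased_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Sketch of a node2vec-style walk (hypothetical helper, unweighted graph).

    adj maps each node to the set of its neighbors.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Small example graph: a triangle 0-1-2 with a tail node 3 attached to 2.
example_adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(biased_walk(example_adj, start=0, walk_length=8, p=0.1, q=0.9))
```

Many such walks, started from every node, form the "sentences" that the skip-gram model is trained on.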

Xnode2vec-0.11.4.tar.gz (14 files)

- Xnode2vec-0.11.4/
  - PKG-INFO (12KB)
  - setup.cfg (38B)
  - setup.py (745B)
  - Xnode2vec/
    - data_manipulation.py (4KB)
    - data_clusterization.py (14KB)
    - data_edgelists.py (8KB)
    - __init__.py (362B)
    - data_management.py (4KB)
  - README.md (10KB)
  - Xnode2vec.egg-info/
    - PKG-INFO (12KB)
    - requires.txt (41B)
    - SOURCES.txt (325B)
    - top_level.txt (10B)
    - dependency_links.txt (1B)

# XNode2Vec - An Alternative Data Clustering Procedure

Description
-----------
This repository proposes an alternative method for data classification and clustering, based on the Node2Vec algorithm applied to a properly transformed N-dimensional dataset. The original [Node2Vec](https://github.com/aditya-grover/node2vec) implementation is replaced with a much faster one, [FastNode2Vec](https://github.com/louisabraham/fastnode2vec). The algorithm is applied through a function that works with **networkx** objects, which are quite user-friendly. At the moment only a few simple data transformations are available; more complex and effective ones are planned.

Installation
------------
To install the Xnode2vec package, use pip:

```
pip install Xnode2vec
```

*If there are some problems with the installation, please read the "Note" below.*

How to Use
----------
The idea is straightforward:

1. Take a dataset, or generate one.
2. Apply the proper transformation to the dataset.
3. Build a **networkx** object that embeds the dataset with its crucial properties.
4. Perform a node classification analysis with the Node2Vec algorithm.

```python
import numpy as np
import networkx as nx
import pandas as pd
import Xnode2vec as xn2v

# Two 2D Gaussian families of 20 points each
x1 = np.random.normal(4, 1, 20)
y1 = np.random.normal(5, 1, 20)
x2 = np.random.normal(17, 2, 20)
y2 = np.random.normal(13, 1, 20)

family1 = np.column_stack((x1, y1)) # REQUIRED ARRAY FORMAT
family2 = np.column_stack((x2, y2)) # REQUIRED ARRAY FORMAT

dataset = np.concatenate((family1, family2), axis=0) # Generic dataset
transf_dataset = xn2v.best_line_projection(dataset) # Points transformation

df = xn2v.complete_edgelist(transf_dataset) # Pandas edge list generation
edgelist = xn2v.generate_edgelist(df) # Networkx edge list format
G = nx.Graph()
G.add_weighted_edges_from(edgelist) # Feed the graph with the edge list

nodes, similarity = xn2v.similar_nodes(G, dim=128, walk_length=20, context=5, picked=10, p=0.1, q=0.9, workers=4)
similar_points = xn2v.recover_points(dataset, G, nodes) # Final cluster
```

Using the same setup as before, let's perform an even more complex analysis:

```python
import matplotlib.pyplot as plt

# Four 2D Gaussian families of 100 points each
x1 = np.random.normal(16, 2, 100)
y1 = np.random.normal(9, 2, 100)
x2 = np.random.normal(25, 2, 100)
y2 = np.random.normal(25, 2, 100)
x3 = np.random.normal(2, 2, 100)
y3 = np.random.normal(1, 2, 100)
x4 = np.random.normal(30, 2, 100)
y4 = np.random.normal(70, 2, 100)

family1 = np.column_stack((x1, y1)) # REQUIRED ARRAY FORMAT
family2 = np.column_stack((x2, y2)) # REQUIRED ARRAY FORMAT
family3 = np.column_stack((x3, y3)) # REQUIRED ARRAY FORMAT
family4 = np.column_stack((x4, y4)) # REQUIRED ARRAY FORMAT

dataset = np.concatenate((family1, family2, family3, family4), axis=0) # Generic dataset

df = xn2v.complete_edgelist(dataset) # Pandas edge list generation
df = xn2v.generate_edgelist(df) # Networkx edgelist format
G = nx.Graph()
G.add_weighted_edges_from(df)
graph = xn2v.nx_to_Graph(G) # Load the Graph object to avoid multiple network readings

nodes_families, unlabeled_nodes = xn2v.clusters_detection(G, graph=graph, cluster_rigidity=0.85, spacing=15, dim_fraction=0.8, picked=100, dim=100, context=5, Weight=True, walk_length=20)

# Map the node labels of each cluster back to the original points
points_families = []
for family in nodes_families:
    points_families.append(xn2v.recover_points(dataset, G, family))
points_unlabeled = xn2v.recover_points(dataset, G, unlabeled_nodes)

plt.scatter(dataset[:,0], dataset[:,1])
plt.xlabel('x')
plt.ylabel('y')
plt.title('Generic Dataset', fontweight='bold')
plt.show()
```

Now the list ```points_families``` contains the four clusters -- up to possible statistical errors. The results are, however, surprisingly good in many situations.

Results
-------
The analysis automatically prints to the terminal:

- Number of clusters found.
- Number of nodes analyzed.
- Number of *clustered* nodes.
- Number of *non-clustered* nodes.
- Number of nodes in each cluster.

The output looks like this:

```properties
--------- Clusters Information ---------
- Number of Clusters: 5
- Total nodes: 400
- Clustered nodes: 251
- Number of unlabeled nodes: 149
- Nodes in cluster 1: 16
- Nodes in cluster 2: 52
- Nodes in cluster 3: 83
- Nodes in cluster 4: 64
- Nodes in cluster 5: 36
```

The clustered objects are stored in a list of numpy vectors that is returned by the function *clusters_detection()*. It's important to get used to the *parameter selection* that determines the criteria by which the nodes are labeled.

Objects Syntax
--------------
Here we report the list of structures required to use the Xnode2vec package:

- Dataset: ``` dataset = np.array([[1,2,3,..], ..., [1,2,3,..]])```; each row corresponds to a point and each column to a coordinate.
- Edge List: ``` edgelist = [(node_a,node_b,weight), ... , (node_c,node_d,weight)] ```; this is a list of tuples, each structured as (starting_node, arriving_node, weight).
- DataFrame: ``` pandas.DataFrame(np.array([[1, 2, 3.7], ..., [2, 7, 12]]), columns=['node1', 'node2', 'weight']) ```

Functions Description
---------------------
- ```nx_to_Graph()```: Converts a graph from the **networkx** format to the **fastnode2vec** one, which is necessary to work with fastnode2vec objects.
- ```labels_modifier()```: Changes the labels of the created networkx graph. It can be useful if we want to select rows from a dataframe that we can't recover only with their positions in the vector.
- ```generate_edgelist()```: Reads a pandas DataFrame and generates an edge list vector to eventually build a networkx graph. The syntax of the file header is rigidly controlled and can't be changed. The header format must be: (node1, node2, weight).
- ```edgelist_from_csv()```: Reads a .csv file using pandas dataframes and generates an edge list vector to eventually build a networkx graph. The syntax of the file header is rigidly controlled and can't be changed.
- ```complete_edgelist()```: Performs a **data transformation** from the space points to a network. It generates links between specific points and gives them weights according to the specified metric.
- ```stellar_edgelist()```: Performs a **data transformation** from the space points to a network. It generates links between specific points and gives them weights according to specific conditions.
- ```low_limit_network()```: Performs a **network transformation**. It sets the link weights of the network to 0 if their initial value was below a given threshold. The threshold is chosen to be a constant times the average link weight.
- ```best_line_projection()```: Performs a linear best fit of the dataset points and projects them onto the fitted line.
- ```cluster_generation()```: This function takes the nodes that have a similarity higher than the one set by *cluster_rigidity*.
- ```clusters_detection()```: This function detects the **clusters** that compose a generic dataset. The dataset must be given as a **networkx** graph, using the proper data transformation. The clustering procedure uses the Node2Vec algorithm to find the most similar nodes in the network.
- ```recover_points()```: Recovers the spatial points from the analyzed network. It uses the fact that the order of the nodes that build the network is the same as the dataset one, therefore there is a one-to-one correspondence between nodes and points.
- ```similar_nodes()```: Performs the FastNode2Vec algorithm with full control over the crucial parameters. In particular, this function allows the user to keep working with networkx objects -- which are generally quite user-friendly -- instead of the ones required by the fastnode2vec algorithm.
- ```load_model()```: Loads the saved Gensim.Word2Vec model.
- ```draw_community()```: Draws a networkx p
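
Two of the transformations listed above are simple enough to sketch in plain Python: the weight threshold described for ```low_limit_network()``` and the best-line projection described for ```best_line_projection()```. The sketch below only follows those descriptions and is not the package's implementation: the names `zero_weak_links`, `project_on_best_line` and the parameter `delta` are hypothetical, edges are plain `(node_a, node_b, weight)` tuples, and the projection assumes 2D points with a least-squares fit of the form y = a*x + b.

```python
from statistics import mean


def zero_weak_links(edgelist, delta=1.0):
    """Set to 0 every weight below delta * (average weight). Hypothetical helper."""
    if not edgelist:
        return []
    threshold = delta * mean(w for _, _, w in edgelist)
    return [(a, b, w if w >= threshold else 0.0) for a, b, w in edgelist]


def project_on_best_line(points):
    """Fit y = a*x + b by least squares, then project each 2D point
    orthogonally onto that line. Hypothetical helper."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; y = a*x + b is undefined")
    a = sum((x - x_bar) * (y - y_bar) for x, y in points) / sxx
    b = y_bar - a * x_bar
    projected = []
    for x, y in points:
        # x-coordinate of the foot of the perpendicular from (x, y) to the line
        t = (x + a * (y - b)) / (1 + a * a)
        projected.append((t, a * t + b))
    return projected


edges = [(0, 1, 1.0), (1, 2, 3.0), (0, 2, 8.0)]
print(zero_weak_links(edges, delta=1.0))
print(project_on_best_line([(0.0, 0.0), (1.0, 2.0), (2.0, 1.0)]))
```

Zeroing a weight rather than removing the edge keeps the node set, and therefore the node-to-point correspondence that ```recover_points()``` relies on, unchanged.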
