[Free] Web-based population data statistical analysis (numpy population analysis resource) - CSDN Library

56 files in total

py: 30

rst: 6

md: 6

python

windows

numpy

scikit-learn

Points required: 0 · 88 views · uploaded 2024-03-24 18:49:16 · 203KB ZIP


web人口数据统计分析.zip (56 files)

web人口数据统计分析

.vscode

settings.json 91B

setup.py 2KB

Pipfile 138B

LICENSE 1KB

tests

__init__.py 0B

test_utility.py 2KB

knowknow

wos.backup.py 5KB

__init__.py 956B

code_sharing.py 3KB

nlp.py 4KB

datastore_sql

__init__.py 73B

dbmodel.py 1KB

dbquery.py 5KB

wos.py 5KB

count_cache.py 12KB

example_usage.py 3KB

group_references.py 5KB

viz.py 18KB

datasources

wos.py 14KB

jstor.py 41KB

datastore_cnts

__init__.py 133B

counter.py 28KB

stats.py 5KB

time_trend.py 9KB

count_cache.py 20KB

representations.py 4KB

config.example.yaml 47B

utility.py 4KB

dataverse.py 7KB

__main__.py 6KB

meta_notebook.py 667B

external-data

world-universities.csv 542KB

env.py 894B

exceptions.py 83B

BUILD_IT.ps1 227B

docs

make.bat 764B

Makefile 638B

source

knowknow_base.rst 108B

intro.rst 491B

main_concepts.md 15B

conf.py 3KB

modules.rst 61B

Tutorials.md 41B

knowknow.rst 601B

index1.rst 490B

death_coverage.md 4KB

index.md 1KB

utility.rst 71B

getting_started.md 979B

requirements.txt 706B

requirements.txt 477B

Pipfile.lock 453B

devRequirements.txt 14B

MANIFEST.in 181B

.gitignore 139B

README.md 7KB

This Python package, `knowknow`, is an attempt to make powerful, modern tools for analyzing the structure of knowledge open to anyone. I recognize that parallel efforts exist along these lines, including [CADRE](https://cadre.iu.edu/), but this package is still the only resource for *anyone* to analyze Web of Science datasets, and the methods can be incorporated into CADRE by *anyone*.

# Projects built on knowknow

+ [amcgail/citation-death](https://github.com/amcgail/citation-death) applies the concept of 'death' to attributes of citations, and analyzes the life course of cited works, cited authors, and the authors writing the citations, using the `sociology-wos-74b` dataset.
+ [amcgail/lost-forgotten](https://github.com/amcgail/lost-forgotten) digs deeper into . An online appendix is available [here](http://www.alecmcgail.com/lost&forgotten/), and the paper published in *The American Sociologist* can be found [here](https://rdcu.be/cnSFG).

# Datasets built with knowknow

+ Sociology
  + `sociology-wos` ([Harvard Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/GQGJLQ)): every paper in WoS as of early 2020 whose journal is in the 'Sociology' category and which has full data.
  + *in progress* `sociology-jstor`: in-text citations and their contexts, extracted from >90k full-text Sociology articles indexed in JSTOR.

# Installation (from PyPI)

1. Install Python 3.7+
2. Install [Build Tools for Visual Studio](https://visualstudio.microsoft.com/visual-cpp-build-tools/)
3. Run `pip install knowknow-amcgail`

# Installation (from GitHub)

1. Install Python 3.7+
2. Clone this repository to your computer
3. Create a virtualenv for `knowknow`
4. In the virtualenv, execute `pip install -r requirements.txt`
   + On Windows, I needed to install the latest versions of `numpy`, `scikit-learn` and `scipy` via .whl files
   + For Windows, download them from [this site](https://www.lfd.uci.edu/~gohlke/pythonlibs/) and install with `pip install <fn.whl>`

# Getting Started

To get started with knowknow, you need to 1) specify where knowknow should store data and code ("init"), 2) either create a new project or copy an existing one, and 3) start a JupyterLab environment. The following commands will help you perform these actions, getting you started conducting or reproducing analyses using `knowknow`.

`python -m knowknow init`

Run this command first. It will prompt you for the directory to store data files and the directory where code will be stored.

`python -m knowknow start <PROJ-NAME>`

For instance, `python -m knowknow start citation-death`. Starts a JupyterLab notebook in a knowknow code directory. If the directory doesn't exist, knowknow creates it.

# [Recommended] Interfacing with GitHub

In order to use the following commands you must install [Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git). This allows you to use others' code, and to publish your own code for others to use.

`python -m knowknow clone <URL>`

For instance, `python -m knowknow clone https://github.com/amcgail/lost-forgotten`. Clones someone else's repository.

In order to make your own changes to others' code, or to share your code with the world, do the following:

1) Create a [GitHub](https://www.github.com/) account and log in.
2) Install [GitHub Desktop](https://desktop.github.com/), which is a simple connector between Git on your computer and GitHub, in the cloud.
3a) [Share your code] In GitHub Desktop, choose `File -> Create Repository` and navigate to the folder containing knowknow code. This folder was created by knowknow using the `start` command. Now press "Publish Repository" in the upper right to add this code to your GitHub account.
3b) [Contribute to others' code] In GitHub, `fork` the repository you would like to contribute to. This creates a personal copy of that repository in your GitHub account. Then clone this copy into knowknow's code directory using the `clone` command, or using GitHub Desktop. Once you are satisfied with your updates and they are pushed back to GitHub, submit a "pull request" to the original repository asking its maintainers to review and merge your changes.

# Auto-downloading Data and Code

Data files will be automatically downloaded during code execution if they are not already in the *data* directory you specified with the `init` command. This may use significant bandwidth: the data files for the Sociology dataset are ~750MB.

Code specified by the `knowknow.require` function will be automatically downloaded by knowknow into the *code* directory you specified with the `init` command. **Be sure you trust whoever wrote the code you download.** Running arbitrary code from random strangers on your computer is a security risk.

# Developing

If you want to contribute edits of your own, fork this repository into your own GitHub account, make the changes, and submit a request for me to incorporate the code (a "pull request"). This process is really easy with GitHub Desktop ([tutorial here](https://www.youtube.com/watch?v=BYzriB5aTWU)).

There is a lot to do! If you find this useful to your work and would like to contribute (even to the following list of possible next steps) but can't figure out how, please don't hesitate to reach out. My website is [here](http://www.alecmcgail.com), and my Twitter is [here](https://twitter.com/SomeKindOfAlec).

## Possible projects

+ The documentation for this project can always be improved. This typically happens when people reach out to me with issues. Please [feel free](https://twitter.com/SomeKindOfAlec).
+ **complete** An object-oriented model for handling context would prevent the need for so much variable-passing between functions, reduce total code volume, and improve readability.
+ *ongoing* Different datasets and sources could be incorporated, if you have the need, in addition to JSTOR and WoS.
+ **complete - you can now upload data files to Harvard's Dataverse** If you produce precomputed binaries and have an idea of how we could incorporate the sharing of these binaries within this library, please [DM me](https://twitter.com/SomeKindOfAlec) or something. That would be great.
+ *ongoing, future work* All analyses can be generalized to any counted variable of the citations. This wouldn't be tough, and would have a huge payoff.
+ *huge project, uncertain payoff* It would be amazing if we could make a graphical interface for this.
  + the user simply imports data, chooses the analyses they want to run, fills in configuration parameters and presses "go"
  + the output is a PDF with the code, visualizations, and explanations for a given analysis
  + behind the scenes, all this GUI does is run `nbconvert`
  + it could also allow users to regenerate any/all analyses for each dataset with the click of a button
  + it could provide immediate access to online archives, either to download or upload similar count datasets
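The *ongoing, future work* item about generalizing analyses to "any counted variable of the citations" can be illustrated with a small sketch. This is not knowknow's API: the record format (plain dicts), the field names `year`, `cited_author` and `cited_work`, and the helpers `count_by` and `time_series` are hypothetical, chosen only to show how a single counting routine keyed on an arbitrary tuple of attributes can serve several analyses.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Sequence, Tuple

# Assumption: each citation is a plain dict. The field names used below
# ("year", "cited_author", "cited_work") are illustrative, not knowknow's
# actual data model.
Citation = Mapping[str, object]


def count_by(citations: Iterable[Citation], fields: Sequence[str]) -> Counter:
    """Count citations keyed on any combination of attributes.

    ("cited_author",) counts citations per cited author;
    ("year", "cited_author") counts them per cited author per year.
    """
    counts: Counter = Counter()
    for citation in citations:
        key = tuple(citation[f] for f in fields)
        counts[key] += 1
    return counts


def time_series(counts: Counter, year_index: int = 0) -> Dict[Tuple, Dict[object, int]]:
    """Pivot a counter keyed on (year, item...) into {item: {year: count}}.

    `year_index` is the position of the year within each key.
    """
    series: Dict[Tuple, Dict[object, int]] = defaultdict(dict)
    for key, n in counts.items():
        year = key[year_index]
        item = key[:year_index] + key[year_index + 1:]
        series[item][year] = n
    return dict(series)


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    citations = [
        {"year": 1990, "cited_author": "weber", "cited_work": "weber|1922"},
        {"year": 1990, "cited_author": "durkheim", "cited_work": "durkheim|1897"},
        {"year": 1991, "cited_author": "weber", "cited_work": "weber|1905"},
        {"year": 1991, "cited_author": "weber", "cited_work": "weber|1922"},
    ]
    # The same two functions work for any counted variable:
    by_author = time_series(count_by(citations, ["year", "cited_author"]))
    by_work = time_series(count_by(citations, ["year", "cited_work"]))
    print(by_author[("weber",)])     # {1990: 1, 1991: 2}
    print(by_work[("weber|1922",)])  # {1990: 1, 1991: 1}
```

Keying the counter on a tuple of field names is what makes the generalization cheap in this sketch: tracking a new counted variable means passing a different `fields` list rather than writing a new counting routine.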