# Pandas NLP
An extension for pandas that adds NLP functionality (language detection, embeddings, closest-concept matching and sentence extraction) to string columns.
[![build](https://github.com/jaume-ferrarons/pandas-nlp/actions/workflows/push-event.yml/badge.svg?branch=master)](https://github.com/jaume-ferrarons/pandas-nlp/actions/workflows/push-event.yml)
[![version](https://img.shields.io/pypi/v/pandas_nlp?logo=pypi&logoColor=white)](https://pypi.org/project/pandas-nlp/)
[![codecov](https://codecov.io/gh/jaume-ferrarons/pandas-nlp/branch/master/graph/badge.svg?token=UQUSYGANFQ)](https://codecov.io/gh/jaume-ferrarons/pandas-nlp)
[![pyversion-button](https://img.shields.io/pypi/pyversions/pandas_nlp.svg)](https://pypi.org/project/pandas-nlp/)
## Setup
### Requirements
- python >= 3.8
### Installation
Execute:
```bash
pip install -U pandas-nlp
```
To install the default spaCy English model:
```bash
python -m spacy download en_core_web_md
```
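The examples below call `pandas_nlp.register()` before using the `.nlp` accessor. The README does not say how `register()` works. A common mechanism, and only an assumption here, is pandas' accessor extension API (`pd.api.extensions.register_series_accessor`). The sketch below registers a hypothetical `toy_nlp` accessor with a made-up `word_count()` method to show the pattern; it is not pandas_nlp's code.

```python
import pandas as pd


# Hypothetical accessor name "toy_nlp", chosen so it cannot clash with pandas_nlp's ".nlp".
@pd.api.extensions.register_series_accessor("toy_nlp")
class ToyNlpAccessor:
    def __init__(self, series: pd.Series):
        # pandas passes in the Series the accessor is attached to.
        self._series = series

    def word_count(self) -> pd.Series:
        # Count whitespace-separated tokens in each string.
        result = self._series.str.split().str.len()
        # Mirror the "<column>_<feature>" naming of the result Series seen in this README.
        result.name = f"{self._series.name}_word_count"
        return result


s = pd.Series(["I like cats", "Code. Sleep. Eat"], name="text")
print(s.toy_nlp.word_count().tolist())  # [3, 3]
```

Once the decorator has run, every Series gains the accessor, which is why the registration call only needs to happen once per session.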
## Key features
### Language detection
```python
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "text": [
        "I like cats",
        "Me gustan los gatos",
        "M'agraden els gats",
        "J'aime les chats",
        "Ich mag Katzen",
    ],
})
df.text.nlp.language()
```
**Output**
```
0 en
1 es
2 ca
3 fr
4 de
Name: text_language, dtype: object
```
With confidence scores:
```python
df.text.nlp.language(confidence=True).apply(pd.Series)
```
**Output**
```
  language  confidence
0       en    0.897090
1       es    0.982045
2       ca    0.999806
3       fr    0.999713
4       de    0.997995
```
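A common follow-up is to discard low-confidence detections. The sketch below assumes each element returned by `language(confidence=True)` is a dict with `language` and `confidence` keys, which is inferred from the column names that `.apply(pd.Series)` produces above. The input Series is a hand-built stand-in with invented values, and the `0.8` threshold is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for df.text.nlp.language(confidence=True); values are made up.
detected = pd.Series(
    [
        {"language": "en", "confidence": 0.90},
        {"language": "es", "confidence": 0.98},
        {"language": "ca", "confidence": 0.55},
    ],
    name="text_language",
)

# Expand the dicts into two columns, as in the example above.
table = detected.apply(pd.Series)

# Keep the label only when the detector is confident enough; otherwise mark it "unknown".
THRESHOLD = 0.8  # arbitrary cut-off chosen for illustration
table["language_or_unknown"] = table["language"].where(
    table["confidence"] >= THRESHOLD, "unknown"
)
print(table["language_or_unknown"].tolist())  # ['en', 'es', 'unknown']
```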
### String embedding
```python
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame(
{"id": [1, 2, 3], "text": ["cat", "dog", "violin"]}
)
df.text.nlp.embedding()
```
**Output**
```
0 [3.7032, 4.1982, -5.0002, -11.322, 0.031702, -...
1 [1.233, 4.2963, -7.9738, -10.121, 1.8207, 1.40...
2 [-1.4708, -0.73871, 0.49911, -2.1762, 0.56754,...
Name: text_embedding, dtype: object
```
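Because `embedding()` returns one vector per row, the Series can be stacked into a 2-D array for similarity computations. The sketch below uses a hand-built Series of invented 3-dimensional vectors (real embeddings are far longer) and computes pairwise cosine similarity with NumPy; it only assumes that each element is a list-like of floats.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df.text.nlp.embedding(); the vectors are invented.
embeddings = pd.Series(
    [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 0.0, 1.0]],
    index=["cat", "dog", "violin"],
    name="text_embedding",
)

# Stack the per-row vectors into an (n_rows x n_dims) array.
matrix = np.array(embeddings.tolist(), dtype=float)

# Normalise each row to unit length; the matrix product then gives cosine similarities.
unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
similarity = pd.DataFrame(unit @ unit.T, index=embeddings.index, columns=embeddings.index)
print(similarity.round(2))  # cat vs dog -> 0.8; violin vs either -> 0.0
```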
### Closest concept
```python
import pandas as pd
import pandas_nlp
pandas_nlp.register()
themed = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "text": [
        "My computer is broken",
        "I went to a piano concert",
        "Chocolate is my favourite",
        "Mozart played the piano",
    ],
})
themed.text.nlp.closest(["music", "informatics", "food"])
```
**Output**
```
0 informatics
1 music
2 food
3 music
Name: text_closest, dtype: object
```
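The README does not document how `closest()` chooses a concept. One plausible approach, and an assumption here, is to embed both the texts and the candidate concepts and pick the concept with the highest cosine similarity. The hypothetical `closest_concept` helper below sketches that idea with invented 2-dimensional vectors; it is not part of pandas_nlp.

```python
import numpy as np
import pandas as pd


def closest_concept(text_vectors: pd.Series, concept_vectors: dict) -> pd.Series:
    """Hypothetical helper: for each row, return the concept with the highest cosine similarity."""
    names = list(concept_vectors)
    concepts = np.array([concept_vectors[n] for n in names], dtype=float)
    texts = np.array(text_vectors.tolist(), dtype=float)

    # Normalise rows so that a dot product equals cosine similarity.
    concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)
    texts /= np.linalg.norm(texts, axis=1, keepdims=True)

    scores = texts @ concepts.T  # shape: (n_texts, n_concepts)
    best = scores.argmax(axis=1)
    return pd.Series(
        [names[i] for i in best],
        index=text_vectors.index,
        name=f"{text_vectors.name}_closest",
    )


# Toy vectors, invented for illustration.
concepts = {"music": [1.0, 0.0], "food": [0.0, 1.0]}
texts = pd.Series([[0.9, 0.1], [0.2, 0.7]], name="text")
print(closest_concept(texts, concepts).tolist())  # ['music', 'food']
```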
### Sentence extraction
```python
import pandas as pd
import pandas_nlp
pandas_nlp.register()
df = pd.DataFrame(
{"id": [0, 1], "text": ["Hello, how are you?", "Code. Sleep. Eat"]}
)
df.text.nlp.sentences()
```
**Output**
```
0 [Hello, how are you?]
1 [Code., Sleep., Eat]
Name: text_sentences, dtype: object
```
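Since `sentences()` returns a list per row, `DataFrame.explode` can turn it into one row per sentence while repeating the other columns. The sketch below uses a hand-built stand-in with plain strings; the elements the library actually returns may be richer objects, which is an open assumption here.

```python
import pandas as pd

# Hypothetical stand-in for the result of df.text.nlp.sentences(), stored next to the id column.
df = pd.DataFrame({
    "id": [0, 1],
    "sentences": [["Hello, how are you?"], ["Code.", "Sleep.", "Eat"]],
})

# explode() emits one row per list element and repeats the other columns (here, id).
per_sentence = df.explode("sentences", ignore_index=True)
print(per_sentence)
```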