Python: word segmentation, sentence splitting and punctuation separation for Chinese paragraphs - Linux大棚



I found a package, zhon, that makes sentence splitting really convenient.

Related documentation: /

GitHub:

------------------------------------------------------------------------ Update, January 25, 2021 ------------------------------------------------

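I spoke too soon... It turns out the code above only works reliably on paragraphs that are entirely Chinese. When the text contains digits or English, it runs into problems, for example: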

Result:

Sadly, that means I have to write my own sentence splitter.
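A minimal sketch of what such a hand-written splitter could look like (not the code from this post). It rests on assumed rules: the Chinese marks 。！？ always end a sentence, together with any closing quotes or brackets right after them; the ASCII marks . ! ? end a sentence only when followed by whitespace or the end of the text, so numbers like 3.5 or 3.10 are not cut apart. The names SENT_END and split_sentences are hypothetical.

```python
import re

# Hypothetical splitting rules (assumptions, not from the original post):
#   - Chinese 。！？ always end a sentence, plus any closing quotes/brackets after them;
#   - ASCII . ! ? end a sentence only when followed by whitespace or end of text,
#     so decimals such as 3.5 and versions such as 3.10 stay intact.
SENT_END = re.compile(r"[。！？]+[”’」』）]*|[.!?]+(?=\s|$)")


def split_sentences(text):
    """Split mixed Chinese/English/digit text into a list of sentences."""
    sentences = []
    start = 0
    for m in SENT_END.finditer(text):
        piece = text[start:m.end()].strip()
        if piece:
            sentences.append(piece)
        start = m.end()
    # Keep trailing text that has no terminator instead of dropping it
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("我今天买了3.5斤苹果。Python 3.10 is great. 你呢？好的"))
    # ['我今天买了3.5斤苹果。', 'Python 3.10 is great.', '你呢？', '好的']
```

The rules are deliberately simple: English abbreviations such as "e.g. " or "Mr. " would still be treated as sentence ends, so real text may need an exception list.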

Still, zhon can also separate punctuation from Chinese characters, which is quite handy, for example:

Result:
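For comparison, here is a dependency-free sketch of the same separation using only the standard library, not zhon's API. It assumes a Han character is anything in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and punctuation is anything whose Unicode general category starts with "P". These rules differ from zhon's own character lists: rarer ideographs outside that block are ignored, and ASCII punctuation counts too. The function names are hypothetical.

```python
import re
import unicodedata

# Basic CJK Unified Ideographs block only (assumption: extension blocks are ignored)
HAN_RE = re.compile(r"[\u4e00-\u9fff]")


def is_punct(ch):
    """True if the Unicode general category is punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


def separate_han_and_punct(text):
    """Return (han, punct): the Han characters and the punctuation of text, in order."""
    han = "".join(HAN_RE.findall(text))
    punct = "".join(ch for ch in text if is_punct(ch))
    return han, punct


if __name__ == "__main__":
    print(separate_han_and_punct("他说：“你好，世界！”Hi 2021。"))
    # ('他说你好世界', '：“，！”。')
```

Latin letters, digits and spaces fall into neither group here, so they simply drop out of both results.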

In fact, jieba.posseg (word segmentation with part-of-speech tagging) can also pick out punctuation easily, for example:

You will find that every punctuation mark gets the flag 'x'.
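Building on that observation, a small filter can collect the 'x' tokens. This is a sketch, not jieba's API: it takes any iterable of (word, flag) pairs, and it assumes jieba may also tag whitespace as 'x', so blank tokens are skipped. The function name extract_punctuation and the sample pairs are hypothetical; the sample is hand-written, not real jieba output.

```python
def extract_punctuation(tagged):
    """Return the tokens whose POS flag is 'x', skipping whitespace-only tokens.

    `tagged` is any iterable of (word, flag) pairs.
    """
    result = []
    for word, flag in tagged:
        # Assumption: besides punctuation, jieba may also tag whitespace as "x",
        # so tokens that are empty after strip() are dropped.
        if flag == "x" and word.strip():
            result.append(word)
    return result


if __name__ == "__main__":
    # Hand-written (word, flag) pairs standing in for jieba output
    sample = [("我", "r"), ("爱", "v"), ("北京", "ns"), ("，", "x"), (" ", "x"), ("！", "x")]
    print(extract_punctuation(sample))  # ['，', '！']
```

With jieba installed, it should work directly on the tagger's output, e.g. extract_punctuation(pseg.cut(text)) after import jieba.posseg as pseg, since jieba's README iterates over pseg.cut results as "for word, flag in ...".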

---------------------------------------------------------

A more convenient sentence-splitting approach:

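from nltk.tokenize import RegexpTokenizer


def SplitSentence(content):
    """Split a Chinese paragraph into sentences."""
    # Each token is the shortest run of text ending in one of the symbols inside [],
    # i.e. the paragraph is split right after every 。, ！ or ？
    tokenizer = RegexpTokenizer(".*?[。！？]")
    rst = tokenizer.tokenize(content)  # list of sentences
    # Note: any text after the last 。！？ does not match the pattern and is dropped
    return rst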
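If NLTK is not available, the same pattern can be applied with Python's built-in re module. This sketch assumes re.DOTALL, which as far as I know mirrors RegexpTokenizer's default flags, so a sentence may span a line break. It also handles one limitation of the pattern: text after the last 。！？ never matches and silently disappears, so a hypothetical keep_tail parameter appends it as a final sentence. The name split_sentence_re is likewise hypothetical.

```python
import re

# Same pattern as in SplitSentence; DOTALL lets "." match newlines too
SENT_RE = re.compile(r".*?[。！？]", re.DOTALL)


def split_sentence_re(content, keep_tail=True):
    """Split a Chinese paragraph on 。！？ using only the standard library."""
    sentences = SENT_RE.findall(content)
    if keep_tail:
        # Matches are contiguous from index 0, so their combined length is
        # where the unterminated remainder (if any) starts.
        tail = content[sum(len(s) for s in sentences):]
        if tail.strip():
            sentences.append(tail)
    return sentences


if __name__ == "__main__":
    text = "今天天气很好。我们去公园吧！好不好？那就这样"
    print(split_sentence_re(text))                   # ['今天天气很好。', '我们去公园吧！', '好不好？', '那就这样']
    print(split_sentence_re(text, keep_tail=False))  # ['今天天气很好。', '我们去公园吧！', '好不好？']
```

With keep_tail=False the behavior should match the NLTK version above; like it, this still splits only on Chinese terminators, so mixed Chinese/English text has the same limitations.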


