Python Regular Expressions

1. What is a regular expression?

A regular expression (regex) is a compact expression that describes a whole set of strings.

2. What are they used for?

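① Describing the characteristic features of a kind of text. ② Finding or replacing a whole group of strings at once. ③ Matching all or part of a string.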

3. Common operators

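Operator	Description	Example
.	Matches any single character except \n
[]	Character set: one character from the given range	[efg] matches e, f or g; [c-y] matches one character from c to y
[^]	Negated set: one character outside the given range	[^efg] matches any single character except e, f, g
*	Preceding character 0 or more times	efg* matches ef, efg, efgg, efggg, ...
+	Preceding character 1 or more times	efg+ matches efg, efgg, efggg, ...
?	Preceding character 0 or 1 time	efg? matches ef (g absent) or efg (g present)
|	Either the expression on the left or the one on the right	efg|def matches efg or def
{x}	Preceding character exactly x times	ef{3}g matches efffg
{x,y}	Preceding character x to y times, inclusive	ef{1,3}g matches efg, effg or efffg
^	Start of the string	^efg matches efg at the start of the string
$	End of the string	abc$ matches abc at the end of the string
()	Grouping: the group is treated as one unit and may contain | alternatives	(efg) matches efg; (efg|xyz) matches efg or xyz
\d	A digit; equivalent to [0-9] for ASCII text
\w	A word character: letter, digit or underscore (in Python 3 str patterns this also covers Unicode letters such as Chinese characters)
\A	Start of the string only (like ^ without re.MULTILINE)
\Z	End of the string only (like $, except that $ also matches just before a trailing \n)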
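The sketch below checks several operators from the table with `re.fullmatch()`, which succeeds only when the whole text matches. The sample strings are made up for illustration; the last two lines show the difference between `$` and `\Z` noted above.

```python
import re

# (pattern, text) pairs; the sample texts are illustrative only
cases = [
    (r"e.g", "eag"),          # . matches any character except \n
    (r"e.g", "e\ng"),         # ... so a newline is rejected
    (r"[efg]", "f"),          # character set
    (r"[^efg]", "f"),         # negated set: f is excluded
    (r"efg*", "ef"),          # *: g zero times is allowed
    (r"efg+", "ef"),          # +: g must appear at least once
    (r"efg?", "efg"),         # ?: g zero or one time
    (r"efg|def", "def"),      # |: either side
    (r"ef{3}g", "efffg"),     # {3}: exactly three f
    (r"ef{1,3}g", "effffg"),  # {1,3}: four f is too many
    (r"\d\d", "42"),          # \d: digits
    (r"\w+", "hpl_2020"),     # \w: letters, digits, underscore
]

# fullmatch() returns a match object only if the entire text matches
results = [bool(re.fullmatch(p, t)) for p, t in cases]
for (p, t), ok in zip(cases, results):
    print(f"{p!r:12} {t!r:10} {ok}")

# $ also matches just before a trailing newline; \Z does not
dollar = bool(re.search(r"abc$", "abc\n"))
end_z = bool(re.search(r"abc\Z", "abc\n"))
print(dollar, end_z)
```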

4. Regex syntax examples

Regular expression	Matching strings
P(Y|YT|YTH|YTHO)?N	"PN", "PYN", "PYTN", "PYTHN", "PYTHON"
PYTHON+	"PYTHON", "PYTHONN", "PYTHONNN", ...
PY[TH]ON	"PYTON", "PYHON"
PY[^TH]?ON	"PYON", "PYAON", "PYBON", "PYCON", ...
PY{0,3}N	"PN", "PYN", "PYYN", "PYYYN"
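A quick way to confirm the table is to run each pattern against its listed strings with `re.fullmatch()`. This sketch also tries a few strings that should be rejected; those negative examples are chosen for illustration and are not from the original table.

```python
import re

# Patterns from the table, each paired with the strings listed for it
examples = {
    r"P(Y|YT|YTH|YTHO)?N": ["PN", "PYN", "PYTN", "PYTHN", "PYTHON"],
    r"PYTHON+": ["PYTHON", "PYTHONN", "PYTHONNN"],
    r"PY[TH]ON": ["PYTON", "PYHON"],
    r"PY[^TH]?ON": ["PYON", "PYAON", "PYBON", "PYCON"],
    r"PY{0,3}N": ["PN", "PYN", "PYYN", "PYYYN"],
}
for pattern, strings in examples.items():
    # fullmatch() requires the entire string to match the pattern
    print(pattern, all(re.fullmatch(pattern, s) for s in strings))

# Strings the patterns should reject (illustrative)
rejected = [
    re.fullmatch(r"PY[TH]ON", "PYTHON"),   # [TH] is one character, not two
    re.fullmatch(r"PY[^TH]?ON", "PYTON"),  # T is excluded by [^TH]
    re.fullmatch(r"PY{0,3}N", "PYYYYN"),   # four Y is more than 3
]
print(rejected)
```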

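5. Classic regex examples

^[A-Za-z]+$	A string made up only of the 26 English letters (upper or lower case)
^[A-Za-z0-9]+$	A string made up of English letters and digits
^-?\d+$	A string representing an integer
^[0-9]*[1-9][0-9]*$	A string representing a positive integer
[1-9]\d{5}	A postal code in mainland China
[\u4e00-\u9fa5]	A Chinese character
\d{3}-\d{8}|\d{4}-\d{7}	A domestic (mainland China) landline number, e.g. 010-12345678
[1-9]?\d	0-99
1\d{2}	100-199
2[0-4]\d	200-249
25[0-5]	250-255
(([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])\.){3}([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])	An IPv4 address (the four patterns above together cover 0-255; \. matches a literal dot)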
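Here is a minimal sketch of an IPv4 validator built from the octet alternatives above. `OCTET`, `IP_PATTERN` and `is_ipv4` are hypothetical names introduced for illustration; `fullmatch()` plays the role of wrapping the pattern in `^...$`. The last lines show why the dot must be escaped: an unescaped `.` accepts any separator.

```python
import re

# One octet, 0-255: alternatives for 0-99, 100-199, 200-249 and 250-255
OCTET = r"([1-9]?\d|1\d{2}|2[0-4]\d|25[0-5])"
# Three "octet." groups followed by a final octet; \. matches only a literal dot
IP_PATTERN = re.compile(rf"({OCTET}\.){{3}}{OCTET}")


def is_ipv4(text):
    """Hypothetical helper: True if the whole text is a dotted-decimal IPv4 address."""
    # fullmatch() anchors the pattern at both ends, like ^...$
    return IP_PATTERN.fullmatch(text) is not None


for s in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "1a2b3c4"]:
    print(s, is_ipv4(s))

# With an unescaped dot, "." matches any character, so "1a2b3c4" is accepted
LOOSE = re.compile(rf"({OCTET}.){{3}}{OCTET}")
loose_accepts = LOOSE.fullmatch("1a2b3c4") is not None
print(loose_accepts)
```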

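6. Basic usage of the re library

Main functions of the re library:
re.search()	Scans a string for the first position where the pattern matches; returns a match object (or None)
re.match()	Tries to match the pattern at the start of a string; returns a match object (or None)
re.findall()	Scans a string and returns all matching substrings as a list
re.split()	Splits a string at every match of the pattern; returns a list
re.finditer()	Scans a string and returns an iterator whose items are match objects
re.sub()	Replaces every substring that matches the pattern; returns the resulting string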
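The same operations are also available as methods of a compiled pattern returned by `re.compile()`, which is convenient when one pattern is reused. This sketch reuses the postal-code pattern and sample text from the examples below; the name `postcode` is illustrative.

```python
import re

# Compile the postal-code pattern once; the Pattern object offers the same
# operations as methods that no longer take the pattern argument
postcode = re.compile(r"[1-9]\d{5}")
text = "haha723300 xixi612203"

print(postcode.search(text).group())  # 723300
print(postcode.match(text))           # None: text does not start with a code
print(postcode.findall(text))         # ['723300', '612203']
print(postcode.split(text))           # ['haha', ' xixi', '']
print(postcode.sub("love", text))     # hahalove xixilove
```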

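① search(pattern, string, flags=0)

pattern: the regular expression, as a string or raw string (r"...")
string: the string to be searched
flags: control flags that change how the pattern is applied (for example re.I, re.M, re.S)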

			
import re

# search() scans the whole string and returns the first match
match = re.search(r"[1-9]\d{5}", "haha 723300")
if match:
    print(match.group())

# Output:
# 723300
			
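The `flags` parameter is easiest to see on a small multi-line text. This sketch, using a made-up sample string, shows `re.I` (IGNORECASE), `re.M` (MULTILINE) and `re.S` (DOTALL), and how flags combine with `|`.

```python
import re

text = "Python\npython\nPYTHON"  # illustrative sample text

plain = re.findall(r"python", text)                     # case-sensitive
ignore = re.findall(r"python", text, flags=re.I)        # re.IGNORECASE
lines = re.findall(r"^p\w+", text, flags=re.M)          # ^ at every line start
both = re.findall(r"^p\w+", text, flags=re.I | re.M)    # flags combine with |
no_dotall = re.search(r"Python.python", text)           # . does not match \n
dotall = re.search(r"Python.python", text, flags=re.S)  # re.DOTALL: . matches \n too

print(plain)                 # ['python']
print(ignore)                # ['Python', 'python', 'PYTHON']
print(lines)                 # ['python']
print(both)                  # ['Python', 'python', 'PYTHON']
print(no_dotall)             # None
print(repr(dotall.group()))  # 'Python\npython'
```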

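② match(pattern, string, flags=0)

Note that match() only tries to match at the beginning of the string: if the start does not match, it does not look any further. It returns a match object on success and None otherwise.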

			
import re

# The text does not start with a postal code, so match() returns None
match = re.match(r"[1-9]\d{5}", "haha 723300")
print(type(match))

# Here the code is at the start, so match() succeeds
match = re.match(r"[1-9]\d{5}", "723300 haha")
if match:
    print(match.group())

# Output:
# <class 'NoneType'>
# 723300
			

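So the difference between search and match is:
match requires the matching substring to begin at the very start of the string, otherwise nothing is found, whereas search has no such requirement.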
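Python 3.4 and later also provide `re.fullmatch()`, which requires the pattern to cover the entire string. The sketch below compares the three functions side by side on illustrative inputs built from the postal-code example.

```python
import re

pattern = r"[1-9]\d{5}"
table = []
for text in ["723300", "723300 haha", "haha 723300"]:
    row = (
        bool(re.match(pattern, text)),      # must start at index 0
        bool(re.search(pattern, text)),     # may start anywhere
        bool(re.fullmatch(pattern, text)),  # must cover the whole string
    )
    table.append(row)
    print(repr(text), row)
```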

③ findall(pattern, string, flags=0)

			
import re

# findall() returns every non-overlapping match as a list of strings
c = re.findall(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(c))
print(c)

# Output:
# <class 'list'>
# ['723300', '612203']
			
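One detail worth knowing: when the pattern contains capturing groups, `findall()` returns the group contents instead of the whole match (a tuple per match if there are several groups). The IP-address pattern in section 5 has groups, so it would return tuples. This sketch uses a looser, hypothetical phone-number pattern and sample text for illustration; `(?:...)` groups without capturing.

```python
import re

text = "tel: 010-12345678, 0571-1234567"  # illustrative sample text

whole = re.findall(r"\d{3,4}-\d{7,8}", text)        # no group: whole matches
areas = re.findall(r"(\d{3,4})-\d{7,8}", text)      # one group: only its text
pairs = re.findall(r"(\d{3,4})-(\d{7,8})", text)    # several groups: tuples
grouped = re.findall(r"(?:\d{3,4})-\d{7,8}", text)  # (?:...) captures nothing

print(whole)    # ['010-12345678', '0571-1234567']
print(areas)    # ['010', '0571']
print(pairs)    # [('010', '12345678'), ('0571', '1234567')]
print(grouped)  # ['010-12345678', '0571-1234567']
```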

④ split(pattern, string, maxsplit=0, flags=0)

maxsplit: maximum number of splits; the rest of the string is returned as the last element (0, the default, means no limit)

			
import re

# Split at every postal code; a match at the very end leaves an empty string
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203")
print(type(a))
print(a)

# maxsplit=1: split once and keep the rest as the last element
a = re.split(r"[1-9]\d{5}", "haha723300 xixi612203", maxsplit=1)
print(a)

# Split on either ":" or ","
str1 = "name: hpl, age: 18"
b = re.split(r"[:,]", str1)
print(b)

# Output:
# <class 'list'>
# ['haha', ' xixi', '']
# ['haha', ' xixi612203']
# ['name', ' hpl', ' age', ' 18']
			
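Two common refinements of the `str1` example: absorbing the surrounding spaces into the separator pattern, and using a capturing group, which makes `split()` keep the separators in the result. The whitespace example string is made up for illustration.

```python
import re

str1 = "name: hpl, age: 18"

# Absorb the spaces around each separator into the match
parts = re.split(r"\s*[:,]\s*", str1)
# A capturing group keeps the separators in the result list
with_seps = re.split(r"([:,])", str1)
# Split on runs of any whitespace (illustrative input)
words = re.split(r"\s+", "a  b\tc\nd")

print(parts)      # ['name', 'hpl', 'age', '18']
print(with_seps)  # ['name', ':', ' hpl', ',', ' age', ':', ' 18']
print(words)      # ['a', 'b', 'c', 'd']
```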

⑤ finditer(pattern, string, flags=0)

			
import re

# finditer() yields only successful matches, so no None check is needed
for m in re.finditer(r"[1-9]\d{5}", "haha723300 xixi612203"):
    print(m.group())

# Output:
# 723300
# 612203
			
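Because each item from `finditer()` is a full match object (unlike the plain strings from `findall()`), the position of every match is available as well. A short sketch on the same text:

```python
import re

text = "haha723300 xixi612203"

# Collect (matched text, (start, end)) for every match
positions = [(m.group(), m.span()) for m in re.finditer(r"[1-9]\d{5}", text)]
print(positions)  # [('723300', (4, 10)), ('612203', (15, 21))]
```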

⑥ sub(pattern, repl, string, count=0, flags=0)

repl: the replacement for each matched substring
count: maximum number of replacements (0, the default, replaces every match)

			
import re

# Replace every postal code with "love"; sub() always returns a string
result = re.sub(r"[1-9]\d{5}", "love", "haha723300 xixi612203")
print(result)

# Output:
# hahalove xixilove
			
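A few more things `sub()` can do, sketched with illustrative inputs: limit the replacements with `count`, refer to captured groups as `\1`, `\2` in `repl`, or pass a function as `repl` that receives each match object. `mask` is a hypothetical helper; `re.subn()` additionally returns the number of replacements made.

```python
import re

text = "haha723300 xixi612203"

# count=1: replace only the first match
first_only = re.sub(r"[1-9]\d{5}", "love", text, count=1)

# \1 and \2 in repl refer to the groups captured by the pattern
swapped = re.sub(r"(\w+): (\w+)", r"\2=\1", "name: hpl, age: 18")


def mask(m):
    """Hypothetical helper: keep the first two digits, hide the rest."""
    code = m.group()
    return code[:2] + "*" * (len(code) - 2)


# repl may be a function that is called with each match object
masked = re.sub(r"[1-9]\d{5}", mask, text)

# subn() returns (new_string, number_of_replacements)
counted = re.subn(r"[1-9]\d{5}", "love", text)

print(first_only)  # hahalove xixi612203
print(swapped)     # hpl=name, 18=age
print(masked)      # haha72**** xixi61****
print(counted)     # ('hahalove xixilove', 2)
```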

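7. The match object in re

Attributes:
string	the text that was searched
re	the pattern object (compiled regular expression) used for the match
pos	the position in the text where the search started
endpos	the position in the text where the search ended

Methods:
group()	the matched substring
start()	start index of the match in the original string
end()	end index of the match in the original string (exclusive)
span()	the tuple (start(), end())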

			
import re
match = re.search(r"[1-9]\d{5}", "haha723300 xixi612203")
print(match.string)   # text that was searched
print(match.re)       # compiled pattern
print(match.pos)      # search start position
print(match.endpos)   # search end position
print(match.group())  # matched text
print(match.start())  # start index of the match
print(match.end())    # end index (exclusive)
print(match.span())   # (start, end)

# Output:
# haha723300 xixi612203
# re.compile('[1-9]\\d{5}')
# 0
# 21
# 723300
# 4
# 10
# (4, 10)
			
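`group()` without arguments returns the whole match, but with parentheses in the pattern it can also return individual groups, by number or, using the `(?P<name>...)` syntax, by name. The pattern, group names and sample text below are hypothetical, chosen for illustration.

```python
import re

# Hypothetical pattern with two named groups
m = re.search(r"(?P<area>\d{3,4})-(?P<number>\d{7,8})", "tel: 010-12345678")

print(m.group())          # whole match, same as m.group(0): 010-12345678
print(m.group(1))         # first group: 010
print(m.group("number"))  # named group: 12345678
print(m.groups())         # all groups: ('010', '12345678')
print(m.groupdict())      # {'area': '010', 'number': '12345678'}
print(m.span("number"))   # position of one group: (9, 17)
```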

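8. Greedy and minimal matching in re

① By default, re uses greedy matching: quantifiers consume as much text as possible, so the longest matching substring is returned.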

			
import re

# .* grabs as much as possible, then backs off to the last N
match = re.search(r'PY.*N', 'PYANBNCNDN')
print(match.group())

# Output:
# PYANBNCNDN
			

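② For minimal (lazy) matching, append ? to the quantifier

Minimal-matching operators
Operator	Description
*?	Preceding character 0 or more times, minimal match
+?	Preceding character 1 or more times, minimal match
??	Preceding character 0 or 1 time, minimal match
{m,n}?	Preceding character m to n times (inclusive), minimal match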

			
import re

# .*? stops at the first N that lets the match succeed
match = re.search(r'PY.*?N', 'PYANBNCNDN')
print(match.group())

# Output:
# PYAN
			
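The sketch below tries each operator from the table on made-up inputs. Note the `*?` case: a lazy quantifier takes as little as possible at the current starting position, but the match still has to succeed, so the result is not always the shortest string overall.

```python
import re

html = "<b>bold</b>"  # illustrative sample text
greedy = re.findall(r"<.+>", html)  # .+ runs to the last >
lazy = re.findall(r"<.+?>", html)   # .+? stops at the first >
print(greedy)  # ['<b>bold</b>']
print(lazy)    # ['<b>', '</b>']

# One example per minimal-matching operator
lazy_results = [
    re.search(r"a*?b", "aaab").group(),     # *?: must still reach b, so 'aaab'
    re.search(r"a+?", "aaaa").group(),      # +?: one a is enough
    re.search(r"ab??", "ab").group(),       # ??: prefers to skip b
    re.search(r"a{2,4}?", "aaaa").group(),  # {m,n}?: stops at m repetitions
]
print(lazy_results)  # ['aaab', 'a', 'a', 'aa']
```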

Source: https://www.cnblogs.com/hpl201314/p/13907159.html
