  • Web Scraping: Data Parsing (2)

from pyquery import PyQuery as pq
html = '''
    <div href="wrap">
        hello nihao
        <ul class="s_from">
            asdasd
            <link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
            <link class='active2' href="http://asda1.com">asdadasdad12312</link>
            <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
        </ul>
    </div>
'''

doc = pq(html)
its=doc("link").items()
for it in its:
    # attr(name, value) overwrites the class attribute
    print("Modified: %s"%it.attr('class','active'))
    # css(name, value) adds an inline style attribute
    print("Added: %s"%it.css('font-size','14px'))

Output:

Modified: <link class="active" href="http://asda.com"><a>asdadasdad12312</a></link>
            
Added: <link class="active" href="http://asda.com" style="font-size: 14px"><a>asdadasdad12312</a></link>
            
Modified: <link class="active" href="http://asda1.com">asdadasdad12312</link>
            
Added: <link class="active" href="http://asda1.com" style="font-size: 14px">asdadasdad12312</link>
            
Modified: <link class="active" href="http://asda2.com">asdadasdad12312</link>
        
Added: <link class="active" href="http://asda2.com" style="font-size: 14px">asdadasdad12312</link>

Both attr and css modify the selected elements in place, which is why the class set by attr is still "active" in the line printed after css.

  • remove

    remove deletes tags from the document

from pyquery import PyQuery as pq
html = '''
    <div href="wrap">
        hello nihao
        <ul class="s_from">
            asdasd
            <link class='active1 a123' href="http://asda.com"><a>asdadasdad12312</a></link>
            <link class='active2' href="http://asda1.com">asdadasdad12312</link>
            <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
        </ul>
    </div>
'''

doc = pq(html)
its=doc("div")
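print('Text before removal:\n%s'%its.text())
# remove('ul') deletes the matching descendants and returns the same selection
it=its.remove('ul')
print('Text after removal:\n%s'%it.text())

Output:

Text before removal:
hello nihao
asdasd
asdadasdad12312
asdadasdad12312
asdadasdad12312
Text after removal:
hello nihao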

For other DOM methods, see:

http://pyquery.readthedocs.io/en/latest/api.html

Pseudo-class selectors

from pyquery import PyQuery as pq
html = '''
    <div href="wrap">
        hello nihao
        <ul class="s_from">
            asdasd
            <link class='active1 a123' href="http://asda.com"><a>helloasdadasdad12312</a></link>
            <link class='active2' href="http://asda1.com">asdadasdad12312</link>
            <link class='movie1' href="http://asda2.com">asdadasdad12312</link>
        </ul>
    </div>
'''

doc = pq(html)
its=doc("link:first-child")
print('First element: %s'%its)
its=doc("link:last-child")
print('Last element: %s'%its)
its=doc("link:nth-child(2)")
print('Second element: %s'%its)
its=doc("link:gt(0)") # index starts at zero
print("Elements after index 0: %s"%its)
its=doc("link:nth-child(2n-1)")
print("Odd-numbered elements: %s"%its)
its=doc("link:contains('hello')")
print("Elements whose text contains hello: %s"%its)

Output:

First element: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
            
Last element: <link class="movie1" href="http://asda2.com">asdadasdad12312</link>
        
Second element: <link class="active2" href="http://asda1.com">asdadasdad12312</link>
            
Elements after index 0: <link class="active2" href="http://asda1.com">asdadasdad12312</link>
            <link class="movie1" href="http://asda2.com">asdadasdad12312</link>
        
Odd-numbered elements: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
            <link class="movie1" href="http://asda2.com">asdadasdad12312</link>
        
Elements whose text contains hello: <link class="active1 a123" href="http://asda.com"><a>helloasdadasdad12312</a></link>
