2 Answers
Contributor: 1824 experience points · 5+ upvotes
Try using Python's standard urllib.request instead of requests. The requests module has problems opening this page:
import urllib.request
from bs4 import BeautifulSoup
url='http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php'
html_content = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html_content, "lxml")
url_course_main='http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php?fb='
url_course=url_course_main+soup.find_all('option')[1].get_text()
html_content_course=urllib.request.urlopen(url_course).read()
soup_course=BeautifulSoup(html_content_course,'lxml')
for j in soup_course.find_all('td'):
    print(j.get_text(strip=True))
It prints:
2019-2020 Yaz Dönemi AKM Kodlu Derslerin Ders Programı
...
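The answer above only fetches the table for the second <option> (index 1). Below is a minimal sketch of walking every course prefix with the same urllib approach, assuming each <option> text is a valid fb value (the original answer does not verify this):

import urllib.request
from bs4 import BeautifulSoup

url = 'http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php'
soup = BeautifulSoup(urllib.request.urlopen(url).read(), 'lxml')

url_course_main = 'http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php?fb='

# skip the first <option> (assumed to be a placeholder) and strip whitespace
for option in soup.find_all('option')[1:]:
    code = option.get_text(strip=True)
    if not code:
        continue
    html = urllib.request.urlopen(url_course_main + code).read()
    soup_course = BeautifulSoup(html, 'lxml')
    print(code, '-', len(soup_course.find_all('td')), 'td cells')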
Contributor: 1816 experience points · 6+ upvotes
The problem is that get_text() returns 'AKM ' with a space at the end, and requests sends the URL with this space - the server can't find a file for 'AKM ' with the trailing space.
I used the > < characters in '>{}<'.format(param) to show this space - it prints >AKM < - because without > < it looks fine.
The code needs get_text(strip=True) or get_text().strip() to remove this space.
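To see the difference in isolation, here is a tiny standalone snippet; the <option> fragment is made up for illustration and is not taken from the site:

from bs4 import BeautifulSoup

snippet = BeautifulSoup('<select><option>AKM </option></select>', 'lxml')
option = snippet.find('option')
print('>{}<'.format(option.get_text()))            # >AKM < - trailing space kept
print('>{}<'.format(option.get_text(strip=True)))  # >AKM<  - whitespace removed

The corrected version of the original code, using get_text(strip=True):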
import requests
from bs4 import BeautifulSoup
url = 'http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php'
html_content = requests.get(url).text
soup = BeautifulSoup(html_content, 'lxml')
url_course_main = 'http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php?fb='
param = soup.find_all('option')[1].get_text()
print('>{}<'.format(param)) # I use `> <` to show spaces
param = soup.find_all('option')[1].get_text(strip=True)
print('>{}<'.format(param)) # I use `> <` to show spaces
url_course = url_course_main + param
html_content_course = requests.get(url_course).text
soup_course = BeautifulSoup(html_content_course, 'lxml')
for j in soup_course.find_all('td'):
    print(j.get_text())
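As a variation that is not part of the original answer, requests can also build the query string itself through its params argument; the value still has to be stripped, because the server expects exactly 'AKM' with no trailing whitespace:

import requests
from bs4 import BeautifulSoup

url = 'http://www.sis.itu.edu.tr/tr/ders_programlari/LSprogramlar/prg.php'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

param = soup.find_all('option')[1].get_text(strip=True)

# let requests encode the query string instead of concatenating it by hand
html_content_course = requests.get(url, params={'fb': param}).text
soup_course = BeautifulSoup(html_content_course, 'lxml')

for j in soup_course.find_all('td'):
    print(j.get_text(strip=True))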