Without using NCBI's REST API, you can render the page with Selenium and pull the links out with BeautifulSoup:
import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
# Open a Firefox browser for scraping (Selenium 4 style; Selenium 3 took executable_path= instead)
browser = webdriver.Firefox(service=Service(r'your\path\geckodriver.exe'))  # Put your own path here
# Load the page in a real browser so that all of its JavaScript runs
browser.get('https://www.ncbi.nlm.nih.gov/ipg/?term=WP_000177210.1')
# Wait before reading the page source so the dynamically fetched data has time to load
time.sleep(3)
# Parse the rendered HTML into a soup
soup = BeautifulSoup(browser.page_source, "html.parser")
# Close the browser once the page source has been captured
browser.quit()
# Collect every href that contains '/nuccore', skipping the bare '/nuccore' link
links = [a['href'] for a in soup.find_all('a', href=True) if '/nuccore' in a['href'] and a['href'] != '/nuccore']
Notes:
You need the selenium package.
You need to install GeckoDriver.
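To see what the link filter at the end of the script keeps and drops, here is a minimal sketch that applies the same condition to a hand-written list of hrefs, so it runs without a browser. The function name `keep_nuccore_links` and the sample hrefs are made up for illustration; they are not taken from the NCBI page.

```python
def keep_nuccore_links(hrefs):
    """Keep hrefs that contain '/nuccore' but are not the bare '/nuccore' link."""
    return [h for h in hrefs if '/nuccore' in h and h != '/nuccore']


# Hypothetical hrefs, for illustration only
sample_hrefs = [
    '/nuccore',                                        # bare database link: dropped
    '/nuccore/EXAMPLE_1',                              # record-style link: kept
    '/protein/WP_000177210.1',                         # not a nuccore link: dropped
    'https://www.ncbi.nlm.nih.gov/nuccore/EXAMPLE_2',  # absolute nuccore link: kept
]

nuccore_links = keep_nuccore_links(sample_hrefs)
print(nuccore_links)
```

In the real script the hrefs come from `soup.find_all('a', href=True)`, so the same function could be called as `keep_nuccore_links(a['href'] for a in soup.find_all('a', href=True))`.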