Using Python to call the PubMed API and quickly organize literature

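After finding articles on PubMed with a keyword search, you may want to collect them all in one go. The API that PubMed provides makes it simple and quick to gather the data in whatever form you need. Here is an implementation in Python: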

# Import the packages we need
import json
import requests
# In Python 3 ElementTree uses the fast C parser automatically (cElementTree was removed in 3.9)
import xml.etree.ElementTree as ET

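# Part 1: get the UIDs (PubMed IDs) that match the search string
# This part uses ESearch, one of NCBI's E-utilities
db = "pubmed"

# The string to search for
query = "Hsuan-Yu Chen"
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
# Passing params lets requests URL-encode the query (e.g. the space in the author name).
# reldate=360 with datetype=pdat keeps only records published in the last 360 days.
params = {"db": db, "term": query, "retmode": "json", "reldate": 360,
          "datetype": "pdat", "retmax": 100, "usehistory": "y"}
resp = requests.get(base + "esearch.fcgi", params=params)  # named resp to avoid shadowing the re module
result = resp.text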
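The search term has to be URL-encoded whatever HTTP library sends it. PubMed queries often contain spaces, square brackets for field tags such as [au], and quotes. Below is a minimal sketch of building the same ESearch URL with only the standard library's urllib.parse.urlencode, which escapes every value. The helper name esearch_url is hypothetical. It is handy for printing or logging the exact URL being sent.

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="pubmed", reldate=360, retmax=100):
    """Hypothetical helper: build an ESearch URL with every value URL-encoded."""
    params = {
        "db": db,
        "term": term,            # spaces become '+', brackets and quotes become %XX
        "retmode": "json",
        "reldate": reldate,      # only items dated within the last `reldate` days...
        "datetype": "pdat",      # ...where the date used is the publication date
        "retmax": retmax,
        "usehistory": "y",
    }
    return BASE + "esearch.fcgi?" + urlencode(params)


print(esearch_url("Hsuan-Yu Chen"))
print(esearch_url("Chen H[au]"))
```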
 

Here is what the JSON returned by the search looks like:

<pre>{
    "header": {
        "type": "esearch",
        "version": "0.3"
    },
    "esearchresult": {
        "count": "14",
        "retmax": "14",
        "retstart": "0",
        "querykey": "1",
        "webenv": "NCID_1_45838404_165.112.9.28_9001_1475480153_920070247_0MetA0_S_MegaStore_F_1",
        "idlist": [
            "27536893",
            "27480787",
            "27449093",
            "27437769",
            "27429846",
            "27384480",
            "27323831",
            "26930648",
            "26904216",
            "26859295",
            "26824984",
            "26655923",
            "26645716",
            "26580398"
        ],
        "translationset": [
            {
                "from": "Hsuan-Yu Chen",
                "to": "Chen, Hsuan Yu[Full Author Name]"
            }
        ],
        "translationstack": [
            {
                "term": "Chen, Hsuan Yu[Full Author Name]",
                "field": "Full Author Name",
                "count": "82",
                "explode": "N"
            },
            {
                "term": "2015/10/09[PDAT]",
                "field": "PDAT",
                "count": "0",
                "explode": "N"
            },
            {
                "term": "2016/10/03[PDAT]",
                "field": "PDAT",
                "count": "0",
                "explode": "N"
            },
            "RANGE",
            "AND"
        ],
        "querytranslation": "Chen, Hsuan Yu[Full Author Name] AND 2015/10/09[PDAT] : 2016/10/03[PDAT]"
    }
}
</pre>

We only need the idlist field. These UIDs are then used to download the original article records.
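Besides idlist, the response carries two other useful pieces of information:

- count: the total number of hits, which can be larger than retmax.
- webenv / querykey: because the request set usehistory=y, this pair points at the result set stored on NCBI's History server.

Below is a minimal offline sketch that parses a trimmed copy of the response shown above, with the ID list shortened to three UIDs. According to the E-utilities documentation, EFetch accepts that pair through its WebEnv and query_key parameters as an alternative to sending the IDs themselves. The variable names here are illustrative only.

```python
import json

# Trimmed copy of the ESearch response shown above (idlist shortened to three UIDs)
sample = """{"header": {"type": "esearch", "version": "0.3"},
 "esearchresult": {"count": "14", "retmax": "14", "retstart": "0", "querykey": "1",
  "webenv": "NCID_1_45838404_165.112.9.28_9001_1475480153_920070247_0MetA0_S_MegaStore_F_1",
  "idlist": ["27536893", "27480787", "27449093"]}}"""

es = json.loads(sample)["esearchresult"]
count = int(es["count"])      # values in this JSON are strings, so convert explicitly
idlist = es["idlist"]         # the UIDs actually returned in this response
webenv, query_key = es["webenv"], es["querykey"]

# Parameters for an EFetch call that refers to the stored result set instead of an id list
efetch_params = {"db": "pubmed", "retmode": "xml",
                 "query_key": query_key, "WebEnv": webenv}
print(count, idlist[:2], efetch_params["query_key"])
```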

# Extract the UID list from the returned text and join it into one comma-separated string
data = json.loads(result)
idlist = data["esearchresult"]["idlist"]
string = ",".join(idlist)  # e.g. "27536893,27480787,..."; also safe when idlist is empty
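NCBI's E-utilities documentation recommends switching to HTTP POST when more than about 200 UIDs go into a single request. With retmax=100 that limit is never reached here. If retmax is raised, one option is to split the ID list into batches and fetch each batch separately. Below is a minimal sketch; the helper id_batches, the batch size of 200 and the generated IDs are assumptions for illustration.

```python
def id_batches(ids, size=200):
    """Hypothetical helper: yield comma-separated UID strings of at most `size` IDs each."""
    for start in range(0, len(ids), size):
        yield ",".join(ids[start:start + size])


# Made-up IDs, used only to show how the list is split
ids = [str(27536893 - i) for i in range(450)]
batches = list(id_batches(ids))
print([len(b.split(",")) for b in batches])  # → [200, 200, 50]
```

Each string in batches can be passed as the id parameter of its own EFetch request.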

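Next, use EFetch from NCBI's E-utilities to retrieve each article's abstract, authors, links and other details.
The data formats PubMed can return are listed below:
[Screenshot: the data formats PubMed can return]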
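The authoritative list of return formats is the EFetch table in NCBI's E-utilities documentation. The sketch below encodes four (rettype, retmode) pairs that table lists for db=pubmed, as far as I know: full XML, MEDLINE text, abstract text and a plain PMID list. Check the documentation before relying on any other values. FORMATS and efetch_url are hypothetical names.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Assumed (rettype, retmode) pairs for db=pubmed; None means "leave rettype out",
# which returns the default full record
FORMATS = {
    "xml": (None, "xml"),
    "medline": ("medline", "text"),
    "abstract": ("abstract", "text"),
    "uilist": ("uilist", "text"),
}


def efetch_url(ids, fmt="xml"):
    """Hypothetical helper: build an EFetch URL for a list of PubMed UIDs."""
    rettype, retmode = FORMATS[fmt]
    params = {"db": "pubmed", "id": ",".join(ids), "retmode": retmode}
    if rettype is not None:
        params["rettype"] = rettype
    return EFETCH + "?" + urlencode(params)


print(efetch_url(["27536893", "27480787"], "abstract"))
```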


# Choose the return format; rettype is left out, so PubMed returns the full XML record
retmode = "xml"
resp = requests.get("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi",
                    params={"db": "pubmed", "retmode": retmode, "id": string})
# Parse the response with Python's built-in XML library (fromstring accepts raw bytes)
tree = ET.fromstring(resp.content)

# Walk through the records one PubmedArticle at a time and take every field from that
# same record, so a missing abstract or author list cannot shift the fields out of step
result = dict()
for article in tree.findall("PubmedArticle"):
    # Extract the article title (itertext() also keeps text inside inline tags such as <i>)
    title_el = article.find("MedlineCitation/Article/ArticleTitle")
    title = "".join(title_el.itertext()) if title_el is not None else ""

    # Extract the abstract: a structured abstract has several AbstractText sections, so join them
    abstract = " ".join("".join(part.itertext()) for part in
                        article.findall("MedlineCitation/Article/Abstract/AbstractText"))

    # Extract the authors as "ForeName LastName"; group authors only have a CollectiveName
    name = []
    for author in article.findall("MedlineCitation/Article/AuthorList/Author"):
        full = " ".join(filter(None, [author.findtext("ForeName"), author.findtext("LastName")]))
        name.append(full or author.findtext("CollectiveName", ""))

    # Extract the DOI from the article's own ArticleIdList (a direct child of PubmedData,
    # so DOIs of cited references under ReferenceList are not picked up)
    doi = "No doi"
    for link in article.findall("PubmedData/ArticleIdList/ArticleId"):
        if link.get("IdType") == "doi":
            doi = link.text
            break

    result[title] = [abstract, name, doi]

# Print all extracted articles in a tidy layout
for item in result:
    print(item, "\n")
    print(", ".join(result[item][1]), "\n")  # authors
    print(result[item][2], "\n")             # DOI
    print(result[item][0], "\n\n\n")         # abstract
    print("-" * 90, "\n\n\n")

Running the script prints output like the following (two of the articles are shown):

Implementation and Quality Control of Lung Cancer EGFR Genetic Testing by MALDI-TOF Mass Spectrometry in Taiwan Clinical Practice.

Kang-Yi Su, Jau-Tsuen Kao, Bing-Ching Ho, Hsuan-Yu Chen, Gee-Cheng Chang, Chao-Chi Ho, Sung-Liang Yu

10.1038/srep30944

Molecular diagnostics in cancer pharmacogenomics is indispensable for making targeted therapy decisions especially in lung cancer. For routine clinical practice, the flexible testing platform and implemented quality system are important for failure rate and turnaround time (TAT) reduction. We established and validated the multiplex EGFR testing by MALDI-TOF MS according to ISO15189 regulation and CLIA recommendation in Taiwan. Totally 8,147 cases from Aug-2011 to Jul-2015 were assayed and statistical characteristics were reported. The intra-run precision of EGFR mutation frequency was CV 2.15% (L858R) and 2.77% (T790M); the inter-run precision was CV 3.50% (L858R) and 2.84% (T790M). Accuracy tests by consensus reference biomaterials showed 100% consistence with datasheet (public database). Both analytical sensitivity and specificity were 100% while taking Sanger sequencing as the gold-standard method for comparison. EGFR mutation frequency of peripheral blood mononuclear cell for reference range determination was 0.002 ± 0.016% (95% CI: 0.000-0.036) (L858R) and 0.292 ± 0.289% (95% CI: 0.000-0.871) (T790M). The average TAT was 4.5 working days and the failure rate was less than 0.1%. In conclusion, this study provides a comprehensive report of lung cancer EGFR mutation detection from platform establishment, method validation to clinical routine practice. It may be a reference model for molecular diagnostics in cancer pharmacogenomics.


Purine-Type Compounds Induce Microtubule Fragmentation and Lung Cancer Cell Death through Interaction with Katanin.

Ting-Chun Kuo, Ling-Wei Li, Szu-Hua Pan, Jim-Min Fang, Jyung-Hurng Liu, Ting-Jen Cheng, Chia-Jen Wang, Pei-Fang Hung, Hsuan-Yu Chen, Tse-Ming Hong, Yuan-Ling Hsu, Chi-Huey Wong, Pan-Chyr Yang

10.1021/acs.jmedchem.6b00797

Microtubule targeting agents (MTAs) constitute a class of drugs for cancer treatment. Despite many MTAs have been proven to significantly improve the treatment outcomes of various malignancies, resistance has usually occurred. By selection from a two million entry chemical library based on the efficacy and safety, we identified purine-type compounds that were active against lung small cell lung cancer (NSCLC). The purine compound 5a (GRC0321) was an MTA with good effects against NSCLC. Lung cancer cells H1975 treated with 5a could induce microtubule fragmentation, leading to G2/M cell cycle arrest and intrinsic apoptosis. Compound 5a directly targeted katanin and regulated the severing activity of katanin, which cut the cellular microtubules into short pieces and activated c-Jun N-terminal kinases (JNK). The microtubule fragmenting effect of 5a is a unique mechanism in MTAs. It might overcome the resistance problems that most of the MTAs have faced.
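The extraction logic can be checked without calling the API. Run it on a small hand-written record that contains the awkward cases:

- inline markup in the title;
- a structured abstract split into several AbstractText sections;
- a group author that has only a CollectiveName;
- a cited reference whose DOI sits under ReferenceList.

Below is a minimal sketch. The record and the helper names (SAMPLE, text_of, parse_article) are hypothetical and only mimic the layout of PubMed's XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written record that mimics the PubmedArticleSet layout
SAMPLE = """<PubmedArticleSet><PubmedArticle>
  <MedlineCitation><Article>
    <ArticleTitle>A <i>test</i> title.</ArticleTitle>
    <Abstract>
      <AbstractText Label="BACKGROUND">First part.</AbstractText>
      <AbstractText Label="RESULTS">Second part.</AbstractText>
    </Abstract>
    <AuthorList>
      <Author><LastName>Chen</LastName><ForeName>Hsuan-Yu</ForeName></Author>
      <Author><CollectiveName>Example Consortium</CollectiveName></Author>
    </AuthorList>
  </Article></MedlineCitation>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">1</ArticleId>
      <ArticleId IdType="doi">10.1000/example</ArticleId>
    </ArticleIdList>
    <ReferenceList><Reference><ArticleIdList>
      <ArticleId IdType="doi">10.1000/cited</ArticleId>
    </ArticleIdList></Reference></ReferenceList>
  </PubmedData>
</PubmedArticle></PubmedArticleSet>"""


def text_of(elem):
    """All text inside an element, including text in inline tags such as <i>."""
    return "".join(elem.itertext()) if elem is not None else ""


def parse_article(article):
    title = text_of(article.find("MedlineCitation/Article/ArticleTitle"))
    abstract = " ".join(text_of(p) for p in
                        article.findall("MedlineCitation/Article/Abstract/AbstractText"))
    authors = []
    for author in article.findall("MedlineCitation/Article/AuthorList/Author"):
        full = " ".join(filter(None, [author.findtext("ForeName"), author.findtext("LastName")]))
        authors.append(full or author.findtext("CollectiveName", ""))
    # Direct children of PubmedData only, so the cited reference's DOI is ignored
    doi = next((i.text for i in article.findall("PubmedData/ArticleIdList/ArticleId")
                if i.get("IdType") == "doi"), "No doi")
    return title, abstract, authors, doi


tree = ET.fromstring(SAMPLE)
for art in tree.findall("PubmedArticle"):
    print(parse_article(art))
```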

