Convert data in Python

Dec 08 2020

With the help of @JaSON, here is code that lets me get the data from a local HTML file into a table. The code uses Selenium:

from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
# each div with id "Section3" marks the start of one record (row)
counter = len(driver.find_elements_by_id("Section3"))
# sibling divs with exactly {0} Section3 divs before them and {1} after them,
# i.e. the divs that belong to one record
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        # the value of each cell is the text of the first <td> inside it
        value = cell.find_element_by_xpath(".//td").text
        print(value)

How can these rows be turned into a valid table that I can export to a CSV file? Here is the local HTML: https://pastebin.com/raw/hEq8K75C

@Paul Brennan: After changing counter to counter-1 to skip the error on row 18 for now, I got 17 rows and the file filename.txt. Here is a snapshot of the output.

Answers

1 PaulBrennan Dec 08 2020 at 19:56

I have modified your code to produce a simple output. It is not very Pythonic, because it does not build the DataFrame in a vectorized way, but here is how it works. First, import pandas; second, create an empty DataFrame (we don't know the columns yet); then set the columns on the first pass (this will cause problems if rows have different numbers of columns); finally, put the values into the DataFrame.

import pandas as pd
from selenium import webdriver

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
print(counter)

df = pd.DataFrame()  # empty frame; the columns are not known yet

for i in range(counter):
    print('\nRow #{} \n'.format(i + 1))
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    if i == 0:
        # first pass: one numbered column per cell found in the first row
        df = pd.DataFrame(columns=range(len(cells)))
    for j, cell in enumerate(cells):
        value = cell.find_element_by_xpath(".//td").text
        if value:  # only store non-empty strings
            df.at[i, j] = value  # put the value in row i, column j

df.to_csv('filename.txt')  # output the dataframe to a file

This could be improved by putting the items of each row into a dictionary and then putting those into the DataFrame, but I'm writing this on my phone, so I can't test it.
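The one-dictionary-per-row idea could look roughly like the sketch below. It is an assumption-laden illustration, not the answerer's code: it swaps pandas for the standard library's csv.DictWriter so it runs on its own, and the sample values and the colN column names are hypothetical placeholders for the cells the Selenium loop collects.

```python
import csv
import io

# Hypothetical cell values standing in for what the Selenium loop collects
# from each row (the real column names are not known here).
scraped_rows = [
    ["Alice", "30", "Cairo"],
    ["Bob", "25"],  # a shorter row, e.g. one that stopped early
]

# One dict per row, keyed by a hypothetical column name.
records = [
    {f"col{j}": value for j, value in enumerate(cells)}
    for cells in scraped_rows
]

# Column order: every key seen, in first-seen order.
fieldnames = []
for record in records:
    for key in record:
        if key not in fieldnames:
            fieldnames.append(key)

out = io.StringIO()
# restval="" fills in columns that a shorter row does not have
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(records)
csv_text = out.getvalue()
print(csv_text)
```

With pandas, pd.DataFrame(records) accepts the same list of dicts, lines the keys up as columns and fills missing ones with NaN; DataFrame.to_csv then writes the file.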

YasserKhalil Dec 11 2020 at 11:32

With a lot of help from @Paul Brennan, I was able to modify the code to get the final desired output:

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome("C:/chromedriver.exe")
driver.get('file:///C:/Users/Future/Desktop/local.html')
counter = len(driver.find_elements_by_id("Section3"))
xpath = "//div[@id='Section3']/following-sibling::div[count(preceding-sibling::div[@id='Section3'])={0} and count(following-sibling::div[@id='Section3'])={1}]"
finallist = []

for i in range(counter):
    rowlist = []
    _xpath = xpath.format(i + 1, counter - (i + 1))
    cells = driver.find_elements_by_xpath(_xpath)
    for cell in cells:
        try:
            value = cell.find_element_by_xpath(".//td").text
            rowlist.append(value)
        except NoSuchElementException:
            break  # a cell without a <td> ends this row
    finallist.append(rowlist)

df = pd.DataFrame(finallist)
# reorder the columns by position into the desired layout
df = df[df.columns[[2, 0, 1, 7, 9, 8, 3, 5, 6, 4]]]
df.to_csv('output.csv', index=False)  # hypothetical output file name

The code now works well, but it is very slow. Is there any way to make it faster?
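A likely cause of the slowness is that every find_element* call is a separate request from Python to chromedriver, and the count(...) XPath makes the browser rescan the sibling divs for every row. One possible speed-up, sketched here under assumptions, is to read the HTML once (for a local file, straight from disk, or via driver.page_source) and group the cells in a single pass with the standard library's html.parser. The layout assumed below (data divs are siblings of the div#Section3 markers, and each value is the text of the first td inside a data div) is inferred from the XPath in the question, not checked against the pastebin file; SectionRowParser, parse_rows, rows_to_csv and the sample HTML are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SectionRowParser(HTMLParser):
    """Each div#Section3 starts a new row; every later sibling <div>
    contributes the text of its first <td>. Assumed (hypothetical) layout,
    and it assumes well-formed HTML with explicit end tags."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one list of cell values per Section3 marker
        self.depth = 0            # current element nesting depth
        self.marker_depth = None  # depth of the Section3 divs and their siblings
        self.cell_depth = None    # depth of the data <div> we are inside, if any
        self.got_td = False       # first <td> of the current data div already seen
        self.in_td = False        # currently collecting text of that first <td>
        self.buf = []             # text fragments of the current <td>

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        if tag == "div" and dict(attrs).get("id") == "Section3":
            self.marker_depth = self.depth
            self.rows.append([])
        elif (tag == "div" and self.cell_depth is None
              and self.depth == self.marker_depth):
            self.cell_depth = self.depth  # a sibling data div begins
            self.got_td = False
        elif tag == "td" and self.cell_depth is not None and not self.got_td:
            self.got_td = True
            self.in_td = True
            self.buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag == "td" and self.in_td:
            self.rows[-1].append("".join(self.buf).strip())
            self.in_td = False
        if tag == "div" and self.depth == self.cell_depth:
            self.cell_depth = None
        self.depth -= 1
        if self.marker_depth is not None and self.depth < self.marker_depth - 1:
            self.marker_depth = None  # the markers' parent closed: no more siblings

    def handle_data(self, data):
        if self.in_td:
            self.buf.append(data)


def parse_rows(html):
    """Return a list of rows (lists of cell strings) from the HTML text."""
    parser = SectionRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text; shorter rows just have fewer fields."""
    out = io.StringIO()
    csv.writer(out).writerows(rows)
    return out.getvalue()


# Hypothetical sample with the assumed layout.
SAMPLE_HTML = """
<body>
  <div id="Section3">Record 1</div>
  <div><table><tr><td>Alice</td><td>ignored</td></tr></table></div>
  <div><table><tr><td> 30 </td></tr></table></div>
  <div id="Section3">Record 2</div>
  <div><table><tr><td>Bob</td></tr></table></div>
  <div><p>no table in this div</p></div>
</body>
"""

print(parse_rows(SAMPLE_HTML))  # [['Alice', '30'], ['Bob']]
```

With the real page this might be called as parse_rows(driver.page_source), or on the file's text read with open(..., encoding='utf-8') (the encoding is an assumption), which avoids starting a browser at all; the resulting list of lists can also go straight into pd.DataFrame. One behavioral difference from the Selenium loop: a data div without a td is skipped here instead of ending the row.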