Extract the values inside a tag with BeautifulSoup

Asked 1 month ago
Viewed 17 times

[<td>#</td>, <td>Preview</td>, <td>Item</td>, <td>Price</td>, <td>QTY</td>, <td>Total</td>, <td>1</td>, <td><img alt="item1" src="x.jpg"/></td>, <td>
    <span class="title">xx6969</span>
    <p>item1</p>
    </td>, <td>
                                            Dollar $                                        1.15                                    </td>, <td><span>5</span></td>, <td>
                                            Dollar $                                        5.75                                    </td>, <td>2</td>, <td><img alt="item2" src="itemx.jpg"/></td>, <td>
    <span class="title">xx3131</span>
    <p>itemx2</p>
    </td>, <td>
                                            Dollar $                                        1.49                                    </td>, <td><span>5</span></td>]

The numbers inside these tags are the ones I want to extract:

<td><span></span></td>

I want the output to be like this:

5

5

Thanks a lot, I hope y'all are having a great day.

Correct Answer

I'd use a regex. I turned your markup into a plain string, since I'm not sure how you are reading it.

Code:

import re

text = '''
    <span class="title">xx6969</span>
    <p>item1</p>
    </td>, <td>
                                            Dollar $                                        1.15                                    </td>, <td><span>5</span></td>, <td>
                                            Dollar $                                        5.75                                    </td>, <td>2</td>, <td><img alt="item2" src="itemx.jpg"/></td>, <td>
    <span class="title">xx3131</span>
    <p>itemx2</p>
    </td>, <td>
                                            Dollar $                                        1.49                                    </td>, <td><span>5</span></td>'''

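# Capture the digits of a bare <span> that directly follows <td>;
# \d+ also matches quantities with more than one digit
find_this = re.findall(r'<td><span>(\d+)</span>', text)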

print("\n\n".join(find_this))

Output:

5

5

[Program finished]
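
The pattern above only matches when `<td>` and `<span>` sit right next to each other with no whitespace in between. If the real page has line breaks or padding there, a slightly more tolerant pattern may help. This is only a sketch: the sample string and the value 12 are made up for illustration, and a regex will still break on markup it was not written for.

```python
import re

# Made-up sample: one compact cell, one plain cell, one titled span,
# and one quantity cell with padding and a two-digit value
html = (
    '<td><span>5</span></td>, <td>2</td>, <td>\n'
    '    <span class="title">xx3131</span></td>, '
    '<td>\n    <span> 12 </span>\n</td>'
)

# \s* tolerates whitespace around the tags; \d+ allows multi-digit numbers.
# Spans with attributes (e.g. class="title") do not match the literal <span>.
pattern = re.compile(r'<td>\s*<span>\s*(\d+)\s*</span>\s*</td>')

quantities = pattern.findall(html)
print("\n\n".join(quantities))
```

Only the two attribute-free spans are captured; the plain `<td>2</td>` cell and the `class="title"` span are skipped.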
answered 1 month ago

Other Answer

Note: The question needs some improvement, so take a minute to read [ask].

Assuming your example is a ResultSet from a BeautifulSoup selection, you have to iterate over it:

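# soup is assumed to be the BeautifulSoup object built from your HTML
for e in soup.select('td'):
    # keep only cells whose first <span> has no class attribute
    if e.span and not e.span.get('class'):
        print(e.span.text)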

Output:

5
5

A better approach would be to select the elements more specifically, but that requires more information about the HTML or its source.
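
As a rough idea of what a more specific selection could look like, here is a sketch that uses a CSS selector to pick only the `<span>` elements that are direct children of a `<td>` and carry no `class` attribute. The table markup is reconstructed from the question and is an assumption, since the real page structure is unknown.

```python
from bs4 import BeautifulSoup

# Assumed markup, rebuilt from the cells shown in the question
html = """
<table>
  <tr>
    <td>1</td><td><img alt="item1" src="x.jpg"/></td>
    <td><span class="title">xx6969</span><p>item1</p></td>
    <td>Dollar $ 1.15</td><td><span>5</span></td><td>Dollar $ 5.75</td>
  </tr>
  <tr>
    <td>2</td><td><img alt="item2" src="itemx.jpg"/></td>
    <td><span class="title">xx3131</span><p>itemx2</p></td>
    <td>Dollar $ 1.49</td><td><span>5</span></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# td > span       : spans that are direct children of a cell
# :not([class])   : skip spans that have a class, such as class="title"
quantities = [
    span.get_text(strip=True)
    for span in soup.select("td > span:not([class])")
]

print("\n".join(quantities))
```

If the quantity column always sits at a fixed position in the row, selecting by position could be more robust, but that depends on the actual table.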

answered 1 month ago