Python - Crawling


개발 일기92 2025. 2. 23. 17:34

Examples using Python's requests and BeautifulSoup

# Crawl a website and extract its headings

import requests
from bs4 import BeautifulSoup

# Define the target URL
url = "https://example.com"

# Send an HTTP request to the website
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content using BeautifulSoup
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract the text of every heading (h1, h2, h3)
    for heading in soup.find_all(["h1", "h2", "h3"]):
        print(heading.get_text(strip=True))
else:
    print(f"Failed to retrieve page: {response.status_code}")

Summary

requests.get(url) → fetches the web page.
BeautifulSoup(response.text, "html.parser") → parses the HTML.
find_all(["h1", "h2", "h3"]) → extracts the headings from the page (tag names must be lowercase).
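
To try the parsing step without touching the network, the sketch below runs the same `find_all` call on an inline HTML string. The `SAMPLE_HTML` string and the `extract_headings` helper are hypothetical names made up for this illustration; only `BeautifulSoup`, `find_all`, and `get_text` come from the example above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, used only for this illustration.
SAMPLE_HTML = """
<html><body>
  <h1>Main Title</h1>
  <p>Intro text</p>
  <h2> Section A </h2>
  <h3>Detail</h3>
  <h4>Not collected</h4>
</body></html>
"""


def extract_headings(html: str) -> list[tuple[str, str]]:
    """Return (tag name, text) pairs for every h1-h3 element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (tag.name, tag.get_text(strip=True))
        for tag in soup.find_all(["h1", "h2", "h3"])
    ]


if __name__ == "__main__":
    for name, text in extract_headings(SAMPLE_HTML):
        print(f"{name}: {text}")
```

Keeping the parsing in a function that takes a plain string makes it easy to check the selector logic first and plug in `response.text` afterwards.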

# Pagination
# When you need to crawl multiple pages (e.g., blogs, e-commerce sites), loop over the pages

for page in range(1, 6):  # Crawl first 5 pages
    url = f"https://example.com/page/{page}"
    response = requests.get(url)
    
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        articles = soup.find_all("h2", class_="post-title")
        
        for article in articles:
            print(article.get_text(strip=True))
    else:
        print(f"Failed to retrieve page {page}: {response.status_code}")
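
As a variation on the loop above, the sketch below separates fetching from parsing, stops at the first page that cannot be retrieved, and pauses between requests. The names `extract_post_titles`, `crawl_pages`, `fetch`, and `delay` are assumptions introduced for this illustration, as is the stop-on-first-failure policy; the `/page/{page}` URL pattern and the `post-title` class are taken from the example above.

```python
import time
from typing import Callable, Optional

from bs4 import BeautifulSoup


def extract_post_titles(html: str) -> list[str]:
    """Return the text of every <h2 class="post-title"> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        h2.get_text(strip=True)
        for h2 in soup.find_all("h2", class_="post-title")
    ]


def crawl_pages(
    fetch: Callable[[str], Optional[str]],
    base_url: str,
    max_pages: int = 5,
    delay: float = 1.0,
) -> list[str]:
    """Collect post titles from pages 1..max_pages.

    `fetch` is a caller-supplied function (hypothetical interface) that
    returns the page HTML, or None when the page could not be retrieved.
    """
    titles: list[str] = []
    for page in range(1, max_pages + 1):
        html = fetch(f"{base_url}/page/{page}")
        if html is None:
            # Assumed policy: treat the first missing page as the end.
            break
        titles.extend(extract_post_titles(html))
        time.sleep(delay)  # Pause between requests to avoid hammering the site.
    return titles
```

With requests, `fetch` could be a small wrapper that calls `requests.get(url, timeout=10)` and returns `response.text` when `response.status_code == 200` and `None` otherwise. Passing the fetcher in as an argument also lets the loop be exercised with canned HTML instead of live requests.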
