How to use curl in Python

How to use curl in Python
Posted on
Jun 22, 2022

Almost every person who uses the Internet works with cURL invisibly on a daily basis.

Because of its flexibility and freeness, cURL is widely used everywhere: from cars and TVs to routers and printers.

What is cURL?

CURL is a tool used to transfer data to and from a web server, and to make various types of data requests over different data protocols:

  1. HTTP and HTTPS. Designated to transfer text data between the client and the server. The main difference between them is that HTTPS has encryption of the transmitted data.
  2. FTP, FTPS, and SFTP. Designated to transfer files over a network. FTPS is a secure file transfer protocol that uses SSL/TLS technologies to encrypt its communication channels. SFTP is a protocol that transfers files using SSH technology.
  3. IMAP and IMAPS (IMAP over SSL) - application layer protocol for email access.
  4. POP3 and POP3S (POP3 over SSL) - protocol for receiving email messages.
  5. SMB - application layer network protocol for remote access to files, printers, and other network resources.
  6. SCP - protocol for copying files between computers using encrypted SSH as transport.
  7. TELNET - network protocol for remote access to a computer using a command interpreter.
  8. GOPHER - network protocol for distributed search and transfer of documents.
  9. LDAP and LDAPS (LDAP over SSL) - protocol used to authenticate directory services.
  10. SMTP and SMTPS (SMTP over SSL) - network protocol for e-mail transfer.

Also, cURL supports HTTPS authentication, HTTP post, FTP upload, proxy, cookies, and username + passwords.

CURL is a cross-platform command-line utility, so it can be used on any operating system. To check if cURL is installed, go to the cmd (command line) and type curl -V:

C:\Users\Admin>curl -V
curl 7.79.1 (Windows) libcurl/7.79.1 Schannel
Release-Date: 2021-09-22
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets

C:\Users\Admin>

For example, to get HTML code one can write using command-line tool:

curl example.com

The result:

C:\Users\Admin>curl example.com
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;

    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>

C:\Users\Admin>

To write a web page scraper on cURL one can use our API which helps to scrape pages. Just fill in the required fields, and then use as needed: either run from the site or paste the code into the program.

Requests Builder
Requests Builder

However, as a rule, without such tools, cURL is not enough because it necessary as a part of the Python program. Therefore, an API to the cURL functionality, libCURL library, was created. For Python, there is a wrapper for libCURL called pyCURL. So, pyCURL is a curl to Python.

How to Use cURL in Python

To begin with, it is worth saying that there are many services that can translate cURL commands into code. This option is suitable for those who already have experience in writing commands, but who need program code.

However, it is worth noting that such services usually use the standard library for requests on Python - Requests library. And, although this approach is somewhat limiting, it may suit some.

Another way is to use PycURL and write program code oneself.

PycURL Library

As was mentioned, PycURL being a thin wrapper above libcURL inherits all the features of libcURL. For example, PycURL is extremely fast (much faster than Requests, which is a Python library for HTTP requests), has multiprotocol support, and also contains sockets to support network operations. Moreover, PycURl supports the same data protocols as a cURL.

To install PycURL go to cmd and write the next command:

pip install pycurl

If there are problems with installation by command pip install, one can go to the official PycURL site, where the latest version of installation files are located.

GET request

The simplest code sample of using PycURL is getting data with a GET request. To do this, it will be necessary to connect one more module - BytesIO (a stream that uses a buffer of bytes in memory).

import pycurl 
from io import BytesIO

After that, one needs to declare the objects used:

b_obj = BytesIO()
crl = pycurl.Curl()

And set the URL:

crl.setopt(crl.URL, 'https://example.com/get)

Then open transfer, get data, and display it:

# To write bytes using charset utf 8 encoding
crl.setopt(crl.WRITEDATA, b_obj) 
# Start transfer 
crl.perform() 
# End curl session 
crl.close() 
# Get the content stored in the BytesIO object (in byte characters) 
get_body = b_obj.getvalue() 
# Decode the bytes and print the result 
print('Output of GET request:\n%s' % get_body.decode('utf8'))

Remember, that JSON data will be displayed. If there are errors during the upload process, the response code will be returned. For example, status code 404 means the page wasn't found.

POST request

POST Request allows sending data to the server. It's just like GET is an HTTP request. There are two different ways to send data using the POST method: sending text data and sending a file.

Firstly, import Pycurl and urllib for encoding and declare the objects used:

import pycurl
from urllib.parse import urlencode

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://example.com/post')

Then set HTTP request method to POST and data to send in the request body:

post_data = {'field': 'value'}
postfields = urlencode(post_data)
crl.setopt(crl.POSTFIELDS, postfields)

And the last step - perform POST:

crl.perform()
crl.close()

Sending data from the physical file is similar:

import pycurl

crl = pycurl.Curl()
crl.setopt(crl.URL, 'https://example.com/post')

crl.setopt(crl.HTTPPOST,[('fileupload',(clr.FORM_FILE, __file__, )),])

clr.perform()
clr.close()

If the file data is in memory, one can use BUFFER/BUFFERPTR in his code:

clr.setopt(clr.HTTPPOST, [('fileupload', (clr.FORM_BUFFER, 'readme.txt',
        clr.FORM_BUFFERPTR, 'This is a readme file',  )), ])

Another code will be the same.

PUT request

A PUT request is like a POST request. Their difference is that PUT can be used to upload a file in the body of the request. At the same time, PUT can be used both to create and overwrite a file at a given address. When using PUT with PycURL, it is important to remember that the file must be open at the time of transfer.

This part is similar to the POST method:

import pycurl

clr = pycurl.Curl()
clr.setopt(clr.URL, 'https://example.com/put')

Then one needs to open and read the file:

clr.setopt(clr.UPLOAD, 1)
file = open('body.json')
clr.setopt(clr.READDATA, file)

After that transferring data can be started:

clr.perform()
clr.close()

And only at the end, the file can be closed:

file.close()

DELETE request

And the last example is an HTTP DELETE request. It sends a request to delete the target resource to the server:

import pycurl 

crl = pycurl.Curl() 
crl.setopt(crl.URL, "http://example.com/items/item34") crl.setopt(crl.CUSTOMREQUEST, "DELETE")

crl.perform() 
crl.close()

Downloading files

Sometimes it happens that the received data need to be written to a file. For this, one can use the same code as for transferring data from a file, with one exception - function setopt uses not READDATA, but WRITEDATA:

crl.setopt(crl.WRITEDATA, file)

Using Proxy in PycURL

CURL proxy is a curl utility key that allows one to send an HTTP request indirectly, through a proxy server. In other words, this is an indispensable thing for web scraping.

The proxy setting is relevant for parsing a large amount of data. By sending hundreds of requests per minute from a single IP address, there is a chance of getting blocked.

At the server level, protection is activated to prevent DoS attacks. Using different cURL proxies will solve this problem and allow to scrape data without the risk of blocking.

To use proxy one needs to install certifi library:

pip install certifi

But in most cases users already have it because it is a built-in library:

C:\Users\Admin>pip install certifi
Requirement already satisfied: certifi in c:\users\admin\appdata\local\programs\python\python310\lib\site-packages (2022.5.18.1)

C:\Users\Admin>

To use it import certifi in the project:

import certifi
Python 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import certifi
>>> _

To use it in the program first of all import all libraries:

import pycurl
from io import BytesIO
import certifi

And then set proxy:

def set_proxy(self, proxy):
     if proxy:
         logger.debug('PROXY SETTING PROXY %s', proxy)
         self.get_con.setopt(pycurl.PROXY, proxy)
         self.post_con.setopt(pycurl.PROXY, proxy)
         self.put_con.setopt(pycurl.PROXY, proxy)   

Conclusion and Takeaways

CURL is a handy query utility that supports most transfer protocols. The LibcURL API was created for its use in one's own programs. And for use in Python was created a thin wrapper above libcURL, which is called PycURL.

With the help of this library, it is possible to use all requests and work with all protocols supported by cURL. At the same time, the PycURL library is much faster than its Python analog, the Requests library.

Tired of getting blocked while scraping the web?

Try out Web Scraping API with proxy rotation, CAPTCHA bypass, and Javascript rendering.

  • 1,000 Free API Credits
  • No Credit Card Required
  • 30-Day Trial
Try now for free

Get structured data in the format you need!

We offer customized web scraping solutions that can provide any data you need, on time and with no hassle!

  • Regular, custom data delivery
  • Pay after you receive sample dataset
  • A range of output formats
Get a Quote
Valentina Skakun
Valentina Skakun
Valentina Skakun’s profile

I'm a technical writer who believes that data parsing can help in getting and analyzing data. I'll tell about what parsing is and how to use it.

Request a Quote

Tell us more about you and your project information.