How to Download a File from a URL in Python: A Detailed Example

November 16, 2021, 18:12:23

This article shows how to build a robust file downloader with a progress bar in Python, using the requests and tqdm libraries.

Downloading files from the Internet is one of the most common everyday tasks on the Web, and many successful applications let their users do exactly that.

In this tutorial, you will learn how to download files over HTTP in Python using the requests library.

Related: How to use hashing algorithms in Python with hashlib.

Let's get started by installing the required dependencies:

pip3 install requests tqdm

We use the tqdm module here only to print a nice progress bar while the download is running.

Open a new Python file and import the following modules:

from tqdm import tqdm
import requests
import cgi  # note: deprecated since Python 3.11 and removed in Python 3.13
import sys

We take the file URL from the command-line arguments:

# the URL of the file to download, passed as a command-line argument
url = sys.argv[1]

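The function we use to download content from the web is requests.get(). By default, however, it downloads the whole response body immediately. We do not want that, because a large file would be held entirely in memory before anything is written to disk. Fortunately, we can avoid this by setting the stream parameter to True: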

# read 1024 bytes every time 
buffer_size = 1024
# download the body of response by chunk, not immediately
response = requests.get(url, stream=True)
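
The request above does not check whether the server actually returned the file: on a 404 or 500 response, the error page would be saved under the file's name. A minimal sketch of a more defensive request follows. The helper name open_stream, the session parameter, and the 10-second timeout are assumptions for illustration and are not part of the original script.

```python
def open_stream(url, session=None, timeout=10):
    """Start a streamed GET request and fail early on HTTP error statuses.

    `session` is a hypothetical hook: any object with a requests-style
    .get() method (for example a requests.Session). When it is omitted,
    the requests module itself is used.
    """
    if session is None:
        import requests  # imported lazily so a stand-in session works without it
        session = requests
    # stream=True defers the body download; timeout (an assumed value)
    # limits how long we wait for the server to send data
    response = session.get(url, stream=True, timeout=timeout)
    # raise an exception for 4xx/5xx responses instead of saving an error page
    response.raise_for_status()
    return response
```

In the script, response = open_stream(url) could take the place of the plain requests.get() call.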

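At this point only the response headers have been downloaded and the connection stays open, which lets us control the workflow with the iter_content() method. Before we see it in action, we first need to retrieve the total file size and the file name: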

# get the total file size
file_size = int(response.headers.get("Content-Length", 0))
# get the default filename
default_filename = url.split("/")[-1]
# get the content disposition header
content_disposition = response.headers.get("Content-Disposition")
if content_disposition:
    # parse the header using cgi
    value, params = cgi.parse_header(content_disposition)
    # extract filename from content disposition
    filename = params.get("filename", default_filename)
else:
    # if Content-Disposition is not available, just use the default from the URL
    filename = default_filename

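We read the file size (in bytes) from the Content-Length response header. The file name comes from the Content-Disposition header, which we have to parse with the cgi.parse_header() function. Note that the cgi module was deprecated in Python 3.11 and removed in Python 3.13, so this part of the script requires an older interpreter.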
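
Since cgi is not available on newer Python versions, the same lookup can be sketched with the standard library's email.message.Message, which parses header parameters in a similar way. The helper names below (filename_from_content_disposition, filename_from_url, choose_filename) and the "download" fallback are assumptions for illustration. The sketch also strips the query string from the URL, which url.split("/")[-1] would keep as part of the file name.

```python
import os
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    if not header_value:
        return None
    message = Message()
    message["Content-Disposition"] = header_value
    # get_filename() reads the "filename" parameter of Content-Disposition
    return message.get_filename()


def filename_from_url(url, fallback="download"):
    """Return the last path component of the URL, ignoring the query string."""
    path = urlsplit(url).path
    name = os.path.basename(unquote(path))
    # URLs such as https://example.com/ have no usable last component
    return name or fallback


def choose_filename(url, content_disposition=None):
    """Prefer the server-supplied name, otherwise derive one from the URL."""
    name = filename_from_content_disposition(content_disposition)
    if name:
        # keep only the last component so a server-supplied name
        # cannot point outside the current directory
        name = os.path.basename(name)
    return name or filename_from_url(url)
```

With the names from the script, filename = choose_filename(url, response.headers.get("Content-Disposition")) would cover both branches of the if/else block.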

Now let's download the file.

# progress bar, changing the unit to bytes instead of iteration (default by tqdm)
progress = tqdm(response.iter_content(buffer_size), f"Downloading {filename}", total=file_size, unit="B", unit_scale=True, unit_divisor=1024)
with open(filename, "wb") as f:
    for data in progress.iterable:
        # write data read to the file
        f.write(data)
        # update the progress bar manually
        progress.update(len(data))

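The iter_content() method iterates over the response data, which avoids reading a large response into memory all at once. We pass buffer_size as the number of bytes it should read into memory on each iteration.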

We then wrap the iterator in a tqdm object, which prints a nice progress bar. We also change tqdm's default unit from iterations to bytes.

After that, on each iteration we read a chunk of data, write it to the open file, and update the progress bar.
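
The loop above can also be wrapped in a small helper that counts the bytes written and compares the total with Content-Length. This is a sketch under assumptions: save_chunks, expected_size, and on_chunk are hypothetical names. The size check only makes sense when the server sends an uncompressed body, because requests decodes gzip or deflate content while Content-Length describes the encoded size.

```python
def save_chunks(chunks, path, expected_size=0, on_chunk=None):
    """Write an iterable of byte chunks to `path` and return the byte count.

    expected_size: size announced by the server (0 means unknown, no check).
    on_chunk: optional callback that receives the length of each chunk,
              for example the update method of a progress bar.
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            written += len(chunk)
            if on_chunk is not None:
                on_chunk(len(chunk))
    # a mismatch usually means the connection dropped before the end
    if expected_size and written != expected_size:
        raise OSError(
            f"incomplete download: expected {expected_size} bytes, got {written}"
        )
    return written
```

With the names from the script, a call could look like save_chunks(progress.iterable, filename, file_size, progress.update).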

Here is the result after I tried downloading a file. You can pick any file you like; just make sure the URL ends with a file extension (.exe, .pdf, etc.):

C:\file-downloader>python download.py https://download.virtualbox.org/virtualbox/6.1.18/VirtualBox-6.1.18-142142-Win.exe
Downloading VirtualBox-6.1.18-142142-Win.exe:   8%|██▍                             | 7.84M/103M [00:06<01:14, 1.35MB/s]

It works!

That's it. As you can see, downloading files in Python is quite easy with a powerful library such as requests, and you can now use this in your own Python applications. Good luck!

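Here are some ideas you can implement:

Download all images from a web page.
A Python script that downloads compressed archive files from the Internet and extracts them automatically.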
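
For the second idea, the standard library already covers the extraction step. A minimal sketch follows, assuming the archive has already been downloaded to disk and comes from a source you trust, since archives can contain entries with unexpected paths. extract_archive is a hypothetical helper name.

```python
import shutil
from pathlib import Path


def extract_archive(archive_path, target_dir):
    """Unpack a downloaded archive and return the extracted file paths.

    The format (.zip, .tar, .tar.gz, ...) is inferred from the file
    extension by shutil.unpack_archive.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(archive_path), str(target))
    # list the extracted files relative to the target directory
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )
```

Combined with the downloader, the script could call extract_archive(filename, "extracted") once the download loop has finished.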
