Line endings and encoding in Python `print`
This one is going to be seared into my memory.
I use my own shell (though the shell doesn’t matter here). I essentially had Python code that looked like this:
import requests
response = requests.get("https://myapi.com")
print(response.text)
The raw response here was a CSV file with DOS line endings (\r\n).
However, when I redirected the output of this code to a file, the file had duplicate carriage returns; the lines ended in \r\r\n.
I didn’t get the same issue when I was on WSL.
I have since been enlightened: the sys.stdout object that print writes to by default is opened in text mode, with the system’s default encoding and line endings.
On Windows, the newline attribute for sys.stdout is None.
From the documentation for class io.TextIOWrapper, this means that:
When writing output to the stream, if newline is None, any '\n' characters written are translated to the system default line separator, os.linesep.
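Apparently, this is not smart: it is a straight-up find/replace on the '\n' characters. So the '\n' at the end of each \r\n in my CSV was replaced with \r\n, which is how I ended up with \r\r\n.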
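Here is a minimal sketch that reproduces the effect on any platform. It assumes an in-memory io.BytesIO stands in for the redirected file, and it passes newline="\r\n" explicitly to mimic what newline=None does on a system where os.linesep is "\r\n". The helper name write_through_text_layer is mine, not something from the standard library.

```python
import io
from typing import Optional


def write_through_text_layer(text: str, newline: Optional[str]) -> bytes:
    """Write text through a TextIOWrapper and return the bytes it produced."""
    raw = io.BytesIO()
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", newline=newline)
    wrapper.write(text)
    wrapper.flush()  # push the encoded text down into the BytesIO
    return raw.getvalue()


csv_text = "a,b\r\nc,d\r\n"

# Every "\n" is replaced with "\r\n", including the one already preceded by "\r".
print(write_through_text_layer(csv_text, "\r\n"))  # b'a,b\r\r\nc,d\r\r\n'

# newline="" turns the translation off, so the text passes through untouched.
print(write_through_text_layer(csv_text, ""))  # b'a,b\r\nc,d\r\n'
```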
On Python 3.7 and later, you can change the encoding and newline parameters using the sys.stdout.reconfigure() method.
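As a rough sketch of what that looks like, the snippet below calls reconfigure(newline="") on a text stream so that '\n' is no longer translated. It again assumes an in-memory stream created with newline="\r\n" as a stand-in for a Windows sys.stdout, and the function name disable_newline_translation is just for illustration.

```python
import io


def disable_newline_translation(stream: io.TextIOWrapper) -> None:
    """Stop the stream from rewriting '\\n' when text is written to it."""
    # newline="" means "write line endings exactly as given".
    stream.reconfigure(newline="")


raw = io.BytesIO()
# Stand-in for a stdout that would translate "\n" to "\r\n".
stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")

disable_newline_translation(stream)
print("a,b\r\nc,d\r\n", end="", file=stream)
stream.flush()

result = raw.getvalue()
print(result)  # b'a,b\r\nc,d\r\n'
```

In a real script the equivalent would be calling sys.stdout.reconfigure(newline="") once before the first print. That only works while sys.stdout is still a TextIOWrapper; some test runners and IDEs replace it with an object that has no reconfigure() method.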
To avoid all of this and dump out the literal bytes from your response, do
import sys
import requests
response = requests.get("https://myapi.com")
sys.stdout.buffer.write(response.content)
This is almost always what I want.
I was shocked by how little was returned in searches/AI prompts for “Duplicate carriage returns in Python print”. Hopefully now this gets sucked up into the training data and helps someone else.