Decoding error receiving messages

I’ve written a chat bot and it had been working fine until recently. I’m not sure what has changed in the messages, but I keep getting decode errors when receiving them. The bot is written in Python, and this is the line that fails:
data = socket.recv(2048).decode('UTF-8')
The errors I’m getting are:
'utf-8' codec can't decode byte _ in position _: unexpected end of data / invalid start byte
I only started getting these recently; the bot worked perfectly fine in December, when I was playing around with Twitch IRC bots.
Did something change in the messages so that not everything is UTF-8 anymore?
Any help would be appreciated. Thank you.

We haven’t changed any encoding in the last 2 years, as far as I know, and UTF-8 should be able to represent every character (I think). Can you print out the raw lines as something like UTF-32 to see what’s up?

I wonder if the issue is caused by getting incomplete output from the IRC server, but I’m not quite sure.
Here is the output when joining a channel, with the raw lines decoded as UTF-32:

error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
error: 'utf-32-le' codec can't decode bytes in position 0-3: code point not in range(0x110000)
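A likely reason every line fails under UTF-32: UTF-32 reads each group of 4 bytes as a single code point, and the server sends plain ASCII text, so four characters such as :tmi packed together (little-endian) give a value far above the highest valid code point, U+10FFFF. A minimal sketch reproducing this, using a sample line taken from the channel join log:

```python
# Bytes as they arrive from the socket: plain ASCII, one byte per character.
raw = b":tmi.twitch.tv 001 sato_chat :Welcome, GLHF!\r\n"

# UTF-32-LE groups bytes four at a time: b":tmi" -> 0x696D743A,
# which is larger than the highest valid code point, 0x10FFFF.
print(hex(int.from_bytes(raw[:4], "little")))  # 0x696d743a

try:
    raw.decode("utf-32-le")
except UnicodeDecodeError as exc:
    print(exc)  # code point not in range(0x110000)

# The same bytes decode cleanly as UTF-8, since ASCII is a subset of UTF-8.
print(raw.decode("utf-8"))
```

So the UTF-32 errors say nothing about the stream itself; UTF-8 is the right codec for these bytes.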

Here is the same channel join decoded as UTF-8:

:tmi.twitch.tv 001 sato_chat :Welcome, GLHF!
:tmi.twitch.tv 002 sato_chat :Your host is tmi.twitch.tv
:tmi.twitch.tv 003 sato_chat :This server is rather new
:tmi.twitch.tv 004 sato_chat :-
:tmi.twitch.tv 375 sato_chat :-
:tmi.twitch.tv 372 sato_chat :You are in a maze of twisty passages, all alike.
:tmi.twitch.tv 376 sato_chat :>
:tmi.twitch.tv CAP * ACK :twitch.tv/membership
:tmi.twitch.tv CAP * ACK :twitch.tv/tags
:tmi.twitch.tv CAP * ACK :twitch.tv/commands
:sato_chat!sato_chat@sato_chat.tmi.twitch.tv JOIN #dansgaming
@badges=;color=;display-name=sato_chat;emote-sets=0;mod=0;subscriber=0;user-type= :tmi.twitch.tv USERSTATE #dansgaming
@broadcaster-lang=;emote-only=0;followers-only=20;r9k=0;room-id=7236692;slow=5;subs-only=0 :tmi.twitch.tv ROOMSTATE #dansgaming
:sato_chat.tmi.twitch.tv 353 sato_chat = #dansgaming :ghentbot gibbed 9steven ascothero miturner moobot fur3x dansgaming analyticsbot
:sato_chat.tmi.twitch.tv 353 sato_chat = #dansgaming :sato_chat
:sato_chat.tmi.twitch.tv 366 sato_ch
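The last line of that log stops mid-message (366 sato_ch), which fits the idea that recv() returns whatever bytes happen to be available rather than whole IRC lines. IRC messages end with CR LF, so one common approach, different from the incremental decoder suggested later in the thread, is to buffer raw bytes and decode only complete lines. A minimal sketch; split_lines is a hypothetical helper, and the chunks are hard-coded sample data instead of real recv() results:

```python
def split_lines(buffer: bytes, chunk: bytes) -> tuple[list[str], bytes]:
    """Hypothetical helper: add a recv() chunk to the buffer and return
    (complete decoded lines, leftover bytes).

    IRC lines end with b"\r\n". Anything after the last b"\r\n" is an
    incomplete line and is kept for the next call.
    """
    buffer += chunk
    *complete, leftover = buffer.split(b"\r\n")
    # The bytes \r and \n never occur inside a multi-byte UTF-8 sequence,
    # so a complete line cannot end in the middle of a character.
    return [line.decode("utf-8") for line in complete], leftover


# Simulated recv() results; the PING line is split across two reads.
pending = b""
for piece in [b":tmi.twitch.tv 376 sato_chat :>\r\nPING :tmi.tw", b"itch.tv\r\n"]:
    lines, pending = split_lines(pending, piece)
    for line in lines:
        print(line)
# prints ":tmi.twitch.tv 376 sato_chat :>" then "PING :tmi.twitch.tv"
```

This assumes the server always terminates lines with CR LF; any bytes still in pending when the connection closes belong to an unfinished line.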

Can you provide the raw binary bytes off the socket?
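One way to capture the raw bytes is to log the result of recv() before decoding it. repr() of a bytes object shows non-ASCII bytes as \x.. escapes, so a partial multi-byte character at the end of a chunk becomes visible. A minimal sketch; log_raw is a hypothetical helper, the sample message is made up, and socket.socketpair() stands in for the real Twitch connection so it runs offline:

```python
import socket


def log_raw(raw: bytes, tail: int = 32) -> str:
    """Hypothetical helper: describe a recv() result without decoding it."""
    return f"{len(raw)} bytes, ends with {raw[-tail:]!r}"


# socketpair() returns two connected sockets, standing in for the IRC server
# and the bot so the example needs no network connection.
server, bot = socket.socketpair()
with server, bot:
    server.sendall(":tmi.twitch.tv 001 sato_chat :Welcome, GLHF! ©\r\n".encode("utf-8"))
    raw = bot.recv(2048)
    print(log_raw(raw))
```

If the length printed is exactly 2048 and the tail ends in a byte such as \xc2, the chunk was most likely cut in the middle of a character.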

I’ll also add that I suspect you received exactly 2048 bytes, and that the last few bytes are a partial code point.
As an example, © (U+00A9) is C2 A9 in UTF-8. What if the final byte of what you just received is C2?
b'\xc2'.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 0: unexpected end of data
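This failure can be reproduced without a socket. The sketch below builds a hypothetical byte stream in which © (C2 A9 in UTF-8) straddles the 2048-byte boundary, then slices it the way two recv(2048) calls might return it. Each half fails to decode on its own, matching the "unexpected end of data" and "invalid start" errors from the question, while the joined bytes decode fine:

```python
# Hypothetical stream: 2047 ASCII bytes, then "©" (C2 A9 in UTF-8), then CR LF.
stream = b"a" * 2047 + "©".encode("utf-8") + b"\r\n"

# Slice it the way two recv(2048) calls could return it.
first, second = stream[:2048], stream[2048:]
print(first[-1:], second[:1])  # b'\xc2' b'\xa9'

# Decoding each chunk on its own fails on both sides of the split.
for chunk in (first, second):
    try:
        chunk.decode("utf-8")
    except UnicodeDecodeError as exc:
        print(exc.reason)  # "unexpected end of data", then "invalid start byte"

# Decoding the joined bytes works.
print(repr((first + second).decode("utf-8")[-3:]))  # '©\r\n'
```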

You need to use something like an incremental decoder from the codecs module, so it can join one socket read to the next and hold on to a partial code point that is split between recv calls.
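A minimal sketch of that approach, using codecs.getincrementaldecoder from the Python standard library. The two chunks are hypothetical recv() results with © split between them:

```python
import codecs

# An incremental UTF-8 decoder buffers an incomplete multi-byte sequence
# between calls instead of raising an error.
decoder = codecs.getincrementaldecoder("utf-8")()

# Two hypothetical recv() results with "©" (C2 A9) split between them.
chunks = [b"PRIVMSG #dansgaming :hello \xc2", b"\xa9\r\n"]

text = ""
for chunk in chunks:
    # Returns only the characters that are complete so far; the trailing
    # \xc2 from the first chunk is held until \xa9 arrives.
    text += decoder.decode(chunk)

print(repr(text))  # 'PRIVMSG #dansgaming :hello ©\r\n'
```

In the bot, the idea would be to create the decoder once, before the receive loop, and replace the failing line with data = decoder.decode(socket.recv(2048)). Creating a new decoder on every call would discard the buffered partial bytes. When the connection closes, decoder.decode(b"", final=True) raises an error if a truncated sequence is still pending.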

The incremental decoder fixed everything when I ran tests yesterday. Thank you so much.