$data_structure = utf8::is_utf8($json)
    ? from_json($json)
    : decode_json($json);
taken, together with the is_character suggestion, from an otherwise very informative post: Quick note on using module JSON. I have seen similar code in many places. The idea is to check whether the string you have is character data or a string of bytes, and to treat it accordingly. Unfortunately, is_utf8 does not perform that check:
use strict;
use warnings;
use utf8;
use HTML::Entities;
use JSON;
my $a = HTML::Entities::decode( '&nbsp;' );
my $json = qq{{ "a": "$a" }};
print 'is_utf8: ' . ( utf8::is_utf8( $json ) ? 'yes' : 'no' ) . "\n";
my $data_structure = utf8::is_utf8( $json )
    ? from_json( $json )
    : decode_json( $json );
This fails (on my machine) with the following output:
is_utf8: no
malformed UTF-8 character in JSON string, at character offset 8 (before "\x{8a0}" }") at a.pl line 12.
If that is still a mystery, try this:
use strict;
use warnings;
use HTML::Entities;
use Devel::Peek;
Dump( HTML::Entities::decode( '&nbsp;' ) );
The output (on my machine) is:
SV = PV(0x24f2090) at 0x24f3de8
REFCNT = 1
FLAGS = (TEMP,POK,pPOK)
PV = 0x2501620 "\240"\0
CUR = 1
LEN = 16
This string is internally encoded as "\240", i.e. "\x{a0}", which is the Latin-1 encoding of a non-breaking space. It does not have the utf8 flag set, so the code above treats it as a UTF-8 encoded stream of bytes and fails.
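To make the point about the flag concrete, here is a minimal sketch using only core Perl. The variable names $native and $upgraded are mine, for illustration. It stores the same one-character string in both internal representations and shows that is_utf8 tells them apart even though Perl considers the two strings equal:

```perl
use strict;
use warnings;

# "\xA0" is U+00A0 (non-breaking space), stored one byte per character.
my $native = "\xA0";

# Copy it and switch only the internal storage to UTF-8.
# utf8::upgrade changes the representation, not the characters.
my $upgraded = $native;
utf8::upgrade($upgraded);

print 'native flag:   ', ( utf8::is_utf8($native)   ? 'yes' : 'no' ), "\n";
print 'upgraded flag: ', ( utf8::is_utf8($upgraded) ? 'yes' : 'no' ), "\n";
print 'equal strings: ', ( $native eq $upgraded     ? 'yes' : 'no' ), "\n";
print 'lengths:       ', length($native), ' and ', length($upgraded), "\n";
```

Both variables hold the same single character, so the flag says something about how Perl stores the string and nothing about whether the string is meant as characters or as bytes.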
I don't know if we can have is_character easily, but the lack of introspection here is surely painful.
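In the meantime, one possible workaround (my assumption, not something from the post quoted above) is to skip the guessing and pick the decoder based on what you know about where the string came from. The sketch below builds the same JSON text once as characters and once as UTF-8 bytes, using the core Encode module, and decodes each with the matching function. The names $chars, $bytes, $from_chars and $from_bytes are illustrative only:

```perl
use strict;
use warnings;
use Encode qw(encode);
use JSON;

# JSON text as a character string containing U+00A0.
my $chars = qq{{ "a": "\xA0" }};

# The same text as UTF-8 encoded bytes, e.g. as read from a socket or file.
my $bytes = encode( 'UTF-8', $chars );

# We know $chars holds characters, so use from_json.
my $from_chars = from_json($chars);

# We know $bytes holds UTF-8 octets, so use decode_json.
my $from_bytes = decode_json($bytes);

print 'same value: ', ( $from_chars->{a} eq $from_bytes->{a} ? 'yes' : 'no' ), "\n";
```

The choice between from_json and decode_json is made by the programmer at the point where the data enters the program, not by inspecting the scalar.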