[dw-free] For XMLRPC, clear UTF8 flag from all strings, not just subject and event
[commit: http://hg.dwscoalition.org/dw-free/rev/c1e5d7213dfc]
http://bugs.dwscoalition.org/show_bug.cgi?id=2310
Before, we were only checking utf8 encoding for subject and event. It's
better to handle all utf8 strings from the request properly, using the
Encode module.
Codemerge from LiveJournal; prepared for Dreamwidth by
denise.
Files modified:
- cgi-bin/Apache/LiveJournal.pm
--------------------------------------------------------------------------------
diff -r 99ed26f60316 -r c1e5d7213dfc cgi-bin/Apache/LiveJournal.pm
--- a/cgi-bin/Apache/LiveJournal.pm  Wed Jun 30 02:12:57 2010 +0800
+++ b/cgi-bin/Apache/LiveJournal.pm  Tue Jun 29 14:37:20 2010 -0500
@@ -1804,6 +1804,7 @@ sub anti_squatter
 }
 
 package LJ::Protocol;
+use Encode();
 
 sub xmlrpc_method {
     my $method = shift;
@@ -1818,10 +1819,9 @@ sub xmlrpc_method {
     }
     my $error = 0;
     if (ref $req eq "HASH") {
-        foreach my $key ('subject', 'event') {
-            # get rid of the UTF8 flag in scalars
-            $req->{$key} = LJ::no_utf8_flag ( $req->{$key} )
-                if $req->{$key};
+        # get rid of the UTF8 flag in scalars
+        while ( my ($k, $v) = each %$req ) {
+            $req->{$k} = Encode::encode_utf8($v) if Encode::is_utf8($v);
         }
     }
     my $res = LJ::Protocol::do_request($method, $req, \$error);
--------------------------------------------------------------------------------
no subject
That said, I think you missed the part where
no subject
The thing I was missing is that encode_utf8 chains to encode, which turns the SVf_UTF8 flag off. It was also throwing me off the scent to see that we added an is_utf8 check -- confusing me, because that changes the semantics of the loop. Before, it was 'unconditionally turn this off'; now it's 'turn this off if the data is already valid utf8'.
If the user submits non-utf8 data, it seems like the behavior of this loop is going to change. This may not actually affect us, though, because we might be doing checks for valid input earlier on or later... but still, it just looks like we're changing how it operates.
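For anyone trying to follow the flag behaviour discussed above, here is a minimal sketch of what the new loop does to two kinds of input. The helper name clear_utf8_flags, the sample keys, and the sample values are made up for illustration; only the loop body comes from the commit. As far as the Encode documentation goes, is_utf8 without a CHECK argument only reports whether the internal UTF8 flag is on, so the guard appears to mean "encode only strings Perl holds as characters" rather than "encode only valid utf8". What LJ::no_utf8_flag did with the same inputs is not shown in this changeset, so the sketch makes no claim about the old behaviour.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper wrapping the loop from the commit; the name is
# an assumption, the loop body is copied from the diff.
sub clear_utf8_flags {
    my ($req) = @_;
    while ( my ( $k, $v ) = each %$req ) {
        $req->{$k} = Encode::encode_utf8($v) if Encode::is_utf8($v);
    }
    return $req;
}

# A character string with the UTF8 flag forced on.
my $flagged = "caf\x{e9}";
utf8::upgrade($flagged);

# The same text as already-encoded UTF-8 bytes, flag off.
my $bytes = "caf\xc3\xa9";

# Sample request; 'mood' is just an illustrative second key.
my $req = clear_utf8_flags( { subject => $flagged, mood => $bytes } );

for my $k ( sort keys %$req ) {
    printf "%-8s flag=%s length=%d\n", $k,
        ( Encode::is_utf8( $req->{$k} ) ? "on" : "off" ),
        length( $req->{$k} );
}

# Without the is_utf8 guard, raw bytes would be encoded a second time:
# each high byte is treated as a character and becomes two bytes.
my $double = Encode::encode_utf8($bytes);
printf "unguarded encode of raw bytes: length=%d\n", length($double);
```

Both values come out as five-byte strings with the flag off, while the unguarded call on the raw bytes grows to seven bytes. On that reading, the guard is what keeps unflagged input from being double-encoded, which is one plausible reason for the check.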
I didn't miss that part -- and I totally agree. But this seems like a fairly random swap from one method of accomplishing the utf8 flag removal to another method, with an added bonus of being odd code ('each' instead of iterating over 'keys'?), not following the style guide, and apparently being semantically different (with the is_utf8 check thrown in).
I fully admit that I only spent about 20 minutes looking at it. But that's all the time I had, at which point I threw an exception and asked for help to try to understand what this is doing.
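On the 'each' versus 'keys' point, a small sketch of one practical difference, using a made-up hash rather than the real request. Perl keeps a single iterator per hash, shared by each, keys, and values. A while/each loop picks up wherever that iterator was left, whereas keys resets it first. Whether $req can ever reach xmlrpc_method with a half-advanced iterator is not something this changeset shows, so treat this as a general caution and not a known bug.

```perl
use strict;
use warnings;

# Illustrative hash only; not the real XML-RPC request.
my %req = ( subject => 's', event => 'e', mood => 'm' );

# Simulate earlier code that called each once and stopped,
# leaving the hash's iterator one entry in.
my ($first_key) = each %req;

# The while/each loop resumes from the leftover iterator position.
my $seen_each = 0;
while ( my ( $k, $v ) = each %req ) {
    $seen_each++;
}

# keys resets the iterator and returns every key.
my $seen_keys = 0;
foreach my $k ( keys %req ) {
    $seen_keys++;
}

print "each loop visited $seen_each of 3 entries\n";
print "keys loop visited $seen_keys of 3 entries\n";
```

Assigning to existing keys inside either loop is fine, because the set of keys does not change. The difference only shows up when the iterator was left part-way through the hash beforehand.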