String variations in Python - what do they mean?


0

This is a complete noob question, but why do some strings in Python appear as:

{u'foobar': u'bar'}

while others appear as:

{'foobar': 'bar'}

Are they equivalent? How do you convert between the two?

2012-04-03 23:08
by 9-bits
As a note, all strings are Unicode strings in Python 3.x - Gareth Latty 2012-04-03 23:13
Also watch this 25 min talk by Ned Batchelder about Unicode. He explains the Unicode vs String differences very well. http://nedbatchelder.com/text/unipain.htm - Praveen Gollakota 2012-04-03 23:54


3

The u prefix means the string is a Unicode string.

http://docs.python.org/reference/lexical_analysis.html

Refer to section 2.4.1:

A prefix of 'u' or 'U' makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.

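As you can see, Python 2 compares str and unicode values automatically. It does this by implicitly decoding the byte string as ASCII, so the same ASCII text compares equal whichever prefix it was written with (a byte string containing non-ASCII bytes will not compare equal this way):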

>>> a = u'Hello'
>>> b = 'Hello'
>>> c = ur'Hello'
>>> a == b
True
>>> b == c
True

You can learn more about Unicode strings in Python (as well as how to convert or encode strings) by referring to the documentation.
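
To make the conversion concrete, here is a minimal sketch (mine, not taken from the linked documentation) of going between the two kinds of string with encode and decode. UTF-8 is an assumption made for the example, and the variable names are made up; use whatever encoding your data is actually in.

```python
# A text string: type unicode in Python 2, str in Python 3.
text = u'caf\xe9'

# Text -> bytes: name the encoding explicitly (UTF-8 is assumed here).
data = text.encode('utf-8')

# Bytes -> text: decode with the same encoding that produced the bytes.
roundtrip = data.decode('utf-8')

print(len(text), len(data))   # 4 characters, 5 bytes
print(roundtrip == text)      # True
```

The byte string is one element longer because the single character é takes two bytes in UTF-8.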

2012-04-03 23:13
by Mike Christensen


3

No, they are not equivalent.

The "u" that prefixes the string means it's Unicode. Unicode was designed to be an extended character set to accommodate languages that aren't English. You can read this entertaining and non-technical history of Unicode.

http://www.reigndesign.com/blog/love-hotels-and-unicode/

As Lattyware mentions, in Python 3.x, all strings are Unicode.

If you're working with Python 2.x, especially for the web, it's worth making sure that your program handles Unicode properly. Lots of people like to gripe about websites that don't support Unicode.
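
One common way to "handle Unicode properly" is to decode bytes into text as soon as they enter the program and encode back to bytes only when writing output. The sketch below illustrates that idea; to_text and to_bytes are hypothetical helper names, and UTF-8 is an assumed default rather than something the web or Python guarantees.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a text string, decoding it first if it is bytes."""
    if isinstance(value, bytes):   # bytes is an alias for str in Python 2
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string, encoding it first if it is text."""
    if isinstance(value, bytes):
        return value
    return value.encode(encoding)


incoming = b'na\xc3\xafve'          # bytes as they might arrive from a request
name = to_text(incoming)            # work with text inside the program
outgoing = to_bytes(name.upper())   # encode again when sending the response
```

Keeping everything in the middle as text means operations such as upper() act on whole characters rather than on individual bytes.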

2012-04-03 23:19
by Josh Infiesto


2

Writing u'string' creates a string of type unicode rather than str (in Python 2).

>>> type('hi')
<type 'str'>
>>> type(u'hi')
<type 'unicode'>

You can read all about it in the Unicode documentation page.

2012-04-03 23:13
by veiset