unicode_literals in Python

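Unicode is also known as the Universal Character Set. ASCII uses 7 bits per character (normally stored in one 8-bit byte), which allows only 128 (2^7) distinct codes. The problem with ASCII is that it covers little more than English: if we want to use another language such as Hindi, Russian or Chinese, or emoji, there is not enough room in ASCII for them. This is where Unicode comes in. Unicode gives us one huge table that holds the ASCII table as well as other languages, symbols and emoji.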
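As a small illustration of the table idea, the built-in ord() returns the number (code point) that Unicode assigns to a character. This is only a sketch written for python3, and the helper name describe is made up for this example.

```python
# Every character has a Unicode code point; ASCII only covers 0..127.
def describe(ch):
    """Return (code point written as U+XXXX, whether it fits in ASCII)."""
    code_point = ord(ch)
    return "U+{:04X}".format(code_point), code_point < 128


# "A", o with stroke, umbrella (written as escapes to keep the source ASCII)
for ch in ["A", "\u00f8", "\u2602"]:
    print(describe(ch))
```

The first character falls inside the ASCII range, the other two do not, which is why they need Unicode.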

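We cannot save text directly as Unicode, because Unicode is only an abstract representation of text that assigns a number (a code point) to each character. We need an encoding to map each code point to bytes. If a character needs more than 1 byte (8 bits), all of those bytes have to be packed together as one unit (think of a box holding several items). UTF-8 and UTF-16 are two such packing methods. In UTF-8 a character takes at least 8 bits, and in UTF-16 a character takes at least 16 bits. UTF is just an algorithm for turning Unicode into bytes and reading it back.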
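The packing described above can be observed by encoding a few characters and counting the bytes. This is a minimal python3 sketch. The sample characters and the helper name encoded_lengths are chosen only for illustration.

```python
# Compare how many bytes the same character needs in UTF-8 and UTF-16.
# Samples: "A", o with stroke, umbrella, and one emoji.
samples = ["A", "\u00f8", "\u2602", "\U0001F600"]


def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # "-le" avoids the 2-byte byte-order mark
    # Decoding with the same codec gives back the original text.
    assert utf8.decode("utf-8") == ch
    assert utf16.decode("utf-16-le") == ch
    return len(utf8), len(utf16)


for ch in samples:
    print("U+{:04X}".format(ord(ch)), encoded_lengths(ch))
```

UTF-8 starts at one byte and grows as needed, while UTF-16 never uses fewer than two bytes per character.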

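In python2, all string literals are treated as byte strings by default, whereas in python3 all string literals are Unicode strings by default. So, to make all string literals in a python2 module Unicode, we use the following import: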

from __future__ import unicode_literals


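If we are using python2, we need to import unicode_literals from the __future__ module. The import must appear at the top of the file. It makes the string literals in that file behave as they do in python3, which helps keep the code compatible across Python versions.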
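One way to see when this behaviour became available and when it became the default is to inspect the feature record that the standard __future__ module keeps for unicode_literals. The sketch below only reads that record. The variable names are illustrative.

```python
import __future__

feature = __future__.unicode_literals

# Release in which the import first became usable: (major, minor, ...).
optional_in = feature.getOptionalRelease()
# Release in which the behaviour became the default for every module.
mandatory_in = feature.getMandatoryRelease()

print("available since:", optional_in[:2])
print("default since:", mandatory_in[:2])
```

On python3 the import itself is still accepted but changes nothing, so a file that starts with it can run on both versions.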

Python2

Python

import sys
 
# check the default string encoding
print "The default encoding for python2 is:", sys.getdefaultencoding()

Output:

The default encoding for python2 is: ascii

Because the default encoding in python2 is ASCII, we need to encode the string as utf-8 explicitly before printing it.

Python

from __future__ import unicode_literals
 
# variables holding the letters
# of the word "Python"
p = "\u2119"
y = "\u01b4"
t = "\u2602"
h = "\u210c"
o = "\u00f8"
n = "\u1f24"
 
# print the word, encoding the
# unicode string to utf-8 bytes
print((p + y + t + h + o + n).encode("utf-8"))

Output:

ℙƴ☂ℌøἤ
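To see why the explicit .encode("utf-8") call matters, the following sketch tries to encode the same word with both codecs. It is written for python3 so that it can be run today, and the helper name try_encode is made up for this example.

```python
# The same six characters as in the example above.
text = "\u2119\u01b4\u2602\u210c\u00f8\u1f24"


def try_encode(value, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        return None


print(try_encode(text, "ascii"))  # ASCII has no codes for these characters
print(try_encode(text, "utf-8"))  # UTF-8 can represent all of them
```

The ASCII attempt fails because none of the six characters fall inside the ASCII range, while UTF-8 encodes them using two or three bytes each.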

Python3:

Python3

# In python3
# By default the encoding is "utf-8"
import sys
 
# printing the default encoding
print("The default encoding for python3 is:", sys.getdefaultencoding())
 
# the u"...." prefix is optional in python3:
# every string literal is already unicode
p = u"\u2119"
y = u"\u01b4"
t = u"\u2602"
h = u"\u210c"
o = u"\u00f8"
n = u"\u1f24"
 
# printing Python
print(p+y+t+h+o+n)

Output:

The default encoding for python3 is: utf-8
ℙƴ☂ℌøἤ
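A quick check, assuming python3, that the u prefix makes no difference to the resulting string. The variable names are illustrative only.

```python
plain = "\u2119"      # ordinary python3 literal
prefixed = u"\u2119"  # same literal with the optional u prefix

same_value = plain == prefixed
same_type = type(plain) is type(prefixed) is str

print(same_value, same_type)
```

Both checks are expected to hold on python3, so the prefix can be kept in code that also has to run on python2.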

Here,

Sr. no.	Unicode	Description
1.	U+2119	it will display the double-struck capital P.
2.	U+01B4	it will display the Latin small letter Y with a hook.
3.	U+2602	it will display an umbrella.
4.	U+210C	it will display the black-letter capital H.
5.	U+00F8	it will display the Latin small letter O with a stroke.
6.	U+1F24	it will display the Greek small letter eta with psili and oxia.
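The descriptions in the table can be looked up with the standard unicodedata module. This is a short python3 sketch, and the helper name code_point_names is made up for this example.

```python
import unicodedata

# The six characters used in the examples above.
word = "\u2119\u01b4\u2602\u210c\u00f8\u1f24"


def code_point_names(text):
    """Return a list of (U+XXXX, official Unicode name) pairs."""
    return [("U+{:04X}".format(ord(ch)), unicodedata.name(ch)) for ch in text]


for code_point, name in code_point_names(word):
    print(code_point, name)
```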