http://blog.codingmylife.com/?p=23
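Today, while trying to scrape the home page of Qidian (起点中文网), I ran into a problem: if you use the wrong encoding, you cannot read anything at all. Call it a bad habit from using C# too much, since I had hardly ever thought about encodings before. In C#, even with the wrong encoding you still get something back, just a screen full of garbled characters. Cocoa is harsher: the read fails outright, and the method returns nil and reports an NSError. Below is the first version of my code, which downloads the Qidian home page into a string.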
// First attempt: assume the page is UTF-8 (this is the version that fails).
// `result` is an NSMutableString declared elsewhere in the program.
NSURL *url = [NSURL URLWithString:@"http://www.cmfu.com"];
NSError *error = nil;
NSString *xml = [NSString stringWithContentsOfURL:url encoding:NSUTF8StringEncoding error:&error];
if (xml == nil)
{
    NSLog(@"Error reading URL: %@", [error localizedDescription]);
}
else
{
    [result setString:xml];
}
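No matter what I tried, the download failed, and the error message just said the encoding was wrong. Fine, so I opened the documentation and looked at all the encodings it lists: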
enum {
    NSASCIIStringEncoding = 1,
    NSNEXTSTEPStringEncoding = 2,
    NSJapaneseEUCStringEncoding = 3,
    NSUTF8StringEncoding = 4,
    NSISOLatin1StringEncoding = 5,
    NSSymbolStringEncoding = 6,
    NSNonLossyASCIIStringEncoding = 7,
    NSShiftJISStringEncoding = 8,
    NSISOLatin2StringEncoding = 9,
    NSUnicodeStringEncoding = 10,
    NSWindowsCP1251StringEncoding = 11,
    NSWindowsCP1252StringEncoding = 12,
    NSWindowsCP1253StringEncoding = 13,
    NSWindowsCP1254StringEncoding = 14,
    NSWindowsCP1250StringEncoding = 15,
    NSISO2022JPStringEncoding = 21,
    NSMacOSRomanStringEncoding = 30,
    NSProprietaryStringEncoding = 65536
};
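I tried them one by one, and not a single one worked! I was ready to give up. In this day and age, could Cocoa really not support Chinese? Impossible. My guess was that the documentation only lists the most commonly used encodings (most common by Apple's reckoning, which evidently pays China little attention; shame on them!), so I wrote the following code to print every supported encoding: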
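// Walk the zero-terminated array of encodings this system supports and
// print each one's localized name and numeric value.
// `result` is an NSMutableString declared elsewhere in the program.
const NSStringEncoding *encodings = [NSString availableStringEncodings];
NSMutableString *str = [[NSMutableString alloc] init];
NSStringEncoding encoding;
while ((encoding = *encodings++) != 0)
{
    // %i prints the value as a signed 32-bit int, so encodings with the
    // high bit set show up as negative numbers in the list.
    [str appendFormat:@"%@ === %i\n", [NSString localizedNameOfStringEncoding:encoding], (int)encoding];
}
[result setString:str];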
Sure enough, I had guessed right. Here is the complete list of supported encodings:
Western (Mac OS Roman) === 30
Japanese (Mac OS) === -2147483647
Traditional Chinese (Mac OS) === -2147483646
Korean (Mac OS) === -2147483645
Arabic (Mac OS) === -2147483644
Hebrew (Mac OS) === -2147483643
Greek (Mac OS) === -2147483642
Cyrillic (Mac OS) === -2147483641
Devanagari (Mac OS) === -2147483639
Gurmukhi (Mac OS) === -2147483638
Gujarati (Mac OS) === -2147483637
Thai (Mac OS) === -2147483627
Simplified Chinese (Mac OS) === -2147483623
Tibetan (Mac OS) === -2147483622
Central European (Mac OS) === -2147483619
Symbol (Mac OS) === 6
Dingbats (Mac OS) === -2147483614
Turkish (Mac OS) === -2147483613
Croatian (Mac OS) === -2147483612
Icelandic (Mac OS) === -2147483611
Romanian (Mac OS) === -2147483610
Celtic (Mac OS) === -2147483609
Gaelic (Mac OS) === -2147483608
Keyboard Symbols (Mac OS) === -2147483607
Farsi (Mac OS) === -2147483508
Cyrillic (Mac OS Ukrainian) === -2147483496
Inuit (Mac OS) === -2147483412
Unicode (UTF-32LE) === -1677721344
Unicode (UTF-8) === 4
Unicode (UTF-16) === 10
Unicode (UTF-16BE) === -1879047936
Unicode (UTF-16LE) === -1811939072
Unicode (UTF-32) === -1946156800
Unicode (UTF-32BE) === -1744830208
Western (ISO Latin 1) === 5
Central European (ISO Latin 2) === 9
Western (ISO Latin 3) === -2147483133
Central European (ISO Latin 4) === -2147483132
Cyrillic (ISO 8859-5) === -2147483131
Arabic (ISO 8859-6) === -2147483130
Greek (ISO 8859-7) === -2147483129
Hebrew (ISO 8859-8) === -2147483128
Turkish (ISO Latin 5) === -2147483127
Nordic (ISO Latin 6) === -2147483126
Thai (ISO 8859-11) === -2147483125
Baltic Rim (ISO Latin 7) === -2147483123
Celtic (ISO Latin 8) === -2147483122
Western (ISO Latin 9) === -2147483121
Romanian (ISO Latin 10) === -2147483120
Latin-US (DOS) === -2147482624
Greek (DOS) === -2147482619
Baltic Rim (DOS) === -2147482618
Western (DOS Latin 1) === -2147482608
Greek (DOS Greek 1) === -2147482607
Central European (DOS Latin 2) === -2147482606
Cyrillic (DOS) === -2147482605
Turkish (DOS) === -2147482604
Portuguese (DOS) === -2147482603
Icelandic (DOS) === -2147482602
Hebrew (DOS) === -2147482601
Canadian French (DOS) === -2147482600
Arabic (DOS) === -2147482599
Nordic (DOS) === -2147482598
Cyrillic (DOS) === -2147482597
Greek (DOS Greek 2) === -2147482596
Thai (Windows, DOS) === -2147482595
Japanese (Windows, DOS) === 8
Simplified Chinese (Windows, DOS) === -2147482591
Korean (Windows, DOS) === -2147482590
Traditional Chinese (Windows, DOS) === -2147482589
Western (Windows Latin 1) === 12
Central European (Windows Latin 2) === 15
Cyrillic (Windows) === 11
Greek (Windows) === 13
Turkish (Windows Latin 5) === 14
Hebrew (Windows) === -2147482363
Arabic (Windows) === -2147482362
Baltic Rim (Windows) === -2147482361
Vietnamese (Windows) === -2147482360
Western (ASCII) === 1
Japanese (Shift JIS X0213) === -2147482072
Chinese (GBK) === -2147482063
Chinese (GB 18030) === -2147482062
Japanese (ISO 2022-JP) === 21
Korean (ISO 2022-KR) === -2147481536
Japanese (EUC) === 3
Simplified Chinese (EUC) === -2147481296
Traditional Chinese (EUC) === -2147481295
Korean (EUC) === -2147481280
Japanese (Shift JIS) === -2147481087
Cyrillic (KOI8-R) === -2147481086
Traditional Chinese (Big 5) === -2147481085
Western (Mac Mail) === -2147481084
Simplified Chinese (HZ GB 2312) === -2147481083
Traditional Chinese (Big 5 HKSCS) === -2147481082
Ukrainian (KOI8-U) === -2147481080
Traditional Chinese (Big 5-E) === -2147481079
Western (NextStep) === 2
Non-lossy ASCII === 7
Western (EBCDIC Latin 1) === -2147480574
Finally, the familiar GBK encoding, listed as "Chinese (GBK)" with the value -2147482063. OK, time to change the original code:
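// Same request, but decoding the page as GBK.
// `result` is an NSMutableString declared elsewhere in the program.
NSURL *url = [NSURL URLWithString:@"http://www.cmfu.com"];
NSError *error = nil;
// "Chinese (GBK)" from the list above, printed there as -2147482063 (0x80000631).
// Obtaining it from Core Foundation avoids a negative literal, which would be
// sign-extended to a different value on 64-bit, where NSStringEncoding is unsigned long.
NSStringEncoding gbkEncoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGBK_95);
NSString *xml = [NSString stringWithContentsOfURL:url encoding:gbkEncoding error:&error];
if (xml == nil)
{
    NSLog(@"Error reading URL: %@", [error localizedDescription]);
}
else
{
    [result setString:xml];
}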
Finally it works! Seeing familiar Chinese characters again was a real thrill.
posted under Cocoa
One Comment to
“Reading Files in Any Encoding.”
On August 21st, 2007 at 10:05 am
Glider Says:
It seems much simpler with the CoreFoundation framework: just call CFStringCreateWithBytes() and pass kCFStringEncodingGB_18030_2000 as the encoding argument.