19th International Unicode Conference, San Jose, September 2001
A composite approach to language/encoding detection
Shanjian Li (shanjian@netscape.com)
Katsuhiko Momoi (momoi@netscape.com)
Netscape Communications Corp.
1. Summary:
This paper presents three types of auto-detection methods to determine encodings of documents without
explicit charset declaration. We discuss the merits and demerits of each method and propose a composite
approach in which all three types of detection methods are used in such a way as to maximize their strengths
and complement the other detection methods. We argue that auto-detection can play an important role in
helping transition browser users from frequent use of a character encoding menu to a more desirable
state where an encoding menu is rarely, if ever, used. We envision that the transition to Unicode would
have to be transparent to users. Users need not know how characters are displayed as long as they are
displayed correctly -- whether in a native encoding or one of the Unicode encodings. A good auto-detection
service would help significantly in this effort, as it takes most encoding issues out of the user's
concern.
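As an illustration of the composite idea described above, such a detector can be thought of as several probers run in parallel, with the answer taken from the most confident one. The following sketch is not from the paper; the class names and the simple highest-confidence rule are illustrative assumptions only.

```python
from typing import Protocol

class CharsetProber(Protocol):
    """One detection method: consumes raw bytes, reports a best guess and a confidence."""
    def feed(self, data: bytes) -> None: ...
    def charset(self) -> str: ...
    def confidence(self) -> float: ...   # 0.0 .. 1.0

class CompositeDetector:
    """Runs several probers on the same data and answers with the most confident one."""
    def __init__(self, probers: list[CharsetProber]):
        self.probers = probers

    def feed(self, data: bytes) -> None:
        for p in self.probers:
            p.feed(data)

    def guess(self) -> tuple[str, float]:
        best = max(self.probers, key=lambda p: p.confidence())
        return best.charset(), best.confidence()
```

How each type of prober arrives at its confidence value is the subject of the rest of this paper.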
2. Background:
Since the beginning of the computer age, many encoding schemes have been created to represent various
writing scripts/characters for computerized data. With the advent of globalization and the development of the
Internet, information exchanges crossing both language and regional boundaries are becoming ever more
important. But the existence of multiple coding schemes presents a significant barrier. The development of
Unicode has provided a universal coding scheme, but it has not so far replaced existing regional coding
schemes for a variety of reasons, in spite of the fact that many W3C and IETF recommendations list UTF-8 as
the default encoding, e.g. XML, XHTML, RDF, etc. Thus, today's global software applications are
required to handle multiple encodings in addition to supporting Unicode.
The current work has been conducted in the context of developing an Internet browser. To deal with the
variety of languages using different encodings on the web today, a lot of effort has been expended. In order
to get the correct display result, browsers should be able to utilize the encoding information provided by
http servers, web pages, or end users via a character encoding menu. Unfortunately, this type of information
is missing from many http servers and web pages. Moreover, most average users are unable to provide this
information via manual operation of a character encoding menu. Without this charset information, web pages
are sometimes displayed as 'garbage' characters, and users are unable to access the desired information. This
also leads many users to conclude that their browser is malfunctioning or buggy.
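To make the two declaration points mentioned above concrete, here is a minimal sketch (not part of the paper; the function name and regular expressions are illustrative assumptions) of how a client might look for a charset declared in the HTTP Content-Type header or in an HTML meta tag before falling back to auto-detection or the user's menu choice:

```python
import re

def declared_charset(content_type_header: str, html_head: bytes) -> str | None:
    """Return the charset declared by the server or the page author, if any."""
    # 1. HTTP server, e.g.: Content-Type: text/html; charset=Shift_JIS
    m = re.search(r"charset=([\w.:-]+)", content_type_header, re.I)
    if m:
        return m.group(1)
    # 2. Page author, e.g.: <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP">
    m = re.search(rb'charset=["\']?([\w.:-]+)', html_head, re.I)
    if m:
        return m.group(1).decode("ascii")
    # 3. Neither source supplies it; an auto-detector (or the user's menu choice) must decide.
    return None
```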
As more Internet standard protocols designate Unicode as the default encoding, there will undoubtedly be a
significant shift toward the use of Unicode on web pages. Good universal auto-detection can make an