Delphi Unicode Migration (Porting from Delphi 2007 and Earlier to Delphi 2009 and Later)


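This resource covers Delphi Unicode migration: how to move applications from Delphi 2007 and earlier to the Unicode-enabled Delphi 2009 and later. The migration matters for any application that must handle text correctly in a multilingual environment.

### Core concepts of Delphi Unicode migration

In Delphi 2007 and earlier, the string types were based mainly on ANSI encoding, so they could not represent every language's character set, particularly for East Asian, Middle Eastern, and some European languages. With the release of Delphi 2009, Embarcadero Technologies introduced comprehensive Unicode support, letting developers build applications that can be used worldwide.

### Unicode versus ANSI encoding

Unicode is an international character encoding standard designed to support the writing systems of nearly all of the world's languages, whereas an ANSI encoding is usually limited to a particular language region. In Delphi, the change means that the underlying implementation of the string type moved from `AnsiString` to `UnicodeString`, which can store a much wider range of characters, including emoji and other special symbols.

### Key challenges during migration

1. **Code review**: Because the string type changed, assumptions in old code about string size and string operations may no longer hold. Developers need to check every place that manipulates strings and make sure it handles Unicode strings correctly.
2. **Database compatibility**: If the application relies on an external database, that database must also support Unicode. This may involve adjusting data types and modifying queries.
3. **Resource files and user interface**: All UI elements and resource files, such as menus, dialogs, and help files, need to be reviewed to confirm that they display Unicode characters correctly.
4. **Performance**: Unicode strings usually occupy more memory, which may affect application performance. Memory management has to be considered as part of the migration.
5. **Error handling and debugging**: Unicode-related errors can be harder to locate than errors under ANSI encoding. Developers need to become familiar with new error messages and may need to update their debugging tools and techniques.

### Implementation strategy and recommendations

- **Migrate in phases**: Given the complexity and risk of the migration, proceed step by step, starting with the core functional modules and gradually extending to the whole application.
- **Test**: Run comprehensive tests after each phase to confirm that functionality is correct and performance is stable. An automated test framework helps to surface potential problems.
- **Update documentation**: As the code changes, update the technical documentation to reflect the new coding practices and standards.

### Conclusion

Delphi Unicode migration is a complex task, but it gives developers the opportunity to build truly international applications. By following the guidance above, developers can overcome the challenges of the migration and keep their applications competitive and adaptable in a global market. Knowledge of Unicode is also a valuable skill for a developer's career.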


Americas Headquarters: 100 California Street, 12th Floor, San Francisco, California 94111

EMEA Headquarters: York House, 18 York Road, Maidenhead, Berkshire SL6 1SF, United Kingdom

Asia-Pacific Headquarters: L7. 313 La Trobe Street, Melbourne VIC 3000, Australia



Delphi Unicode Migration for Mere Mortals: Stories and Advice from the Front Lines

Cary Jensen, Jensen Data Systems, Inc.



December 2009

(updated October 2010)

Delphi Unicode Migration for Mere Mortals: Stories & Advice from the Front Lines

Embarcadero Technologies - 1 -

SUMMARY

With the release of Embarcadero RAD Studio XE (and beginning with the release of RAD

Studio 2009), Embarcadero Technologies has empowered you, the Delphi and C++Builder

developer, to deliver first class, Unicode-enabled applications to your

customers. While this important development is opening new markets for your software, in

some cases it presents a challenge for existing applications and development techniques,

especially where code has included assumptions about the size of strings.

This paper aims to guide your Unicode migration efforts by sharing the experiences and

insights of numerous Delphi developers who have already made the journey. It begins with

a general introduction of the issues, followed by a brief overview of Unicode basics. This is

followed by a systematic look at the various aspects of your applications that may require

attention, with examples and suggestions based on real world experience. A list of

references that may aid your Unicode migration efforts can be found at the end of this

paper.

INTRODUCTION

Embarcadero introduced full Unicode support in RAD Studio for the first time in August of

2008. In doing so, they ensured that Delphi and C++Builder would remain at the forefront

of native application development on the Windows platform for a very long time to come.

However, unlike many of the other major enhancements that have been introduced in

Delphi over the years, such as variants and interfaces (Delphi 3), frames (Delphi 5), function

inlining and nested classes (Delphi 2005) and generics (Delphi 2009), enabling Unicode

didn't involve simply adding new features to what was already supported in Delphi.

Instead, it involved a radical change to several fundamental data types that appear in

nearly every Delphi application. Specifically, the definitions for the String, Char, and PChar

types changed.
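To make the change concrete, the following console sketch prints the sizes involved. It assumes Delphi 2009 or later, where String maps to UnicodeString and Char to a two-byte WideChar. The program name and the helper ByteSize are illustrative only and do not come from this paper.

```delphi
program StringSizes;

{$APPTYPE CONSOLE}

// Hypothetical helper: number of bytes occupied by the character data of S.
function ByteSize(const S: string): Integer;
begin
  // Length counts Char elements, not bytes, so scale by the element size.
  Result := Length(S) * SizeOf(Char);
end;

var
  S: string;
  A: AnsiString;
begin
  S := 'Delphi';
  A := 'Delphi';
  Writeln('SizeOf(Char)     = ', SizeOf(Char));      // 2 in Delphi 2009 and later
  Writeln('SizeOf(AnsiChar) = ', SizeOf(AnsiChar));  // 1
  Writeln('Length(S)        = ', Length(S));         // 6 elements
  Writeln('ByteSize(S)      = ', ByteSize(S));       // 12 bytes in Delphi 2009 and later
  Writeln('Bytes in A       = ', Length(A) * SizeOf(AnsiChar)); // 6 bytes
end.
```

Code that sizes a buffer with Length(S) alone silently assumes one byte per character, which is exactly the kind of assumption that no longer holds after the change.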

These changes were not adopted lightly. Instead, they were introduced only after

extensive consideration for the impact that these changes would have for existing

applications as well as how they would affect future development. In addition,

Embarcadero sought the input and advice of many of its Technology Partners who support

and promote Delphi.

In reality, there was no way to implement the Unicode support without some

inconvenience. As one of the contributors to this paper, who requested that I refer to him

simply as Steve, noted, "I think PChars and Strings should never have changed meaning.

... Having said that, any choice the developers of Delphi made would have been criticized.

It was a bit of a no-win situation."

In the end, changing the meaning of String, Char, and PChar was determined to be the

least disruptive path, though not without consequences. On the plus side, Embarcadero

instantly enabled RAD Studio developers to build world class applications that treat both

the graphical interfaces and the data they help manipulate in a globally-conscious manner,

removing substantial barriers to building and deploying applications in an increasingly

global marketplace.

But there was a down side as well. The changes to String, Char, and PChar introduced

potential problems, significant or otherwise, for the migration of applications, libraries,

shared units, and time-tested techniques from earlier versions of Delphi/C++Builder.

Let's be realistic about this. Nearly every upgrade of an existing application can potentially

encounter migration issues that require changes to the existing code or require upgrades

to newer versions of third-party component sets or libraries. The same is true when

upgrading to Delphi 2009 or later. Some upgrades will be easier, and some will be more

challenging.

And now we get to the real point of this paper. Because of the changes to several

fundamental data types, data types that we have relied upon since Delphi 1 (Char and

PChar) or Delphi 2 (String), it is fair to say that migrating an existing application to Delphi

2009 or later requires more effort than any previous migration.

Contributor Roger Connell of Innova Solutions Pty Ltd offered this observation, "While

[the Delphi team has], in my view, done a sterling job [adding Unicode support, this] has

been the most challenging (in fact the only really challenging) Delphi migration."

Fortunately, there are solutions for every challenge you will encounter, and this paper is

here to help.

I began this project by asking the Delphi community for their input. Specifically, I asked

developers who successfully migrated their existing applications to Delphi 2009 and later

to share their insights, advice, and stories of Unicode migration. What I received in

response was fascinating.

The developers who responded represent nearly every category of developer you can

imagine. Some are independent developers while others are members of a development

team. Some produce vertical market products, some build in-house applications, and

some publish highly popular third-party component sets and tools used by application

developers. Yet others are highly respected authorities on Delphi, developers who speak

at conferences and write the books most of us have read.

Their stories, advice, and approaches were equally varied. While some described

migration projects that were rather straightforward, others found the migration process

difficult, especially in the cases of applications that have been around for a long time, and

included a wide variety of techniques and solutions.

Regardless of whether a particular migration was smooth or challenging, a set of common

approaches, practical solutions, and issues to consider emerged, and I look forward to

sharing those with you.

But the story does not end with the publication of this white paper. I hope to continue to

collect Unicode migration success stories, and update this paper sometime in the future.

As a result, if you are inspired by what you read, and have a story of your own that

complements or extends what you read here, consider becoming a contributor yourself. I'll

say more about this at the end of this paper.

In the next section, I provide a brief summary of basic Unicode definitions and

descriptions. If you are already familiar with Unicode, have a basic understanding of UTF-8

and UTF-16, and know the difference between code pages and code points, you should

either skip this section, or quickly skim it for terms you are unfamiliar with.

But before we continue, there is one more point that I want to make. RAD Studio's support

for Unicode has two complementary, though distinct, implications for those applications

you build. The first is related to how strings are treated differently in code written in Delphi

2009 and later versus how they are treated in earlier versions of Delphi. The second relates

to localization, the process of adapting software to the language and culture of a market.

This paper is designed specifically to address the first of these two concerns.

Implementing support for multiple languages and character sets is beyond the scope of

this paper, and will not be discussed further.

WHAT IS UNICODE?

Unicode is a standard specification for encoding all of the characters and symbols of all of

the world's written languages for storage, retrieval, and display by digital computers.

Similar to the ANSI (American National Standards Institute) code standard character set,

which represents both control characters (such as tab, line feed, and form feed) and

printable characters of the 26 character Latin alphabet, Unicode assigns at least one

unique number to every character.

Also like the ANSI code standard, Unicode represents many types of symbols, such as

those for currency, scientific and mathematical notation, and other types of exotic

characters. In order to reference such a large number of symbols (the Unicode code space has

room for more than a million code points), Unicode characters can require up to 4 bytes

(32 bits) of data. By comparison, the ANSI code standard is based on 8-bit encoding, which

limits it to 256 different characters at a time.

Each control character, character, or symbol in Unicode is assigned a numeric value, called

its code point. The code point for a given character, once assigned by the Unicode

Technical Committee, is immutable. For example, the code point for ‘A’ is 65 ($0041 hex,

which in Unicode notation is represented as U+0041). Each character is also assigned a

unique, immutable name, which in this case is ‘LATIN CAPITAL LETTER A.’ Both of these

can never be changed, ensuring that today’s encoding can be relied upon indefinitely.
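As a small illustration, a code point can be read in Delphi with Ord and printed in U+ notation. This is a sketch that assumes Delphi 2009 or later; the helper UPlus is a hypothetical name introduced here, and it only handles characters that fit in a single Char.

```delphi
program CodePoints;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats the numeric value of one Char in U+ notation.
function UPlus(C: Char): string;
begin
  Result := 'U+' + IntToHex(Ord(C), 4);
end;

begin
  Writeln(Ord('A'));        // 65
  Writeln(UPlus('A'));      // U+0041
  Writeln(UPlus(#$00F6));   // U+00F6
end.
```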

Each code point can be represented in either one, two, or four bytes, with the bulk of

common code points (64K worth) representable in two bytes or fewer.

In Unicode terms, these first 64K symbols are referred to as the basic multilingual plane, or

BMP (you'll want to remember these initials, as they will come up a lot in this paper).
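The practical consequence of the BMP boundary shows up in string lengths. In the UTF-16 form used by Delphi 2009 and later strings, a BMP code point occupies one Char, while a code point outside the BMP occupies two (a surrogate pair). The sketch below assumes such a compiler; SurrogatesToCodePoint is a hypothetical helper written for this illustration, not a library routine.

```delphi
program BmpCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: combines a UTF-16 surrogate pair into its code point.
function SurrogatesToCodePoint(Hi, Lo: Char): Integer;
begin
  Result := (Ord(Hi) - $D800) * $400 + (Ord(Lo) - $DC00) + $10000;
end;

var
  InBmp, OutsideBmp: string;
begin
  InBmp := #$00F6;             // U+00F6, inside the BMP
  OutsideBmp := #$D834#$DD1E;  // U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP
  Writeln(Length(InBmp));      // 1
  Writeln(Length(OutsideBmp)); // 2
  Writeln(IntToHex(SurrogatesToCodePoint(#$D834, #$DD1E), 5)); // 1D11E
end.
```

Length therefore counts Char elements rather than characters as a reader would see them.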

To make things somewhat more complicated, the Unicode standard allows some

characters to be represented by two or more consecutive code points. These characters

are referred to as composite, or decomposable, characters.

For example, the character ö can be represented as $00F6. This character is referred to as

a precomposed character. However, it can also be represented by the o character ($006F)

followed by the diaeresis (¨) character ($0308). The Unicode processing rules compose

these two characters together to make a single character.

This is demonstrated in the following code segment:

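var
  s: String;
begin
  ListBox1.Items.Clear;
  // Precomposed form: a single code point, U+00F6
  s := #$00F6;
  ListBox1.Items.Add('ö');
  ListBox1.Items.Add(s);
  // Code point of the precomposed character: 246 ($00F6)
  ListBox1.Items.Add(IntToStr(Ord('ö')));
  // Decomposed form: 'o' (U+006F) followed by the diaeresis (U+0308)
  s := #$006F + #$0308;
  ListBox1.Items.Add(s);
end;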
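Although the two forms look the same on screen, they are different sequences in memory. The following console sketch, assuming Delphi 2009 or later, shows that they differ in length and that the plain = operator does not treat them as equal. The variable names are illustrative.

```delphi
program ComposedVsDecomposed;

{$APPTYPE CONSOLE}

var
  Precomposed, Decomposed: string;
begin
  Precomposed := #$00F6;        // single code point
  Decomposed := #$006F#$0308;   // 'o' followed by the diaeresis
  Writeln(Length(Precomposed));        // 1
  Writeln(Length(Decomposed));         // 2
  Writeln(Precomposed = Decomposed);   // FALSE: = compares the stored Char values
end.
```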

The purpose of composite characters is to permit a finer grain analysis of the contents of a

Unicode file. For example, a researcher who wanted to count the frequency of the use of

the diaeresis (¨) diacritic, regardless of which character it appeared over, could decompose

all characters that use it, thereby making the counting process straightforward.
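The counting step described above can be sketched as follows. The example assumes the text has already been decomposed (the decomposition itself is not shown) and that the compiler is Delphi 2009 or later. CountMark and the sample string are hypothetical.

```delphi
program CountDiaeresis;

{$APPTYPE CONSOLE}

const
  CombiningDiaeresis = #$0308;

// Hypothetical helper: counts occurrences of Mark in already-decomposed text.
function CountMark(const S: string; Mark: Char): Integer;
var
  C: Char;
begin
  Result := 0;
  for C in S do
    if C = Mark then
      Inc(Result);
end;

var
  Sample: string;
begin
  // 'o' + U+0308 followed by 'u' + U+0308, both in decomposed form
  Sample := #$006F#$0308#$0075#$0308;
  Writeln(CountMark(Sample, CombiningDiaeresis)); // 2
end.
```

Run against precomposed text, the same loop would find nothing, because the diacritic is then folded into the base character's code point.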

Although all currently assigned code points (as well as all imaginable future code points)

can be reliably represented by four bytes, it does not make sense in all cases to represent

each character with this much memory. Most English speakers, for example, use a rather

small set of characters (fewer than 100 or so).
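The storage cost is easy to see in a short sketch. It assumes Delphi 2009 or later, where the SysUtils unit provides the TEncoding class. The last line simply multiplies by four to show what a fixed four-byte representation would cost; it does not call a library routine for that.

```delphi
program EncodingSizes;

{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  S: string;
begin
  S := 'Hello';
  Writeln(Length(TEncoding.UTF8.GetBytes(S)));    // 5: one byte per character here
  Writeln(Length(TEncoding.Unicode.GetBytes(S))); // 10: two bytes per character (UTF-16)
  Writeln(Length(S) * 4);                         // 20: a fixed four bytes per character
end.
```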

As a result, Unicode also specifies a number of different encoding standards for

representing code points, each offering trade-offs in consistency, processing, and storage

requirements. Of these, the ones that you will run into most often in Delphi are UTF-8,

UTF-16, and UTF-32. (UTF stands for Unicode Transformation Format or UCS
