PRS Encoder and Decoder

NAND Flash Memories For SpacecraftECC Technologies, Inc.Page 1 of 22

NAND Flash Memories

For Spacecraft

ECC Technologies, Inc. (“ECCTek”)

4750 Coventry Road East

Minnetonka, MN55345-3909

Phone: 952-935-2885

Fax: 952-935-2491

Email:

Notice

This document does not contain any information ECC Technologies, Inc. considers confidential so you may copy and distribute the document freely.

Phil White

President

ECC Technologies, Inc. (ECCTek)

4750 Coventry Road East

Minnetonka, MN 55345-3909

Phone:952-935-2885

Fax:952935-2491

Email:

Website:

Revision History

05-02-10pewCreated initial version of this document based on previous documents and drawings.

05-07-10pewCreated v1 of this document.

05-10-10pewCreated v2 of this document.

05-12-10pewAdded sections to handle NAND Flash memory configurations with and without data path width converters.

05-12-10pewCreated v3 of this document. Widened the second sample configuration to handle 65-bit wide data instead of 48 as in v2.

05-13-10pewStopped using version numbers so that we can focus on developing one final document. Readers will know what version they have by looking at the Revision History and the date of the document.

Sections

1.Introduction......

2.Brief History of ECC in NAND Flash......

3.Issues Involved in Continuing to Increase t in Binary BCH Codes......

4.N-Page Blocks Divided into Segments......

5.2D Encoding Concept......

6.Hardware Encoding and Decoding of Segments......

7.Correcting Erasures Plus Errors......

8.Repeated Decodings......

9.Interface to the NAND Flash Memory......

10.High-Level Block Diagram of Data Collection and Downloading System......

11.NAND Flash Memory Configurations With Data Path Width Converters......

11.1.2D RS Encoding for First Sample Configuration......

11.2.2D RS Decoding for First Sample Configuration......

11.3.Percentage Redundancy for First Sample Configuration......

12.NAND Flash Memory Configurations Without Data Path Width Converters......

12.1.2D RS Encoding for Second Sample Configuration......

12.2.2D RS Decoding for Second Sample Configuration......

12.3.Percentage Redundancy for Second Sample Configuration......

Figures

Figure1 N Pages Divided into Segments......

Figure2 Encoded Data for 2D RS ECC Schemes

Figure3 Encoding and Decoding of 2D Arrays in Hardware

Figure4 64-bit to 44-bit Data Path Width Converter......

Figure5 64-bit to 40-bit Data Path Width Converter......

Figure6 Spacecraft Data Collection and Downloading System......

Figure7 Encoded 2R RS Array for First Sample Configuration......

Figure8 2D RS Encoding for First Sample Configuration......

Figure9 2D RS Decoding for First Sample Configuration......

Figure10 Percentage Redundancy for First Sample Configuration......

Figure11 Encoded 2D RS Array for Second Sample Configuration......

Figure12 2D RS Encoding for Second Sample Configuration......

Figure13 2D RS Decoding for Second Sample Configuration......

Figure14 Percentage Redundancy for Second Sample Configuration......

1.Introduction

This document describes how to apply ECCTek’s two-dimensional (2D) RS error-correction (ECC) system to create fault-tolerant NAND Flash memories for spacecraft.

Many commercial companies are currently developing NAND Flash storage systems. For example, Fusion-IO has recently received an additional $45M in funding and Pliant Technology has recently received an additional $27M. Established hard disk drive (HDD) companies Seagate and Western Digital also sell NAND FlashSSD devices.

NAND Flash solid state drives (SSDs) are attractive as replacements for HDDs because they are faster, take less power, and aremore shock resistantthan HDDs. Unlike HDDs, SSDs have no moving parts and are silent.

However, there are serious problems associated with using NAND Flash devices that need to be solved. NAND Flash devices were originally designed as electrically erasable programmable read only memories (EEPROMs) and were not originally designed for random access memories. NAND Flash devices cannot be reprogrammed/rewritten an unlimited number of times as HDDs can because they degradewith use. Also, blocks of storage cells in NAND Flash devices must be erased before sub blocks (Pages) can be programmed/written.

The wear-out and failure characteristics of NAND Flash devices require the use of a powerful error-correction system to allow NAND Flash memories to be reliable over a long period of time.

The use of NAND Flash memories in spacecraft will also require the ECC system to tolerate entire chip failures.

This document presents a general methodology for implementing highly reliable and fault-tolerant NAND Flash memories. In order to quickly and effectively convey the concepts involved, twosample configurations are described. The first configuration requires data path width converters and the second one does not. The two configurations show designers how to create other configurations with the same characteristics as the two samples. The first configuration is optimized to reduce gate count. The second configuration is optimized to make it as easy as possible to match the data path width of the NAND Flash memory to the widths of other system data paths. Readers should keep in mind that other cases can be easily created with different Reed-Solomon (RS) symbol sizes and different levels of correction. The methodology is not limited to the two described sample cases.

ECC Tek strongly believes there is very little risk in implementing the methodology described in this document because ECC Tek has more than 30 years experience in designing ECC circuits, the idea of correcting erasures proposed in this document has been implemented in a slightly different form in multi-track tape drives to tolerate the failure of multiple tracks, and the idea of using error-correction software as a recovery procedure in the event that hardware correction fails was used in the first Reed-Solomon HDD implementations at what is now Seagate.

With a 2D RS scheme as described in this document the designer of a NAND Flash memory has the freedom of arbitrarily increasing the level of protection and also the number of failed chips the system can tolerate. No other existing ECC system gives designers that freedom.

2.Brief History of ECC in NAND Flash

Shortly after NAND Flash devices were first manufactured, an ECC system was recommended by Samsung that corrected t=1 bit in each Page. Pages were then ~512 bytes. The t=1 ECC recommended by Samsung is very similar to a Hamming code which is a binary BCH code that can correct t=1 bit.

ECCTek has designed and licensed numerousECC designs for NAND Flash – a number of them are currently in mass production. The original ECC designECCTeklicensed for NAND Flash was a ReedSolomon (RS) system that operated on 10-bit symbols and corrected t=5 10-bit symbol errors. Two of the bits in each symbol were forced to 0 in the data field so that the data field actually consisted of 8-bit bytes. ECCTek’s first RS system for NAND Flash was implemented in 2005 and went into mass production shortly thereafter.

In 2007, ECCTek developed and licensed its first programmable binary BCH encoder and decoder designs for use with NAND Flash devices which could correct up to t=18 bits. Shortly thereafter, ECCTek developed and licensed a programmable binary BCH ECC system that could correct up to t=30 bits. In the last 3 years, ECCTek has licensed a number of similar binary BCH encoder and decoder designs for NAND Flash.

ECCTek has developed and synthesized abinary BCH design that can correct up to t=44 bits in 1024-byte Pages, and there has been interest in binary BCH designs for NAND Flash that can correct up to t=60 bits.

3.Issues Involved in Continuing to Increase t in Binary BCH Codes

When the number of bits correctable, t, and/or the data field length, K, increases, the complexity of binary BCH encoders and decoders increase exponentially whichlimits the degree to which t and K can be increased. ECCTek has estimated that correcting t=60 bits in 1024-byte data field lengths will require around 250K gates in an ASIC when a fairly low level of parallelism is implemented in the decoder. Implementing a t=60 binary BCH design in an FPGA may not be practical at the present time.

When implementing binary BCH encoders and decoders, the codeword contains binary symbols “bits” and a Galois or Finite Field must be used to uniquely identify, tag, address or locate each codeword symbol/bit. For NAND Flash Page sizes larger than 1024 bytes but less than 2048 bytes, a locator field with m=14-bit elements must be used. Fourteen-bit finite field elements are relatively large.

The number of redundant bits required by a t-bit error-correcting binary BCH code is r mt where m is the width of the locator field elements. Usually the number of redundant bits required is r= mt.

When implementing RS codes, the codewordcontains nonbinary symbols and smaller locatorfields can be used than what are required for binary BCH codes. For example a RS locator field with 10-bit elements was used in ECCTek’s first RS design for Flash, and a locator field with 13-bit elements was used in ECCTek’s first binary BCH design. Both designs were for 512-byte data field lengths.

Apparently as time passes and feature sizes decrease the reliability of NAND Flash devices is decreasing because ECCTekcontinues to receive requests to correct more and more bits – up to t=60 bits per Page. Seven years earlier only 1 bit per Page was corrected. This trend will most likely continue especially with multi-level cell (MLC) NAND Flash devices.

Designing binary BCH encoders and decoders with very large t is difficult because the amount of time it takes to compute the error locator polynomial, L(x), from the Syndrome, S(x), and the complexity (gate count) of the decoderincreases rapidly as t is increased. Synthesis results indicate that a t=50 binary BCH decoder would probably take ~250K gates in an ASICif the decoder was designed so that it could keep up with continuous input data when correcting 50 bits in each received word.

The above-mentioned ECCschemes do not include any provision to correct errors caused by entire chip failures. In order to achieve fault tolerance, some type of 2D ECC scheme must be used.

4.N-Page Blocks Divided into Segments

Data will be written to N NAND Flash chips in N-Page Blocks which are divided into logical Segments as illustrated in Figure1. Think of each column in Figure1 as containing a single column of “bits”. Assume one Page contains approximately 4K bytes.

Figure1 N Pages Divided into Segments

Each Segment will contain N vertical codewords.

For binary BCH codes currently being implemented, each vertical codeword is approximately 512 or 1024 bytes with 4 or 8 Segments per Page.

ECC Tek is proposing the use of RS vertical codewords for the 2D ECC system instead of binary BCH codewords and much smaller Segments than what are currently being used. The maximum height (in number of symbols) of one vertical codeword in a Segmentis determined by the size of the RS column symbol being used.

5.2DEncoding Concept

Think of the data in one Segment as a matrix or two-dimensional array of data items. Think of the columns asvertical RScodewords and the rows that contain data as horizontal RS codewords.

Figure2 illustrates howredundant data items are appended onto freely chosen data items in a 2DECCencoding scheme. Most likely 2D encoding schemes will be widely implemented in the near future. In Figure2, the rows are encoded first and then the columns.

Figure2 Encoded Data for 2D RS ECC Schemes

6.Hardware Encoding and Decoding of Segments

A high-level block diagram of a NAND Flash storage device that encodes and decodes Segments in hardware using a parallel Reed-Solomon (PRS) row encoder and decoder and multiple RS column encoders and decoders is illustrated in Figure3.

Figure3 Encoding and Decoding of 2D Arrays in Hardware

When writing, rows are encoded by the PRS row encoder, columns are encoded by multiple RS column encoders, and a 2D encoded Segment, as illustrated in Figure2, is written to N NAND Flash chips.

When reading, columns read from the N NAND Flash chips are decoded by multiple RS column decoders and rows are decoded by a PRS row decoder. The PRS row decoder will pass through the vertical redundancy unaltered if any of the row decodings for one Segment fails. If none of the row decodings for one Segment fails, the 2D decoder has determined that either no errors occurred or that all of the errors have been properly corrected.

Since multiple column encoders and decoders are required, the complexity of one column encoder and decoder pair must be reasonable in order for the 2D scheme to be practical – especially when the 2DECC schemeis implemented in FPGAs as is the standard practice for spacecraft. Since we are considering relatively small symbol sizes, there will necessarily be multiple Segments per Page. For example, if we consider 4096-byte Pages and 6-bit column symbols, there will be more than 92 Segments per Page.

7.Correcting Erasures Plus Errors

Both the RS column and row encoders and decoders implemented in hardware can be designed to correct erasures plus errors. An erasure means a codeword location where the codeword symbol in that location has a high probability of being in error, but is not necessarily in error.

The PRS row decoder should always be designed to correct errors plus erasures, but the RS column decoders would probably be designed to correct errors only. When a column decoding fails, the decode fail signal from the column decoder indicates that all of the row symbols in that verticalcodewordare erased. The PRS row decoder can correct t errors and s erasures as long as 2t + s R where R is the number of redundant symbols per row codeword.

8.Repeated Decodings

Both the row and column decoders can output entire corrected codewords (both corrected data and corrected redundancy) so both column and row decoding on each Segment can be repeated an unlimited number of times if desired or necessary.

If none of the row decodings on a Segment fail, then it is assumed that the data in that segment either has no errors or that all of the errors have been correctly corrected. If any of the row decodings fail, the PRS row decoder will pass the corrected column redundancy through the PRS row decoder without altering it so that column and row decodings can be repeated if desired.

It is possible to duplicate the column and row hardware decoders multiple times or to feed the data back to the input by using more FIFOs and muxes, but that would probably be overkill since one column decoding followed by one row decoding will most likely correct all of the errors almost all of the time.

Most likely the best way to implement multiple decodings would be to do repeated decodings in software only when needed. Software correction time is probably not critical because repeated decodings would rarely be done. That way, row and column decodings can be repeated any number of times and no additional hardware is required so it keeps the complexity of the ECC hardware to a minimum.

When an error pattern occurs that is too severe for one column and row decoding to correct it (which may never happen or may happen only once a year), then some means would need to be provided for the software to access and redecoded the codewords resulting from one correction and repeat the correction any number of times in an attempt to recover the data. Since ECCTek licenses both C and Verilog code, the C code can be used for that purpose.

9.Interface to the NAND Flash Memory

N is the number of NAND Flash chips that are written and read simultaneously. In the first sample configuration N=15, the row symbol width is 4 and the number of redundant symbols is 4 so the input and output data path widths for the NAND Flash memory can be up to 11 symbols =44 bits.

What if we want to use a 44bit wide NAND Flash memory in a 64bit system?

To do that, we need a 64bitto44bit converter at the input of the NAND Flash memory and a 44bitto64bit converter at the output.

For the input converter in this case we need to find the smallest integers x and y such that 64x = 44y. Note that x/y=44/64=22/32=11/16. Since 11 and 16 have no common factors, we need 11, 64-bit registers to convert a 64-bit data path to a 44-bit data path as illustrated inFigure4.

The circuit shown in Figure4 can be thought of as a FIFO with different input and output widths. Control logic similar to a standard FIFO’s control logic can be developed.

There are several ways to handle the clock with a 64bitto44bit input converter. One way is to pause five clock cycles for every 11 clock cycles while writing to the NAND Flash memory. Another way is to implement a state machine which will monitor the circuit and issue a pause signal to the input as needed. Another way is to use a higher frequency clock for the NAND Flash memory than what is used for the input logic.

Figure4 64-bit to 44-bit Data Path Width Converter

Other converters are simpler such as the 64bitto40bit input converter shown in Figure5.

Figure5 64-bit to 40-bit Data Path Width Converter

Converters required at the output follow the same pattern as those shown and are easy to design.

10.High-Level Block Diagram of Data Collection and Downloading System

We will describe how to implement highly reliable, fault-tolerant NAND Flash memories to be used inspacecraft data collection and downloading systems as illustrated in Figure6.

Figure6 Spacecraft Data Collection and Downloading System

Assume that the Data Collector can be paused between words written to the Main Data FIFO(MDFIFO).

The MDFIFO can be implemented using a dual-port RAM. Assume the read and write address registers are initially reset to 0. When a word is written to the MDFIFO, the write address is incremented by 1. When a word is read from the MDFIFO, the read address is incremented by 1. Whenever the write and read addressesequal the end of the RAM, they are reset to 0.