04.14.08

Three is a Magic Number

Posted in Apple ][ at 6:30 pm by site admin

A bit of history to begin. One of the first attempts to support multiple layers in GTE was to allocate a large block of Bank $00 memory and draw the screen by alternating PEA and PEI instructions with the soft switches set to Bank $00 Read/Bank $01 Write. I was turned on to this method by one of Alex Eddy’s demos.

A severe downside to this method was that it used an enormous amount of precious Bank $00 memory. Also, there was no way to “wrap” the addressing around, so each line of data had to be duplicated. This meant that I could never support a full-screen second layer. I eventually abandoned this approach once I figured out how to use indirect addressing to place the second layer in any memory bank I chose. This solved the “wrapping” problem too, and I’ve never looked back.

Now that I’m considering generalizing the blitter to remove the three byte limit on instruction sequences, I’ve decided to revisit the Bank $00 approach in order to incorporate a limited third layer into the engine. Some Bank $00 memory is already in use for Animated Tiles, so one would have to choose one or the other, but not both.

Because Bank $00 memory must be conserved, the third layer is allocated such that it provides a horizontally repeating pattern. The width of this layer must be a divisor of the BG0 field width (84 words). In order to further optimize memory usage, the starting address of each line of the third layer can be set independently. This allows certain lines to be duplicated, which is handy for vertically symmetric backgrounds like caves.
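Since the layer width has to divide the 84-word field evenly, the legal widths are easy to enumerate. Here is a minimal Python sketch of that constraint; the function name is made up for illustration and is not part of GTE.

```python
BG0_FIELD_WORDS = 84  # BG0 field width in words, as stated in the post


def legal_third_layer_widths(field_words=BG0_FIELD_WORDS):
    """Return every width (in words) that repeats across the field with no remainder."""
    return [w for w in range(1, field_words + 1) if field_words % w == 0]
```

Narrower widths cost less Bank $00 memory per line but repeat more often across the screen.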

The blitter code fragment for a full BG0 + BG1 + BG2 blit becomes

    lda   00,x
    and   [00],y
    ora   (00),y
    and   #MASK
    ora   #DATA
    pha

which occupies 13 bytes. In order to support masking of BG1, we use a clever trick of packing the direct page with long addresses (3 bytes) that point to a mask buffer. The Data Bank register is set to the BG1 data bank, so the same direct page address will access the data or mask buffer depending on the addressing mode.
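To make the data flow of that fragment concrete, here is a small Python model of one 16-bit word passing through the six instructions. It is only a sketch: the argument names are invented, and it assumes the usual convention that a mask has 1 bits where the lower layer shows through and that the data is zero in those positions.

```python
def compose_word(bg2_data, bg1_mask, bg1_data, bg0_mask, bg0_data):
    """Model the three-layer fragment for a single word (4 pixels)."""
    a = bg2_data & 0xFFFF   # lda 00,x     third layer word from Bank $00
    a &= bg1_mask           # and [00],y   knock out pixels covered by BG1
    a |= bg1_data           # ora (00),y   merge in the BG1 pixels
    a &= bg0_mask           # and #MASK    knock out pixels covered by BG0
    a |= bg0_data           # ora #DATA    merge in the BG0 pixels
    return a                # pha          push the result to the screen


# Instruction sizes with a 16-bit accumulator
FRAGMENT_BYTES = {
    "lda dp,x": 2,
    "and [dp],y": 2,
    "ora (dp),y": 2,
    "and #imm16": 3,
    "ora #imm16": 3,
    "pha": 1,
}
```

Summing the sizes gives the 13 bytes quoted above.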

Also, the 13 byte code fragment is the worst case scenario. Typically BG0 will be either totally transparent or opaque. This is determined at run time when the tile is written to the BG0 code buffer and the code is optimized appropriately. It would be very useful to be able to optimize for an opaque/transparent BG1, but this cannot be computed at the time the tile is drawn and checking at runtime is cost prohibitive.

Once the tile compiler is finished, I’ll try and whip out a proof-of-concept.

04.10.08

The Pursuit of Pixel Perfection

Posted in Apple ][ at 2:56 pm by site admin

So the alliterations in these posts are probably wearing thin at this point, but please bear with me for a bit. During my biweekly flight to the West Coast this week (thank goodness I didn’t fly American Airlines!) I had some time to contemplate the limitations of GTE compared to the graphical prowess of SNES-level hardware. One of the biggest technical limitations of the current design is that the selection between BG0 and BG1 can only be done on a word basis. So, as a designer, you are left with the limitation that every group of 4 pixels must be totally in one of the two layers.

This limitation has always bothered me for two reasons: 1) I don’t like limitations, and 2) it makes the graphics look really ugly. I’ve always wanted to have pixel-level parallax granularity for the final release of GTE, and now I think I know how to pull it off. First, I should explain the reason behind the current limitation.

When GTE blits the playing field to the graphics screen, it does so with zero overdraw by executing a single code buffer that contains either a PEA $1234 instruction for drawing BG0 data, or a LDA (00),y PHA pair that copies data from the BG1 data bank. Either sequence takes up 3 bytes, and the code buffer is 84 words wide. This is the minimum width necessary for supporting scrolling with 16×16 tiles since the graphics screen is 80 words wide (320 pixels) and an extra block is needed to draw both “edges” of the playing field.

So, 84 words times three bytes per instruction results in 252 bytes of code per line just to copy data. There is a JMP instruction at the end of the code that jumps back to the beginning of the line, so that brings the total to 255 bytes. In order to draw the correct portion of each line, GTE patches each code line at runtime to branch out at the appropriate time. Currently, we use a BRA instruction since it only requires a single load and store to patch. The BRA instruction barely has enough range to branch out of the code line. If the code were extended by just one byte, it would no longer work.
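The size arithmetic and the reach of a BRA can be checked with a few lines of Python. This is only a sketch: the post does not say where the exit branch and its target sit within a line, so `bra_reaches` just encodes the general rule that BRA carries a signed 8-bit displacement measured from the instruction that follows it.

```python
WORDS_PER_LINE = 84
BYTES_PER_FRAGMENT = 3  # PEA $xxxx, or LDA (dp),y followed by PHA
JMP_BYTES = 3           # JMP abs back to the start of the line


def code_line_bytes():
    """Total size of one code line: the copy fragments plus the closing JMP."""
    return WORDS_PER_LINE * BYTES_PER_FRAGMENT + JMP_BYTES


def bra_reaches(branch_addr, target_addr):
    """True if a 2-byte BRA located at branch_addr can reach target_addr."""
    displacement = target_addr - (branch_addr + 2)
    return -128 <= displacement <= 127
```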

In order to get the pixel perfect parallax I’m after, I’m willing to make some compromises in the implementation of the blitter, but there is one iron-clad rule: do not slow down the fast path! Whatever runs quickly now, should run quickly in the new design, too. With that in mind, let’s proceed.

My plan is to break up the code buffer to be more block-oriented. The code will be split up among multiple banks and a total of 9 bytes will be allocated per word, although most bytes will be unused. In addition to the PEA and LDA/PHA instruction sequences, I need to introduce the following code sequence for the cases where a BG0 word is partially transparent.

   lda   (00),y
   and   #MASK
   ora   #DATA
   pha

Good. 9 bytes, nothing complicated. This is just the standard “mask and draw” method of blitting. The real work comes in dealing with the previously fast instructions, which now must contend with 6 empty bytes. The simple solution is to add a branch instruction to skip to the next code fragment. This may work, but is costly: the cycle count for a solid BG0 word jumps from 5 to 8 cycles. Not good!
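For readers who have not met “mask and draw” before, here is a rough Python sketch of how the MASK and DATA immediates could be derived for one word of four 4-bit pixels. The helper names are invented, and the pixel order (most significant nibble first) is an assumption made for readability; the real screen layout also involves byte order.

```python
def build_mask_and_data(pixels):
    """pixels: four entries, each a 4-bit colour or None for a transparent pixel.

    Returns (mask, data) where mask has 1 bits wherever BG1 should show through.
    """
    mask = 0
    data = 0
    for p in pixels:            # most significant nibble first (assumed order)
        mask <<= 4
        data <<= 4
        if p is None:
            mask |= 0xF         # keep the BG1 pixel
        else:
            data |= p & 0xF     # overwrite with the BG0 pixel
    return mask, data


def mask_and_draw(bg1_word, mask, data):
    """lda (00),y / and #MASK / ora #DATA for a single word."""
    return (bg1_word & mask) | data
```

A word whose first two pixels are solid and whose last two are transparent yields a mask of $00FF, so only the low byte of the BG1 word survives.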

Instead of handling the alignment on a word basis, we’ll do it on a block basis. For 4×4 blocks this reduces to the same thing, but if you’re using 4×4 blocks, performance is probably not your most critical concern. If we consider a 16 pixel wide block, then we can pack 4 word fragments together and only add padding at the end if needed. For example:

   pea   $1234
   lda   (02),y
   pha
   lda   (04),y
   and   #$00FF
   ora   #$7800
   pha
   pea   $9ABC
   bra   next

Not bad! We’ve only added an additional 3 cycles of overhead per 4 words, less than a cycle per word, which is more than compensated for by the fact that I make sure my direct page accesses are page-aligned. :)
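The bookkeeping for a packed block can be sketched in Python as follows. The fragment sizes and the 3-cycle branch come from the post; the function and table names are made up, and the assumption is that a trailing BRA is only needed when the packed fragments do not fill the reserved space.

```python
FRAGMENT_BYTES = {"pea": 3, "lda_pha": 3, "masked": 9}
SLOT_BYTES = 9   # bytes reserved per word in the proposed layout
BRA_BYTES = 2
BRA_CYCLES = 3   # cost of the branch that skips the padding


def pack_block(kinds):
    """Pack word fragments back to back.

    Returns (bytes_used, padding_bytes, needs_bra). Fragment sizes are all
    multiples of 3, so any gap is at least 3 bytes and the BRA always fits.
    """
    reserved = SLOT_BYTES * len(kinds)
    used = sum(FRAGMENT_BYTES[k] for k in kinds)
    needs_bra = used < reserved
    if needs_bra:
        used += BRA_BYTES
    return used, reserved - used, needs_bra


def overhead_cycles_per_word(kinds):
    """Extra cycles per word caused by the trailing branch, if any."""
    _, _, needs_bra = pack_block(kinds)
    return BRA_CYCLES / len(kinds) if needs_bra else 0.0
```

For the block in the listing (PEA, LDA/PHA, masked, PEA) this gives 20 bytes used out of 36 and 0.75 extra cycles per word.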

Overall, I’m pretty happy with the design on paper and will give it a shot for an implementation. There is some additional overhead, of course. The tile blitters have to do extra work in order to update some internal tables that identify the address of each block in a line. Dispatching to and from a code buffer that spans multiple banks may require some 24-bit address manipulation, but nothing too serious. Most importantly, the graphical quality of the engine is improved and the performance does not suffer. Using complex BG0 masks will slow things down, of course. But the execution time is directly tied to the complexity of the scene, which is about as good as one can hope for.

04.07.08

Basic Block BitBlit Babbling

Posted in Apple ][ at 5:01 pm by site admin

Well, it’s happened again. I’ve become so totally swamped with Real Life that any work on GTE has been pushed to the rear of the stove. Even though I haven’t done any coding in several weeks, I’d like to describe what the next phase of development holds.

Currently, the bare bones of GTE are working, but many of the supported features do not have the proper low-level support. This support takes the form of a myriad of custom tile bitblit functions. A different bitblit is required for all the different combinations of BG0, BG1, Fringe and Animated tiles. Also, based on the capability bits of the tool set, the same bitblit routine can be optimized in different ways. Plus, a separate routine is required for each tile size and orientation.

For example, the simplest block blit (as in, the only one currently implemented), copies the tile data, without masking, directly into the operand field of the PEA instructions that blit the BG0 field to the graphics screen. Simple, straightforward and fast. Now consider what happens if Animated Tile support is activated. In this case, some of the PEA opcodes are replaced with LDA/PHA instruction pairs which copy data from the animated tile data buffers to the screen. Unfortunately, this means that the opcodes must be reset to PEA instructions when copying a regular tile to the buffer. Hence, tile blitting is about 40% slower in this case.

Other features require similar considerations. Activating BG1 requires that the tile mask data be evaluated in order to select between PEA and LDA/PHA instructions on a per-word basis. If the Fringe layers are active, then there is yet another set of masks and data to consider. Of course, the more complicated blits require the most optimization in order to keep things fast.

So, rather than manually code the multitude of blitting routines, I’m planning to write a bitblit generator to create the blitters for me. While this is just as much work as writing the code manually, it will be much easier to change things and I can ensure that all the blit routines are always up to date.

I’d like to finish with a short example that illustrates the difference in complexity between the simplest blitter and the most complex. The tile data and masks are stored in LocInfo records, thus each tile takes up 8 bytes per row. This is a bit wasteful for 8×8 and 4×4 tiles, but it helps maintain data structure compatibility with QuickDraw II. I’ll assume that there are four pointers to the tile data and mask, and the fringe data and mask, named dptr, mptr, fdptr, and fmptr respectively.

; This copies data directly to the BG0 code buffer.  Assume that the x register contains the
; proper address.
simple8x8 anop
          lda   [dptr]
          sta   |$0001,x
          ldy   #2
          lda   [dptr],y
          sta   |$0004,x          ; finish row 0

          ldy   #8
          lda   [dptr],y
          sta   |BG0_STRIDE+$0001,x
          iny
          iny
          lda   [dptr],y
          sta   |BG0_STRIDE+$0004,x          ; finish row 1

          …

; This merges BG0, BG1 and Fringe data together.  Very slow….
complex8x8 anop
          lda   [mptr]          ; combine the tile and fringe mask data
          and   [fmptr]
          inc                   ; Is it totally transparent (0xFFFF)?
          bne   is_solid

          lda   #LDA_OPCODE     ; show BG1
          sta   |0,x
          lda   operand0        ; load the LDA operand and PHA instruction
          sta   |1,x
          bra   nextWord

is_solid anop
          lda   #PEA_OPCODE
          sta   |0,x

          lda   [dptr]         ; Fill the transparent regions with the background color
          eor   bgColor
          and   [mptr]
          eor   bgColor        ; have the base word, merge with fringe data
          eor   [fdptr]
          and   [fmptr]
          eor   [fdptr]
          sta   |$0001,x       ; save the data
          …

As you can see, the amount of code needed for the most complex cases is considerable. Also, the large number of logical operations gives some hope that this code might be further optimized.
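The eor/and/eor triples in the complex routine are the classic masked-merge idiom: they pick bits from one word where the mask is 1 and from the other where it is 0, without needing an inverted copy of the mask. A short Python check of that identity, using invented function names and making no claim about which polarity the GTE masks use:

```python
def masked_merge(a, b, mask):
    """((a ^ b) & mask) ^ b: bits of a where mask is 1, bits of b where it is 0."""
    return ((a ^ b) & mask) ^ b


def masked_merge_reference(a, b, mask):
    """The same selection written the long way, with an explicit inverted mask."""
    return (a & mask) | (b & ~mask & 0xFFFF)
```

In the listing, the first triple merges the tile word with bgColor under the tile mask, and the second merges that result with the fringe word under the fringe mask.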

We need four of these routines since we must be able to combine all possible combinations of horizontal and vertical flipping of the Fringe and Base tiles. Lots of work ahead. :)

03.22.08

Source Code Available

Posted in General at 7:05 am by site admin

I’ve uploaded the source code for the demo program on the disk image along with a work-in-progress header file for the ToolSet and the source code for the BMP reader in case it’s interesting to anyone. Links below:

Enjoy!

03.21.08

Maze Demo Uploaded

Posted in Apple ][ at 12:15 pm by site admin

As promised, I uploaded a disk image of the current GTE ToolSet along with a very simple maze demo. As a bonus, I also included a couple of very old SMB prototype demos. I can’t guarantee that any of this will run on your system, but give it a try.

Get the disk image here.

03.20.08

Rendering Complete

Posted in Apple ][ at 8:27 am by site admin

Finally finished and mostly debugged the new rendering architecture of the engine. Aside from some strange crashes and random functionality failures (probably some bad initialization code), I’m pretty happy with everything. I thought that the animation would be rock-solid under KEGS since it emulates the VBL signal decently, but there is still flicker at high speeds. It goes away when I drop the emulated speed down, so maybe it actually is working properly. Need to dig out a real machine to test…

The bulk of the work was not in the rendering path itself, but the sprite compiler. By changing the line-by-line sprite decomposition, I had to rewrite the compiler to produce whole sprites. I also lost the ability to reuse the same data for vertically flipped drawing. I’m a little disappointed that the size of the sprites has ballooned so much. For instance, a simple 16×16 sprite takes up over 4K of memory!

I may rework the new compiler to produce small subroutines that call the sprite code fragments in the proper order to restore the space advantage. This is doable now because the sprite code has its own private stack in Bank $01, so it can freely use JSR/RTS instructions without stomping over memory.

03.14.08

A Revised Approach

Posted in Apple ][ at 10:41 am by site admin

After getting down to brass tacks with regard to the blitter code reorganization, I found the order of operations needed to be tweaked a bit. The blit order is now:

  • Turn shadowing off
  • Blit the lines with sprites
  • Blit the sprites
  • Turn shadowing on
  • Blit the lines without sprites and PEI updates in one pass
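The ordering above can be sketched as a small Python planner that turns a set of sprite-covered lines into a sequence of steps. It is a toy model with invented step names, not GTE code; its only purpose is to show that the final pass touches the screen strictly from top to bottom.

```python
def plan_frame(num_lines, sprite_lines):
    """Return the ordered list of (action, argument) steps for one frame."""
    sprite_lines = set(sprite_lines)
    steps = [("shadow", "off")]
    # Off-screen work: lines that carry sprites, then the sprites themselves
    steps += [("blit_line", n) for n in range(num_lines) if n in sprite_lines]
    steps.append(("blit_sprites", None))
    steps.append(("shadow", "on"))
    # Visible pass, top to bottom: plain lines are blitted, sprite lines are
    # exposed by PEI slamming
    for n in range(num_lines):
        steps.append(("pei_slam", n) if n in sprite_lines else ("blit_line", n))
    return steps
```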

I like this for two reasons. First, I save a very small amount of time avoiding repeated toggling of the shadow register. Second, the actual display of the screen is done in a top to bottom fashion, rather than “filling in” the lines that had sprites on them. This is just a nice property to have in case you want to be fancy and try synchronizing with the VBL.

Another small victory is that the non-sprite blitter and PEI slammer is the fastest code in the engine, so the graphics are updated in the shortest amount of time. If the sprites are drawn in the middle of the rendering loop, then a delay is introduced between when the drawing begins and ends. This can lead to additional tearing of the screen.

03.06.08

A Different Approach

Posted in Apple ][ at 8:56 am by site admin

So I was thinking about GTE last night and came to the realization that the fundamental structure of the blitter needs to be changed. “What’s that!”, you cry. “How can he possibly consider throwing everything out at this point! Why, he’s not even released a functional tool yet!!”. True, I have not gotten the Tool Set finished, but I think it’s important to have a solid 1.0 release that can be incrementally extended. I have no desire to overhaul the code once it gets out into the wild.

That said, the changes I’m going to propose do not require rewriting a significant amount of code. It’s really a reorganization of the sequence of operations. Currently, the core rendering algorithm of GTE is scanline based. A full line of data is blitted and the sprites are composited on top of it. This is synchronized with the Vertical Blank (VBL) in order to avoid flicker. This works.

While converting the code base to a ToolSet, I’ve been consciously trying to reduce the complexity of the code and, consequently, lower the amount of overhead. By overhead, I mean all the instructions that need to be executed to maintain data structures, shuffle data to the proper location, etc. It is often the case that a simpler, but less efficient approach can out-perform a theoretically faster method if the overhead is significantly reduced.

The renderer in GTE is theoretically fast because it only draws data once. There is no erasing or restoration of the background data when sprites are composited on top. However, a significant amount of overhead is required to decompose the sprites on a per-line basis and integrate their drawing into the blitter inner loop. I thought that there was no way around this problem without introducing flicker into the blitter, but after reviewing TN #70, I have changed my mind.

I think the renderer can be changed to the following

  • Turn shadowing on
  • Blit all lines without sprites
  • Turn shadowing off
  • Blit the lines with sprites
  • Blit the full sprites onto the shadow screen
  • Turn shadowing on
  • Expose the lines via PEI slamming

If you read through TN #70, it documents the amount of time it takes to copy data to the graphics screen with shadowing on or off. Because the IIgs does not need to synchronize the fast and slow sides when shadowing is off, the code can run faster. Still, the extra cost of copying the data via PEI slamming is much larger than the time saved by avoiding synchronization. In the worst case, we would have to save an additional 320 to 480 cycles per rendered scan line to make up the difference in full screen (320×200) mode.

Fortunately, we get a real win from disentangling the sprite rendering from blitting the background data. Just a cursory check of the code shows that by removing some excess code for VBL synchronization, sprite dispatch and softswitch toggling, we can save over 100 cycles per scan line which is already over a fifth of the required saving. There are similar savings to be had in the sprite rasterizer and I’m sure that the inner loop of the blitter will be able to enjoy some simplification as well. This is a conservative analysis, since it doesn’t take into account the fraction of times that the renderer is required to wait for the VBL, which can stall the code for up to 10 scanlines, or 630 microseconds which corresponds to around 1,500 cycles.

So, to summarize, what appears to be a suboptimal way of rendering may actually be faster by reducing the overhead of the renderer. In addition, we are guaranteed to have flicker-free updates and the frame rate will be more consistent when a fixed number of sprites are on screen due to the removal of VBL synchronization.

All in all, a pretty nice win!

02.28.08

Huzzah!

Posted in Apple ][ at 10:30 am by site admin

After I wrote my last post, I started thinking about GTE again, so I had to do a little bit of hacking and fixed the memory alignment bug that I described. It’s not much, but the ToolSet Allocation and Initialization code appears to be working and I can finally peck away at getting the tile and sprite routines wrapped up and tested.

I probably shouldn’t have rewritten the sprite dispatch code a while back, but it just seemed logical to make the changes since I had to rewrite a lot of things in order to use the ToolSet Direct Page for variable storage rather than allocating the memory from the stack. The changes were for the best since the sprite dispatch became a bit faster and scales with the number of sprites per line a bit better, but it’s always hard to get motivated to debug new and gnarly code.

I’ve actually been giving some thought to imposing a restriction on the sprite sizes in order to speed up the code. Most of the late 80’s-era video game systems, which GTE is targeting, restrict the sprites to be something like 8×8, 16×8, 24×16, etc. It probably wouldn’t make much difference since the sprites are compiled anyway. More annoying than productive.

02.23.08

I’m not dead yet….

Posted in Apple ][ at 12:35 pm by site admin

I haven’t posted for several months as I’ve been putting all my time and energy into finishing up my graduate studies and trying to move on to the next chapter of my life. Since there’s no immediate end to school in sight, I thought I’d write a quick post in case anyone actually stops by this blog to see if I’m still active or not. :)

Work on GTE and MapEdit has pretty much frozen since the last update. I got bogged down trying to get to the bottom of some very obscure crashes in the library. As I’ve posted before, I’m trying to convert GTE into a proper ToolSet, but this conversion has broken some simplifying assumptions I had previously been able to make and caused some subtle bugs to be exposed. This latest one was particularly insidious.

GTE has to allocate several banks of memory in order to buffer various pixel data as it runs. It is very important that this memory be bank aligned since it assumes a one-to-one correspondence between the buffers and the physical SHR screen. I had simply added a call to NewHandle in the GTEStartUp function and thought that would be sufficient. After all, if I’m allocating a multiple of 64K, shouldn’t the memory manager return a bank-aligned memory block? Turns out the answer is “no”. This Tech Note explains the problem in more detail. It really would have been convenient for the Memory Manager to implement a bank-aligned memory flag, or, even better, a power-of-two alignment argument. But, alas, it did not. Apparently the IIgs Loader has to deal with this problem as well, since OMF segments can have a bank-aligned flag set.

I had not been properly dealing with this complication and GTE was using memory buffers that were not bank-aligned. This broke all sorts of critical assumptions within the code and was the root cause of repeated crashes. I cannot implement the solution described in the tech note, because GTE requires multiple, consecutive banks as it relies on the bank-crossing behavior of the indexed addressing modes. Instead, I decided to waste an extra 64K of memory and allocate X + 64K contiguous bytes and round the address up to the nearest bank. This ensures that I will have at least X bytes of memory to work with. Even with all of its features turned on, GTE requires only 384K of memory, so it’s not such a waste. I could always free the allocated memory and reallocate the proper amount using the first bank address, too. I may once I’m assured that the rest of the system is functioning properly.
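The over-allocate-and-round-up trick is easy to verify with a little address arithmetic. The Python below is only a sketch with invented helper names; it models 24-bit addresses as plain integers and does not call the Memory Manager.

```python
BANK_SIZE = 0x10000  # 64K


def bank_align_up(addr):
    """Round addr up to the next bank boundary (unchanged if already aligned)."""
    return (addr + BANK_SIZE - 1) & ~(BANK_SIZE - 1)


def usable_region(block_addr, wanted_bytes):
    """For a block of wanted_bytes + 64K starting at block_addr, return
    (aligned_start, bytes_available_from_aligned_start)."""
    block_end = block_addr + wanted_bytes + BANK_SIZE
    start = bank_align_up(block_addr)
    return start, block_end - start
```

Because the start moves up by less than one bank, the aligned region always holds at least the requested number of bytes.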

That’s all for now!

07.29.07

Another Map Editor update

Posted in Apple ][ at 5:03 pm by site admin

I’ve posted another version of the MapEdit tool. This should be feature complete and I’ve made it a proper Application Bundle under OS X. Here’s a quick summary of changes:

  • Fixed a rendering bug that prevented Fringe tiles from displaying in the Rendered view.
  • Packaged the application with a proper Manifest in the JAR file. You should be able to launch the editor via java -jar MapEdit.jar
  • Packaged the application as a proper Mac OS X bundle. Just double-click MapEdit.app.
  • Expanded the sample map to show how to use the Fringe layers to simulate overlapping objects
  • Added a 16-color version of the tile set for experimentation named smb16.png

I have not done much with GTE itself for the past few days because I intend to use the MapEdit tool to produce all the level data I need and make sure that it can be read in and displayed on a real GS. The whole “eat your own dogfood” idea. Now that the editor is in a feature-complete state, I can turn my attention to the engine.

That’s all for now!

07.25.07

New Map Editor

Posted in General at 2:14 pm by site admin

I just posted an updated version of my MapEdit Java application for creating GTE-compatible tile maps. It supports most of the new features I’ve introduced into the engine over the past couple of weeks and should still be able to load any maps created with the previous version.

The interface is much nicer as you can see in the screen shot below. Grab yourself a copy. The source code is included.

07.23.07

Sprite Dispatch

Posted in General at 5:41 pm by site admin

As a follow-up to my previous post, I thought I’d share how GTE rasterizes sprites and integrates the sprite dispatcher with the fast blitting code. This can serve as an aid to understanding the source code in blitters.asm and sprites.asm.

A single frame is rendered by calling GTEUpdate(int flags). At a high level, this routine performs the following steps

  1. Initialize the blitter data structures
  2. Call _OAMEnqueue() to rasterize each sprite in Object Attribute Memory (OAM).
  3. Enter the blitter loop
  4. For each visible scan line, i
    1. Jump to the subroutine pointed at JTable[i] via the jmp (abs,x) instruction.
    2. Return via a direct-page address using the JML (abs) instruction.
    3. Increment the visible scan line index

When a sprite is rasterized in _OAMEnqueue, it updates the pointer in the JTable and direct page return address. Here are two code fragments. The first represents the execution path for a line without sprites and the second for a line with sprites.

loop: ldy line_index
      ldx JTable,y
      jmp (blitter,x)

blitter_001:
      ldx #animate_tile_addr
      ldy #layer_two_addr
      lda #screen_addr
      tcs
      jml (blitter_entry)

blitter_code_entry:
      pea $1234
      pea $4321
      lda 04,x
      pha
      lda (06),y
      pha
      ....
      jml (blitter_return_addr)

blitter_return:
      ; restore state
      jmp loop

Now, the code for sprites

loop: ldy line_index
      ldx JTable,y
      jmp (blitter,x)

blitter_001:
      lda VBL_counter
      cmp flicker_free,y
      bcs blitter_001
      ldx #animate_tile_addr
      ldy #layer_two_addr
      lda #screen_addr
      tcs
      jml (blitter_entry)

blitter_code_entry:
      pea $1234
      pea $4321
      lda 04,x
      pha
      lda (06),y
      pha
      ....
      jml (blitter_return_addr)

blitter_return:
      ldx line_index
      lda #sprite_stack
      tcs
      jmp (SHead,x)

sprite_headers:
      lda #screen_address
      tcd
      ldy #mask_addr
      ldx #mask_addr2
      jsl sprite_raster

sprite_raster:
      lda 00
      eor #sprite_data
      and screen_mask,y
      and >screen_mask2,x
      eor 00
      sta 00
      rtl
      jmp next_sprite
      ; restore state
      jmp loop

So, as you see, the sprite rasterization code inserts a small code fragment to synchronize with the vertical blanking signal just before blitting the line and forces the blitter to return into the sprite dispatch code before moving on to the next line. Since all these actions are table-driven, the code can be set up outside the inner loop which reduces the latency of drawing sprites and minimizes the chance of flicker.
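The table-driven idea can be modelled in a few lines of Python: every scan line has an entry in a jump table, and enqueueing a sprite swaps the entries for the lines it covers. All names here are invented for illustration; the real JTable holds code addresses rather than function objects.

```python
def plain_line(line, log):
    """Path for a line without sprites: just blit it."""
    log.append(("blit", line))


def sprite_line(line, log):
    """Path for a line with sprites: sync with the VBL, blit, then draw sprites."""
    log.append(("wait_vbl", line))
    log.append(("blit", line))
    log.append(("sprites", line))


def build_jtable(num_lines):
    """Start with every line on the fast path."""
    return [plain_line] * num_lines


def enqueue_sprite(jtable, top, height):
    """Patch the table for each visible line the sprite covers."""
    for line in range(max(top, 0), min(top + height, len(jtable))):
        jtable[line] = sprite_line


def render(jtable):
    """The inner loop does nothing but dispatch through the table."""
    log = []
    for line, handler in enumerate(jtable):
        handler(line, log)
    return log
```

All of the decision making happens in `enqueue_sprite`, before the loop starts, which is the property the post describes.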

I’ll post more information as things fall into place.

Sound Off!

Posted in Apple ][ at 5:12 pm by site admin

Well, K-Fest wrapped up this weekend and, even though I didn’t attend, I was able to follow the events much more closely this year thanks to Ryan Suenaga’s live podcast and the regular updates on c.s.a.2 from Kirk Mitchell.

On the GTE front, I have not had as much time as I had hoped due to other projects needing my attention. I did finish a fair bit of refactoring in the sprite dispatcher, though. I was able to remove nearly a thousand lines of code and make everything faster as well! This probably won’t be a very visible change to the users, but I always like to keep improving the architecture of the tool.

A related highlight is that Chris Shepherd caught the hacking bug and released a great new version of his Oversampler-derived sound library. It can now play one-shot samples in addition to “unlimited” tracks from disk. I’m looking forward to providing Chris with the updated GTE soon, so watch for some interesting new work in the IIgs world.

07.16.07

Back to GTE

Posted in Apple ][ at 6:13 am by site admin

Inspired by Christopher Shephard’s resurrection of his own IIgs, I’ve been putting in some time on GTE to finish the conversion to a Tool Set and polish the code itself. It’s amazing how many more ideas you get by looking at code after an extended break. I’ve been refactoring a large percentage of the code base, finding new ways to optimize the execution and just generally making the code better.

I’m quite happy with the amount of progress made over the past week. The engine is significantly simpler in its implementation, which makes it faster due to less overhead, and the sprite rendering is especially cleaned up. I’ll try to get some screen shots up soon to show off some of the new capabilities.

03.13.06

Moving toward Beta

Posted in Apple ][ at 1:05 pm by site admin

With school and work, I certainly have not had as much time to work toward the Beta release of GTE as I would have liked. Nonetheless, quite a bit of progress has been made, especially considering the number of changes that are happening between releases.

The major pieces of work have centered around

  • Documentation
  • Repackaging as a user tool
  • Streamlining the API

When the Beta is released, it will be accompanied by a full set of toolbox-style documentation as well as a FAQ and sample code. Hopefully this will lower the learning curve to the point where it will be usable by others.

The conversion to a user toolset was primarily motivated by a request that the library be callable from GSoft BASIC. Since GSoft requires that user libraries be implemented as user tool sets, this choice was, in effect, made for me. A secondary benefit of the conversion is that I have the opportunity to integrate the library more tightly with the other toolsets. This should enable Desktop programs to use GTE.

Here is a partial list of what has changed so far:

  • GTE is loaded as a user toolset
  • [Create|Destroy] renamed to [New|Dispose] to follow toolbox naming conventions
  • NewSprite and NewTile take locInfo records to specify the pixel data instead of direct pointers. This matches QuickDraw II conventions
  • Sprites and Tiles are now referenced by Handles instead of Pointers
  • The on-screen field geometry is specified by a Rect. This matches QuickDraw II convention.
  • Only two TileMaps are needed, one for BG0 and one for BG1. Separate maps for Fringe and Mask are eliminated.
  • TileMap elements are now 16 bits, versus 8 bits. The extra space is used to expand the tile range to 512 tiles and incorporate control bits like the SNES graphics subsystem. There are bits to control horizontal flip, vertical flip, priority, and fringe. The BG1 tile map ignores the priority bit.
  • The sprite and tile compilation routines were rewritten in assembly. The compilation times were a significant bottleneck and should now be much faster.
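To give a feel for the 16-bit TileMap elements, here is a Python sketch that packs a 9-bit tile index (512 tiles) together with the four control bits. The bit positions are purely an assumption for illustration, since the list above names the fields but not their layout, and the function names are invented.

```python
TILE_MASK = 0x01FF   # 9 bits, enough for 512 tiles
# Assumed bit positions; the actual GTE layout is not given in the post
F_HFLIP = 0x0200
F_VFLIP = 0x0400
F_PRIORITY = 0x0800
F_FRINGE = 0x1000


def pack_entry(tile, hflip=False, vflip=False, priority=False, fringe=False):
    """Build one 16-bit tile map element from a tile index and control flags."""
    if not 0 <= tile <= TILE_MASK:
        raise ValueError("tile index must fit in 9 bits")
    entry = tile
    if hflip:
        entry |= F_HFLIP
    if vflip:
        entry |= F_VFLIP
    if priority:
        entry |= F_PRIORITY
    if fringe:
        entry |= F_FRINGE
    return entry


def unpack_entry(entry):
    """Split a tile map element back into its fields."""
    return {
        "tile": entry & TILE_MASK,
        "hflip": bool(entry & F_HFLIP),
        "vflip": bool(entry & F_VFLIP),
        "priority": bool(entry & F_PRIORITY),
        "fringe": bool(entry & F_FRINGE),
    }
```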

I’m not making any promises on a release date, but I expect it sooner rather than later.

02.04.06

GTE Alpha Release

Posted in Apple ][ at 1:40 pm by site admin

Well, the first non-development alpha release of GTE is out. The library itself is pretty feature complete, but I’m sure there are still things that will need to be added and the entire library needs constant and consistent attention regarding the ease of use of the API and internal conventions.

I’ve already added a couple of significant features over the past few days. First, there will be a new pair of functions: GTEGetTileRange and GTESetTileRange. These allow the user to get/set a rectangular range of tiles in a particular tile set. GTEGetTileRange is probably most useful for helping with collision detection between sprites and the background. GTESetTileRange can be used to change the playing field dynamically, e.g. breaking blocks. This function takes care of all the details in order to make sure the screen is updated correctly.

Another addition is that the DrawForth (horrible name) sprite type has been removed and is incorporated into the Sprite object automatically. This sprite type was meant to be used to place sprites “behind” the playing field, but that capability is now exposed via the flags field of the OAM array. When the fPriority bit is set, the sprite is placed behind the playing field. This is a much closer match to the NES/SNES programming model.

The biggest obstacle between now and an Alpha 2 release is a hard-to-replicate bug which causes sprites to not be displayed on-screen. It doesn’t cause a crash, which I might prefer since it’d help pin down what is happening, but changing the optimization level of the C code triggers it. Thus, I think it’s probably a stack-based bug, but you never know. I’m not looking forward to tracking this down since it will probably be very difficult to isolate.

Based on some feedback from the comp.sys.apple2 newsgroup, I may try to convert the library to Pascal calling conventions and make it a User ToolSet. That way it can be easily interfaced to GSoft BASIC. I bet a lot more people would use it from BASIC than C. :)

’till next time.

01.31.05

One Feature Down, Three to Go

Posted in Apple ][ at 1:00 pm by site admin

I was finally able to debug the engine’s full-screen dynamic masking feature, which allows sprites to be placed behind foreground graphics and makes for really nice effects. It is also a critical feature for bringing the engine up to parity with the NES’ priority bit behavior. SMB3 uses this to good effect: Mario can fall “behind” the screen by holding down on the big white blocks.

The next phase of work will focus on cleaning up the modularity of the blitter code. Right now the blitter assumes a rectangular screen, and I plan to remove this limitation by making the blitter more table-driven. Also, each line drawn by the blitter has several internal bits set which determine what code path to take. This is currently limited to specializing for even/odd aligned blits and sprite/non-sprite lines. This mechanism will be generalized and exposed to the engine programmer so that different blitter mechanisms can be mixed and matched for a given update. Once these changes are in place, it should be quite simple to create a demo which incorporates a pseudo-three-layer parallax effect.
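
As a rough illustration of what “table-driven” per-line dispatch could look like, here is a small C sketch. It is not the engine’s code: the flag names, the table, and the stand-in routines are all hypothetical, and each stand-in simply returns an identifier for the code path it represents instead of drawing anything.

```c
/* Hypothetical per-line flag bits; the real engine's bits are not documented here. */
enum {
    LINE_ODD        = 1u << 0,  /* line starts on an odd byte boundary */
    LINE_SPRITE     = 1u << 1,  /* line intersects at least one sprite */
    LINE_PATH_COUNT = 4         /* number of flag combinations */
};

typedef int (*LineBlitter)(int line);

/* Stand-ins for specialized code paths; each returns its own path id. */
static int blit_even_plain(int line)  { (void)line; return 0; }
static int blit_odd_plain(int line)   { (void)line; return 1; }
static int blit_even_sprite(int line) { (void)line; return 2; }
static int blit_odd_sprite(int line)  { (void)line; return 3; }

/* One table entry per flag combination, indexed directly by the flag bits. */
static const LineBlitter blitters[LINE_PATH_COUNT] = {
    blit_even_plain,   /* 0: even, no sprite */
    blit_odd_plain,    /* LINE_ODD */
    blit_even_sprite,  /* LINE_SPRITE */
    blit_odd_sprite    /* LINE_ODD | LINE_SPRITE */
};

/* Select the code path for one line purely by table lookup. */
static int blit_line(unsigned flags, int line) {
    return blitters[flags & (LINE_PATH_COUNT - 1)](line);
}
```

Adding a new per-line mechanism then means adding a flag bit and more table entries rather than another branch inside the blitter.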

The final feature to integrate is the idea described in a previous post, which will provide a full-screen, independently scrolling background. Since this is all blitter-related work, it will be done concurrently.

01.27.05

New Links and Kalah

Posted in General, Machine Learning at 12:59 pm by site admin

I’ve updated the links on the right panel from the default ones that came with the WordPress installation to something a little more personal.

I haven’t had much time to work on IIgs-related items because I’ve gotten distracted writing a Kalah AI for the class I’m TAing this quarter. Kalah is a simple board game with countless variations; a Google search will turn up some of them. Anyway, I never really had much of a chance to work on “traditional” AI search problems before, and I’ve gotten really interested in making a good AI for this project.

All the information I’ve used has come from one of three sources:

My present AI uses iterative deepening search around MTD(f) which, in turn, uses a fail-soft alpha-beta pruning version of the negamax search algorithm with successor node ordering. While this sounds very complicated, my AI is only about 150 lines of Java code and, running on a 1 GHz G4 PowerBook, routinely reaches 10 ply. I understand that a state-of-the-art AI should be getting down to around 30 ply for this game, so there’s still some work to be done. The biggest shortcoming of the current AI is the lack of a transposition table. This really slows down MTD(f), since it makes multiple alpha-beta searches and is forced to reevaluate many positions. I’ll be quite happy if I can reach ply counts in the mid-teens and keep the code size around 250 lines.
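
For readers who have not seen these pieces together, here is a minimal C sketch of the same stack: fail-soft negamax with alpha-beta pruning, wrapped in MTD(f), wrapped in iterative deepening. It is not the Java AI described above. It runs on a tiny hard-coded game tree rather than on Kalah, omits successor ordering, and (like the AI above) has no transposition table. The node type, the toy tree, and all names are assumptions made for the example.

```c
#define INF_SCORE 1000000
#define MAX_CHILDREN 2

/* A toy game-tree node. `value` is the static score from the point of
   view of the side to move at that node. */
typedef struct Node {
    int value;
    int nchildren;
    const struct Node *child[MAX_CHILDREN];
} Node;

/* Fail-soft negamax: the result may fall outside (alpha, beta), in which
   case it is only a bound on the true score. */
static int negamax(const Node *n, int depth, int alpha, int beta) {
    if (depth == 0 || n->nchildren == 0)
        return n->value;
    int best = -INF_SCORE;
    for (int i = 0; i < n->nchildren; i++) {
        int score = -negamax(n->child[i], depth - 1, -beta, -alpha);
        if (score > best) best = score;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;          /* cutoff */
    }
    return best;
}

/* MTD(f): converge on the minimax value with repeated null-window searches. */
static int mtdf(const Node *root, int depth, int guess) {
    int lower = -INF_SCORE, upper = INF_SCORE;
    while (lower < upper) {
        int beta = (guess == lower) ? guess + 1 : guess;
        guess = negamax(root, depth, beta - 1, beta);
        if (guess < beta) upper = guess;   /* failed low: new upper bound */
        else              lower = guess;   /* failed high: new lower bound */
    }
    return guess;
}

/* Iterative deepening: each depth seeds the first guess for the next. */
static int iterative_deepening(const Node *root, int max_depth) {
    int guess = 0;
    for (int d = 1; d <= max_depth; d++)
        guess = mtdf(root, d, guess);
    return guess;
}

/* Toy tree: the root has two replies, each leading to two leaves. */
static const Node leaf3 = {3, 0, {0}}, leaf5 = {5, 0, {0}};
static const Node leaf2 = {2, 0, {0}}, leaf9 = {9, 0, {0}};
static const Node reply_a = {0, 2, {&leaf3, &leaf5}};
static const Node reply_b = {0, 2, {&leaf2, &leaf9}};
static const Node demo_root = {0, 2, {&reply_a, &reply_b}};
```

On the toy tree the opponent would answer the first move by steering to the leaf worth 3 and the second by steering to the leaf worth 2, so a two-ply search from `demo_root` settles on 3. Every null-window pass re-walks the tree from scratch, which is the cost a transposition table would remove.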

My intent with this project is not to produce a world-class AI, but to make something that performs well in a small amount of clearly written code. The fun part of this, for me, is to see how clean and simple the code for an AI program really can be.

One last thing worth mentioning, the AI I wrote currently beats me roughly 50-22 on a Kalah(6,6) board. I suck.

01.24.05

Parallax Redux

Posted in Apple ][ at 8:45 pm by site admin

So now that I’ve been back working on the engine for a while, I’ve had a chance to rethink and revisit some issues that have been left for far too long. One of the most limiting aspects of GTE is that the BG1 layer, which lives in Bank $00, is severely memory limited and, since the current scheme requires a duplicate copy of the graphics data for proper scrolling, there is no way to provide a full-screen BG1.

This limitation comes from how BG1 is exposed in the blitter. At the lowest level, the blitter uses either PEA instructions to push data from BG0, or PEI instructions to push data from BG1 onto the screen. However, since the PEI addresses are shifted when the screen scrolls, the data must be duplicated so that the DP register can be moved to compensate for the scrolling. This introduces complexity in the code and a dependency between the positioning of BG0 and BG1.

Now, I think I’ve come up with a solution which removes the requirement to keep an extra copy of BG1 data around and removes the dependency between the layers. The core change is to switch the 3-byte BG1 code sequence from

    PEI 00
    NOP
    PEI 02
    NOP
    .
    .

to the following

    LDA (00),y
    PHA
    LDA (02),y
    PHA
    .
    .

This fixes the problem because the proper offsets can now be stored in a double-length array in Bank $00, which takes up only about 336 bytes, while the Y register holds the base address of each line. The blitter code gets a bit more complicated since another register needs to be set, but the advantage of having a full-screen BG1 is worth it.
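
The addressing trick can be simulated in a few lines of C. This is a sketch under stated assumptions, not engine code: it assumes a line is 84 words wide (so 84 two-byte offsets, doubled, gives the 336 bytes mentioned above), that scrolling is handled by sliding the start of the lookup window within the doubled table, and that `bank` points at a full 64K bank image. The function and variable names are hypothetical.

```c
#include <stdint.h>

#define LINE_WORDS 84   /* assumed width of one line, in 16-bit words */

/* Doubled offset table: 2 * 84 entries * 2 bytes = 336 bytes. */
static uint16_t offsets[2 * LINE_WORDS];

/* Entry i holds the byte offset of word (i mod 84) within a line, so any
   window of 84 consecutive entries wraps around the line correctly. */
static void init_offsets(void) {
    for (int i = 0; i < 2 * LINE_WORDS; i++)
        offsets[i] = (uint16_t)((i % LINE_WORDS) * 2);
}

/* Emulates "LDA (2*k),y" with the table window shifted by `scroll` words
   (0..83): fetch the pointer from the table, add the line base held in Y,
   and read a little-endian 16-bit word from a 64K bank image. */
static uint16_t fetch_word(const uint8_t *bank, uint16_t line_base,
                           int scroll, int k) {
    uint16_t addr = (uint16_t)(line_base + offsets[scroll + k]);
    return (uint16_t)(bank[addr] | (bank[(uint16_t)(addr + 1)] << 8));
}
```

Only the small offset table is duplicated; the graphics data itself is stored once, and changing `line_base` selects a different line without touching the table.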

Look for this feature to be added to the next release of GTE!!
