Virtual machine / assembler design

Tue, 07 Dec 2010, 09:14
HoboBen	With a low-level virtual machine and assembler (that you might hypothetically have created), how do you decide whether something should be an assembly/machine-code instruction or be implemented as a host-API function? Should strcat, strcpy and malloc be assembly instructions like mov, jmp and cmp? My thinking is that in a multi-threaded VM, individual instructions should be "atomic" to the VM through whatever magic. strcat and strcpy could be made atomic in this way, but that just seems silly. On the other hand, compare:

    STRCPY dest, src

vs

    PUSH dest
    PUSH src
    HAPI strcpy

A strcpy instruction means that there's less virtual code (so it's faster) and less code to write. Hopefully that made some sense.
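To make the two encodings concrete, here is a minimal sketch in C of an interpreter that supports both a built-in STRCPY instruction and a generic HAPI call. Everything in it is an assumption made for illustration: the opcode names, the `VM` struct, the flat byte memory and the `hapi_strcpy` helper are hypothetical and not taken from HoboBen's actual VM. Operand offsets are not bounds-checked, so treat it as a sketch rather than a safe implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes and host-function ids, invented for this sketch. */
enum { OP_PUSH, OP_STRCPY, OP_HAPI, OP_HALT };
enum { HAPI_STRCPY };

#define MEM_SIZE   64
#define STACK_SIZE 8

typedef struct {
    char mem[MEM_SIZE];      /* flat byte memory; operands are offsets into it */
    int  stack[STACK_SIZE];
    int  sp;                 /* number of values currently on the stack */
    int  steps;              /* dispatched instructions, including HALT */
} VM;

/* Host-side strcpy: pops src, then dest, and copies the string. */
void hapi_strcpy(VM *vm) {
    assert(vm->sp >= 2);
    int src  = vm->stack[--vm->sp];
    int dest = vm->stack[--vm->sp];
    strcpy(vm->mem + dest, vm->mem + src);
}

/* Runs bytecode until OP_HALT is reached. */
void vm_run(VM *vm, const int *code) {
    size_t pc = 0;
    for (;;) {
        vm->steps++;
        switch (code[pc++]) {
        case OP_PUSH:                    /* PUSH value */
            assert(vm->sp < STACK_SIZE);
            vm->stack[vm->sp++] = code[pc++];
            break;
        case OP_STRCPY: {                /* STRCPY dest, src */
            int dest = code[pc++];
            int src  = code[pc++];
            strcpy(vm->mem + dest, vm->mem + src);
            break;
        }
        case OP_HAPI:                    /* HAPI function-id */
            switch (code[pc++]) {
            case HAPI_STRCPY: hapi_strcpy(vm); break;
            default: assert(!"unknown host function");
            }
            break;
        case OP_HALT:
            return;
        default:
            assert(!"unknown opcode");
        }
    }
}
```

With these definitions, the program `{ OP_STRCPY, 0, 32, OP_HALT }` dispatches two instructions, while `{ OP_PUSH, 0, OP_PUSH, 32, OP_HAPI, HAPI_STRCPY, OP_HALT }` dispatches four to do the same copy. That is the "less virtual code" trade-off from the post; whether it is measurably faster would need benchmarking.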
Tue, 07 Dec 2010, 10:48
JL235	HoboBen: "My thinking is that in a multi-threaded VM, individual instructions should be "atomic" to the VM through whatever magic. strcat and strcpy could be made atomic in this way, but that just seems silly."

I don't agree at all. This could seriously impact performance, especially since multi-threaded applications should ideally work on their own data as much as possible (to avoid locks, as they serialize code). The majority of the instructions won't need to be atomic. In an ideal multi-threaded world it should not matter whether your instructions are atomic or not; therefore make them non-atomic. Good multi-threaded applications have as little atomic code as possible, because atomicity is only needed when you are sharing state or have side effects.

Moving on, there are two parts to your question. First, it depends on what your VM does. For example, in Java I cannot just malloc a block of data. I can create arrays and objects, but not random chunks of memory. When I interact with anything I've created, the JVM ensures that it is fully type safe. It also guarantees that all my pointers either point to a valid object or are null. So will people be able to point to any memory in your VM, and will it guarantee type safety (this check can also be skipped at runtime if you pre-check your instructions)?

Next, I would not implement any of these as opcodes, in order to keep the VM as small, tight and simple as possible. Instead I would add an 'invoke' instruction which allows you to invoke an external function or a method on an object. This allows print and all the other common things to be implemented through one instruction. How I would implement that is by having your compiler embed the information needed to call that function and leave the 'invoke' instruction only partially done. At runtime I would then find the function, store a function pointer to it and swap out the instruction for a working 'invoke' instruction which just calls the function directly.

This could be done by leaving the invoke instruction's target empty and filling it in once you've looked up the function. This avoids the expense of always looking up the function when the invoke instruction is hit, and replaces it with a direct and verified pointer to what you're calling. Of course, when I say pointer this could also be a reference if you're in a higher-level language. The function could also be replaced with a lambda, closure or anything else executable.
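The lazily resolved 'invoke' described above might be sketched in C like this. It is only one possible reading of the idea: the `Instr` layout, the two opcode names, the registry `host_table` and the example host functions `host_double` and `host_negate` are all hypothetical, and a real VM would need to handle a failed lookup more gracefully than an assert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*HostFn)(int arg);

/* Hypothetical host functions exposed to the VM. */
int host_double(int x) { return x * 2; }
int host_negate(int x) { return -x; }

/* Name -> function registry searched on the slow path. */
typedef struct { const char *name; HostFn fn; } HostEntry;
static const HostEntry host_table[] = {
    { "double", host_double },
    { "negate", host_negate },
};

int lookups = 0;  /* counts slow-path lookups, to show they happen once */

HostFn host_lookup(const char *name) {
    lookups++;
    for (size_t i = 0; i < sizeof host_table / sizeof host_table[0]; i++)
        if (strcmp(host_table[i].name, name) == 0)
            return host_table[i].fn;
    return NULL;
}

enum { OP_INVOKE_UNRESOLVED, OP_INVOKE_DIRECT };

typedef struct {
    int         opcode;
    const char *name;    /* embedded by the compiler */
    HostFn      target;  /* empty until the first execution */
} Instr;

/* Executes one invoke instruction, patching it in place on first use. */
int exec_invoke(Instr *ins, int arg) {
    if (ins->opcode == OP_INVOKE_UNRESOLVED) {
        HostFn fn = host_lookup(ins->name);
        assert(fn != NULL);               /* sketch: no real error handling */
        ins->target = fn;
        ins->opcode = OP_INVOKE_DIRECT;   /* later hits skip the lookup */
    }
    return ins->target(arg);
}
```

Calling `exec_invoke` twice on the same instruction performs the name lookup only on the first call; after that the instruction holds a direct function pointer.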
Tue, 07 Dec 2010, 11:37
waroffice	*whooosh*
Tue, 07 Dec 2010, 14:17
HoboBen	I think (although I should definitely check) that function pointers are far, far slower than a switch block over a small number of instructions. There are computed gotos, which would be even better (each instruction would be an index into an array of goto addresses!), but that's a non-portable GNU C extension. Also, I suspect compiler optimizations would make the two almost equivalent. Function pointers are ideal for registering host-API functions, though. I'm leaning against having STRCAT, etc. defined as built-in instructions simply because real-world CPUs don't! I know it's a virtual machine (and I've seen others that do), but it also makes less work for the assembler. But thanks, I'll have a muse. I think you're right on the atomicity part; it sounds idiotic now.
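For comparison, here is a small sketch of the two dispatch styles mentioned in this post: a portable switch loop and a computed-goto loop using the GNU C "labels as values" extension. The three-instruction accumulator machine and the opcode names are made up for the example, and the computed-goto version is only compiled when `__GNUC__` is defined, falling back to the switch version otherwise. It makes no claim about relative speed; that would have to be measured.

```c
#include <assert.h>

/* Hypothetical opcodes for a one-register accumulator machine. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Portable dispatch: one switch inside a loop. */
int run_switch(const unsigned char *code) {
    int acc = 0;
    for (;;) {
        switch (*code++) {
        case OP_INC:    acc++;    break;
        case OP_DEC:    acc--;    break;
        case OP_DOUBLE: acc *= 2; break;
        case OP_HALT:   return acc;
        default:        assert(!"unknown opcode"); return acc;
        }
    }
}

#if defined(__GNUC__)
/* GNU C only: each opcode indexes an array of label addresses. */
int run_goto(const unsigned char *code) {
    static void *const labels[OP_COUNT] = {
        [OP_INC]    = &&do_inc,
        [OP_DEC]    = &&do_dec,
        [OP_DOUBLE] = &&do_double,
        [OP_HALT]   = &&do_halt,
    };
    int acc = 0;
#define DISPATCH() do { assert(*code < OP_COUNT); goto *labels[*code++]; } while (0)
    DISPATCH();
do_inc:    acc++;    DISPATCH();
do_dec:    acc--;    DISPATCH();
do_double: acc *= 2; DISPATCH();
do_halt:   return acc;
#undef DISPATCH
}
#else
/* Fallback for compilers without the extension. */
int run_goto(const unsigned char *code) { return run_switch(code); }
#endif
```

Both functions should return the same accumulator value for the same program, which makes it easy to swap one for the other when benchmarking.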
Tue, 07 Dec 2010, 22:01
mindstorm8191	This might not be the route you wanted to go, but I had an idea for an assembler with a different approach: the plan was to keep it simple yet capable. Graphics and keyboard operations were handled by a device command; memory was managed like an array (or a Blitz bank), and could be expanded as needed. I didn't get too awfully far with it though... And I don't really know much about writing a virtual machine either!
Sat, 11 Dec 2010, 07:27
Scherererer	This is the classic CISC (Complex Instruction Set Computing) vs. RISC (Reduced Instruction Set Computing) question. And to answer it: modern systems aim to be more RISC-like. The idea is to compute with only very simple instructions and let the processor execute them at very, very fast speeds; modern processors can also execute them out of order. This is opposed to CISC machines which, if we take the VAX machines as an example, had crazy instructions like polynomial evaluate! Now, as a disclaimer, if you look at an instruction set like x86, you will find string instructions present. However, it should also be noted that x86 has existed for a millennium, so many of the instructions are just left in there for legacy purposes, to keep binary compatibility with old software. It has become a monster.
Sat, 11 Dec 2010, 10:51
JL235	These days RISC and CISC designs are blurred to the point that there isn't a huge amount of difference any more. Modern CISC CPUs break down instructions and are pretty much RISC chips underneath. Lots of other features have also been ported across. One of the original reasons RISC chips came about is that compilers weren't taking advantage of the complex instructions in the CISC chips of the time. So in relation to the VM, you need to remember that using a more complex instruction set will only improve speed if your compiler takes advantage of those extra instructions. If it will, then a more complex design would probably be best. If it won't, then a simpler design would probably be best. Depending on the instructions you're adding, this could also lead to you having a more complex compiler. When I was building a VM a few years ago I was toying around with the idea of implementing instructions as objects. Then I could extend the instruction set on the fly as I wanted without affecting the VM (you could even implement your own and add them to a folder from which they were loaded on start-up). You could maybe toy around with a similar idea for your VM. Maybe not implemented as objects, but essentially being able to automatically add in instructions when your VM is compiled. This way you could add/remove instructions with ease, and it would make it simpler to experiment with finding the right balance.
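One way to get "instructions added automatically when the VM is compiled" in plain C is an X-macro list: a single macro names every instruction, and the opcode enum, the handler table and the assembler's mnemonic lookup are all generated from it. This is only a sketch of that general technique, not JL235's object-based design; the stack machine, the `INSTRUCTION_LIST` macro and the handler names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 16
typedef struct { int stack[STACK_SIZE]; int sp; } VM;

/* Handlers for a hypothetical stack machine. */
void op_add(VM *vm) { assert(vm->sp >= 2); int b = vm->stack[--vm->sp]; vm->stack[vm->sp - 1] += b; }
void op_mul(VM *vm) { assert(vm->sp >= 2); int b = vm->stack[--vm->sp]; vm->stack[vm->sp - 1] *= b; }
void op_neg(VM *vm) { assert(vm->sp >= 1); vm->stack[vm->sp - 1] = -vm->stack[vm->sp - 1]; }

/* The single place to add or remove an instruction:
   X(enum id, assembler mnemonic, handler function). */
#define INSTRUCTION_LIST(X) \
    X(OP_ADD, "add", op_add) \
    X(OP_MUL, "mul", op_mul) \
    X(OP_NEG, "neg", op_neg)

/* Generate the opcode enum from the list. */
#define AS_ENUM(id, name, fn) id,
enum { INSTRUCTION_LIST(AS_ENUM) OP_COUNT };

/* Generate the handler table from the same list. */
typedef struct { const char *name; void (*fn)(VM *); } InstrDef;
#define AS_ENTRY(id, name, fn) [id] = { name, fn },
static const InstrDef instr_table[OP_COUNT] = { INSTRUCTION_LIST(AS_ENTRY) };

void vm_push(VM *vm, int v) {
    assert(vm->sp < STACK_SIZE);
    vm->stack[vm->sp++] = v;
}

/* Executes one opcode by indexing the generated table. */
void vm_exec(VM *vm, int opcode) {
    assert(opcode >= 0 && opcode < OP_COUNT);
    instr_table[opcode].fn(vm);
}

/* Assembler side: maps a mnemonic to its opcode, or -1 if unknown. */
int opcode_from_name(const char *name) {
    for (int i = 0; i < OP_COUNT; i++)
        if (strcmp(instr_table[i].name, name) == 0)
            return i;
    return -1;
}
```

Adding an instruction then means writing one handler and one extra `X(...)` line; the enum, the dispatch table and the assembler lookup pick it up on the next compile, which keeps experiments with the instruction set cheap.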
Sat, 11 Dec 2010, 12:01
HoboBen	Those are great points, thanks.