Question:

source program vs object program

by Guest4458  |  12 years, 7 month(s) ago


 Tags: Object, Program, source, vs


1 ANSWER

  1. paafamily
    The notion of source and object code as disjoint classes of computer
    code is a false dichotomy common among non-programmers.  The general
    public's understanding is that computer programs are written as
    "source code" that is human readable and not immediately executable by
    the machine.  Source code is supposed to contain meaningful variable
    names and helpful comments that are intended only to be read by
    humans.  A piece of software called a "compiler" must convert source
    code to object code before the program described in the source code
    can be executed.  Object code cannot be read by people; it is a
    sequence of bytes that encode specific machine instructions that will
    be executed by the microprocessor when it runs (executes) the program.

    The above statements are not exactly false; they are in fact the usual
    way of explaining to laymen how a computer works.  However, any
    attempt to draw legal distinctions between "source" code and "object"
    code, or between code that is executable and code that is not, will
    immediately encounter difficulties, because these dichotomies are only
    a convenient fiction.  Computer scientists don't view things this way
    at all.  Here are some of the problems:

    1. "Source" and "object" are not well-defined classes of code.  They
    are actually relative terms.  Given a device for transforming programs
    from one form to another, source code is what goes into the device,
    and object code (or "target" code) is what comes out.  The target code
    of one device is frequently the source code of another.  For example,
    both early C++ compilers and the Kyoto Common Lisp compiler produced C
    code as their output.  C++ (or Common Lisp) was the source language; C
    was the target language.  This output was then run through a C
    compiler to produce symbolic assembler code as output.  This then had
    to be run through yet another program, the "assembler", to produce
    binary machine code that could be directly executed by a processor.
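    The chain described above can be sketched in C. The toy "compiler"
    below is a hypothetical example, not any of the compilers named in
    this answer: its source language is integer arithmetic such as
    "2*(3+4)", and its target language is assembler text for an imaginary
    stack machine (the mnemonics PUSH, ADD, SUB, MUL and DIV are invented
    for illustration). Its output would in turn be the source code for a
    separate assembler, just as the C produced by an early C++ compiler
    was the source code for a C compiler.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* State of the toy compiler: where we are in the source text and where
   the target (assembler) text is being written. */
typedef struct {
    const char *p;   /* current position in the source text */
    char *out;       /* buffer receiving the target code */
    size_t cap;      /* capacity of out, must be at least 1 */
    size_t len;      /* characters written so far */
} Compiler;

/* Append one line of target code, clamping if the buffer is full. */
static void emit(Compiler *c, const char *line) {
    size_t room = c->cap - c->len;
    int n = snprintf(c->out + c->len, room, "%s\n", line);
    if (n > 0)
        c->len += ((size_t)n < room) ? (size_t)n : room - 1;
}

static void skip_spaces(Compiler *c) {
    while (isspace((unsigned char)*c->p))
        c->p++;
}

static void expr(Compiler *c);

/* factor := number | '(' expr ')' */
static void factor(Compiler *c) {
    skip_spaces(c);
    if (*c->p == '(') {
        c->p++;
        expr(c);
        skip_spaces(c);
        if (*c->p == ')')
            c->p++;          /* a real compiler would report a missing ')' */
    } else {
        char line[32];
        long v = 0;
        while (isdigit((unsigned char)*c->p))
            v = v * 10 + (*c->p++ - '0');
        snprintf(line, sizeof line, "PUSH %ld", v);
        emit(c, line);
    }
}

/* term := factor (('*' | '/') factor)* */
static void term(Compiler *c) {
    factor(c);
    for (;;) {
        skip_spaces(c);
        char op = *c->p;
        if (op != '*' && op != '/')
            return;
        c->p++;
        factor(c);
        emit(c, op == '*' ? "MUL" : "DIV");
    }
}

/* expr := term (('+' | '-') term)* */
static void expr(Compiler *c) {
    term(c);
    for (;;) {
        skip_spaces(c);
        char op = *c->p;
        if (op != '+' && op != '-')
            return;
        c->p++;
        term(c);
        emit(c, op == '+' ? "ADD" : "SUB");
    }
}

/* Translate src (source code) into assembler text (target code). */
char *compile(const char *src, char *out, size_t cap) {
    Compiler c = { src, out, cap, 0 };
    out[0] = '\0';
    expr(&c);
    return out;
}
```

    For example, compile("2*(3+4)", buf, sizeof buf) produces the five
    lines PUSH 2, PUSH 3, PUSH 4, ADD, MUL. Error handling is
    deliberately minimal.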

    In summary, programs typically go through a series of transformations
    from higher level to lower level languages.  The assembler language
    code that a C compiler produces as "object" code is source code for
    the assembler.  Today, the GNU C compiler (known as gcc) will deliver
    assembler language code instead of binary if the user requests it by
    specifying the -S switch.

    2. Even binary machine code is perfectly readable by humans.  It was,
    after all, designed by humans.  It may be tedious to read, but this
    can be helped somewhat by using a program called a disassembler to
    translate the raw binary instructions back into symbolic form.  For
    example, the Pentium instruction "add 7 to register AL" is written
    0000010000000111 in the machine's binary language; it is written in
    symbolic form as "ADD AL,7".  (Reference: Intel Architecture
    Developer's Manual, 1997 edition, volume 2, page 3-17.)  Converting
    between these two forms is trivial.
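    To show how mechanical that conversion is, here is a minimal C sketch
    that handles only this one instruction form, in both directions. It
    assumes the encoding given above (the opcode byte 0x04 followed by
    the 8-bit constant); the function names are hypothetical, and a real
    disassembler covers hundreds of opcodes and addressing modes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* In the IA-32 instruction set, opcode 0x04 means "ADD AL, imm8":
   the byte that follows is the 8-bit constant to add. */
#define OP_ADD_AL_IMM8 0x04

/* Symbolic -> binary: assemble "ADD AL,<n>" (0..255) into two bytes.
   Returns the number of bytes written, or 0 if the text is not recognized. */
size_t assemble(const char *text, unsigned char out[2]) {
    const char *prefix = "ADD AL,";
    size_t plen = strlen(prefix);
    char *end;
    unsigned long imm;

    if (strncmp(text, prefix, plen) != 0)
        return 0;
    imm = strtoul(text + plen, &end, 10);
    if (end == text + plen || *end != '\0' || imm > 255)
        return 0;
    out[0] = OP_ADD_AL_IMM8;
    out[1] = (unsigned char)imm;
    return 2;
}

/* Binary -> symbolic: a one-instruction "disassembler".
   Returns the number of bytes consumed, or 0 if the opcode is unknown. */
size_t disassemble(const unsigned char *code, size_t len, char *text, size_t size) {
    if (len < 2 || code[0] != OP_ADD_AL_IMM8)
        return 0;
    snprintf(text, size, "ADD AL,%u", (unsigned)code[1]);
    return 2;
}

/* Write len bytes as '0'/'1' characters, most significant bit first.
   bits must have room for 8 * len + 1 characters. */
void to_bits(const unsigned char *code, size_t len, char *bits) {
    size_t k = 0;
    for (size_t i = 0; i < len; i++)
        for (int b = 7; b >= 0; b--)
            bits[k++] = ((code[i] >> b) & 1) ? '1' : '0';
    bits[k] = '\0';
}
```

    Assembling "ADD AL,7" yields the bytes 04 07, which to_bits renders
    as 0000010000000111, and disassembling those bytes gives back
    "ADD AL,7".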

    The instant litigation seems to be a result of a person having read
    and understood a piece of machine code.  A document signed by 1) a
    Norwegian minor, Jon Johansen, 2) an "anonymous German cracker" from
    the group MoRE (Masters of Reverse Engineering), who is designated the
    group's "top coder/disassembler", and 3) an individual referred to
    only as "[dEZZY/DoD]", suggests that the original CSS "crack" was
    performed by disassembling a software DVD player and rewriting the
    descrambling algorithm in C.  The "anonymous C source code" in my
    Gallery of CSS Descramblers (http://www.cs.cmu.edu/~dst/DeCSS/Gallery)
    may be the fruit of that effort.

    3. Human-readable programs do not have to be compiled in order to be
    executed.  Compilation merely transforms the program into a form that
    can be executed more efficiently.  But a piece of software called an
    "interpreter" can execute source code directly, without compiling it.
    Programs normally run more slowly when they are being executed by an
    interpreter than when they are first compiled into machine
    instructions that the processor can execute directly.  But they do
    run.  And interpreters have some advantages: they are easier to write
    than compilers, and they are better suited for debugging purposes.

    An interpreter for the C programming language, called EiC, is
    available for free on the web at http://www.anarchos.com/eic/.  There
    are other C interpreters as well.

    Some languages, such as MATLAB, are normally executed in interpreted
    mode.  Today, most laser printers have a computer inside running a
    PostScript interpreter.  When a document prepared in a word processor
    such as Microsoft Word is to be printed, it is first converted to a
    PostScript program.  The PostScript interpreter inside the printer
    then executes the program to construct the page image that is sent to
    the printing engine.  PostScript programs are easily human readable;
    they do not even require a disassembler.  Yet they are executable.
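    The essential property of an interpreter can be shown with a small
    sketch: the C function below reads source text and carries out each
    token as it reaches it, with no compilation step. It implements only
    a handful of PostScript-style integer operators (add, sub, mul, dup,
    exch) and is a toy, not a PostScript implementation; the name
    interpret is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define STACK_MAX 64

/* Execute src directly, one token at a time.  Integers are pushed onto
   an operand stack; "add", "sub", "mul", "dup" and "exch" act on it as
   the PostScript operators of the same names do for integers.  On success
   the top of the stack is stored in *result and 1 is returned; 0 is
   returned for an unknown or over-long token or a stack over/underflow. */
int interpret(const char *src, long *result) {
    long stack[STACK_MAX];
    int sp = 0;                        /* number of values on the stack */
    char tok[32];

    for (;;) {
        size_t n = 0;
        while (isspace((unsigned char)*src))
            src++;
        if (*src == '\0')
            break;
        while (*src != '\0' && !isspace((unsigned char)*src)) {
            if (n == sizeof tok - 1)
                return 0;              /* token too long for this sketch */
            tok[n++] = *src++;
        }
        tok[n] = '\0';

        if (isdigit((unsigned char)tok[0]) ||
            (tok[0] == '-' && isdigit((unsigned char)tok[1]))) {
            char *end;
            long v = strtol(tok, &end, 10);
            if (*end != '\0' || sp == STACK_MAX)
                return 0;
            stack[sp++] = v;
        } else if (strcmp(tok, "dup") == 0) {
            if (sp < 1 || sp == STACK_MAX)
                return 0;
            stack[sp] = stack[sp - 1];
            sp++;
        } else {
            long a, b;
            if (sp < 2)
                return 0;
            b = stack[--sp];
            a = stack[--sp];
            if (strcmp(tok, "add") == 0)
                stack[sp++] = a + b;
            else if (strcmp(tok, "sub") == 0)
                stack[sp++] = a - b;
            else if (strcmp(tok, "mul") == 0)
                stack[sp++] = a * b;
            else if (strcmp(tok, "exch") == 0) {
                stack[sp++] = b;
                stack[sp++] = a;
            } else
                return 0;              /* operator not supported here */
        }
    }
    if (sp < 1)
        return 0;
    *result = stack[sp - 1];
    return 1;
}
```

    Calling interpret("3 4 add 2 mul", &r) stores 14 in r: the program
    text itself is what runs.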

    4. Binary "machine code" isn't necessarily directly executable by the
    computer's microprocessor.  Languages such as Java and Perl compile to
    what is called "byte code", or "virtual machine code".  This is binary
    code for an idealized, imaginary processor that does not correspond to
    any particular commercial processor architecture, such as the
    Pentium, SPARC, or PowerPC architectures.  The virtual machine code
    must then be executed by a piece of software called a "byte code
    interpreter" that simulates the imaginary processor.  The advantage of
    this approach is that it allows one to quickly produce Java or Perl
    implementations for new computer architectures, because the same
    compiler can be used.  Only a new byte code interpreter is required.
    Byte code interpreters are easier to write than compilers.
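    A byte code interpreter is essentially a loop that fetches an opcode,
    decodes it, and performs the corresponding action on a simulated
    processor. The sketch below assumes an invented instruction set (it
    is not Java or Perl byte code). Only this loop would need to be
    recompiled to run programs in this byte code on a new processor.

```c
#include <stddef.h>

/* Opcodes of an imaginary stack machine.  The numbering is arbitrary and
   is not taken from the JVM or any real processor. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_SUB = 3, OP_MUL = 4 };

#define STACK_MAX 64

/* Fetch, decode and execute instructions until OP_HALT, then return the
   value on top of the stack.  OP_PUSH is followed by one byte holding an
   unsigned 8-bit constant.  The code must end with OP_HALT.  Returns 0 on
   an unknown opcode or a stack error (a real VM would report these). */
long run(const unsigned char *code) {
    long stack[STACK_MAX];
    int sp = 0;
    size_t pc = 0;                      /* index of the next byte to fetch */

    for (;;) {
        unsigned char op = code[pc++];  /* fetch */
        switch (op) {                   /* decode and execute */
        case OP_PUSH:
            if (sp == STACK_MAX)
                return 0;
            stack[sp++] = code[pc++];
            break;
        case OP_ADD:
        case OP_SUB:
        case OP_MUL: {
            long a, b;
            if (sp < 2)
                return 0;
            b = stack[--sp];
            a = stack[sp - 1];
            stack[sp - 1] = (op == OP_ADD) ? a + b
                          : (op == OP_SUB) ? a - b
                          : a * b;
            break;
        }
        case OP_HALT:
            return sp > 0 ? stack[sp - 1] : 0;
        default:
            return 0;
        }
    }
}
```

    The byte sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT computes
    (2+3)*4 = 20 without ever being turned into native machine code.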

    Another approach taken by some implementations is to compile a piece
    of Java byte code into native machine instructions (e.g., Pentium
    instructions, SPARC instructions, or PowerPC instructions) when that
    piece begins to execute.  In this scheme, the Java byte code becomes
    the "source code" and the native mode instructions are the target
    code.
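    Generating real native instructions at run time is processor
    specific, so the sketch below substitutes a portable stand-in: it
    translates byte code for an invented stack machine into an array of
    pre-decoded instructions (C function pointers plus operands) before
    running it, and checks stack depth during translation. It is not a
    JIT compiler, but it shows the same division of labor: the byte code
    is the input ("source") of a translation step, and the translated
    form is what actually runs. All names are hypothetical.

```c
#include <stddef.h>

enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };  /* invented opcodes */

#define STACK_MAX 64

typedef struct { long stack[STACK_MAX]; int sp; } Machine;

/* One pre-decoded instruction: the routine to call and its operand. */
typedef struct { void (*fn)(Machine *, long); long arg; } Insn;

static void do_push(Machine *m, long v) { m->stack[m->sp++] = v; }

static void do_add(Machine *m, long unused) {
    (void)unused;
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

static void do_mul(Machine *m, long unused) {
    (void)unused;
    m->sp--;
    m->stack[m->sp - 1] *= m->stack[m->sp];
}

/* Translation step: byte code (source) -> pre-decoded instructions
   (target).  Returns the number of instructions produced, or 0 if the
   byte code is malformed or does not fit in max entries. */
size_t translate(const unsigned char *code, Insn *out, size_t max) {
    size_t pc = 0, n = 0;
    int depth = 0;                      /* stack depth, checked ahead of time */
    for (;;) {
        unsigned char op = code[pc++];
        if (op == OP_HALT)
            return n;
        if (n == max)
            return 0;
        if (op == OP_PUSH) {
            if (depth == STACK_MAX)
                return 0;
            out[n].fn = do_push;
            out[n].arg = code[pc++];
            depth++;
        } else if (op == OP_ADD || op == OP_MUL) {
            if (depth < 2)
                return 0;
            out[n].fn = (op == OP_ADD) ? do_add : do_mul;
            out[n].arg = 0;
            depth--;
        } else {
            return 0;                   /* unknown opcode */
        }
        n++;
    }
}

/* Execution step: run the translated form; no decoding happens here. */
long execute(const Insn *prog, size_t n) {
    Machine m = { {0}, 0 };
    for (size_t i = 0; i < n; i++)
        prog[i].fn(&m, prog[i].arg);
    return m.sp > 0 ? m.stack[m.sp - 1] : 0;
}
```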

    The new Crusoe processor from Transmeta treats Pentium code as virtual
    machine code.  Although it appears to execute Pentium code, internally
    the chip translates the code into another machine language that is
    considerably more complex, in order to take better advantage of
    pipelining and other optimization techniques.  Hence, what the layman
    would call object code is actually the source code for the Crusoe's
    transformation process.

    5. "Binary executable" code sometimes isn't.  Consider the DECSS.EXE
    file that is the subject of the instant litigation.  This file
    contains a program expressed as machine instructions directly
    executable by a Pentium processor.  It also includes "system calls",
    specific to the Microsoft Windows operating system, to perform
    input/output operations.  If this file were downloaded to a Sun SPARC
    workstation running Unix, or a PowerPC running the Macintosh operating
    system, it would not be directly executable.  But all is not lost.

    The SPARC or PowerPC owner could construct a Pentium emulator program
    to simulate the operation of the Pentium processor.  Such programs are
    commonplace; in fact they are a necessary step in the development of
    any new processor architecture.  He or she would also need to build a
    small portion of a Windows emulator in order to handle the Windows
    system calls.  At that point, DECSS.EXE could be executed by the
    emulator program.  An alternative approach would be to build a
    translator program to convert Pentium instructions to SPARC or PowerPC
    native instructions.  Again, this is a well-known technique in
    computer science and not technically difficult.  DECSS.EXE would then
    be the input, or source code, for the emulator or translator program.
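    The emulator approach can be sketched for a tiny subset of the
    Pentium instruction set. This hypothetical C program models only the
    AL register and four real IA-32 encodings: MOV AL,imm8 (B0),
    ADD AL,imm8 (04), SUB AL,imm8 (2C) and HLT (F4). Flags, memory, the
    other registers, and any emulation of Windows system calls are left
    out; an emulator able to run DECSS.EXE would need far more.

```c
#include <stddef.h>
#include <stdint.h>

/* The emulated processor state: only AL and the instruction pointer. */
typedef struct {
    uint8_t al;
    size_t ip;
} Cpu;

/* Emulate code until HLT.  Returns 1 on HLT, 0 on an opcode this sketch
   does not know or on running off the end of the code. */
int emulate(Cpu *cpu, const uint8_t *code, size_t len) {
    while (cpu->ip < len) {
        uint8_t op = code[cpu->ip++];
        if (op == 0xF4)                 /* HLT */
            return 1;
        if (cpu->ip >= len)             /* remaining opcodes need an imm8 byte */
            return 0;
        uint8_t imm = code[cpu->ip++];
        switch (op) {
        case 0xB0: cpu->al = imm; break;                      /* MOV AL, imm8 */
        case 0x04: cpu->al = (uint8_t)(cpu->al + imm); break; /* ADD AL, imm8 */
        case 0x2C: cpu->al = (uint8_t)(cpu->al - imm); break; /* SUB AL, imm8 */
        default:   return 0;                                  /* not emulated */
        }
    }
    return 0;
}
```

    Fed the Pentium bytes B0 05 04 07 F4, the emulator finishes with 12
    in AL, even when the host processor is a SPARC or PowerPC. Because AL
    is an 8-bit register, the arithmetic wraps modulo 256.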

    6. Executable binary code has much of the same expressive content as
    code written in a symbolic language, be it assembly language or a
    higher level language such as C or Lisp.  Appendix A shows a small C
    program for computing 5 factorial (the product of the integers from 1
    to 5).  Appendix B shows the assembler language output produced by gcc
    version 2.95.3 on a Sun UltraSparc 170 running the Solaris operating
    system.  Appendix C is a dump of the binary output file produced by
    the assembler.  (The values are actually displayed in hexadecimal,
    which is a more compact form of binary.)  Appendix D is the result of
    disassembling the binary file, going back to symbolic assembler
    language code.  Note that line number 20 in Appendix D contains an
    integer compare instruction, "cmp %o0, 5".  The equivalent hexadecimal
    form is also shown: it is 80a22005.  Appendix C shows this same
    sequence in the binary file, mid-way through the line labeled 0000220.
    The same instruction, "cmp %o0, 5", also appears in Appendix B, two
    lines after the label .LL3.  And the equivalent in the C code of
    Appendix A is the sequence "i<6".  The fact that the constant is 6 in
    the C code and 5 in the assembly language code is the sort of thing
    one can learn only by looking at the assembly language code, or the
    binary file that results from it.  (The reason for the difference, 5
    vs. 6, has to do with the particular strategy used by the gcc compiler
    to implement the FOR loop.)
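    The appendices are not reproduced here. As a stand-in, the C function
    below computes 5 factorial with a loop whose test is written i<6; it
    is a sketch in the spirit of Appendix A, not the author's program.
    The commands in the comment are standard GNU tool invocations, but the
    exact assembler output (including whether the comparison appears with
    the constant 5) depends on the compiler version, optimization level,
    and target processor.

```c
/* Compute 5 factorial (1*2*3*4*5) with a FOR loop whose test is i<6.
   Possible commands for looking at the later forms (GNU toolchain):
     gcc -S fact.c       writes assembler language to fact.s
     gcc -c fact.c       writes the binary object file fact.o
     objdump -d fact.o   disassembles fact.o back to symbolic form */
int factorial5(void) {
    int result = 1;
    for (int i = 1; i < 6; i++)
        result *= i;
    return result;
}
```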

    The lessons that should be drawn from the above are:

    1) All computer code is human readable.  Some forms are simply more
    convenient to read than others.

    2) All computer code is expressive.  Many of the ideas expressed in C
    code are also expressed in the assembly language code that results
    from compiling that C code, and again in the binary machine language
    that is the output of the assembler.  Some content may be lost, e.g.,
    source code comments are typically not preserved in object code,
    although variable names may be.  But some ideas that are only implicit
    in the source code may be made more apparent in the object code, such
    as how a particular sequence of actions is best expressed in terms of
    processor operations in order to obtain maximum performance from the
    machine.

    3) All computer code is executable.  In some instances it may be
    advantageous to transform the code into another form first, but
    transformation is by no means mandatory.  An interpreter can be
    employed instead.  Interpreters are in common use in computer systems.

    4) "Source" and "object" are relative terms, not absolute categories.

    5) The file DECSS.EXE is a particular expression of an algorithm for
    converting video files from one format to another.  It expresses the
    same algorithm as the C code from which it was compiled.  DECSS.EXE is
    coded in a binary language that is more tedious to read than the C
    code, but more efficiently executable by a Pentium processor.  These
    are differences in degree only.  If C code is protected speech because
    of its expressive content (and one can argue that a computer program
    is nothing but expressive content), then code written in a binary
    machine language that expresses the same algorithm should not be
    regarded any differently.


Question Stats

Latest activity: 14 years, 6 month(s) ago.
This question has 1 answer.
