A Brief Introduction to Computer Science


Computer science is not programming

If you are introduced to computing through programming, it is easy to conclude that computer science is all about programming. The reality is that programming is the language of computer science, just as mathematics is the language of physics, but there is much more to computer science than programming. This chapter introduces a few ideas and areas of computer science to give you a flavor of it.

The underlying question that computer science addresses is, "what can be automated?" For those processes that can be automated, the obvious questions are how well the automation can work and how it can be done in a reliable and cost-effective way.


What can be computed?

A basic kind of computational problem is a function, for example one that takes one string as an argument and produces another string as its answer. You have already seen how a program can be written that solves some examples of those problems.

But what about a more general question? Is it possible, if you are clever enough, to write a program for any such function that you can describe, or are there inherent limitations in our ability to write programs that can never be overcome?

Surprisingly, the answer is that there are some inherent limitations. Some problems cannot be solved on a computer, even though they have relatively simple descriptions. An example is the following problem.

Input. The input is an expression, as a string. To write the expression, you can use variable names, integer constants and the operations + (addition), - (subtraction) and * (multiplication). For example, the input might be "x*x - 2*y + 1".

Problem. Say whether there are values for the variables that make the expression equal to 0. The answer is either "yes" or "no".

There are actually two versions of this problem. In version 1, the variables' values can be any real numbers. In version 2, the variables are required to have integer values. For input "x*x - 2*y + 1", the answer to both versions is "yes", since x = 3 and y = 5 make that expression have value 0. But for input "x*x - 2", the answer is "yes" when x is allowed to be a real number but "no" when x must be an integer.

It is possible to write a computer program that solves version 1 of this problem. The program is certainly not easy to write, but it can be written, with enough knowledge and patience. On the other hand, no program solves version 2. It is not a matter of how clever you are. It is simply impossible to write a program that takes any string that is an expression and tells you whether that expression can be made to have value 0 by selecting integer values for the variables. (This remarkable fact answers a question posed by the mathematician David Hilbert in 1900. It was not until about 1970 that the answer was found.)
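
To get a feeling for where the difficulty lies, here is a small sketch in Python (the function name is made up, and Python's eval is used only for illustration). It searches larger and larger ranges of integer values. When a solution exists it eventually finds one, so "yes" answers can be confirmed, but when no solution exists it simply runs forever; the hard part, which turns out to be impossible in general, is reliably answering "no".

    # A sketch, not a solution: brute-force search for integer values
    # that make an expression equal to 0.  It can only confirm "yes".
    from itertools import count, product

    def has_integer_root(expr, variables):
        for bound in count(0):                   # search ever-larger boxes
            candidates = range(-bound, bound + 1)
            for values in product(candidates, repeat=len(variables)):
                env = dict(zip(variables, values))
                if eval(expr, {}, env) == 0:     # evaluate the expression
                    return env                   # found a solution: "yes"
        # the loop never gives up, so a "no" answer is never reported

    # has_integer_root("x*x - 2*y + 1", ["x", "y"]) quickly returns a
    # solution such as x = -1, y = 1.
    # has_integer_root("x*x - 2", ["x"]) runs forever.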


Efficiency of algorithms

Some computational problems that can be solved on a computer have very efficient algorithms, while others do not. One branch of computer science is concerned with designing efficient algorithms, and with ways of demonstrating that they are indeed efficient. A complementary area is concerned with demonstrating that some solvable problems only have very inefficient solutions.

Prime numbers and factoring a number into its prime factors offer examples of both sides of the coin. Before the mid-1970s, no efficient algorithm was known for determining whether an integer is prime. Then one was discovered, but it had the drawback that it occasionally (though very rarely) gave the wrong answer. It was not until 2002 that the first efficient algorithm that is always correct was found for telling whether a number is prime.

The problem of finding the prime factors of a number is quite different. Nobody knows of an efficient algorithm to do that, and it is widely believed that no such efficient algorithm exists. The problem is easy for small numbers. For example, the prime factors of 15 are 3 and 5, since 3*5 = 15, and 3 and 5 are prime. But before you conclude the problem is easy, what are the prime factors of 2^67 - 1? Practical applications of factoring require you to be able to factor much larger numbers than that.
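
To make the contrast concrete, here is a small sketch in Python of the most naive factoring method, trial division. (This is only an illustration; it is nowhere near the best known factoring algorithm, and even the best known algorithms are not efficient in the sense used here.)

    # Factoring by trial division: fine for small numbers, hopeless when
    # the smallest prime factor is large.
    def prime_factors(n):
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)                   # whatever is left is prime
        return factors

    print(prime_factors(15))                    # [3, 5]
    # prime_factors(2**67 - 1) needs roughly 1.9 * 10**8 trial divisors
    # before it reaches the smaller factor, 193707721, and practical
    # applications involve numbers with hundreds of digits.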


Databases and data mining

Computers need to be able to store large amounts of information, and to find particular information among it very quickly. For example, when you use an internet search engine, the service provider's computers need to locate a needle in a haystack: they must find just what you are looking for among the enormous amount of information that is available on the internet.

Part of the problem is how to store information so that it can be searched quickly, which is largely an algorithmic problem. The constraints are very stringent. For a search engine, you need to be able to search many trillions of bytes of information in less than a second.
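
One basic idea, shown in a simplified Python sketch below (the documents and the query are made up, and real search engines are vastly more sophisticated), is an inverted index: instead of scanning every document for every query, build a table from each word to the documents that contain it, once, and then answer each query with a few fast lookups.

    # A simplified inverted index: build it once, then queries are fast.
    from collections import defaultdict

    documents = {
        1: "computer science is not programming",
        2: "programming is the language of computer science",
        3: "efficient algorithms and data structures",
    }

    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[word].add(doc_id)             # word -> set of documents

    def search(*words):
        # return the ids of documents that contain every query word
        results = [index[w] for w in words]
        return set.intersection(*results) if results else set()

    print(search("computer", "science"))        # {1, 2}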

Another issue is how questions about the information can be formulated so that you get just what you want. Some internet search engines offer you options for narrowing your search, but typical database systems support queries that are far more specific than those offered by search engines.

Data mining is concerned with extracting much more diffuse or difficult-to-obtain information from databases. For example, when Amazon tells you which other books were bought by people who bought the book you are looking at, its servers have solved a data mining problem.
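
A toy version of that computation might look like the following sketch in Python (the purchase data is made up, and a real recommendation system does far more): count, for each other book, how many customers bought it together with the given one.

    # A toy "customers who bought this also bought" computation.
    from collections import Counter

    purchases = [                   # each list is one customer's purchases
        ["book_a", "book_b"],
        ["book_a", "book_c"],
        ["book_a", "book_b", "book_d"],
    ]

    def also_bought(book):
        counts = Counter()
        for basket in purchases:
            if book in basket:
                counts.update(b for b in basket if b != book)
        return counts.most_common()

    print(also_bought("book_a"))
    # [('book_b', 2), ('book_c', 1), ('book_d', 1)]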


Human-computer interaction

One area of computer science studies effective ways of mediating interaction between a human and a computer. Early methods were based entirely on a keyboard and a text-based terminal. The mouse and graphical displays improved on that tremendously, and touch-sensitive displays have enabled even more natural interactions. But even with a given technology, there are issues concerning the most effective way to use it so that users can easily understand the interaction. Even effective design of web pages (something sorely lacking in many places) falls under this general area. Try doing a Google search for "web pages from hell".


Artificial intelligence

Artificial intelligence is concerned with programming computers to perform tasks that previously appeared to require intelligence. For example, designing software that allows you to interact with a computer using English is a problem in artificial intelligence.

The problems faced by artificial intelligence are tremendous. The computer needs to store somewhat poorly defined information (from a mathematical perspective), like "I feel that we should work harder to preserve our environment," and it needs to reason and draw conclusions based on it. Even information such as "the almonds are stale" can be difficult to handle. There are also issues of inconsistency that the program needs to work around. For example, the program might know that there are no vampires and also that Dracula is a vampire.

A curious feature of artificial intelligence is that, once it solves a problem really well, the problem is no longer thought of as artificial intelligence at all. Early computers were capable of performing computations that previously had required trained people. Before long, though, the ability to add up a list of numbers was no longer thought of as an intelligence-requiring activity. The problem of speech recognition was once firmly in the realm of artificial intelligence. Although that problem is not completely solved, and is still partly an artificial intelligence problem, it is slowly migrating into the realm of solved problems that are thought of as being dealt with algorithmically.


Programming languages

Algorithms can be expressed in a variety of different ways, and some ways work better for certain kinds of problems than others. The area of programming languages is concerned with facilitating the program development process by providing the programmer with a view of computation that is suitable to the problem that needs to be solved.

You have already seen ideas of defining functions by equations, possibly using several cases. That is usually called declarative programming, because you state (or declare) that certain equations are true. You have also seen procedures and loops, a general approach called imperative programming because the program issues commands. The sketch below illustrates the two styles. In the second half of this course you will see yet another approach, called object-oriented programming.
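
Here is a small illustration of the contrast, written in Python rather than the notation used earlier in the book. The declarative version states the equations that factorial satisfies; the imperative version issues commands in a loop.

    # Declarative style: state the equations that factorial satisfies.
    #   factorial(0) = 1
    #   factorial(n) = n * factorial(n - 1)   when n > 0
    def factorial_declarative(n):
        if n == 0:
            return 1
        return n * factorial_declarative(n - 1)

    # Imperative style: issue commands that build the result step by step.
    def factorial_imperative(n):
        result = 1
        for i in range(1, n + 1):
            result = result * i
        return result

    print(factorial_declarative(5), factorial_imperative(5))   # 120 120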


Software engineering

Just as the Library of Congress is qualitatively different from your own private bookcase, large pieces of software are qualitatively different from small programs. Large software projects require teams of developers, and no one person ever sees the whole thing.

Software engineering is concerned with ways of designing large pieces of software that are reliable, on time and within budget, and that do what the intended users want them to do.


Numerical computation

Weather forecasters start with data from observations. They create computer models of the atmosphere (really just computer programs) and employ known equations of physics to predict how conditions will evolve over time. Those computations need to be as efficient and as accurate as possible. It does no good to predict tomorrow's weather using a computer program that takes a week to run.

Generally, numerical computation is concerned with modeling something using equations and then finding approximate solutions to the equations by a computer. The amount of computation required can be tremendous. Supercomputers capable of performing 1000 trillion instructions per second have been employed. Effective use of them requires programs to be designed so that different components of the computer can be working on separate parts of the problem concurrently.
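
To give a tiny taste of the flavor of this work (a sketch in Python, nothing like a real atmospheric model), Euler's method finds an approximate solution to a differential equation by taking many small steps; the approximation improves as the steps shrink, but the amount of computation grows.

    # Euler's method: approximate the solution of y' = f(t, y), y(t0) = y0,
    # by following the slope over many small steps.
    def euler(f, y0, t0, t1, steps):
        y, t = y0, t0
        h = (t1 - t0) / steps            # step size
        for _ in range(steps):
            y = y + h * f(t, y)          # follow the slope for one step
            t = t + h
        return y

    # y' = y with y(0) = 1 has the exact solution e = 2.71828... at t = 1.
    print(euler(lambda t, y: y, 1.0, 0.0, 1.0, 10))       # about 2.5937
    print(euler(lambda t, y: y, 1.0, 0.0, 1.0, 100000))   # about 2.7183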


Computer hardware

Computer hardware has improved at a staggering pace. Each improvement required some kind of innovation. The evolution of computer hardware over time is probably better described as a long sequence of miniature revolutions in thinking. For example, just the problem of switching from aluminum wires inside computer chips to copper wires took decades to solve, but once the techniques were mastered, they yielded one more important step to smaller and faster computer chips.

Further innovations are ongoing. Ideas for improving computer hardware come from physics, electrical engineering and computer science.


Operating systems

An operating system is the software that interacts most closely with the computer hardware. Part of its responsibility is to manage resources, including the computer's memory, processor(s), network connections, etc. An operating system builds a fictional view of the computer on top of the raw hardware. For example, you would expect a computer with just one processor only to be able to run one program at a time, and for very early computers that was true. But an operating system makes it seem that the computer can be running many programs at once, by sharing the processor(s) among the programs. The operating system allows each program to imagine that it owns the entire memory, when in reality it owns only a small part of the memory. In computers with more than one processor or more than one core, the operating system allows a program to parallelize itself to take advantage of concurrent execution by two or more processors. An operating system typically also handles devices such as the display and the mouse.

One tool that is becoming commonplace in operating systems is the virtual machine. A virtual machine manager does not try to present an improved or simplified view of the raw hardware. Instead, it just makes it appear that you have several copies of that raw hardware available, even though you really only have one. That way you can run several operating systems together on a single computer. Each operating system thinks that it is running directly on the hardware of its own computer.


Cryptography

Cryptography is (partly) concerned with keeping information secure. For example, there is a program called PGP (Pretty Good Privacy) that uses ideas of cryptography to allow you to send secure emails, encrypt files and so on. Even if somebody intercepts your email, that person will be unable to read the encoded messages.

Some forms of cryptography have been in use since ancient times. But in the 1970s a revolutionary idea was discovered, called public key cryptography, that opened up new possibilities. Public key cryptography allows you to tell people how to use your cipher system to encode messages without giving those other people the ability to decode messages. So encoding and decoding become separated. One effect of that is the ability to do digital signatures, where you (figuratively) sign an electronic document. Your signature can be verified by anyone; the information on how to check signatures relies on encoding, which you can afford to tell everyone about. But it is not possible for somebody else to take your signature on one document and put it on a different document. Doing that requires the ability to decode.
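
The following toy sketch in Python shows how encoding and decoding can be separated. It uses textbook RSA with tiny, insecure numbers, purely to illustrate the idea; real systems use numbers with hundreds of digits and many additional safeguards.

    # Toy public key example (textbook RSA with tiny numbers; insecure).
    p, q = 61, 53
    n = p * q              # 3233; part of the public information
    e = 17                 # public encoding exponent
    d = 2753               # private decoding exponent: (e * d) % 3120 == 1

    def encode(m):         # anyone can do this, knowing only e and n
        return pow(m, e, n)

    def decode(c):         # only the holder of d can do this
        return pow(c, d, n)

    c = encode(65)
    print(c, decode(c))    # 2790 65

    # A digital signature reverses the roles: sign a document with the
    # private decode operation, and anyone can verify the signature with
    # the public encode operation.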

Ideas in cryptography allow two people talking over a telephone to toss a coin in such a way that both are convinced that the toss was fair, even when neither trusts the other. They allow Alice to offer a list of secrets for sale and Bob to purchase one of them in such a way that Alice does not learn which secret Bob bought and Bob learns nothing about any of the other secrets, even when Alice and Bob's only communication medium is the internet. They allow you to prove to a skeptical person that you possess some secret information without revealing anything about the details of that information. More ideas are yet to be discovered.


Quantum computing

Researchers are employing ideas from quantum physics to produce revolutionary devices called quantum computers that promise to make extremely secure communication possible and to make computations possible that just could not be performed in other ways. (The problem of factoring an integer into its prime factors, thought to be very difficult for ordinary computers, is solvable very quickly on a quantum computer.) Work on quantum computers is still in its infancy, and there are a lot of problems to be solved before any full-scale quantum computers are available to use.