
Data Hierarchy For Newbies

Bits

A bit (short for “binary digit”) is a digit that can assume one of two values, 0 or 1, and it is a computer’s smallest data item. Remarkably, computers’ impressive functions involve only the simplest manipulations of 0s and 1s: examining a bit’s value, setting a bit’s value and reversing a bit’s value (from 1 to 0 or from 0 to 1).

Characters

Working with data in the low-level form of bits is tedious. Instead, people prefer to work with decimal digits (0–9), letters (A–Z and a–z) and special symbols such as $@%&*()–+":;,?/. Digits, letters and special symbols are known as characters. The computer’s character set contains the characters used to write programs and represent data items. Computers process only 1s and 0s, so a computer’s character set represents each character as a pattern of 1s and 0s. Most C implementations use the ASCII (American Standard Code for Information Interchange) character set by default. C also supports Unicode® characters composed of one, two, three or four bytes (8, 16, 24 or 32 bits, respectively).

Unicode contains characters for many of the world’s languages. ASCII is a (tiny) subset of Unicode representing letters (a–z and A–Z), digits and some common special characters. The Unicode Consortium publishes code charts both for the ASCII subset (the Basic Latin block) and for all languages, symbols, emojis and more.

Fields

Just as characters are composed of bits, fields are composed of characters or bytes. A field is a group of characters or bytes that conveys meaning. For example, a field consisting of uppercase and lowercase letters could represent a person’s name, and a field consisting of decimal digits could represent a person’s age in years.

Records

Several related fields can be used to compose a record. In a payroll system, for example, the record for an employee might consist of the following fields (possible types for these fields are shown in parentheses):

  • Employee identification number (a whole number).
  • Name (a group of characters).
  • Address (a group of characters).
  • Hourly pay rate (a number with a decimal point).
  • Year-to-date earnings (a number with a decimal point).
  • Amount of taxes withheld (a number with a decimal point).

Thus, a record is a group of related fields. All the fields listed above belong to the same employee. A company might have many employees and a payroll record for each.

Files

A file is a group of related records. More generally, a file contains arbitrary data in arbitrary formats. Some operating systems view a file simply as a sequence of bytes—any organization of the bytes in a file, such as organizing the data into records, is a view created by the application programmer. It’s not unusual for an organization to have many files, some containing billions, or even trillions, of characters of information. As we’ll see below, with big data, far larger file sizes are becoming increasingly common.

Databases

A database is a collection of data organized for easy access and manipulation. The most popular model is the relational database, in which data is stored in simple tables. A table includes records and fields. For example, a table of students might include first name, last name, major, year, student ID number and grade-point-average fields. The data for each student is a record, and the individual pieces of information in each record are the fields. You can search, sort and otherwise manipulate the data based on its relationship to multiple tables or databases. For example, a university might use data from the student database combined with data from databases of courses, on-campus housing, meal plans, etc.

Big Data

The table below shows some common byte measures:

Unit               Bytes             Which is approximately
1 kilobyte (KB)    1024 bytes        10³ bytes (1024 bytes exactly)
1 megabyte (MB)    1024 kilobytes    10⁶ (1,000,000) bytes
1 gigabyte (GB)    1024 megabytes    10⁹ (1,000,000,000) bytes
1 terabyte (TB)    1024 gigabytes    10¹² (1,000,000,000,000) bytes
1 petabyte (PB)    1024 terabytes    10¹⁵ (1,000,000,000,000,000) bytes
1 exabyte (EB)     1024 petabytes    10¹⁸ (1,000,000,000,000,000,000) bytes
1 zettabyte (ZB)   1024 exabytes     10²¹ (1,000,000,000,000,000,000,000) bytes

The amount of data being produced worldwide is enormous, and its growth is accelerating. Big data applications deal with massive amounts of data. This field is growing quickly, creating lots of opportunities for software developers. Millions of information technology (IT) jobs globally already support big-data applications.

One big-data source favored by developers is Twitter. There are approximately 800,000,000 tweets per day. Though tweets appear to be limited to 280 characters, Twitter actually provides almost 10,000 bytes of data per tweet to programmers who want to analyze tweets. So 800,000,000 times 10,000 is about 8,000,000,000,000 bytes or 8 terabytes (TB) of data per day. That’s big data.

Prediction is a challenging and often costly process, but the potential rewards for accurate predictions are great. Data mining is the process of searching through extensive collections of data, often big data, to find insights that can be valuable to individuals and organizations. The sentiment that you data-mine from tweets could help predict election results, the revenue a new movie is likely to generate and the success of a company’s marketing campaign. It could also help companies spot weaknesses in competitors’ product offerings.
