NEWSLETTER #13
mid-April 1997
To unsubscribe from this newsletter, send a blank message to
newsletter-unsubscribe@imdb.com - *not* newsletter@imdb.com. To subscribe,
fill out the survey form on the web site and check the appropriate box.
Welcome to issue 13 of the IMDb newsletter. The newsletter is intended to
keep database users and contributors informed of the latest developments
from the management team. Comments and suggestions are welcome and should
be directed to newsletter@imdb.com. Issue 14 is scheduled for June.
See the further information section at the end of this file for more
information about The Internet Movie Database (IMDb).
this issue edited by Jon Reeves
by Jon Reeves
Some of you may have gotten some mail recently that offered deals on
magazines, as well as extolling the virtues of our site. While we
appreciate the kind words of the message's author, we were totally unaware
of this message until people started forwarding it to us, and we deplore
the tactics of its sender. Rest assured that we are not advocates of junk
e-mail (in fact, I spend quite a bit of time each week dealing with it)
and would never use it; if you're getting this newsletter, you asked to
get it. We've also taken steps to prevent people from using us as a relay
for their junk mail. In addition, we don't share your e-mail address with
anyone unless you enter a contest (in which case we may give it to the
sponsor) or write a bio or plot and don't ask to be anonymous.
by Jon Reeves
You may have seen an ad for IMDb on the official Academy Awards site.
There's an interesting story behind it.
The designers of that site apparently used an ad from one of our old
campaigns, served from our machines, when they were testing their site.
We noticed and asked them not to do this, and they said they would stop.
However, after their site went live, they continued to serve up that ad
to people who had JavaScript turned off. We assumed this meant they
wanted to help us, and replaced it with an ad promoting the Internet Movie
Database, which got an excellent response. The ad was present on Oscar
night and most of the next few weeks but as of this writing is gone again.
Once again, our emphasis on information over glitz meant we were able
to do real-time updates and our servers were able to handle the load
with ease.
So, if you came here from the Oscars site, welcome! We are not affiliated
in any way with the Academy of Motion Picture Arts and Sciences.
by Col Needham
By special arrangement with the publisher, we're pleased to welcome the
Film Threat Weekly to the IMDb. Check the feature of the day every Monday
for the latest issue containing news and information with a focus on
independent film. Regular features include reviews of the latest movies;
film news; picks of the week in several categories; and the US box
office top 10.
All names and titles are linked into the IMDb where appropriate to provide
background information and we're also maintaining an archive of past issues.
by Michel Hafner
INTRODUCTION
After promising it for a long time, IMDb has finally replaced its old
official character set, the 7 bit ASCII character set, with the new 8
bit ISO-8859-1 character set (aka ISO Latin 1). This new character set
belongs to a family of ISO sets that were designed to cover the majority
of the important languages of the world.
ISO-8859-1 is optimized for West European languages and can display
almost all characters that are used in Albanian, Catalan, Danish,
Dutch, English, Finnish, French, German, Irish, Icelandic, Italian,
Norwegian, Portuguese, Spanish and Swedish. That's one of the reasons
it was chosen. Many of the important film nations are fully covered
with this set. In addition it's widely supported by e-mail software,
web browsers and operating systems in general.
The main difference between ASCII and ISO Latin 1 is the addition of
96 new characters used in the above mentioned languages among others.
These characters are:
¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯
° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿
À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï
Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß
à á â ã ä å æ ç è é ê ë ì í î ï
ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ
More details about the different ISO sets can be found
here.
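As a quick check of the list above, the 96 added characters are exactly the byte values 0xA0 through 0xFF. The following is a minimal Python sketch (not part of the original newsletter) that prints them in six rows of 16. The first character (the no-break space) and the soft hyphen at 0xAD are normally invisible, which is why the first row looks two characters short.

```python
# The 96 characters ISO-8859-1 adds to ASCII occupy bytes 0xA0..0xFF.
added = bytes(range(0xA0, 0x100)).decode("iso-8859-1")

# Print six rows of 16, matching the layout of the list above.
for start in range(0, len(added), 16):
    print(" ".join(added[start:start + 16]))
```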
ISO Latin 1/2/3... and IMDb
ISO Latin 1 is now the official character set of IMDb. This means that
- all new names and titles and all other new text entered into the
database that need these new characters to be spelled correctly must
be submitted using ISO Latin 1.
- all old names and titles and all other old text already in the database
that need these new characters to be spelled correctly MUST be
converted to use ISO Latin 1.
We have used the long preparatory phase to industriously collect the
ISO versions of titles and names so we were able to start with a
sufficiently large portion of data already converted. But there remain
literally thousands of names to be adapted, lots of character names to
be swapped, attributes in different lists to be replaced and general
text to be updated. We hope you will help us here and mail in corrections
as time goes by. The corrections can be mailed in like regular
corrections using the mail server and the usual keywords.
Since not all computer systems and mail software support ISO Latin 1, we
have provided alternative ways of entering data.
- All names and titles that are mailed in as ASCII and have an ISO
Latin 1 counterpart already in the database are automatically swapped
to the ISO Latin 1 version. So you can mail them in as ASCII without
causing any problems:
Examples:
| You mail in | The mail server swaps to |
| Bunuel, Luis | Buñuel, Luis |
| Aberg, Anders | Åberg, Anders |
| Aaberg, Anders | Åberg, Anders |
| Beart, Emmanuelle | Béart, Emmanuelle |
| Bene, Gyoezoe | Bene, Gyözö |
| Bene, Gyozo | Bene, Gyözö |
| Bressler, Gunter | Breßler, Günter |
| Forque, Jesus-Maria | Forqué, Jesús-María |
| Wer zweimal lugt (1993) | Wer zweimal lügt (1993) |
| Was fuer ein Genie (1985) | Was für ein Genie (1985) |
| Voeroes grofnoe, A (1984) | Vörös grófnö, A (1984) |
| Vi paa Vaeddoe (1958) | Vi på Väddö (1958) |
| Ultima pelicula, La (1971) | Última película, La (1971) |
- If a name has an ISO Latin 1 version but only the ASCII version is
correct, because the ASCII version is for one person and the ISO version
for another, you have to use Roman numerals to turn off auto swapping:
Example: Berger, Pamela (I) versus Berger, Paméla (II)
The Roman numerals are permanent in this case and are used throughout
the database, not just for input purposes.
- If you are familiar with the way HTML encodes ISO Latin 1 characters
you can use this encoding too in your mailings to the mail
server. The relevant mappings are:
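| &AElig; --> Æ | &Aacute; --> Á | &Acirc; --> Â | &Agrave; --> À |
| &Aring; --> Å | &Atilde; --> Ã | &Auml; --> Ä | &Ccedil; --> Ç |
| &ETH; --> Ð | &Eacute; --> É | &Ecirc; --> Ê | &Egrave; --> È |
| &Euml; --> Ë | &Iacute; --> Í | &Icirc; --> Î | &Igrave; --> Ì |
| &Iuml; --> Ï | &Ntilde; --> Ñ | &Oacute; --> Ó | &Ocirc; --> Ô |
| &Ograve; --> Ò | &Oslash; --> Ø | &Otilde; --> Õ | &Ouml; --> Ö |
| &THORN; --> Þ | &Uacute; --> Ú | &Ucirc; --> Û | &Ugrave; --> Ù |
| &Uuml; --> Ü | &Yacute; --> Ý | &aacute; --> á | &acirc; --> â |
| &aelig; --> æ | &agrave; --> à | &aring; --> å | &atilde; --> ã |
| &auml; --> ä | &ccedil; --> ç | &eacute; --> é | &ecirc; --> ê |
| &egrave; --> è | &eth; --> ð | &euml; --> ë | &iacute; --> í |
| &icirc; --> î | &igrave; --> ì | &iuml; --> ï | &ntilde; --> ñ |
| &oacute; --> ó | &ocirc; --> ô | &ograve; --> ò | &oslash; --> ø |
| &otilde; --> õ | &ouml; --> ö | &szlig; --> ß | &thorn; --> þ |
| &uacute; --> ú | &ucirc; --> û | &ugrave; --> ù | &uuml; --> ü |
| &yacute; --> ý | &yuml; --> ÿ |
Example:
NAME
B&eacute;art, Emmanuelle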
- If your mailer does support the ISO Latin 1 character set, make sure
that the data you mail in is not sent as raw 8 bit ISO Latin 1 but as
MIME compatible ISO Latin 1 data in the Quoted-Printable encoding,
which uses only ASCII characters. This is necessary because not all
mail systems that transport your mail between your computer and ours
can handle raw 8 bit characters. Some simply ignore the special ISO
Latin 1 characters and remove them from your additions, so names and
titles get mutilated. While we often can and will recognize and correct
these amputated versions, they must be avoided at all costs. So please
configure your mailer properly, or ask your system administrator if you
cannot do it yourself.
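The input routes listed above (plain ASCII with auto swapping, HTML encoding, and Quoted-Printable) can be illustrated with a small Python sketch. This is not the actual mail server code, which the newsletter does not show. The names `KNOWN_NAMES`, `ascii_keys`, `build_index` and `normalize_submission` are hypothetical, and the folding rules are assumptions inferred from the examples (accents dropped, or ae/oe/ue/aa spellings); a spelling that mixes both styles within one name would need more keys than this sketch generates.

```python
import html
import quopri
import unicodedata

# Hypothetical stand-in for names already stored in ISO Latin 1.
KNOWN_NAMES = [
    "Buñuel, Luis",
    "Åberg, Anders",
    "Béart, Emmanuelle",
    "Breßler, Günter",
    "Berger, Paméla (II)",
]

# Letters with no accent to strip need an explicit ASCII spelling
# (others, such as ð and þ, would need entries here too).
NO_ACCENT = {"ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "Ae"}
# Two-letter spellings seen in the examples (Aaberg, Gyoezoe, fuer).
DIGRAPHS = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa",
            "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "Å": "Aa"}

def strip_accents(text):
    """Fold Latin 1 text to ASCII by removing accents."""
    text = "".join(NO_ACCENT.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

def ascii_keys(name):
    """ASCII spellings under which a Latin 1 name might be mailed in."""
    expanded = "".join(DIGRAPHS.get(ch, ch) for ch in name)
    return {strip_accents(name), strip_accents(expanded)}

def build_index(names):
    """Map every ASCII spelling to its stored Latin 1 name."""
    index = {}
    for name in names:
        for key in ascii_keys(name):
            index.setdefault(key, name)
    return index

def normalize_submission(raw, index):
    # Quoted-Printable: "=E9" becomes the byte 0xE9, read as ISO Latin 1.
    text = quopri.decodestring(raw.encode("iso-8859-1")).decode("iso-8859-1")
    # HTML encoding: "&eacute;" becomes "é".
    text = html.unescape(text)
    # Plain ASCII: swap to the stored Latin 1 spelling when one exists.
    return index.get(text, text)

index = build_index(KNOWN_NAMES)
for raw in ["Bunuel, Luis", "Aaberg, Anders", "B&eacute;art, Emmanuelle",
            "B=E9art, Emmanuelle", "Berger, Pamela (I)"]:
    print(raw, "->", normalize_submission(raw, index))
```

In this sketch the Roman numeral is part of the lookup key, so "Berger, Pamela (I)" matches no stored key and is left unswapped, which mirrors the rule for turning off auto swapping.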
Beware of excessive or missing use of ISO Latin 1 in certain culturally
biased sources. For example, French sources might use Marlène Dietrich
because Marlene is spelled Marlène when it is a French first name.
But Marlene Dietrich is German, not French, and made her career for
the most part in Germany and the USA, both of which spell her first
name Marlene, so the ISO version is not correct here and has to be
avoided. Likewise, beware of English and other sources that often
ignore the need for ISO and spell everything using ASCII, which is
again not correct and has to be avoided. This is very widespread!
A generally reliable source from country x spells correctly for data
from its own culture and language but fails to do so for data outside
this area (and competence). So it's safest to use Spanish sources for
Spanish data, French sources for French data, Italian sources for
Italian data etc.
While ISO Latin 1 covers most of the languages spoken in important
film nations it does not provide all necessary characters for languages
such as Czech, Hungarian, Romanian, Estonian, Latvian, Lithuanian,
Bulgarian, Macedonian, Russian, Polish, Serbian, Turkish and others.
In addition, languages using radically different character sets such
as Hindi, Greek, Arabic, Hebrew or pictogram based languages such
as Japanese and Chinese are not directly representable. The situation
concerning IMDb is as follows for the time being:
- Czech, Hungarian, Polish, Romanian, Croatian, Slovak, Slovene...
that have as native character set ISO Latin 2:
Data must be transliterated to ISO Latin 1. The mappings are
straightforward. If an accented character is missing in ISO Latin 1
use the non accented version.
Examples:
Svêrák, Jan --> Sverák, Jan (the ê should have the ^ upside down,
a character not in ISO Latin 1)
and not Svêrák, Jan
Kies'lowski, Krzysztof --> Kieslowski, Krzysztof (the s should
have a ' on top of it, a character not
in ISO Latin 1)
There is one exception so far: the characters u'' and o'' (the ''
should be on top of the u and o) as used in Hungarian are mapped
to ü and ö!
Examples:
Mihályi, Gyo''zo'' ---> Mihályi, Gyözö
Szu''cs, Gábor --> Szücs, Gábor
(If you are knowledgeable about any of these ISO Latin 2 languages
and feel strongly that the mappings should be different please let
me know so we can discuss it.)
There is also the possibility to mail in ISO Latin 2 data itself!
If you want to mail us the correct ISO Latin 2 version of a name or
title now in ISO Latin 1 use the new server keywords
ISO2NAME and
ISO2TITLE
Example:
ISO2NAME
Szegö, András|Szegõ, András|
Kieslowski, Krzysztof|Kie¶lowski, Krzysztof|
ISO2TITLE
Aniol ciemnosci (1991)|Anio³ ciemno¶ci (1991)|
Csillagszemü, A (1977)|Csillagszemû, A (1977)|
The trick here is to encode everything as ISO Latin 1 but using for
the right side the characters that are binary identical to the
correct ones for ISO Latin 2! So the left side looks correct and
the right side looks funny if you use an ISO Latin 1 font and vice
versa if you use an ISO Latin 2 font.
The data will not be used directly in the database since the two
character sets cannot be mixed with current web and mail software.
It will be used later when this is possible. (See UNICODE below.)
The data collected so far will, however, be browsable on our
WWW servers so you can avoid mailing in data we already have.
- Galician, Maltese, Turkish and other languages with native character
set ISO Latin 3:
Data must be transliterated to ISO Latin 1. The mappings are
straightforward. If an accented character is missing in ISO Latin 1
use the non accented version.
There is no server support for direct ISO Latin 3 data for the time being.
- Languages with native character set ISO Latin 4: same as ISO Latin 3
(transliterate to Latin 1).
- Bulgarian, Macedonian, Serbian, Byelorussian, Ukrainian with native
character set ISO Latin 5 (Cyrillic): same as ISO Latin 2.
The new server keywords are
ISO5NAME and
ISO5TITLE
- Russian:
Data must be transliterated to ISO Latin 1. So far no unique system
has been enforced but English transliteration standards have been
used mostly.
There is also the possibility to mail in Cyrillic data itself! If
you want to mail us the correct Cyrillic version of a name or title
now in ISO Latin 1 use the new server keywords
RUSSIANNAME and
RUSSIANTITLE
These are expecting data in the KOI8-R character set, and not ISO
Latin 5! Again the trick here is to encode everything as ISO Latin
1 but using for the right side the characters that are binary
identical to the correct ones for KOI8-R.
Example:
RUSSIANNAME
Tarkovsky, Andrei|ôÁÒËÏ×ÓËÉÊ, áÎÄÒÅÊ|
RUSSIANTITLE
Chapayev (1996)|þÁÐÁÅ× (1996)|
The right side here looks quite strange, but compiling the data is
easy if you know Russian and use a KOI8-R font while working on the
right side and an ISO Latin 1 font for the left side.
- Arabic (ISO Latin 6): same as ISO Latin 3 (transliterate to Latin 1).
- Modern Greek (ISO Latin 7): same as ISO Latin 2.
The new server keywords are
ISO7NAME and
ISO7TITLE
- Hebrew (ISO Latin 8): same as ISO Latin 3 (transliterate to Latin 1).
- Japanese:
Data must be transliterated to ISO Latin 1. So far no unique system
has been enforced but the official transliteration scheme we are
aiming at is modified Hepburn romanization. Circumflexes for long
vowels are accepted since macrons are not available. Capitalization
is lower case except for the first letter of the first word and
proper names in titles.
- Chinese (Mandarin/Cantonese):
Data must be transliterated to ISO Latin 1. So far no unique system
has been enforced. Input by knowledgeable users is most welcome so
we can look at defining a strict policy. If interested, mail me.
- Indian languages and all others not yet discussed: same as ISO Latin 3
(transliterate to Latin 1).
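The transliteration rule for ISO Latin 2 data and the "binary identical" trick used by the ISO2NAME/ISO2TITLE and RUSSIANNAME/RUSSIANTITLE keywords can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `to_latin1`, `right_side` and `server_line` are hypothetical, the `SPECIAL` table covers only the Hungarian exception and the Polish l-stroke, and the native spellings are typed here as Unicode text, which the software of the time could not do.

```python
import unicodedata

# Letters that need an explicit mapping: the Hungarian double acutes (the
# exception described above) and the Polish l-stroke, which has no
# Unicode decomposition into base letter plus accent.
SPECIAL = str.maketrans({"ő": "ö", "ű": "ü", "Ő": "Ö", "Ű": "Ü",
                         "ł": "l", "Ł": "L"})

def to_latin1(text):
    """Left side: transliterate a Latin 2 spelling to ISO Latin 1."""
    out = []
    for ch in unicodedata.normalize("NFC", text).translate(SPECIAL):
        try:
            ch.encode("iso-8859-1")        # already available in Latin 1
            out.append(ch)
        except UnicodeEncodeError:
            # Not in Latin 1: fall back to the unaccented base letter.
            out.append(unicodedata.normalize("NFD", ch)[0])
    return "".join(out)

def right_side(native, charset):
    """Right side: native bytes re-read as if they were ISO Latin 1."""
    return native.encode(charset).decode("iso-8859-1")

def server_line(left, native, charset):
    """Build one 'left|right|' line in the format of the examples above."""
    return f"{left}|{right_side(native, charset)}|"

print("ISO2NAME")
for name in ["Szegő, András", "Kieślowski, Krzysztof"]:
    print(server_line(to_latin1(name), name, "iso8859_2"))

print("RUSSIANNAME")
print(server_line("Tarkovsky, Andrei", "Тарковский, Андрей", "koi8_r"))
```

Russian has no automatic left side here because the newsletter says no single transliteration system is enforced, so the Latin 1 spelling is passed in by hand.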
Ideally all data should be presented using its native character sets/
pictograms. Technically, though, this is not possible with current
widespread software for web access, e-mail and operating systems in general.
In the future there will be a new huge standardized 16 bit character set
called Unicode. It will offer the capability to freely combine Japanese
Kanji with ISO 1 text and Hindi, for example. We will use it as it becomes
widely available and supported by the industry.
I hope you enjoy the new more accurate ISO 1 data we offer now and also
use the new possibilities for data addition with ISO 2/5/7 and KOI8-R.
by Michel Hafner
Until now only alternative titles in the languages of the co-producing
countries were accepted. This policy was reasonable because
- a firm basis of primary titles had to be compiled before a
flood of alternative titles in various languages could be added without
creating chaos.
- the old ASCII character set had to be replaced with the new ISO Latin
1 set so that large numbers of titles could be collected using their
native character set or a better approximation to it than ASCII.
The prerequisites for general alternative titles are now met, and
since demand for these is high we will introduce them within the next
few weeks. The new server keyword and format will be announced in
due course. Until then the old policy is valid, so please do not start to
mail in Swahili titles for US movies right now! :-)
If you have large collections of such titles (at least several
hundred) that you would like to donate please mail me so we can
optimize the transfer.
by Jon Reeves
Here are the most popular searches people have done lately, based on total
pages for the week ending April 19.
Titles (listed in current rank order; the number is the previous rank, - means previously unranked):
- 1. Star Wars (1977)
- 270. Saint, The (1997)
- 8. Romeo + Juliet (1996)
- 12. Batman & Robin (1997)
- 3. Jerry Maguire (1996)
- 4. English Patient, The (1996)
- 179. Liar Liar (1997)
- -. Grosse Pointe Blank (1997)
- 79. Devil's Own, The (1997)
- 10. Scream (1996)
- 18. Lost World: Jurassic Park, The (1997)
- -. Chasing Amy (1997)
- 7. Star Wars: Episode I (1999)
- 20. Pulp Fiction (1994)
- -. Anaconda (1997)
- 16. Empire Strikes Back, The (1980)
- 234. Fifth Element, The (1997)
- 5. Fargo (1996)
- 15. Independence Day (1996)
- 22. Return of the Jedi (1983)
The Star Wars juggernaut rolls on, but it's losing some steam as the films
fade from the US screens; the whole series is only 2.5x the number 2
film now. Chasing Amy has dragged Clerks up from #154 to #32 and Mallrats
from nowhere to #104. Titanic is at #21, up from #95; it should make the
top 20 next time.
Huh factor: #22 "Alles Glück dieser Erde" (1993); #49 Dis (1995);
#56 "And Everything Nice" (1949).
As always, if anyone can explain the sudden popularity of these obscure
titles, I'm interested.
[Note: since the mailing, I've learned that Dis was high on the "worst movies" list.]
People (listed in current rank order; the number is the previous rank, - means previously unranked):
- 2. Pamela Anderson
- 1. Tom Cruise
- 3. Sharon Stone
- 49. Val Kilmer
- 21. Brad Pitt
- 8. Harrison Ford
- 80. Elisabeth Shue
- 6. Teri Hatcher
- 11. Leonardo DiCaprio
- 4. Demi Moore
- 14. Alyssa Milano
- 5. Kim Basinger
- 10. Sandra Bullock
- 9. Mel Gibson
- 12. Ralph Fiennes
- 17. Michelle Pfeiffer
- -. John Cusack
- -. Joey Lauren Adams
- 27. Helen Hunt
- 13. Bo Derek
The first tie, between Shue and Hatcher (and only one reference behind
Ford). That won't last; Hatcher becoming the new Bond girl will raise her
score, and Shue should drop as The Saint leaves screens. Otherwise, the
usual suspects shuffle around, and Kilmer, Shue, Cusack, and Adams enter
the top 20 on the strength of popular new releases. Lots of "halo effect"
from Chasing Amy; even Jay (Jason Mewes) scores at #94.
#38 Petra Verkaik seems to be the new pinup of the month, with her two
titles at #41 and #101.
Huh factor: #46 Ricardo Franco (I).
by Col Needham
Movies opening in the US in March and April sorted by number of votes
(to April 16):
Movies opening in the US in March and April sorted by average votes
(to April 16):
by Jon Reeves
Just a few of the traditional media outlets that have mentioned us lately:
Boston Globe.
Curiocity.
Tribune-Review (Pittsburgh area).
TV-Movie (Germany).
Web Week, twice.
Internet Oggi (Italy).
US News & World Report.
NTT telephone directory.
MSNBC.
LA Times.
Kansas City Star.
Vanity Fair (not by name, alas).
Discovery Channel.
Yahoo! Internet Life (one of Lucy Lawless' favorite sites).
Watch for articles in: CNR Magazine (Spain)
We've also won several new awards.
See selections from the gallery here.
Point/Lycos top movie site.
UK Online Cool Site.
Our readers in the UK should vote for us in the UK Web Awards (and don't
forget to use our UK mirror site).
Our good friend Greg Bulmash's WASHED-UPdate has its awards:
PC Magazine site of the day.
And it was mentioned in:
Courier-Mail (Brisbane Australia).
US Magazine.
Late Show News.
by Col Needham
Traffic continues to increase across all our sites, so we've recently doubled
our hardware capacity at the main US site, housed at Exec-PC in Wisconsin.
We've made it much easier to locate the permanent URLs for bookmarking /
linking to IMDb pages. A button labelled "Link to this page" appears at
the bottom of most pages and will provide the direct URL. For more details
please see our linking guide. Remember
that linking to the IMDb from your own pages helps build awareness of the
IMDb and is very much encouraged.
The navigation menu at the bottom of each of our pages has been enhanced to
include a menu of useful and interesting destinations to help people
navigate the site easily. Simply select your destination and hit the "Go"
button (if you hit the button without making a selection, the system takes
you to a random page from the list).
The posters section has been enhanced to include links to posters stored
on other sites in addition to those stored locally. For example see the
poster for The Saint (1997).
If you're in the mood for browsing titles at random or looking for a good
movie to go out and see/rent, try our random title selector
(use your browser's RELOAD function to jump to another random title).
The recent/upcoming movie releases section has been expanded to show
upcoming movies as far into the future as we cover, so start booking
those tickets for Christmas releases now!
A new version of the local UNIX interface to the database has been released
with support for the ISO-Latin-1 character set change and for the
distributors and crew completion lists.
by Jon Reeves
This is a regular section giving information about the current size
and growth of the IMDb. We receive between 50,000 and 75,000 additions
every week from users all over the world.
Big month for milestones, with all of the main statistics crossing a
threshold:
Number of filmography entries: 1,538,799
Number of people covered: 423,633
Number of movies covered: 104,149
Size of the database (MB): 135
Recent milestones:
- 500 alternate version entries
- 2,000 miscellaneous company entries
- 4,000 literature list entries
- 5,000 business list entries
- 25,000 biographies
- 40,000 composer entries
- 45,000 cinematographer entries
- 100,000 movies
- 100,000 country entries
- 400,000 people
- 600,000 actor entries
- 1,500,000 filmography entries
This is a regular section listing some enhancements we're currently
looking at. Please bear in mind that some of these may take quite
a while to come to fruition or even fail to materialize because the
original volunteer decides not to proceed.
- a separate list of films in production, with their current status.
- outline list: a "one line" plot summary, short enough to display
on the main title page.
- a list of "influential scenes"... the scenes that launched a thousand
spoofs, became the director's trademark, changed cinema forever,
launched a star.
- a locally installable MS-Windows interface to the database is
under final testing for those of you who want to reduce your
phone bills!
- enhanced awards section for the database covering more
international festivals, national film institutes etc.
- general support for alternate titles in languages other than
English and the language of the producing country(s).
- a movie recommendation service that will use your vote records to
suggest other movies you might enjoy. Initially available via an
E-mail interface. Time to check you're up-to-date with your voting!
Academy Awards and Oscar are registered trademarks of the Academy of Motion
Picture Arts and Sciences. UNIX and X Window System are registered trademarks
of The Open Group. The WASHED-UPdate is a trademark of Greg Bulmash. All
other trademarks are the property of their respective owners.