Craigslist.org

I’m sure everyone has visited Craigslist at least once since its launch in 1995, although at that early stage it was just an email distribution list among a small group of people about local events and activities. Since then it has grown into one of the most visited English-language websites, with local classifieds in over 450 cities worldwide. You can look for classifieds in virtually any category imaginable, and it also has discussion boards on a range of topics. Just today I visited Craigslist in search of a hairdresser and found one at a relatively good price.

Craigslist uses a few different engines for its database needs, including MySQL, Redis, MongoDB, and Sphinx. Craigslist stores such a vast amount of data that it takes a combination of these databases to keep up with all of it without slowing everything down. Redis backs the metadata search for the site, and multiple instances can be run at a time on multi-core machines. Sphinx offers full-text indexing and search of all postings, live and archived, as well as the current forums. MySQL supports clusters, so they are able to store postings, finances, users, stats, etc. in one database system. So as you can see, this rather small company uses all the resources available to it to keep the website simple and seamless.

 

http://www.percona.com/live/mysql-conference-2012/sessions/living-sql-and-nosql-craigslist-pragmatic-approach

http://money.howstuffworks.com/craigslist4.htm

e-ID

I came across a rather old article about a new system the state government is designing. It’s a master identity database of Virginia residents for use by state agencies, built on DMV records. This system is supposed to help weed out fraud and help residents do business with the state more easily. Transferring the title of a vehicle to its new owner, for example, could be done online with this new system instead of at a DMV location. They would achieve all of this by creating an e-ID for each person who voluntarily registers. An e-ID is a non-traceable, non-linkable unique ID tied to a single entity and based on that entity’s characteristics. In simpler terms, it is a virtual representation of someone’s driver’s license or ID. It creates an account for them and allows state officials to verify their information, which in turn enables more complex transactions to be done on the web.

As one can imagine, this whole idea could be quite controversial given the increase in hackers and leaks of vital, top-secret information. In this age where everything is beginning to be done electronically, it becomes necessary to protect such important information. I’m all for doing more things online. I believe it’s convenient, mostly hassle free, and best of all quick and easy. However, making information more efficient and accessible for the user also makes it easier for that information to end up in the wrong hands. If this e-ID can be safely protected, then it’s a great idea. We are definitely headed towards a higher grade of technology, so why not start now with this system and safeguard it from unwanted users?

 

http://www.timesdispatch.com/news/state-regional/virginia-politics/va-starting-to-develop-a-master-identity-database/article_772d70fe-28ac-11e3-95ec-001a4bcf6878.html

Ads that follow

I always thought it was so interesting how, when you are surfing the internet, an extremely relevant ad shows up on the side of the page or even in your Facebook newsfeed. One day I was searching for a very specific pair of shoes and literally searched every popular shoe site. After finding the shoes, I decided to think on it before committing. For a little while after that I would see these shoes on different sites, being advertised at discounted prices. These ads eventually helped me decide to buy the shoes and where to buy them from. This is known as re-marketing, and it has been proven successful at pulling customers back in to finally make that purchase. Research shows that only 2% of potential customers make purchases on their first visit. This re-marketing technique keeps their interest and helps convert that other 98%.

Google has some marketing programs that help do just that. It is estimated that Google reaches about 80% of all web users, making its programs a strong choice for marketing strategies. One of the most popular is AdWords. Using AdWords, advertisers can tag certain pages of their site that visitors have browsed and create a campaign to serve relevant ads as those visitors go on to different sites. Google originally implemented this system on the MySQL database engine, moved it to Oracle, and then switched it back to MySQL due to speed issues. Eventually they developed a custom distributed relational database management system known as Google F1, specifically for their ad programs. F1 is a hybrid database that combines the high availability and scalability of NoSQL systems with the consistency and usability of traditional SQL databases. F1 is built on Spanner, which provides the necessary scalability: it is designed to store a few trillion database rows across millions of nodes distributed over hundreds of data centers. The database supports not only the ads but also all of the support systems Google offers with its programs.

http://www.cubrid.org/blog/dev-platform/spanner-globally-distributed-database-by-google/

http://en.wikipedia.org/wiki/AdWords

How Do Some Banner Ads Follow Me from Site to Site?

 

 

Metasearch engines

Recently I’ve been looking at hotels and car rentals for a short trip to NYC over spring break. My favorite site to use is Travelocity. I’ve found some really good deals there in the past and they’re pretty helpful when you actually book it. Today I looked at Kayak and Priceline instead. Travelocity is considered an online “travel agency” while Kayak and Priceline are metasearch engines. While using these metasearch engines I began to wonder how they work.

Essentially, metasearch engines use whatever you’ve entered into the site to query other main search engine databases such as Google. So instead of searching just that one engine, you are receiving results from Google, Bing, Yahoo, or whatever other databases that particular engine reaches out to. They use virtual database methods to actually query the main search engine databases. The user asks a question, then the metasearch engine searches the other databases, compiles the results, and displays the best and most relevant ones to the user. I found a good image from Bright Hub that depicts this.
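To picture the flow for myself, I put together a minimal Python sketch of it. Everything in it is made up for illustration: `engine_a` and `engine_b` are hypothetical stand-ins that return canned results, and the `score` field is an assumed relevance number, not anything a real search engine hands back. It only shows the general idea of querying several sources, merging duplicates, and showing the best results first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    score: float  # assumed relevance from the source engine, 0.0 to 1.0


def engine_a(query):
    # Hypothetical engine: returns canned results instead of a real search.
    return [
        Result("https://example.com/hotel-1", f"{query}: deal one", 0.9),
        Result("https://example.com/hotel-2", f"{query}: deal two", 0.6),
    ]


def engine_b(query):
    # A second hypothetical engine whose results overlap with the first.
    return [
        Result("https://example.com/hotel-2", f"{query}: deal two", 0.8),
        Result("https://example.com/hotel-3", f"{query}: deal three", 0.7),
    ]


def metasearch(query, engines, limit=3):
    """Send the query to every engine, merge duplicates, rank by score."""
    best = {}
    for engine in engines:
        for result in engine(query):
            current = best.get(result.url)
            # Keep only the highest-scoring copy of each URL.
            if current is None or result.score > current.score:
                best[result.url] = result
    ranked = sorted(best.values(), key=lambda r: r.score, reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    for result in metasearch("nyc hotel", [engine_a, engine_b]):
        print(result.score, result.url)
```

A real metasearch site would be calling out to other services over the network and would have its own way of ranking, so treat this as a picture of the idea rather than how Kayak or Priceline actually do it.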

Image

Regular search engines, such as Google, take a user’s query and scan through indexes. These indexes are databases of information that are unique to each search engine, which is why you don’t always get the same results. This is what makes metasearches so useful. They find the unique information for you. This is also what makes them so popular for travel searches. They find the best prices and put them all on one page in a matter of seconds.
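Here is a small Python sketch of what an index like that might look like, under my own simplifying assumptions. It builds a basic inverted index (a map from each word to the pages containing it) for two imaginary engines that have crawled different pages. The page URLs, the text, and the function names are all hypothetical, and real search engine indexes are far more sophisticated than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Two hypothetical engines that have crawled different pages.
engine_one = build_index({
    "https://example.com/a": "cheap hotel in nyc",
    "https://example.com/b": "car rental nyc",
})
engine_two = build_index({
    "https://example.com/b": "car rental nyc",
    "https://example.com/c": "boutique hotel nyc midtown",
})

if __name__ == "__main__":
    print(search(engine_one, "hotel nyc"))
    print(search(engine_two, "hotel nyc"))
```

The same query finds page `a` in the first index and page `c` in the second, which is the kind of difference a metasearch engine smooths over by asking both.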

http://www.brighthub.com/internet/google/articles/93261.aspx

Pinterest Addict

Hello, my name is Makayla and my addiction is Pinterest. I spend all of my free time searching for anything you can possibly imagine and pinning things that I’ll probably never refer to. Now that I have admitted to my addiction, I can begin to control it.

Today as I scrolled through my favorite boards, I began to wonder how the database is structured behind this highly addictive machine and how it handles the growing number of addicts such as myself. I did a quick search and several articles came up. “Scaling Pinterest and adventures in database sharding” seemed to describe it in clear enough detail for me.

The article explained that Pinterest engineers had to keep the database infrastructure simple if they wanted to scale it as the site took off. When it began in 2010, it was running on a MySQL database and using many tools such as Memcached. What they learned as they began to grow is that juggling so many tools overcomplicated things, and it was best to focus on a few key tools that benefited the site as a whole.

The article also discussed how the makers were big fans of sharding. Unsure of what that meant, I went straight for Wikipedia. Sharding is where rows of database tables are held separately, rather than the tables being split by columns. This technique is perfect for Pinterest because, for applications that have loads of data, it helps minimize response times for queries, and it’s not necessary to have one big server. This all helps ensure that all the pinners out there can safely and quickly pin their faves without worries of crashes and missing data.
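To get a feel for the idea, I wrote a minimal Python sketch of sharding. This is not how Pinterest does it; it assumes one common approach, where a hash of each row’s key decides which shard holds the row. The `ShardedStore` class and the `pin:` keys are hypothetical names I made up, and plain dictionaries stand in for separate database servers.

```python
import hashlib


class ShardedStore:
    """Toy key-value store that spreads rows across several shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, row):
        self.shards[self._shard_for(key)][key] = row

    def get(self, key):
        # Only one shard is consulted, never the whole data set.
        return self.shards[self._shard_for(key)].get(key)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(100):
        store.put(f"pin:{i}", {"id": i, "board": "faves"})
    print(store.get("pin:42"))
    print([len(shard) for shard in store.shards])
```

Each lookup only touches one small shard, which is where the faster queries come from. One catch with this simple scheme is that changing the number of shards moves most keys to a different shard, so real systems plan their key-to-shard mapping more carefully.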

http://gigaom.com/2012/09/27/scaling-pinterest-and-adventures-in-database-sharding/

http://stackoverflow.com/questions/992988/what-is-sharding-and-why-is-it-important