
6/23/2018

The history and talent market of crawlers for online stores


This article contains a brief history of product comparison for online stores and a short survey of eight Taiwanese organizations.
Crawling and full-text indexing of the web for online stores have become a pretty mature technology in Taiwan. There are not many technical difficulties in building a decent crawler. The real key points are (1) patience, (2) patience and a mature engineering mindset, and (3) patience and customization for specific needs.
Almost every local company leverages the same technology stack.

History


Price comparison requires a web crawler and a full-text indexing search engine as its foundation. Therefore, this brief history shows not only the evolution of the technology over the past 23 years (#1) but also an overview of the talent market in Taiwan.
Taiwanese engineers started to build web crawlers in 1995; the earliest local crawlers were yam (1995) and kimo (1997). Kimo was acquired by Yahoo in 2001. From 1995 to 2001, crawler technology was limited by the environment itself: network bandwidth and data-center cost were the main constraints. Building a full-text index search engine was still a task that needed sophisticated knowledge of both data structures and file structures. Talents in the early internet stage often worked more on infrastructure and the basic implementation of algorithms. For example, almost every e-commerce (EC) company needed to host its own email server (#2). In universities, algorithms, math, protocols and other conceptual classes were the main things students needed to learn.
After the dot-com bubble of 2000, crawlers and product listing in Taiwan grew slowly. However, many tools and platforms emerged between 2000 and 2010. A stable Python 2.0 was released in 2000, and Lucene (a search engine) became a top-level Apache project in 2005 (#4). The most critical service of this period, however, was Google. Google's services soon started to affect Taiwan, and within just a few years Google became the biggest crawler and web index in the world. It is still the best crawling and indexing service today. Meanwhile, PChome and Yahoo gradually grew their own store-hosting businesses from 2000 to 2008.
Talents in this stage had more chances to focus on business requirements. More and more university students learned application development, software engineering and software project management, which are the keys to making things happen in eBiz. Google, Facebook and AWS started to affect the web development world from 2005, and some interesting free services opened up, for example the Google Shopping API (#3).
In 2010, the first release of elasticsearch created a whole new world. Although elasticsearch simply leverages Lucene as its search engine and focuses only on making that engine scalable and easy to use, it did help engineers focus on building applications instead of reinventing the wheel. The same goes for other big-data platforms and tools, for example Hadoop.
The talent market for software engineers in Taiwan also changed after 2010. More and more open positions looked purely for software engineers. Most engineers could easily implement their ideas and test them against the real world in an amazingly short time. This also drove eBiz startups, since useful, dedicated tools meant internet technology was no longer rocket science. The only drawback was that some talented engineers would look for “more interesting” jobs in the commercial world, for example in machine learning or AI.
However, only very few engineers do truly cutting-edge work; most of those in such “more interesting” jobs are actually reinventing the wheel rather than focusing on the business.

Talent position in history


Nevertheless, an engineer's skill set, and the ability to be handy with tools without being limited by them, are the major differences between a talent and an ordinary engineer. It is also critical to have a few engineering leaders who have gone through the different stages of the internet, along with some young talents who can quickly build things from existing tools. A team of 3-4 engineers seems to fit our target: one fairly senior engineer leading 2-3 young talents, possibly even fresh graduates.

Companies in Product-Listing

The 8 organizations


The 8 companies we surveyed show that building a fair product comparison website is no secret anymore. However, building a good one still requires patience and fine-tuning of every detail. Feebee has more engineering resources; however, its visit numbers are not a target that the No. 2 and No. 3 sites cannot reach. After interviewing ezprice engineers and gathering information from LinkedIn, we could easily see that the current crawler (Python), full-text indexing (elasticsearch) and NLP (jieba or others) technologies are pretty mature. A team of 2-3 mature engineers with 2-3 months to leverage these tools could build a decent product.
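To make the "crawl, tokenize, index, compare" stack a little more concrete, here is a minimal sketch in plain Python. It is not how any of the surveyed companies work: the listings are made-up data standing in for crawler output, the regex tokenizer is a crude stand-in for jieba, and the in-memory inverted index is a toy substitute for elasticsearch. All names (LISTINGS, tokenize, build_index, search) are hypothetical.

```python
import re
from collections import defaultdict

# Made-up listings, shaped like what a crawler might extract from store pages.
LISTINGS = [
    {"store": "store-a", "title": "Acme Phone X 64GB", "price": 21900},
    {"store": "store-b", "title": "ACME phone x (64GB) black", "price": 20500},
    {"store": "store-c", "title": "Acme Tablet 10 64GB", "price": 9900},
]


def tokenize(text):
    """Crude tokenizer: lowercase alphanumeric runs, plus single CJK characters.

    A real system would likely use a proper segmenter (for example jieba).
    """
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def build_index(listings):
    """Build an inverted index mapping each token to the set of listing ids."""
    index = defaultdict(set)
    for doc_id, item in enumerate(listings):
        for token in tokenize(item["title"]):
            index[token].add(doc_id)
    return index


def search(index, listings, query):
    """Return listings whose title contains every query token, cheapest first."""
    tokens = tokenize(query)
    if not tokens:
        return []
    postings = [index.get(token, set()) for token in tokens]
    hits = set.intersection(*postings)
    return sorted((listings[i] for i in hits), key=lambda item: item["price"])


index = build_index(LISTINGS)
results = search(index, LISTINGS, "acme phone x")
for item in results:
    print(item["store"], item["price"])
```

The hard part in practice is usually not this core loop but the patient, per-store tuning around it: cleaning titles, matching the same product across stores, and keeping the crawl fresh.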


URL    Visits (4 months)    Names
       38M                  firstweb 第一網站股份有限公司 (http://sitetag.us)
       20.8M                樂方股份有限公司 Funmula
       17.2M                ShineWant Tech. Co., Ltd. 嚮網科技股份有限公司
       11.2M                環通資訊股份有限公司 UCS Inc.
       9.2M                 億普媒體股份有限公司 Eprice Media, Inc
       2.4M                 EZprice Co., Ltd. Taichung
       2.38M                Personal website
       0.5M                 Reddoor Media Group Co. 紅門互動股份有限公司 (dotmore.com.tw)


Comments 


#1. The 23 years run from 1995 to 2018. 1995 was the year the first “easy to get” consumer OS (Windows 95) was released.

#2. Hosting an email server was never easy, and it did cause trouble for EC companies in the early internet age. However, it also created opportunities for mail hosting services around the world. Gmail is probably one of the best, but mail.com and zoho.com were also good.

#3. https://www.programmableweb.com/api/google-shopping-search

#4. In 2005, Apache Lucene became a top-level Apache project, and later on it became the core of almost all free full-text index engines.