How to become a dialogue system engineer

January 12, 2023

Dialogue systems (conversational robots) use techniques such as machine learning and artificial intelligence to let machines understand human language. They combine methods from many disciplines, which makes them a concentrated training ground for artificial intelligence. Figure 1 shows the main techniques involved in developing a dialogue system.

The road to advanced dialogue system skills

What technologies does the dialogue system in Figure 1 involve, and where can you learn about them? Each area is explained step by step below.



Figure 1: Dialogue system skill tree

Mathematics

Matrix computation studies the properties of individual matrices and of groups of matrices. Many machine learning models rely heavily on matrix properties. For example, PCA amounts to computing eigenvectors (of the data's covariance matrix), and matrix factorization (MF) amounts to computing singular vectors, much like an approximate SVD. Many tools in artificial intelligence are built around matrix operations, including mainstream deep learning frameworks such as TensorFlow and PyTorch. There are plenty of textbooks on matrix computation, so pick one whose difficulty suits you. For a deeper understanding, the book "Linear Algebra Done Right" is highly recommended.
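The following minimal NumPy sketch makes the PCA/SVD connection concrete. The function names `pca_eig` and `pca_svd` and the random toy data are illustrative assumptions and do not come from any particular library. The sketch finds the top principal directions in two ways: from the eigenvectors of the covariance matrix, and from the SVD of the centered data. It then checks that both routes agree up to sign.

```python
import numpy as np

def pca_eig(X, k):
    """Top-k principal directions from eigenvectors of the covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)         # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    return eigvals[order], eigvecs[:, order]

def pca_svd(X, k):
    """Same directions from the SVD of the centered data matrix."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # singular values descending
    eigvals = S[:k] ** 2 / (X.shape[0] - 1)             # squared singular values = covariance eigenvalues
    return eigvals, Vt[:k].T

# Toy data: correlated 3-D points (made up for illustration)
rng = np.random.default_rng(0)
A = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ A

vals_eig, vecs_eig = pca_eig(X, 2)
vals_svd, vecs_svd = pca_svd(X, 2)
print(np.allclose(vals_eig, vals_svd))                   # True
# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(vecs_eig), np.abs(vecs_svd)))   # True
```

In practice the SVD route is usually preferred because it never forms the covariance matrix explicitly, which is numerically gentler.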

Probability and statistics are the foundation of machine learning. Commonly used concepts include random variables (discrete and continuous), probability density/distribution functions (binomial, multinomial, Gaussian, and the exponential family), conditional densities/distributions, prior densities/distributions, posterior densities/distributions, maximum likelihood estimation (MLE), and maximum a posteriori (MAP) estimation. For a quick overview, read the first two chapters of classic machine learning texts such as Pattern Recognition and Machine Learning or Machine Learning: A Probabilistic Perspective. For systematic study, use a university probability and statistics textbook.
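A small example shows the difference between maximum likelihood and maximum a posteriori estimation. The sketch below estimates the success probability of a Bernoulli variable, such as a coin flip, from `k` successes in `n` trials. It assumes a Beta(alpha, beta) prior, which is conjugate to the Bernoulli likelihood. The function names are hypothetical, and the MAP formula used here is the mode of the Beta posterior. The sketch only accepts inputs where both posterior parameters exceed 1.

```python
def bernoulli_mle(k, n):
    """Maximum likelihood estimate of P(success) from k successes in n trials."""
    return k / n

def bernoulli_map(k, n, alpha, beta):
    """MAP estimate under an assumed Beta(alpha, beta) prior.

    The posterior is Beta(k + alpha, n - k + beta); this sketch returns its mode
    and only accepts cases where both posterior parameters are greater than 1.
    """
    a_post, b_post = k + alpha, n - k + beta
    if a_post <= 1 or b_post <= 1:
        raise ValueError("this sketch requires both posterior parameters > 1")
    return (a_post - 1) / (a_post + b_post - 2)

# 3 successes in 4 trials
print(bernoulli_mle(3, 4))          # 0.75
print(bernoulli_map(3, 4, 2, 2))    # (3 + 2 - 1) / (4 + 2 + 2 - 2) = 4/6, about 0.667
```

The Beta(2, 2) prior pulls the estimate toward 0.5, which helps when data is scarce. With a flat Beta(1, 1) prior, MAP and MLE coincide.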

Optimization methods are used throughout the training of machine learning models. Common optimization concepts in machine learning include convex/nonconvex functions, gradient descent, stochastic gradient descent, and primal/dual problems. General machine learning materials and courses teach some optimization, for example Zico Kolter's "Convex Optimization Overview" notes for Andrew Ng's Machine Learning course. For a systematic understanding, the best resource is Boyd's "Convex Optimization" book, along with its slides (https://web.stanford.edu/~boyd/cvxbook/) and courses (https://see.stanford.edu/Course/EE364A, https://see.stanford.edu/Course/EE364B). Readers who prefer code can study the optimization methods in open-source machine learning projects such as Liblinear, LibSVM, and TensorFlow.
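To contrast gradient descent with stochastic gradient descent, here is a minimal NumPy sketch on least-squares linear regression. The function names, learning rates, epoch counts, and synthetic data are assumptions chosen for illustration. Batch gradient descent takes one step per pass using the full-data gradient. SGD takes one noisy step per sample, visiting the samples in random order.

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of the mean squared error (1/n) * ||Xw - y||^2 with respect to w."""
    return 2.0 / len(y) * X.T @ (X @ w - y)

def gradient_descent(X, y, lr=0.1, epochs=200):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * mse_grad(w, X, y)              # one step using the full batch
    return w

def sgd(X, y, lr=0.02, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):        # shuffle sample order each epoch
            w -= lr * mse_grad(w, X[i:i + 1], y[i:i + 1])  # step from one sample
    return w

# Synthetic data: y = 1 + 2x plus small noise (made up for illustration)
rng = np.random.default_rng(42)
X = np.c_[np.ones(100), rng.normal(size=100)]    # bias column + one feature
true_w = np.array([1.0, 2.0])
y = X @ true_w + 0.01 * rng.normal(size=100)

w_gd = gradient_descent(X, y)
w_sgd = sgd(X, y)
print(w_gd, w_sgd)   # both should be close to [1, 2]
```

SGD's per-step cost does not grow with the dataset size, which is why it (or mini-batch variants) dominates deep learning training.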

Some commonly used Python packages for math:

NumPy: scientific computing package for multidimensional array (tensor) computation

SciPy: mathematical computing toolkit for science and engineering

Matplotlib: plotting and visualization package

Machine learning and deep learning

Andrew Ng's "Machine Learning" course is still the go-to introduction to the field. Don't underestimate an "introductory" course: if you truly understand its contents, you can already apply for algorithm engineer positions. Several widely recognized textbooks are Hastie et al.'s The Elements of Statistical Learning, Bishop's Pattern Recognition and Machine Learning, Murphy's Machine Learning: A Probabilistic Perspective, and Zhou Zhihua's Machine Learning (known as the "Watermelon Book"). For deep learning, Goodfellow, Bengio, and Courville's "Deep Learning" and the official TensorFlow tutorials are recommended.

Some commonly used tools:

Scikit-learn: Python package containing various machine learning models

Liblinear: Efficient training methods for large-scale linear models (e.g., linear SVMs and logistic regression)

LibSVM: Efficient training methods for various SVMs

TensorFlow: Google's deep learning framework

PyTorch: Facebook's deep learning framework

Keras: High-level deep learning API

Caffe: An older deep learning framework
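Libraries like Liblinear and scikit-learn train linear models such as logistic regression. The from-scratch NumPy sketch below shows what such training does underneath. It uses plain batch gradient descent on the L2-regularized log loss. It is not the solver those libraries actually use. The function names, hyperparameters, and the two-cluster toy data are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """Binary logistic regression fit by batch gradient descent.

    Minimizes mean log loss + (l2 / 2) * ||w||^2; the bias is not regularized.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)                 # predicted P(y = 1 | x)
        grad_w = X.T @ (p - y) / n + l2 * w    # gradient of loss + L2 penalty
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return (sigmoid(X @ w + b) >= 0.5).astype(int)

# Toy data: two well-separated Gaussian clusters (made up for illustration)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.r_[np.zeros(50), np.ones(50)]

w, b = train_logreg(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(accuracy)   # training accuracy; near 1.0 on this easy toy data
```

In a dialogue system, a model of this shape (usually multiclass) is a common baseline for intent classification over bag-of-words features.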

Natural language processing

Many universities have NLP research teams, such as the Stanford NLP Group and, in China, the SCIR laboratory at Harbin Institute of Technology. Their work is worth following.

Plenty of NLP material is available online. For a course, Stanford's "CS224n: Natural Language Processing with Deep Learning" is recommended. For a book, Manning and Schütze's "Foundations of Statistical Natural Language Processing" is recommended (its Chinese edition is titled "Statistical Natural Language Processing Fundamentals").

For information retrieval, Manning's classic "Introduction to Information Retrieval" (Chinese edition translated by Wang Bin) and the Stanford course "CS 276: Information Retrieval and Web Search" are recommended.
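A core idea from information retrieval that dialogue systems reuse constantly is ranking candidate texts against a query with TF-IDF weights and cosine similarity. One example is matching a user question to an FAQ entry. Below is a minimal pure-Python sketch. It assumes the texts are already tokenized; plain whitespace splitting stands in for a real tokenizer such as Jieba for Chinese. It uses the IDF variant log(N / df) + 1, which is one common choice among several. The function names and FAQ strings are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document, plus the idf table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}           # one common idf variant
    vecs = [{t: (c / len(doc)) * idf[t] for t, c in Counter(doc).items()} for doc in docs]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs):
    """Return the index of the best-matching document and all scores."""
    vecs, idf = tfidf_vectors(docs)
    # Query terms that appear in no document are dropped (they have no idf)
    qvec = {t: (c / len(query)) * idf[t] for t, c in Counter(query).items() if t in idf}
    scores = [cosine(qvec, d) for d in vecs]
    return max(range(len(docs)), key=scores.__getitem__), scores

faq = ["how do i reset my password",
       "where can i check my order status",
       "how do i cancel my order"]
docs = [q.split() for q in faq]
best, scores = search("cancel order".split(), docs)
print(faq[best])   # how do i cancel my order
```

Production search engines such as Elasticsearch use more refined scoring functions and inverted indexes, but the intuition is the same: rare shared terms count more than common ones.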

Some commonly used tools:

Jieba: Chinese word segmentation and part-of-speech tagging Python package

CoreNLP: Stanford's NLP Tools (Java)

NLTK: Natural Language Toolkit

TextGrocery: Efficient short text categorization tool (Note: only for Python 2)

LTP: Harbin Institute of Technology's Language Technology Platform for Chinese natural language processing

Gensim: A text analysis library that includes a variety of topic models

Word2vec: Efficient word representation (embedding) learning tool

GloVe: Stanford's word representation learning tool

fastText: Efficient library for word representation learning and sentence classification

FuzzyWuzzy: A tool for calculating the similarity between texts

CRF++: Lightweight conditional random field (CRF) library (C++)

Elasticsearch: Open Source Search Engine
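Fuzzy string matching, the kind of task FuzzyWuzzy is used for, can be sketched with Python's standard-library `difflib`. This is not FuzzyWuzzy's API. The functions `ratio`, `token_sort_ratio`, and `best_match` are hypothetical helpers written in a similar spirit. The scores are scaled to 0-100, and the word-order-insensitive variant sorts lowercase tokens before comparing. The FAQ strings are made up for illustration.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """Similarity score from 0 to 100 based on difflib's matching-blocks ratio."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def _normalize(s):
    """Lowercase and sort tokens so word order does not matter."""
    return " ".join(sorted(s.lower().split()))

def token_sort_ratio(a, b):
    return ratio(_normalize(a), _normalize(b))

def best_match(query, candidates, scorer=token_sort_ratio):
    """Return (candidate, score) for the candidate most similar to query."""
    return max(((c, scorer(query, c)) for c in candidates), key=lambda pair: pair[1])

faq = ["reset my password", "check order status", "cancel my order"]
print(best_match("my order cancel", faq))   # ('cancel my order', 100)
```

Character-level matching like this suits short, typo-prone user inputs. For longer texts or paraphrases, the TF-IDF or embedding approaches from the tools above usually work better.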

Conversational robots

Technically, dialogue systems use different frameworks depending on the type of user they serve. Several types of dialogue robots are introduced below.

Conversational robot creation platform
